ABSTRACT
A literature review requires understanding articles from several viewpoints, such as the problem an article addresses and the method it proposes. Searching from these viewpoints would make surveys more efficient if the relevant segments of articles were extracted, indexed, and made available as auxiliary queries. This paper focuses on the sentences in an abstract that describe the problem, and on the feature sets used to classify such problem sentences. Classification performance is evaluated by 10-fold cross-validation for six candidate sets of feature words. The set of all words achieves the best performance when 90% of the data is used for training. However, when only 10% of the data is used for training, a small set of words with positive scores outperforms the other feature sets. In this more realistic situation, the selected feature words are effective in improving classification performance.
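The two training regimes mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the 10% setting is obtained by inverting the usual 10-fold split, that is, training on one fold and testing on the other nine; the abstract does not confirm this. The function names `kfold_splits` and `restrict_to_features` are hypothetical, and the classifier itself is left out.

```python
from typing import Iterator, List, Set, Tuple


def kfold_splits(n_items: int, k: int = 10, invert: bool = False
                 ) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    invert=False: train on k-1 folds, test on the held-out fold
                  (90% training data when k=10).
    invert=True:  train on the single fold, test on the remaining folds
                  (10% training data when k=10). This inversion is an
                  assumption about how the small-training setting is built.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    indices = list(range(n_items))
    # Fold i takes every k-th item starting at i, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        single = folds[held_out]
        rest = [idx for j, fold in enumerate(folds) if j != held_out for idx in fold]
        if invert:
            yield single, rest
        else:
            yield rest, single


def restrict_to_features(tokens: List[str], feature_words: Set[str]) -> List[str]:
    """Keep only the tokens that belong to the chosen feature-word set."""
    return [t for t in tokens if t in feature_words]
```

In practice the sentences would be shuffled before splitting, and each candidate feature set would be applied with `restrict_to_features` before training a classifier on every split.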
Index Terms
- Feature words that classify problem sentence in scientific article