DOI: 10.1145/2428736.2428803
research-article

Feature words that classify problem sentence in scientific article

Published: 03 December 2012

ABSTRACT

A literature review requires understanding the contents of articles from several viewpoints, such as the problem and the method that each article describes. Searching from these viewpoints would improve the efficiency of a survey if particular segments of articles were extracted, indexed, and made available as auxiliary queries. This paper focuses on the sentences in an abstract that describe the problem, and on the feature sets that classify such problem sentences. Classification performance is evaluated by 10-fold cross-validation for six candidate sets of feature words. The set of all words achieves the best performance when 90% of the data is used as training data. However, a small set of words with positive scores outperforms the other feature sets when only 10% of the data is used for training. In such a realistic situation, the feature words are effective in improving classification performance.
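The two evaluation settings in the abstract (90% versus 10% training data) can be illustrated with a small sketch. The abstract does not state how the 10% setting was produced. The code below assumes it reuses the same ten folds with the roles swapped, training on one fold and testing on the other nine. The function name `kfold_splits` and its parameters are hypothetical and do not come from the paper.

```python
import random


def kfold_splits(n_items, k=10, large_train=True, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    large_train=True  : train on k-1 folds, test on 1 fold (90% training for k=10).
    large_train=False : train on 1 fold, test on k-1 folds (10% training for k=10).
    The swapped setting is an assumption about how a 10% training
    condition could be built; the paper may have done it differently.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        rest = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        if large_train:
            yield rest, held_out
        else:
            yield held_out, rest


if __name__ == "__main__":
    for large in (True, False):
        train, test = next(kfold_splits(100, large_train=large))
        print(f"large_train={large}: {len(train)} train, {len(test)} test")
```

In either setting every sentence appears in exactly one of the two index lists per round, so scores averaged over the ten rounds remain comparable across feature sets.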


Published in

IIWAS '12: Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
December 2012
432 pages
ISBN: 9781450313063
DOI: 10.1145/2428736

Copyright © 2012 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
