Feature words that classify problem sentence in scientific article

Authors:
Toshihiko Sakai, Kyushu University, Fukuoka, Japan
Sachio Hirokawa, Kyushu University, Fukuoka, Japan

IIWAS '12: Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services, December 2012, Pages 360–367
https://doi.org/10.1145/2428736.2428803

Published: 03 December 2012

ABSTRACT

Reviewing the literature requires understanding articles from several viewpoints, such as the problem they address and the method they propose. Searching from these viewpoints would make surveys more efficient if the relevant segments of articles were extracted, indexed, and made available as auxiliary queries. This paper focuses on the sentences in an abstract that describe the problem, and on the feature sets that classify such problem sentences. Classification performance is evaluated by 10-fold cross-validation for six candidate sets of feature words. The set of all words achieves the best performance when 90% of the data is used for training. However, when only 10% of the data is used for training, a small set of words with positive scores outperforms the other feature sets. In this more realistic setting, the feature words are effective in improving classification performance.
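The abstract does not say how word scores are computed. The author tags mention SVM and feature selection, so the following is only a minimal sketch under that assumption: it trains a bias-free linear SVM on a bag-of-words representation and keeps the words whose learned weight is positive as the "small set of words with positive scores". The toy sentences, the Pegasos-style trainer, and the names `train_linear_svm`, `positive_score_words`, and `predict` are all hypothetical and not taken from the paper. The sketch also reuses the original weights when restricting to the selected words, whereas the paper's experiments presumably retrain per feature set.

```python
from collections import Counter

# Hypothetical toy corpus (not from the paper): +1 = problem sentence, -1 = other.
SENTENCES = [
    ("however existing methods are difficult to scale", 1),
    ("it is difficult to extract such segments", 1),
    ("however the problem remains unsolved", 1),
    ("the lack of labeled data is a problem", 1),
    ("we propose a method based on svm", -1),
    ("we evaluate the method by cross validation", -1),
    ("the results show that the method works", -1),
    ("we propose a new index for search", -1),
]


def train_linear_svm(samples, epochs=50, lam=0.01):
    """Fit bias-free linear SVM word weights with Pegasos-style subgradient steps."""
    weights = Counter()
    step = 0
    for _ in range(epochs):
        for sentence, label in samples:
            step += 1
            eta = 1.0 / (lam * step)              # decaying learning rate
            counts = Counter(sentence.split())    # bag-of-words term counts
            margin = label * sum(weights[w] * c for w, c in counts.items())
            for w in weights:                     # L2 regularization shrinkage
                weights[w] *= 1.0 - eta * lam
            if margin < 1.0:                      # hinge loss is active
                for w, c in counts.items():
                    weights[w] += eta * label * c
    return dict(weights)


def positive_score_words(weights, top_n=None):
    """Return words with a positive weight, highest weight first."""
    ranked = sorted((w for w in weights if weights[w] > 0),
                    key=lambda w: (-weights[w], w))
    return ranked[:top_n]


def predict(sentence, weights, features):
    """Classify using only the selected feature words (assumed decision rule)."""
    score = sum(weights[w] for w in sentence.split() if w in features)
    return 1 if score > 0 else -1


if __name__ == "__main__":
    weights = train_linear_svm(SENTENCES)
    features = positive_score_words(weights, top_n=5)
    print("positive-score feature words:", features)
    print("prediction:", predict("this task is difficult", weights, set(features)))
```

Restricting the classifier to a handful of positive-score words shrinks the feature space, which is one plausible reason such a set could hold up better than the full vocabulary when little training data is available.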

Index Terms

Feature words that classify problem sentence in scientific article
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published in
IIWAS '12: Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
December 2012
432 pages
ISBN: 9781450313063
DOI: 10.1145/2428736
General Chair:
Eric Pardede
La Trobe University, Australia
Copyright © 2012 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
SVM
feature selection
problem sentence
scientific article