ABSTRACT
We propose a math-aware search engine that handles both textual keywords and mathematical expressions. Our math feature extraction and representation framework captures the semantics of math expressions via a finite state machine model. We adapt the passive-aggressive online learning binary classifier as the ranking model. We benchmarked our approach against three classical information retrieval (IR) strategies on math documents crawled from Math Overflow, a well-known online math question answering system. Experimental results show that our proposed approach outperforms the other methods by more than 9%.
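To make the ranking idea concrete, the following is a minimal sketch of how a passive-aggressive binary classifier can be used to score and order documents. It uses the standard PA-I update rule, not necessarily the exact adaptation developed in the paper. The feature vectors, the aggressiveness parameter `C`, and the helper names `pa_update`, `train`, and `rank` are all assumptions made for illustration; in the actual system the features would come from the text and math feature extraction step.

```python
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pa_update(w: Sequence[float], x: Sequence[float], y: int, C: float = 1.0) -> List[float]:
    """One PA-I step for a label y in {-1, +1} (hypothetical helper).

    Passive: if the example is classified with margin >= 1, w is unchanged.
    Aggressive: otherwise w moves just enough to fix the margin, capped by C.
    """
    loss = max(0.0, 1.0 - y * dot(w, x))  # hinge loss
    sq_norm = dot(x, x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)
    tau = min(C, loss / sq_norm)  # PA-I step size
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


def train(examples: Sequence[Tuple[Sequence[float], int]], dim: int,
          epochs: int = 5, C: float = 1.0) -> List[float]:
    """Run the online update over (features, label) pairs for a few passes."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:
            w = pa_update(w, x, y, C)
    return w


def rank(w: Sequence[float], docs: Sequence[Sequence[float]]) -> List[int]:
    """Return document indices ordered by decreasing classifier score."""
    return sorted(range(len(docs)), key=lambda i: dot(w, docs[i]), reverse=True)


# Toy data (assumed): label +1 marks a relevant query-document pair, -1 an irrelevant one.
examples = [([1.0, 0.0], 1), ([0.0, 1.0], -1)]
w = train(examples, dim=2)
order = rank(w, [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
```

Using the raw score `w · x` rather than its sign is what turns the binary classifier into a ranker: documents the model considers more confidently relevant are placed higher in the result list.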
Index Terms
- A math-aware search engine for math question answering system