Text Classification Techniques in Oil Industry Applications

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 239)

Abstract

The development of automatic methods to produce usable structured information from unstructured text sources is extremely valuable to the oil and gas industry. A structured resource would allow researchers and industry professionals to write relatively simple queries to retrieve all the information regarding transcriptions of any accident. Instead of the thousands of abstracts returned by querying the unstructured corpus, queries on the structured corpus would return a few hundred well-formed results.

In this paper we propose and evaluate information extraction techniques for the occupational health control process, in particular for the automatic detection of accidents from unstructured texts. Our proposal divides the problem into subtasks such as text analysis and the recognition and classification of failed occupational health control, resolving accidents.
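The abstract does not say which classifier the authors use, so the following is only a minimal sketch of how the text analysis and classification subtasks could fit together, assuming a multinomial Naive Bayes model with add-one smoothing. The class name `NaiveBayesTextClassifier`, the labels `accident` and `no_accident`, and the four training reports are hypothetical and were written solely to exercise the code; they are not taken from the paper or its corpus.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Text analysis step: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-written reports used only to exercise the code.
TRAIN_TEXTS = [
    "worker injured by falling pipe on the drilling platform",
    "gas leak caused a fire and one operator was burned",
    "routine inspection of valves completed without issues",
    "monthly safety meeting held with the maintenance crew",
]
TRAIN_LABELS = ["accident", "accident", "no_accident", "no_accident"]

classifier = NaiveBayesTextClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
prediction = classifier.predict("operator injured during a gas leak")
other_prediction = classifier.predict("safety meeting with the crew")
```

A real system would presumably need a much larger labelled corpus and language-specific preprocessing; the sketch only shows how a tokenizer feeds a classifier that separates accident reports from other texts.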

Author information

Corresponding author

Correspondence to Nayat Sanchez-Pi.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Sanchez-Pi, N., Martí, L., Garcia, A.C.B. (2014). Text Classification Techniques in Oil Industry Applications. In: Herrero, Á., et al. International Joint Conference SOCO’13-CISIS’13-ICEUTE’13. Advances in Intelligent Systems and Computing, vol 239. Springer, Cham. https://doi.org/10.1007/978-3-319-01854-6_22

  • DOI: https://doi.org/10.1007/978-3-319-01854-6_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01853-9

  • Online ISBN: 978-3-319-01854-6

  • eBook Packages: Engineering, Engineering (R0)
