
Provenance Based Web Search

  • Conference paper
Intelligent Informatics

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 182)

Abstract

During web search, we often end up with untrusted, duplicate, and near-duplicate search results, which dilute the focus of the search query. We refer to the factors that may influence the trust of web search results as 'provenance'. Provenance is essentially information about the history of data. In this paper, we propose a provenance model that uses both content-based and trust-based factors to identify trusted search results. The novelty of our idea lies in constructing a provenance matrix that encompasses six factors (who, where, when, what, why, how) related to the search results. Inferences performed over the provenance matrix yield a trust score, which is then used to remove near-duplicates and retrieve trusted search results.
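The pipeline outlined in the abstract (provenance matrix, trust score, near-duplicate removal) can be illustrated with a minimal Python sketch. The abstract does not say how the six factors are scored, how they are combined, or how near-duplicates are detected, so everything here is an assumption made for illustration: each factor is taken to be a score in [0, 1], the trust score is a weighted mean, near-duplicates are found by Jaccard similarity over word shingles, and the names Result, trust_score, and filter_results are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass

# The six provenance factors named in the abstract.
FACTORS = ("who", "where", "when", "what", "why", "how")


@dataclass
class Result:
    url: str
    text: str
    factors: dict  # factor name -> score in [0, 1] (assumed scoring scheme)


def trust_score(result, weights=None):
    """Weighted mean of the six factor scores (assumption: equal weights)."""
    weights = weights or {f: 1.0 for f in FACTORS}
    total = sum(weights[f] for f in FACTORS)
    return sum(weights[f] * result.factors.get(f, 0.0) for f in FACTORS) / total


def shingles(text, k=3):
    """Set of k-word shingles used to compare result contents."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_results(results, dup_threshold=0.8, min_trust=0.5):
    """Keep trusted results, dropping near-duplicates of higher-trust ones."""
    kept = []
    for r in sorted(results, key=trust_score, reverse=True):
        if trust_score(r) < min_trust:
            continue  # untrusted result
        s = shingles(r.text)
        if any(jaccard(s, shingles(k.text)) >= dup_threshold for k in kept):
            continue  # near-duplicate of an already kept, more trusted result
        kept.append(r)
    return kept
```

Each row of the provenance matrix corresponds to one Result and each column to one factor. Because results are visited in descending trust order, the copy that survives from any group of near-duplicates is the one with the highest trust score. The thresholds 0.8 and 0.5 are placeholders, not values from the paper.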

Author information

Corresponding author

Correspondence to Ajitha Robert.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Robert, A., Sendhilkumar, S. (2013). Provenance Based Web Search. In: Abraham, A., Thampi, S. (eds) Intelligent Informatics. Advances in Intelligent Systems and Computing, vol 182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32063-7_48

  • DOI: https://doi.org/10.1007/978-3-642-32063-7_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32062-0

  • Online ISBN: 978-3-642-32063-7

  • eBook Packages: Engineering, Engineering (R0)
