Part of the book series: Springer Handbooks (SHB)

Abstract

This chapter presents data mining and knowledge discovery (DMKD): its basic concepts, a brief history of its evolution, its mathematical foundations, and usable techniques, together with the data warehouse and the decision support system (DSS). Datasets and knowledge are first defined and explained in the context of DMKD, which is a discovery process operating at different hierarchies, granularities, and/or scales. Because a set of concepts is often best understood when viewed and explained from several perspectives, the chapter starts with a definition followed by a table explaining DMKD from different views (Sect. 5.1). The evolution of DMKD is then briefly traced from the rapid growth of massive data to the birth of DMKD (Sect. 5.2). Section 5.3 gives some mathematical foundations, namely probability theory, statistics, fuzzy sets, rough sets, data fields, and cloud models. Section 5.4 introduces some usable DMKD techniques, which are used to discover rules and exceptions through association, classification, clustering, prediction, discrimination, and exception detection. Sections 5.5 and 5.6 present data warehouses and decision support systems: the data warehouse is one of the data sources for DMKD, and DMKD is a new technique that assists decision support systems in their tasks. Finally, trends and perspectives are summarized and forecast for two promising fields, web mining and spatial data mining (Sect. 5.7).

Abbreviations

BI: business intelligence
DBMS: database management system
DMKD: data mining and knowledge discovery
DSS: decision support system
ERP: enterprise resource planning
ETL: extraction, transformation, and loading
GIS: Geographic Information System
GNSS: Global Navigation Satellite System
HTML: Hypertext Markup Language
OLAP: online analytical processing
OLTP: online transactional processing
RS: remote sensing
SDMKD: spatial data mining and knowledge discovery
SDSS: spatial decision support system

Author information

Correspondence to Shuliang Wang or Wenzhong Shi.

Copyright information

© 2011 Springer-Verlag

About this chapter

Cite this chapter

Wang, S., Shi, W. (2011). Data Mining and Knowledge Discovery. In: Kresse, W., Danko, D. (eds) Springer Handbook of Geographic Information. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72680-7_5
