Data Mining Methods Applied to a Digital Forensics Task for Supervised Machine Learning

Tallón-Ballesteros, Antonio J.; Riquelme, José C.

doi:10.1007/978-3-319-05885-6_17

Part of the book series: Studies in Computational Intelligence (SCI, volume 555)

Abstract

Digital forensics research involves several stages. Once the data have been collected, the final goal is to obtain a model that predicts the output for unseen data. We focus on supervised machine learning techniques. This chapter presents an experimental study on a multi-class classification task with forensic data, covering several families of methods: decision trees, Bayesian classifiers, rule-based classifiers, artificial neural networks, and nearest-neighbor classifiers. The classifiers are evaluated with two performance measures: accuracy and Cohen’s kappa. The experimental design is a 4-fold cross-validation, repeated thirty times for non-deterministic algorithms to obtain reliable results, so that their results are averaged over 120 runs. A statistical analysis compares each pair of algorithms by means of t-tests on both accuracy and Cohen’s kappa.
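The evaluation protocol described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code or data: the three-class dataset is synthetic, the two classifiers (a toy 1-nearest-neighbour rule and a majority-class baseline) are stand-ins, and all function names are hypothetical. Two further assumptions are made because the abstract does not specify them: the partitions are reshuffled at each repetition, and the t-test is taken to be a paired test over matched runs.

```python
import math
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance, (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement computed from the marginal label frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Kappa is undefined when p_e == 1; returning 0.0 is this sketch's convention.
    return 0.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)


def repeated_kfold(n_samples, k=4, repetitions=30, seed=0):
    """Yield (train, test) index lists: k folds per repetition, reshuffled each time."""
    rng = random.Random(seed)
    for _ in range(repetitions):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


def predict_1nn(train_x, train_y, x):
    """Toy 1-nearest-neighbour classifier on a single scalar feature."""
    nearest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[nearest]


def paired_t_statistic(a, b):
    """t statistic of the per-run differences (assumes the differences vary)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)


# Hypothetical three-class data: one noisy scalar feature per sample.
data_rng = random.Random(42)
labels = [i % 3 for i in range(120)]
features = [label + data_rng.gauss(0.0, 0.4) for label in labels]

acc_nn, kappa_nn, acc_base = [], [], []
for train, test in repeated_kfold(len(labels)):
    train_x = [features[j] for j in train]
    train_y = [labels[j] for j in train]
    y_true = [labels[j] for j in test]
    y_nn = [predict_1nn(train_x, train_y, features[j]) for j in test]
    majority = Counter(train_y).most_common(1)[0][0]
    y_base = [majority] * len(test)
    acc_nn.append(accuracy(y_true, y_nn))
    kappa_nn.append(cohen_kappa(y_true, y_nn))
    acc_base.append(accuracy(y_true, y_base))

runs = len(acc_nn)  # 4 folds x 30 repetitions
print("runs:", runs)
print("mean accuracy (1-NN):", sum(acc_nn) / runs)
print("mean kappa (1-NN):", sum(kappa_nn) / runs)
print("paired t (1-NN vs. baseline, accuracy):", paired_t_statistic(acc_nn, acc_base))
```

Kappa is reported next to accuracy because it discounts the agreement expected by chance, which matters in multi-class problems with uneven class frequencies. In practice the t statistic would be compared against a t distribution with 119 degrees of freedom; that step is omitted here to keep the sketch within the standard library.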

Author information

Authors and Affiliations

Department of Languages and Computer Systems, University of Seville, Reina Mercedes Avenue, Seville, 41012, Spain
Antonio J. Tallón-Ballesteros & José C. Riquelme

Corresponding author

Correspondence to Antonio J. Tallón-Ballesteros.

Editor information

Editors and Affiliations

Department of Software Engineering Faculty of Information and, Technical University of Malaysia Melaka (UTeM), Durian Tunggal, Malaysia
Azah Kamilah Muda
Department of Software Engineering Faculty of Info. and Comm. Tech., Technical University of Malaysia Melaka (UTeM), Durian Tunggal, Malaysia
Yun-Huoy Choo
Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, Washington, USA
Ajith Abraham
Dept. of Computer Sci. and Engineering Center of Excellence for Document, The State University of New York SUNY, Buffalo, New York, USA
Sargur N. Srihari

About this chapter

Cite this chapter

Tallón-Ballesteros, A.J., Riquelme, J.C. (2014). Data Mining Methods Applied to a Digital Forensics Task for Supervised Machine Learning. In: Muda, A.K., Choo, Y.-H., Abraham, A., Srihari, S.N. (eds) Computational Intelligence in Digital Forensics: Forensic Investigation and Applications. Studies in Computational Intelligence, vol 555. Springer, Cham. https://doi.org/10.1007/978-3-319-05885-6_17

DOI: https://doi.org/10.1007/978-3-319-05885-6_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05884-9
Online ISBN: 978-3-319-05885-6
eBook Packages: Engineering, Engineering (R0)