Generating Character-Based Templates for Log Data

Skopik, Florian; Wurzenberger, Markus; Landauer, Max

doi:10.1007/978-3-030-74450-2_4

Abstract

Log line clusters usually lack meaningful descriptions, which are needed to understand the information that the log lines within a cluster provide. Template generators produce such descriptions in the form of patterns that match all log lines within a cluster and therefore describe the features the lines have in common, e.g., shared substrings. Current approaches only generate token-based templates (tokens being, e.g., space-separated words), which are often inaccurate for log lines because they usually do not account for existing string similarities in, for instance, fully qualified system names or domain names. Consequently, novel character-based template generators are required that provide robust templates for any type of computer log data and that can be applied in security information and event management (SIEM) solutions for continuous auditing, quality inspection, and control. In this chapter, we propose a novel approach for computing character-based templates that combines comparison-based methods and heuristics. To achieve this goal, we solve the problem of efficiently calculating a multi-line alignment for a group of log lines and compute an accurate approximation of the optimal character-based template.

Parts of this chapter have been published in [119].
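
The contrast the abstract draws between token-based and character-based templates can be illustrated with a small sketch. This is not the chapter's algorithm: the two log lines are invented, the function names `token_template` and `char_template` are hypothetical, and the character-level matching uses Python's standard `difflib.SequenceMatcher` as a convenient stand-in for a proper alignment.

```python
from difflib import SequenceMatcher


def token_template(a: str, b: str, wildcard: str = "*") -> str:
    """Compare whitespace-separated tokens position by position."""
    tokens_a, tokens_b = a.split(), b.split()
    if len(tokens_a) != len(tokens_b):
        # Simplification for this sketch: no attempt to align token lists.
        return wildcard
    return " ".join(x if x == y else wildcard for x, y in zip(tokens_a, tokens_b))


def char_template(a: str, b: str, wildcard: str = "*") -> str:
    """Keep the character blocks both lines share; wildcard the rest."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    parts = []
    end_a = end_b = 0
    for start_a, start_b, size in matcher.get_matching_blocks():
        # Unmatched characters in either line since the previous block.
        if start_a > end_a or start_b > end_b:
            parts.append(wildcard)
        parts.append(a[start_a:start_a + size])
        end_a, end_b = start_a + size, start_b + size
    return "".join(parts)


# Invented example lines that differ in one character of a host name.
line_1 = "Connection from srv01.example.com closed"
line_2 = "Connection from srv02.example.com closed"

print(token_template(line_1, line_2))  # Connection from * closed
print(char_template(line_1, line_2))   # Connection from srv0*.example.com closed
```

The token-based result discards the whole host name, while the character-based result retains the shared prefix and domain, which is the kind of similarity the abstract says token-based templates miss.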

Notes

1. A sequence alignment is the result of an algorithm that arranges two strings so that the fewest operations (i.e., insertions, deletions, or replacements of characters) are required to transform one string into the other; that is, the alignment assumes the highest possible similarity.
2. Note that the direction is also diagonal when a character is replaced.
3. https://github.com/ait-aecid/aecid-template-generator.
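
As a minimal sketch of what notes 1 and 2 describe, the following computes a pairwise alignment with unit costs for insertions, deletions, and replacements, then traces back through the cost table. Diagonal steps cover both matches and replacements. The function names `align` and `pair_template`, the unit costs, and the example strings are assumptions made for illustration; this is not the code from the repository in note 3, and it handles only two lines rather than a multi-line alignment.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]  # None marks a gap


def align(a: str, b: str) -> Tuple[List[Pair], int]:
    """Return aligned character pairs and the number of edit operations."""
    n, m = len(a), len(b)
    # cost[i][j] = fewest operations turning a[:i] into b[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = cost[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            cost[i][j] = min(diagonal, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right cell to the origin.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))  # diagonal: match or replacement
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((a[i - 1], None))      # character only in a
            i -= 1
        else:
            pairs.append((None, b[j - 1]))      # character only in b
            j -= 1
    return pairs[::-1], cost[n][m]


def pair_template(pairs: List[Pair], wildcard: str = "*") -> str:
    """Keep matching characters; collapse each differing run to one wildcard."""
    parts = []
    in_gap = False
    for x, y in pairs:
        if x == y:
            parts.append(x)
            in_gap = False
        elif not in_gap:
            parts.append(wildcard)
            in_gap = True
    return "".join(parts)


pairs, distance = align("host1.example.com", "host2.example.com")
print(distance, pair_template(pairs))  # 1 host*.example.com
```

Representing gaps as `None` rather than a placeholder character avoids confusing a gap with a literal character such as `-` that occurs in real log lines.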

References

Wael H Gomaa and Aly A Fahmy. A survey of text similarity approaches. International Journal of Computer Applications, 68(13):13–18, 2013.
Pinjia He, Jieming Zhu, Shilin He, Jian Li, and Michael R Lyu. An evaluation study on log parsing and its use in log mining. In 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pages 654–661. IEEE, 2016.
D. Jurafsky and J.H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Pearson international edition. Prentice Hall, 2009.
Vladimir I Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707–710, 1966.
Saul B Needleman and Christian D Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443–453, 1970.
Cédric Notredame. Recent evolutions of multiple sequence alignment algorithms. PLoS Computational Biology, 3(8):e123, 2007.
Markus Wurzenberger, Georg Höld, Max Landauer, Florian Skopik, and Wolfgang Kastner. Creating Character-based Templates for Log Data to Enable Security Event Classification. In Proceedings of the 15th ACM Asia Conference on Computer and Communications Security, pages 141–152, 2020.

Author information

Authors and Affiliations

Center for Digital Safety & Security, Austrian Institute of Technology, Vienna, Austria
Florian Skopik, Markus Wurzenberger & Max Landauer

About this chapter

Cite this chapter

Skopik, F., Wurzenberger, M., Landauer, M. (2021). Generating Character-Based Templates for Log Data. In: Smart Log Data Analytics. Springer, Cham. https://doi.org/10.1007/978-3-030-74450-2_4

DOI: https://doi.org/10.1007/978-3-030-74450-2_4
Published: 29 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74449-6
Online ISBN: 978-3-030-74450-2
eBook Packages: Computer Science, Computer Science (R0)