
Generating Character-Based Templates for Log Data

Smart Log Data Analytics

Abstract

Log line clusters usually lack the meaningful descriptions required to understand the information that the log lines within a cluster provide. Template generators make it possible to produce such descriptions in the form of patterns that match all log lines within a cluster and therefore describe the features, e.g., substrings, that the lines have in common. Current approaches only generate token-based templates (e.g., based on space-separated words), which are often inaccurate for log lines, because they usually do not account for string similarities within tokens, for instance, in fully qualified system names or domain names. Consequently, novel character-based template generators are required that provide robust templates for any type of computer log data and that can be applied in security information and event management (SIEM) solutions, continuous auditing, and quality inspection and control. In this chapter, we propose a novel approach for computing character-based templates, which combines comparison-based methods and heuristics. To achieve this goal, we solve the problem of efficiently calculating a multi-line alignment for a group of log lines and compute an accurate approximation of the optimal character-based template.
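The idea of a character-based template can be illustrated with a minimal Python sketch. It is not the chapter's algorithm: it assumes unit edit costs, uses `*` as a hypothetical wildcard symbol, and approximates the multi-line alignment by progressively aligning the template built so far with each further line, a simple heuristic swapped in for illustration. Characters on which both sides agree are kept, and every run of differing characters collapses into one wildcard. The names `align`, `merge`, and `character_template` are hypothetical.

```python
from typing import List, Tuple

WILDCARD = "*"  # hypothetical wildcard symbol for variable parts (assumption)


def align(a: str, b: str) -> List[Tuple[str, str]]:
    """One optimal global alignment of a and b with unit costs (Levenshtein).

    Returns aligned character pairs; '' marks a gap.
    """
    n, m = len(a), len(b)
    # cost[i][j] = edit distance between a[:i] and b[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or replacement
                             cost[i - 1][j] + 1,        # deletion
                             cost[i][j - 1] + 1)        # insertion
    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((a[i - 1], ""))
            i -= 1
        else:
            pairs.append(("", b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def merge(template: str, line: str) -> str:
    """Keep characters both strings share; replace each differing run by WILDCARD."""
    out = []
    for x, y in align(template, line):
        if x == y and x != WILDCARD:
            out.append(x)
        elif not out or out[-1] != WILDCARD:
            out.append(WILDCARD)  # collapse consecutive differences into one wildcard
    return "".join(out)


def character_template(lines: List[str]) -> str:
    """Progressively merge all lines of a cluster into a single template."""
    template = lines[0]
    for line in lines[1:]:
        template = merge(template, line)
    return template


if __name__ == "__main__":
    cluster = ["connect to host1.example.com",
               "connect to host2.example.com",
               "connect to host10.example.com"]
    print(character_template(cluster))  # connect to host*.example.com
```

For these three lines the sketch keeps the shared substrings inside the host names, whereas a template built over space-separated tokens could only keep `connect to` and replace the whole host name with a wildcard. The progressive merge is cheap but order-dependent, so it yields an approximation rather than an optimal multi-line template.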

Parts of this chapter have been published in [119].


Notes

  1. A sequence alignment is the result of an algorithm that arranges two strings so that the minimum number of operations (i.e., insertions, deletions, or replacements of characters) is required to transform one string into the other; i.e., it assumes the highest possible similarity.

  2. Note that the direction is also diagonal when a character is replaced.

  3. https://github.com/ait-aecid/aecid-template-generator.
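Notes 1 and 2 refer to the dynamic-programming matrix behind a pairwise sequence alignment. The following minimal Python sketch assumes unit costs for insertions, deletions, and replacements (the Levenshtein distance); the function name `edit_script` is hypothetical and not taken from the chapter or its repository. Rows index the first string and columns the second, so a vertical step is a deletion, a horizontal step is an insertion, and a diagonal step is either a match or, as note 2 points out, a replacement.

```python
from typing import List, Tuple


def edit_script(a: str, b: str) -> Tuple[int, List[str]]:
    """Return the Levenshtein distance of a and b and one minimal edit sequence."""
    n, m = len(a), len(b)
    # d[i][j] = minimum number of edits that turn a[:i] into b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Walk back from d[n][m] and record the operation behind each step.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            # Diagonal step: a match (no operation) or a replacement.
            if a[i - 1] != b[j - 1]:
                ops.append(f"replace {a[i - 1]!r} with {b[j - 1]!r}")
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(f"delete {a[i - 1]!r}")  # vertical step
            i -= 1
        else:
            ops.append(f"insert {b[j - 1]!r}")  # horizontal step
            j -= 1
    ops.reverse()
    return d[n][m], ops


if __name__ == "__main__":
    print(edit_script("kitten", "sitting"))
    # (3, ["replace 'k' with 's'", "replace 'e' with 'i'", "insert 'g'"])
```

When several minimal edit sequences exist, the sketch prefers diagonal steps, then deletions, then insertions; a different tie-breaking order returns another, equally short sequence.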

References

  1. Wael H Gomaa and Aly A Fahmy. A survey of text similarity approaches. International Journal of Computer Applications, 68(13):13–18, 2013.

  2. Pinjia He, Jieming Zhu, Shilin He, Jian Li, and Michael R Lyu. An evaluation study on log parsing and its use in log mining. In 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pages 654–661. IEEE, 2016.

  3. D. Jurafsky and J.H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Pearson international edition. Prentice Hall, 2009.

  4. Vladimir I Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707–710, 1966.

  5. Saul B Needleman and Christian D Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443–453, 1970.

  6. Cédric Notredame. Recent evolutions of multiple sequence alignment algorithms. PLoS Computational Biology, 3(8):e123, 2007.

  7. Markus Wurzenberger, Georg Höld, Max Landauer, Florian Skopik, and Wolfgang Kastner. Creating Character-based Templates for Log Data to Enable Security Event Classification. In Proceedings of the 15th ACM Asia Conference on Computer and Communications Security, pages 141–152, 2020.



Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Skopik, F., Wurzenberger, M., Landauer, M. (2021). Generating Character-Based Templates for Log Data. In: Smart Log Data Analytics. Springer, Cham. https://doi.org/10.1007/978-3-030-74450-2_4


  • DOI: https://doi.org/10.1007/978-3-030-74450-2_4


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-74449-6

  • Online ISBN: 978-3-030-74450-2

  • eBook Packages: Computer Science, Computer Science (R0)
