Is text compression by prefixes and suffixes practical?

Fraenkel, A. S.; Mor, M.; Perl, Y.

doi:10.1007/BF00264280

Published: December 1983

Acta Informatica, Volume 20, pages 371–389 (1983)

Summary

One approach to text compression is to replace high-frequency variable-length fragments of words by fixed-length codes pointing to a compression table containing these high-frequency fragments. It is shown that the problem of optimal fragment compression is NP-hard even if the fragments are restricted to prefixes and suffixes. This seems to be a simplest fragment compression problem which is NP-hard, since a polynomial algorithm for compressing by prefixes only (or suffixes only) has been found recently. Various compression heuristics based on using both prefixes and suffixes have been tested on large Hebrew and English texts. The best of these heuristics produce a net compression of some 37% for Hebrew and 45% for English using a prefix/suffix compression table of size 256.
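The scheme described in the summary can be illustrated with a minimal sketch. It is not one of the heuristics tested in the paper, which the summary does not specify. It assumes a naive selection rule (rank each prefix or suffix by frequency times the characters it would save), a cost of one byte per table code and one byte per remaining character, and hypothetical names `build_table`, `encode_word`, `decode_word`, and `encoded_size`. Codes and literal characters are kept apart in a tuple, which sidesteps the question of how a real encoding would tell them apart.

```python
from collections import Counter


def build_table(words, table_size=256, max_len=4):
    """Choose up to table_size prefix/suffix fragments by naive estimated savings."""
    # A fragment of k characters replaced by a one-byte code saves k - 1 bytes
    # per occurrence. Interactions between fragments are ignored (assumption).
    savings = Counter()
    for w in words:
        for k in range(2, min(max_len, len(w)) + 1):
            savings[("P", w[:k])] += k - 1   # candidate prefix
            savings[("S", w[-k:])] += k - 1  # candidate suffix
    return [frag for frag, _ in savings.most_common(table_size)]


def encode_word(word, index):
    """Return (prefix_code, middle, suffix_code); a code is None if unused."""
    # Longest prefix of the word that appears in the table, if any.
    p_len = next((k for k in range(len(word), 1, -1)
                  if ("P", word[:k]) in index), 0)
    rest = word[p_len:]
    # Longest table suffix of what remains, so prefix and suffix never overlap.
    s_len = next((k for k in range(len(rest), 1, -1)
                  if ("S", rest[-k:]) in index), 0)
    prefix = index[("P", word[:p_len])] if p_len else None
    suffix = index[("S", rest[-s_len:])] if s_len else None
    middle = rest[:len(rest) - s_len]
    return prefix, middle, suffix


def decode_word(encoded, table):
    """Invert encode_word by looking the codes up in the table."""
    prefix, middle, suffix = encoded
    head = table[prefix][1] if prefix is not None else ""
    tail = table[suffix][1] if suffix is not None else ""
    return head + middle + tail


def encoded_size(encoded):
    """Bytes used under the assumed cost model: one per code, one per character."""
    prefix, middle, suffix = encoded
    return (prefix is not None) + len(middle) + (suffix is not None)


words = ["compression", "compressing", "computing", "testing", "resting"]
table = build_table(words, table_size=8)
index = {frag: code for code, frag in enumerate(table)}
encoded = [encode_word(w, index) for w in words]
```

Because every table fragment has at least two characters and costs one byte as a code, the encoded form under this cost model is never longer than the original word. Choosing the table optimally is the hard part: the summary states that this problem is NP-hard when both prefixes and suffixes are allowed, which is why a ranking heuristic stands in for it here.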

Author information

Y. Perl
Present address: Computer Science Department, Rutgers University, New Brunswick, NJ 08903

Authors and Affiliations

Department of Applied Mathematics, The Weizmann Institute of Science, 76100, Rehovot, Israel
A. S. Fraenkel & M. Mor
Department of Mathematics and Computer Science, Bar-Ilan University, Ramat Gan, Israel
Y. Perl

Additional information

This work was done within the Responsa Retrieval Project, developed initially at the Weizmann Institute of Science and Bar-Ilan University, now located at the Institute for Information Retrieval and Computational Linguistics (IRCOL), Bar-Ilan University, Ramat Gan, Israel. The work reported herein was done at the Weizmann Institute.

Partial affiliation with IRCOL

Supported in part by a grant of Bank Leumi Le'Israel

Cite this article

Fraenkel, A.S., Mor, M. & Perl, Y. Is text compression by prefixes and suffixes practical?. Acta Informatica 20, 371–389 (1983). https://doi.org/10.1007/BF00264280

Received: 02 April 1982
Accepted: 29 September 1983
Issue Date: December 1983
DOI: https://doi.org/10.1007/BF00264280