ABSTRACT
Probabilistic data structures use space-efficient representations of data in order to (approximately) respond to queries about the data. Traditionally, these structures are accompanied by probabilistic bounds on query-response errors. These bounds implicitly assume benign attack models, in which the data and the queries are chosen non-adaptively and independently of the randomness used to construct the representation. Yet probabilistic data structures are increasingly used in settings where these assumptions may be violated. This work provides a provable-security treatment of probabilistic data structures in adversarial environments. We give a syntax that captures a wide variety of in-use structures, and our security notions support the development of error bounds in the presence of powerful attacks. Concretely, we focus primarily on the widely used Bloom filter, but also consider counting (Bloom) filters and count-min sketch data structures. For the traditional versions of these, our security findings are largely negative; however, we show that simple embellishments (e.g., using salts or secret keys) yield structures that provide provable security with little overhead.
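To make the "secret keys" embellishment concrete, the following is a minimal sketch of a Bloom filter whose bit positions are derived from a keyed function. It is an illustration under stated assumptions, not the construction analyzed in this work: the class name `KeyedBloomFilter`, the choice of HMAC-SHA256, and the way the hash index is prefixed to the item are all hypothetical choices made for this example.

```python
import hashlib
import hmac


class KeyedBloomFilter:
    """Illustrative Bloom filter; positions come from HMAC-SHA256 under a secret key.

    Assumption: HMAC-SHA256 stands in for the keyed hash; the paper's own
    construction and parameters may differ.
    """

    def __init__(self, num_bits: int, num_hashes: int, key: bytes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.key = key
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # One position per hash index: MAC the 4-byte index followed by the item.
        for i in range(self.num_hashes):
            msg = i.to_bytes(4, "big") + item
            digest = hmac.new(self.key, msg, hashlib.sha256).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        # Set every bit selected for this item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # Report membership only if all selected bits are set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

In use, the key would be drawn at random (for example with `secrets.token_bytes(32)`) and kept from whoever chooses the data and queries. Without the key, that party cannot compute which bits an item touches, which is the intuition behind keying; whether this yields a provable bound depends on the attack model.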