CryptDICE: Distributed data protection system for secure cloud data storage and computation
Introduction
The emergence of cloud computing has led to a paradigm shift, not only in the technological and business landscape but also in the database landscape [1], [2], as illustrated by the emergence of delivery models such as Database-as-a-Service (DBaaS) [3]. Cloud storage services enable data owners (individuals and organizations) to store their data remotely in a flexible and on-demand manner, without taking on the responsibility for provisioning, configuring, scaling, and maintaining these storage systems [4].
In the context of cloud storage, one of the biggest challenges is to provide data management support for cloud-based applications in an efficient and scalable manner [5]. This need has attracted substantial interest and led to the development of cloud-friendly database technologies, commonly known under the umbrella term NoSQL. NoSQL databases are built from the ground up to scale horizontally by simply adding more nodes. As such, they yield numerous benefits in terms of high availability, elastic scalability, and data model flexibility, concerns that are particularly relevant in cloud computing and more specifically in cloud storage [6].
Although cloud data storage provides numerous benefits to organizations, there are also caveats that significantly hinder its rapid and wider adoption. In essence, the DBaaS delivery model requires and assumes a degree of trust in the provider that will not be realistic or desirable in many real-world application contexts. Many applications involve storing sensitive information that, when compromised, will seriously jeopardize the privacy of individuals and violate data protection laws such as the GDPR. Recent data security breaches and their impact on a large number of individuals and organizations have exacerbated these concerns [7], [8], [9]. In practice, data security and privacy protection are among the most important factors when choosing a database for cloud-based applications [10].
NoSQL databases, which are prominently used in cloud environments, do not provide strong built-in security mechanisms and thus rely on developers to implement a wide range of data protection measures within the application code [10], [11], [12], [13], [14], [15]. Although adopting these measures in the application can lead to adequate protection of the data, many of them impose non-trivial trade-offs: for example, encrypting data before persisting it affects the ability to execute different types of search (e.g., equality search, full-text search, etc.) and aggregate queries. To address this concern, a number of different data encryption schemes (e.g., deterministic, random, homomorphic, etc.) have been proposed [16], [17], [18], [19], [20], [21], which can be used to execute different types of queries and perform complex computations over encrypted data.
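To make this trade-off concrete, the following minimal Python sketch contrasts a randomized and a deterministic scheme. It is an illustration under stated assumptions and not the construction used by CryptDICE or by the cited libraries: the cipher is a toy HMAC-SHA256 keystream without authentication, the deterministic variant derives its nonce from the plaintext, and every function name is hypothetical.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16  # bytes of nonce prepended to every ciphertext


def _subkey(key: bytes, label: bytes) -> bytes:
    """Derive an independent sub-key for one purpose ("enc" or "mac")."""
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 over nonce || counter (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def _encrypt_with_nonce(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    return nonce + _xor(plaintext, stream)


def encrypt_random(key: bytes, plaintext: bytes) -> bytes:
    """Randomized: a fresh nonce makes equal plaintexts look unrelated."""
    return _encrypt_with_nonce(key, secrets.token_bytes(NONCE_LEN), plaintext)


def encrypt_deterministic(key: bytes, plaintext: bytes) -> bytes:
    """Deterministic: the nonce is a keyed hash of the plaintext, so equal
    plaintexts yield equal ciphertexts (which also leaks that equality)."""
    nonce = hmac.new(_subkey(key, b"mac"), plaintext, hashlib.sha256).digest()[:NONCE_LEN]
    return _encrypt_with_nonce(key, nonce, plaintext)


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Works for both variants, since the nonce travels with the ciphertext."""
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    return _xor(body, _keystream(_subkey(key, b"enc"), nonce, len(body)))


def equality_search(key: bytes, stored: list, value: bytes) -> list:
    """Return indexes of stored deterministic ciphertexts equal to `value`.
    The store only ever compares ciphertext bytes."""
    token = encrypt_deterministic(key, value)
    return [i for i, c in enumerate(stored) if hmac.compare_digest(c, token)]
```

With deterministically encrypted values, `equality_search` finds matches by comparing ciphertexts, whereas values produced by `encrypt_random` cannot be matched this way without decrypting them first. This is the security-versus-query-capability tension described above.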
However, adopting specific data encryption schemes to support different types of queries comes with several non-trivial challenges. Firstly, as these data encryption schemes are commonly integrated in the application layer to support different types of search and aggregate queries over encrypted data, an extra layer of complexity is introduced in the application, and a level of expertise is required from application developers. Secondly, as these data encryption schemes have specific security strengths and weaknesses (e.g., the random encryption scheme offers greater data security strength than other encryption schemes, but has no built-in support for executing queries), trade-offs need to be made between strong security, increased performance, and rich query capabilities. For example, enforcing strong data security requirements can lead to a system that is less performance-oriented and offers limited query capabilities. Similarly, disregarding privacy in favor of increased performance and rich query capabilities can lead to neglecting critical security requirements. Therefore, making appropriate trade-offs is a non-trivial task that highly depends on the application requirements and on the limitations imposed on sensitive data. This task becomes more complex when different types of data with varying privacy requirements are considered. Thirdly and finally, supporting aggregate queries requires User Defined Functions (UDFs) to be supported directly within the database engine, which not only demands expert knowledge and introduces additional management complexity, but also raises additional security concerns.
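As a hedged illustration of why aggregate queries over encrypted data need custom computation next to the data, the sketch below uses Paillier, a well-known additively homomorphic scheme. The source does not state which homomorphic scheme CryptDICE integrates, so the choice of Paillier, the function names, and the tiny fixed primes (two Mersenne primes, far too small for real use) are all assumptions made for readability.

```python
import math
import secrets

# Assumption: fixed, well-known Mersenne primes keep the demo short.
# Real deployments need large, randomly generated primes.
P = 2**31 - 1
Q = 2**61 - 1


def keygen(p: int = P, q: int = Q):
    """Return (public modulus n, private key (lam, mu)) for generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; requires Python 3.8+
    return n, (lam, mu)


def paillier_encrypt(n: int, m: int) -> int:
    """Encrypt 0 <= m < n using only the public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def encrypted_sum(n: int, ciphertexts) -> int:
    """Server-side aggregate: multiplying ciphertexts adds the plaintexts.
    No private key is needed, so this could run next to the database."""
    total = 1  # 1 is a trivial encryption of 0
    for c in ciphertexts:
        total = (total * c) % (n * n)
    return total


def paillier_decrypt(n: int, private_key, c: int) -> int:
    """Client-side decryption with the private key (lam, mu)."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n
```

In this sketch the untrusted side only ever calls `encrypted_sum`, which plays the role that a UDF inside the database engine would otherwise play; the client decrypts a single aggregated ciphertext instead of fetching and decrypting every row.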
To address the above-mentioned concerns, we present CryptDICE, a flexible, generic, reusable, and distributed data protection system that facilitates building applications that involve encrypted data storage and search, but does not require an in-depth understanding of different data encryption schemes. CryptDICE (i) provides built-in support for several different data encryption schemes (by integrating a number of established libraries), yet hides the complexity from the developer via annotations, which steer the selection of the most appropriate scheme for a given (search) requirement; (ii) supports trade-offs between performance and security and enables executing different types of search and aggregate queries over encrypted data for a variety of different NoSQL databases; and (iii) incorporates a lightweight service that reduces the management complexity and also mitigates high-security risks by avoiding the need for developers to implement UDFs directly in the database engine. This service, which has built-in support for heterogeneous NoSQL databases, instead implements UDFs in the application code and provides migration transparency (from on-premise to the cloud) in order to perform complex computations next to the database engine purely for the sake of performance, i.e., to realize low-latency aggregate queries and to avoid expensive data shuffling (from the cloud to an on-premise data center).
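The idea of annotations steering scheme selection can be sketched as follows. This is not CryptDICE's actual annotation API, which the text above does not show; the `protected` marker, the `Query` and `Scheme` enumerations, and the mapping between them are hypothetical and limited to the three schemes named in this introduction.

```python
from dataclasses import dataclass, field, fields
from enum import Enum


class Query(Enum):
    """Query capability the developer needs on a sensitive field."""
    NONE = "none"
    EQUALITY = "equality"
    AGGREGATE = "aggregate"


class Scheme(Enum):
    RANDOM = "random"
    DETERMINISTIC = "deterministic"
    HOMOMORPHIC = "homomorphic"


# Assumed policy: pick the strongest scheme that still supports the query.
_SCHEME_FOR = {
    Query.NONE: Scheme.RANDOM,
    Query.EQUALITY: Scheme.DETERMINISTIC,
    Query.AGGREGATE: Scheme.HOMOMORPHIC,
}


def protected(query: Query = Query.NONE):
    """Hypothetical annotation: mark a field as sensitive and state its query need."""
    return field(metadata={"protected_query": query})


@dataclass
class Invoice:
    invoice_id: str                               # not annotated: stored as-is
    customer_name: str = protected(Query.EQUALITY)
    amount: int = protected(Query.AGGREGATE)
    notes: str = protected()                      # sensitive, never queried


def plan_encryption(entity_cls) -> dict:
    """Map each annotated field name to the scheme selected for it."""
    plan = {}
    for f in fields(entity_cls):
        query = f.metadata.get("protected_query")
        if query is not None:
            plan[f.name] = _SCHEME_FOR[query]
    return plan
```

Calling `plan_encryption(Invoice)` yields a per-field plan, so the developer states only what each field must support and never names an encryption scheme directly.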
Several individual implementations [22], [23] and combined libraries [7], [24], [25], [26] exist that can be used by software developers, but to our knowledge, an integrated and developer-friendly framework such as CryptDICE, which reduces the implementation and management complexity from a developer's point of view and offers performance optimizations, is lacking. We have validated a prototype implementation of CryptDICE in the context of a realistic industrial Software-as-a-Service (SaaS) application, carried out an extensive functional validation, and conducted a thorough experimental evaluation. The evaluation results confirm that CryptDICE significantly reduces the development time required for enabling data encryption and supporting different types of interactive search queries over encrypted data, and that it offers performance optimizations for achieving low-latency aggregate queries. The experimental evaluation also shows that the performance overhead of CryptDICE is negligible.
The remainder of this paper is structured as follows: Section 2 provides relevant background information and derives the problem statement motivated by a realistic industrial SaaS application case. Section 3 presents the design of our proposed CryptDICE system, while Section 4 details the prototype based on CryptDICE. Section 5 presents our extensive evaluation of the CryptDICE system in three different aspects. We contrast our solution with related work in Section 6. Finally, Section 7 concludes this paper and indicates directions for future research.
Section snippets
Motivation
The motivation for this work is based on our experiences with large-scale Software-as-a-Service (SaaS) applications, which stem from several applied research projects. These projects have been carried out in active collaboration with industrial SaaS application providers. For simplicity of illustration, we focus on one such application case from the financial domain, a Billing-as-a-Service document management SaaS application, which is introduced in Section 2.1. More specifically, we highlight
CryptDICE: a distributed data protection system
CryptDICE hides the complexity of different data encryption schemes and performs their adaptive selection in order to provide data protection support, yet enables the execution of (search and aggregate) queries over encrypted data. This section provides an in-depth overview of the design objectives and the architecture of our proposed system. At its core, the system is designed with several objectives in mind:
- Support data protection guarantees at different levels of granularity
Prototype implementation
A proof-of-concept implementation of CryptDICE has been developed and made available to the community. We chose to implement the prototype of CryptDICE on top of Impetus Kundera, an open-source abstraction layer (also known as an Object-NoSQL datastore mapper (ONDM) framework), so we could avoid dealing with heterogeneity in terms of different APIs to communicate with several NoSQL databases and thus
Evaluation
This section describes the techniques and choices made to evaluate the efficiency and effectiveness of CryptDICE as well as to analyze its impact on the overall performance of the application. Section 5.1 describes the application setups and discusses the different deployment setups in which we tested CryptDICE along with details on software and hardware used for the evaluation. Then, our research focuses on a series of experiments, which are conducted to evaluate CryptDICE in three different
Related work
In the last few years, considerable research has been conducted to mitigate the security challenges in NoSQL databases. This section summarizes related work, which can be broadly classified into two categories: (i) advanced data encryption techniques, and (ii) systems and middleware for protecting sensitive data. In Section 6.1, we give a brief overview of related work on advanced data encryption techniques. Section 6.2 then describes recent research on systems and middleware for protecting
Conclusion
In this paper, we have proposed CryptDICE, a flexible and generic data access system that runs in a distributed fashion and ensures fine-grained protection of application data. The system enables the execution of different types of search and aggregate queries over encrypted data for a wide range of different NoSQL databases, with no modification to the underlying database engine and minimal changes to the client-side applications, which only need to use the built-in annotations. The lightweight
CRediT authorship contribution statement
Ansar Rafique: Conceptualization, Data Curation, Investigation, Methodology, Software, Validation, Writing - Original Draft Preparation, Writing - Review & Editing. Dimitri Van Landuyt: Project Administration, Resources, Supervision, Validation, Visualization, Writing - Review & Editing. Emad Heydari Beni: Validation, Writing - Review & Editing. Bert Lagaisse: Supervision, Validation, Writing - Review & Editing. Wouter Joosen: Funding Acquisition, Supervision, Visualization.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research is partially funded by the Research Fund KU Leuven and the Cybersecurity Initiative Flanders (CIF) project.
References (89)
- et al. SecureNoSQL: An approach for secure search of encrypted NoSQL databases in the public cloud. Int. J. Inf. Manage. (2017)
- et al. NoSQL security
- et al. Performance evaluation of NoSQL big-data applications using multi-formalism models. Future Gener. Comput. Syst. (2014)
- et al. New order preserving encryption model for outsourced databases in cloud environments. J. Netw. Comput. Appl. (2016)
- et al. Cloud databases: a paradigm shift in databases. Int. J. Comput. Sci. Issues (IJCSI) (2012)
- et al. Relational cloud: A database-as-a-service for the cloud (2011)
- et al. A survey on querying encrypted XML documents for databases as a service. ACM SIGMOD Rec. (2008)
- et al. Providing database as a service
- et al. Can relational DBMS scale up to the cloud?
- et al. Benchmarking cloud serving systems with YCSB