Elsevier

Information Systems

Volume 96, February 2021, 101671

CryptDICE: Distributed data protection system for secure cloud data storage and computation

https://doi.org/10.1016/j.is.2020.101671

Highlights

  • Outsourcing data to third-party cloud providers offers numerous benefits.

  • Data protection support in NoSQL databases is lacking.

  • Executing different types of queries and performing complex computation over encrypted data in NoSQL databases introduces complexity.

  • Performing complex computation next to the database engine to realize low-latency aggregate queries requires large development efforts.

  • A flexible and distributed data protection system, named CryptDICE, is designed.

Abstract

Cloud storage allows organizations to store data at remote sites of service providers. Although cloud storage services offer numerous benefits, they also involve new risks and challenges with respect to data security and privacy aspects. To preserve confidentiality, data must be encrypted before outsourcing to the cloud. Although this approach protects the security and privacy aspects of data, it also impedes regular functionality such as executing queries and performing analytical computations. To address this concern, specific data encryption schemes (e.g., deterministic, random, homomorphic, order-preserving, etc.) can be adopted that still support the execution of different types of queries (e.g., equality search, full-text search, etc.) over encrypted data.

However, these specialized data encryption schemes have to be implemented and integrated in the application, and their adoption introduces an extra layer of complexity in the application code. Moreover, as these schemes imply trade-offs between performance, security, storage efficiency, etc., making the appropriate trade-off is a challenging and non-trivial task. In addition, to support aggregate queries, User Defined Functions (UDF) have to be implemented directly in the database engine, and these implementations are specific to each underlying data storage technology, which demands expert knowledge and in turn increases management complexity.

In this paper, we introduce CryptDICE, a distributed data protection system that (i) provides built-in support for a number of different data encryption schemes, made accessible via annotations that represent application-specific (search) requirements; (ii) supports making appropriate trade-offs and executing these encryption decisions at diverse levels of data granularity; and (iii) integrates a lightweight service that dynamically deploys User Defined Functions (UDF), without performing any alteration directly in the database engine, for heterogeneous NoSQL databases in order to realize low-latency aggregate queries and to avoid expensive data shuffling (from the cloud to an on-premise data center). We have validated CryptDICE in the context of a realistic industrial SaaS application and carried out an extensive functional validation, which shows the applicability of the middleware platform. In addition, our experimental evaluation confirms that the performance overhead of CryptDICE is acceptable and validates the performance optimizations for achieving low-latency aggregate queries.

Introduction

The emergence of cloud computing has led to a paradigm shift, not only in the technological and business landscape, but also in the database landscape [1], [2], as illustrated by the emergence of delivery models such as Database-as-a-Service (DBaaS) [3]. Cloud storage services enable data owners (individuals and organizations) to store their data remotely in a flexible and on-demand manner, without taking on the responsibility for provisioning, configuring, scaling, and maintaining these storage systems [4].

In the context of cloud storage, one of the biggest challenges is to provide data management support for cloud-based applications in an efficient and scalable manner [5]. The need to support data-intensive cloud applications in this way has gained substantial interest and has led to the development of cloud-friendly database technologies, commonly known under the umbrella term NoSQL. NoSQL databases are built from the ground up to scale horizontally simply by adding more nodes. As such, they yield numerous benefits in terms of high availability, elastic scalability, and data model flexibility, concerns that are particularly relevant in cloud computing and more specifically in cloud storage [6].

Although cloud data storage provides numerous benefits to organizations, there are also caveats that significantly hinder its rapid and wider adoption. In essence, the DBaaS delivery model requires and assumes a degree of trust in the provider that is not realistic or desirable in many real-world application contexts. Many applications involve storing sensitive information that, when compromised, will seriously jeopardize the privacy of individuals and violate data protection laws such as the GDPR. Recent data security breaches and their impact on a large number of individuals and organizations have exacerbated these concerns [7], [8], [9]. In practice, data security and privacy protection are among the most important factors when choosing a database for cloud-based applications [10].

NoSQL databases, which are prominently used in cloud environments, do not provide strong built-in security mechanisms and thus rely on developers to engage with a wide range of data protection measures from within the application code [10], [11], [12], [13], [14], [15]. Although adopting these measures in the application will lead to adequate protection of the data, many of them impose non-trivial trade-offs: for example, encrypting data before persisting it has implications for the ability to execute different types of search (e.g., equality search, full-text search, etc.) and aggregate queries. To address this concern, a number of different data encryption schemes (e.g., deterministic, random, homomorphic, etc.) have been proposed [16], [17], [18], [19], [20], [21], which can be used to execute different types of queries and perform complex computation over encrypted data.
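
To make the difference between deterministic and random schemes concrete, the following minimal Python sketch may help. It is an illustration under stated assumptions and not the construction used by CryptDICE: the deterministic side is a keyed HMAC tag used as a stand-in for deterministic encryption (it supports equality matching but, unlike a real deterministic cipher, cannot be decrypted), and the randomized side is a toy stream cipher built from HMAC in counter mode. All names (`KEY`, `deterministic_token`, `randomized_encrypt`, `store`) are hypothetical.

```python
import hashlib
import hmac
import secrets

KEY = secrets.token_bytes(32)  # hypothetical secret key held by the application


def deterministic_token(value: str) -> bytes:
    """Keyed, deterministic tag: equal plaintexts always map to equal tags."""
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).digest()


def _keystream(nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(KEY, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def randomized_encrypt(value: str) -> bytes:
    """Fresh nonce per call, so equal plaintexts yield different ciphertexts."""
    data = value.encode("utf-8")
    nonce = secrets.token_bytes(16)
    stream = _keystream(nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def randomized_decrypt(blob: bytes) -> str:
    """Split off the nonce, regenerate the keystream, and undo the XOR."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")


# A pretend remote store that only ever sees tags and ciphertexts.
store = [
    {"customer_tag": deterministic_token(name),
     "customer_enc": randomized_encrypt(name)}
    for name in ["alice", "bob", "alice"]
]

# Equality search: the client sends a tag, the store compares opaque bytes.
query = deterministic_token("alice")
matches = [row for row in store
           if hmac.compare_digest(row["customer_tag"], query)]
```

The store can answer the equality query by comparing tags, whereas the two randomized ciphertexts of "alice" differ and therefore cannot be matched server-side. The same property that enables the search also reveals which rows hold equal values, which is the kind of trade-off discussed in the text.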

However, adopting specific data encryption schemes to support different types of queries comes with several non-trivial challenges. Firstly, as these data encryption schemes are commonly integrated in the application layer to support different types of search and aggregate queries over encrypted data, an extra layer of complexity is introduced in the application, and a level of expertise is required from application developers. Secondly, as these different data encryption schemes have specific security strengths and weaknesses (e.g., the random encryption scheme offers greater data security strength than other encryption schemes, but has no built-in support for executing queries), trade-offs need to be made between strong security, increased performance, and rich query capabilities. For example, enforcing strong data security requirements can lead to a system that is less performance-oriented and offers limited query capabilities. Similarly, disregarding privacy in favor of increased performance and rich query capabilities can lead to critical security requirements being neglected. Therefore, making appropriate trade-offs is a non-trivial task that highly depends on the application requirements and on the limitations imposed on sensitive data. This task becomes more complex when different types of data with varying privacy requirements are considered. Thirdly and finally, supporting aggregate queries in the application requires User Defined Functions (UDF) to be supported directly within the database engine, which not only demands expert knowledge and introduces additional management complexity, but also raises additional security concerns.
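
The aggregate-query case can be sketched with an additively homomorphic scheme. The Python example below is a toy version of the Paillier cryptosystem, shown only to illustrate how a sum could be computed next to the data without decrypting it; it is not the paper's implementation. The primes are deliberately tiny, and the names (`encrypt`, `decrypt`, `encrypted_sum`, `amounts`) are hypothetical.

```python
import math
import secrets

# Toy primes for readability only; real keys use primes of 1024 bits or more.
P, Q = 1789, 1931
N = P * Q                       # public modulus
N_SQ = N * N
G = N + 1                       # common simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # private: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)            # private: inverse of LAM mod N (valid for G = N + 1)


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N with fresh randomness r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Recover m using the private values LAM and MU."""
    u = pow(c, LAM, N_SQ)
    return ((u - 1) // N * MU) % N


def encrypted_sum(ciphertexts) -> int:
    """Server-side step: multiplying ciphertexts adds the plaintexts.

    Uses only public values (N_SQ); no key material or plaintext is needed.
    """
    total = 1  # 1 is a valid encryption of 0
    for c in ciphertexts:
        total = (total * c) % N_SQ
    return total


amounts = [120, 75, 305]                      # hypothetical invoice amounts
ciphertexts = [encrypt(a) for a in amounts]   # what the database would store
total = decrypt(encrypted_sum(ciphertexts))   # client decrypts one value
```

In this sketch `encrypted_sum` plays the role of a UDF running near the database: only a single ciphertext travels back to the client instead of every encrypted row. The result is correct as long as the true sum stays below `N`.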

To address the above-mentioned concerns, we present CryptDICE, a flexible, generic, reusable, and distributed data protection system that facilitates building applications that involve encrypted data storage and search, but does not require an in-depth understanding of different data encryption schemes. Specifically, CryptDICE (i) provides built-in support for several different data encryption schemes (by integrating a number of established libraries), yet hides the complexity from the developer via annotations, which steer the selection of the most appropriate scheme for a given (search) requirement; (ii) supports trade-offs between performance and security and enables executing different types of search and aggregate queries over encrypted data for a variety of different NoSQL databases; and (iii) incorporates a lightweight service that reduces the management complexity and also mitigates serious security risks by preventing developers from implementing UDF directly in the database engine. The latter service, which has built-in support for heterogeneous NoSQL databases, instead implements UDF in the application code and provides migration transparency (from on-premise to the cloud) in order to perform complex computations next to the database engine purely for the sake of performance, i.e., to realize low-latency aggregate queries and to avoid expensive data shuffling (from the cloud to an on-premise data center).
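
The idea of annotations steering scheme selection can be sketched as follows. This Python example is hypothetical throughout: the `Requirement` labels, the `protected` helper, the `Invoice` entity, and in particular the requirement-to-scheme mapping are assumptions chosen for illustration and do not reproduce CryptDICE's actual annotations or selection policy.

```python
from dataclasses import dataclass, field, fields
from enum import Enum


class Requirement(Enum):
    """Hypothetical search requirements a developer could declare."""
    NONE = "none"
    EQUALITY = "equality"
    RANGE = "range"
    AGGREGATE = "aggregate"


# Assumed mapping for illustration; not CryptDICE's actual policy.
SCHEME_FOR = {
    Requirement.NONE: "random",
    Requirement.EQUALITY: "deterministic",
    Requirement.RANGE: "order-preserving",
    Requirement.AGGREGATE: "homomorphic",
}


def protected(requirement: Requirement = Requirement.NONE):
    """Field-level 'annotation': attach the requirement as field metadata."""
    return field(metadata={"requirement": requirement})


@dataclass
class Invoice:
    """Hypothetical entity with per-field protection requirements."""
    invoice_id: str                                  # not annotated
    customer: str = protected(Requirement.EQUALITY)
    due_date: str = protected(Requirement.RANGE)
    amount: int = protected(Requirement.AGGREGATE)
    notes: str = protected()                         # no search needed


def encryption_plan(entity_cls) -> dict:
    """Derive a field-to-scheme plan from the declared requirements."""
    plan = {}
    for f in fields(entity_cls):
        requirement = f.metadata.get("requirement")
        if requirement is not None:
            plan[f.name] = SCHEME_FOR[requirement]
    return plan


plan = encryption_plan(Invoice)
```

The developer states what each field must support, and the plan is derived from that declaration rather than written by hand. Fields that need no search default to the random scheme in this sketch, reflecting the observation above that it offers the greatest security strength but no query support.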

There exist several individual implementations [22], [23] and combined libraries [7], [24], [25], [26] that can be used by software developers, but to our knowledge, an integrated and developer-friendly framework such as CryptDICE, which reduces the implementation and management complexity from a developer point of view and offers performance optimization, is lacking. We have validated a prototype implementation of CryptDICE in the context of a realistic industrial Software-as-a-Service (SaaS) application, carried out an extensive functional validation, and conducted a thorough experimental evaluation. The evaluation results confirm that CryptDICE significantly reduces the development time required for enabling data encryption and supporting different types of interactive search queries over encrypted data, and that it offers performance optimizations for achieving low-latency aggregate queries. The experimental evaluation also shows the performance overhead of CryptDICE to be negligible.

The remainder of this paper is structured as follows: Section 2 provides relevant background information and derives the problem statement motivated by a realistic industrial SaaS application case. Section 3 presents the design of our proposed CryptDICE system, while Section 4 details the prototype based on CryptDICE. Section 5 presents our extensive evaluation of the CryptDICE system in three different aspects. We contrast our solution with related works in Section 6. Finally, Section 7 concludes this paper and indicates directions for future research.

Section snippets

Motivation

The motivation for this work is based on our experiences with large-scale Software-as-a-Service (SaaS) applications, which stem from several applied research projects. These projects have been carried out in active collaboration with industrial SaaS application providers. For simplicity of illustration, we focus on one such application case from the financial domain, a Billing-as-a-Service document management SaaS application, which is introduced in Section 2.1. More specifically, we highlight

CryptDICE: a distributed data protection system

CryptDICE hides the complexity of different data encryption schemes and performs their adaptive selection in order to provide data protection support, yet enables the execution of (search and aggregate) queries over encrypted data. This section provides an in-depth overview of the design objectives and the architecture of our proposed system. At its core, the system is designed with several objectives in mind:

  • Support data protection guarantees at different levels of granularity

Prototype implementation

A proof-of-concept implementation of CryptDICE has been developed and made available to the community. We chose to implement the prototype of CryptDICE on top of Impetus Kundera, an open-source abstraction layer (aka Object-NoSQL datastore mapper (ONDM) framework) so we could avoid dealing with heterogeneity in terms of different APIs to communicate with several NoSQL databases and thus

Evaluation

This section describes the techniques and choices made to evaluate the efficiency and effectiveness of CryptDICE as well as to analyze its impact on the overall performance of the application. Section 5.1 describes the application setups and discusses the different deployment setups in which we tested CryptDICE along with details on software and hardware used for the evaluation. Then, our research focuses on a series of experiments, which are conducted to evaluate CryptDICE in three different

Related work

In the last few years, considerable research has been conducted to mitigate the security challenges in NoSQL databases. This section summarizes related work, which can be broadly classified into two categories: (i) advanced data encryption techniques, and (ii) systems and middleware for protecting sensitive data. In Section 6.1, we give a brief overview of related work on advanced data encryption techniques. Section 6.2 then describes recent research on systems and middleware for protecting

Conclusion

In this paper, we have proposed CryptDICE, a flexible and generic data access system that runs in a distributed fashion and ensures fine-grained protection of application data. The system enables the execution of different types of search and aggregate queries over encrypted data for a wide range of different NoSQL databases, with no modification to the underlying database engine and minimal changes to the client-side applications through the built-in annotations. The lightweight

CRediT authorship contribution statement

Ansar Rafique: Conceptualization, Data Curation, Investigation, Methodology, Software, Validation, Writing - Original Draft Preparation, Writing - Review & Editing. Dimitri Van Landuyt: Project Administration, Resources, Supervision, Validation, Visualization, Writing - Review & Editing. Emad Heydari Beni: Validation, Writing - Review & Editing. Bert Lagaisse: Supervision, Validation, Writing - Review & Editing. Wouter Joosen: Funding Acquisition, Supervision, Visualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research is partially funded by the Research Fund KU Leuven and the Cybersecurity Initiative Flanders (CIF) project.

References (89)

  • Silver-Greenberg J. et al., JPMorgan Chase hack affects 76 million households, New York Times (2014)

  • Weiss N.E. et al., The Target and other financial data breaches: Frequently asked questions

  • Huang B. et al., A homomorphic searching scheme for sensitive data in NoSQL database

  • Okman L. et al., Security issues in NoSQL databases

  • Sellami R. et al., Using multiple data stores in the cloud: Challenges and solutions

  • Shih M.-H. et al., Design and analysis of high performance crypt-NoSQL

  • Tian X. et al., A transparent middleware for encrypting data in MongoDB

  • R. Agrawal, J. Kiernan, R. Srikant, Y. Xu, Order preserving encryption for numeric data, in: Proceedings of the 2004...

  • Boneh D. et al., Semantically secure order-revealing encryption: Multi-input functional encryption without obfuscation

  • Cash D. et al., Parameter-hiding order revealing encryption

  • Chenette N. et al., Practical order-revealing encryption with limited leakage

  • ElGamal T., A public key cryptosystem and a signature scheme based on discrete logarithms, IEEE Trans. Inf. Theory (1985)

  • C. Gentry, Fully homomorphic encryption using ideal lattices, in: Proceedings of the Forty-First Annual ACM Symposium...

  • Macedo R. et al., A practical framework for privacy-preserving NoSQL databases

  • Popa R.A. et al., CryptDB: Protecting confidentiality with encrypted query processing

  • Papadimitriou A. et al., Big data analytics over encrypted datasets with seabed

  • S.L. Tu, M.F. Kaashoek, S.R. Madden, N. Zeldovich, Processing analytical queries over encrypted data, URL:...

  • X. Yuan, Y. Guo, X. Wang, C. Wang, B. Li, X. Jia, Enckv: An encrypted key-value store with rich queries, in:...

  • Turner A. et al., C-mart: Benchmarking the cloud, IEEE Trans. Parallel Distrib. Syst. (2012)

  • Cattell R., Scalable SQL and NoSQL data stores, ACM SIGMOD Rec. (2011)

  • Lourenço J.R. et al., Choosing the right NoSQL database for the job: a quality attribute evaluation, J. Big Data (2015)

  • Wiese L. et al., CloudDBGuard: A framework for encrypted data storage in NoSQL wide column stores, Data Knowl. Eng. (2019)

  • Eassa A.M. et al., NoSQL Injection Attack Detection in Web Applications Using RESTful Service, Program. Comput. Softw. (2018)

  • Sahatqija K. et al., Comparison between relational and NOSQL databases

  • Müller T. et al., TRESOR Runs Encryption Securely Outside RAM

  • Shafagh H. et al., Talos: Encrypted query processing for the Internet of Things

  • Naveed M. et al., Inference attacks on property-preserving encrypted databases

  • Alves P.G. et al., A framework for searching encrypted databases, J. Internet Serv. Appl. (2018)

  • Yubin G. et al., A solution for privacy-preserving data manipulation and query on NoSQL database, J. Comput. (2013)

  • Sathya S.S. et al., A review of homomorphic encryption libraries for secure computation (2018)

  • Philipps J. et al., Refinement of pipe-and-filter architectures

  • Rafique A. et al., On the performance impact of data access middleware for NoSQL data stores: a study of the trade-off between performance and migration cost, IEEE Trans. Cloud Comput. (2015)

  • JPaillier (2020)

  • Paillier P., Public-key cryptosystems based on composite degree residuosity classes
