
1 Introduction

Oblivious RAM (ORAM) is a cryptographic primitive for accessing a remote memory \(\mathsf {M}\) of n entries in a way that memory accesses do not reveal anything about the accessed index \(y\in \{1,\ldots ,n\}\). Goldreich and Ostrovsky [16] were the first to show that ORAM can be built with \(\mathsf {poly}(\log n)\) bandwidth overhead, and since then, there has been a fruitful line of research on substantially reducing this overhead [9, 29, 34, 36], in part motivated by the tree ORAM framework proposed by Shi et al. [31]. However, most existing practical ORAM protocols are interactive, requiring the client to perform a “download-decrypt-compute-encrypt-upload” operation several times (typically \(O(\log n)\) rounds are involved). This can be a bottleneck for applications where low latency is important.

In this paper, we consider the problem of building an efficient round-optimal ORAM scheme. In particular, we propose \(\mathsf {TWORAM}\), an ORAM scheme enabling a client to obliviously access a memory location \(\mathsf {M}[y]\) in two rounds, where the client sends an encrypted message to the server that encapsulates y, the server performs the oblivious computation, and sends a message back to the client, from which the client can retrieve the desired value \(\mathsf {M}[y]\).

\(\mathsf {TWORAM}\)’s worst-case bandwidth overhead is \(O(\kappa \cdot p)\), where p is the bandwidth overhead of a tree-based ORAM scheme and \(\kappa \) is the security parameter. For instance, in Path-ORAM [34], \(p=\log ^3n\) for a block of size \(O(\log n)\) bits. In other words, in order to obliviously read a data block of \(O(\log n)\) bits using \(\mathsf {TWORAM}\), one needs to exchange, in the worst case, \(O(\kappa \cdot \log ^3n)\) bits with the server, in just two rounds.

1.1 Existing Round-Optimal ORAM Protocols

Williams and Sion [37] devised a round-optimal ORAM scheme based on a customized garbling scheme and Bloom filters. Lu and Ostrovsky also include an optimized construction for single-round oblivious RAM in their seminal garbled RAM paper [28]. Subsequent to our work, Fletcher et al. [10] also provide single-round ORAM by generalizing the approach of [37] to use a garbling scheme for branching programs. All aforementioned approaches are symmetric-key and are built on top of the hierarchical ORAM framework introduced by Goldreich and Ostrovsky [16]. Our approach, however, is based on the tree-based ORAM framework introduced by Shi et al. [31], yielding worst-case logarithmic costs by construction and thus avoiding involved deamortization procedures. Burst ORAM [21] is also round-optimal, yet it requires linear storage at the client side.

Other, less efficient approaches for constructing round-optimal ORAM schemes are generic constructions based on garbled RAM [11, 12, 14]. However, such generic approaches are prohibitively inefficient. For instance, for the non-black-box garbled RAM approaches [12, 14], the bandwidth overhead grows with \(\mathsf {poly}(\log n,\kappa , |f|)\), where |f| is the size of the circuit for computing the one-way function f and \(\kappa \) is the security parameter. This leads to constructions that are only of theoretical interest. For the black-box garbled RAM approach [11], the bandwidth overhead grows with \(\mathsf {poly}(\log n,\kappa )\) and is independent of |f|. However, the construction itself is asymptotically very inefficient: the authors of [11] do not provide details on how large the involved polynomials are, which depends on the choice of various parameters, but according to our back-of-the-envelope calculation the polynomial is at least \(\kappa ^5\cdot \log ^7 n\). A key reason for this inefficiency is that they require certain expensive ORAM operations, specifically “eviction,” to be performed inside a garbled circuit. We eliminate this source of inefficiency by moving these expensive ORAM operations outside of the garbled circuits.

1.2 \(\mathsf {TWORAM}\)’s Technical Highlights

Our construction is inspired by ideas from the recent black-box garbled RAM work of Garg et al. [11]. We specifically use those ideas on top of the tree ORAM algorithms [31]. Our new ideas help avoid certain inefficiencies in the original construction of [11], yielding an asymptotically better protocol.

Our first step is to abstract away certain details of eviction-based tree ORAM algorithms, such as Path-ORAM [34], circuit ORAM [36] and Onion ORAM [9]. These algorithms work as follows: the memory \(\mathsf {M}\) that must be accessed obliviously is stored as a sequence of L trees \(T_1,T_2,\ldots ,T_L\). The actual data of \(\mathsf {M}\) is stored encrypted in the tree \(T_L\), while the other trees store position map information (also encrypted). Only \(T_1\) is stored on the client side. Roughly speaking, to access an index y in \(\mathsf {M}\), the client accesses \(T_1\) and sends a path index \(p_2\) to the server. The server then successively accesses paths \(p_2,p_3,\ldots ,p_L\) in \(T_2,T_3,\ldots ,T_L\). However, the paths are accessed adaptively: in order to learn \(p_i\), one needs to first access \(p_{i-1}\) in \(T_{i-1}\) and decrypt all the information (the so-called buckets) stored in its nodes. This is where existing approaches require O(L) rounds of interaction: decryption can only take place at the client side, which means all the information on the paths must be communicated back to the client.

\(\mathsf {TWORAM}\)’s Core Idea. In order to avoid the roundtrips described above, we do not use standard encryption. Instead, we hardcode the content of each bucket inside a garbled circuit [38]. In other words, after the trees \(T_2,T_3,\ldots ,T_L\) are produced, the client generates one garbled circuit per internal node in each tree. The function of this garbled circuit is very simple: informally, it takes as input an index x, loops through the blocks \(\mathsf {bucket}[i]\) contained in the current bucket until it finds \(\mathsf {bucket}[x]\), and returns the index \(\pi = \mathsf {bucket}[x]\) of the next path to be followed. Note that the index \(\pi \) is returned in the form of a garbled input for the next garbled circuit, so that the execution can proceed at the server until \(T_L\) is reached and the final desired value can be returned to the client (see Fig. 3 for a more formal description).

This simplified description ignores some technical hurdles. Firstly, security of the underlying ORAM scheme requires that the location where \(\mathsf {bucket}[x]\) is found remains hidden. In particular, the garbled circuit which has the value \(\mathsf {bucket}[x]\) inside should not be identifiable by the server. We resolve this issue as follows. For every bucket that the underlying ORAM needs to touch, all the corresponding garbled circuits are executed in a specific order and the value of interest is carried along the way and output only by the final evaluated circuit in that tree.

Secondly, the above approach only works well for a single memory access, since the garbled circuits can only be used once. Fortunately, as we show in the paper, only a logarithmic number of garbled circuits are touched for each access. These circuits can be downloaded by the client who decodes the hardcoded values, performs the eviction strategy locally (on plaintext data), and sends fresh garbled circuits back to the server. This step does not increase the number of rounds (from two to three), since sending the fresh garbled circuits to the server can be “piggybacked” onto the message the client prepares for the next memory access.

Finally, in order to ensure the desired efficiency, and to avoid a polynomial multiplicative blowup in the security parameter, we develop optimizations that ensure that the sizes of the circuits garbled in our construction remain small and proportional to those of the underlying ORAM.

1.3 Application: 4-Round Searchable Encryption with No Search Pattern Leakage

An SSE scheme allows a client to outsource a database (defined as a set of document/keyword-set pairs \(\mathsf {DB}= (d_i, W_i)_{i=1}^N\)) to a server in encrypted format, where a search query for w returns all \(d_i\) such that \(w \in W_i\).

Several recent works [3, 20, 26, 39] demonstrate attacks against property-preserving encryption schemes (which also enable search on encrypted data) by taking advantage of the leakage associated with these schemes. Though these attacks do not lead to concrete attacks against existing SSE schemes, they underline the importance of examining the feasibility of solutions that avoid leakage. A natural building block for doing so is ORAM. We use \(\mathsf {TWORAM}\) to obtain the first constant-round, asymptotically efficient SSE that can hide the search/access patterns.

Our construction combines \(\mathsf {TWORAM}\) and a non-recursive Path-ORAM (whose first-level position map is not outsourced) such that searching for w requires (i) a single access to \(\mathsf {TWORAM}\), and (ii) \(|\mathsf {DB}(w)|\) parallel accesses to the non-recursive Path-ORAM (note that an access to a non-recursive ORAM requires only two rounds).

In particular, we use \(\mathsf {TWORAM}\) to store pairs of the form \((w,(count_w,access_w))\), where w is a keyword, \(count_w\) is the number of documents containing w and \(access_w\) is the number of times w has been accessed so far. The keyword/document pairs \((w||i,d_i)\) (where \(d_i\) is the i-th document containing w) are then stored in the non-recursive Path-ORAM, where their position in the Path-ORAM tree (namely the random path they are mapped to) is determined on the fly by using a PRF F as \(F_k(w||i,access_w)\) (therefore there is no need to store the position map locally). To search for keyword w, we first access \(\mathsf {TWORAM}\) to obtain \((count_w, access_w)\) (and increment \(access_w\)), and then generate all positions to look up in the Path-ORAM using the PRF F. These lookups can be parallelized, and updating the paths can be piggybacked onto the next search.
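A minimal sketch of this position derivation (ours, for illustration only; HMAC-SHA256 stands in for the PRF F, and the encoding of the PRF input is an assumption):

```python
import hashlib
import hmac

def position(k: bytes, w: str, i: int, access_w: int, num_leaves: int) -> int:
    # Leaf of the Path-ORAM tree for the i-th document containing w; it can
    # be recomputed on the fly, so no position map needs to be stored locally.
    msg = f"{w}||{i}||{access_w}".encode()
    digest = hmac.new(k, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % num_leaves

# All |DB(w)| positions for one search can be generated (and fetched) in parallel:
# paths = [position(k, w, i, access_w, 2 ** L) for i in range(1, count_w + 1)]
```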

The above yields a construction with 4 rounds of interaction. Note that naively using ORAM for SSE would incur \(|\mathsf {DB}(w)|\) ORAM accesses, which imply at least \(|\mathsf {DB}(w)|\) roundtrips (depending on the number of rounds of the underlying ORAM). As mentioned before, our construction does not leak the search pattern, since it provides randomly generated tokens every time a search is performed. If we choose to store all documents in the obliviously-accessed memory, the access pattern can also be concealed.

1.4 Other Related Work

Oblivious RAM. ORAM protocols with a non-constant number of roundtrips can be categorized into hierarchical ones [17, 18, 24, 27], motivated by the seminal work of Goldreich and Ostrovsky [16], and tree-based ones [9, 29, 34, 36], motivated by the seminal work of Shi et al. [31]. We note, however, that by picking a large data block size (e.g., \(\sqrt{n}\) bits), the number of rounds in tree-based ORAMs can be made constant, yet the bandwidth increases beyond polylogarithmic, so such a parameter selection is not interesting.

Searchable Encryption. Song et al. [32] were the first to explore the feasibility of searchable encryption. Since then, many follow-up works have designed new schemes for both static data [4, 6, 8] and dynamic data [5, 15, 22, 23, 33, 35]. The security definitions also evolved over time and were eventually established in the work of [6, 8]. Unlike our construction, all these approaches use deterministic tokens, and therefore leak the search patterns. The only proposed approaches that are constant-round and have randomized tokens (apart from constructing SSE through garbled RAM) are the ones based on functional encryption [30]. However, such approaches incur a linear search overhead. We also note that one can obtain SSE with no search pattern leakage by using interactive ORAMs such as Path-ORAM [34], or other variants optimized for binary search [13].

Secure Computation for RAM Programs. A recent line of work studies efficient secure two-party computation of RAM programs based on garbled circuits [1, 19]. These constructions can also be used to design SSE that hide the search pattern—yet these approaches do not lead to constant-round SSE schemes, requiring the client to perform computation proportional to the size of the search result.

2 Definitions for Garbled Circuits and Oblivious RAM

In this section, we recall definitions and describe building blocks we use in this paper. We use the notation \(\langle C',S'\rangle \leftrightarrow \varPi \langle C,S\rangle \) to indicate that a protocol \(\varPi \) is executed between a client with input C and a server with input S. After the execution of the protocol the client receives \(C'\) and the server receives \(S'\). For non-interactive protocols, we just use the left arrow notation (\(\leftarrow \)) instead.

2.1 Garbled Circuits

Garbled circuits were first constructed by Yao [38] (see Lindell and Pinkas [25] and Bellare et al. [2] for a detailed proof and further discussion). A circuit garbling scheme is a tuple of PPT algorithms \((\mathsf {GCircuit},\mathsf {Eval})\), where \(\mathsf {GCircuit}\) is the circuit garbling procedure and \(\mathsf {Eval}\) the corresponding evaluation procedure. More formally:

  • \((\tilde{C}, \mathsf {lab}) \leftarrow \mathsf {GCircuit}\left( 1^\kappa , C\right) \): \(\mathsf {GCircuit}\) takes as input a security parameter \(\kappa \) and a Boolean circuit C. This procedure outputs a garbled circuit \(\tilde{C}\) and input labels \(\mathsf {lab}\), which is a set of pairs of random strings. Each pair in \(\mathsf {lab}\) corresponds to one input wire of C (the two elements of the pair represent the values 0 and 1 of that wire, respectively).

  • \(y \leftarrow \mathsf {Eval}(\tilde{C}, \mathsf {lab}_{x})\): Given a garbled circuit \(\tilde{C}\) and garbled input \(\mathsf {lab}_{x}\), \(\mathsf {Eval}\) outputs \(y=C(x)\).

Input Labels and Garbled Inputs. For a specific input x, we denote by \(\mathsf {lab}_x\) the garbled inputs, a “projection” of x on the input labels. E.g., for a Boolean circuit with two input bits z and w, we have \(\mathsf {lab}=\{(z_0,z_1),(w_0,w_1)\}\), \(\mathsf {lab}_{00}=\{z_0,w_0\}\), \(\mathsf {lab}_{01}=\{z_0,w_1\}\), etc.
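The projection notation can be made concrete in a few lines of Python; this is a toy sketch of ours, not part of the garbling scheme's definition:

```python
import os

def gen_labels(n_wires: int, kappa: int = 16):
    # lab: one pair of random strings per input wire (a 0-label and a 1-label).
    return [(os.urandom(kappa), os.urandom(kappa)) for _ in range(n_wires)]

def project(lab, bits):
    # lab_x: keep exactly one element of each pair, as dictated by x's bits.
    return [pair[b] for pair, b in zip(lab, bits)]

lab = gen_labels(2)               # wires z and w
lab_01 = project(lab, [0, 1])     # corresponds to {z_0, w_1} in the text
```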

Correctness. Let \((\mathsf {GCircuit},\mathsf {Eval})\) be a circuit garbling scheme. For correctness, we require that for any circuit C and any input x for C, we have \(C(x) = \mathsf {Eval}(\tilde{C}, \mathsf {lab}_{x})\), where \((\tilde{C}, \mathsf {lab}) \leftarrow \mathsf {GCircuit}\left( 1^\kappa , C\right) \).

Security. Let \((\mathsf {GCircuit},\mathsf {Eval})\) be a circuit garbling scheme. For security, we require that for any PPT adversary \(\mathsf {A}\), there is a PPT simulator \(\mathsf {Sim}\) such that the following distributions are computationally indistinguishable:

  • \(\mathrm {Real}_\mathsf {A}(\kappa )\): \(\mathsf {A}\) chooses a circuit C. Experiment runs \((\tilde{C}, \mathsf {lab}) \leftarrow \mathsf {GCircuit}\left( 1^\kappa , C\right) \) and sends \(\tilde{C}\) to \(\mathsf {A}\). \(\mathsf {A}\) then chooses an input x. The experiment uses \(\mathsf {lab}\) and x to derive \(\mathsf {lab}_x\) and sends \(\mathsf {lab}_x\) to \(\mathsf {A}\). Then it outputs the output of the adversary.

  • \(\mathrm {Ideal}_{\mathsf {A},\mathsf {Sim}}(\kappa )\): \(\mathsf {A}\) chooses a circuit C. Experiment runs \((\tilde{C},\sigma ) \leftarrow \mathsf {Sim}(1^\kappa ,|C|)\) and sends \(\tilde{C}\) to \(\mathsf {A}\). \(\mathsf {A}\) then chooses an input x. The experiment runs \(\mathsf {lab}_x \leftarrow \mathsf {Sim}(1^\kappa ,\sigma )\) and sends \(\mathsf {lab}_x\) to \(\mathsf {A}\). Then it outputs the output of the adversary.

The above definition guarantees adaptive security, since the adversary gets to choose input x after seeing the garbled circuit \(\tilde{C}\). We only know how to instantiate garbling schemes with adaptive security in the random oracle model. In the standard model, existing garbling schemes achieve a weaker static variant of the above definition where the adversary chooses both C and input x at the same time and before receiving \(\tilde{C}\).

Concerning complexity, we note that if the cleartext circuit C has |C| gates, the respective garbled circuit has size \(O(|C|\kappa )\). This is because every gate in the circuit is typically replaced with a table of four rows, each row storing encryptions of labels (each encryption has \(\kappa \) bits).

2.2 Oblivious RAM

We recall Oblivious RAM (ORAM), a notion introduced and first studied by Goldreich and Ostrovsky [16]. ORAM can be thought of as a compiler that encodes the memory into a special format such that accesses on the compiled memory do not reveal the underlying access patterns on the original memory. An ORAM scheme consists of protocols \(\left( \textsc {Setup}, \textsc {ObliviousAccess}\right) \).

  • \(\langle \sigma , \mathsf {EM}\rangle \leftrightarrow \textsc {Setup}\langle (1^\kappa , \mathsf {M}),\bot \rangle \): \(\textsc {Setup}\) takes as input the security parameter \(\kappa \) and a memory array \(\mathsf {M}\) and outputs a secret state \(\sigma \) (for the client), and an encrypted memory \(\mathsf {EM}\) (for the server).

  • \(\langle (\mathsf {M}[y], \sigma '), \mathsf {EM}' \rangle \leftrightarrow \textsc {ObliviousAccess}\langle (\sigma ,y,v), \mathsf {EM}\rangle \): \(\textsc {ObliviousAccess}\) is a protocol between the client and the server, where the client’s input is the secret state \(\sigma \), an index y and a value v which is set to \(\mathsf {null}\) in case the access is a read operation (and not a write). Server’s input is the encrypted memory \(\mathsf {EM}\). Client’s output is \(\mathsf {M}[y]\) and an updated secret state \(\sigma '\) and the server’s output is an updated encrypted memory \(\mathsf {EM}'\) where \(\mathsf {M}[y] = v\), if \(v\ne \mathsf {null}\).

Correctness. Consider the following correctness experiment. Adversary \(\mathsf {A}\) chooses memory \(\mathsf {M}_0\). Consider \(\mathsf {EM}_0\) generated as \(\langle \sigma _0, \mathsf {EM}_0 \rangle \leftrightarrow \textsc {Setup}\langle (1^\kappa , \mathsf {M}_0 ),\bot \rangle \). The adversary then adaptively chooses memory locations to read and write. Denote the adversary’s read/write queries by \((y_1,v_1),\dots ,(y_q,v_q)\), where \(v_i =\mathsf {null}\) for read operations. \(\mathsf {A}\) wins the correctness game if \(\langle (\mathsf {M}_{i-1}[y_i], \sigma _i), \mathsf {EM}_{i} \rangle \) is not the final output of the protocol \(\textsc {ObliviousAccess}\langle (\sigma _{i-1},y_i,v_i), \mathsf {EM}_{i-1} \rangle \) for some \(1 \le i \le q\), where \(\mathsf {M}_i\), \(\mathsf {EM}_{i}\), \(\sigma _i\) are the memory array, the encrypted memory array and the secret state, respectively, after the i-th access operation, and \(\textsc {ObliviousAccess}\) is run between an honest client and server. The ORAM scheme is correct if the probability of \(\mathsf {A}\) winning this game is negligible in \(\kappa \).

Security. An ORAM scheme is secure in the semi-honest model if for any PPT adversary \(\mathsf {A}\), there exists a PPT simulator \(\mathsf {Sim}\) such that the following two distributions are computationally indistinguishable.

  • \(\mathrm {Real}_\mathsf {A}(\kappa )\): \(\mathsf {A}\) chooses \(\mathsf {M}_0\). Experiment then runs \(\langle \sigma _0, \mathsf {EM}_0 \rangle \leftrightarrow \textsc {Setup}\langle (1^\kappa , \mathsf {M}_0 ),\bot \rangle \). For \(i=1,\ldots ,q\), \(\mathsf {A}\) makes adaptive read/write queries \((y_i,v_i)\) where \(v_i = \mathsf {null}\) on reads, for which the experiment runs the protocol

    $$\begin{aligned} \langle (\mathsf {M}_{i-1}[y_i], \sigma _i), \mathsf {EM}_{i} \rangle \leftrightarrow \textsc {ObliviousAccess}\langle (\sigma _{i-1},y_i,v_i), \mathsf {EM}_{i-1} \rangle . \end{aligned}$$

    Denote the full transcript of the above protocol by \(t_i\). Eventually, the experiment outputs \((\mathsf {EM}_q, t_1, \dots , t_q)\) where q is the total number of read/write queries.

  • \(\mathrm {Ideal}_{\mathsf {A},\mathsf {Sim}}(\kappa )\): The experiment outputs \((\mathsf {EM}_q, t'_1, \ldots , t'_q) \leftrightarrow \mathsf {Sim}(q,|\mathsf {M}_0|,1^\kappa )\).

3 \(\mathsf {TWORAM}\) Construction

Our \(\mathsf {TWORAM}\) construction uses an abstraction of tree-based ORAM schemes, e.g., Path-ORAM [34]. We start by describing this abstraction informally. Then we show how to turn the interactive Path-ORAM protocol (e.g., the one by Stefanov et al. [34]) into a two-round ORAM protocol, using this abstraction. We first introduce some notation necessary for understanding the abstraction.

3.1 Notation

Let \(n=2^L\) be the size of the initial memory that we wish to access obliviously. This memory is denoted by \(A_L[1],A_L[2],\ldots ,A_L[n]\), where \(A_L[i]\) is the i-th block of the memory. Given a location y that we wish to access, let \(y_L,y_{L-1},\ldots ,y_1\) be defined recursively as \(y_L=y\) and \(y_i=\lceil y_{i+1}/2\rceil \) for all \(i=L-1,L-2,\ldots ,1\). For example, for \(L=4\) and \(y=13\), we have

  • \(y_1=\lceil \lceil \lceil y/2\rceil /2\rceil /2\rceil =2\).

  • \(y_2=\lceil \lceil y/2\rceil /2\rceil =4\).

  • \(y_3=\lceil y/2\rceil =7\).

  • \(y_4=13\).

Also define \(b_i=1-y_{i}\%2\) to be a bit (namely, \(b_i\) indicates whether \(y_i\) is even or odd). Finally, on input a value x of \(2\cdot L\) bits, \(\mathsf {select}(x,0)\) selects the first L bits of x, while \(\mathsf {select}(x,1)\) selects the last L bits of x. We note that both \(y_i\) and \(b_i\) are functions of y, but we do not indicate this explicitly so as not to clutter notation.
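The index arithmetic above is easy to misread, so here is a minimal Python sketch of it (ours, for illustration); the only assumption is that “first L bits” means the most significant half of x:

```python
from math import ceil

def y_index(y: int, i: int, L: int) -> int:
    """y_L = y and y_i = ceil(y_{i+1} / 2)."""
    for _ in range(L - i):
        y = ceil(y / 2)
    return y

def b_index(y: int, i: int, L: int) -> int:
    """b_i = 1 - (y_i % 2), i.e., 1 iff y_i is even."""
    return 1 - y_index(y, i, L) % 2

def select(x: int, b: int, L: int) -> int:
    """First (b=0) or last (b=1) L bits of the 2L-bit value x."""
    return (x >> L) if b == 0 else x & ((1 << L) - 1)

assert [y_index(13, i, 4) for i in (1, 2, 3, 4)] == [2, 4, 7, 13]
```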

3.2 Path-ORAM Abstraction

We start by describing our abstraction of the Path-ORAM construction. In Appendix A we describe formally how this abstraction can be used to implement the interactive Path-ORAM algorithm [34] (with \(\log n\) rounds of interaction). We note that the details in Appendix A are provided only to aid understanding; our construction can be understood based on just the abstraction defined below.

Roughly speaking, Path-ORAM algorithms encode the original memory \(A_L\) in the form of L memories

$$A_L,A_{L-1},\ldots ,A_1,$$

where \(A_L\) stores the original data and the remaining memories \(A_i\) store information required for accessing data in \(A_L\) obliviously. Each \(A_i\) has \(2^i\) entries, each one storing blocks of \(2\cdot L\) bits (for ease of presentation we assume the block size is \(\varTheta (\log n)\) but our results apply with other block parameterizations as well). Memories \(A_L,A_{L-1},\ldots ,A_2\) are stored in trees \(T_L,T_{L-1},\ldots ,T_2\) respectively. The smallest memory \(A_1\) is kept locally by the client. The invariant that is maintained is that any block \(A_i[x]\) will reside in some leaf-to-root path of tree \(T_i\), and specifically on the path that starts from leaf \(x_{i}\) in \(T_i\). The value \(x_{i}\) itself can be retrieved by accessing \(A_{i-1}\), as we detail in the following.

Reading a Value \(A_L[y]\). To read a value \(A_L[y]\), one first reads \(A_1[y_1]\) from local storage and computes \(x_2\leftarrow \mathsf {select}(A_1[y_1],b_1)\) (recall the definitions of \(y_1\) and \(b_1\) from Sect. 3.1). Then one traverses the path starting from leaf \(x_2\) in \(T_2\); this path is denoted by \(T_2(x_2)\). Block \(A_2[y_2]\) is guaranteed to be on \(T_2(x_2)\). Then one computes \(x_3\leftarrow \mathsf {select}(A_2[y_2],b_2)\), and continues in this way. In the end, one traverses path \(T_L(x_L)\) and eventually retrieves block \(A_L[y]\). See Fig. 1.
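Reusing y_index, b_index and select from the previous sketch, the whole read traversal can be summarized as follows; path_blocks(i, x) models the (decrypted) blocks found on path \(T_i(x)\) as a dict \(\{z: A_i[z]\}\) and is an illustrative stand-in for the server's storage, not the paper's API:

```python
def read(y: int, A1: dict, path_blocks, L: int):
    x = select(A1[y_index(y, 1, L)], b_index(y, 1, L), L)  # x_2, from local A_1
    for i in range(2, L + 1):
        val = path_blocks(i, x)[y_index(y, i, L)]          # A_i[y_i] lies on T_i(x_i)
        if i == L:
            return val                                     # the desired A_L[y]
        x = select(val, b_index(y, i, L), L)               # x_{i+1}, for tree T_{i+1}
```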

Updating the Paths. Once the above process finishes, we need to make sure that we do not access the same leaf-to-root paths in case we access \(A_L[y]\) again in the future—this would violate obliviousness. Thus, for \(i=2,\ldots ,L\), we perform the following tasks:

  1. We remove all blocks from \(T_{i}(x_{i})\) and copy them into a data structure \(C_i\) called the stash. In our abstraction, stash \(C_i\) is viewed as an extension of the root of tree \(T_i\);

  2. In the stash \(C_{i-1}\), we set \(\mathsf {select}(A_{i-1}[y_{i-1}],b_{i-1})\leftarrow r_{i}\), where \(r_{i}\) is a fresh random number in \([1,2^{i}]\) that replaces \(x_{i}\) from above. This effectively means that block \(A_{i}[y_{i}]\) should be reinserted on path \(T_i(r_i)\) when eviction from stash \(C_i\) takes place;

  3. We evict blocks from stash \(C_i\) back to tree \(T_i(x_i)\), respecting the new assignments made above.

Fig. 1. Our Path-ORAM abstraction for reading a value \(val=A_L[y]\). \(A_1[y_1]\) is read from local storage and defines \(x_2\). \(x_2\) defines a path \(p_2\) in \(T_2\). By traversing \(p_2\) the algorithm will retrieve \(A_2[y_2]\), which will yield \(x_3\), which defines a path \(p_3\) in \(T_3\). Repeating this process yields a path \(p_L\) in \(T_L\), traversing which yields the final value \(A_L[y_L]=A_L[y]\). Note that y is passed from tree \(T_{i-1}\) to tree \(T_i\) so that the index \(y_i\) (and the bit \(b_i\)) can be computed for searching for the right block on path \(p_i\).

Syntax. A Path-ORAM consists of three procedures \((\textsc {Initialize},\textsc {Extract},\textsc {Update})\) with syntax:

  • \(\mathcal {T}\leftarrow \textsc {Initialize}(1^\kappa ,A_L)\): Given a security parameter \(\kappa \) and memory \(A_L\) as input, \(\textsc {Initialize}\) outputs a set of \(L-1\) trees \(\mathcal {T}=\{T_2,T_3,\ldots ,T_L\}\) and an array of two entries \(A_1\). \(A_1\) is stored locally with the client and \(T_2,\ldots ,T_L\) are stored with the server.

  • \(x_{i+1}\leftarrow \textsc {Extract}(i,y,T_i(x_{i}))\) for \(i=2,\ldots ,L\): Given the tree number i, the final memory location of interest y and a leaf-to-root path \(T_i(x_{i})\) (the one that starts from leaf \(x_{i}\)) in tree \(T_i\), \(\textsc {Extract}\) outputs an index \(x_{i+1}\) to be read in the next tree \(T_{i+1}\). The client can obtain \(x_2\) from local storage as \(x_2\leftarrow \mathsf {select}(A_1[y_1],b_1)\); the obtained value \(x_2\) is sent to the server in order for the server to continue the execution. Finally, the server outputs \(x_{L+1}\), which is the desired value \(A_L[y]\).

    ExtractBucket Algorithm. In Path-ORAM [34], internal nodes of the trees store more than one block \((z,A_i[z])\), in the form of buckets. We note that \(\textsc {Extract}\) can be broken up to work on individual buckets along a leaf-to-root path in a tree \(T_i\). In particular, we can define the algorithm \(\pi \leftarrow \textsc {ExtractBucket}(i,y,b)\), where i is the tree of interest, y is the memory location that needs to be accessed, and b is a bucket corresponding to a particular node on the leaf-to-root path; the output \(\pi \) is produced at exactly one of the nodes on the path. The algorithm Extract can thus be implemented by repeatedly calling ExtractBucket for every b on \(T_i(x_i)\), as in the sketch after this list.

  • \(\{A_1,T_2(x_2),\ldots ,T_L(x_{L})\}\leftarrow \textsc {Update}(y,op,val,A_1,T_2(x_2),\ldots ,T_L(x_{L}))\): Procedure Update takes as input the leaf-to-root paths (and local storage \(A_1\)) that were traversed during the access and accordingly updates these paths (and local storage \(A_1\)). Additionally, \(\textsc {Update}\) ensures that the new value val is written to \(A_L[y]\) if operation op is a “write” operation.
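To make the reduction concrete, here is a minimal Python sketch (ours, not the paper's pseudocode) of how Extract reduces to ExtractBucket; extract_bucket stands in for Algorithm 2 and returns the next index on a hit and None otherwise. Note that every bucket on the path is processed and the hit is merely carried along, matching the order of execution required later for obliviousness:

```python
def extract(i, y, path_buckets, extract_bucket):
    """Run ExtractBucket on every bucket of the path T_i(x_i)."""
    result = None
    for b in path_buckets:            # buckets on the leaf-to-root path
        pi = extract_bucket(i, y, b)  # select(A_i[y_i], b_i) on a hit, else None
        if pi is not None:
            result = pi               # at most one bucket contains A_i[y_i]
    return result                     # x_{i+1}, to be read in tree T_{i+1}
```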

An implementation of the above abstraction for Path-ORAM [34] is given in Algorithms 1, 2 and 3 in Appendix A.1. Note that the description of the Update procedure [34] abstracts away the details of the eviction strategy. The Setup and ObliviousAccess protocols of the interactive Path-ORAM using this abstraction are given in Figs. 6 and 7, respectively, in Appendix A.2. It is easy to see that the ObliviousAccess protocol has \(\log n\) rounds of interaction. By the proof of Stefanov et al. [34], we get the following:

Corollary 1

The protocols Setup and ObliviousAccess from Figs. 6 and 7 respectively in Appendix A.2 comprise a secure ORAM scheme (as defined in Sect. 2.2) with \(O(\log n)\) rounds, assuming the encryption scheme used is CPA-secure.

We recall that the bandwidth overhead for Path-ORAM [34] is \(O(\log ^3 n)\) bits and the client storage is \(O(\log ^2 n)\cdot \omega (1)\) bits, for a block size of \(2\cdot L=2\cdot \log n\) bits.

3.3 From \(\log n\) Rounds to Two Rounds

Existing Path-ORAM protocols implementing our abstraction require \(\log n\) rounds (see ObliviousAccess protocol in Fig. 7). The main reason for that is the following: In order for the server to determine the index of leaf \(x_i\) from which the next path traversal begins, the server needs to access \(A_{i-1}[y_{i-1}]\), which is stored encrypted at some node on the path starting from leaf \(x_{i-1}\) in tree \(T_{i-1}\)—see Fig. 1. Therefore the server has to return all encrypted nodes on \(T_{i-1}(x_{i-1})\) to the client, who performs the decryption locally, searches for \(A_{i-1}[y_{i-1}]\) (via the ExtractBucket procedure) and returns the value \(x_i\) to the server (see Line 10 of the ObliviousAccess protocol in Fig. 7).

Our Approach. To overcome this difficulty, we do not encrypt the blocks in the buckets. Instead, for each bucket stored at a tree node u, we prepare a garbled circuit that hardcodes, among other things, the blocks that are contained in the bucket. Subsequently, this garbled circuit executes the ExtractBucket algorithm on the hardcoded blocks and outputs either \(\bot \) or the next leaf index \(\pi \), depending on whether the search performed by ExtractBucket was successful. The output, whatever it is, is fed as a garbled input to either the left child bucket or the right child bucket of u (depending on the currently traversed path), or to the next root bucket (in case u is a leaf). In this way, by the time the server has executed all the garbled circuits along the currently traversed path, he is able to pass the index \(\pi \) to the next tree as a garbled input, and continue the execution in the same way without having to interact with the client. Therefore the client can obliviously retrieve his value \(A_L[y]\) in only two rounds of communication.
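The server-side control flow is thus a simple chain of garbled evaluations. The sketch below is ours (the triple return convention and the Eval signature are assumptions layered over the garbling scheme of Sect. 2.1) and shows why no client interaction is needed mid-traversal:

```python
def server_traverse(circuits, node_id, garbled_input, Eval):
    """Evaluate bucket circuits in a chain until the last circuit of T_L."""
    value = None
    while node_id is not None:
        # Each circuit outputs the identifier of the next circuit to run,
        # the garbled input for it, and (only at the very end) the result
        # that the server sends back to the client.
        node_id, garbled_input, value = Eval(circuits[node_id], garbled_input)
    return value  # produced by the final circuit in T_L: the block A_L[y]
```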

Unfortunately, once these garbled circuits have been consumed, they cannot be used again, since this would violate the security of garbled circuits. To avoid this problem, the client downloads all the data that was accessed, decrypts it, runs the Update procedure locally, recomputes the garbled circuits that were consumed, and stores the new garbled circuits locally. In the next access, these garbled circuits are sent along with the query. Therefore the total number of communication rounds remains two (note that this approach requires permanent client storage—for transient storage, the client would have to send the fresh garbled circuits immediately, which would increase the number of rounds to three). We now continue with describing the bucket circuit that needs to be garbled for our construction.

Fig. 2. Formal description of the naive bucket circuit. Notation: Given \({\mathsf {lab}}\), the set of input labels for a garbled circuit, we let \({\mathsf {lab}}_{a}\) denote the garbled input labels (i.e., the labels taken from \({\mathsf {lab}}\)) corresponding to the input value a.

Naive Bucket Circuit. To help the reader, in Fig. 2 we describe a naive version of our bucket circuit that leads to an inefficient construction. Then we give the full-fledged description of our bucket circuit in Fig. 3. The naive bucket circuit has inputs, outputs and hardcoded parameters, which we detail in the following.

Inputs. The input of the circuit is a triplet consisting of the following information:

  1. The index of the leaf p from which the currently explored path begins;

  2. The final location to be accessed y;

  3. The output from the previous bucket \(\pi \) (which can be the actual value of the next index to be explored, or \(\bot \)).

Outputs. The output of the circuit is the next node to be executed, along with its garbled inputs. For example, if the current node u is not a leaf (see Lines 4 and 5 in Fig. 2), the circuit outputs the garbled inputs of either the left or the right child, whereas if the current node is a leaf (see Lines 6–8 in Fig. 2), the circuit outputs the garbled inputs of the next root to be executed. Note that outputting the garbled inputs is easy, since the bucket circuit hardcodes the input labels of the required circuits. Finally, we note that the \(\textsc {ExtractBucket}(i,y,\textsf {bucket})\) algorithm used in Fig. 2 can be found in Appendix A.1—see Algorithm 2.

Hardcoded Parameters. The circuit for node u hardcodes:

  1. The node identifier u, which is a triplet (i, j, k) where

    • \(i\in \{2,\ldots ,L\}\) is the tree number that node u belongs to;

    • \(j\in \{0,\ldots ,i-1\}\) is the depth of node u;

    • \(k\in \{0,\ldots ,2^j-1\}\) is the order of node u in the specific level.

    For example, the root of tree \(T_3\) is denoted (3, 0, 0), while its right child is (3, 1, 1).

  2. The bucket information \(\textsf {bucket}\) (i.e., the blocks \((x,A_i[x],r)\) contained in node u—recall that r is the path index in \(T_i\) assigned to \(A_i[x]\));

  3. The input labels \(\mathsf {leftLabels}\), \(\mathsf {rightLabels}\) and \(\mathsf {nextRootLabels}\) that are used to compute the garbled inputs for the next circuit to be executed. Note that \(\mathsf {leftLabels}\) and \(\mathsf {rightLabels}\) are used to prepare the next garbled inputs when node u is an internal node (to go either to the left or the right child), while \(\mathsf {nextRootLabels}\) are used when node u is a leaf (to go to the next root).
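Putting the pieces together, the following is a cleartext sketch of the logic that the naive circuit garbles. It is an illustration under our own conventions, not a transcription of Fig. 2: labels are modeled as functions mapping an input triplet to a garbled input, leaves sit at depth \(i-1\), and the bit of p used to pick a child is our assumed indexing.

```python
def bucket_circuit(p, y, pi, *, u, bucket, left_labels, right_labels,
                   next_root_labels, extract_bucket):
    i, j, k = u                                 # hardcoded node identifier
    hit = extract_bucket(i, y, bucket)          # search the hardcoded bucket
    if hit is not None:
        pi = hit                                # carry the found index along
    if j < i - 1:                               # internal node: descend along p
        bit = (p >> (i - 2 - j)) & 1            # which child the path p takes
        child_labels = right_labels if bit else left_labels
        return child_labels((p, y, pi))         # garbled input for that child
    return next_root_labels((pi, y, None))      # leaf: pi opens the next tree
```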

Final Bucket Circuit. In the naive circuit presented before, we hardcode the input labels of the root node \(\mathsf {root}\) of every tree \(T_i\) into all the nodes/circuits of tree \(T_{i-1}\). Unfortunately, in every oblivious access, the garbled circuits of all roots are consumed (and therefore \(\mathsf {root}\)’s circuit as well), hence all the garbled circuits of tree \(T_{i-1}\) would have to be recomputed from scratch. This cost is O(n), thus very inefficient. We would like to minimize the number of circuits in \(T_{i-1}\) that need to be recomputed, and ideally make this cost proportional to \(O(\log n)\).

To achieve that, we observe that, instead of hardcoding the input labels \(\mathsf {nextRootLabels}\) in the garbled circuit of every node of tree \(T_{i-1}\), we can just pass them as garbled inputs to the garbled circuit of every node of tree \(T_{i-1}\). The final circuit is given in Fig. 3. Note that the only difference of the new circuit from the naive circuit is in the computation of the garbled inputs

$$\mathsf {leftNewLabels}_{(p,y,\pi ,\mathsf {nextRootLabels})}$$

and

$$\mathsf {rightNewLabels}_{(p,y,\pi ,\mathsf {nextRootLabels})},$$

where \(\mathsf {nextRootLabels}\) is added in the subscript (see Line 5 of both Figs. 3 and 2), to account for the new input of the new circuit. Note also that we indicate the change in the input format by using “\(\mathsf {leftNewLabels}\)” instead of just “\(\mathsf {leftLabels}\)” and “\(\mathsf {rightNewLabels}\)” instead of just “\(\mathsf {rightLabels}\)”. \(\mathsf {nextRootLabels}\) have the same meaning in both circuits.

Fig. 3. Formal description of the final bucket circuit.

3.4 Protocols Setup and ObliviousAccess of Our Construction

We now describe in detail the Setup and ObliviousAccess protocols of \(\mathsf {TWORAM}\).

Setup. The Setup protocol is described in Fig. 4. Just like the setup for the interactive ORAM protocol (see Fig. 6 in Appendix A.2), in \(\mathsf {TWORAM}\) the client does some computation locally in the beginning (using his secret key) and then outputs some “garbled information” that is sent to the server. In particular:

  1. After producing the trees \(T_2,T_3,\ldots ,T_L\) using algorithm Initialize, the client prepares the garbled circuit of Fig. 3 for all nodes \(u\in T_i\), for all trees \(T_i\). It is important that this computation takes place from the leaves towards the root (that is why we write \(j\in \{i-1,\ldots ,0\}\) in Line 2 of Fig. 4), since the garbled circuit of a node u hardcodes the input labels of the garbled circuits of its children—so these need to be readily available by the time u’s garbled circuit is computed.

  2. Apart from the garbled circuits, the client needs to prepare garbled inputs for the \(\mathsf {nextRootLabels}\) inputs of all the roots of the trees \(T_i\). These are essentially the \(\beta _i\)’s computed in Line 4 of Fig. 4.

Fig. 4. \(\textsc {Setup}\) protocol for \(\mathsf {TWORAM}\).

ObliviousAccess. The ObliviousAccess protocol of \(\mathsf {TWORAM}\) is described in Fig. 5. The first step of the protocol is similar to that of the interactive scheme (see Fig. 7 in Appendix), where the client accesses local storage \(A_1\) to compute the path index \(x_2\) that must be traversed in \(T_2\). However, the main difference is that, instead of sending \(x_2\) directly, the client sends the garbled input that corresponds to \(x_2\) for the root circuit of tree \(T_2\), denoted with \(\alpha \) in Fig. 5.

We note here that \(\alpha \) is not enough for the first garbled circuit to start executing, and therefore the server complements this garbled input with \(\beta _2\) (see Server Line 1), the other half that was sent by the client before and that represents the garbled inputs for the input labels of the next root. Subsequently, the server executes the garbled circuits one by one, using the outputs of the first circuit as garbled inputs to the second one, and so on. Eventually, the client reads and decrypts all paths \(T_i(x_i)\), retrieving the desired value (see Client Line 2). Finally, the client runs Update, re-garbles the circuits that were consumed, and waits until the next query to send them back. We can now state the main result of our paper.

Theorem 1

The protocols Setup and ObliviousAccess from Figs. 4 and 5 respectively comprise a two-round secure ORAM scheme (as defined in Sect. 2.2), assuming the garbling scheme used is secure (as defined in Sect. 2.1) and the encryption scheme used is CPA-secure.

Fig. 5. \(\textsc {ObliviousAccess}\) protocol for \(\mathsf {TWORAM}\).

The proof of the above theorem can be found in Appendix A.3. Concerning complexity, it is clear that the only overhead we add on top of Path-ORAM [34] is a garbled circuit per bucket—this adds a multiplicative security-parameter factor to all the complexity measures of Path-ORAM. E.g., the bandwidth overhead of our construction is \(O(\kappa \cdot \log ^3 n)\) bits (for blocks of \(2\log n\) bits).

3.5 Optimizations

Recall that in the garbling procedure of a circuit C, one has the following choices: (i) garble C in a way that the evaluation of the garbled circuit on x outputs the cleartext value C(x); or (ii) garble C in a way that the evaluation of the garbled circuit on x outputs the garbled labels corresponding to the value C(x). We now describe an optimization, based on this observation, for a specific circuit C that we use in our construction.

General Optimization. Consider a circuit that performs the following task: it hardcodes two k-bit strings \(s_0\) and \(s_1\), takes as input a bit b, and outputs \(s_b\). This cleartext circuit has size O(k), so the garbled circuit for it has size \(O(k^2)\). To improve upon that, we consider a circuit \(C'\) that takes as input a bit b and outputs the same bit b! This cleartext circuit has size O(1). However, to make sure that the output of the garbled version of \(C'\) is always \(s_b\), we garble \(C'\) so that it outputs the garbled label corresponding to b, namely \(s_b\) (i.e., using (ii) from above). In particular, during the garbling procedure we use \(s_0\) as the garbled label for output \(b=0\) and \(s_1\) as the garbled label for output \(b=1\). Note that the new garbled circuit has size O(k), yet it has exactly the same I/O behavior as the garbling of C, which has size \(O(k^2)\).
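The following toy sketch (ours; it ignores point-and-permute and treats the garbled table as a plain dictionary) illustrates the trick: the two “output labels” are set to \(s_0\) and \(s_1\) themselves, so evaluation releases exactly \(s_b\) and nothing else:

```python
import os

def garble_select(s0: bytes, s1: bytes):
    # One random input label per value of b; the output "labels" are s0, s1.
    lab0, lab1 = os.urandom(16), os.urandom(16)
    table = {lab0: s0, lab1: s1}
    return table, (lab0, lab1)

def eval_select(table, lab_b: bytes) -> bytes:
    return table[lab_b]       # the evaluator holds only one of the two labels

s0, s1 = os.urandom(32), os.urandom(32)  # the two hardcoded k-bit strings
table, (lab0, lab1) = garble_select(s0, s1)
assert eval_select(table, lab1) == s1
```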

  • Improving \(\mathsf {cState}\)—Not Hard-Coding Input Labels Inside the Bucket Circuit. In the construction we described, we include the input labels \({\mathsf {leftLabels}},{\mathsf {rightLabels}}\) in the circuit \(\mathsf {C}[u,\textsf {bucket},{\mathsf {leftLabels}},{\mathsf {rightLabels}}]\). Consequently, the size of the ungarbled version of this circuit grows with the size of \({\mathsf {leftLabels}}\) and \({\mathsf {rightLabels}}\), which is \(\kappa \cdot |\mathsf {cState}|\). We can easily use the general optimization described above, for each bit of \(\mathsf {cState}\), to make the size of the ungarbled version of our circuit grow only with \(|\mathsf {cState}|\).

  • Improving \(\mathsf {nState}\)—Input Label Passing. In the construction described previously, for each tree, an input value \(\mathsf {nState}\) is passed from the root to a leaf node of the tree, yet this value is used only at the leaf node. Recall that the \(\mathsf {nState}\) value passed from the root to a leaf garbled circuit in tree \(T_{i}\) is exactly the value \(\mathsf {cState}^{i+1,0,0}\), the input labels of the root garbled circuit of tree \(T_{i+1}\). Since each ungarbled circuit gets this value as input, each of them needs to grow with \(\kappa \cdot |\mathsf {cState}|\). We now describe an optimization such that the size of the garbled version, rather than the clear version, grows linearly in \(\kappa \cdot |\mathsf {cState}|\). Note that in our construction the value \(\mathsf {cState}^{i+1,0,0}\) is not used at all in the intermediate circuits as it gets passed along the garbled circuits for tree \(T_i\). In order to avoid this wastefulness, for all nodes \(i \in \{1,\ldots , L\},j\in [i],k\in [2^j]\) we sample a value \(r^{(i,j,k)}\) of length \(\kappa \cdot |\mathsf {cState}|\) and hardcode the values \(r^{(i,j,k)} \oplus r^{(i,j+1,2k)}\) and \(r^{(i,j,k)} \oplus r^{(i,j+1, 2k+1)}\) inside the garbled circuit \(\tilde{C}^{i,j,k}\), which outputs the first of the two values if the execution goes left and the second if the execution goes right. Note that each garbled circuit grows only additively in \(\kappa \cdot |\mathsf {cState}|\) because of this change; this follows by using the first optimization. Additionally, we include the value \(\mathsf {cState}^{i+1,0,0} \oplus r^{(i,0,0)}\) with the root node of tree \(T_i\). The leaf garbled circuit \((i,i-1,k)\) in tree \(T_i\) is constructed assuming \(r^{(i,i-1,k)}\) is the sequence of input labels for the root garbled circuit of tree \(T_{i+1}\). Let \(\alpha _0, \ldots, \alpha _{i-1}\) be the strings output during a root-to-leaf traversal in tree \(T_i\). Now observe that \(\mathsf {cState}^{i+1,0,0} \oplus r^{(i,0,0)}\oplus _{j\in [i]}\alpha _j\) is precisely \(\mathsf {cState}^{i+1,0,0} \oplus r^{(i,i-1,k)}\), where k is the leaf node on the traversed path. At this point it is easy to see that, given the output of the leaf garbled circuit for tree \(T_i\), one can compute the required input labels for the root of tree \(T_{i+1}\). The update mechanism in our construction can be easily adapted to work with this change; note that we would now include the values \(r^{(i,j,k)}, r^{(i,j+1,2k)}\) and \(r^{(i,j+1,2k+1)}\) in the ciphertext \(X^{(i,j,k)}\), and that we use fresh \(r^{(\cdot ,\cdot ,\cdot )}\) values whenever a fresh garbled circuit for a node is generated. The security argument now additionally uses the fact that the outputs generated by garbled circuits in two separate root-to-leaf traversals depend on completely independent \(r^{(\cdot ,\cdot ,\cdot )}\) values. Note that the above modification leaks which value is passed by the executed leaf garbled circuit in tree \(T_i\) to the root garbled circuit in tree \(T_{i+1}\), since this can be deduced from which bit values of \(\mathsf {cState}^{i+1,0,0} \oplus r^{(i,0,0)}\) are revealed. This can be tackled by randomly permuting the labels in \(\mathsf {cState}^{i+1,0,0}\) and passing information about this permutation along the tree to the leaf garbled circuits. Note that the size of this information is small.

Taken together, these two optimizations reduce the size of each garbled circuit to \(O(\kappa \cdot (|\textsf {bucket}|+ |\mathsf {cState}|))\). Since \(|\textsf {bucket}|> |\mathsf {cState}|\), this expression reduces to \(O(\kappa \cdot |\textsf {bucket}|)\). This implies that the overhead of our construction is just \(\kappa \) times the overhead of the underlying Path-ORAM scheme.

4 Searchable Encryption Construction Using \(\mathsf {TWORAM}\)

The natural way of designing an SSE scheme that does not leak the search and access patterns using an ORAM scheme is to first choose a data structure for storing keyword-document pairs, set up the data structure in memory using an ORAM setup, and then read/write from it using ORAM operations. Since ORAM hides the read/write access patterns but does not hide the number of memory accesses, one needs to ensure that the number of memory accesses for each operation is also data-independent. Fortunately, this can be achieved by letting the key used for the hash table be the output of a pseudorandom function applied to the keyword w, rather than the keyword w itself.

We start by giving some definitions and then describe constructions that can be instantiated using any ORAM scheme. We then show how to obtain a significantly more efficient instantiation using a combination of \(\mathsf {TWORAM}\) and a non-recursive Path-ORAM scheme.

4.1 Hash Table Definition

A hash table is a data structure commonly used for mapping keys to values [7]. It often uses a hash function h that maps a key to an index (or a set of indices) in a memory array \(\mathsf {M}\) where the value associated with the key may be found. In particular, h takes as input a key key and outputs a set of indices \(i_1,\ldots , i_c\), where c is a parameter. The value associated with key is in one of the locations \(\mathsf {M}[i_1], \ldots, \mathsf {M}[i_c]\); the key is not in the table if it is not in one of those locations. Similarly, to write a new (key, value) pair into the table, (key, value) is written into the first empty location among \(i_1, \dots , i_c\). More formally, we define a hash table \(H =(\mathsf {hsetup}, \mathsf {hlookup}, \mathsf {hwrite})\) as a tuple of algorithms with a parameter c denoting an upper bound on the number of locations to search.

  • \((h,\mathsf {M})\leftarrow \mathsf {hsetup}(S, size)\): \(\mathsf {hsetup}\) takes as input an initial set S of key-value pairs and a maximum table size size, and outputs a hash function h and a memory array \(\mathsf {M}\).

  • \(value \leftarrow \mathsf {hlookup}(key)\): \(\mathsf {hlookup}\) computes \(\{i_1, \dots , i_c \}\leftarrow h(key)\), looks for a key-value pair \((key,\cdot )\) in \(\mathsf {M}[i_1], \dots , \mathsf {M}[i_c]\). If such a pair is found it returns the second component of the pair (i.e., the value), else it returns \(\bot \).

  • \(\mathsf {M}\leftarrow \mathsf {hwrite}(key,value)\): \(\mathsf {hwrite}\) computes \(i_1, \dots , i_c \leftarrow h(key)\); if (key, value) already exists in one of those indices in \(\mathsf {M}\) it does nothing, else it stores (key, value) in the first empty index.
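A minimal sketch of such a hash table, assuming SHA-256 with per-probe counters as the hash function h (an illustrative choice of ours, not mandated by the definition; the constructor plays the role of hsetup):

```python
import hashlib

class HashTable:
    """Sketch of H = (hsetup, hlookup, hwrite) with c candidate locations."""

    def __init__(self, S, size, c=3):          # plays the role of hsetup
        self.M, self.size, self.c = [None] * size, size, c
        for key, value in S:
            self.hwrite(key, value)

    def h(self, key):                          # key -> c candidate indices
        return [int.from_bytes(hashlib.sha256(f"{key}|{t}".encode()).digest(),
                               "big") % self.size for t in range(self.c)]

    def hlookup(self, key):
        for i in self.h(key):
            if self.M[i] is not None and self.M[i][0] == key:
                return self.M[i][1]
        return None                            # stands for "bottom"

    def hwrite(self, key, value):
        idxs = self.h(key)
        if any(self.M[i] == (key, value) for i in idxs):
            return                             # pair already present
        for i in idxs:
            if self.M[i] is None:
                self.M[i] = (key, value)
                return
        raise RuntimeError("all c candidate locations are full")
```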

4.2 Searchable Encryption Definition

A database is a set of document/keyword-set pairs

$$\mathsf {DB}= (d_i, W_i)_{i=1}^N.$$

Let \(W = \cup _{i=1}^N W_i\) be the universe of keywords. A keyword search query for w should return all \(d_i\) where \(w \in W_i\). We denote this subset of \(\mathsf {DB}\) by \(\mathsf {DB}(w)\). A searchable symmetric encryption scheme consists of protocols \(\textsc {SSESetup}\), \(\textsc {SSESearch}\) and \(\textsc {SSEAdd}\). The following formalization first appeared in [6, 8].

  • \(\langle \sigma ,\mathsf {EDB}\rangle \leftrightarrow \textsc {SSESetup}\langle (1^\kappa , \mathsf {DB}),\bot \rangle \): \(\textsc {SSESetup}\) takes as client’s input database \(\mathsf {DB}\) and outputs a secret state \(\sigma \) (for the client), and an encrypted database \(\mathsf {EDB}\) which is outsourced to the server.

  • \(\langle (\mathsf {DB}(w), \sigma '), \mathsf {EDB}' \rangle \leftrightarrow \textsc {SSESearch}\langle (\sigma ,w), \mathsf {EDB}\rangle \): \(\textsc {SSESearch}\) is a protocol between the client and the server, where the client’s input is the secret state \(\sigma \) and the keyword w he is searching for. The server’s input is the encrypted database \(\mathsf {EDB}\). The client’s output is the set of documents containing w, i.e., \(\mathsf {DB}(w)\), as well as an updated secret state \(\sigma '\), and the server obtains an updated encrypted database \(\mathsf {EDB}'\).

  • \(\langle \sigma ', \mathsf {EDB}' \rangle \leftrightarrow \textsc {SSEAdd}\langle (\sigma ,d), \mathsf {EDB}\rangle \): \(\textsc {SSEAdd}\) is a protocol between the client and the server, where client’s input is the secret state \(\sigma \) and a document d to be inserted into the database. Server’s input is the encrypted database \(\mathsf {EDB}\). Client’s output is an updated secret state \(\sigma '\) and the server’s output is an updated encrypted database \(\mathsf {EDB}'\) which now contains the new document d.

Correctness. Consider the following correctness experiment. An adversary \(\mathsf {A}\) chooses a database \(\mathsf {DB}_0\). Consider the encrypted database \(\mathsf {EDB}_0\) generated using \(\textsc {SSESetup}\) (i.e., \(\langle \sigma _0,\mathsf {EDB}_0\rangle \leftrightarrow \textsc {SSESetup}\langle (1^\kappa , \mathsf {DB}_0), \bot \rangle \)). The adversary then adaptively chooses keywords to search and documents to add to the database, and the respective protocols \(\textsc {SSESearch}\) and \(\textsc {SSEAdd}\) are run between an honest client and server, outputting the updated \(\mathsf {EDB}\), \(\mathsf {DB}\) and \(\sigma \). Denote the operations chosen by the adversary by \(w_1, \dots ,w_q\). \(\mathsf {A}\) wins the correctness game if for some search query \(w_i\)

$$\langle (\mathsf {DB}_i(w_i),\sigma _i), \mathsf {EDB}_i \rangle \ne \textsc {SSESearch}\langle (\sigma _{i-1},w_i), \mathsf {EDB}_{i-1} \rangle ,$$

where \(\mathsf {DB}_i,\mathsf {EDB}_{i}\) are the database and encrypted database, respectively, after the i-th search. The SSE scheme is correct if the probability of \(\mathsf {A}\) winning the game is negligible in \(\kappa \).

Security. We discuss security in the semi-honest model. It is parametrized by a leakage function \(\mathcal {L}\), which captures what the adversary (the server) learns about the database and the search and update queries while interacting with a secure SSE scheme. An SSE scheme is \(\mathcal {L}\)-secure if for any PPT adversary \(\mathsf {A}\), there exists a PPT simulator \(\mathsf {Sim}\) such that the following two distributions are computationally indistinguishable.

  • \(\mathrm {Real}_A(\kappa )\): \(\mathsf {A}\) chooses \(\mathsf {DB}_0\). The experiment then runs

    $$\langle \sigma _0,\mathsf {EDB}_0 \rangle \leftrightarrow \textsc {SSESetup}\langle (1^\kappa , \mathsf {DB}_0),\bot \rangle .$$

    \(\mathsf {A}\) then adaptively makes search queries \(w_i\), which the experiment answers by running the protocol \(\langle (\mathsf {DB}_{i-1}(w_i), \sigma _i), \mathsf {EDB}_i \rangle \leftrightarrow \textsc {SSESearch}\langle (\sigma _{i-1},w_i), \mathsf {EDB}_{i-1} \rangle \). Denote the full transcript of the i-th protocol execution by \(t_i\) and the final encrypted database by \(\mathsf {EDB}'\). Add queries are handled in a similar way. Eventually, the experiment outputs

    $$(\mathsf {EDB}', t_1, \dots , t_q),$$

    where q is the total number of search/add queries made by \(\mathsf {A}\).

  • \(\mathrm {Ideal}_{\mathsf {A},\mathsf {Sim},\mathcal {L}}(\kappa )\): \(\mathsf {A}\) chooses \(\mathsf {DB}_0\). The experiment runs

    $$(st_0,\mathsf {EDB}_0) \leftrightarrow \mathsf {Sim}(\mathcal {L}(\mathsf {DB}_0)),$$

    where \(st_0\) is the initial state of the simulator. On input any search query \(w_i\) from \(\mathsf {A}\), the experiment adds \((w_i,search)\) to the history H, and on an add query \(d_i\) it adds \((d_i,add)\) to H. It then runs \((t_i,st_i) \leftrightarrow \mathsf {Sim}(st_{i-1},\mathcal {L}(\mathsf {DB}_{i-1},H))\). Eventually, the experiment outputs \((\mathsf {EDB}', t_1, \dots , t_q)\) where q is the total number of search/add queries made by \(\mathsf {A}\).

Leakage. The level of security one obtains from an SSE scheme depends on the leakage function \(\mathcal {L}\). Ideally, \(\mathcal {L}\) should only output the total number \(\sum _{w \in W} |\mathsf {DB}(w)|\) of (w, d) pairs, the total number of unique keywords |W|, and \(|\mathsf {DB}(w)|\) for any searched keyword w. Achieving this level of security is only possible if the \(\textsc {SSESearch}\) operation outputs the documents themselves to the client. If instead (as is common for applications with large document sizes) it returns document identifiers, which the client then uses to retrieve the actual documents, any SSE protocol would also leak the access pattern.

4.3 SSE from any ORAM

First Approach. The common way of storing a database of documents in a hash table is to insert a key-value pair (w, d) into the table for every keyword w in a document d. Searching for a document with keyword w then reduces to looking up w in the table. If more than one document contains a keyword w, a natural solution is to create a bucket \(B_w\) storing all the documents containing w and to store the bucket at position \(pt_w\) of an array A. One then inserts \((w,pt_w)\) into a hash table. Now, to search for a keyword w, we first look up \((w,pt_w)\) and then access \(\mathsf {A}[pt_w]\) to obtain the bucket \(B_w\) of all the desired documents. A subtle issue is that the distribution of bucket sizes would leak information about the database even before any keyword is searched. As a result, for this approach to be fully secure, one needs to pad each bucket to an upper bound on the number of searchable documents per keyword.
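As an illustration, here is a small sketch of the setup data this approach produces; the names (build_padded, DUMMY) are hypothetical, and reserving index 0 of A as the dummy bucket mirrors the dummy access described in the protocol below:

```python
DUMMY = ("dummy", None)

def build_padded(db, max_size):
    # db maps each keyword w to the list of documents containing it.
    A = [[DUMMY] * max_size]            # A[0] is reserved for dummy accesses
    S = {}
    for w, docs in db.items():
        S[w] = len(A)                   # pt_w, the index of w's bucket in A
        A.append(list(docs) + [DUMMY] * (max_size - len(docs)))
    return S, A                         # S feeds hsetup; A gets its own ORAM
```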

Next we describe the SSE scheme more formally. Given a hash table \(H = (\mathsf {hsetup}, \mathsf {hlookup}, \mathsf {hwrite})\), and an ORAM scheme \(ORAM = (\textsc {Setup}, \textsc {ObliviousAccess})\), we construct an SSE scheme \((\textsc {SSESetup},\textsc {SSESearch},\textsc {SSEAdd})\) as follows.

  1. 1.

    \(\langle \sigma , \mathsf {EDB}\rangle \leftrightarrow \textsc {SSESetup}\langle (1^\kappa , max, \mathsf {DB}),\bot \rangle \): Given an initial set of documents \(\mathsf {DB}\), client lets S be the set of key-value pairs \((w, pt_w)\) where \(pt_w\) is an index to an array of buckets \(\mathsf {A}\) such that \(\mathsf {A}[pt_w]\) stores the bucket of all documents in \(\mathsf {DB}\) containing w. Each bucket is padded to the maximum size max with dummy documents. Client first runs \(\mathsf {hsetup}(S,size)\) to obtain \((h, \mathsf {M})\). size is the maximum size of hash table H. Then client and server run \(\langle \sigma _1, \mathsf {EM}\rangle \leftrightarrow \textsc {Setup}\langle (1^\kappa , \mathsf {M}),\bot \rangle \). Cleint and server also run \(\langle \sigma _2, \mathsf {EA}\rangle \leftrightarrow \textsc {Setup}\langle (1^\kappa , \mathsf {A}),\bot \rangle \) Note that server’s output is \(\mathsf {EDB}= (\mathsf {EM},\mathsf {EA})\) and client’s output is \(\sigma = (\sigma _1,h,\sigma _2)\).

  2. 2.

    \( \textsc {SSESearch}\langle (\sigma ,w), \mathsf {EDB}\rangle \): Client computes \(i_1,\dots ,i_c \leftarrow h(w)\). Then, client and server run \(\textsc {ObliviousAccess}\langle ((\sigma _1,i_j,\mathsf {null}), \mathsf {EM}\rangle \) for \(j \in \{1,\dots ,c\}\) for client to obtain \(\mathsf {M}[i_j]\). If client does not find \((w,pt_w)\) in one of the retrieved locations it lets \(pt_w = 0\), corresponding to a dummy access to the index 0 in \(\mathsf {A}\). Client and server then run \(\textsc {ObliviousAccess}\langle (\sigma _2,pt_w,\mathsf {null}),\mathsf {EA}\rangle )\) for client to obtain the bucket \(B_w\) stored in \(\mathsf {A}[pt_w]\). Client outputs all the non-dummy documents in \(B_w\).

  3.

    \( \textsc {SSEAdd}\langle (\sigma ,d), \mathsf {EDB}\rangle \): For every w in d, client computes \(i_1,\dots ,i_c \leftarrow h(w)\) and client and server run \(\textsc {ObliviousAccess}\langle (\sigma _1,i_j,\mathsf {null}), \mathsf {EM}\rangle \) for \(j \in \{1,\dots ,c\}\) for client to obtain \(\mathsf {M}[i_j]\). If \((w,pt_w)\) is in the retrieved locations, let \(i^*_j\) be the location it was found at. If not, let \(pt_w\) be the first empty location in \(\mathsf {A}\), and let \(i^*_j\) be the first empty location from the retrieved ones in \(\mathsf {M}\). Client and server run \(\textsc {ObliviousAccess}\langle (\sigma _1,i^*_j,(w,pt_w)), \mathsf {EM}\rangle \). Client and server run \(\textsc {ObliviousAccess}\langle (\sigma _2,pt_w,\mathsf {null}), \mathsf {EA}\rangle \) to retrieve \(\mathsf {A}[pt_w]\). Let \(B_w\) be the retrieved bucket. Client inserts d in the first dummy entry of \(B_w\), denoting the new bucket by \(B'_w\). Client and server run

    $$\textsc {ObliviousAccess}\langle (\sigma _2,pt_w, B'_w), \mathsf {EA}\rangle .$$

The main disadvantage of the above construction is that we need to anticipate an upper bound on the bucket sizes and pad all buckets to that size. Given that in practice some keywords appear in a large number of documents while others appear in only a few, this padding leads to significant inefficiency. Our next solution addresses this issue, but at the cost of a higher round complexity.

Second Approach. Instead of storing all documents matching a keyword w in one bucket, we store each of them separately in the hash table, under a different key. In particular, we can store the key-value pair \((w||i, d)\) in the hash table for the ith document d containing w. This works, except that searching requires looking up w||count for an incremental counter count until the keyword is no longer found in the table.

To make this approach cleaner and the write operations more efficient, we maintain two hash tables: one storing, for each keyword, a counter representing the number of documents containing it, and one storing the incremental key-value pairs described above. To look up a keyword w, one first looks up the counter count in the first table and then makes count lookup queries to the second table.
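A plaintext sketch of this two-table layout follows, again eliding hashing, encryption, and ORAM accesses; `counts` plays the role of the first table and `entries` the second (both names are illustrative):

```python
# Plaintext sketch of the two-table layout of the second approach.
def build_two_tables(db):
    counts, entries = {}, {}
    for w, docs in db.items():
        counts[w] = len(docs)                  # (w, count_w) in the first table
        for i, d in enumerate(docs, start=1):
            entries[f"{w}||{i}"] = d           # (w||i, d_i) in the second table
    return counts, entries

def search(counts, entries, w):
    count_w = counts.get(w, 0)                 # one lookup in the first table
    return [entries[f"{w}||{i}"] for i in range(1, count_w + 1)]
```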

We now describe the above SSE scheme in more detail. Given a hash table \(H = (\mathsf {hsetup}, \mathsf {hlookup}, \mathsf {hwrite})\) and a scheme \(ORAM = (\textsc {Setup}, \textsc {ObliviousAccess})\), we construct an SSE scheme \((\textsc {SSESetup},\textsc {SSESearch},\textsc {SSEAdd})\) as follows:

  1.

    \(\langle \sigma , \mathsf {EDB}\rangle \leftrightarrow \textsc {SSESetup}\langle (1^\kappa , \mathsf {DB}),\bot \rangle \): Given an initial set of documents \(\mathsf {DB}\), let \(S_1\) be the set of \((w,count_w)\) pairs and \(S_2\) the set of key-value pairs \((w||i,d_i)\) for \(1 \le i \le count_w\), where \(count_w\) is the number of documents containing w and \(d_i\) denotes the ith document in \(\mathsf {DB}\) containing w.

    Client runs \(\mathsf {hsetup}(S_i,size_i)\) for \(i \in \{1,2\}\) to obtain \((h_i, \mathsf {M}_i)\), where \(size_i\) is the maximum size of the hash table \(H_i\). Then client and server run \(\langle \sigma _i, \mathsf {EM}_i \rangle \leftrightarrow \textsc {Setup}\langle (1^\kappa , \mathsf {M}_i),\bot \rangle \). Note that server’s output is \(\mathsf {EDB}= (\mathsf {EM}_1,\mathsf {EM}_2)\) and client’s output is \(\sigma = (\sigma _1,\sigma _2,h_1,h_2)\).

  2.

    \( \textsc {SSESearch}\langle (\sigma ,w), \mathsf {EDB}\rangle \): Client computes \(i_1,\dots ,i_c \leftarrow h_1(w)\) and client and server run \(\textsc {ObliviousAccess}\langle (\sigma _1,i_j,\mathsf {null}),\mathsf {EM}_1 \rangle \) for \(j \in \{1,\dots ,c\}\), for client to look for \((w,count_w)\) among the retrieved locations. If such a pair is not found, client lets \(count_w =0\).

    For \(1 \le k \le count_w\), client computes \(i^k_1,\dots , i^k_c \leftarrow h_2(w||k)\) and client and server run \(\textsc {ObliviousAccess}\langle (\sigma _2,i^k_j,\mathsf {null}),\mathsf {EM}_2 \rangle \) for \(j \in \{1,\dots ,c\}\) for client to obtain \(\mathsf {M}_2[i^k_j]\). Client outputs d for every pair \((w||k, d)\) found in the retrieved locations of \(\mathsf {M}_2\).

  3.

    \( \textsc {SSEAdd}\langle (\sigma ,d), \mathsf {EDB}\rangle \): For every w in d, client computes \(i_1,\dots ,i_c \leftarrow h_1(w)\) and client and server run \(\textsc {ObliviousAccess}\langle (\sigma _1,i_j,\mathsf {null}), \mathsf {EM}_1 \rangle \) for \(j \in \{1,\dots ,c\}\) for client to obtain \(\mathsf {M}_1[i_j]\). If \((w,count_w)\) is among the retrieved locations, let \(i^*_j\) be the location it was found at; if not, let \(count_w = 0\) and let \(i^*_j\) be the first empty location among the retrieved ones. Client and server run \(\textsc {ObliviousAccess}\langle (\sigma _1,i^*_j, (w,count_w+1)), \mathsf {EM}_1\rangle \) to increase the counter by one (a plaintext sketch of this bookkeeping follows the list).

    Client then computes \(i'_1,\dots ,i'_c \leftarrow h_2(w||count_w+1)\) and client and server run \(\textsc {ObliviousAccess}\langle (\sigma _2,i'_j,\mathsf {null}), \mathsf {EM}_2 \rangle \) for \(j \in \{1,\dots ,c\}\) to retrieve \(\mathsf {M}_2[i'_j]\). Let \(i'_k\) be the first empty location among them. Client and server run

    $$\textsc {ObliviousAccess}\langle (\sigma _2,i'_k,(w||count_w+1, d)), \mathsf {EM}_2\rangle .$$
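Continuing the plaintext sketch above, the bookkeeping performed by \(\textsc {SSEAdd}\) for a new document d amounts to the following (ORAM reads and writes again elided; names are illustrative):

```python
# Sketch of SSEAdd's bookkeeping in the second approach.
def add_document(counts, entries, d, keywords):
    for w in keywords:
        count_w = counts.get(w, 0)
        counts[w] = count_w + 1                # write (w, count_w + 1) to M_1
        entries[f"{w}||{count_w + 1}"] = d     # write (w||count_w+1, d) to M_2
```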

The main disadvantage of our second approach is that each search requires \(count_w\) ORAM accesses to retrieve all matching documents. This means that the bandwidth/computation overhead of the ORAM scheme is multiplied by \(count_w\), which can be large for some keywords. More importantly, it requires \(O(count_w)\) rounds, since the ORAM accesses cannot be parallelized in our constant-round ORAM construction: each memory garbled circuit in the construction can only be used once and needs to be replaced before the next memory access. Finally, the constant-round ORAM needs to store a memory array whose size is proportional to the number of \((w, d)\) tuples in the database, which is significantly larger than the number of unique keywords, increasing the storage overhead of the resulting SSE scheme.

Next, we address all these efficiency concerns, showing a construction that requires only a single access to our constant-round ORAM construction.

4.4 SSE from Path-ORAM

The idea is to store not only a per-keyword counter \(count_w\) as before, but also a counter \(access_w\) representing the number of search/add queries performed on w so far. As in the previous approach, the tuple \((w,(count_w,access_w))\) is stored in a hash table that is implemented using our constant-round ORAM scheme \(\mathsf {TWORAM}\). \(count_w\) is incremented whenever a new document containing w is added, and \(access_w\) is incremented after each search/add query for w.

The tuples \((w||i,d_i)\) for all \(d_i\) containing w are then stored in a one-level (non-recursive) Path-ORAM. In order to avoid storing a large client-side position map for this non-recursive Path-ORAM, we generate/update the positions pseudorandomly using a PRF \(F_K(w||i||access_w)\). Since each document \(d_i\) has a different index and each search/add query for w will increment \(access_w\), the pseudorandomness property of F ensures that this way of generating the position maps is indistinguishable from generating them at random. Now the client only needs to keep the secret key K. Note that since we are using a one-level Path-ORAM to store the documents, we can handle multiple parallel accesses without any problems, hence obtaining a constant-round search/add complexity. Furthermore, we only access \(\mathsf {TWORAM}\) (which uses garbled circuits) once per keyword search to retrieve the tuple \((w,(count_w,access_w))\), so \(\mathsf {TWORAM}\)’s overhead is not multiplied by \(count_w\) for each search/add query. Similarly, the storage overhead of \(\mathsf {TWORAM}\) is only for a memory array of size |W| (the number of unique keywords in documents), which is significantly smaller than the number of keyword-document pairs needed in the general approach.
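As a concrete illustration of this position derivation, the sketch below instantiates \(F_K\) with HMAC-SHA256 (an assumption made for illustration; any PRF works) and maps its output to one of `num_leaves` leaves of \(T_L\), assumed to be a power of two:

```python
# Sketch of PRF-based leaf derivation; HMAC-SHA256 stands in for F_K.
import hashlib
import hmac

def position(K: bytes, w: str, i: int, access_w: int, num_leaves: int) -> int:
    msg = f"{w}||{i}||{access_w}".encode()
    tag = hmac.new(K, msg, hashlib.sha256).digest()
    return int.from_bytes(tag, "big") % num_leaves  # x_{w,i} = F_K(w||i||access_w)
```

Since \(access_w\) is incremented after every search/add query for w, each entry \((w||i, d_i)\) migrates to a fresh pseudorandom leaf after every access, which is exactly the property the security argument relies on.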

We need to make a few small modifications to the syntax of the Path-ORAM abstraction here. First, since we generate the position map on the fly using a PRF, it is convenient to modify the syntax of the \(\textsc {Update}\) procedure to take the new random position as input, instead of generating it internally as in our original syntax. Also, since we are not extracting an index y from the Path-ORAM but rather a tuple of the form \((w||i,d_i)\), we pass w||i in place of y in the \(\textsc {Extract}\) and \(\textsc {Update}\) operations.

We now describe the SSE scheme. Given a hash table \(H = (\mathsf {hsetup},\mathsf {hlookup}, \mathsf {hwrite})\), our constant-round ORAM scheme \(\mathsf {TWORAM}\) \(=(\textsc {Setup},\textsc {ObliviousAccess})\), a single-level Path-ORAM scheme with procedures \((\textsc {Initialize},\textsc {Extract},\textsc {Update})\), and a PRF F, we build an SSE scheme \((\textsc {SSESetup},\textsc {SSESearch},\textsc {SSEAdd})\) as follows:

  1.

    \(\langle \sigma , \mathsf {EDB}\rangle \leftrightarrow \textsc {SSESetup}\langle (1^\kappa , \mathsf {DB}),\bot \rangle \): Given an initial set of documents \(\mathsf {DB}\), let S be the set of tuples \((w,(count_w, access_w = 0))\), where \(count_w\) is the number of documents containing w and \(access_w\) denotes the number of times the keyword w has been searched/added.

    Client runs \(\mathsf {hsetup}(S,size)\) to obtain \((h, \mathsf {M})\), where size is the anticipated maximum size of the hash table H. Then client and server run \(\langle \sigma _s, \mathsf {EM}\rangle \leftrightarrow \textsc {Setup}\langle (1^\kappa , \mathsf {M}),\bot \rangle \).

    Let \(A_L\) be an initially empty memory array whose size is an estimated upper bound on the total number of \((w, d)\) pairs in \(\mathsf {DB}\). Client runs \(\mathcal {T}\leftarrow \textsc {Initialize}(1^\kappa ,A_L)\), sends only the tree \(T_L\) for the last level to server, and discards the rest.

    Client generates a PRF key \(K \leftarrow \{0,1\}^\kappa \).

    For every item \((w, (count_w, access_w))\) in S, and for \(1 \le i \le count_w\) (in parallel):

    (a)

      Client lets \(val_{w,i} = (w||i,d_i)\) where \(d_i\) denotes the ith document in \(\mathsf {DB}\) containing w.

    (b)

      Client lets \(x_{w,i} = F_K(w||i||access_w)\) and sends \(x_{w,i}\) to server, who returns the encrypted buckets on path \(T_L(x_{w,i})\), which client decrypts.

    (c)

      Client runs \(\{T_L(x_{w,i})\}\leftarrow \textsc {Update}(w||i, write, val_{w,i}, T_L(x_{w,i}), x'_{w,i})\), where \(x'_{w,i} = F_K(w||i||access_w+1)\), to insert \(val_{w,i}\) into the retrieved path so that it lies on its new path \(T_L(x'_{w,i})\). Client then encrypts the updated path \(T_L(x_{w,i})\) and sends it to server, who updates \(T_L\).

    Note that server’s output is \(\mathsf {EDB}= (\mathsf {EM}, T_L)\) and client’s output is \(\sigma = (\sigma _s,h, K)\).

  2.

    \(\textsc {SSESearch}\langle (\sigma ,w), \mathsf {EDB}\rangle \): Client computes \(i_1,\dots ,i_c \leftarrow h(w)\) and client and server run \(\textsc {ObliviousAccess}\langle (\sigma _s,i_j,\mathsf {null}),\mathsf {EM}\rangle \) for \(j \in \{1,\dots ,c\}\). If client finds \((w,(count_w, access_w))\) in one of the retrieved locations, it lets \(i^*_j\) be the location where it was found; if no such pair is found, the search ends here. Client and server then run \(\textsc {ObliviousAccess}\langle (\sigma _s,i^*_j,(w,(count_w, access_w+1))), \mathsf {EM}\rangle \) to increase \(access_w\) by one.

    For \(1 \le i \le count_w\) (in parallel):

    (a)

      Client lets \(x_{w,i} = F_K(w||i||access_w)\) and sends \(x_{w,i}\) to server who returns \(T_L(x_{w,i})\) which client decrypts.

    (b)

      Client runs \((w||i,d_i) \leftarrow \textsc {Extract}(L, w||i, T_L(x_{w,i}))\) and outputs \(d_i\). Client then runs \(\{T_L(x_{w,i})\}\leftarrow \textsc {Update}(w||i, read, (w||i, d_i), T_L(x_{w,i}), x'_{w,i} = F_K(w||i||access_w+1))\) to update the location of \((w||i, d_i)\) to \(x'_{w,i}\). Client then encrypts the updated path and sends it to server to update \(T_L\).

  3.

    \( \textsc {SSEAdd}\langle (\sigma ,d), \mathsf {EDB}\rangle \):

    For every w in d:

    (a)

      Client computes \(i_1,\dots ,i_c \leftarrow h(w)\) and client and server run

      $$\textsc {ObliviousAccess}\langle (\sigma _s,i_j,\mathsf {null}),\mathsf {EM}\rangle ),$$

      for \(j \in \{1,\dots ,c\}\). If client finds \((w,(count_w, access_w))\) in one of the retrieved locations, it lets \(i^*_j\) be the location where it was found. Else, it lets \(count_w = access_w = 0\) and lets \(i^*_j\) be the first empty location among the retrieved ones.

    (b)

      Client and server run \(\textsc {ObliviousAccess}\langle (\sigma _s,i^*_j,(w,(count_w+1, access_w+1))), \mathsf {EM}\rangle \) to increase \(count_w\) and \(access_w\) by one.

    (c)

      Client lets \(x_{w,count_w} = F_K(w||count_w||access_w)\) and sends \(x_{w,count_w}\) to server, who returns the encrypted path \(T_L(x_{w,count_w})\), which client decrypts.

    (d)

      Client lets \(x' = F_K(w||count_w+1||access_w+1)\) and runs \(\{T_L(x_{w,count_w})\}\leftarrow \textsc {Update}(w||count_w+1, write, (w||count_w+1, d), T_L(x_{w,count_w}), x')\) to insert the new entry into the path. Client then encrypts the updated path and sends it to server to update \(T_L\).
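To tie the pieces together, the following is a self-contained toy simulation of setup and search, reusing the `position` helper sketched earlier. Here `store` stands in for the server's tree \(T_L\) (keyed by leaf position rather than full paths) and `meta` for the \(\mathsf {TWORAM}\)-protected counter table; this illustrates the data flow only, not the actual protocol messages:

```python
# Toy simulation of the third SSE scheme's setup/search data flow.
def toy_setup(db, K, num_leaves):
    meta, store = {}, {}
    for w, docs in db.items():
        meta[w] = (len(docs), 0)                     # (count_w, access_w = 0)
        for i, d in enumerate(docs, start=1):
            x = position(K, w, i, 0, num_leaves)     # x_{w,i} = F_K(w||i||0)
            store[(x, f"{w}||{i}")] = d
    return meta, store

def toy_search(meta, store, K, w, num_leaves):
    if w not in meta:
        return []                                    # the search ends here
    count_w, access_w = meta[w]
    meta[w] = (count_w, access_w + 1)                # bump access_w (via TWORAM)
    results = []
    for i in range(1, count_w + 1):                  # fetches are parallelizable
        d = store.pop((position(K, w, i, access_w, num_leaves), f"{w}||{i}"))
        x_new = position(K, w, i, access_w + 1, num_leaves)
        store[(x_new, f"{w}||{i}")] = d              # remap to its fresh leaf
        results.append(d)
    return results
```

Repeated calls to `toy_search` on the same keyword read and rewrite entries at fresh pseudorandom leaves each time, mirroring how incrementing \(access_w\) refreshes the position map.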

Before stating the security theorem for the above SSE scheme, we first make the leakage function associated with it precise. The leakage function \(\mathcal {L}(\mathsf {DB}, H)\) for our scheme, where \(\mathsf {DB}\) is the database and H is the search/add history, outputs: |W|, the number of unique keywords across all documents; \(|\mathsf {DB}(w)|\) for every searched keyword w; and \(\sum _{w \in W} |\mathsf {DB}(w)|\), i.e., the number of \((w, d)\) pairs where w is in d. See Appendix A.4 for the proof.

Theorem 2

The above SSE scheme is \(\mathcal {L}\)-secure (cf. the definition in Sect. 4) if \(\mathsf {TWORAM}\) is secure (cf. the definition in Sect. 2.2), F is a PRF, and the encryption used in the one-level Path-ORAM is CPA-secure.

Efficiency. The setup cost of our SSE scheme is the sum of the setup cost of \(\mathsf {TWORAM}\) for a memory of size |W| and the setup cost of a one-level Path-ORAM of size \(n= \sum _{w \in W} |\mathsf {DB}(w)|\), which is \(O(n\log n\log \log n)\).

The bandwidth cost for each search/add query on w is the cost of one ORAM read in \(\mathsf {TWORAM}\) plus \(O(|\mathsf {DB}(w)|\cdot \log n\log \log n)\), where \(n= \sum _{w \in W} |\mathsf {DB}(w)|\).