
1 Introduction

Releasing verifiable partial information while maintaining privacy is a requirement in many practical scenarios where the data being dealt with is sensitive. A basic case is releasing a subset of a set and proving its authenticity in a privacy-preserving way (referred to as the zero-knowledge property) [10, 12, 26, 29]. However, in many other cases, the information is stored in data structures to support richer types of queries. In this paper, we consider order queries on two or more elements of a list, where the answer to the query returns the elements rearranged according to their order in the list. Order queries lie at the heart of many practical applications where the order between queried elements is revealed and proved but the rank of the queried elements in the list and information about other elements in the list should be protected.

In an auction with a single winner (e.g., online ad auction for a single ad spot) every participant submits her secret bid to the auction organizer. After the top bidder is announced, a participant wishes to verify that her bid was inferior. The organizer would then provide a proof without revealing the amount of the top bid, the rank of the participant’s bid, or any information about other bids.

Lenders often require an individual or a couple to prove eligibility for a loan by providing a bank statement and a pay stub. Such documents contain sensitive information beyond what the lender is looking for: whether the bank account balance and salary are above given thresholds. A desirable alternative would be to provide a proof from the bank and employer that these thresholds are met without revealing exact figures and even hiding which of the two spouses earns more.

The above examples can be generalized using order queries on an ordered set, i.e., a list, that return the order of the queried elements as well as a proof of this order but without revealing anything more than the answer itself. We address this problem by introducing two different models: zero knowledge lists (ZKL) and privacy-preserving authenticated lists (PPAL).

ZKL considers a two-party model and extends zero knowledge sets [12, 29] to lists. In ZKL a prover commits to a list and a verifier queries the prover to learn the order of a subset of list elements. The verifier should be able to verify the answer but learn no information about the rest of the list, e.g., the size of the list, the order of other elements of the list or the rank of the queried element(s). Here both the prover and the verifier can act as malicious adversaries. While the prover may want to give answers inconsistent with the initial list he committed to, the verifier may try to learn information beyond the query answer or arbitrarily deviate from the protocol.

PPAL considers three parties: the owner of the list, the server who answers list queries on behalf of the owner, and the client who queries the server. The privacy guarantee of PPAL is the same as in ZKL. For authenticity, PPAL assumes that the owner is trusted while the server and the client could be malicious. This trust model allows for a much more efficient construction than ZKL, as we will see later in the paper. PPAL has direct applications to outsourced services, where the server models the cloud service that the owner uses to interact with her clients.

We note that PPAL can be viewed as a privacy-preserving extension of authenticated data structures (ADS) (see, e.g., [19, 20, 28, 36]), which also operate in a three-party model: the server stores the owner's data and proves to the client the answer to a query. However, privacy properties have not been studied in this model and, as a consequence, known ADS constructions leak information about the rest of the data through their proofs of authenticity. For example, the classic Merkle hash tree [28] on a set of n elements proves membership of an element via a proof of size \(\log n\), thus leaking information about the size of the set. Also, if the elements are stored at the leaves in sorted order, the proof of membership of an element reveals its rank.
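To make this leakage concrete, here is a minimal Python sketch of a Merkle hash tree over sorted leaves. It assumes SHA-256 with 0x00/0x01 leaf/node prefixes and a power-of-two number of leaves; the function names (`build_levels`, `prove`, `verify`, `leaked_rank`) are hypothetical and not taken from [28]. The proof contains \(\log_2 n\) sibling hashes, and the left/right pattern of the authentication path spells out the leaf's index, i.e., its rank.

```python
import hashlib
from typing import List, Tuple

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Hash the leaves, then hash pairs upward; assumes len(leaves) is a power of two."""
    level = [h(b"\x00" + x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels: List[List[bytes]], index: int) -> List[Tuple[bool, bytes]]:
    """Authentication path: one (sibling_is_left, sibling_hash) pair per tree level."""
    path = []
    for level in levels[:-1]:
        path.append((index % 2 == 1, level[index ^ 1]))
        index //= 2
    return path

def verify(root: bytes, leaf: bytes, path: List[Tuple[bool, bytes]]) -> bool:
    node = h(b"\x00" + leaf)
    for sibling_is_left, sibling in path:
        node = h(b"\x01" + sibling + node) if sibling_is_left else h(b"\x01" + node + sibling)
    return node == root

def leaked_rank(path: List[Tuple[bool, bytes]]) -> int:
    """A node is a right child exactly when its sibling is on the left,
    so the path encodes the leaf index in binary (least significant bit first)."""
    return sum(1 << i for i, (sibling_is_left, _) in enumerate(path) if sibling_is_left)

leaves = sorted(b"bid-%d" % v for v in (10, 25, 40, 55, 70, 85, 90, 99))
levels = build_levels(leaves)
root = levels[-1][0]
path = prove(levels, 5)
print(len(path), verify(root, leaves[5], path), leaked_rank(path))  # prints: 3 True 5
```

With eight sorted leaves, a valid proof for the sixth element carries three sibling hashes (revealing \(n\) up to a power of two) and its shape alone gives away the element's 0-based rank 5, which is exactly the leakage that ZKL and PPAL proofs are designed to avoid.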

In this paper, we define the security properties for ZKL and PPAL and provide efficient constructions for them. The privacy property against the verifier in ZKL and the client in PPAL is zero knowledge. That is, the answers and the proofs are indistinguishable from those that are generated by a simulator that knows nothing except the previous and current queries and answers and, hence, cannot possibly leak any information beyond that. While we show that PPAL can be implemented using our ZKL construction, we also provide a direct PPAL construction that is considerably more efficient thanks to the trust that clients put in the list owner. Let n be the size of the list and m be the size of the query, i.e., the number of list elements whose order is sought. Our PPAL construction uses proofs of O(m) size and allows the client to verify a proof in O(m) time. The owner executes the setup in O(n) time and space. The server uses O(n) space to store the list and related authentication information, and takes \(O(\min (m\log n, n))\) time to answer a query and generate a proof. In contrast, in the ZKL construction, the time and storage requirements have an overhead that linearly depends on the security parameter. Note that ZKL also supports (non-)membership queries. The client in PPAL and the verifier in ZKL require only one round of communication for each query. Our ZKL construction is based on zero knowledge sets and homomorphic integer commitments. Our PPAL construction uses a novel technique of blinding of accumulators along with bilinear aggregate signatures. Both are secure in the random oracle model.

2 Problem Statement, Models, Related Work, and Contributions

In this section, we state our problem, outline our models, review related work, and summarize our contributions. Formal definitions and constructions are in the rest of the paper. Detailed proofs and constructions that are omitted due to space restrictions are available in the full version [17].

2.1 Problem Statement and Models

Let \(\mathcal {L}\) be a totally ordered list of distinct elements. An order query on \(\mathcal {L}\) is defined as follows: given a set of elements of \(\mathcal {L}\), return these elements rearranged according to their order in \(\mathcal {L}\) and a proof of this order. Both models we introduce, PPAL and ZKL, support this query. ZKL, in addition to order queries, supports provable membership and non-membership queries. Besides providing authenticity, the proofs are required not to leak any information beyond the answer.

ZKL: This model has two parties: prover and verifier. The prover initially computes a commitment to a list \(\mathcal {L}\) and reveals the commitment to the verifier. Later the verifier asks membership and order queries on \(\mathcal {L}\) and the prover responds with a proof. Both the prover and the verifier can be malicious:

  • The prover may try to give answers inconsistent with the initial commitment.

  • The verifier may try to learn from the proofs additional information about \(\mathcal {L}\) beyond what he has inferred from the answers. E.g., if the verifier has performed two order queries with answers \(x<y\) and \(x<z\), he may want to find out whether \(y<z\) or \(z<y\).

The security properties of ZKL, completeness, soundness and zero-knowledge, guarantee security against a malicious prover and a malicious verifier. Completeness mandates that honestly generated proofs always satisfy the verification test. Soundness states that the prover should not be able to come up with a query together with answers that are inconsistent with the initial commitment and proofs that convince the verifier. Finally, zero-knowledge means that each proof reveals the answer and nothing else. In other words, there must exist a simulator that, given only oracle access to \(\mathcal {L}\), can simulate proofs for membership and order queries that are indistinguishable from real proofs.

PPAL: This model has three parties: owner, server and client. The owner generates list \(\mathcal {L}\) and outsources it to the server. The owner also sends (possibly different) digest information with respect to \(\mathcal {L}\) to the server and the client. Given an order query from the client, the server, using the server digest, builds and returns to the client the answer and its proof, which is verified by the client using the client digest. Both the server and the client can be malicious:

  • The server may try to forge proofs for incorrect answers to (order) queries, e.g., prove an incorrect ordering of a pair of elements of \(\mathcal {L}\).

  • The client, similar to the verifier in ZKL, may try to learn from the proofs additional information about list \(\mathcal {L}\) beyond what he has inferred from the answers.

Note that in typical cloud database applications, the client is allowed to have only a restricted view of the data structure and the server enforces an access control policy that prevents the client from getting answers to unauthorized queries. This motivates curious, possibly malicious, behavior by the client, who may try to ask ill-formed queries or queries violating the access control policy. However, we assume that the server enforces the client's legitimate behavior by refusing to answer illegal queries. Hence, the security model for PPAL is defined as follows.

The properties of PPAL, Completeness, Soundness and Zero-Knowledge, guarantee security against a malicious server and a malicious client. They are close to those of ZKL except for soundness. For PPAL, soundness enforces that the client does not accept proofs forged by the server for incorrect answers w.r.t. the owner's list. PPAL's owner and server together can be thought of as a single party in ZKL, the prover. Hence, ZKL soundness protects against a prover who tries to give answers inconsistent with her own initial commitment. In the PPAL model, the owner and the server are separate parties where the owner is trusted, and soundness protects against a malicious server only.

To understand the strength of the zero-knowledge property, let us illustrate to what extent the proofs are non-revealing. This property guarantees that a client, who adaptively queries a static list, does not learn anything about ranks of the queried elements, the distance between them or even the size of \(\mathcal {L}\). The client is not able to infer any relative order information that is not inferable by the rule of transitivity from the previously queried orders. It is worth noting that in the context of leakage-free redactable signature schemes, privacy property has been defined using game-based definitions in transparency [6, 34] and privacy [11, 23]. However, our definition of simulatability of the query responses, or the zero-knowledge property, is a simpler and more intuitive way to capture the property of leakage-freeness.
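The transitivity rule can be phrased as a reachability check: a client can infer \(a < b\) only if the revealed answers form a chain \(a < \cdots < b\). The following Python sketch (the function name `inferable` is hypothetical) treats each revealed answer \(u < v\) as an edge of a directed graph. For the answers \(x<y\) and \(x<z\) from the ZKL example above, neither \(y<z\) nor \(z<y\) is inferable, so the zero-knowledge property requires that the proofs keep this bit hidden.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def inferable(revealed: Iterable[Tuple[str, str]], a: str, b: str) -> bool:
    """Return True iff a < b follows by transitivity from revealed pairs (u, v), each meaning u < v."""
    successors: Dict[str, Set[str]] = defaultdict(set)
    for u, v in revealed:
        successors[u].add(v)
    # Depth-first search for a chain a < ... < b.
    stack, seen = [a], {a}
    while stack:
        u = stack.pop()
        for v in successors[u]:
            if v == b:
                return True
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False

answers = [("x", "y"), ("x", "z")]
print(inferable(answers, "y", "z"), inferable(answers, "z", "y"))  # prints: False False
```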

Efficiency: We characterize the ideal efficiency goals of our models as follows, where \(\mathcal {L}\) is a list of n items and m is the query size. The space for storing list \(\mathcal {L}\) and the auxiliary information for generating proofs should be O(n). As in related work, a multiplicative factor for element size of \(O(\mathsf {poly(k)})\), where k is the security parameter, is not shown in \(O(\cdot )\). The setup to preprocess list \(\mathcal {L}\) should take O(n) time. The proof of the answer to a query should have O(m) size. Processing a query to generate the answer and its proof should take O(m) time. Verifying the proof of an answer should take O(m) time.

Applications of Order Queries to Order Statistics: Our PPAL order queries can be used as a building block to answer efficiently and in zero knowledge (i.e., the returned proofs should be simulatable) many interesting statistical queries about a list \(\mathcal {L}\) with n elements. Let a pair order proof denote the proof of the order of two elements from \(\mathcal {L}\). Then a PPAL client can send the server a subset \(\mathcal {S}\) of m list elements and request the server to return the maximum, minimum, or the median element of \(\mathcal {S}\) w.r.t. the order of the elements in the list. This can be done by providing m pair order proofs. Order queries also can be extended to return the top t elements of \(\mathcal {S}\) by means of \(t(m-t)\) pair order proofs, or only \(m-1\) pair order proofs if the order between the top t elements can be revealed, where \(t<m\). Finally, given an element a in \(\mathcal {L}\), the server can return the elements of \(\mathcal {S}\) that are above (or below) the threshold value a by means of m pair order proofs. It is important to note that none of these queries reveals anything more than the answer itself. Moreover, the size of the proof returned for each query is proportional to the query size and is optimal for the threshold query where the proof size is proportional to the answer size. We note that these statistical queries are also supported by ZKL.
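The reductions above can be sketched in Python to count the pair order proofs each query needs. This is an illustration under stated assumptions, not the paper's construction: a cryptographic pair order proof is abstracted as a tuple `(u, v)` meaning "u precedes v in \(\mathcal {L}\)", the server is assumed to hold a hypothetical `rank` map, "maximum" means the element of \(\mathcal {S}\) that comes last in \(\mathcal {L}\), and the threshold element a is assumed not to be in \(\mathcal {S}\). The maximum and the median use \(m-1\) pairs (within the m stated above), the top t use \(t(m-t)\), and the threshold query uses m.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (u, v) stands for a pair order proof of u < v in L

def maximum(rank: Dict[str, int], s: List[str]) -> Tuple[str, List[Pair]]:
    top = max(s, key=rank.__getitem__)
    return top, [(x, top) for x in s if x != top]           # m - 1 pairs

def median(rank: Dict[str, int], s: List[str]) -> Tuple[str, List[Pair]]:
    ordered = sorted(s, key=rank.__getitem__)
    med = ordered[(len(s) - 1) // 2]
    below = [(x, med) for x in ordered if rank[x] < rank[med]]
    above = [(med, x) for x in ordered if rank[x] > rank[med]]
    return med, below + above                                # m - 1 pairs

def top_t(rank: Dict[str, int], s: List[str], t: int) -> Tuple[List[str], List[Pair]]:
    """Assumes 1 <= t < m. Winners are returned in lexicographic order (an arbitrary
    canonical choice), so their relative order in L is not revealed."""
    ordered = sorted(s, key=rank.__getitem__)
    rest, top = ordered[:-t], ordered[-t:]
    return sorted(top), [(r, w) for w in top for r in rest]  # t(m - t) pairs

def threshold(rank: Dict[str, int], s: List[str], a: str) -> Tuple[List[str], List[Pair]]:
    """Elements of s above a; assumes a is in L but not in s."""
    above = [x for x in s if rank[x] > rank[a]]
    pairs = [(a, x) if rank[x] > rank[a] else (x, a) for x in s]
    return above, pairs                                      # m pairs

rank = {x: i for i, x in enumerate("ABCDEFG")}
s = ["C", "A", "F", "D"]
print(maximum(rank, s)[0], median(rank, s)[0], top_t(rank, s, 2)[0])  # prints: F C ['D', 'F']
```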

2.2 Related Work

First, we discuss work on data structures that answer queries in zero knowledge. Our ZKL is the first extension of this work to lists and order queries. We then mention signature schemes that can be used to instantiate outsourced data structures that require privacy and integrity to be maintained. However, such instantiations are not efficient since they are based on different models of usage and underlying data. Next, we outline leakage-free redactable signature schemes for ordered lists and other structured data. These signature schemes are not as efficient as our construction, and their definitions are game-based as opposed to our intuitive zero-knowledge definition. Finally, we discuss follow-up work on PPAL.

Zero Knowledge Data Structures: Zero-knowledge dictionary and range queries have received considerable attention in literature [10, 12, 26, 29, 32]. Our proposed ZKL model is the first generalization of this line of work that supports order queries.

The model of zero knowledge set (ZKS) was introduced by Micali et al. [29] where a prover commits to a finite set S in such a way that, later on, she will be able to efficiently (and non-interactively) prove statements of the form \(x \in S\) or \(x \notin S\) without leaking any information about S beyond what has been queried for, not even the size of S. The prover should not be able to prove contradictory statements about an element. Chase et al. [12] abstracted the above solution and described it in terms of a mercurial commitment, which was later generalized to q-trapdoor mercurial commitments in [10, 26] and a closely related notion of vector commitments was proposed in [9]. Kate et al. [22] suggested a weaker primitive called nearly-zero knowledge set where the set size is not private. Ostrovsky et al. [32] generalized (non-)membership queries to orthogonal range queries on multidimensional dataset and considered adding privacy to their protocol. However, the use of NP-reductions and probabilistically checkable proofs makes their generic construction expensive.

We note that a recent work on DNSSEC zone enumeration by Goldberg et al. [18] uses a model related to our PPAL model and was developed independently. The framework supports only set (non-)membership queries and answers them in f-zero knowledge. This property ensures that the information leaked to the verifier is in terms of a function f on the set, e.g., f is the set size in [18].

Signature Schemes: A three-party model where the owner digitally signs a data document and outsources it to the server and the server discloses to the client only part of the signed document along with a legitimately derived signature on it (without the owner's involvement), can be instantiated with a collection of signature schemes, namely, content extraction, quotable, arithmetic, redactable, homomorphic, sanitizable and transitive signatures [7, 21, 30, 31, 35, 38]. Additionally, if the signatures reveal no information about the parent document, then this approach can be used to add privacy. However, the generic instantiation, with signature schemes that do not specifically address structured data, is inefficient for most practical purposes.

Ahn et al. [1] present a unified framework for computing on authenticated data where a third party can derive a signature on an object \(x^{\prime }\) from a signature on a parent object x as long as \(P(x,x^{\prime }) = 1\) for some predicate P that captures the authenticatable relationship between x and \(x^{\prime }\). Additionally, a derived signature reveals no extra information about the parent x. This line of work was later refined in [2, 37]. The authors of [1] propose a computationally expensive scheme based on the RSA accumulator and do not consider predicates for specific data structures. A related notion of malleable signature scheme was proposed in [13], where given a signature \({\upsigma }\) on a message x, it is possible to efficiently derive a signature \({\upsigma }'\) on a message \(x'\) such that \(x' = T(x)\) for an allowable transformation T without access to the secret key. The privacy definition of [13] (simulation context hiding) is stronger than that of [1], as it allows for adversarially generated keys and signatures. However, the owner is a trusted party in our PPAL setting, and therefore the stronger notion of simulation context hiding is not relevant in this framework. Moreover, in our PPAL model, given a quote from a document and a proof of the quote, the client should be able to verify that the quote is indeed in the document, which is the inverse of the notion of unlinkability in [13].

Leakage-Free Signature Schemes for Structural Data: A leakage-free redactable signature scheme (LRSS) allows a third party to remove, or redact, parts of a signed document without the signer's involvement. The verifier only sees the remaining redacted document and is able to verify that it is valid and authentic. The leakage-freeness property ensures that the redacted document and its signature do not reveal anything about the content or position of the removed parts. We discuss LRSSs that specifically look at structural data and ordered lists. In Table 1 we show that PPAL outperforms known LRSS constructions. Another significant difference of our work is the definition of privacy. The zero-knowledge property is more intuitive and simple in capturing the leakage-freeness property compared to the game-based definitions in the LRSS literature [6, 34].

Kundu and Bertino [24] introduced the idea of structural signatures for ordered trees (subsuming ordered lists) that support public redaction of subtrees by third parties. This work was later extended to undirected graphs and DAGs [25]. The notion was later formalized as LRSS for ordered trees in [6], and several attacks on [24] were subsequently proposed in [6, 33]. The basic idea of the LRSS scheme presented in [6] is to sign all possible ordered pairs of elements of an ordered list. So both the computation cost and the storage space are quadratic in the number of elements of the list.

Building on the work of [6], the authors of [34] proposed an LRSS for lists that has quadratic time and space complexity. Poehls et al. [33] presented an LRSS scheme for a list that has linear time and space complexity but assumes an associative non-abelian hash function, whose existence has not been formally proved. Kundu et al. [23] presented a construction that uses quadratic space at the server. Chang et al. [11] presented a leakage-free redactable signature scheme for a string (which can be viewed as a list) that hides the location of the redacted or deleted portions of the string at the expense of quadratic verification cost. None of the constructions of [11, 23, 24] satisfy our definition of zero-knowledge.

Follow-up Work: Finally, we note that in recent work [16], Ghosh et al. have generalized the models introduced in this paper to general abstract data types that support both query and update operations. They have also presented efficient constructions for dynamic lists and partially-ordered sets of bounded dimension.

2.3 Contributions and Organization of the Paper

Our contributions are novel models and efficient constructions. After reviewing preliminary concepts and cryptographic primitives, in Sect. 3, we introduce the zero-knowledge list (ZKL) model. We describe our ZKL construction, its security and efficiency in Sect. 4. In Sect. 5, we introduce the privacy-preserving authenticated list (PPAL) model. An efficient PPAL construction based on bilinear maps, its performance and security properties are given in Sect. 6. In Table 1, we compare our ZKL and PPAL constructions with previous work in terms of performance and assumptions. We specifically indicate which constructions satisfy the zero-knowledge property. Our PPAL construction outperforms all previous work based on widely accepted assumptions [6, 34] (the construction of [33] is based on a non-standard assumption).

Table 1. Comparison of our ZKL and PPAL constructions with previous work. All the time and space complexities are asymptotic. Notation: n is the list size, m is the query size, k is the security parameter. WLOG we assume list elements are k bits long. Following the standard convention, we omit a multiplicative factor of O(k) for element size in every cell. Assumptions: Strong RSA Assumption (SRSA); Existential Unforgeability under Chosen Message Attack (EUCMA) of the underlying signature scheme; Random Oracle Model (ROM); n-Element Aggregate Extraction Assumption (nEAE); Associative non-abelian hash function (AnAHF) [non-standard]; Collision Resistant Hash Function (CRHF); Discrete Log Assumption (DL); Factoring a composite (FC); n-Bilinear Diffie-Hellman Inversion Assumption (nBDHI).

3 Preliminaries

3.1 Data Type

We consider a linearly ordered list \(\mathcal {L}\) as a data structure that the owner wishes to store with the server. A list is an ordered set of elements \(\mathcal {L}= \{ x_1,x_2,\ldots ,x_n \}\), where each \(x_i \in \{ 0,1 \}^*\), the elements are pairwise distinct, and for all \(x_i, x_j \in \mathcal {L}\) with \(x_i \ne x_j\), either \(x_i < x_j\) or \(x_j < x_i\). Hence, \(<\) is a strict total order on the elements of \(\mathcal {L}\): it is irreflexive, asymmetric and transitive.

We denote the set of elements of the list \(\mathcal {L}\) as \(\mathsf {Elements}(\mathcal {L})\). A sublist \({\updelta }\) of \(\mathcal {L}\) is a collection of elements of \(\mathcal {L}\), i.e., \({\updelta }\subseteq \mathsf {Elements}(\mathcal {L})\). Note that the order of elements in \({\updelta }\) may not follow the order of \(\mathcal {L}\). We denote with \({{\uppi }}_{\mathcal {L}}({{\updelta }})\) the permutation of the elements of \({\updelta }\) under the order of \(\mathcal {L}\). \(\mathcal {L}(x_i)\) denotes the membership of element \(x_i\) in \(\mathcal {L}\), i.e., \(\mathcal {L}(x_i) = \mathsf {true}\) if \(x_i \in \mathcal {L}\) and \(\mathcal {L}(x_i) = \mathsf {false}\) if \(x_i \notin \mathcal {L}\). For all \(x_i\) such that \(\mathcal {L}(x_i) = \mathsf {true}\), \(\mathsf {rank}(\mathcal {L}, x_i)\) denotes the rank of element \(x_i\) in the list \(\mathcal {L}\).
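The notation above can be pinned down with a small Python class. This is a sketch only: the class name `OrderedList` and its methods are hypothetical, ranks are taken to be 1-based (the text does not fix a convention), and no cryptography is involved. It simply makes \(\mathsf {Elements}(\mathcal {L})\), \(\mathcal {L}(x)\), \(\mathsf {rank}(\mathcal {L},x)\), the order \(<\), and \({{\uppi }}_{\mathcal {L}}({{\updelta }})\) concrete.

```python
from typing import Dict, List, Sequence, Set

class OrderedList:
    """A list L of distinct elements; list position defines the strict total order <."""

    def __init__(self, elements: Sequence[str]) -> None:
        if len(set(elements)) != len(elements):
            raise ValueError("elements of L must be distinct")
        # 1-based rank of each element (an assumed convention).
        self._rank: Dict[str, int] = {x: i + 1 for i, x in enumerate(elements)}

    def elements(self) -> Set[str]:
        """Elements(L): the unordered set of list elements."""
        return set(self._rank)

    def member(self, x: str) -> bool:
        """L(x): True if x is in L, False otherwise."""
        return x in self._rank

    def rank(self, x: str) -> int:
        """rank(L, x); defined only when L(x) is True."""
        return self._rank[x]

    def less(self, x: str, y: str) -> bool:
        """The strict order: x < y iff x appears before y in L."""
        return self._rank[x] < self._rank[y]

    def permute(self, delta: Sequence[str]) -> List[str]:
        """pi_L(delta): the elements of delta rearranged according to the order of L."""
        if not set(delta) <= self.elements():
            raise ValueError("delta must be a subset of Elements(L)")
        return sorted(delta, key=self._rank.__getitem__)

lst = OrderedList(["A", "B", "C", "D"])
print(lst.permute(["D", "A", "C"]))  # prints: ['A', 'C', 'D']
```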

3.2 Cryptographic Primitives

We now describe the cryptographic primitives that are used in our construction and the cryptographic assumptions that underlie the security of our method. In particular, our zero knowledge list construction relies on homomorphic integer commitments, a zero-knowledge protocol to prove that a number is non-negative, and zero knowledge sets, while the construction for privacy-preserving lists relies on bilinear aggregate signatures and the n-Bilinear Diffie-Hellman Inversion assumption.

Homomorphic Integer Commitment Scheme: We use a homomorphic integer commitment scheme \(\mathsf {HomIntCom}\) that is statistically hiding and computationally binding [5, 14]. The latter implies the existence of a trapdoor that can be used to “equivocate” a commitment (i.e., open the commitment to any message using the trapdoor). We denote a commitment to x as \(C(x;r)\), where r is the randomness used for the commitment. For simplicity, we sometimes drop r from the notation and use C(x) to denote the commitment to x. The homomorphism of the scheme is defined as \(C(x+y) = C(x) \times C(y)\).
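The two properties used here, additive homomorphism and trapdoor equivocation, can be checked on a toy Pedersen-style commitment \(C(x;r) = g^x h^r\) modulo a prime. This is an assumed stand-in chosen for brevity, not the integer commitment scheme of [5, 14] (which works in a group of unknown order), and its parameters are insecure; `commit`, `equivocate`, and the constants are hypothetical. With the randomness made explicit, the homomorphism reads \(C(x;r_1)\cdot C(y;r_2) = C(x+y;r_1+r_2)\), and knowing \(\alpha\) with \(g = h^{\alpha}\) lets one open \(C(x;r)\) to any \(x'\) using \(r' = r + \alpha(x - x')\).

```python
P = 2**127 - 1   # toy prime modulus (a Mersenne prime); insecure, illustration only
H = 3            # hypothetical base h
ALPHA = 1234567  # hypothetical trapdoor: g = h^ALPHA
G = pow(H, ALPHA, P)

def commit(x: int, r: int) -> int:
    """C(x; r) = g^x * h^r mod P. Negative r is allowed (needs Python 3.8+ pow)."""
    return (pow(G, x, P) * pow(H, r, P)) % P

def equivocate(x: int, r: int, x_new: int) -> int:
    """With the trapdoor ALPHA, return r_new such that C(x_new; r_new) == C(x; r)."""
    return r + ALPHA * (x - x_new)

c = commit(42, 99)
r_new = equivocate(42, 99, 1000)
print(commit(1000, r_new) == c)  # prints: True
```

Without \(\alpha\), finding such an \(r'\) amounts to computing a discrete logarithm, which is why binding is only computational.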

Proving an Integer is Non-negative in Zero-Knowledge: We use the following (interactive) protocol between a prover and a verifier: the prover sends a commitment c to an integer \(x \ge 0\) to the verifier and proves in zero-knowledge that the committed integer is non-negative, without opening c. We denote this protocol as \(\mathsf {P}\leftrightarrow \mathsf {V}(x,r : c = C(x;r) \wedge x \ge 0)\). As a concrete construction we use the protocol of [27], which is a \(\Sigma \)-protocol, i.e., it is honest-verifier zero-knowledge, and it can be made non-interactive zero-knowledge (NIZK) in the random oracle model using the Fiat-Shamir heuristic [15].
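The Fiat-Shamir step can be illustrated on the simplest \(\Sigma \)-protocol, Schnorr's proof of knowledge of a discrete logarithm. This is a deliberately swapped-in example, not the non-negativity protocol of [27]: it only shows how the verifier's random challenge is replaced by a hash of the transcript, so that the three moves collapse into a single message. The group parameters and function names are hypothetical and insecure (exponents are reduced modulo \(p-1\) rather than working in a prime-order subgroup).

```python
import hashlib
import secrets

P = 2**127 - 1   # toy prime modulus; insecure, illustration only
G = 3            # hypothetical base
ORDER = P - 1    # exponents reduced modulo the order of the multiplicative group

def challenge(*values: int) -> int:
    """Fiat-Shamir: hash the transcript to obtain the challenge."""
    data = b"|".join(str(v).encode() for v in values)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ORDER

def prove(x: int):
    """Non-interactive proof of knowledge of x such that y = G^x mod P."""
    y = pow(G, x, P)
    k = secrets.randbelow(ORDER)
    t = pow(G, k, P)              # move 1: prover's commitment
    c = challenge(G, y, t)        # move 2: challenge derived from the hash
    s = (k + c * x) % ORDER       # move 3: response
    return y, t, s

def verify(y: int, t: int, s: int) -> bool:
    c = challenge(G, y, t)
    return pow(G, s, P) == (t * pow(y, c, P)) % P

y, t, s = prove(123456789)
print(verify(y, t, s))  # prints: True
```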

Zero Knowledge Set Scheme: Let D be a set of key-value pairs. If (x, v) is a key-value pair of D, then we write \(D(x) = v\) to denote that v is the value corresponding to the key x. For keys that are not present in D, \(x \notin D\), we write \(D(x) = \bot \). A Zero Knowledge Set scheme (ZKS) [29] consists of three probabilistic polynomial time algorithms, \(\mathsf {ZKS}= (\mathsf {ZKS}\mathsf {Setup},\mathsf {ZKS}\mathsf {Prover}= (\mathsf {ZKS}\mathsf {P}_1, \mathsf {ZKS}\mathsf {P}_2), \mathsf {ZKS}\mathsf {Verifier})\), and queries are of the form “is key x in D?”. The \(\mathsf {ZKS}\mathsf {Setup}\) algorithm takes the security parameter as input and produces a public key for the scheme that both the prover (\(\mathsf {ZKS}\mathsf {Prover}\)) and the verifier (\(\mathsf {ZKS}\mathsf {Verifier}\)) take as input. The prover, \(\mathsf {ZKS}\mathsf {Prover}\), is a tuple of two algorithms: \(\mathsf {ZKS}\mathsf {P}_1\) takes the security parameter, the public key, and the set D and produces a short digest commitment \(\mathsf {com}\) for D; \(\mathsf {ZKS}\mathsf {P}_2\) takes a query x and produces the value \(v = D(x)\) and the corresponding proof of (non-)membership, \(\mathsf {proof}_x\). The verifier, \(\mathsf {ZKS}\mathsf {Verifier}\), takes the security parameter, the public key, \(\mathsf {com}\), a query x, an answer D(x), and \(\mathsf {proof}_x\) and returns a bit b, where \(b= \mathsf {ACCEPT}/\mathsf {REJECT}\). For our construction of zero knowledge lists we use the ZKS construction of [12], which is based on mercurial commitments.

Bilinear Aggregate Signature Scheme: Our PPAL scheme relies on the bilinear aggregate signature scheme of Boneh et al. [4]. Given signatures \({\upsigma }_1, \ldots , {\upsigma }_n\) on distinct messages \(M_1,\ldots , M_n\) from n distinct users \(u_1,\ldots , u_n\), it is possible to aggregate these signatures into a single short signature \({\upsigma }\) such that it (and the n messages) convince the verifier that the n users indeed signed the n original messages (i.e., user i signed message \(M_i\)). We use the special case where a single user signs n distinct messages \(M_1, \ldots , M_n\). The security requirement of an aggregate signature scheme guarantees that the aggregate signature \({\upsigma }\) is valid if and only if the aggregator used all \({\upsigma }_i\)'s to construct it.

3.3 Hardness Assumption

Let p be a large k-bit prime where \(k \in \mathbb {N}\) is a security parameter. Let \(n \in \mathbb {N}\) be polynomial in k, \(n = \mathsf {poly(k)}\). Let \(e:G \times G \rightarrow G_1\) be a bilinear map where G and \(G_1\) are groups of prime order p, and let g be a random generator of G. We denote by a probabilistic polynomial time (PPT) adversary \(\mathcal {A}\) an adversary running in time \(\mathsf {{poly}(k)}\). We use \(\mathcal {A}^{\mathsf {alg}(\mathsf {input}, \ldots )}\) to show that an adversary \(\mathcal {A}\) has oracle access to an instantiation of an algorithm \(\mathsf {alg}\) with first argument set to \(\mathsf {input}\), where \(\ldots \) denotes that \(\mathcal {A}\) can give arbitrary input for the rest of the arguments.

Definition 1

(\(n\)-Bilinear Diffie-Hellman Inversion (\(n\)-BDHI) [3]). Let s be a random element of \(\mathbb {Z}_p^*\) and n be a positive integer. Then, for every PPT adversary \(\mathcal {A}\), there exists a negligible function \({\upnu }(\cdot )\) such that:

$$\Pr [s \xleftarrow {\$} \mathbb {Z}_p^*;\ y \leftarrow \mathcal {A}(\langle g, g^s,g^{s^2}, \ldots , g^{s^n} \rangle ): y = e(g,g)^{\frac{1}{s}}] \le {\upnu }(k).$$

4 Zero Knowledge List (ZKL)

We generalize the idea of consistent set membership queries [12, 29] to support membership and order queries in zero-knowledge on a list with no repeated elements. More specifically, given a totally ordered list of unique elements \(\mathcal {L}= \{ y_1,y_2,\ldots ,y_n \}\), we want to support, non-interactively and in zero-knowledge (i.e., proofs reveal nothing beyond the query answer, not even the size of the list), queries of the following form:

  • Is \(y_i \in \mathcal {L}\) or \(y_i \notin \mathcal {L}\), i.e., \(\mathcal {L}(y_i)=\mathsf {true}\) or \(\mathcal {L}(y_i)=\mathsf {false}\)?

  • For two elements \(y_i, y_j \in \mathcal {L}\), what is their relative order, i.e., \(y_i <y_j\) or \(y_j <y_i\) in \(\mathcal {L}\)?

We adopt the same adversarial model as in [12, 29, 32]. There are two parties: the prover and the verifier. The prover initially commits to a list of elements and makes the commitment public. We now formally describe the model and the security properties.

4.1 Model

A Zero Knowledge List scheme (ZKL) consists of three probabilistic polynomial time algorithms: \((\mathsf {Setup},\mathsf {Prover}= (\mathsf {P}_1, \mathsf {P}_2),\mathsf {Verifier})\). The queries are of the form \(({\updelta },\mathsf {flag})\) where \({\updelta }= \{ z_1, \ldots , z_m \}, z_i \in \{ 0,1 \}^*\), is a collection of elements, \(\mathsf {flag}=0\) denotes a (non-)membership query and \(\mathsf {flag}=1\) denotes an order query. In the following sections, we will use \(\mathsf {state}\) to represent a variable that saves the current state of the algorithm (when it finishes execution).

  • \(\mathsf {PK}\leftarrow \mathsf {Setup}(1^k)\) The \(\mathsf {Setup}\) algorithm takes the security parameter as input and produces a public key \(\mathsf {PK}\) for the scheme. The prover and the verifier both take as input the string \(\mathsf {PK}\) that can be a random string (in which case, the protocol is in the common random string model) or have a specific structure (in which case the protocol is in the trusted parameters model).

  • \( (\mathsf {com}, \mathsf {state}) \leftarrow \mathsf {P}_1(1^k, \mathsf {PK}, \mathcal {L})\) \(\mathsf {P}_1\) takes the security parameter, the public key \(\mathsf {PK}\) and the list \(\mathcal {L}\), and produces a short digest commitment \(\mathsf {com}\) for the list.

  • \((\mathsf {member},\mathsf {proof}_M, \mathsf {order}, \mathsf {proof}_O) \leftarrow \mathsf {P}_2(\mathsf {PK},\mathsf {state},{\updelta },\mathsf {flag})\) where \({\updelta }= \{z_1, \ldots , z_m \}\) and \(\mathsf {flag}\) denotes the type of query. \(\mathsf {P}_2\) produces the membership information of the queried elements, \(\mathsf {member}= \{ \mathcal {L}(z_1), \ldots , \mathcal {L}(z_m) \}\), and the proof of membership (and non-membership), \(\mathsf {proof}_M\). The values of \(\mathsf {order}\) and \(\mathsf {proof}_O\) are set depending on \(\mathsf {flag}\):

    • \(\mathsf {flag}= 0\): \(\mathsf {P}_2\) sets \(\mathsf {order}\) and \(\mathsf {proof}_O\) to \(\bot \) and returns \((\mathsf {member},\mathsf {proof}_M, \bot , \bot )\).

    • \(\mathsf {flag}= 1\): Let \(\tilde{{\updelta }} = \{ z_i \mid i \in [1,m] \wedge \mathcal {L}(z_i) = \mathsf {true} \}\). \(\mathsf {P}_2\) produces the correct list order among the elements of \(\tilde{{\updelta }}\), \(\mathsf {order}= {{\uppi }}_{\mathcal {L}}({\tilde{{\updelta }}})\), and the proof of the order, \(\mathsf {proof}_O\).

  • \(b \leftarrow \mathsf {Verifier}(1^k,\mathsf {PK},\mathsf {com},{\updelta },\mathsf {flag},\mathsf {member}, \mathsf {proof}_M, \mathsf {order}, \mathsf {proof}_O)\) \(\mathsf {Verifier}\) takes the security parameter, the public key \(\mathsf {PK}\), the commitment \(\mathsf {com}\) and a query \(({\updelta },\mathsf {flag})\) and \(\mathsf {member}\), \(\mathsf {proof}_M\), \(\mathsf {order}\), \(\mathsf {proof}_O\) and returns a bit b, where \(b= \mathsf {ACCEPT}/\mathsf {REJECT}\).

Example: Let us illustrate the above functionality with a small example. Let \(\mathcal {L}= \{A, B, C \}\) and \(({\updelta }, \mathsf {flag}) = (\{B, D, A\}, 1)\) be the query. Given this query, \(\mathsf {P}_2\) returns \(\mathsf {member}= \{ \mathcal {L}(B), \mathcal {L}(D), \mathcal {L}(A)\}=\{\mathsf {true}, \mathsf {false}, \mathsf {true} \}\), the corresponding proofs of membership and non-membership in \(\mathsf {proof}_M\), \(\mathsf {order}= \{ A, B \}\), and the corresponding proof of order between A and B in \(\mathsf {proof}_O\).
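The answer that \(\mathsf {P}_2\) must return (and then prove) in the example above can be sketched in a few lines of Python. This is a minimal sketch of the answer only, with no commitments or proofs; the function name `zkl_answer` and the representation of \(\mathcal {L}\) as a Python list of strings are illustrative assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def zkl_answer(lst: Sequence[str], query: Sequence[str],
               flag: int) -> Tuple[List[bool], Optional[List[str]]]:
    """Return (member, order) for the query (delta, flag); proofs are omitted in this sketch."""
    rank: Dict[str, int] = {y: j for j, y in enumerate(lst, start=1)}
    member = [z in rank for z in query]           # L(z_1), ..., L(z_m)
    if flag == 0:
        return member, None                       # order (and proof_O) are set to bottom
    present = [z for z in query if z in rank]     # delta~: queried elements that belong to L
    order = sorted(present, key=rank.__getitem__) # pi_L(delta~): list order of the members
    return member, order


print(zkl_answer(["A", "B", "C"], ["B", "D", "A"], 1))
# ([True, False, True], ['A', 'B'])
```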

4.2 Security Properties

Recall that the security properties of ZKL, Completeness, Soundness and Zero-Knowledge, guarantee security against a malicious prover and a malicious verifier. Completeness mandates that honestly generated proofs always satisfy the verification test. Soundness states that the prover should not be able to come up with a query and two mutually inconsistent answers to it, together with proofs that both pass verification against the same initial commitment. Finally, zero-knowledge ensures that each proof reveals the answer and nothing else.

Definition 2

(Completeness). For every list \(\mathcal {L}\), every query \({\updelta }\) and every \(\mathsf {flag}\),

$$\begin{aligned} \Pr&[\mathsf {PK}\leftarrow \mathsf {Setup}(1^k); \\&(\mathsf {com},\mathsf {state}) \leftarrow \mathsf {P}_1(1^k, \mathsf {PK}, \mathcal {L});\\&(\mathsf {member}, \mathsf {proof}_M, \mathsf {order}, \mathsf {proof}_O) \leftarrow \mathsf {P}_2(\mathsf {PK},\mathsf {state},{\updelta },\mathsf {flag}):\\&\mathsf {Verifier}(1^k,\mathsf {PK},\mathsf {com},{\updelta },\mathsf {flag},\mathsf {member},\mathsf {proof}_M, \mathsf {order}, \mathsf {proof}_O) = \mathsf {ACCEPT}] = 1 \end{aligned}$$

Definition 3

(Soundness). For every PPT malicious prover algorithm, \(\mathsf {Adv}\), for every query \({\updelta }\) and for every \(\mathsf {flag}\) there exists a negligible function \({\upnu }(.)\) such that:

$$\begin{aligned} \Pr&[\mathsf {PK}\leftarrow \mathsf {Setup}(1^k); \\&(\mathsf {com}, \mathsf {member}^1, \mathsf {proof}_M^1, \mathsf {order}^1, \mathsf {proof}_O^1, \mathsf {member}^2, \\&\mathsf {proof}_M^2, \mathsf {order}^2, \mathsf {proof}_O^2) \leftarrow \mathsf {Adv}(1^k,\mathsf {PK}):\\&\mathsf {Verifier}(1^k,\mathsf {PK},\mathsf {com},{\updelta },\mathsf {flag}, \mathsf {member}^1, \mathsf {proof}_M^1, \mathsf {order}^1, \mathsf {proof}_O^1) = \mathsf {ACCEPT} {\wedge }\\&\mathsf {Verifier}(1^k,\mathsf {PK},\mathsf {com},{\updelta },\mathsf {flag}, \mathsf {member}^2, \mathsf {proof}_M^2, \mathsf {order}^2, \mathsf {proof}_O^2) = \mathsf {ACCEPT} \wedge \\&((\mathsf {member}^1 \ne \mathsf {member}^2) \vee (\mathsf {order}^1 \ne \mathsf {order}^2)) ] \le {\upnu }(k) \end{aligned}$$

Definition 4

(Zero-Knowledge). There exists a PPT simulator \(\mathsf {Sim}= (\mathsf {Sim}_1, \mathsf {Sim}_2, \mathsf {Sim}_3)\) such that for every PPT malicious verifier \(\mathsf {Adv}= (\mathsf {Adv}_1, \mathsf {Adv}_2)\), there exists a negligible function \({\upnu }(.)\) such that:

$$\begin{aligned} |\Pr&[\mathsf {PK}\leftarrow \mathsf {Setup}(1^k);(\mathcal {L}, \mathsf {state}_A) \leftarrow \mathsf {Adv}_1(1^k,\mathsf {PK}); \\&(\mathsf {com}, \mathsf {state}_P) \leftarrow \mathsf {P}_1(1^k, \mathsf {PK}, \mathcal {L}):\\&\mathsf {Adv}_2^{\mathsf {P}_2(\mathsf {PK},\mathsf {state}_P,\cdot )}(\mathsf {com},\mathsf {state}_A) = 1] - \\ \Pr&[(\mathsf {PK},\mathsf {state}_S) \leftarrow \mathsf {Sim}_1(1^k);(\mathcal {L}, \mathsf {state}_A) \leftarrow \mathsf {Adv}_1(1^k, \mathsf {PK});\\&(\mathsf {com}, \mathsf {state}_S) \leftarrow \mathsf {Sim}_2(1^k,\mathsf {state}_S):\\&\mathsf {Adv}_2^{\mathsf {Sim}_3^{\mathcal {L}}(1^k,\mathsf {state}_S)}(\mathsf {com},\mathsf {state}_A) = 1] | \le {\upnu }(k), \end{aligned}$$

where \(\mathsf {Sim}_3\) has oracle access to \(\mathcal {L}\), that is, given a query \(({\updelta },\mathsf {flag})\), \(\mathsf {Sim}_3\) can query the list \(\mathcal {L}\) to learn only the membership/non-membership of elements in \({\updelta }\) and, if \(\mathsf {flag}=1\), learn the list order of the elements of \({\updelta }\) in \(\mathcal {L}\).

4.3 ZKL Construction

The construction uses a zero knowledge set scheme, a homomorphic integer commitment scheme, a zero-knowledge protocol to prove non-negativity of an integer and, if the elements of the list \(\mathcal {L}\) are longer than l bits, a collision resistant hash function \(\mathbb {H}:\{0,1\}^*\rightarrow \{0,1\}^l\). In particular, given an input list \(\mathcal {L}\), the prover \(\mathsf {P}_1\) creates a set D where, for every element \(y_j\in \mathcal {L}\), it adds a (key,value) pair \((\mathbb {H}(y_j),C(j))\): \(\mathbb {H}(y_j)\) is a hash of \(y_j\) and C(j) is a homomorphic integer commitment of \(\mathsf {rank}(\mathcal {L},y_j)\) (assuming \(\mathsf {rank}(\mathcal {L},y_j)=j\), wlog). \(\mathsf {P}_1\) sets up a zero knowledge set on D using \(\mathsf {ZKS}\mathsf {P}_1\) from a zero-knowledge set scheme \(\mathsf {ZKS}= (\mathsf {ZKS}\mathsf {Setup},\mathsf {ZKS}\mathsf {Prover}= (\mathsf {ZKS}\mathsf {P}_1, \mathsf {ZKS}\mathsf {P}_2),\mathsf {ZKS}\mathsf {Verifier})\) [12]. The output of \(\mathsf {ZKS}\mathsf {P}_1\) is a commitment to D, \(\mathsf {com}\), that \(\mathsf {P}_1\) sends to the verifier.
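A toy sketch of how \(\mathsf {P}_1\) could build the set D before handing it to \(\mathsf {ZKS}\mathsf {P}_1\). As an assumption for illustration only, a Pedersen commitment in a tiny prime-order group stands in for the homomorphic integer commitment (it shares the homomorphic property but is neither secure at this size nor an integer commitment), and SHA-256 plays the role of \(\mathbb {H}\). The ZKS layer itself is not shown, and all names are hypothetical.

```python
import hashlib
import secrets

# Toy Pedersen parameters (INSECURE, illustration only): p = 2q + 1 with q prime;
# g = 4 and h = 9 generate the order-q subgroup of quadratic residues mod p.
P, Q, G, H_GEN = 2039, 1019, 4, 9


def commit(x: int, r: int) -> int:
    """C(x; r) = g^x h^r mod p, standing in for the homomorphic integer commitment."""
    return pow(G, x % Q, P) * pow(H_GEN, r % Q, P) % P


def hash_key(y: str) -> str:
    """H(y): the ZKS key for element y (SHA-256 is an assumption)."""
    return hashlib.sha256(y.encode()).hexdigest()


def build_d(lst):
    """P_1's set D: key H(y_j) -> value C(j), j = rank(L, y_j); openings stay in the prover state."""
    d, openings = {}, {}
    for j, y in enumerate(lst, start=1):
        r = secrets.randbelow(Q)
        key = hash_key(y)
        d[key] = commit(j, r)
        openings[key] = (j, r)
    return d, openings  # D would next be committed with ZKS.P_1 (not sketched)
```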

\(\mathsf {P}_2\) operates as follows. Membership and non-membership queries of the form \(({\updelta }, 0)\) are answered as in a zero knowledge set, by invoking \(\mathsf {ZKS}\mathsf {P}_2\) on the hash of every element of the sublist \({\updelta }\). Recall that, in response to a membership query for a key, \(\mathsf {ZKS}\mathsf {P}_2\) returns the value corresponding to this key. In our case, the queried key is \(\mathbb {H}(y_j)\) and the value returned by \(\mathsf {ZKS}\mathsf {P}_2\), \(D(\mathbb {H}(y_j))\), is the commitment C(j), where j is the rank of element \(y_j\) in the list \(\mathcal {L}\), if \(y_j \in \mathcal {L}\). If \(y_j \notin \mathcal {L}\), the value returned is \(\bot \). Hence, the verifier receives the commitments to ranks for queried member elements. These commitments are never opened but are used as part of order proofs.

For a given order query \(({\updelta },1)\), \(\mathsf {P}_2\) gives a proof of order for every adjacent pair of elements in the returned order, \(\mathsf {order}\). Recall that \(\mathsf {order}\) contains the member elements of \({\updelta }\), arranged according to their order in the list \(\mathcal {L}\). \(\mathsf {P}_2\) proves the order between two elements \(y_i\) and \(y_j\) as follows. Let \(\mathsf {rank}(\mathcal {L},y_i) = i,\mathsf {rank}(\mathcal {L},y_j) = j\), let C(i), C(j) be the corresponding commitments and, wlog, let \(i < j\). As noted above, C(i) and C(j) are already returned by \(\mathsf {P}_2\) as part of the membership proof. Additionally, \(\mathsf {P}_2\) returns a commitment to 1, C(1), and its opening information \({\uprho }\). Note that the verifier could compute C(1) himself, but the prover would then need the C(1) computed by the verifier in order to generate the proof of non-negativity of \(C(j-i-1)\). To avoid this interaction, we make the prover send C(1) and its opening.

The verification of the query answer proceeds as follows. \(\mathsf {Verifier}\) computes \(C(j-i-1) := C(j)/(C(i)C(1))\) using the homomorphic property of the integer commitment scheme. \(\mathsf {P}_2\) uses the zero knowledge protocol \(\mathsf {P}\leftrightarrow \mathsf {V}(x,r :c = C(x;r) \wedge x \ge 0)\) to convince \(\mathsf {Verifier}\) that \(C(j-i-1)\) is a commitment to value \(\ge 0\). Note that we use the non-interactive general zero-knowledge version of the protocol as discussed in Sect. 3. Hence, the query phase proceeds in a single round.
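The verifier's homomorphic step can be checked with the same kind of toy Pedersen commitment (again an assumption standing in for the integer commitment scheme, with hypothetical names): dividing C(j) by C(i)C(1) yields a commitment to \(j-i-1\), and the prover can open it only because it also knows the opening \({\uprho }\) of C(1). The non-negativity proof itself is not sketched.

```python
import secrets

P, Q, G, H_GEN = 2039, 1019, 4, 9  # toy, INSECURE Pedersen parameters (p = 2q + 1)


def commit(x: int, r: int) -> int:
    """C(x; r) = g^x h^r mod p."""
    return pow(G, x % Q, P) * pow(H_GEN, r % Q, P) % P


def derive_gap_commitment(c_i: int, c_j: int, c_one: int) -> int:
    """Verifier: C(j-i-1) := C(j) / (C(i) * C(1)), using a modular inverse (Python 3.8+)."""
    return c_j * pow(c_i * c_one % P, -1, P) % P


# Prover side: ranks i < j with randomness r_i, r_j; C(1) is sent together with its opening rho.
i, j = 3, 7
r_i, r_j, rho = (secrets.randbelow(Q) for _ in range(3))
c_i, c_j, c_one = commit(i, r_i), commit(j, r_j), commit(1, rho)
c_gap = derive_gap_commitment(c_i, c_j, c_one)

# The prover knows the opening of C(j-i-1): value j-i-1 with randomness r_j - r_i - rho.
# This opening is the witness for the non-negativity proof (not sketched here).
assert c_gap == commit(j - i - 1, r_j - r_i - rho)
```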

We note that we require \(\mathsf {Verifier}\) to verify that \(j-i-1 \ge 0\) and not \(j-i\ge 0\), since otherwise a cheating prover \(\mathsf {Adv}\) could do the following: store the same arbitrary non-negative integer as the rank of every element in the list, so that \(C(j-i)\) and \(C(i-j)\) are both commitments to 0, and \(\mathsf {Adv}\) could always succeed in proving an arbitrary order. However, an honest prover can always prove the non-negativity of \(C(j-i-1)\), since \(j-i\ge 1\) for any two distinct ranks \(i < j\) of the list.
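The attack and its fix can be illustrated with plain integers standing in for the committed ranks; in the protocol these values stay hidden and the inequality is proven in zero knowledge. The function names are hypothetical.

```python
def naive_check(rank_i: int, rank_j: int) -> bool:
    """Flawed check: accept 'y_i before y_j' if j - i >= 0."""
    return rank_j - rank_i >= 0


def strict_check(rank_i: int, rank_j: int) -> bool:
    """Check used by the construction: accept 'y_i before y_j' only if j - i - 1 >= 0."""
    return rank_j - rank_i - 1 >= 0


# Cheating prover: every element gets the same committed rank, e.g. 5.
fake = {"A": 5, "B": 5}
print(naive_check(fake["A"], fake["B"]), naive_check(fake["B"], fake["A"]))    # True True
print(strict_check(fake["A"], fake["B"]), strict_check(fake["B"], fake["A"]))  # False False

# Honest, distinct ranks always pass the strict check in the true direction.
honest = {"A": 1, "B": 2}
print(strict_check(honest["A"], honest["B"]))  # True
```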

Also, we note that the commitments to ranks can be replaced by commitments to a strictly monotonic sequence as long as there is a 1:1 correspondence with the rank sequence. In this case, the distance between two elements will also be positive and, hence, the above protocol still holds.
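A quick check of this claim, assuming a hypothetical labelling that replaces the ranks by a strictly increasing integer sequence with random positive gaps: every pair \(i < j\) still satisfies the difference condition that the non-negativity proof relies on.

```python
import random


def monotone_labels(n: int, max_gap: int, rng: random.Random) -> list:
    """Hypothetical labelling: strictly increasing integers with gaps in [1, max_gap]."""
    labels, cur = [], 0
    for _ in range(n):
        cur += rng.randint(1, max_gap)
        labels.append(cur)
    return labels


rng = random.Random(1)
labels = monotone_labels(10, 50, rng)
# For every pair i < j, label[j] - label[i] - 1 >= 0, so the same check C(j-i-1) >= 0 applies.
ok = all(labels[j] - labels[i] - 1 >= 0 for i in range(10) for j in range(i + 1, 10))
print(ok)  # True
```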

Theorem 1

The zero-knowledge list (ZKL) construction of Sect. 4.3 is a non-interactive two-party protocol that satisfies the security properties of completeness (Definition 2), soundness (Definition 3) and zero-knowledge (Definition 4) in the random oracle model (inherited from NIZK). The construction has the following performance, where n is the list size, m is the query size, and each element of the list is a k-bit string (if not, we can use a hash function to reduce every element to a k-bit string, as shown in the construction).

  • The prover executes the commitment phase in O(nk) time and space, where the multiplicative factor k is inherited from the height of the tree.

  • In the query phase, the prover computes the proof of the answer in O(mk) time.

  • The verifier verifies the proof in O(mk) time and space.

The soundness of the ZKL scheme follows from the soundness of the ZKS scheme, the binding property of the commitment scheme, and the correctness of protocol \(\mathsf {P}\leftrightarrow \mathsf {V}(x,r :c = C(x;r) \wedge x \ge 0)\) (see Sect. 3.2). For the zero-knowledge property, we write a simulator that uses the ZKS simulator and the trapdoor of the commitment scheme to equivocate commitments. The formal proof of Theorem 1 is omitted due to space restrictions and is presented in [17].

5 Privacy Preserving Authenticated List (PPAL)

In the previous section we presented a model and a construction for a new primitive called zero knowledge lists. As noted earlier, the ZKL model gives the desired functionality to verify order queries on lists. However, as we discuss in Sect. 5.3, the corresponding construction does not provide the efficiency one may desire in the cloud computing setting, where the verifier (client) has limited memory resources. In this section we address this setting and define a model for privacy preserving authenticated lists, PPAL, that is executed between three parties. This model arguably fits the cloud scenario better and, as we will see, our construction is also more efficient.

5.1 Model

PPAL is a tuple of three probabilistic polynomial time algorithms \((\mathsf {Setup},\mathsf {Query},\mathsf {Verify})\) executed between three parties: the owner of the data list \(\mathcal {L}\); the server, who stores \(\mathcal {L}\) and answers queries from the client; and the client, who issues queries on the elements of the list and verifies the corresponding answers. We note that this model assumes that the query is on member elements of the list, i.e., for any query \({\updelta }\), \(\mathsf {Elements}({\updelta }) \subseteq \mathsf {Elements}(\mathcal {L})\). In other words, this model does not support proofs of non-membership, similar to other data structures that support only positive membership proofs, e.g., [6, 8, 9, 11, 23, 24, 33].

  • \((\mathsf {digest}_C, \mathsf {digest}_S) \leftarrow \mathsf {Setup}(1^k, \mathcal {L})\) This algorithm takes the security parameter and the source list \(\mathcal {L}\) as input and produces two digests \(\mathsf {digest}_C\) and \(\mathsf {digest}_S\) for the list. This algorithm is run by the owner. \(\mathsf {digest}_C\) is sent to the client and \(\mathsf {digest}_S\) is sent to the server.

  • \( (\mathsf {order}, \mathsf {proof}) \leftarrow \mathsf {Query}(\mathsf {digest}_S, \mathcal {L}, {\updelta })\) This algorithm takes the server digest generated by the owner, \(\mathsf {digest}_S\), the source list, \(\mathcal {L}\), and a queried sublist, \({\updelta }\), as input, where a sublist of a list \(\mathcal {L}\) is defined as: \(\mathsf {Elements}\)(\({\updelta }\)) \(\subseteq \) \(\mathsf {Elements}\)(\(\mathcal {L}\)). The algorithm produces the list order of the elements of \({\updelta }\), \(\mathsf {order}= {{\uppi }}_{\mathcal {L}}({{\updelta }})\), and a proof, \(\mathsf {proof}\), of the answer. This algorithm is run by the server. Wlog, we assume \(|{\updelta }|>1\). In the trivial case of \(|{\updelta }|= 1\), the server returns an empty proof, i.e., \( (\mathsf {order}= {\updelta }, \mathsf {proof}= \bot )\).

  • \(b \leftarrow \mathsf {Verify}(\mathsf {digest}_C, {\updelta }, \mathsf {order}, \mathsf {proof})\) This algorithm takes \(\mathsf {digest}_C\), a queried sublist \({\updelta }\), \(\mathsf {order}\) and \(\mathsf {proof}\) and returns a bit b, where \(b= \mathsf {ACCEPT}\) iff \(\mathsf {Elements}\)(\({\updelta }\)) \(\subseteq \) \(\mathsf {Elements}\)(\(\mathcal {L}\)) and \(\mathsf {order}= {{\uppi }}_{\mathcal {L}}({{\updelta }})\). Otherwise, \(b= \mathsf {REJECT}\). This algorithm is run by the client.

5.2 Security Properties

A PPAL has three important security properties, Completeness, Soundness and Zero-Knowledge, which together guarantee security against a malicious server and a malicious client. They are close to those of ZKL except for soundness: for PPAL, soundness requires that the client does not accept proofs forged by the server for answers that are incorrect with respect to the owner’s list. We describe each security definition formally below.

The first property is Completeness. This property ensures that for any list \(\mathcal {L}\) and for any sublist \({\updelta }\) of \(\mathcal {L}\), if \(\mathsf {digest}_C,\mathsf {digest}_S, \mathsf {order},\mathsf {proof}\) are generated honestly, i.e., the owner and the server honestly execute the protocol, then the client will always be convinced of the correct list order of \({\updelta }\).

Definition 5

(Completeness). For all lists \(\mathcal {L}\) and all sublists \({\updelta }\) of \(\mathcal {L}\)

$$\begin{aligned} \Pr [(\mathsf {digest}_C, \mathsf {digest}_S) \leftarrow \mathsf {Setup}(1^k, \mathcal {L});(\mathsf {order}, \mathsf {proof}) \leftarrow \mathsf {Query}(\mathsf {digest}_S,\mathcal {L}, {\updelta }): \\ \mathsf {Verify}(\mathsf {digest}_C, {\updelta }, \mathsf {order}, \mathsf {proof}) = \mathsf {ACCEPT} \wedge \mathsf {order}= {\uppi }_\mathcal {L}({\updelta })] =1 \end{aligned}$$

The second security property is Soundness. This property ensures that once an honest owner generates a pair \((\mathsf {digest}_C, \mathsf {digest}_S)\) for a list \(\mathcal {L}\), even a malicious server will not be able to convince the client of an incorrect order of elements belonging to the list \(\mathcal {L}\). This property ensures integrity of the scheme.

Definition 6

(Soundness). For all PPT malicious query algorithms \(\mathsf {Adv}\), for all lists \(\mathcal {L}\) and all query sublists \({\updelta }\) of \(\mathcal {L}\), there exists a negligible function \({\upnu }(.)\) such that:

$$\begin{aligned} \Pr [(\mathsf {digest}_C, \mathsf {digest}_S) \leftarrow \mathsf {Setup}(1^k, \mathcal {L});(\mathsf {order}, \mathsf {proof})&\leftarrow \mathsf {Adv}(\mathsf {digest}_S,\mathcal {L}): \\ \mathsf {Verify}(\mathsf {digest}_C, {\updelta }, \mathsf {order}, \mathsf {proof}) = \mathsf {ACCEPT}&\wedge \mathsf {order}\ne {\uppi }_\mathcal {L}({\updelta }) ] \le {\upnu }(k) \end{aligned}$$

The last property is Zero-Knowledge. This property captures that even a malicious client cannot learn anything about the list (and its size) beyond what the client has queried for. Informally, this property involves showing that there exists a simulator such that even for an adversarially chosen list \(\mathcal {L}\), no adversarial client (verifier) can tell whether it is talking to an honest owner and an honest server who know \(\mathcal {L}\) and answer w.r.t. \(\mathcal {L}\), or to the simulator that only has oracle access to the list \(\mathcal {L}\).

Definition 7

(Zero-Knowledge). There exists a PPT simulator \(\mathsf {Sim}= (\mathsf {Sim}_1, \mathsf {Sim}_2)\) such that for all PPT malicious verifiers \(\mathsf {Adv}= (\mathsf {Adv}_1, \mathsf {Adv}_2)\), there exists a negligible function \({\upnu }(.)\) such that:

$$\begin{aligned} | \Pr [(\mathcal {L}, \mathsf {state}_A) \leftarrow \mathsf {Adv}_1(1^k);&(\mathsf {digest}_C, \mathsf {digest}_S) \leftarrow \mathsf {Setup}(1^k, \mathcal {L}):\\&\mathsf {Adv}_2^{\mathsf {Query}(\mathsf {digest}_S,\mathcal {L},.)}(\mathsf {digest}_C,\mathsf {state}_A) = 1] - \\ \Pr [(\mathcal {L}, \mathsf {state}_A) \leftarrow \mathsf {Adv}_1(1^k);&(\mathsf {digest}_C, \mathsf {state}_S) \leftarrow \mathsf {Sim}_1(1^k):\\&\mathsf {Adv}_2^{\mathsf {Sim}_2^{\mathcal {L}}(1^k,\mathsf {state}_S)}(\mathsf {digest}_C,\mathsf {state}_A) = 1] | \le {\upnu }(k) \end{aligned}$$

Here \(\mathsf {Sim}_2\) has oracle access to \(\mathcal {L}\), that is, given a sublist \({\updelta }\) of \(\mathcal {L}\), \(\mathsf {Sim}_2\) can query the list \(\mathcal {L}\) to learn only the correct list order of the sublist \({\updelta }\), and cannot otherwise look at \(\mathcal {L}\).

5.3 Construction of PPAL via ZKL

We show how a PPAL can be instantiated via a ZKL in Theorems 2 and 3 and then discuss that the resulting construction does not yield the desired efficiency.

Theorem 2

Given a non-interactive ZKL scheme \(\mathsf {ZKL} = (\mathsf {Setup},\mathsf {Prover}= (\mathsf {P}_1, \mathsf {P}_2),\mathsf {Verifier})\), which supports queries of the form \(({\updelta },\mathsf {flag})\) on a list \(\mathcal {L}\), we can instantiate a PPAL scheme for the list \(\mathcal {L}\), \(\mathsf {PPAL} = (\mathsf {Setup},\mathsf {Query},\mathsf {Verify})\), which supports queries of the form \({\updelta }\), where \({\updelta }\) is a sublist of \(\mathcal {L}\), as follows:

  • \(\mathsf {PPAL}.\mathsf {Setup}(1^k, \mathcal {L})\): Invoke \(\mathsf {PK}\leftarrow \mathsf {ZKL}.\mathsf {Setup}(1^k)\) and \((\mathsf {com}, \mathsf {state}) \leftarrow \mathsf {ZKL}.\mathsf {P}_1(1^k, \mathsf {PK}, \mathcal {L})\). Return \((\mathsf {digest}_C=(\mathsf {PK},\mathsf {com}),\) \(\mathsf {digest}_S=(\mathsf {PK},\mathsf {com},\mathsf {state}))\).

  • \(\mathsf {PPAL}.\mathsf {Query}(\mathsf {digest}_S, \mathcal {L}, {\updelta })\): Invoke \((\mathsf {member},\mathsf {proof}_M, \mathsf {order}, \mathsf {proof}_O) \leftarrow \mathsf {ZKL}.\mathsf {P}_2(\mathsf {PK},\mathsf {state},{\updelta },1)\). Return \((\mathsf {order}, \mathsf {proof}=(\mathsf {proof}_M, \mathsf {proof}_O))\).

  • \(\mathsf {PPAL}.\mathsf {Verify}(\mathsf {digest}_C, {\updelta }, \mathsf {order}, \mathsf {proof}=(\mathsf {proof}_M,\mathsf {proof}_O))\): Parse \(\mathsf {digest}_C\) as \((\mathsf {PK},\mathsf {com})\) and set \(\mathsf {member}= \{1,1,\ldots ,1\}\) such that \(|\mathsf {member}| = |{\updelta }|= |\mathsf {order}|\). Return bit b, where \(b \leftarrow \mathsf {ZKL}.\mathsf {Verifier}(1^k,\mathsf {PK},\mathsf {com},{\updelta },1, \mathsf {member}, \mathsf {proof}_M, \mathsf {order}, \mathsf {proof}_O)\).
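The wiring of Theorem 2 can be sketched as a thin adapter around any ZKL implementation. The `ZKLScheme` container and the function names are hypothetical, and the mock scheme at the end offers no privacy or soundness at all; it only exercises the calls.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Tuple


@dataclass
class ZKLScheme:
    """Hypothetical container for the algorithms of a ZKL scheme (Sect. 4.1)."""
    setup: Callable[[int], Any]                          # PK <- Setup(1^k)
    p1: Callable[[int, Any, Sequence], Tuple[Any, Any]]  # (com, state) <- P_1(1^k, PK, L)
    p2: Callable[[Any, Any, Sequence, int], Tuple]       # (member, proof_M, order, proof_O)
    verifier: Callable[..., bool]                        # b <- Verifier(1^k, PK, com, delta, flag, ...)


def ppal_setup(zkl: ZKLScheme, k: int, lst: Sequence):
    pk = zkl.setup(k)
    com, state = zkl.p1(k, pk, lst)
    return (pk, com), (pk, com, state)                   # digest_C, digest_S


def ppal_query(zkl: ZKLScheme, digest_s, lst: Sequence, delta: Sequence):
    pk, _com, state = digest_s
    _member, proof_m, order, proof_o = zkl.p2(pk, state, delta, 1)  # always an order query
    return order, (proof_m, proof_o)


def ppal_verify(zkl: ZKLScheme, k: int, digest_c, delta: Sequence, order, proof) -> bool:
    pk, com = digest_c
    proof_m, proof_o = proof
    member = [True] * len(delta)                         # PPAL queries contain only members
    return zkl.verifier(k, pk, com, delta, 1, member, proof_m, order, proof_o)


# NON-PRIVATE mock ZKL (the "commitment" is the list itself), only to exercise the wiring.
mock = ZKLScheme(
    setup=lambda k: None,
    p1=lambda k, pk, lst: (tuple(lst), tuple(lst)),
    p2=lambda pk, state, delta, flag: (
        [z in state for z in delta], None,
        sorted((z for z in delta if z in state), key=state.index), None),
    verifier=lambda k, pk, com, delta, flag, member, pm, order, po: (
        all(member) and order == sorted(delta, key=com.index)),
)
dc, ds = ppal_setup(mock, 128, ["A", "B", "C"])
order, proof = ppal_query(mock, ds, ["A", "B", "C"], ["C", "A"])
print(order, ppal_verify(mock, 128, dc, ["C", "A"], order, proof))  # ['A', 'C'] True
```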

Theorem 3

A PPAL scheme instantiated using a ZKL scheme, \(\mathsf {ZKL} = (\mathsf {Setup},\mathsf {Prover}= (\mathsf {P}_1, \mathsf {P}_2),\) \(\mathsf {Verifier})\) has the following performance:

  • The owner’s runtime and space are proportional to the combined runtime and space of \(\mathsf {ZKL}.\mathsf {Setup}\) and \(\mathsf {ZKL}.\mathsf {P}_1\).

  • The server’s runtime and space are proportional to the runtime and space of \(\mathsf {ZKL}.\mathsf {P}_2\).

  • The client’s runtime and space are proportional to the runtime and space of \(\mathsf {ZKL}.\mathsf {Verifier}\).

The correctness of Theorems 2 and 3 follows from the definitions of the PPAL and ZKL models. In a PPAL instantiated with the ZKL construction of Sect. 4, the owner runs in time and space O(kn) and the server requires space O(kn), where n is the list size and each element of the list is k bits long. To answer a query of size m, the server runs in time O(km) and the verification time of the client is O(km).

As we see, this generic construction is not very efficient due to the multiplicative factor O(k) and heavy cryptographic primitives. In Sect. 6, we present a direct PPAL construction which is a factor of O(k) more efficient in space and computation requirements as compared to an adaptation of our ZKL construction from Sect. 4.

6 PPAL Construction

We start by presenting the intuition behind our construction of a privacy preserving authenticated list (PPAL). Next, we give more details on the algorithms and analyze the security and efficiency of the construction.

Intuition: Every element of the list is associated with a member witness, where a member witness is a blinded component of the bilinear accumulator public key. This allows us to encode the rank of the element in the member witness and then “blind” the rank information with randomness. Every (element, member witness) pair is signed by the owner, and the signatures are aggregated using the bilinear aggregate signature scheme [4] to compute the list digest signature. Signatures and digest are sent to the server, who can use them to prove authenticity when answering client queries. The owner also sends the list digest signature and the public key of the bilinear aggregate signature scheme to the client. The advantage of using an aggregate signature is that the server can compute a valid digest signature for any sublist of the source list by exploiting the homomorphic nature of aggregate signatures, that is, without the owner’s involvement. Moreover, the client can verify the individual signatures with a single invocation of aggregate signature verification.

The owner also sends to the server a number of random elements, linear in the list size, that are used in the encoding of member witnesses. These random elements allow the server to compute order witnesses on queried elements without the owner’s involvement. The order witness encodes the distance between two elements, i.e., the difference between their ranks, without revealing anything about it. Together with member witnesses, the client can later use a bilinear map to verify the order of the elements.

Construction: Our construction for \(\mathsf {PPAL}\) is presented in Fig. 1. We denote the member witness for \(x_i \in \mathcal {L}\) as \(t_{{x_i}\in {\mathcal {L}}}\). For two elements \(x_i, x_j \in \mathcal {L}\) such that \(x_i < x_j\) in \(\mathcal {L}\), \(t_{{x_i}<{x_j}}\) is an order witness for the order between \(x_i\) and \(x_j\). The construction works as follows.

Fig. 1. Privacy-preserving authenticated list (PPAL) construction

In the \(\mathsf {Setup}\) phase, the owner generates the secret key \((v,s)\) and the public key \(g^v\), where \(v\) is used for signatures and \(s\) is the secret exponent of the accumulator elements \(g^{s^i}\). The owner picks a distinct random element \(r_i\) from the group \(\mathbb {Z}_p^*\) for each element \(x_i\) in the list \(\mathcal {L}, i \in [1,n]\). The element \(r_i\) is used to compute the member witness \(t_{x_i \in \mathcal {L}}\). Later in the protocol, together with \(r_j\), it is also used by the server to compute the order witness \(t_{x_i < x_j}\). The owner also computes individual signatures, \({\upsigma }_i\)’s, for each element and aggregates them into a digest signature \({\upsigma }_\mathcal {L}\) for the list. It returns the signatures and member witnesses for every element of \(\mathcal {L}\) in \(\Sigma _\mathcal {L}\), and the random numbers picked for each index, to be used in order witnesses, in \(\Omega _\mathcal {L}\). The owner sends \(\mathsf {digest}_C= (g^v, {\upsigma }_\mathcal {L})\) to the client and \(\mathsf {digest}_S= (g^v, {\upsigma }_\mathcal {L}, \langle g, g^s, g^{s^2}, \ldots , g^{s^{n}} \rangle , \Sigma _\mathcal {L}, \Omega _\mathcal {L})\) and \(\mathcal {L}\) to the server.

Given a query \({\updelta }\), the server returns a response list \(\mathsf {order}\) that contains elements of \({\updelta }\) in the order they appear in \(\mathcal {L}\). The server uses information in \(\Sigma _\mathcal {L}\) to compute the digest signature for the sublist, \({\upsigma }_\mathsf {order}\), and its membership verification unit \({\uplambda }_{\mathcal {L}'}\) which are part of the \(\Sigma _\mathsf {order}\) component of the proof. To compute the \(\Omega _\mathsf {order}\) component of the proof, the server uses corresponding blinding values in \(\Omega _\mathcal {L}\) and elements \(g^{s^d}\) where d’s correspond to distances between ranks of queried elements.
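The following sketch models each group element by its exponent modulo a prime Q (so \(g^x\) is stored as x), which is enough to illustrate the bookkeeping but is in no way a secure implementation. Since Fig. 1 is not reproduced here, the witness forms are assumptions: member witness \(t_{x_i \in \mathcal {L}} = g^{s^i r_i}\) and order witness \(t_{x_a<x_b} = (g^{s^{b-a}})^{r_b r_a^{-1}}\), consistent with the form of the forged witness \(t_{x_j < x_i}\) in the soundness argument of Theorem 4. The point illustrated is that the server needs only a published power \(g^{s^d}\) with \(d \ge 1\) and the blinding values in \(\Omega _\mathcal {L}\), never s itself. Function names are hypothetical.

```python
import random

Q = 2**61 - 1  # toy prime group order (assumption); exponent-only model, NOT secure


def owner_setup(n: int, rng: random.Random):
    """Owner: pick s and distinct blinding values r_1..r_n (Omega_L).
    A group element g^x is represented by its exponent x mod Q."""
    s = rng.randrange(2, Q)
    r = []
    while len(r) < n:                                  # distinct r_i, as in the construction
        x = rng.randrange(1, Q)
        if x not in r:
            r.append(x)
    powers = [pow(s, d, Q) for d in range(n + 1)]      # exponents of g, g^s, ..., g^{s^n}
    member = [powers[i] * r[i - 1] % Q for i in range(1, n + 1)]  # assumed g^{s^i r_i}
    return s, r, powers, member


def order_witness(powers, r, a: int, b: int) -> int:
    """Server: assumed t_{x_a < x_b} = (g^{s^(b-a)})^(r_b / r_a) for 1-indexed ranks a < b.
    Uses only a published power and Omega_L, never s itself."""
    d = b - a
    if not 1 <= d < len(powers):
        raise ValueError("no published power g^{s^d} for this d (e.g. a flipped pair)")
    return powers[d] * r[b - 1] * pow(r[a - 1], -1, Q) % Q
```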

The client first checks, using \(\Sigma _\mathsf {order}\), that all the returned elements are signed by the owner, and then verifies the order of the returned elements using \(\Omega _\mathsf {order}\). Hence, the client uses the bilinear map for two purposes: first for member verification and then to verify the order. The query phase has a single round of communication between client and server.
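Under the same assumed witness forms and the same exponent-only model (an illustration, not a secure implementation), one verification equation that fits the structure described above is \(e(t_{x_a \in \mathcal {L}}, t_{x_a<x_b}) = e(t_{x_b \in \mathcal {L}}, g)\), checked for each adjacent pair of the returned order. The sketch omits the signature check on \(\Sigma _\mathsf {order}\), and its function names are hypothetical.

```python
import random

Q = 2**61 - 1  # toy prime group order; exponent-only model, NOT secure


def pair(x: int, y: int) -> int:
    """Model of the pairing: e(g^x, g^y) = e(g,g)^(x*y), stored as the exponent x*y mod Q."""
    return x * y % Q


def verify_pair(t_a: int, t_b: int, t_ab: int) -> bool:
    """Assumed check for 'x_a before x_b': e(t_{x_a in L}, t_{x_a<x_b}) == e(t_{x_b in L}, g)."""
    return pair(t_a, t_ab) == pair(t_b, 1)  # g has exponent 1


def verify_order(member_wits, order_wits) -> bool:
    """member_wits[k]: k-th returned element; order_wits[k]: witness for element k before k+1."""
    if len(order_wits) != len(member_wits) - 1:
        return False
    return all(verify_pair(member_wits[k], member_wits[k + 1], order_wits[k])
               for k in range(len(order_wits)))


# Demo: queried elements with ranks 2 < 5 < 7 and honestly generated witnesses.
rng = random.Random(7)
s = rng.randrange(2, Q)
ranks = [2, 5, 7]
r = {i: rng.randrange(1, Q) for i in ranks}                  # blinding values r_i
t = [pow(s, i, Q) * r[i] % Q for i in ranks]                  # member witnesses g^{s^i r_i}
w = [pow(s, b - a, Q) * r[b] * pow(r[a], -1, Q) % Q           # order witnesses, adjacent pairs
     for a, b in zip(ranks, ranks[1:])]
print(verify_order(t, w))              # True
print(verify_order(t[::-1], w[::-1]))  # False, except with negligible probability
```

The honest case passes because the exponents multiply to \(s^{a} r_a \cdot s^{b-a} r_b r_a^{-1} = s^{b} r_b\).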

We now describe the preprocessing step at the server that reduces the query time for a query of size m on a list of size n from O(n) to \(O(\min \lbrace m\log n,n \rbrace )\). Let \(\uppsi _i = \mathcal {H}(t_{x_i \in \mathcal {L}}||x_i)\) for \(x_i \in \mathcal {L}\). The server computes and stores a balanced binary tree over n leaves, where the ith leaf corresponds to \(x_i\) and stores \(\uppsi _i\). Each internal node of the tree stores the product of the values at its children. When answering a query of size m, the server can compute \( {\uplambda }_{\mathcal {L}'}\) by using partial products that correspond to intervals between elements in the query. There are \(m+1\) such partial products. Since each partial product can be computed using \(O(\log n)\) precomputed products in the tree, it takes \(O(m \log n)\) time to compute the product of \(m+1\) of them. The server takes O(n) for preprocessing and the query time is reduced to \(O(\min \lbrace m\log n,n \rbrace )\).
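A minimal sketch of this preprocessing, assuming (as the interval decomposition suggests) that \({\uplambda }_{\mathcal {L}'}\) is the product of the \(\uppsi _i\) over the non-queried positions, and modelling group elements as integers modulo a prime; the class name and modulus are hypothetical. When \(m\log n \ge n\), the server can instead multiply the non-queried leaves directly, which yields the \(O(\min \lbrace m\log n,n \rbrace )\) bound.

```python
from typing import List

P = 2**61 - 1  # toy prime modulus standing in for the group of the psi_i (assumption)


class ProductTree:
    """Balanced binary tree over psi_1..psi_n; every internal node stores its children's product."""

    def __init__(self, values: List[int]):
        self.n = len(values)
        self.size = 1
        while self.size < self.n:
            self.size *= 2
        self.tree = [1] * (2 * self.size)              # padding leaves hold the identity 1
        self.tree[self.size:self.size + self.n] = [v % P for v in values]
        for i in range(self.size - 1, 0, -1):          # O(n) preprocessing
            self.tree[i] = self.tree[2 * i] * self.tree[2 * i + 1] % P

    def range_product(self, lo: int, hi: int) -> int:
        """Product of leaves lo..hi-1 (0-indexed, half-open) from O(log n) stored nodes."""
        res = 1
        lo += self.size
        hi += self.size
        while lo < hi:
            if lo & 1:
                res = res * self.tree[lo] % P
                lo += 1
            if hi & 1:
                hi -= 1
                res = res * self.tree[hi] % P
            lo //= 2
            hi //= 2
        return res

    def product_excluding(self, positions: List[int]) -> int:
        """Product over all leaves except the (distinct) queried positions: m+1 gap intervals."""
        res, prev = 1, 0
        for p in sorted(positions):
            res = res * self.range_product(prev, p) % P
            prev = p + 1
        return res * self.range_product(prev, self.n) % P
```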

We summarize the properties and efficiency of our PPAL construction in Theorem 4.

Theorem 4

The privacy-preserving authenticated list (PPAL) construction of Fig. 1 satisfies the security properties of completeness (Definition 5), soundness (Definition 6) and zero-knowledge (Definition 7) in the random oracle model (inherited from [4]) and under the n-BDHI assumption (Definition 1). Also, the construction has the following performance, where n denotes the list size and m denotes the query size.

  • The owner and the server use O(n) space.

  • The owner performs the setup phase in O(n) time and goes offline.

  • The server performs the preprocessing phase in O(n) time.

  • Query phase is a single-round protocol between the server and the client.

  • The server computes the answer to a query and its proof in \(O(\min \lbrace m\log n,n \rbrace )\) time.

  • The client verifies the proof in O(m) time and space.

The formal proof is omitted due to space restrictions and is available in [17]. Here we highlight the proof of soundness and zero knowledge. To prove soundness, we assume that there exists a malicious server \(\mathsf {Adv}\), which forges the order on a non-trivial sublist \({\updelta }= \lbrace x_1, \ldots , x_m \rbrace \), where \(m \ge 2\), for a list \(\mathcal {L}\). Then there exists at least one inversion pair \((x_i,x_j)\) whose order is flipped in \(\mathsf {Adv}\)’s forgery. Wlog assume that \(u < v\) where \(u=\mathsf {rank}(\mathcal {L},x_i)\) and \(v= \mathsf {rank}(\mathcal {L},x_j)\). Then \(\mathsf {Adv}\) must have forged the witness \(t_{x_j < x_i} = {(g^{s^{(u-v)}})}^{r_1 r_2^{-1}}\) that passes the verification, where \(r_1,r_2 \in \mathbb {Z}_p^*\) are the blinded components of elements \(x_i\) and \(x_j\), respectively. We show that by invoking \(\mathsf {Adv}\) and using its forged witness \(t_{x_j < x_i}\), we can construct a PPT adversary that successfully breaks the n-BDHI hardness assumption [3] by outputting \(e\biggl (t_{{x_j}<{x_i}}, (g^{s^{v-u-1}})^{{r_1}^{-1} r_2}\biggr ) = {e(g,g)}^{\frac{1}{s}}\), where \(g^{s^{v-u-1}}\) is part of the input to the n-BDHI problem.
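The final step of this reduction can be checked directly from bilinearity, \(e(g^a,g^b)=e(g,g)^{ab}\). Since \(1 \le u < v \le n\), the exponent \(v-u-1\) lies in \([0, n-1]\), which is consistent with the statement that \(g^{s^{v-u-1}}\) is part of the n-BDHI input:

$$\begin{aligned} e\bigl (t_{{x_j}<{x_i}}, (g^{s^{v-u-1}})^{r_1^{-1} r_2}\bigr ) &= e\bigl (g^{s^{u-v} r_1 r_2^{-1}}, g^{s^{v-u-1} r_1^{-1} r_2}\bigr ) \\ &= {e(g,g)}^{s^{u-v}\, s^{v-u-1}\, r_1 r_2^{-1} r_1^{-1} r_2} \\ &= {e(g,g)}^{s^{-1}} = {e(g,g)}^{\frac{1}{s}}. \end{aligned}$$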

For the zero knowledge property, we construct a simulator that, given only oracle access to the list, produces witnesses identically distributed to real witnesses, using the fact that our PPAL construction blinds witnesses in their exponents.