
1 Introduction

Multiparty computation (MPC) allows a set of parties to jointly compute a function over their inputs while keeping them private. In the last decade MPC has developed from a largely theoretical field to a practical one where many applications have been developed on top of it [DES16, GSB+17]. This is mostly due to the rise of compilers which translate high-level code to secure branching, additions and multiplications on secret data [BLW08, KSS13, DSZ15, ZE15].

Many applications require evaluating an arithmetic circuit (over the integers or modulo p) because such computations are easier to express there than as bitwise operations in a binary circuit. This is especially true for linear programming and satellite collision detection, where fixed- and floating-point numbers are used intensively [DDN+16, KW15]. A recent line of work has even looked at how to decrease the amount of storage needed throughout sequential computations from one MPC engine to another, using symmetric-key primitives evaluated as arithmetic circuits [GRR+16, RSS17].

To accomplish MPC, one can choose between two paradigms: garbled circuits [GLNP15, RR16, WRK17] or secret sharing [DGKN09, BDOZ11, DPSZ12]. We concentrate on the latter because it is currently the most suitable for evaluating arithmetic circuits, although there have been some recent theoretical improvements on garbling modulo p by Ball et al. [BMR16]. Since our goal in this paper is secure computation in a system that scales with the number of parties while providing guarantees against malicious players, we focus on SPDZ [DPSZ12, DKL+13].

It is no surprise that homomorphic encryption can help with multiparty computation. In the presence of malicious adversaries, however, there need to be assurances that parties actually encrypt the information that they are supposed to. Zero-knowledge proofs are the essential tool to achieve this, and there exist compilers to make passive protocols secure against an active adversary. However, these proofs are relatively expensive, and it is the aim of SPDZ to reduce this cost by using them as little as possible.

The core idea of SPDZ is that, instead of encrypting the parties’ inputs directly, it is easier to work with random data, conduct some checks at the end of the protocol, and abort if malicious behavior is detected. In order to evaluate a function with private inputs, the computation is separated into two phases, a preprocessing (or offline) phase and an online phase. The latter uses information-theoretic algorithms to compute the results from the inputs and the correlated randomness produced by the offline phase.

The correlated randomness consists of secret-shared random multiplication triples, that is, (a, b, ab) for random a and b. In SPDZ, the parties encrypt random additive shares of a and b under a global public key, use the homomorphic properties to sum up and multiply the shares, and then run a distributed decryption protocol to learn their share of ab. With respect to malicious parties, there are two requirements on the encrypted shares of a and b. First, they need to be independent of other parties’ shares, otherwise the sum would not be random, and second, the ciphertexts have to be valid. In the context of lattice-based cryptography, this means that the noise must be limited. Both requirements are achieved by using zero-knowledge proofs of knowledge with bounds on the cleartext and encryption randomness. It turns out that this is the most expensive part of the protocol.

The original SPDZ protocol [DPSZ12] uses a relatively simple Schnorr-like protocol [CD09] to prove knowledge of cleartexts and correctness of ciphertexts, but the later implementation [DKL+13] uses more sophisticated cut-and-choose-style protocols for both covert and active security. We have found that the simpler Schnorr-like protocol, which guarantees security against active malicious parties, is actually more efficient than the cut-and-choose proof with covert security.

Intuitively, it suffices that the encryption of the sum of all shares has to be correct because only the sum is used in the protocol. We take advantage of this by replacing the per-party proof with a global proof in Sect. 4. This significantly reduces the computation because every party only has to check one proof instead of \(n-1\). However, the communication complexity stays the same because the independence requirement means that every party still has to commit to every other party in some sense. Otherwise, a rushing adversary could make its input dependent on others, resulting in a predictable triple.

Section 3 contains our largest theoretical contribution. We present a replacement for the offline phase of SPDZ based solely on the additive homomorphism of BGV. This allows us to reduce the communication and computation compared to SPDZ because the ciphertext modulus can be smaller. At the core of our scheme is the two-party oblivious multiplication protocol by Bendlin et al. [BDOZ11], which is based on the multiplication of ciphertexts and constants. Unlike their work, we assume that the underlying cryptosystem achieves linear targeted malleability as introduced by Bitansky et al. [BCI+13], which enables us to avoid the costliest part of their protocol, the proof of correct multiplication. Instead, we replace this check by the SPDZ sacrifice, and argue that BGV with increased entropy in the secret key is a candidate for the above-mentioned assumption.

We do not consider the restriction to BGV to be a loss. Bendlin et al. suggest two flavors for the underlying cryptosystem: lattice-based and Paillier-like. For lattice-based cryptosystems, Costache and Smart [CS16] have shown that BGV is very competitive for large enough cleartext moduli such as needed by our protocol. On the other hand, Paillier only supports simple packing techniques and makes it difficult to manipulate individual slots [NWI+13]. Another advantage of BGV over Paillier is the heavy parallelization with CRT and FFT since in the lattice-based cryptosystem the ciphertext modulus can be a product of several primes.

2 Preliminaries

In the following section we define the basic notation and give an overview of the BGV encryption scheme and the SPDZ protocol.

2.1 Security Model

We use the UC (Universally Composable) framework of Canetti [Can01] to prove the security of our schemes against malicious, static adversaries, except for proofs of knowledge where we use rewinding to extract inputs from the adversary. Previous works [BDOZ11, DPSZ12] do this by having all inputs encrypted under a public key for which the secret key is known to the simulator in the registered key model. In our Low Gear protocol, this would involve sending extra ciphertexts not used in the protocol otherwise, which is why we opt for limited UC security.

Our protocols work with n parties \(\mathcal {P}= \{P_1,\dots , P_n\}\) where up to \(n-1\) corruptions can take place before the protocol starts. We say that a protocol \(\mathrm {\varPi }\) securely implements a functionality \(\mathcal {F}\) if no probabilistic polynomial-time adversary \(\mathsf {Adv}\) can distinguish between the protocol \(\mathrm {\varPi }\) and the functionality \(\mathcal {F}\) attached to a simulator \(\mathcal {S}\), with computational security k and statistical security \(\mathsf {sec}\).

We require the functionality \(\mathcal {F}_\mathsf {Rand}\) to generate public randomness. Whenever the functionality is activated by all parties, it outputs a uniformly random value \(r {\mathop {\leftarrow }\limits ^{\small {\$}}}\mathbb {F}\) to all parties. \(\mathcal {F}_\mathsf {Rand}\) can be implemented using commitments of random values, which are then added. In our experiments, we will use simple commitments based on the random oracle model.
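As an illustration, \(\mathcal {F}_\mathsf {Rand}\) could be instantiated with hash-based commitments in the random oracle model along the following lines; the field modulus, commitment encoding, and function names are choices made for this sketch only.

```python
import hashlib
import secrets

P = 2**61 - 1  # example field modulus (an assumption for the sketch)

def commit(value, nonce):
    # commitment via a hash, modeling the random oracle with SHA-256
    return hashlib.sha256(nonce + value.to_bytes(16, "big")).hexdigest()

def f_rand(n_parties):
    # every party samples a random share and commits to it first, so that no
    # party can choose its share after seeing the others'
    shares = [secrets.randbelow(P) for _ in range(n_parties)]
    nonces = [secrets.token_bytes(16) for _ in range(n_parties)]
    commitments = [commit(s, r) for s, r in zip(shares, nonces)]
    # then all commitments are opened and checked ...
    for c, s, r in zip(commitments, shares, nonces):
        assert commit(s, r) == c, "commitment opening failed"
    # ... and the shares are added to obtain the public random value
    return sum(shares) % P

assert 0 <= f_rand(3) < P
```

Committing before opening is what prevents a rushing party from biasing the result.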

2.2 BGV

We now give a short overview of the leveled encryption scheme by Brakerski et al. [BGV12] required for our pre-processing phase. Since the protocols used for generating the triples need only multiplication by scalars and ciphertext addition, the BGV scheme is instantiated with a single level. For completeness we present the details required to understand our paper. The reader can consult the following papers for further details: [LPR10, BV11, GHS12a, LPR13].

Underlying Algebra. Let \(R = \mathbb {Z}[X] / \langle f(X) \rangle \) be a polynomial ring with integer coefficients modulo f(X). In our case \(R = \mathbb {Z}[X] / \langle \mathrm {\Phi }_m(X) \rangle \), where \(\mathrm {\Phi }_{m}(X) = \prod _{i\in \mathbb {Z}_m^{*}}(X-\omega _{m}^i) \in \mathbb {Z}[X]\), \(\omega _m = \exp (2\pi \sqrt{-1} / m) \in \mathbb {C}\) is a principal m-th complex root of unity, and \(\omega _m^i = \exp (2\pi \sqrt{-1}\, i / m) \in \mathbb {C}\) iterates over all primitive complex m-th roots of unity.

The ring R is also called the ring of algebraic integers of the m-th cyclotomic field. For example, when \(m \ge 2\) is a power of two, \(\mathrm {\Phi }_m(X) = X^{m/2} + 1\). Notice that the degree of \(\mathrm {\Phi }_m(X)\) is equal to \(\phi (m)\), which makes R a ring of degree \(N = \phi (m)\) over \(\mathbb {Z}\). Next we define \(R_q = R / qR \cong \mathbb {Z}[X] / \langle \mathrm {\Phi }_m(X), q \rangle \), where q is not necessarily a prime number. The latter will be used as the ciphertext modulus.

Plaintext Slots. Since triples are generated for arithmetic circuits modulo p, the plaintext space is the ring \(R_p = R / pR\) where, for technical reasons, p and q are co-prime. If \(p \equiv 1 \mod m\), we have that \(\mathrm {\Phi }_m(X) = F_1(X) \cdots F_l(X) \mod p\) splits into l irreducible polynomials, where each \(F_i(X)\) has degree \(d = \phi (m) / l\) and \(R_p / \langle F_i(X) \rangle \cong \mathbb {F}_{p^d}\). It is useful to think of an element \(a \in R_p\) as a vector of size l whose i-th entry is \(a \bmod F_i(X)\). This in turn allows manipulating l plaintexts at once using SIMD (Single Instruction Multiple Data) operations.
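The slot structure can be illustrated with toy parameters (m = 8 and p = 17, chosen only because p % 8 == 1 makes \(\mathrm {\Phi }_8(X) = X^4 + 1\) split into linear factors): reducing modulo the linear factor \(X - r\) is evaluation at the root r, and one ring multiplication acts on all slots at once.

```python
p, N = 17, 4   # toy: m = 8, Phi_8(X) = X^4 + 1, and p % 8 == 1 so it fully splits

# the roots of X^4 + 1 modulo p are the primitive 8th roots of unity mod p
roots = [r for r in range(p) if pow(r, N, p) == p - 1]

def polymul(f, g):
    # multiplication in F_p[X] / (X^4 + 1): negacyclic convolution
    res = [0] * N
    for i in range(N):
        for j in range(N):
            sign = 1 if i + j < N else -1    # X^N = -1 in this ring
            res[(i + j) % N] = (res[(i + j) % N] + sign * f[i] * g[j]) % p
    return res

def to_slots(f):
    # reducing f modulo the linear factor X - r is just evaluation at r
    return [sum(c * pow(r, k, p) for k, c in enumerate(f)) % p for r in roots]

a, b = [1, 2, 3, 4], [5, 6, 7, 0]
# one ring multiplication multiplies all slots at once (SIMD)
assert to_slots(polymul(a, b)) == [x * y % p for x, y in zip(to_slots(a), to_slots(b))]
```

Evaluation at a root of \(\mathrm {\Phi }_m\) is a ring homomorphism, which is exactly why the slot-wise products match.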

Distributions. Throughout the definitions we will refer to a polynomial \(a \in R\) as a vector of size \(N = \phi (m)\). To realize the cryptosystem we need to sample at various times from different distributions to generate a vector of length N with coefficients mod p or q (which means an element from \(R_p\) or \(R_q\)). We will keep \(R_q\) throughout the following definitions:

  • \(\mathcal {U}(R_q)\) is the uniform distribution where each unique polynomial \(a \in R_q\) has an equal chance to be drawn. This is achieved by sampling each coefficient of a uniformly at random (from the integers modulo q).

  • \(\mathcal {DG}(\sigma ^2, R_q)\) is the discrete Gaussian with variance \(\sigma ^2\). Sampling proceeds as above except that each coefficient of \(a \in R_q\) is generated by calling the normal Gaussian \(\mathcal {N}(\sigma ^2)\) and rounding the result to the nearest integer.

  • \(\mathcal {ZO}(0.5)\) outputs a vector of length N where each entry has values in the set \(\{-1, 0, 1\}\). Here, zero appears with a probability 1/2 whereas \(\{-1, 1\}\) each appear with probability 1/4.

  • \(\mathcal {HWT}(h)\) outputs a random vector of length N with exactly h non-zero entries, each of which is uniform in \(\{-1, 1\}\).
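The four samplers can be sketched directly (with arbitrary toy parameters; the names and sizes below are illustration only):

```python
import random

q, N, sigma, h = 1 << 20, 16, 3.2, 4   # illustrative toy parameters

def U():    # uniform over R_q: each coefficient uniform modulo q
    return [random.randrange(q) for _ in range(N)]

def DG():   # discrete Gaussian: round a normal sample, per coefficient
    return [round(random.gauss(0, sigma)) % q for _ in range(N)]

def ZO():   # entries from {-1, 0, 1} with probabilities 1/4, 1/2, 1/4
    return [random.choice((-1, 0, 0, 1)) for _ in range(N)]

def HWT():  # exactly h non-zero entries, each uniform in {-1, 1}
    v = [0] * N
    for i in random.sample(range(N), h):
        v[i] = random.choice((-1, 1))
    return v

assert sum(1 for x in HWT() if x != 0) == h
```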

Ring-LWE. Hardness of the BGV scheme is based on the Ring version of the Learning with Errors problem [LPR10]. For a secret \(s \in R_q\), recall that a Ring-LWE sample is produced by choosing \(a \in R_q\) uniformly at random and an error \(e \leftarrow \chi \) from a special Gaussian distribution, and computing \(b = a \cdot s + e\). It turns out that, if an adversary manages to break the BGV encryption scheme in polynomial time, one can also build a polynomial-time distinguisher between Ring-LWE samples and the uniform distribution, namely \((a, b=a \cdot s + e) \cong (a', b')\) where \((a', b') {\mathop {\leftarrow }\limits ^{\small {\$}}}\mathcal {U}(R_q^2)\).

Key-Generation, Encryption and Decryption. The cryptosystem used in Sect. 3 is identical to the one by Damgård et al. [DKL+13] bar the augmentation data needed for modulus switching:

  • \(\mathsf {KeyGen}()\): Sample \(s \leftarrow \mathcal {HWT}(h)\), \(a \leftarrow \mathcal {U}(R_q)\), \( e \leftarrow \mathcal {DG}(\sigma ^2, R_q)\) and then set \(b \leftarrow a \cdot s + p \cdot e\). The public key is \(\mathsf {pk}\leftarrow (a,b)\) and the secret key is \(\mathsf {sk}\leftarrow s\). Note that \(\mathsf {pk}\) looks very similar to a Ring-LWE sample.

  • \(\mathsf {Enc}_\mathsf {pk}(m)\): To encrypt a message \(m \in R_p\), sample a small polynomial \(v \leftarrow \mathcal {ZO}(0.5)\) and two Gaussian polynomials \(e_0, e_1 \leftarrow \mathcal {DG}(\sigma ^2, R_q)\). The ciphertext is the pair \(c = (c_0, c_1)\) where \(c_0 = b \cdot v + p \cdot e_0 + m \in R_q\) and \(c_1 = a \cdot v + p \cdot e_1 \in R_q\).

  • \(\mathsf {Dec}_\mathsf {sk}(c)\): To decrypt a ciphertext \(c \in R_q^2\), one can simply compute \(m' \leftarrow c_0 - s \cdot c_1 \in R_q\) and then set \(m \leftarrow m' \bmod p\) to get the original plaintext. Decryption works only if the noise associated with c satisfies \(\Vert m' \Vert _\infty < q/2\), so that the ciphertext does not wrap around modulo q.
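The scheme can be sketched end to end with toy parameters. This is a minimal illustration only: the parameters are far too small to be secure, and the noise distributions are simplified to uniform ternary samples.

```python
import random

N, p, q = 8, 17, 1 << 20  # toy parameters with gcd(p, q) = 1; NOT secure sizes

def polymul(f, g):
    # multiplication in Z_q[X] / (X^N + 1): negacyclic convolution
    res = [0] * N
    for i in range(N):
        for j in range(N):
            sign = 1 if i + j < N else -1    # X^N = -1
            res[(i + j) % N] = (res[(i + j) % N] + sign * f[i] * g[j]) % q
    return res

def add(f, g):   return [(x + y) % q for x, y in zip(f, g)]
def scale(c, f): return [c * x % q for x in f]
def small():     return [random.choice((-1, 0, 1)) for _ in range(N)]  # toy noise

def keygen():
    s, e = small(), small()
    a = [random.randrange(q) for _ in range(N)]
    return s, (a, add(polymul(a, s), scale(p, e)))            # b = a*s + p*e

def encrypt(pk, m):
    a, b = pk
    v, e0, e1 = small(), small(), small()
    c0 = add(add(polymul(b, v), scale(p, e0)), m)             # c0 = b*v + p*e0 + m
    c1 = add(polymul(a, v), scale(p, e1))                     # c1 = a*v + p*e1
    return c0, c1

def decrypt(s, c):
    c0, c1 = c
    mp = [(x - y) % q for x, y in zip(c0, polymul(c1, s))]    # m' = c0 - s*c1
    return [(x - q if x > q // 2 else x) % p for x in mp]     # centre, reduce mod p

s, pk = keygen()
m = [i % p for i in range(N)]
assert decrypt(s, encrypt(pk, m)) == m
```

Decryption succeeds here because \(m' = m + p(ev + e_0 - se_1)\) has coefficients far below q/2 for these parameters.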

2.3 Zero-Knowledge Proofs

In a typical scenario, a zero-knowledge (ZK) proof allows a verifier to check the validity of a statement claimed by a prover without revealing anything other than that the claim is true. Previous implementations have used one of two approaches: a Schnorr-like protocol [CD09, DPSZ12, DKL+12] and cut-and-choose [DKL+13]. We will call SPDZ using these two protocols SPDZ-1 and SPDZ-2, respectively. Analysing the communication complexity, we found that the Schnorr-like protocol is more efficient because it only involves sending two extra ciphertexts per ciphertext to be proven, whereas Damgård et al. [DKL+13] suggest that, for malicious security, their protocol is most efficient with 32 extra ciphertexts. It is also worth noting that the Schnorr-like protocol seems to be easier to implement.

The Schnorr-like protocol is based on the following 3-move standard \(\varSigma \)-protocol. To prove knowledge of x in a field \(\mathbb {F}\) such that \(f(x)=y\) without revealing x:

  1. The prover \(\mathcal {P}\) sends a commitment \(a=f(s)\) for a random s.

  2. The verifier \(\mathcal {V}\) then samples a random \(e {\mathop {\leftarrow }\limits ^{\small {\$}}}\mathbb {F}\) and sends it to \(\mathcal {P}\).

  3. \(\mathcal {P}\) replies with \(z = s + e \cdot x\). Finally \(\mathcal {V}\) checks whether \(f(z) = a + e \cdot y\).

If f is homomorphic with respect to the field operations, the protocol is clearly correct. Security of the prover (honest-verifier zero-knowledge) is achieved by simulating \((a, e, z)\) from any e by sampling \(z {\mathop {\leftarrow }\limits ^{\small {\$}}}\mathbb {F}\) and computing \(a = f(z) - e \cdot y\). Security for the verifier (special soundness) allows extracting the secret from two accepting transcripts \((a, e, z)\), \((a, e', z')\) with \(e \ne e'\) by computing \(x = (z - z') \cdot (e - e')^{-1}\), which is possible in a field.
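For intuition, here is the classic instantiation with \(f(x) = g^x\) in a multiplicative group, where the additive check \(f(z) = a + e \cdot y\) becomes \(g^z = a \cdot y^e\). The modulus, generator, and challenge space are toy choices for the sketch, not vetted parameters.

```python
import secrets

P = 2**127 - 1   # a Mersenne prime; Z_P^* as a toy group
g = 3            # generator candidate, illustration only

def prove(x, e=None):
    s = secrets.randbelow(P - 1)          # random mask for the witness
    a = pow(g, s, P)                      # commitment a = f(s)
    if e is None:
        e = secrets.randbelow(2**80)      # verifier's random challenge
    z = s + e * x                         # response, linear in the witness
    return a, e, z

def verify(y, a, e, z):
    # multiplicative form of the check f(z) = a + e*y
    return pow(g, z, P) == a * pow(y, e, P) % P

x = secrets.randbelow(P - 1)              # witness
y = pow(g, x, P)                          # public statement y = f(x)
a, e, z = prove(x)
assert verify(y, a, e, z)
```

Correctness follows from \(g^{s + ex} = g^s \cdot (g^x)^e\); the amortized lattice version below follows the same three-move shape.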

For our setting, x is an integer (or a vector thereof), and we would like to prove that \(\Vert {x}\Vert _\infty \le B\) for some bound B. For this case, Cramer and Damgård [CD09] have presented an amortized protocol (proving several pre-images at once) where s has to be chosen in a large enough interval (to statistically hide \(E \cdot x\)) and the challenge E is sampled from a set of matrices such that \((E-E')\) is invertible over \(\mathbb {Z}\) for any \(E \ne E'\). The preimage is now extracted as \(x = (E - E')^{-1}(z - z')\), thus a bound on \(\Vert {z}\Vert _\infty \) also implies a bound on \(\Vert {x}\Vert _\infty \).

However, it is not possible to make these bounds tight. Namely, an honest prover using \(\Vert {x}\Vert _\infty \le B\) will achieve that \(\Vert {z}\Vert _\infty \le B'\) for some \(B' > B\). The quotient between the two bounds is called slack. Damgård et al. [DPSZ12] also show that in the Fiat-Shamir setting (where the challenge is generated using a random oracle on a), a technique called rejection sampling can be used to reduce the slack. This involves sampling different s until the response z achieves the desired bound. In any case, we will see in Sect. 3.4 that the slack of this proof is too small to make it worthwhile using the cut-and-choose proof instead.

Figure 1 shows the functionality that the proofs above implement. For a simplified exposition we also assume that \(\mathcal {F}_\mathsf {ZKPoK}^S\) generates correct keys. In previous works this has been done by separate key registration [BDOZ11] or key generation functionalities [DPSZ12, DKL+13].

2.4 Overview of SPDZ

The SPDZ protocol [DPSZ12, DKL+13] can be viewed as a two-phase protocol where inputs are shared via an additive secret sharing scheme. First there is the pre-processing phase, where triples are generated independently of the inputs to the computation. The classical way to produce these triples is either by oblivious transfer or by homomorphic encryption. Each has its own advantages and caveats. In this work, we are only concerned with the homomorphic encryption technique, where ciphertexts are passed among the players. Since we allow parties to deviate maliciously from the protocol, they could insert too much noise in the encryption algorithm, which we mitigate by using ZK proofs.

Fig. 1. Proof of knowledge of ciphertext

These random triples are further used in the online phase, where parties interact by broadcasting data whenever a value is revealed. Privacy and correctness are then guaranteed by authenticated shared values with information-theoretic MACs on top of them.

More formally, an authenticated secret value \(x \in \mathbb {F}\) is defined as the following:

$$\begin{aligned} \llbracket {x} \rrbracket = (x^{(1)}, \dots , x^{(n)}, m^{(1)}, \dots , m^{(n)}, \varDelta ^{(1)}, \dots , \varDelta ^{(n)}) \end{aligned}$$

where each player \(P_i\) holds an additive sharing tuple \((x^{(i)}, m^{(i)}, \varDelta ^{(i)})\) such that:

$$\begin{aligned} x = \sum _{i=1}^{n} x^{(i)}, x\cdot \varDelta = \sum _{i=1}^n m^{(i)}, \varDelta = \sum _{i=1}^{n} \varDelta ^{(i)}. \end{aligned}$$

For the pre-processing phase the goal is to model a \(\mathsf {Triple}\) command which generates a tuple \((\llbracket {a} \rrbracket , \llbracket {b} \rrbracket , \llbracket {c} \rrbracket )\) where \(c=a\cdot b\) and ab are uniformly random from \(\mathbb {F}\).

To open a value \(\llbracket {x} \rrbracket \), all players \(P_i\) broadcast their shares \(x^{(i)}\), commit and then open \(m^{(i)} - x\cdot \varDelta ^{(i)}\). Afterwards they check if the sum of the latter is equal to zero. One can check multiple values at once by taking a random linear combination of \(m^{(i)} - x \cdot \varDelta ^{(i)}\) exactly as in the MAC Check protocol in Fig. 5 in Sect. 3.
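The opening and MAC check can be sketched as follows, for a single value and a toy field modulus (the commit-then-open step is elided; a trusted dealer stands in for the preprocessing):

```python
import random

P = 2**61 - 1  # toy field modulus

def share(v, n):
    # additive secret sharing of v among n parties
    s = [random.randrange(P) for _ in range(n - 1)]
    return s + [(v - sum(s)) % P]

n, x = 3, 42
delta = random.randrange(P)                 # global MAC key
x_sh = share(x, n)                          # shares x^(i)
m_sh = share(x * delta % P, n)              # MAC shares m^(i), summing to x*delta
d_sh = share(delta, n)                      # key shares delta^(i)

# opening: broadcast x^(i), then commit to and open sigma^(i) = m^(i) - x*delta^(i)
x_open = sum(x_sh) % P
sigma = [(m_sh[i] - x_open * d_sh[i]) % P for i in range(n)]
assert sum(sigma) % P == 0                  # honest opening passes

# a shifted opening moves the sum to -delta, nonzero unless delta = 0
sigma_bad = [(m_sh[i] - (x_open + 1) * d_sh[i]) % P for i in range(n)]
assert sum(sigma_bad) % P == (-delta) % P
```

Forging an opening thus requires guessing \(\varDelta \), which succeeds with probability 1/|F|.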

In the online phase the main task is to evaluate an arbitrary circuit with secret inputs. After the parties have provided their inputs using the \(\mathsf {Input}\) command, the next step is to perform addition and multiplication between authenticated shared values. Since addition is linear, it can be done via local computation. However, multiplying two values \(\llbracket {x} \rrbracket , \llbracket {y} \rrbracket \) requires some interaction between the parties. To compute \(\llbracket {x \cdot y} \rrbracket \), a fresh random triple \(\llbracket {a} \rrbracket , \llbracket {b} \rrbracket , \llbracket {c} \rrbracket = \llbracket {ab} \rrbracket \) has to be available for Beaver’s trick [Bea92]. It works by opening \(\llbracket {x - a} \rrbracket \) and \(\llbracket {y - b} \rrbracket \) to get \(\epsilon \) and \(\rho \) respectively. Then the authenticated product can be obtained by setting \(\llbracket {x\cdot y} \rrbracket \leftarrow \llbracket {c} \rrbracket + \epsilon \llbracket {b} \rrbracket + \rho \llbracket {a} \rrbracket + \epsilon \cdot \rho \).
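Beaver's trick is easy to verify in code; the following sketch uses a toy prime and a trusted dealer in place of the preprocessing phase (MACs are omitted for brevity):

```python
import random

P = 2**61 - 1  # toy field modulus

def share(v, n):
    s = [random.randrange(P) for _ in range(n - 1)]
    return s + [(v - sum(s)) % P]

def open_(shares):
    return sum(shares) % P

n, x, y = 3, 123, 456
a, b = random.randrange(P), random.randrange(P)
c = a * b % P                                   # preprocessed triple (a, b, ab)
x_sh, y_sh = share(x, n), share(y, n)
a_sh, b_sh, c_sh = share(a, n), share(b, n), share(c, n)

# open epsilon = x - a and rho = y - b (these reveal nothing: a, b are random)
eps = open_([(xi - ai) % P for xi, ai in zip(x_sh, a_sh)])
rho = open_([(yi - bi) % P for yi, bi in zip(y_sh, b_sh)])

# [xy] = [c] + eps*[b] + rho*[a] + eps*rho (the constant is added by one party)
z_sh = [(ci + eps * bi + rho * ai) % P for ci, ai, bi in zip(c_sh, a_sh, b_sh)]
z_sh[0] = (z_sh[0] + eps * rho) % P
assert open_(z_sh) == x * y % P
```

Expanding \(c + \epsilon b + \rho a + \epsilon \rho \) with \(\epsilon = x-a\), \(\rho = y-b\) indeed telescopes to xy.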

Offline Phase. We now outline the core ideas of the preprocessing phase of SPDZ. Assume that the parties have a global public key and a secret sharing of the secret key \(\varDelta \), and that there is a distributed decryption protocol that allows the parties to decrypt an encryption such that they receive a secret sharing of the cleartext (see the Reshare procedure by Damgård et al. [DPSZ12] for details).

For passive security only, the parties can simply broadcast encryptions of randomly sampled shares \(a_i,b_i\) and their share of the MAC key \(\varDelta _i\). These encryptions can be added up and multiplied to produce encryptions of \((a \cdot b, a \cdot \varDelta , b \cdot \varDelta , a \cdot b \cdot \varDelta )\) if the encryption allows multiplicative depth two. Distributed decryption then allows the parties to receive an additive secret sharing of each of those values, which already is enough for a triple. Since achieving a higher multiplicative depth is relatively expensive, SPDZ only uses a scheme with multiplicative depth one and extends the distributed decryption to produce a fresh encryption of \(a \cdot b\), which then can be multiplied with the encryption of \(\varDelta \).

In the context of an active adversary there are two main issues: First, the ciphertexts input by corrupted parties have to be correct and independent of the honest parties’ ciphertexts. This is where zero-knowledge is applied to prove that certain values lie within a certain bound. Second, the distributed decryption protocol actually allows the adversary to add an error - that is, the parties can end up with a triple \((a,b,ab+e)\) with e known to the adversary and where the MACs have additional errors as well. While an error on a MAC will make the MAC check fail in any case, the problem of an incorrect triple requires more attention. This is where the so-called SPDZ sacrifice comes in. Imagine two triples with potential errors \((\llbracket {a} \rrbracket ,\llbracket {b} \rrbracket ,\llbracket {ab+e} \rrbracket )\) and \((\llbracket {a'} \rrbracket ,\llbracket {b'} \rrbracket ,\llbracket {a'b'+e'} \rrbracket )\), and let t be a random field element. Then,

$$\begin{aligned}&t \cdot (ab+e) - (a'b'+e') - (ta - a') \cdot b - a' \cdot (b - b') \\&= tab + te - a'b' - e' - tab + a'b - a'b + a'b' \\&= te - e', \end{aligned}$$

which is 0 with probability negligible in \(\mathsf {sec}\) for a field of size at least \(2^{\mathsf {sec}}\) if either \(e \ne 0\) or \(e' \ne 0\). The use of MACs means that the adversary cannot forge the result of this computation, hence any error will be caught with overwhelming probability. With the additive secret sharing of our triples, the parties have to reveal \(\llbracket {ta-a'} \rrbracket \) and \(\llbracket {b-b'} \rrbracket \), so one of the triples has to be discarded in order to keep the other one “fresh” for use in the online phase. For MASCOT, Keller et al. [KOS16] found that the sacrifice also works with two triples (a, b, ab) and \((a',b,a'b)\), for which \(b - b' = 0\). Such a combined pair is cheaper to produce (both in MASCOT and SPDZ) and requires revealing less for the check.
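The algebra of the combined-pair sacrifice can be checked directly on opened values (shares and MACs are omitted here; in the protocol all terms are computed on authenticated sharings):

```python
import random

P = 2**61 - 1  # toy field modulus

def sacrifice_ok(a, b, c, a2, c2, t):
    # check t*c - c2 - (t*a - a2)*b == 0 for the pair ((a, b, c), (a2, b, c2))
    return (t * c - c2 - (t * a - a2) * b) % P == 0

a, b, a2, t = (random.randrange(1, P) for _ in range(4))
assert sacrifice_ok(a, b, a * b % P, a2, a2 * b % P, t)       # correct pair passes
# an error e = 1 on the first product survives as t*e != 0: the check fails
assert not sacrifice_ok(a, b, (a * b + 1) % P, a2, a2 * b % P, t)
```

With \(b = b'\) the \(b - b'\) term vanishes, which is why only \(ta - a'\) needs to be revealed.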

3 Low Gear: Triple Generation Using Semi-homomorphic Encryption

The multiplication of secret numbers is at the heart of many secret sharing-based multiparty computation protocols because linear secret sharing schemes make addition easy, and the two operations together are complete. Both Bendlin et al. [BDOZ11] and Keller et al. [KOS16] have effectively reduced the problem of secure computation to computing an additive secret sharing of the product of two numbers known to two different parties. The former uses semi-homomorphic encryption, which allows adding two ciphertexts to get an encryption of the sum of the cleartexts, whereas the latter uses oblivious transfer, which is known to be complete for any protocol.

The semi-homomorphic solution works roughly as follows: One party sends an encryption \(\mathsf {Enc}(a)\) of their input under their own public key to the other, which replies by \(C := b \cdot \mathsf {Enc}(a) - \mathsf {Enc}(c_B)\), where b denotes the second party’s input, and \(c_B\) is chosen at random. Any semi-homomorphic encryption scheme allows the multiplication of a known value with a ciphertext, hence the decryption of the second message is \(c_A := b \cdot a - c_B\), which makes \((c_A,c_B)\) an additive secret sharing of \(a \cdot b\). Here the noise of C might reveal information about b but this can be mitigated by adding random noise from an interval that is \(\mathsf {sec}\) larger than the maximum noise of C. This technique, sometimes called “drowning”, is also used in the distributed decryption of SPDZ.
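The following sketch instantiates this two-party multiplication with a toy Paillier cryptosystem as the semi-homomorphic scheme. The paper's instantiation is BGV; Paillier is used here only because a minimal implementation is short, and the primes are far too small for any real security. Drowning noise is also omitted since Paillier decryption is exact.

```python
import math
import random

# toy Paillier: additively homomorphic, constant multiplication = exponentiation
p_, q_ = 999983, 1000003          # small primes, illustration only
n = p_ * q_
n2 = n * n
lam = math.lcm(p_ - 1, q_ - 1)

def enc(m):
    # Enc(m) = (1+n)^m * r^n mod n^2 for random r coprime to n
    while True:
        r = random.randrange(2, n)
        if math.gcd(r, n) == 1:
            return pow(1 + n, m, n2) * pow(r, n, n2) % n2

def dec(c):
    # Dec(c) = L(c^lam mod n^2) * lam^{-1} mod n, with L(u) = (u - 1) / n
    return (pow(c, lam, n2) - 1) // n * pow(lam, -1, n) % n

# Party A encrypts its input a under its own key and sends Enc(a) to B
a = random.randrange(n)
Ea = enc(a)

# Party B replies with C = b*Enc(a) - Enc(c_B) for its input b and random c_B
b, c_B = random.randrange(n), random.randrange(n)
C = pow(Ea, b, n2) * enc((-c_B) % n) % n2

# A decrypts: c_A = a*b - c_B, so (c_A, c_B) additively shares a*b mod n
c_A = dec(C)
assert (c_A + c_B) % n == a * b % n
```

The structure is identical for BGV: ciphertext addition and multiplication by a known constant are all the protocol needs.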

In the context of a malicious adversary there are two concerns with the above protocol: \(\mathsf {Enc}(a)\) might not be a correct encryption and C might not be computed correctly. In both cases, Bendlin et al. use a zero-knowledge proof of knowledge to make sure that both parties behave correctly.

To prove the correctness of \(\mathsf {Enc}(a)\), there are relatively efficient proofs based on amortized \(\varSigma \)-protocols (reducing the overhead per ciphertext by processing several ciphertexts at once), but for the proof of correct multiplication such amortization is not possible in our context because the underlying ciphertext \(\mathsf {Enc}(a)\) differs in every instance. The main goal of our work in this section is therefore to avoid the proof of correct multiplication altogether and delay it to a later check in the protocol described in the previous section.

Recall that the goal in this family of protocols is to generate random multiplication triples (a, b, ab). The sacrifice will guarantee that the parties have shares of correct triples, but there is a possibility of a selective failure attack: if C was not computed correctly, the mere fact that the check passed (otherwise the parties abort without using their private data) can reveal information meant to stay private in the protocol. We will show that assuming the enhanced CPA notion from Sect. 3.1 for the underlying cryptosystem suffices to rule out such leakage.

In Sect. 3.2, we will then use our multiplication protocol a first time to compute SPDZ-style MACs, that is, additive secret sharings of the product of a value and a global MAC key, which itself is secret-shared additively. It is straightforward to compute such a global product from the two-party protocol: observe that \(\sum _i a_i \cdot \sum _i b_i = \sum _{i,j} a_i \cdot b_j\). Every summand on the right-hand side can be computed either locally (for \(i = j\)) or by the two-party protocol (for \(i \ne j\)), and summing the resulting shares yields an additive sharing of the product.
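The decomposition into pairwise products can be sketched as follows, with a trusted sampler standing in for the semi-homomorphic two-party multiplication (field modulus and party count are toy choices):

```python
import random

P = 2**61 - 1  # toy field modulus
n = 3
a = [random.randrange(P) for _ in range(n)]   # P_i holds a_i (e.g. a value share)
b = [random.randrange(P) for _ in range(n)]   # P_i holds b_i (e.g. a key share)

# P_i's share of the product starts with its local diagonal term a_i * b_i
prod_sh = [a[i] * b[i] % P for i in range(n)]

# each cross term a_i * b_j (i != j) is split between P_i and P_j; a trusted
# sampler replaces the semi-homomorphic two-party protocol in this sketch
for i in range(n):
    for j in range(n):
        if i != j:
            s = random.randrange(P)
            prod_sh[i] = (prod_sh[i] + s) % P
            prod_sh[j] = (prod_sh[j] + a[i] * b[j] - s) % P

# the shares add up to (sum a_i) * (sum b_i)
assert sum(prod_sh) % P == sum(a) * sum(b) % P
```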

Building on the authentication protocol, we present the multiplication triple generation in Sect. 3.3 using the two-party multiplication protocol once more. Note that the after-the-fact check of correct multiplication works differently in the two protocols. In the authentication protocol, we make use of the fact that changing values are always multiplied with the same share of the MAC key. In the triple generation, however, both values change from triple to triple, thus we rely on the SPDZ sacrifice there. For this, we use a trick used by Keller et al. that reduces the complexity by generating a pair of triples \(((a,b,ab),(a',b,a'b))\) for the sacrifice instead of two independent triples.

Finally, we present our choice of BGV parameters in Sect. 3.4, following the considerations of Damgård et al. [DKL+13], which in turn are based on Gentry et al. [GHS12b]. We found that the ciphertext modulus is about 100 bits shorter compared to original SPDZ for fields of size \(2^{64}\) to \(2^{128}\), which makes a significant contribution to the reduced complexity of our protocol because SPDZ requires a modulus of bit length about 300 for 64-bit fields and 40-bit security.

3.1 Enhanced CPA Security

We want to reduce the security of our protocol to an enhanced version of the CPA game for the encryption scheme. In other words, if the encryption scheme in use is enhanced-CPA secure, then even a selective failure caused by the adversary does not reveal private information.

Fig. 2. Enhanced CPA game

We say that an encryption scheme is enhanced-CPA secure if, for all PPT adversaries in the game from Fig. 2, \(\Pr [b=b'] - 1/2\) is negligible in \(k\).

Achieving Enhanced-CPA Security. The game without zero-checks in step 3 clearly reduces to the standard CPA game. Furthermore, we have to make sure that the oracle queries cannot be used to reveal information about m. The cryptosystem is designed to allow only affine linear operations, which limits the adversary to succeeding with negligible probability because of the high entropy of m. However, if the cryptosystem allowed generating an encryption of a bit of m from \(\mathsf {Enc}_\mathsf {pk}(m)\), the adversary could test this bit for zero with success probability 1/2. Therefore, we have to assume that non-linear operations on ciphertexts are not possible. To this end, Bitansky et al. [BCI+13] have introduced the notion of linear targeted malleability. A stronger notion thereof, linear-only encryption, has been conjectured by Boneh et al. [BISW17] to apply to the cryptosystem by Peikert et al. [PVW08], which is based on the ring learning with errors problem. The definition by Bitansky et al. is as follows.

Definition 1

An encryption scheme has the linear targeted malleability property if for any polynomial-size adversary A and plaintext generator \(\mathcal {M}\) there is a polynomial-size simulator S such that, for any sufficiently large \(\lambda \in \mathbb {N}\), and any auxiliary input \(z \in \{0, 1\}^{\mathsf {poly}(\lambda )}\), the following two distributions are computationally indistinguishable:

$$\begin{aligned} \left\{ \begin{array}{l} \mathsf {pk}, \\ a_1, \dots , a_m, \\ s, \\ \mathsf {Dec}_\mathsf {sk}(c_1'), \dots , \mathsf {Dec}_\mathsf {sk}(c_k') \end{array} \left| \begin{array}{r} (\mathsf {sk}, \mathsf {pk}) \leftarrow \mathsf {Gen}(1^\lambda ) \\ (s, a_1, \dots , a_m) \leftarrow \mathcal {M}(\mathsf {pk}) \\ (c_1, \dots , c_m) \leftarrow (\mathsf {Enc}_\mathsf {pk}(a_1), \dots , \mathsf {Enc}_\mathsf {pk}(a_m)) \\ (c_1', \dots , c_k') \leftarrow A(\mathsf {pk}, c_1, \dots , c_m; z) \\ \text {where} \\ \mathsf {ImVer}_\mathsf {sk}(c_1') = 1, \dots , \mathsf {ImVer}_\mathsf {sk}(c_k') = 1 \end{array} \right. \right\} \\ \text {and}\qquad \qquad \qquad \qquad \qquad \qquad \qquad \quad \\ \left\{ \begin{array}{l} \mathsf {pk}, \\ a_1, \dots , a_m, \\ s, \\ a_1', \dots , a_k' \end{array} \left| \begin{array}{r} (\mathsf {sk}, \mathsf {pk}) \leftarrow \mathsf {Gen}(1^\lambda ) \\ (s, a_1, \dots , a_m) \leftarrow \mathcal {M}(\mathsf {pk}) \\ (\mathrm \varPi , \mathbf {b}) \leftarrow S(\mathsf {pk}; z) \\ (a_1', \dots , a_k')^\top \leftarrow \mathrm \varPi \cdot (a_1, \dots , a_m)^\top + \mathbf {b}\end{array} \right. \right\} \end{aligned}$$

where \(\mathrm \varPi \in \mathbb {F}^{k \times m}\), \(\mathbf {b}\in \mathbb {F}^k\), and s is some arbitrary string (possibly correlated with the plaintexts).

In the context of BGV, the definition can easily be extended to vectors of field elements. Furthermore, verifying whether a ciphertext is the image of the encryption (\(\mathsf {ImVer}\)) can be trivially done by checking membership in \(R_q \times R_q\), which is possible without the secret key.

It is straightforward to see that linear targeted malleability allows reducing the enhanced-CPA game to a game without a zero-test oracle. We simply replace the decryption of the adversary’s queries by \(a_1', \dots , a_k'\) computed using S according to the definition, which can be tested for zero without knowing the secret key. The two games are computationally indistinguishable by definition, and the modified one can be reduced to the normal CPA game as argued above.

We now argue that BGV as used by us is a valid candidate for linear targeted malleability. First, the definition excludes computation on ciphertexts other than affine linear maps. Most notably, this excludes multiplication. Since we do not generate the key-switching material used by Damgård et al. [DKL+13], there is no obvious way of computing multiplications or operations of any higher order.

Second, the definition requires the handling of ciphertexts that were generated by the adversary without following the encryption algorithm. For example, \(\mathsf {Dec}_\mathsf {sk}(0,1) = s \bmod p\). The decryption of such ciphertexts can be simulated by sampling a secret key and computing the decryption accordingly. However, to avoid a security degradation due to independent consideration of standard CPA security and linear targeted malleability, we add \(\mathsf {sec}\) bits of entropy to the secret key as follows.

The key generation of BGV generates s of length N such that s has \(h = 64\) non-zero entries at randomly chosen places, which are chosen uniformly from \(\{-1,1\}\). The entropy is therefore

$$\begin{aligned} \log {N \atopwithdelims ()h} + h. \end{aligned}$$

It is easy to see that choosing \(h' = h + \mathsf {sec}\) non-zero entries increases the entropy by \(\mathsf {sec}\) bits for large enough N. Because \({N \atopwithdelims ()k}\) increases monotonically for \(k \le N/2\),

$$\begin{aligned} {N \atopwithdelims ()h + \mathsf {sec}} \ge {N \atopwithdelims ()h} \end{aligned}$$

for \(N \ge 2 \cdot (h + \mathsf {sec})\). It follows that

$$ \log {N \atopwithdelims ()h + \mathsf {sec}} + h + \mathsf {sec}\ge \left( \log {N \atopwithdelims ()h} + h\right) + \mathsf {sec}, $$

which is the desired result. We will later see that N is much bigger than \(2 \cdot (h + \mathsf {sec})\) for \(h = 64\) and \(\mathsf {sec}= 128\).
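The entropy gain can be checked numerically. The Python sketch below (with the illustrative choice \(N = 2^{15}\), which satisfies \(N \ge 2 \cdot (h + \mathsf {sec})\)) evaluates the entropy formula for h and \(h + \mathsf {sec}\):

```python
# Numeric check that raising the Hamming weight of the BGV secret from h to
# h + sec adds at least sec bits of entropy when N >= 2*(h + sec).
import math

def key_entropy(N, h):
    # h nonzero positions out of N, each entry uniform in {-1, 1}
    return math.log2(math.comb(N, h)) + h

N, h, sec = 2**15, 64, 128
assert N >= 2 * (h + sec)
gain = key_entropy(N, h + sec) - key_entropy(N, h)
assert gain >= sec
```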

3.2 Input Authentication

As in Keller et al. [KOS16], we want to implement a functionality (Fig. 3) that commits the parties to secret sharings and that provides the secure computation of linear combinations of inputs. However, instead of using oblivious transfer for the pairwise multiplication of secret numbers we use our building block based on semi-homomorphic encryption. See Fig. 4 for our protocol.

Fig. 3. Functionality \(\mathcal {F}_{\llbracket \cdot \rrbracket }\)

Fig. 4. Protocol for n-party input authentication

Fig. 5. Protocol for MAC checking

In case parties \(P_i\) and \(P_j\) are honest,

$$\begin{aligned} \varDelta ^{(i)}\cdot \rho - \sigma ^{(i)}- \sum _{k=1}^m t_k \cdot \mathbf {d}^{(i)}_k&= \varDelta ^{(i)}\cdot (\sum _{k=1}^m t_k \cdot x_k) - \sum _{k=1}^m t_k \cdot e^{(i)}_k - \sum _{k=1}^m t_k \cdot \mathbf {d}^{(i)}_k \\&= \sum _{k=1}^m t_k \cdot (\varDelta ^{(i)}\cdot x_k - e^{(i)}_k - \mathbf {d}^{(i)}_k) = 0 \end{aligned}$$

for all \(i \ne j\). This means that \(P_i\)’s check succeeds in this case. The last equation follows from the homomorphism of the encryption scheme.

Furthermore, one can check similarly that

$$\begin{aligned} \sum _i m^{(i)}_k = x_k \cdot \sum _i \varDelta ^{(i)}, \end{aligned}$$

which is the desired equation underlying the MAC. If it does not hold because of \(P_j\)’s behaviour, we would like the check to fail for some honest \(P_i\). Informally, the fact that \(P_j\) cannot predict the coefficients \(t_k\) makes it impossible for \(P_j\) to provide correct \((\rho , \sigma ^{(i)})\) to an honest party \(P_i\) after computing \(C^{(i)}\) incorrectly. However, this opens the possibility for leakage by a selective failure attack, which is why we need the underlying cryptosystem to achieve enhanced-CPA security.
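Both identities can be verified on toy data. The following Python sketch is an illustrative simplification (the share variables and their distribution are made up for the demonstration, not the protocol's actual message flow): it checks the per-party equation \(\varDelta ^{(i)}\cdot \rho - \sigma ^{(i)}- \sum _k t_k \cdot \mathbf {d}^{(i)}_k = 0\) and the summed MAC equation over a prime field:

```python
# Toy simulation of the MAC-check identities over F_p (illustrative names).
import random

p = (1 << 61) - 1                # a Mersenne prime field (toy choice)
n, m = 3, 4                      # parties and values
Delta = [random.randrange(p) for _ in range(n)]
x = [random.randrange(p) for _ in range(m)]
# Shares such that Delta^(i) * x_k = e[i][k] + d[i][k] (mod p)
e = [[random.randrange(p) for _ in range(m)] for _ in range(n)]
d = [[(Delta[i] * x[k] - e[i][k]) % p for k in range(m)] for i in range(n)]

t = [random.randrange(p) for _ in range(m)]        # random coefficients
rho = sum(t[k] * x[k] for k in range(m)) % p
for i in range(n):
    sigma = sum(t[k] * e[i][k] for k in range(m)) % p
    check = (Delta[i] * rho - sigma
             - sum(t[k] * d[i][k] for k in range(m))) % p
    assert check == 0            # P_i's check succeeds for honest parties

# The MAC shares sum to x_k times the sum of the MAC keys.
for k in range(m):
    lhs = sum(e[i][k] + d[i][k] for i in range(n)) % p
    assert lhs == (x[k] * sum(Delta)) % p
```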

The most intricate part of the simulator \(\mathcal {S}_{\llbracket \cdot \rrbracket }\) (Fig. 6) is simulating the Input phase for a corrupted \(P_j\); the same phase for an honest \(P_j\) is straightforward given that \(\mathsf {Enc}'\) statistically hides the noise of \(\mathbf {x}^{(i)}\cdot \mathsf {Enc}(\varvec{\varDelta })\). Note that \((x_m, e_m^{(i)}, d_m^{(i)})\) are only used for the check. This maintains \(P_j\)’s privacy even after sending \(\rho \) and \(\{\sigma ^{(i)}\}_{i \ne j}\).

Fig. 6. Simulator for \(\varPi _\mathsf {\llbracket \cdot \rrbracket }\)

Theorem 1

\(\varPi _\mathsf {\llbracket \cdot \rrbracket }\) implements \(\mathcal {F}_{\llbracket \cdot \rrbracket }\) in the \(\mathcal F_\mathsf {Commit}\)-hybrid model with rewinding in the presence of a dishonest majority if the underlying cryptosystem achieves enhanced CPA-security.

Proof

(Sketch). We focus on the case of a corrupted \(P_j\) in the Input phase because the adversary has a larger degree of freedom with the encryptions \(C^{(i)}\). However, with rewinding in step 6 we can extract the values used by the adversary. This extraction takes time inversely proportional to the success probability, as per the soundness argument for \(\varSigma \)-protocols. To see this, consider that the space of all possible challenges \(\{t_k\}_{k=1}^m\) has size \(|\mathbb {F}|^m\). The extractor requires the responses to m linearly independent challenges \(\{t_k\}_{k=1}^m\). The adversary can only prevent this by restricting the correct responses to a proper subspace \(S \subset \mathbb {F}^m\), that is, \(|S| \le |\mathbb {F}|^{m-1}\). Such an adversary will succeed with probability at most \(|\mathbb {F}|^{m-1}/|\mathbb {F}|^m = |\mathbb {F}|^{-1}\), which is negligible because we require the size of \(\mathbb {F}\) to be exponential in the security parameter. It follows that the soundness extractor for \(\varSigma \)-protocols by Damgård [Dam02] can be adapted to our case.
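The counting step can be illustrated empirically. The Python sketch below uses a deliberately small field (whereas the protocol requires \(|\mathbb {F}|\) exponential in the security parameter) and estimates the probability that a uniformly random challenge lands in a fixed hyperplane:

```python
# Monte-Carlo illustration of the soundness counting argument: a uniform
# challenge hits a fixed hyperplane of F^m with probability 1/|F|.
import random

q, m, trials = 257, 5, 20000     # |F| = 257 here only for the demonstration
v = [1, 2, 3, 4, 5]              # hyperplane: vectors t with <t, v> = 0
hits = 0
for _ in range(trials):
    t = [random.randrange(q) for _ in range(m)]
    if sum(a * b for a, b in zip(t, v)) % q == 0:
        hits += 1
frac = hits / trials
assert frac < 3 / q              # close to 1/q = |F|^{-1}
```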

After the extraction, it is straightforward to simulate the rest of the protocol because the Linear Combination phase does not involve communication, and producing a correct MAC in the Check phase for an incorrect output in the Open phase is equivalent to extracting \(\varDelta \). This argument can also be extended to the random linear combination used in the Check phase similarly to Keller et al. [KOS16]. It is easy to see that extracting \(\varDelta \) is in turn equivalent to breaking the security of the underlying cryptosystem.

We therefore construct a distinguisher in the enhanced-CPA security game from an environment distinguishing between the real and the ideal world. The difference between \(\mathsf {Enc}_{\mathsf {pk}_{ij}}(\varDelta ^{(i)})\) in the real world and \(E = \mathsf {Enc}_{\mathsf {pk}_{ij}}(x)\) for random x in the simulation can trivially be reduced to our CPA security game (using the encryption as c in the game) because the adversary never receives \(\varDelta ^{(i)}\). Furthermore, \(\mathbf {x}\) and \(\mathbf {e}^{(i)}\) extracted from the adversary can be used to compute \(C' = C^{(i)}- \mathbf {x}\cdot E - \mathbf {e}^{(i)}\). Via the check conducted by the honest party \(P_i\), the adversary learns whether \(C'\) decrypts to zero. We therefore forward \(C'\) to the zero test in our enhanced CPA game.

3.3 Triple Generation

Recall that the goal is to produce random authenticated triples \((\llbracket {a} \rrbracket , \llbracket {b} \rrbracket , \llbracket {ab} \rrbracket )\) such that a and b are randomly sampled from \(\mathbb {F}\) as described in Fig. 8. Our protocol in Fig. 7 is modeled closely after MASCOT [KOS16], replacing oblivious transfer with semi-homomorphic encryption. The construction of a “global” multiplication from a two-party protocol works exactly the same way in both cases. The Sacrifice step is exactly the same as in SPDZ and MASCOT and essentially guarantees that corrupted parties have used the same inputs in the Multiplication and Authentication steps. This is the only freedom the adversary has because all other arithmetic is handled by \(\mathcal {F}_{\llbracket \cdot \rrbracket }\) at this stage.

Fig. 7. Protocol for random triple generation

Fig. 8. Functionality for random triple generation.

Theorem 2

\(\varPi _\mathsf {Triple}\) implements \(\mathcal {F}_\mathsf {Triple}\) in the \((\mathcal {F}_{\llbracket \cdot \rrbracket }, \mathcal {F}_\mathsf {Rand})\)-hybrid model with a dishonest majority of parties.

Proof

(Sketch). For the proof we use \(\mathcal {S}_{\mathsf {Triple}}\) in Fig. 9. The simulator is based on two important facts: First, it can decrypt \(C^{(ji)}\) for a corrupted party \(P_j\) because it generates the keys emulating \(\mathcal {F}_\mathsf {ZKPoK}^S\). Second, the adversary is committed to all shares of corrupted parties by the input to \(\mathcal {F}_{\llbracket \cdot \rrbracket }\) in the Authenticate step. This allows the simulator to determine exactly whether the Sacrifice step in \(\varPi _\mathsf {\llbracket \cdot \rrbracket }\) will fail. Furthermore, the adversary only learns encryptions of honest parties’ shares, corrupted parties’ shares, \(\varvec{\rho }\), and the result of the check. If the check fails, the protocol aborts. \(\varvec{\rho }\) is independent of any output information because \(\hat{\mathbf {b}}\) and \(\hat{\mathbf {c}}\) are discarded at the end. Finally, an environment deducing information from the encryptions can be used to break the enhanced-CPA security of the underlying cryptosystem. In addition, the environment only learns handles to triples in the Output steps, from which no information can be deduced.

Fig. 9. Simulator for \(\varPi _\mathsf {Triple}\)

3.4 Parameter Choice

Since we do not need multiplication of ciphertexts, the list of moduli used in previous works [DKL+13, GHS12b] collapses to one q (\( = q_1 = q_0 = p_0\) depending on context). The other main parameter is the number of ciphertext slots denoted by \(N = \phi (m)\). Gentry et al. [GHS12b] give the following inequality for the largest modulus:

$$\begin{aligned} N \ge \frac{\log (q/\sigma )(k + 110)}{7.2} \end{aligned}$$

for a computational security k, which gives

$$\begin{aligned} N \ge \log q \cdot 33.1 \end{aligned}$$
(1)

for 128-bit security. \(\sigma = 3.2\) does not make a difference in this inequality.
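As a sanity check, inequality (1) can be evaluated directly. The short Python sketch below (the function name is ours) reproduces the factor 33.1:

```python
# Minimum ring dimension N from the Gentry et al. bound
# N >= log2(q/sigma) * (k + 110) / 7.2, for computational security k.
import math

def min_N(log2_q, k=128, sigma=3.2):
    return (log2_q - math.log2(sigma)) * (k + 110) / 7.2

# sigma barely matters; the factor is (128 + 110) / 7.2 ~ 33.1 per bit of q
assert abs((128 + 110) / 7.2 - 33.1) < 0.1
assert min_N(120) > 3900 and min_N(384) > 12500
```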

The second constraint on q and \(\phi (m)\) depends on the noise of the ciphertext to be decrypted. Damgård et al. compute the bound \(B_\mathsf {clean}\) on the noise of a freshly generated ciphertext:

$$\begin{aligned} B_\mathsf {clean}= N \cdot p / 2 + p \cdot \sigma (16 \cdot N \cdot \sqrt{n/2} + 6 \cdot \sqrt{N} + 16 \cdot \sqrt{n \cdot h \cdot N}) \end{aligned}$$

Here, p denotes the plaintext modulus, and n denotes the number of parties, which appears because of the distributed ciphertext generation (the secret is the sum of n secret keys). Setting \(n = 1\) because we do not use distributed ciphertext generation, and \(h = 64 + \mathsf {sec}\le 192\) and \(\sigma = 3.2\) as in previous works, we get

$$\begin{aligned} B_\mathsf {clean}\le p \cdot (37 N + 685 \sqrt{N}). \end{aligned}$$

In the multiplication protocol, one party multiplies the ciphertext with a number in \(\mathbb {F}_p\), adds a number in \(\mathbb {F}_p\), and then “drowns” the noise with statistical security \(\mathsf {sec}\) (adding extra noise sampling from an interval that is \(2^\mathsf {sec}\) larger than the current noise bound). Furthermore, depending on the proof of knowledge used, we can only assume that the noise of the ciphertext being sent is \(S \cdot B_\mathsf {clean}\) for some soundness slack \(S \ge 1\). Therefore, the noise before decryption is bounded by

$$\begin{aligned} p \cdot S \cdot B_\mathsf {clean}\cdot (1 + 2^\mathsf {sec}), \end{aligned}$$

which must be smaller than q/2 for correct decryption. Hence,

$$\begin{aligned} 2 \cdot p^2 \cdot S \cdot \left( 37 N + 685 \sqrt{N}\right) (1 + 2^\mathsf {sec}) < q. \end{aligned}$$
(2)

Putting things together, (2) implies that, loosely, \(120 \le \log q\) or \(384 \le \log q\) if \(\mathsf {sec}= 40\) or \(\mathsf {sec}= 128\) and \(p \ge 2^\mathsf {sec}\) (the latter is a requirement of SPDZ-like sacrificing). Using this in (1) gives \(N \ge 3972\) or \(N \ge 12711\). For both values of N as well as a ten times larger N,

$$\begin{aligned} \log \left( 37 N + 685 \sqrt{N}\right) \approx 20 \pm 2. \end{aligned}$$

Hence,

$$\begin{aligned} \log q \gtrsim 21 + 2\log p + \log S + \mathsf {sec}\pm 2. \end{aligned}$$

The proof of knowledge in the first version of SPDZ [DPSZ12] has the worst soundness slack with

$$\begin{aligned} S = N \cdot \mathsf {sec}^2 \cdot 2^{\mathsf {sec}/2 + 8}. \end{aligned}$$

Thus,

$$\begin{aligned} \log S \le \log N + 2 \log \mathsf {sec}+ \mathsf {sec}/2 + 8 \end{aligned}$$

and

$$ \log q \gtrsim 29 + 2\log p + 3 \mathsf {sec}/ 2 + 2 \log \mathsf {sec}+ \log N \pm 2. $$
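Because (1) and (2) are mutually dependent through N, \(\log q\), and the slack S, concrete parameters can be found by a small fixed-point iteration. The Python sketch below is our own helper (restricted to powers of two for N, as our implementation requires); like the estimates above, the resulting numbers are rough:

```python
# Fixed-point computation of log q and N: the modulus bound (2) depends on N
# through B_clean and the DPSZ12 slack S, while (1) lower-bounds N in terms
# of log q, so we iterate over powers of two until both hold.
import math

def params(sec, log_p):
    N = 1024                         # start small; N stays a power of two
    while True:
        log_S = math.log2(N) + 2 * math.log2(sec) + sec / 2 + 8  # DPSZ12 slack
        # (2): 2 * p^2 * S * (37N + 685 sqrt(N)) * (1 + 2^sec) < q,
        # using log2(1 + 2^sec) ~ sec
        log_q = 1 + 2 * log_p + log_S \
                + math.log2(37 * N + 685 * math.sqrt(N)) + sec
        N_min = log_q * 33.1         # (1) for 128-bit computational security
        if N >= N_min:
            return N, math.ceil(log_q)
        N *= 2

N, log_q = params(sec=128, log_p=128)
assert N >= 12711                    # at least the slack-free bound
assert (N & (N - 1)) == 0            # power of two
```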

Note that, even though this estimate is now five years old, we found our parameters to hold against more recent estimates [APS15] tested using the script that is available online [Alb17]. The main reason is that our parameters have a considerable margin because we require N to be a power of two.

More recently, Damgård et al. [CDXY17] presented an improved version of the cut-and-choose proof used in a previous implementation of SPDZ [DKL+13], but the reduced slack does not justify the increased complexity caused by several additional ciphertexts being computed and sent in the proof. Consider that, even for \(\mathsf {sec}=128\) and \(N=2^{16}\) (the latter being typical for our parameters), \(\log S\) is about 100, increasing the ciphertext modulus length by less than 25%.

We have calculated the ciphertext modulus q’s bit length for various parameters, for our protocol with semi-homomorphic encryption and for SPDZ (using somewhat homomorphic encryption). We then instantiated both protocols with several zero-knowledge proofs, such as the Schnorr-like protocol [CD09, DPSZ12] and the recent cut-and-choose proof [CDXY17]. Table 1 shows the results of our calculation as well as the results given by Damgård et al. [DKL+13]. One can see that using cut-and-choose instead of the Schnorr-like protocol does not make any difference for SPDZ. This is because the scaling (also called modulus switching) involves division by a number larger than the largest possible slack of the Schnorr-like protocol (roughly \(2^{100}\)), hence the slack is eliminated. For our Low Gear protocol, the slack has a slight impact, increasing the size of a ciphertext by up to 25%. However, this does not justify the use of a cut-and-choose proof because it involves sending seven instead of two extra ciphertexts per proof.

Table 1 also shows that Low Gear ciphertexts are about 30% shorter than SPDZ ciphertexts. Consider that Table 3 in Sect. 5 shows a reduction in the communication from SPDZ to Low Gear of up to 50%. The main reason for the additional reduction is the fact that, for one guaranteed triple, SPDZ involves producing two triples \((a,b,c)\), \((d,e,f)\), of which \((a,b,d,e)\) require a zero-knowledge proof. In Low Gear on the other hand, we produce \((a,b,c,\hat{b},\hat{c})\), of which only \(a\) requires a zero-knowledge proof.

4 High Gear: SPDZ with Global ZKPoK Check

In terms of computation, the most expensive part of SPDZ is anything related to the encryption scheme: encryption, decryption, and homomorphic operations. The encryption algorithm is used not only for inputs but also by both the prover and the verifier in the zero-knowledge proof. While a non-interactive zero-knowledge protocol allows each party to generate only one proof per input, independently of the number of parties, every party has to verify every other party’s proof because every other party is potentially corrupted. With a growing number of parties, this is clearly the computational bottleneck of the protocol. In this section, we present a way to avoid this by summing all proofs and checking only the sum. This is similar to the threshold proofs presented by Keller et al. [KMR12]. However, this reduces neither the communication nor the asymptotic computation because every party still has to send every proof to every party and then sum the received proofs. Nevertheless, summing the proofs is much cheaper than verifying them individually.

Table 1. Ciphertext modulus bit length (\(\log (q)\)) for two parties.

The High Gear protocol is meant to surpass Low Gear when executed with a high number of parties. To achieve this, we design a new zero-knowledge proof which scales better when increasing the number of players. One can think of the High Gear proof of knowledge as a customized interactive version of the proof from Damgård et al. [DPSZ12], whereas Low Gear is a protocol run with the non-interactive proof. The latter requires knowledge of the first message of the proof (sometimes called the commitment) to compute the challenge. In the context of combining the proof with many parties, the first message is the sum of an input from each party, which means that communication is required in any case. Therefore, there is less of an advantage in using the non-interactive proof.

Figure 10 shows our adaptation of the zero-knowledge proof in Fig. 9 from Damgård et al. [DPSZ12]. The main conceptual difference is going from a two-party to a multi-party protocol. However, we have also simplified the bounds.

In the following we will prove that our protocol achieves the natural extension of the \(\varSigma \)-protocol properties in the multi-party setting.

Correctness. The equality in step 6 follows trivially from the linearity of the encryption. It remains to check the probability that an honest prover will fail the bounds check on \(\Vert {\mathbf {z}}\Vert _\infty \) and \(\Vert {\mathbf {t}}\Vert _\infty \) where the infinity norm \(\Vert {\cdot }\Vert _\infty \) denotes the maximum of the absolute values of the components.

Remember that the honestly generated \(E^{(i)}\) are \((\tau , \rho )\) ciphertexts. The bound check will succeed if the infinity norm of \(\sum _{i=1}^n (\mathbf {y}^{(i)} + \sum _{k=1}^{\mathsf {sec}}(M_{e_{jk}} \cdot \mathbf {x}^{(i)}))\) is at most \(2 \cdot n \cdot B_\mathsf {plain}\). This is always true because \(\mathbf {y}^{(i)}\) is sampled such that \(\Vert {\mathbf {y}^{(i)}}\Vert _\infty \le B_\mathsf {plain}\) and \(\Vert {M_\mathbf {e}\cdot \mathbf {x}^{(i)}}\Vert _\infty \le \mathsf {sec}\cdot \tau \le 2^\mathsf {sec}\cdot \tau = B_\mathsf {plain}\). A similar argument holds regarding \(\rho \) and \(B_\mathsf {rand}\).
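The bound can be exercised on toy data. The following Python sketch uses one simple lower-triangular 0/1 matrix built from the challenge bits as a stand-in for \(M_\mathbf {e}\) (an illustrative simplification, not the exact construction of the protocol) and checks the \(2 \cdot n \cdot B_\mathsf {plain}\) bound on the summed responses:

```python
# Toy check of the correctness bound: with ||y|| <= B_plain and
# ||M_e x|| <= sec*tau <= B_plain, the masked responses summed over
# n parties stay within 2*n*B_plain in infinity norm.
import random

sec, n, tau = 8, 3, 5
B_plain = (1 << sec) * tau
e = [random.randrange(2) for _ in range(sec)]
# A simple lower-triangular 0/1 matrix built from the challenge bits
M = [[e[j - k] if j >= k else 0 for k in range(sec)] for j in range(sec)]

def inf_norm(v):
    return max(abs(c) for c in v)

total = [0] * sec
for _ in range(n):                       # each party's contribution
    x = [random.randrange(-tau, tau + 1) for _ in range(sec)]
    y = [random.randrange(-B_plain, B_plain + 1) for _ in range(sec)]
    Mx = [sum(M[j][k] * x[k] for k in range(sec)) for j in range(sec)]
    assert inf_norm(Mx) <= sec * tau <= B_plain
    total = [t + yj + mj for t, yj, mj in zip(total, y, Mx)]

assert inf_norm(total) <= 2 * n * B_plain
```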

Special Soundness. To prove this property one must be able to extract the witness given responses from two different challenges. In this case consider the transcripts \((\mathbf {x}, \mathbf {a}, \mathbf {e}, (\mathbf {z}, T))\) and \((\mathbf {x}, \mathbf {a}, \mathbf {e}', (\mathbf {z}', T'))\) where \(\mathbf {e}\ne \mathbf {e}'\). Recall that each party has a different secret \(\mathbf {x}^{(i)}\). Because both challenges have passed the bound checks during the protocol, we get that:

$$\begin{aligned} (M_\mathbf {e}- M_{\mathbf {e}'}) \cdot E^{\intercal } = (\mathbf {d}- \mathbf {d}')^{\intercal } \end{aligned}$$
Fig. 10. Protocol for global proof of knowledge of a ciphertext

To solve the equation for E, notice that \(M_\mathbf {e}- M_{\mathbf {e}'}\) is a matrix with entries in \(\{-1,0,1\}\), so we must solve a linear system where \(E = \mathsf {Enc}_\mathsf {pk}(\mathbf {x}_k, \mathbf {r}_k)\) for \(k = 1,\dots ,\mathsf {sec}\). This can be done in two steps: solve the linear system for the first half, \(\mathbf {c}_1, \dots , \mathbf {c}_{\mathsf {sec}/2}\), and then for the second half, \(\mathbf {c}_{\mathsf {sec}/2+1}, \dots , \mathbf {c}_{\mathsf {sec}}\). For the first step, identify a square \(\mathsf {sec}\times \mathsf {sec}\) submatrix of \(M_\mathbf {e}- M_{\mathbf {e}'}\) that is lower triangular with a diagonal full of 1’s or \(-1\)’s. This can be done since there is at least one component j such that \(e_j \ne e'_j\). Recall that the plaintexts \(\mathbf {z}_k, \mathbf {z}'_k\) have norms less than \(B_\mathsf {plain}\) and the randomness used for encrypting them, \(\mathbf {t}_k, \mathbf {t}'_k\), has norms less than \(B_\mathsf {rand}\), where k ranges over \(1, \dots , \mathsf {sec}\).

Solving the linear system from the top row to the middle row via substitution, we obtain in the worst case \(\Vert {\mathbf {x}_k}\Vert _\infty \le 2^k \cdot n \cdot B_\mathsf {plain}\) and \(\Vert {\mathbf {r}_k}\Vert _\infty \le 2^k \cdot n \cdot B_\mathsf {rand}\), where k ranges over \(1, \dots , \mathsf {sec}/2\). The second step is similar to the first, except that now we have to look for an upper triangular \(\mathsf {sec}\times \mathsf {sec}\) submatrix and solve the linear system from the last row to the middle row. In this way we extract \(\mathbf {x}_k, \mathbf {r}_k\), which form \((2^{\mathsf {sec}/2+1}\cdot n \cdot B_\mathsf {plain}, 2^{\mathsf {sec}/2+1} \cdot n \cdot B_\mathsf {rand})\) or \((2^{3\mathsf {sec}/2+1} \cdot n \cdot \tau , 2^{3\mathsf {sec}/2+1} \cdot n \cdot \rho )\) ciphertexts. This means that the slack is \(2^{3\mathsf {sec}/2+1}\).
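The norm growth of this substitution can be checked concretely. The Python sketch below is an illustrative stand-alone solver (not the full extractor): it solves a random lower-triangular system with \(\pm 1\) diagonal and entries in \(\{-1,0,1\}\) and verifies the \(2^k\) growth of the recovered values:

```python
# Forward-substitution extraction for a lower-triangular system with +-1
# diagonal and entries in {-1,0,1}: with the right-hand side bounded by 2B,
# the k-th recovered value (1-indexed) is bounded by 2^k * B.
import random

def extract(M, d):
    """Solve M x = d by forward substitution; diagonal entries are +-1."""
    x = []
    for j in range(len(d)):
        acc = d[j] - sum(M[j][k] * x[k] for k in range(j))
        x.append(acc * M[j][j])      # dividing by +-1 == multiplying by it
    return x

B, m = 100, 12
M = [[random.choice([-1, 0, 1]) for _ in range(j)]
     + [random.choice([-1, 1])] + [0] * (m - j - 1) for j in range(m)]
d = [random.randrange(-2 * B, 2 * B + 1) for _ in range(m)]
x = extract(M, d)

for j in range(m):                   # the solution is exact
    assert sum(M[j][k] * x[k] for k in range(m)) == d[j]
for k, xk in enumerate(x):           # worst-case growth: |x_k| <= 2^(k+1) B
    assert abs(xk) <= 2 ** (k + 1) * B
```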

Honest Verifier Zero-Knowledge. Here we give a simulator \(\mathcal {S}\) for an honest verifier (each party \(P_i\) acts as one at one point during the protocol). The simulator’s purpose is to create a transcript with the verifier which is indistinguishable from the real interaction between the prover and the verifier. To achieve this, \(\mathcal {S}\) samples uniformly \(\mathbf {e}{\mathop {\leftarrow }\limits ^{\small {\$}}}\{0,1\}^{\mathsf {sec}}\) and then creates the transcript accordingly: sample \(\mathbf {z}^{(i)}\) such that \(\Vert {\mathbf {z}^{(i)}}\Vert _\infty \le B_\mathsf {plain}\) and \(T^{(i)}\) such that \(\Vert {T^{(i)}}\Vert _\infty \le B_\mathsf {rand}\) and then fix \(\mathbf {a}^{(i)}= \mathsf {Enc}_\mathsf {pk}(\mathbf {z}^{(i)}, T^{(i)}) - (M_\mathbf {e}\cdot E^{(i)})\), where the encryption is applied component-wise. Clearly the produced transcript \((\mathbf {a}^{(i)}, \mathbf {e}^{(i)}, \mathbf {z}^{(i)}, T^{(i)})\) passes the final checks and the statistical distance to the real one is \(2^{-\mathsf {sec}}\), which is negligible with respect to \(\mathsf {sec}\).

Fig. 11. Functionality for global proof of knowledge of ciphertext

Fig. 12. Simulator for global proof of knowledge of ciphertext

Putting Things Together. In the context of our triple generation, we model \(\varPi _\mathsf {gZKPoK}\) as \(\mathcal {F}_\mathsf {gZKPoK}^S\) in Fig. 11. We will argue below that \(\varPi _\mathsf {gZKPoK}\) implements \(\mathcal {F}_\mathsf {gZKPoK}^S\) with slack \(S = 2^{3\mathsf {sec}/2+1}\).

\(\mathcal {F}_\mathsf {gZKPoK}^S\) does not guarantee the correctness of individual corrupted parties’ ciphertexts but only the correctness of the resulting sum. This suffices because only the latter is used in the protocol. A rewinding simulator can still extract individual inputs, but there is no guarantee that they are in fact pre-images of the encryptions sent by corrupted parties or that they satisfy any bounds. Both properties only hold for the sum. This is modeled by \(\mathcal {F}_\mathsf {gZKPoK}^S\) only outputting a sum, and it is easy to see that this output suffices for SPDZ.

\(\mathcal {S}_{\mathsf {gZKPoK}}^S\) in Fig. 12 describes our simulator. The rewinding technique is the same as in the soundness simulator for the \(\varSigma \)-protocol and therefore has the same running time (roughly inverse to the success probability of a corrupted prover). See Sect. 3 of [Dam02] for details.

5 Implementation

We have implemented all three approaches to triple generation in this paper and measured the throughputs achieved by them in comparison to previous results with SPDZ [DKL+12, DKL+13] and MASCOT [KOS16]. We have used the optimized distributed decryption described in the full version [KPR17] for SPDZ-1, SPDZ-2, and High Gear. Our code is written in C++ and uses MPIR [MPI17] for arithmetic with large integers. We use Montgomery modular multiplication and the Chinese remainder theorem representation of polynomials wherever beneficial. See Gentry et al. [GHS12b] for more details.

Note that the parameters chosen by Damgård et al. [DKL+13, Appendix A] for the non-interactive zero-knowledge proof imply that the prover has to re-compute the proof with probability 1/32 as part of a technique called rejection sampling. We have increased the parameters to reduce this probability by a factor of up to \(2^{20}\) as long as it would not impact the performance, i.e., as long as the number of 64-bit words needed to represent \(p_0\) and \(p_1\) would not change.

All previous implementations have benchmarks for two parties on a local network with 1 Gbit/s throughput on commodity hardware. We have used i7-4790 and i7-3770S CPUs with 16 to 32 GB of RAM, and we have re-run and optimized the code by Damgård et al. [DKL+13] for a fairer comparison. Table 2 shows our results in this setting. SPDZ-1 and SPDZ-2 refer to the two different proofs for ciphertexts, the Schnorr-like protocol presented in the original paper [DPSZ12] and the cut-and-choose protocol in the follow-up work [DKL+13], the latter with either covert or active security. c-covert security is defined as a cheating adversary being caught with probability 1/c, and by \(\mathsf {sec}\)-bit security we mean a statistical security parameter of \(\mathsf {sec}\). Throughout this section, we will round figures to the two most significant digits for a more legible presentation.

To allow direct comparisons with previous works, we have benchmarked our protocols for several choices of security parameters and field size. Note that the computational security parameter is set everywhere to \(k=128\), and we highlight how the statistical parameter impacts the performance. The main difference between our implementation of SPDZ with the Schnorr-like protocol and the previous one [DKL+12] is the underlying BGV implementation; the protocol itself is the same.

Table 2. Triple generation for 64 and 128 bit prime fields with two parties on a 1 Gbit/s LAN.

In Table 3 we also analyze the communication per triple of some protocols with active security and compare the actual throughput to the maximum possible on a 1 Gbit/s link (network throughput divided by the communication per triple). The larger the gap between the actual and the maximum possible throughput, the more time is spent on computation. The figures show that MASCOT has very low computation; the actual throughput is more than 90% of the maximum possible. On the other hand, all BGV-based implementations have a significant gap, which is to be expected. Experiments have shown that the relative gap increases in Low Gear with a growing statistical parameter. This is mostly because the ciphertexts become larger and 32 GB of memory is not enough for one triple generator thread per core, hence some computation capacity is left unused.

Table 3. Communication per prime field triple (one way) and actual vs. maximum throughput with two parties on a 1 Gbit/s link.

WAN Setting. For a more complete picture, we have also benchmarked our protocols in the same WAN setting as Keller et al. [KOS16], restricting the bandwidth to 50 Mbit/s and imposing a delay of 50 ms on all communication. Table 4 shows our results in a similar manner to Table 3. As one would expect, the gap between the actual and the maximum possible throughput is narrower because the communication becomes more of a bottleneck, and the performance is closely related to the required communication.

Table 4. Communication per prime field triple (one way) and actual vs. maximum throughput with two parties on a 50 Mbit/s link.

Fields of Characteristic Two. For a more thorough comparison with MASCOT, we have also implemented our protocols for the field of size \(2^{40}\) using the same approach as Damgård et al. [DKL+12]. Table 5 shows the low performance of homomorphic encryption-based protocols with fields of characteristic two. This has been observed before: in the above work, the performance for \(\mathbb {F}_{2^{40}}\) is an order of magnitude worse than for \(\mathbb {F}_p\) with a 64-bit prime. The main reason is that BGV lends itself naturally to cleartexts modulo some integer p. The construction for \(\mathbb {F}_{2^{40}}\) sets \(p = 2\) and uses 40 slots to represent an element, whereas an element of \(\mathbb {F}_p\) for a prime p only requires one ciphertext slot.

Table 5. Triple generation for characteristic two with two parties on a 1 Gbit/s LAN.
Fig. 13. Triple generation for a 128 bit prime field with 64 bit statistical security on AWS r4.16xlarge instances.

More Than Two Parties. Increasing the number of parties, we have benchmarked our protocols and our implementation of SPDZ with up to 64 r4.16xlarge instances on Amazon Web Services. Figure 13 shows that both Low and High Gear improve over SPDZ-1, with High Gear taking the lead from about ten parties. Missing figures do not indicate failed experiments but rather omitted experiments due to financial constraints.

At the time of writing, one hour on an r4.16xlarge instance in US East costs $4.256. Therefore, the number of triples per dollar and party varies between 190 million (two parties with Low Gear) and 13 million (64 parties with High Gear).

5.1 Vickrey Auction for 100 Parties

As a motivation for computation with a high number of parties, we have implemented a secure Vickrey second price auction [Vic61], where 100 parties input one bid each. Table 6 shows our online phase timings for two different Amazon Web Services instances.

Table 6. Online phase of Vickrey auction with 100 parties, each inputting one bid.

The Vickrey auction requires 44,571 triples. In Table 7, we compare the offline cost of MASCOT and our High Gear protocol on AWS m3.2xlarge instances.

Table 7. Offline phase of Vickrey auction with 100 parties, each inputting one bid.

6 Future Work

Recently, there has been an improved zero-knowledge proof of knowledge of bounded pre-images for LWE-style one-way functions [BL17]. It reduces the extra ciphertexts per proven plaintext from two (in our protocol) to any number larger than one. The technique depends on the number of ciphertexts that are proven simultaneously. More concretely, for \(u \cdot \mathsf {sec}\) ciphertexts in one proof (and \(u \ge 1\)), the prover needs to send \((u + 1) \cdot \mathsf {sec}\) ciphertexts in the first round, hence the amortized overhead is \((u+1)/u\). This compares to \(2u \cdot \mathsf {sec}- 1\) ciphertexts in our scheme, an amortized overhead of \(2-1/(u \cdot \mathsf {sec})\). However, we estimate that the benefits of the newer proof strongly depend on the parameters and the available memory. For some parameters, we found that our implementation would exhaust 32 GB of memory with fewer than eight generation threads. We therefore could not exhaust the computational capacity of the CPU. Note that our implementation stores all necessary information for the proof in memory, and consider that one ciphertext takes up to \(2^{16} \cdot 700 \cdot 2\) bits or \(\approx 11\) MBytes. This means that, for 128-bit active security, we require about \((3 \mathsf {sec}- 1) \cdot 11\) MBytes or \(\approx 4.4\) GBytes of storage for the ciphertexts alone (not considering any cleartexts). It would be interesting to see how the newer proof fares and whether using a solid state disk for storage would improve the performance.
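The trade-off can be made concrete with a small calculation. The Python sketch below (helper names are ours) compares the two amortized overheads and shows that the crossover depends on u:

```python
# Amortized ciphertexts sent per proven plaintext: the newer proof [BL17]
# sends (u+1)*sec ciphertexts for u*sec plaintexts; ours sends 2*u*sec - 1.
def overhead_new(u, sec):
    return (u + 1) * sec / (u * sec)        # -> (u + 1) / u

def overhead_ours(u, sec):
    return (2 * u * sec - 1) / (u * sec)    # -> 2 - 1/(u*sec)

sec = 128
assert overhead_ours(1, sec) < overhead_new(1, sec)   # u = 1: ours is cheaper
for u in (2, 4, 8):                                   # u >= 2: [BL17] wins
    assert overhead_new(u, sec) < overhead_ours(u, sec)
```

Consistent with the estimate above, the benefit of the newer proof only materializes for \(u \ge 2\), i.e., when enough ciphertexts are proven, and hence stored, simultaneously.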