
1 Introduction

The cryptographic technique which allows an untrusted entity to perform arbitrary computation on encrypted data is known as fully homomorphic encryption. The first such construction, based on ideal lattices, was presented by Gentry in 2009 [24]. When the algorithm to be applied to the encrypted data is known in advance, one can use a somewhat homomorphic encryption (SHE) scheme, which allows only a limited number of computational steps to be performed on the encrypted data. Such schemes are significantly more efficient in practice.

In all popular SHE schemes, the plaintext space is a ring of the form \(R_t=\mathbb {Z}_t[X]/(f(X))\), where \(t \ge 2\) is a small integer called the plaintext modulus, and \(f(X) \in \mathbb {Z}[X]\) is a monic irreducible degree d polynomial called the polynomial modulus. Usually one lets f(X) be a cyclotomic polynomial; for reasons of performance the most popular choices are the power-of-two cyclotomics \(X^{d} + 1\) with \(d = 2^k\) for some positive integer k, which are maximally sparse. In this case arithmetic in \(R_t\) can be performed efficiently using the fast Fourier transform, as is done in many lattice-based constructions (e.g. [8, 9, 10, 34]) and most implementations (e.g. [3, 6, 7, 25, 26, 29, 32]).

One interesting problem concerns the encoding of the input data of the algorithm: it must be represented by elements of \(R_t\) in such a way that one obtains a meaningful outcome after the encrypted result is decrypted and decoded. This means that addition and multiplication of the input data must agree with the corresponding operations in \(R_t\) up to the depth of the envisaged SHE computation. An active research area investigates such encoding techniques, which are often application-specific and depend on the type of the input data. For the sake of exposition we will concentrate on the particularly interesting and popular setting where the input data consists of finite precision real numbers \(\theta \), even though our discussion below is fairly generic. The main idea, going back to Dowlin et al. [19] (see also [20, 27, 31]) and analyzed in more detail by Costache et al. [16], is to expand \(\theta \) with respect to a base b

$$\begin{aligned} \theta =a_rb^r+a_{r-1}b^{r-1}+\cdots +a_1b+a_0+a_{-1}b^{-1}+a_{-2}b^{-2}+\cdots +a_{-s}b^{-s} \end{aligned}$$
(1)

using integer digits \(a_i\), after which one replaces b by X to end up inside the Laurent polynomial ring \(\mathbb {Z}[X^{\pm 1}]\). One then reduces the digits \(a_i\) modulo t and applies the ring homomorphism to \(R_t\) defined by

$$\begin{aligned} \iota : \mathbb {Z}_t[X^{\pm 1}] \rightarrow R_t : \left\{ \begin{array}{lcl} X &{}\mapsto &{} X, \\ X^{-1} &{} \mapsto &{} -g(X) \cdot f(0)^{-1}, \\ \end{array} \right. \end{aligned}$$

where we write \(f(X) = Xg(X) + f(0)\) and it is assumed that f(0) is invertible modulo t; this is always true for cyclotomic polynomials, or for factors of them. The quantity \(r+s\) will sometimes be referred to as the degree of the encoding (where we assume that \(a_r a_{-s} \ne 0\)). For power-of-two cyclotomics the homomorphism \(\iota \) amounts to letting \(X^{-1} \mapsto -X^{d-1}\), so that the encoding of (1) is given by \( a_rX^r+a_{r-1}X^{r-1}+\cdots +a_1X+a_0-a_{-1}X^{d-1}-a_{-2}X^{d-2}-\cdots -a_{-s}X^{d-s}\).
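
To make this concrete, here is a minimal Python sketch (our illustration, not code from any of the cited implementations) of the encoding map for \(f(X) = X^d + 1\), packing the digits of the expansion (1) into a coefficient vector of \(R_t\):

```python
def encode_digits(digits, d, t):
    """Pack Laurent digits {exponent i: a_i} into a coefficient vector of
    R_t = Z_t[X]/(X^d + 1), using iota: X^{-j} -> -X^{d-j}."""
    coeffs = [0] * d
    for i, a in digits.items():
        if i >= 0:
            coeffs[i] = (coeffs[i] + a) % t          # integral part keeps its exponent
        else:
            coeffs[d + i] = (coeffs[d + i] - a) % t  # digit a_{-j} lands at X^{d-j}, negated
    return coeffs

# Example: 3.5 = 2 + 1 + 1/2 in base b = 2 has digits {1: 1, 0: 1, -1: 1}.
# encode_digits({1: 1, 0: 1, -1: 1}, d=8, t=17) -> [1, 1, 0, 0, 0, 0, 0, 16]
```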

Decoding is done through the inverse of the restriction \(\iota |_{\mathbb {Z}_t[X^{\pm 1}]_{[-\ell ,m]}}\) where

$$\begin{aligned} \mathbb {Z}_t[X^{\pm 1}]_{[-\ell ,m]} = \{ \, a_m X^m + a_{m-1}X^{m-1} + \ldots + a_{-\ell } X^{-\ell } \, | \, a_i \in \mathbb {Z}_t \text { for all }i \, \} \end{aligned}$$

is a subset of Laurent polynomials whose monomials have bounded exponents. If \(\ell + m + 1 = d\) then this restriction of \(\iota \) is indeed invertible as a \(\mathbb {Z}_t\)-linear map; the precise choice of \(\ell ,m\) depends on the data being encoded. After applying this inverse, one replaces the coefficients by their representatives in \(\{ - \lfloor (t-1)/2 \rfloor , \ldots , \lceil (t-1)/2 \rceil \}\) to end up with an expression in \(\mathbb {Z}[X^{\pm 1}]\), and evaluates the result at \(X = b\). Ensuring that decoding is correct up to a given computational depth places constraints on the parameters t and d: one must avoid ending up outside the box depicted in Fig. 1 if the computation were to be carried out directly in \(\mathbb {Z}[X^{\pm 1}]\). In terms of \(R_t\) we will often refer to this event as the ‘wrapping around’ of the encoded data modulo t or f(X), although we note that this is an abuse of language. In the case of power-of-two cyclotomics, ending up above or below the box does indeed correspond to wrapping around modulo t, but ending up to the left or the right of the box corresponds to a mix-up of the high degree terms and the low degree terms.
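
A matching decoding sketch (again ours) lifts the coefficients to their centered representatives and evaluates at \(X = b\):

```python
def decode_coeffs(coeffs, t, b, m):
    """Invert iota on Z_t[X^{±1}]_{[-l, m]} with l = d - 1 - m, lift coefficients
    to {-floor((t-1)/2), ..., ceil((t-1)/2)}, and evaluate at X = b."""
    d = len(coeffs)
    def centered(c):
        c %= t
        return c - t if c > t // 2 else c   # t // 2 equals ceil((t-1)/2)
    value = sum(centered(coeffs[i]) * b**i for i in range(m + 1))            # integral part
    value += sum(-centered(coeffs[d - j]) * b**(-j) for j in range(1, d - m))  # fractional part
    return value

# decode_coeffs([1, 1, 0, 0, 0, 0, 0, 16], t=17, b=2, m=3) -> 3.5
```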

Fig. 1. Box in which to stay during computation, where \(\ell + m + 1 = d\).

The precise constraints on t and d depend not only on the complexity of the computation, but also on the type of expansion (1) used in the encoding. Dowlin et al. suggest using balanced b-ary expansions with respect to an odd base \(b \in \mathbb {Z}_{\ge 3}\), which means that the digits are taken from \(\{-(b-1)/2,\cdots ,(b-1)/2\}\); a sketch of the digit computation is given below. Such expansions have been used for centuries in the quest for more efficient arithmetic, going back at least to Colson (1726) and Cauchy (1840).
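
For illustration (ours), the balanced b-ary digits of an integer can be computed as follows; with \(b=3\) this is the balanced ternary expansion:

```python
def balanced_digits(n, b):
    """Balanced b-ary digits of an integer n for odd b >= 3, least significant
    first; each digit lies in {-(b-1)/2, ..., (b-1)/2}."""
    digits = []
    while n != 0:
        r = n % b
        if r > (b - 1) // 2:
            r -= b                  # choose the negative representative
        digits.append(r)
        n = (n - r) // b
    return digits

# balanced_digits(5, 3) -> [-1, -1, 1], i.e. 5 = 9 - 3 - 1
```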

If we fix a precision, then for smaller b the balanced b-ary expansions are longer but the coefficients are smaller; this implies the need for a larger d but a smaller t. Similarly, for larger bases the expansions become shorter but have larger coefficients, leading to a smaller d but a larger t. For the application to somewhat homomorphic encryption considered in [6, 16] the security requirements ask for a very large d, so that the best choice is to use as small a base as possible, namely \(b=3\), with digits in \(\{\pm 1,0\}\). Even for this smallest choice the resulting lower bound on t is very large, while the bound on d is much smaller than that coming from the cryptographic requirements. To illustrate this, we recall the concrete figures from the paper [6], which uses the Fan-Vercauteren (FV) somewhat homomorphic encryption scheme [23] for privacy-friendly prediction of electricity consumption in the setting of the smart grid. Here the authors use \(d = 4096\) for cryptographic reasons, which is an optimistic choice that leads to 80-bit security only (and maybe even a few bits less than that [1]). On the other hand, using balanced ternary expansions, correct decoding is guaranteed as soon as \(d \ge 368\), which is even a conservative estimate. This eventually leads to the huge bound \(t \gtrapprox 2^{107}\), which is overcome by decomposing \(R_t\) into 13 factors using the Chinese Remainder Theorem (CRT). This is then used to homomorphically forecast the electricity usage for the next half hour for a small apartment complex of 10 households in about half a minute, using a sequential implementation.

The discrepancy between the requirements coming from correct decoding and those coming from security considerations suggests that other expansions may be better suited for use with SHE. In this paper we introduce a generic encoding technique, using very sparse expansions with digits in \(\{ \pm 1, 0 \}\) with respect to a non-integral base \(b_w > 1\), where w is a sparseness measure. These expansions are said to be of ‘non-integral base non-adjacent form’ with window size w, abbreviated to w-NIBNAF. Increasing w makes the degrees of the resulting Laurent polynomial encodings grow and decreases the growth of the coefficients when performing operations, hence lowering the bound on t. Our encoding technique is especially useful for finite precision real numbers, but could also serve for finite precision complex numbers or even integers, despite the fact that \(b_w\) is non-integral (this would require a careful precision analysis which is avoided here).

Fig. 2. Comparison of the amount of plaintext space which is actually used in the setting of [6], where \(d=4096\); more precise figures are to be found in Sect. 4.

We demonstrate that this technique results in significant performance increases by re-doing the experiments from [6]. Along with a more careful precision analysis which is tailored for this specific use case, using 950-NIBNAF expansions we end up with the dramatically reduced bound \(t \ge 33\). It is not entirely honest to compare this to \(t \gtrapprox 2^{107}\) because of our better precision analysis; as explained in Sect. 4 it makes more sense to compare the new bound to \(t \gtrapprox 2^{42}\), but the reduction remains huge. As the reader can see in Fig. 2 this is explained by the fact that the data is spread more evenly across the plaintext space during computation. As a consequence we avoid the need for CRT decomposition and thus reduce the running time by a factor 13, showing that the same homomorphic forecasting can be done in only 2.5 s.

Remark. An alternative recent proposal for encoding using a non-integral base can be found in [15], which targets efficient evaluation of the discrete Fourier transform on encrypted data. Here the authors work exclusively in the power-of-two cyclotomic setting \(f(X) = X^d + 1\), and the input data consists of complex numbers \(\theta \) which are expanded with respect to the base \(b = \zeta \), where \(\zeta \) is a primitive 2d-th root of unity, i.e. a root of f(X); a similar idea was used in [12]. One nice feature of this approach is that the correctness of decoding is not affected by wrapping around modulo f(X). To find a sparse expansion they use the LLL algorithm [28], but for arbitrary complex inputs the digits become rather large when compared to w-NIBNAF.

2 Encoding Data Using w-NIBNAF

Our approach in reducing the lower bound on the plaintext modulus t is to use encodings for which many of the coefficients are zero. In this respect, a first improvement over balanced ternary expansions is obtained by using the non-adjacent form (NAF) representations which were introduced by Reitwiesner in 1960 for speeding up early multiplication algorithms [33]. We note that independent work by Cheon et al. [11] also mentions the advantages of using NAF encodings.

Definition 1

The non-adjacent form (NAF) representation of a real number \(\theta \) is an expansion of \(\theta \) to the base \(b=2\) with coefficients in \(\{-1,0,1\}\) such that any two adjacent coefficients are not both non-zero.

The NAF representation has been generalized [13]: for an integer \(w\ge 1\) (called the ‘window size’) one can ensure that in any window of w consecutive coefficients at most one of them is non-zero. This is possible in base \(b=2\), but for \(w>2\) it requires larger coefficients.

Definition 2

Let \(w\ge 1\) be an integer. A w-NAF representation of a real number \(\theta \) is an expansion of \(\theta \) with base 2 and whose non-zero coefficients are odd and less than \(2^{w-1}\) in absolute value such that for every set of w consecutive coefficients at most one of them is non-zero.

We see that NAF is just the special case of w-NAF with \(w=2\). Unfortunately, due to the fact that the coefficients are taken from a much larger set, using w-NAF encodings in the SHE setting actually gives larger bounds on both t and d for increasing w, and therefore it is not useful for our purposes.
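
For concreteness, the textbook w-NAF digit routine for integers can be sketched as follows (our illustration of Definition 2, included only for comparison):

```python
def wnaf_digits(n, w):
    """w-NAF of an integer n, least significant digit first: non-zero digits
    are odd, bounded by 2^(w-1) in absolute value, and any w consecutive
    digits contain at most one non-zero entry."""
    digits = []
    while n != 0:
        if n % 2:
            d = n % 2**w            # remainder modulo 2^w, then centered
            if d >= 2**(w - 1):
                d -= 2**w
            n -= d                  # now n is divisible by 2^w
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

# wnaf_digits(7, 2) -> [-1, 0, 0, 1], i.e. 7 = 8 - 1
```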

Ideally, we want the coefficients in our expansions to be members of \(\{\pm 1,0\}\) with many equal to 0, as this leads to the slowest growth in coefficient sizes, allowing us to use smaller values for t. This would come at the expense of using longer encodings, but remember that we have a lot of manoeuvring space on the d side. One way to achieve this goal is to use a non-integral base \(b>1\) when computing a non-adjacent form. We first give the definition of a non-integral base non-adjacent form with window size w (w-NIBNAF) representation and then explain where this precise formulation comes from.

Definition 3

A sequence \(a_0,a_1,\cdots ,a_n,\cdots \) is a w-balanced ternary sequence if it has \(a_i\in \{-1,0,1\}\) for \(i\in \mathbb {Z}_{\ge 0}\) and satisfies the property that each set of w consecutive terms has no more than one non-zero term.

Definition 4

Let \(\theta \in \mathbb {R}\) and \(w\in \mathbb {Z}_{>0}\). Define \(b_w\) to be the unique positive real root of the polynomial \(F_w(x)=x^{w+1}-x^w-x-1\). A w-balanced ternary sequence \(a_r,a_{r-1},\cdots ,a_1,a_0,a_{-1},\cdots \) is a w-NIBNAF representation of \(\theta \) if

$$\begin{aligned} \theta =a_rb_w^r+a_{r-1}b_w^{r-1}+\cdots +a_1b_w+a_0+a_{-1}b_w^{-1}+\cdots . \end{aligned}$$

Below we will show that every \(\theta \in \mathbb {R}\) has at least one such w-NIBNAF representation and provide an algorithm to find such a representation. But let us first state a lemma which shows that \(b_w\) is well-defined for \(w\ge 1\).

Lemma 1

For an integer \(w\ge 1\) the polynomial \(F_w(x)=x^{w+1}-x^{w}-x-1\) has a unique positive real root \(b_w>1\). The sequence \(b_1,b_2,\cdots \) is strictly decreasing and \(\lim _{w \rightarrow \infty } b_w = 1\). Further, \((x^2+1)\mid F_w(x)\) for \(w\equiv 3\bmod {4}\).

The proof is straightforward and given in Appendix A. The first few values of \(b_w\) are as follows

$$\begin{aligned} b_1 = 1+\sqrt{2} \approx 2.4142, \quad b_2 \approx 1.8393, \quad b_3 = \phi \approx 1.6180, \quad b_4 \approx 1.4971, \quad \ldots \end{aligned}$$

where we note that \(b_3\) is the golden ratio \(\phi \).
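
Since \(b_w\) is only defined implicitly as a root of \(F_w\), we note (as an illustration, not part of the construction itself) that it is easily approximated numerically, for instance by bisection:

```python
def nibnaf_base(w, tol=1e-12):
    """Approximate the unique positive real root b_w > 1 of
    F_w(x) = x^(w+1) - x^w - x - 1 by bisection."""
    F = lambda x: x**(w + 1) - x**w - x - 1
    lo, hi = 1.0, 3.0               # F_w(1) = -2 < 0 and F_w(3) = 2*3^w - 4 > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if F(mid) < 0 else (lo, mid)
    return (lo + hi) / 2

# nibnaf_base(3) -> 1.618033988749895, the golden ratio
```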

Since we are using a non-integral base, a w-NIBNAF representation of a fixed-point number has infinitely many non-zero terms in general. To overcome this, one approximates the number by terminating the w-NIBNAF representation after some power of the base; we call such a terminated sequence an approximate w-NIBNAF representation. There are two straightforward ways of deciding where to terminate: either a fixed power of the base is chosen and any terms after it are discarded, giving an easy bound on the maximal possible error, or a maximal allowed error is chosen in advance and the representation is terminated after the first power which gives an error less than or equal to this value.

Algorithm 1 below produces for every \(\theta \in \mathbb {R}\) a w-NIBNAF representation in the limit as \(\epsilon \) tends to 0, thereby demonstrating its existence. It takes the form of a greedy algorithm which chooses the closest signed power of the base to \(\theta \) and then iteratively finds a representation of the difference. Except when \(\theta \) can be written as \(\theta = h(b_w)/b_w^q\), for some polynomial h with coefficients in \(\{\pm 1,0\}\) and \(q\in \mathbb {Z}_{\ge 0}\), any w-NIBNAF representation is infinitely long. Hence, we must terminate Algorithm 1 once the iterative input is smaller than some pre-determined precision \(\epsilon >0\).

Algorithm 1. The greedy algorithm: on input \(\theta , b_w, \epsilon \), repeatedly subtract the closest signed power of \(b_w\) (picking the larger power in case of a tie) until the remainder is at most \(\epsilon \) in absolute value.
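
A minimal Python sketch of Algorithm 1 (ours, reusing the nibnaf_base helper above) reads:

```python
import math

def greedy_nibnaf(theta, bw, eps):
    """Approximate w-NIBNAF representation of theta to precision eps, returned
    as {exponent: digit} with digits in {-1, +1}; a tie between two adjacent
    powers picks the larger power, as required for correctness."""
    digits = {}
    while abs(theta) > eps:
        sign = 1 if theta > 0 else -1
        e = math.floor(math.log(abs(theta), bw))    # bw^e <= |theta| < bw^(e+1)
        if abs(theta) >= bw**e * (1 + bw) / 2:      # closer to (or halfway to) bw^(e+1)
            e += 1
        digits[e] = sign
        theta -= sign * bw**e                       # recurse on the difference
    return digits

# Example: greedy_nibnaf(5.7, nibnaf_base(2), 0.01) yields digits in {-1, +1}
# whose exponents are at least 2 apart.
```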

We now prove that the algorithm works as required.

Lemma 2

Algorithm 1 produces an approximate w-NIBNAF representation of \(\theta \) with an error of at most \(\epsilon \).

Proof

Assuming that the algorithm terminates, the output clearly represents \(\theta \) to within an error of at most size \(\epsilon \). First we show that the output is w-NIBNAF. Suppose that the output, on input \(\theta ,b_w,\epsilon \), has at least two non-zero terms, the first being \(a_d\). This implies either that \(b_w^d\le |\theta |<b_w^{d+1}\) and \(b_w^{d+1}-|\theta |>|\theta |-b_w^d\) or \(b_w^{d-1}<|\theta |\le b_w^d\) and \(b_w^d-|\theta |\le |\theta |-b_w^{d-1}\). These conditions can be written as \(b_w^d\le |\theta |<\tfrac{1}{2}b_w^d(1+b_w)\) and \(\tfrac{1}{2}b_w^{d-1}(1+b_w)\le |\theta |\le b_w^d\) respectively, showing that

$$\begin{aligned} ||\theta |-b_w^d|<\max \left\{ b_w^d-\tfrac{1}{2}b_w^{d-1}(1+b_w),~\tfrac{1}{2}b_w^d(1+b_w)-b_w^d\right\} =\tfrac{1}{2}b_w^d(b_w-1)\,. \end{aligned}$$

The algorithm subsequently chooses the closest power of \(b_w\) to this smaller value, suppose it is \(b_w^\ell \). By the same argument with \(\theta \) replaced by \(|\theta |-b_w^d\) we have that either \(b_w^\ell \le \left| |\theta |-b_w^d\right| \) or \(\tfrac{1}{2}b_w^{\ell -1}(1+b_w)\le \left| |\theta |-b_w^d\right| \) and since \(b_w^\ell \) is larger than \(\tfrac{1}{2}b_w^{\ell -1}(1+b_w)\) the maximal possible value of \(\ell \), which we denote by \(\ell _w(d)\), satisfies

$$\begin{aligned} \ell _w(d)=\max \left\{ \ell \in \mathbb {Z}~\left| ~\tfrac{1}{2}b_w^{\ell -1}(1+b_w)<\tfrac{1}{2}b_w^d(b_w-1)\right. \right\} . \end{aligned}$$

The condition on \(\ell \) can be rewritten as \(b_w^\ell <b_w^{d+1}(b_w-1)/(b_w+1)\) which implies that \(\ell <d+1+\log _{b_w}((b_w-1)/(b_w+1))\) and thus

$$\begin{aligned} \ell _w(d)=d+\left\lceil \log _{b_w}\left( \frac{b_w-1}{b_w+1}\right) \right\rceil , \end{aligned}$$

so that the smallest possible difference is independent of d and equal to

$$\begin{aligned} s(w):=d-\ell _w(d)=-\left\lceil \log _{b_w}\left( \frac{b_w-1}{b_w+1}\right) \right\rceil =\left\lfloor \log _{b_w}\left( \frac{b_w+1}{b_w-1}\right) \right\rfloor . \end{aligned}$$

We thus need to show that \(s(w)\ge w\). As w is an integer this is equivalent to

$$\begin{aligned} \log _{b_w}\left( \frac{b_w+1}{b_w-1}\right) \ge w~\iff ~b_w^{w}\le \frac{b_w+1}{b_w-1}~\iff ~b_w^{w+1}-b_w^{w}-b_w-1\le 0 \end{aligned}$$

which holds for all w since \(F_w(b_w)=0\). Note that our algorithm works correctly and deterministically because when \(|\theta |\) is exactly half-way between two powers of \(b_w\) we pick the larger power. This shows that the output is of the desired form.

Finally, to show that the algorithm terminates we note that the k-th successive difference is bounded above by \(\tfrac{1}{2}b_w^{d-(k-1)s(w)}(b_w-1)\), which tends to 0 as k tends to infinity. Therefore after a finite number of steps (at most \(\left\lceil \left( d-\log _{b_w}\left( 2\epsilon /(b_w-1)\right) \right) /s(w)\right\rceil +1\)) the difference is smaller than or equal to \(\epsilon \) and the algorithm terminates.    \(\square \)
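
As an empirical sanity check (our illustration, reusing the sketches above), one can verify on random inputs that consecutive non-zero digits produced by the greedy algorithm are indeed at least w positions apart:

```python
import random

# Check the non-adjacency property s(w) >= w of Lemma 2 on random samples.
# (Floating-point rounding near exact ties is a theoretical caveat only.)
for w in (1, 2, 5, 10):
    bw = nibnaf_base(w)
    for _ in range(1000):
        exps = sorted(greedy_nibnaf(random.uniform(-1e6, 1e6), bw, 0.5))
        assert all(b - a >= w for a, b in zip(exps, exps[1:]))
```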

The process of encoding works as described in the introduction, i.e. we follow the approach from [16, 19] except that we use an approximate w-NIBNAF representation instead of the balanced ternary representation. Thus to encode a real number \(\theta \) we find an approximate w-NIBNAF representation of \(\theta \) with small enough error and replace each occurrence of \(b_w\) by X, after which we apply the map \(\iota \) to end up in the plaintext space \(R_t\). Decoding is almost the same as well: after inverting \(\iota \) and lifting the coefficients to \(\mathbb {Z}\) we evaluate the resulting Laurent polynomial at \(X=b_w\) rather than \(X=3\), computing the value only to the required precision. Rather than evaluating directly it is best to reduce the Laurent polynomial modulo \(F_w(X)\) (or modulo \(F_w(X)/(X^2+1)\) if \(w\equiv 3\bmod {4}\)) so that we only have to compute powers of \(b_w\) up to w (respectively \(w-2\)).

Clearly we can also ask Algorithm 1 to return \(\sum _{i}a_iX^i \in \mathbb {Z}_t[X^{\pm 1}]\); this gives an encoding of \(\theta \) with maximal error \(\epsilon \). Since the input \(\theta \) of the algorithm can get arbitrarily close to but larger than \(\epsilon \), the final term can be \(\pm X^h\) where \(h=\lfloor \log _{b_w}(2\epsilon /(1+b_w))\rfloor +1\). If we are to ensure that the smallest power of the base appearing in any approximate w-NIBNAF representation is \(b_w^s\), then we require that \(|\theta |\le \epsilon \) whenever \(b_w^{s-1}\) is the nearest power of \(b_w\) to the input \(\theta \); hence we must have \(\tfrac{1}{2}b_w^{s-1}(1+b_w)\le \epsilon \), which implies that the smallest precision we can achieve is \(\epsilon =b_w^{s-1}(1+b_w)/2\). In particular, if we want no negative powers of \(b_w\) then the best precision possible using the greedy algorithm is \((1+b_w^{-1})/2<1\).

Remark. If one replaces \(b_w\) by a smaller base \(b > 1\) then Algorithm 1 still produces a w-NIBNAF expansion to precision \(\epsilon \): this follows from the proof of Lemma 2. The distinguishing feature of \(b_w\) is that it is maximal with respect to this property, so that the resulting expansions become as short as possible.

3 Analysis of Coefficient Growth During Computation

After encoding, the input data is ready for homomorphic computations. These increase both the number of non-zero coefficients and their size. Since we are working in the ring \(R_t\) there is a risk that our data wraps around modulo t as well as modulo f(X), in the sense explained in the introduction; we should avoid this since it leads to erroneous decoding. Therefore we need to understand the coefficient growth more thoroughly. We simplify the analysis in this section by considering only multiplications and the constraint they put on t; it is not hard to generalize this to include additions.

Worst case coefficient growth for w-NIBNAF encodings. Here we analyze the maximal possible size of a coefficient which could occur when computing with w-NIBNAF encodings. Because fresh w-NIBNAF encodings are just approximate w-NIBNAF representations written as elements of \(R_t\), we consider finite w-balanced ternary sequences and the multiplication endowed on them from \(R_t\). Equivalently, we consider multiplication in the \(\mathbb {Z}[X^{\pm 1}]\)-plane depicted in Fig. 1. As we ensure in practice that there is no wrap-around modulo f(X), this can be ignored in our analysis.

To start the worst case analysis we have the following lower bound; note that the d used here is not the degree of f(X).

Lemma 3

The maximal absolute size of a term that can appear in the product of p arbitrary w-balanced ternary sequences of length \(d+1\) is at least

$$\begin{aligned} B_w(d,p):=\!\!\sum _{k=0}^{\left\lfloor {\left\lfloor p\lfloor d/w\rfloor /2\right\rfloor }/(\lfloor d/w\rfloor +1)\right\rfloor }\!\!(-1)^k\left( {\begin{array}{c}p\\ k\end{array}}\right) \!\left( {\begin{array}{c}p-1+\left\lfloor p\lfloor d/w\rfloor /2\right\rfloor -k\lfloor d/w\rfloor -k\\ p-1\end{array}}\right) . \end{aligned}$$

A full proof of this lemma is given in Appendix A, but the main idea is to look at the largest coefficient of \(m^p\) where \(m\) has the maximal number of non-zero coefficients, \(\lfloor d/w\rfloor +1\), all equal to 1 and with exactly \(w-1\) zero coefficients between each pair of adjacent non-zero coefficients. The (non-zero) coefficients of \(m^p\) are variously known in the literature as extended (or generalized) binomial coefficients or ordinary multinomials; we denote them here by \(\left( {\begin{array}{c}p\\ k\end{array}}\right) _n\), defined via

$$\begin{aligned} \left( 1+X+X^2+\cdots +X^{n-1}\right) ^p=\sum _{k=0}^\infty \left( {\begin{array}{c}p\\ k\end{array}}\right) _{n}X^k, \end{aligned}$$

[18, 21, 22, 35]. In particular the maximal coefficient is the (or a) central one and we can write \(B_w(d,p)=\left( {\begin{array}{c}p\\ k\end{array}}\right) _{n}\) where \(k=\lfloor p\lfloor d/w\rfloor /2\rfloor \) and \(n=\lfloor d/w\rfloor +1\).
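
For small parameters both descriptions are easy to check by machine; the following sketch (ours) computes \(B_w(d,p)\) via the alternating sum of Lemma 3 and compares it against the largest coefficient of \((1+X+\cdots +X^{n-1})^p\) obtained by direct expansion:

```python
from math import comb

def B(w, d, p):
    """B_w(d,p) from Lemma 3, with n = floor(d/w) + 1 and K = floor(p(n-1)/2)."""
    n = d // w + 1
    K = p * (n - 1) // 2
    return sum((-1)**k * comb(p, k) * comb(p - 1 + K - k * n, p - 1)
               for k in range(K // n + 1))

def central_multinomial(n, p):
    """Largest coefficient of (1 + X + ... + X^(n-1))^p by direct expansion."""
    poly = [1]
    for _ in range(p):
        out = [0] * (len(poly) + n - 1)
        for i, c in enumerate(poly):
            for j in range(n):
                out[i + j] += c
        poly = out
    return max(poly)

assert all(B(w, d, p) == central_multinomial(d // w + 1, p)
           for w in (1, 2, 3) for d in (6, 7, 12) for p in (1, 2, 3, 4, 5))
```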

We note that the w-NIBNAF encoding, using the greedy algorithm with precision \(\tfrac{1}{2}\), of \(b_w^{d+w-(d \bmod w)}(b_w-1)/2\) is \(m\), so in practice this lower bound is achievable, although highly unlikely to occur.

We expect this lower bound to be tight; indeed, we were able to prove the following lemma, whose proof is also given in Appendix A.

Lemma 4

Suppose that w divides \(d\). Then \(B_w(d,p)\) equals the maximal absolute size of a term that can be produced by taking the product of p arbitrary w-balanced ternary sequences of length \(d+1\).

We thus make the following conjecture which holds for all small values of p and d we tested and which we assume to be true in general.

Conjecture 1

The lower bound \(B_w(d,p)\) given in Lemma 3 is exact for all \(d\); that is, the maximal absolute term size which can occur after multiplying p arbitrary w-balanced ternary sequences of length \(d+1\) is \(B_w(d,p)\).

This conjecture seems very plausible: as soon as one multiplicand does not have its non-zero coefficients exactly w places apart, the non-zero coefficients start to spread out and decrease in value.

To determine \(B_w(d,p)\) for fixed p, define \(n:=\lfloor d/w\rfloor +1\); then we can expand the expression for \(B_w(d,p)\) as a ‘polynomial’ in n of degree \(p-1\) whose coefficients depend on the parity of n, see [5] for more details. The first few are:

$$\begin{aligned} B_w(d,1)&= 1, \qquad \qquad \qquad B_w(d,2)=n,\\B_w(d,3)&=\tfrac{1}{8}(6n^2+1)-\tfrac{(-1)^n}{8}, \qquad \qquad B_w(d,4)=\tfrac{1}{3}(2n^3+n),\\B_w(d,5)&=\tfrac{1}{384}(230n^4+70n^2+27)-\tfrac{(-1)^n}{384}(30n^2+27),\\B_w(d,6)&=\tfrac{1}{20}(11n^5+5n^3+4n). \end{aligned}$$

Denoting the coefficient of \(n^{p-1}\) in these expressions by \(\ell _p\), it can be shown (see [2] or [5]) that \(\lim _{p\rightarrow \infty }\sqrt{p}\ell _p=\sqrt{6/\pi }\) and hence we have

$$\begin{aligned} \lim _{p\rightarrow \infty }\log _2(B_w(d,p))-(p-1)\log _2(n)+\tfrac{1}{2}\log _2\left( \tfrac{\pi p}{6}\right) =0 \end{aligned}$$

or equivalently \(B_w(d,p)\sim \sqrt{6/(\pi p)}\,n^{p-1}\) as \(p\rightarrow \infty \). Thus we have the approximation

$$\begin{aligned} \log _2(B_w(d,p))\approx (p-1)\log _2(n)-\tfrac{1}{2}\log _2\left( \tfrac{\pi p}{6}\right) \end{aligned}$$

which for large enough n (experimentally we found \(n>1.825\sqrt{p-1/2}\)) is an upper bound for \(p>2\). For a guaranteed upper bound we refer to Mattner and Roos [30], who state that for \(n,p\in \mathbb {Z}_{>0}\) with \(n\ge 2\), if \(p\ne 2\) or \(n\in \{2,3,4\}\) then \(B_w(d,p)\le \sqrt{6/(\pi p(n^2-1))}\,n^p\). This upper bound is in fact a more precise asymptotic limit than the one above, which only considers the leading coefficient.

Statistical analysis of the coefficient growth. Based on w-NIBNAF encodings of random integers \(N \in \left[ -2^{40},2^{40}\right] \), we try to get an idea of the number of zero and non-zero coefficients in a fresh encoding without fractional part, obtained by running Algorithm 1 to precision \((1 + b_w^{-1})/2\). We also analyze how these proportions change when we perform multiplications, and we plot this for different values of w to illustrate the positive effects of using sparser encodings. As a preliminary remark, note that the w-NIBNAF encodings produced by Algorithm 1 applied to \(-N\) and N are obtained from one another by flipping all the signs, so the coefficients \(-1\) and 1 are necessarily distributed evenly.

We know from the definition of a w-NIBNAF expansion that at least \(w-1\) among each block of w consecutive coefficients of the expansion will be 0, so for big w we expect the 0 coefficient to occur far more often than \(\pm 1\). This is clearly visible in Fig. 3. In addition we see an increasing number of 0 coefficients and a decreasing number of \(\pm 1\) coefficients as w increases. Thus both the absolute and the relative sparseness of our encodings increase with w.

Since the balanced ternary encoding of [16, 19] and the 2-NAF encoding [33] only have coefficients in \(\{0, \pm 1\}\), it is interesting to compare them to 1-NIBNAF and 2-NIBNAF respectively. We compare them by computing the percentages of zero and non-zero coefficients in \(10\,000\) encodings of random integers N in \(\left[ -2^{40},2^{40}\right] \), up to an accuracy of \(10^{-2}\); we consider for our counts all coefficients up to and including the leading coefficient, and further zero coefficients are not counted. Comparing the percentages of zero and non-zero coefficients occurring in 1-NIBNAF and balanced ternary in Table 1, we see that for the balanced ternary representation the coefficients 0, 1 and \(-1\) occur approximately equally often, while for 1-NIBNAF the proportion of 0 coefficients is larger than that of 1 or \(-1\). Hence we can conclude that 1-NIBNAF encodings are sparser than balanced ternary encodings even though the window size is the same. For 2-NIBNAF we also see an improvement in sparseness compared to 2-NAF.
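
These statistics can be reproduced along the following lines (a sketch of ours, reusing the nibnaf_base and greedy_nibnaf helpers from Sect. 2; the counting convention matches the one just described):

```python
import random
from collections import Counter

def digit_stats(w, trials=10_000):
    """Frequencies of the digits -1, 0, 1 in w-NIBNAF encodings of random
    integers in [-2^40, 2^40], counting zeros up to the leading coefficient."""
    bw = nibnaf_base(w)
    counts = Counter()
    for _ in range(trials):
        N = random.randint(-2**40, 2**40)
        digits = greedy_nibnaf(N, bw, (1 + 1 / bw) / 2)  # integer-only precision
        if digits:
            counts[0] += max(digits) + 1 - len(digits)   # zero coefficients
            counts.update(digits.values())               # the ±1 digits
    return counts
```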

Fig. 3. Plot of \(\log _2\)(#coeff) on the vertical axis against w on the horizontal axis, averaged over \(10\,000\) w-NIBNAF encodings of random integers in \(\left[ -2^{40},2^{40}\right] \).

Table 1. Comparison between the previous encoding techniques and w-NIBNAF
Fig. 4. Plot of \(\log _2\)(#coeff+1) on the vertical axis against the value of the coefficient on the horizontal axis, for the result of \(10\,000\) multiplications of two w-NIBNAF encodings of random numbers in \(\left[ -2^{40},2^{40}\right] \).

Fig. 5. \(\log _2\) of the maximum absolute value of the coefficient of \(x^i\) seen during \(10\,000\) products of two w-NIBNAF encodings of random numbers in \(\left[ -2^{40},2^{40}\right] \), plotted against i.

The next step is to investigate what happens to the coefficients when we multiply two encodings. From Fig. 4 we see that as w increases the maximal size of the resulting coefficients becomes smaller, so the plots confirm the expected result that sparser encodings lead to a reduction in the size of the resulting coefficients after one multiplication. Next, we investigate the behaviour for an increasing number of multiplications. In Fig. 5 one observes that for a fixed number of multiplications the maximum coefficient, considering all coefficients in the resulting polynomial, decreases as w increases, while the maximum degree of the polynomial increases with w. This confirms that increasing the degree of the polynomial, in order to make it sparser, has the desirable effect of decreasing the size of the coefficients. Figure 5 also shows that, based on the result of one multiplication, we can even estimate the maximum value of the average coefficients of \(x^i\) for a specific number of multiplications by scaling the result for one multiplication.

To summarize, in Fig. 6 we plot the number of bits of the maximum coefficient of the polynomial resulting from a fixed number of multiplications as a function of w. From this figure we clearly see that the maximal coefficient decreases as w increases, and hence as the original encoding polynomial becomes sparser. In addition we see that the effect of the sparseness of the encoding on the size of the resulting maximal coefficient is bigger when the number of multiplications increases; however, the gain from sparser encodings diminishes as w becomes bigger. Furthermore, Fig. 6 shows that the bound given in Lemma 3 is much bigger than the observed upper bound we get from \(10\,000\) samples.

Fig. 6. \(\log _2\) of the observed and theoretical maximum absolute coefficient of the result of multiplying w-NIBNAF encodings of random numbers in \(\left[ -2^{40},2^{40}\right] \), plotted against w.

4 Practical Impact

We encounter the following constraints on the plaintext coefficient modulus t while homomorphically computing with polynomial encodings of finite precision real numbers. The first constraint comes from the correctness requirement of the SHE scheme: the noise inside the ciphertext should not exceed a certain level during the computations, otherwise decryption fails. Since an increase of the plaintext modulus expands the noise, this places an upper bound on the possible t which can be used. The second constraint does not relate to SHE but to the circuit itself: after any arithmetic operation the polynomial coefficients tend to grow, so one should take a big enough plaintext modulus in order to prevent or mitigate possible wrapping around modulo t. This determines a lower bound on the range of possible values of t. In practice, for deep enough circuits these two constraints are incompatible, i.e. there is no interval from which t can be chosen. However, the plaintext space \(R_t\) can be split into smaller rings \(R_{t_1}, \cdots , R_{t_k}\) with \(t = \prod _{i=1}^k t_i\) using the Chinese Remainder Theorem (CRT). This technique [8] allows us to take the modulus big enough for correct evaluation of the circuit and then perform k threads of the homomorphic algorithm over \(\{R_{t_i}\}_i\); the k output polynomials are then combined into the final output, again by CRT. This approach needs k times more memory and time than the case of a single modulus, so the problem is mostly about reducing the number of factors of t needed.
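
As a sketch (ours) of this plaintext-side CRT splitting and recombination, assuming the moduli \(t_i\) are pairwise coprime:

```python
from math import prod

def crt_split(coeffs, moduli):
    """Reduce a plaintext polynomial modulo each t_i; each reduced copy is
    processed by its own thread of the homomorphic algorithm."""
    return [[c % ti for c in coeffs] for ti in moduli]

def crt_combine(residue_polys, moduli):
    """Recombine the k per-modulus output polynomials modulo t = prod(t_i)."""
    t = prod(moduli)
    combined = []
    for residues in zip(*residue_polys):
        c = sum(r * (t // ti) * pow(t // ti, -1, ti)   # CRT basis elements
                for r, ti in zip(residues, moduli)) % t
        combined.append(c)
    return combined

# crt_combine(crt_split([5, 40, 96], [3, 5, 7]), [3, 5, 7]) -> [5, 40, 96]
```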

An a priori lower bound on t can be derived using the worst case scenario in which the final output has the maximal possible coefficient, which was analyzed in Sect. 3. If we use w-NIBNAF encodings for increasing values of w then this lower bound will decrease, eventually leading to fewer CRT factors; a concern here is not to take w too large, to prevent wrapping around modulo f(X). In practice, though, we can take t considerably smaller, because the worst case occurs with negligible probability, which even decreases for circuits of bigger multiplicative depth. Moreover, we can allow the least significant coefficients of the fractional part to wrap around modulo t with no harm to the final results.

In this section we revisit the homomorphic method for electricity load forecasting described in [6] and demonstrate that by using w-NIBNAF encodings, by ignoring the unlikely worst cases, and by tolerating minor precision losses we can reduce the number of CRT factors from \(k=13\) to \(k=1\), thereby enhancing its practical performance by a factor 13. We recall that [6] uses the Fan-Vercauteren SHE scheme [23], along with the group method of data handling (GMDH) as a prediction tool; we refer to [6, Sect. 3] for a quick introduction to this method. Given that 80% of electricity meter devices in the European Union are to be replaced with smart meters by 2020, this application may mitigate some emerging privacy and efficiency issues.

Experimental setup. For comparison’s sake we mimic the treatment in [6] as closely as possible. In particular we also use the real world measurements obtained from the smart meter electricity trials performed in Ireland [14]. This dataset [14] contains observed electricity consumption of over 5000 residential and commercial buildings during 30 min intervals. We use aggregated consumption data of 10 buildings. Given previous consumption data with some additional information, the GMDH network has the goal of predicting electricity demand for the next time period. Concretely, it requires 51 input parameters: the 48 previous measurements plus the day of the week, the month and the temperature. There are three hidden layers with 8, 4, 2 nodes, respectively. A single output node provides the electricity consumption prediction for the next half hour. Recall that a node is just a bivariate quadratic polynomial evaluation.

The plaintext space is of the form \(R_t = \mathbb {Z}_t[X]/(X^{4096}+1)\), where the degree \(d=4096\) is motivated by the security level of 80 bits which is targeted in [6]; recent work by Albrecht [1] implies that the actual level of security is slightly less than that. Inside \(R_t\) the terms corresponding to the fractional parts and those corresponding to the integral parts come closer together after each multiplication. Wrapping around modulo \(X^{4096} + 1\), i.e. ending up at the left or at the right of the box depicted in Fig. 1, means that inside \(R_t\) these integer and fractional parts start to overlap; in this case it is no longer possible to decode correctly. We encode the input data using approximate w-NIBNAF representations with a fixed number of integer and fractional digits. When increasing the window size w one should take into account that the precision of the corresponding encodings changes as well. To maintain the same accuracy of the algorithm it is important to keep the precision fixed; hence for bigger w the smaller base \(b_w\) should result in an increase in the number of coefficients used by an encoding. Starting with the balanced ternary expansion (BTE), for any \(w > 2\) the numbers \(\ell (w)_i\) and \(\ell (w)_f\) of integer and fractional digits should be expanded according to \(\ell (w)_i = (\ell (\text {BTE})_i - 1) \cdot \log _{b_w} 3 + 1\) and \(\ell (w)_f = -\lfloor \log _{b_w} e_f \rfloor \), where \(e_f\) is the maximal error of an approximate w-NIBNAF representation such that the prediction algorithm preserves the same accuracy. Empirically we found that the GMDH network demonstrates reasonable absolute and relative errors when \(\ell (\text {BTE})_i^{\text {inp}} = 4\) and \(e_f^{\text {inp}} = 1\) for the input, and \(\ell (\text {BTE})_i^{\text {pol}} = 2\) and \(e_f^{\text {pol}} = 0.02032\) for the coefficients of the nodes (quadratic polynomials).
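
As an illustration (ours) of these scaling formulas, assuming the nibnaf_base helper from Sect. 2:

```python
import math

def digit_counts(w, l_bte_i, e_f):
    """Numbers of integer and fractional digits for window size w that match
    the precision of a BTE encoding with l_bte_i integer digits and maximal
    fractional error e_f (round l_i up to an integer in practice)."""
    bw = nibnaf_base(w)                          # sketch from Sect. 2
    l_i = (l_bte_i - 1) * math.log(3, bw) + 1    # integer digits
    l_f = -math.floor(math.log(e_f, bw))         # fractional digits
    return l_i, l_f

# e.g. digit_counts(950, l_bte_i=2, e_f=0.02032) for the network coefficients
```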

Results. The results reported in this section were obtained by running the same software and hardware as in [6]: namely, the FV-NFLlib software library [17] on a laptop equipped with an Intel Core i5-3427U CPU (running at 1.80 GHz). We performed 8560 runs of the GMDH algorithm with BTE, NAF and 950-NIBNAF; the last expansion uses the maximal possible w such that the resulting output polynomial still has discernible integer and fractional parts. Correct evaluation of the prediction algorithm requires the plaintext modulus to be bigger than the maximal coefficient of the resulting polynomial. This lower bound for t can be deduced either from the maximal coefficient (in absolute value) appearing after any run or, in case of a known distribution of coefficient values, from the mean and the standard deviation. In both cases increasing window sizes reduce the bound, as depicted in Fig. 7. Since negative encoding coefficients are used, 950-NIBNAF demands a plaintext modulus of 7 bits, which is almost 6 times smaller than for BTE and NAF.

Fig. 7. The mean and the maximal size per coefficient of the resulting polynomial.

As expected, w-NIBNAF encodings have longer expansions for bigger w, and this disrupts the decoding procedure of [6, 16], which naively splits the resulting polynomial into two parts of equal size. As one can observe in Fig. 7, with 950-NIBNAF decoding in this manner will not give correct results. Instead, the splitting index \(i_s\) should be shifted towards zero, namely to 385. To be specific, [6, Lem. 1] states that \(i_s\) lies in the interval \((d_i + 1, d - d_f)\) where \(d_i = 2^{r+1} (\ell (w)_i^{\text {inp}} + \ell (w)_i^{\text {pol}}) - \ell (w)_i^{\text {pol}}\) and \(d_f = 2^{r+1}(\ell (w)_f^{\text {inp}} + \ell (w)_f^{\text {pol}}) - \ell (w)_f^{\text {pol}}\). However, this is a worst case estimation, which would result in a maximal \(w = 74\) for the current network configuration.

However, the impact of the lower coefficients of the fractional part can be much smaller than the precision required by an application. In our use case the prediction value should be precise up to \(e_f^{\text {inp}} = 1\). We denote by \(L(j) = \sum _{i=j-1}^{i_s} a_i b_w^{-i}\) the aggregated sum of the lower coefficients multiplied by the corresponding powers of the w-NIBNAF base. Then the omitted fractional coefficients \(a_i\) should satisfy \(\left| L(i_c)\right| < 1\), where \(i_c\) is the index after which coefficients are ignored.

To find \(i_c\) we computed L(j) for every index j of the fractional part and stored those sums for each run of the algorithm. For fixed j the distribution of L(j) is bimodal with mean \(\mu _{L(j)}\) and standard deviation \(\sigma _{L(j)}\) (see Fig. 8). Despite the fact that this unknown distribution is not normal, we naively approximate the prediction interval as \([\mu _{L(j)} - 6 \sigma _{L(j)}, \mu _{L(j)} + 6 \sigma _{L(j)}]\), which should contain a future observation with high probability. This seems a plausible guess in this application because, as Fig. 8 shows, all observed L(j) fall into that region with a large margin. Therefore \(i_c\) is equal to the maximal j that satisfies \(\tau (j) < 1\), where \(\tau (j) = \max (|\mu _{L(j)} - 6 \sigma _{L(j)}|, |\mu _{L(j)} + 6 \sigma _{L(j)}|)\).
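
The search for \(i_c\) can be sketched as follows (our illustration; the layout of L_samples, a map from index j to the list of observed L(j) values over all runs, is hypothetical):

```python
import statistics

def cutoff_index(L_samples):
    """Largest index j with tau(j) < 1, where tau(j) is the wider endpoint of
    the naive 6-sigma prediction interval for L(j)."""
    i_c = None
    for j in sorted(L_samples):
        mu = statistics.mean(L_samples[j])
        sigma = statistics.pstdev(L_samples[j])
        if max(abs(mu - 6 * sigma), abs(mu + 6 * sigma)) < 1:
            i_c = j                 # keep the largest qualifying index
    return i_c
```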

Fig. 8. The distribution of L(3500) over 8560 runs of the GMDH algorithm, with an approximation of its prediction interval in red.

Fig. 9. The expected precision loss after ignoring fractional coefficients with index less than j.

As Fig. 9 shows, \(i_c\) is equal to 3388; thus the precision setting allows an overflow in any fractional coefficient \(a_j\) with \(j < 3388\). The final goal is to provide a bound on t bigger than any \(a_j\) for \(j \ge 3388\). Since the explicit distributions of the coefficients are unknown and seem to vary among indices, we rely in our analysis on the maximal coefficients occurring among all runs. Hence the plaintext modulus should be bigger than \(\max _{j \ge 3388}\{a_j\}\) over all resulting polynomials; looking back at Fig. 7, one finds that \(t=33\) suffices.

As mentioned above, t is constrained in two ways: by the circuit and by the SHE correctness requirements. In our setup the ciphertext modulus is \(q \approx 2^{186}\) and the standard deviation of the noise is \(\sigma = 102\), which together impose that \(t \le 396\) [6]. This is perfectly compatible with \(t=33\); therefore 950-NIBNAF allows us to omit the CRT trick and work with a single modulus, reducing the sequential timings by a factor of 13. In parallel mode it means that 13 times less memory is needed.

Table 2. GMDH implementation with 950-NIBNAF and BTE [6]

Additionally, these plaintext moduli are much smaller than the worst case estimation from Sect. 3. For 950-NIBNAF we take \(d \in [542, 821]\) according to the encoding degrees of the input data and network coefficients. Any such encoding contains only one non-zero coefficient; consequently, any product of those encodings has only one non-zero coefficient, equal to \(\pm 1\). The maximal possible coefficient of the output encoding occurs when all monomials of the GMDH polynomial result in an encoding with the same index of the non-zero coefficient; in this case the maximal coefficient equals the evaluation of the GMDH network with all input data and network coefficients set to 1. This leads to \(t = 2\cdot 6^{15} \simeq 2^{39.775}\).

One further consequence of a smaller t is that one can reconsider the parameters of the underlying SHE scheme: namely, one can take smaller q and \(\sigma \) that preserve the same security level while requiring only the smaller bound on t instead of the 396 taken above. Given \(t = 33\) from the above experiments, q reduces to \(2^{154}\) together with \(\sigma \approx 5\), which corresponds to smaller ciphertexts and faster SHE routines; here \(\sigma \) is taken as small as possible while still preventing the Arora-Ge attack [4], given that each batch of input parameters is encrypted with a different key. Unfortunately, it is not possible to reduce the size of q by 32 bits in our implementation due to constraints of the FV-NFLlib library.

5 Conclusions

We have presented a generic technique to encode real numbers using a non-integral base. This encoding technique is especially suitable for homomorphic function evaluation since it exploits the large degree of the defining polynomial imposed by the security requirements. This leads to considerably smaller coefficient growth and allows one to reduce the size of the plaintext modulus significantly, resulting in faster implementations. We show that in the setting studied in [6], where somewhat homomorphic function evaluation is used to achieve a privacy-preserving electricity forecast algorithm, the plaintext modulus can be reduced from about \(2^{103}\) when using a balanced ternary expansion encoding to \(33 \simeq 2^{5.044}\) when using the encoding method introduced in this paper (non-integral base non-adjacent form with window size w); see Table 2. This smaller plaintext modulus yields a factor 13 decrease in the running time of the privacy-preserving forecasting algorithm, closing the gap even further towards making this approach suitable for industrial applications in the smart grid.