
Profiles of PATRICIA Tries

Published in Algorithmica.

Abstract

A PATRICIA trie is a trie in which non-branching paths are compressed. The external profile \(B_{n,k}\), defined to be the number of leaves at level k of a PATRICIA trie on n nodes, is an important “summarizing” parameter, in terms of which several other parameters of interest can be formulated. Here we derive precise asymptotics for the expected value and variance of \(B_{n,k}\), as well as a central limit theorem with error bound on the characteristic function, for PATRICIA tries on n infinite binary strings generated by a memoryless source with bias \(p > 1/2\) for \(k\sim \alpha \log n\) with \(\alpha \in (1/\log (1/q) + \varepsilon , 1/\log (1/p) - \varepsilon )\) for any fixed \(\varepsilon > 0\). In this range, \( {\mathbb {E}}[B_{n,k}] = \varTheta ( {\mathrm {Var}}[B_{n,k}])\), and both are of the form \(\varTheta (n^{\beta (\alpha )}/\sqrt{\log n})\), where the \(\varTheta \) hides bounded, periodic functions in \(\log n\) whose Fourier series we explicitly determine. The compression property leads to extra terms in the Poisson functional equations for the profile which are not seen in tries or digital search trees, resulting in Mellin transforms which are only implicitly given in terms of the moments of \(B_{m,j}\) for various m and j. Thus, the proofs require information about the profile outside the main range of interest. Our derivations rely on analytic techniques, including Mellin transforms, analytic de-Poissonization, the saddle point method, and careful bounding of complex functions.
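Before the proofs, it may help to see the profile concretely. The following Python sketch (ours, not from the paper; the function name, the bias \(p = 0.7\), and the truncation of the infinite strings to 64 bits are all illustrative) builds a PATRICIA trie on n random strings from a memoryless source and tallies the external profile:

```python
import random
from collections import Counter

def external_profile(strings):
    """External profile of the PATRICIA trie built on distinct binary strings:
    a Counter mapping level k to the number of leaves at level k."""
    profile = Counter()

    def build(group, bit, level):
        if len(group) == 1:
            profile[level] += 1
            return
        zeros = [s for s in group if s[bit] == 0]
        ones = [s for s in group if s[bit] == 1]
        if not zeros or not ones:
            build(group, bit + 1, level)        # non-branching bit: compressed away
        else:
            build(zeros, bit + 1, level + 1)    # a true branching raises the level
            build(ones, bit + 1, level + 1)

    build(list(strings), 0, 0)
    return profile

random.seed(1)
p, n, depth = 0.7, 50, 64   # bias p > 1/2; 64 bits stand in for infinite strings
strings = {tuple(1 if random.random() < p else 0 for _ in range(depth))
           for _ in range(n)}
prof = external_profile(strings)
```

Summing the profile over all levels recovers the number of strings, since each string occupies exactly one leaf.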

Fig. 1
Fig. 2


Author information

Correspondence to Abram Magner.

This work was supported by NSF Center for Science of Information (CSoI) Grant CCF-0939370, and in addition by NSF Grant CCF-1524312 and NIH Grant 1U01CA198941-01.

Appendix

1.1 Well-Definedness and Analyticity of Series Related to \(A_k(s)\)

Proof of Lemma 2

Proof of (i)

Without loss of generality, we can assume that \(g(n) = Cn\), for some positive constant C, because the assumption \(g(n) = \varOmega (n)\) implies that, for large enough n, \(g(n) \ge Cn\). Next, we apply the ratio test, which gives

$$\begin{aligned} \left| e^{-(g(n+1) - g(n))} \frac{\phi _{n+1}(s)n!}{(n+1)!\phi _n(s)} \right| = \left| \frac{n+s}{n+1} \right| e^{-(g(n+1) - g(n))} \sim e^{-(g(n+1) - g(n))}. \end{aligned}$$

Now, using the assumption about the growth of g(n), we have that the ratio is asymptotically less than 1, so that the series converges absolutely.
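Numerically, the convergence is easy to check in log-space. The sketch below (ours) takes \(\phi _n(s) = (s+1)(s+2)\cdots (s+n-1)\), the form consistent with the ratio \(\phi _{n+1}(s)/\phi _n(s) = n+s\) above, together with the illustrative choices \(g(n) = n/2\) and a fixed test point s:

```python
import math

C = 0.5          # illustrative choice of g(n) = C*n
s = 0.3 + 0.2j   # an arbitrary test point

def log_abs_term(n):
    """log of |e^{-g(n)} phi_n(s) / n!|, assuming phi_n(s) = (s+1)...(s+n-1),
    the form consistent with the ratio phi_{n+1}(s)/phi_n(s) = n + s;
    computed in log-space to avoid overflow."""
    return (-C * n
            + sum(math.log(abs(s + i)) for i in range(1, n))
            - math.lgamma(n + 1))

# the log of the term ratio tends to -C < 0, i.e. the ratio tends to e^{-C} < 1
r = log_abs_term(501) - log_abs_term(500)
```

The log-ratio of successive terms settles near \(-C\), so the terms eventually decay geometrically and the series converges absolutely, as the ratio test asserts.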

Proof of (ii)

If we can show that \(F_j(s)\) is analytic at \(s=0\), then the result follows, since a function defined by a power series at a given point is analytic at all points inside its disc of convergence. We start by examining \([s^m]\phi _n(s)\). Toward that end, define \(S_m\) to be the set of all subsets of size \(n-1-m\) of the set \([n-1]\). Then, since \(\phi _n(s)\) is a product of \(n-1\) monomials, each choice of m monomials contributing their s factors (with the remaining \(n-1-m\) contributing their constant terms) gives a contribution to \([s^m]\phi _n(s)\), and we have

$$\begin{aligned} {[}s^m]\phi _n(s) = \sum _{X \in S_m} \prod _{x \in X} x. \end{aligned}$$
(42)

Since \(|S_m| = {n-1 \atopwithdelims ()n-1-m} = {n-1 \atopwithdelims ()m}\) and \(\prod _{x \in X} x \le \frac{(n-1)!}{m!}\),

$$\begin{aligned} {[}s^m]\phi _n(s) \le \frac{(n-1)!}{m!(n-1-m)!} \cdot \frac{(n-1)!}{m!} = \frac{((n-1)!)^2}{m!^2(n-1-m)!}. \end{aligned}$$
(43)
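Identity (42) and bound (43) can be confirmed numerically for small n; the sketch below (ours) again takes \(\phi _n(s) = (s+1)(s+2)\cdots (s+n-1)\), consistent with the ratio used in part (i):

```python
from itertools import combinations
from math import factorial, prod

def phi_coeffs(n):
    """Coefficients, lowest degree first, of phi_n(s) = (s+1)(s+2)...(s+n-1)
    (the assumed form of the product of n-1 monomials)."""
    coeffs = [1]
    for i in range(1, n):
        new = [0] * (len(coeffs) + 1)
        for d, c in enumerate(coeffs):
            new[d] += i * c      # the constant factor of (s + i)
            new[d + 1] += c      # the s factor of (s + i)
        coeffs = new
    return coeffs

ok = True
for n in range(2, 8):
    coeffs = phi_coeffs(n)
    for m in range(n):
        # (42): [s^m] phi_n(s) is an elementary symmetric sum over subsets
        sym = sum(prod(X) for X in combinations(range(1, n), n - 1 - m))
        # (43): the stated coefficient bound
        bound = factorial(n - 1) ** 2 // (factorial(m) ** 2 * factorial(n - 1 - m))
        ok = ok and coeffs[m] == sym and coeffs[m] <= bound
```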

Now,

$$\begin{aligned} {[}s^m]F_j(s) = \sum _{n \ge j} e^{-\varOmega (n)}[s^m]\phi _n(s) \frac{1}{n!}, \end{aligned}$$
(44)

and

$$\begin{aligned}&\left| e^{-\varOmega (n)}[s^m]\phi _n(s)\frac{1}{n!}\right| \le e^{-\varOmega (n)}\frac{1}{n!} \frac{(n-1)!^2}{m!^2(n-1-m)!} \end{aligned}$$
(45)
$$\begin{aligned}&= e^{-\varOmega (n)} \frac{(n-1)!}{nm!^2(n-1-m)!} \mathop {\sim }\limits ^{n\rightarrow \infty } \frac{e^{-\varOmega (n)}n^{m-1}}{m!^2}. \end{aligned}$$
(46)

The series with these terms converges because of the exponential decay of \(e^{-\varOmega (n)}\) as \(n\rightarrow \infty \). This implies that the series defining \([s^m]F_j(s)\) converges, so that \(F_j(s)\) is analytic at 0. \(\square \)

1.2 Rough Upper Bounds on \(\mu _{n,k}, c_{n,k}, \tilde{G}_k(z), \tilde{V}_k(z)\)

1.2.1 Upper Bounds on \(|\tilde{G}_k(z)|\) as \(z\rightarrow \infty \)

Proof of Lemma 1

Proof of (i)

We proceed by induction on k, then on increasing domains.

Base Case for Induction on k

For \(k=0\), we have \(\tilde{G}_0(z) = ze^{-z}\), so that

$$\begin{aligned} |\tilde{G}_0(z)| = |z| |e^{-z}| = |z|e^{-\mathfrak {R}(z)}, \end{aligned}$$
(47)

which, for large enough |z| in the cone, is less than \(C|z|^{1+\varepsilon }\) (and, in fact, any C|z|), for any choice of \(C > 0\).

Inductive Step for k

We now assume that the claimed bound is true for \(0 \le j < k\), and we prove the claim for k via induction on increasing domains.

Increasing Domains Base Case

For the base case, observe that an upper bound on \(|\tilde{G}_k(z)|\) that is uniform in k holds:

$$\begin{aligned} \tilde{G}_k(z)&= e^{-z} \sum _{n\ge k+1} \mu _{n,k}\frac{z^n}{n!} \\&= ze^{-z} \sum _{n \ge k+1} \mu _{n,k}\frac{z^{n-1}}{n!} \\&\implies |\tilde{G}_k(z)| \le |z|e^{|z|-\mathfrak {R}(z)}, \end{aligned}$$

for any z, where we’ve used the fact that \(\mu _{n,k} \le n\). In particular, this applies in the truncated cone \( {\mathscr {C}}(\theta , R)\). Since \( {\mathscr {C}}(\theta , R)\) is compact, and both the upper bound on \(|\tilde{G}_k(z)|\) and \(|z|^{1+\varepsilon }\) are continuous, the former attains a maximum value on \( {\mathscr {C}}(\theta , R)\), so that there is some \(C=C(R)\) for which

$$\begin{aligned} |\tilde{G}_k(z)| \le C|z|^{1+\varepsilon }, \end{aligned}$$

which establishes the base case.
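The uniform bound just used can also be checked numerically. The following sketch (ours) assumes the PATRICIA mean-profile recurrence \((1 - p^n - q^n)\mu _{n,k} = \sum _{j=1}^{n-1}{n\atopwithdelims ()j}p^jq^{n-j}(\mu _{j,k-1} + \mu _{n-j,k-1})\) with an illustrative bias \(p = 0.7\), truncates the Poisson series, and tests the bound \(|\tilde{G}_k(z)| \le |z|e^{|z|-\mathfrak {R}(z)}\) at a few points:

```python
import cmath
import math
from functools import lru_cache
from math import comb, factorial

p, q = 0.7, 0.3   # illustrative bias

@lru_cache(maxsize=None)
def mu(n, k):
    """mu_{n,k} = E[B_{n,k}], from the assumed recurrence
    (1 - p^n - q^n) mu_{n,k} = sum_{j=1}^{n-1} C(n,j) p^j q^(n-j)
                               (mu_{j,k-1} + mu_{n-j,k-1})."""
    if n <= 1:
        return 1.0 if (n == 1 and k == 0) else 0.0
    if k == 0:
        return 0.0
    s = sum(comb(n, j) * p**j * q**(n - j) * (mu(j, k - 1) + mu(n - j, k - 1))
            for j in range(1, n))
    return s / (1 - p**n - q**n)

def G_tilde(k, z, N=60):
    """Truncation of the Poissonized profile e^{-z} sum_{n>=k+1} mu_{n,k} z^n/n!."""
    return cmath.exp(-z) * sum(mu(n, k) * z**n / factorial(n)
                               for n in range(k + 1, N))

ok = all(abs(G_tilde(k, z)) <= abs(z) * math.exp(abs(z) - z.real) + 1e-9
         for k in range(4) for z in (2 + 1j, 5 + 0j, 1 - 2j))
```

One can also confirm that the profile sums to n over all levels, a basic consistency check on the assumed recurrence.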

Inductive Step for Increasing Domains Induction

For the inductive step, we start with the recurrence for \(\tilde{G}_k(z)\):

$$\begin{aligned} |\tilde{G}_k(z)|&\le |e^{-qz}||\tilde{G}_k(zp)| + |e^{-pz}||\tilde{G}_k(zq)| \\&\quad + |\tilde{G}_{k-1}(pz)||1-e^{-qz}| + |\tilde{G}_{k-1}(qz)||1-e^{-pz}| \\&\le (|e^{-qz}| + |1-e^{-qz}|)Cp^{1+\varepsilon }|z|^{1+\varepsilon } + (|e^{-pz}| + |1-e^{-pz}|)Cq^{1+\varepsilon }|z|^{1+\varepsilon } \end{aligned}$$

Now, for any positive c,

$$\begin{aligned} |e^{-cz}| + |1-e^{-cz}| \xrightarrow {z\rightarrow \infty } 1 \end{aligned}$$

with z in the cone. Thus, we can choose |z| large enough (depending on \(\varepsilon \)) so that, simultaneously,

$$\begin{aligned} (|e^{-qz}| + |1-e^{-qz}|)p^{1+\varepsilon } < p \end{aligned}$$

and

$$\begin{aligned} (|e^{-pz}| + |1-e^{-pz}|)q^{1+\varepsilon } < q, \end{aligned}$$

which gives

$$\begin{aligned} |\tilde{G}_k(z)| \le Cp|z|^{1+\varepsilon } + Cq|z|^{1+\varepsilon } \le C|z|^{1+\varepsilon }. \end{aligned}$$

Proof of (ii)

Recall the functional equation for \(\tilde{G}_j(z)\).

$$\begin{aligned} \tilde{G}_j(z) = L[\tilde{G}]_{j-1}(z) + T[\tilde{G}]_j(z). \end{aligned}$$

Iterating this recurrence, we get

$$\begin{aligned} \tilde{G}_j(z) = L^{j}[\tilde{G}]_0(z) + \sum _{\ell =0}^{j-1} L^{\ell } T[\tilde{G}]_{j-\ell }(z), \end{aligned}$$
(48)

where, recall, \(\tilde{G}_0(x) = xe^{-x}\). Applying the definition of L, the first term becomes

$$\begin{aligned} \sum _{\ell =0}^{j} {j\atopwithdelims ()\ell } \tilde{G}_0(p^\ell q^{j-\ell }z) = \sum _{\ell =0}^j {j\atopwithdelims ()\ell } p^{\ell }q^{j-\ell } ze^{-p^\ell q^{j-\ell }z}. \end{aligned}$$

Now, for positive real z, we have

$$\begin{aligned} p^{\ell } q^{j-\ell }z \ge q^{j}z, \end{aligned}$$

so that

$$\begin{aligned} e^{-p^{\ell }q^{j-\ell }z} \le e^{-q^{j}z}. \end{aligned}$$

Taking the absolute value of the first sum, applying the triangle inequality and the observation just made, and pulling the resulting exponential factor and |z| out of the summation gives an upper bound of

$$\begin{aligned} |z|e^{-q^{j}|z|\cos (\arg (z))} \sum _{\ell =0}^j {j\atopwithdelims ()\ell } p^{\ell }q^{j-\ell } = |z|e^{-q^j|z|\cos (\arg (z))}. \end{aligned}$$

Turning to the second summation, we consider the \(\ell \)th term:

$$\begin{aligned} L^{\ell } T[\tilde{G}]_{j-\ell }(z) = \sum _{r=0}^{\ell } {\ell \atopwithdelims ()r} T[\tilde{G}]_{j-\ell }(p^{r}q^{\ell -r}z). \end{aligned}$$
(49)

Now, we upper bound \(\tilde{G}_{j-\ell }(x)\) and \(\tilde{G}_{j-\ell -1}(x)\) by \(C'|x|^{1+\varepsilon }\), for some \(C'\) independent of C, and an arbitrarily small \(\varepsilon > 0\). This we can do by part (i). This implies that

$$\begin{aligned} |T[\tilde{G}]_{j-\ell }(x)|&= |e^{-px}(\tilde{G}_{j-\ell }(qx) - \tilde{G}_{j-\ell -1}(qx))\\&+ e^{-qx}(\tilde{G}_{j-\ell }(px) - \tilde{G}_{j-\ell -1}(px))| \\&\le 4C'pe^{-q|x|\cos (\arg (x))}|x|^{1+\varepsilon }. \end{aligned}$$

Plugging this into (49) gives an upper bound of

$$\begin{aligned} 4C'p\sum _{r=0}^{\ell } {\ell \atopwithdelims ()r} p^{r}q^{\ell -r}|z|^{1+\varepsilon } e^{-p^{r}q^{\ell -r+1}|z|\cos (\arg (z))} \le 4C'p|z|^{1+\varepsilon }e^{-q^{\ell +1}|z|\cos (\arg (z))}. \end{aligned}$$

Here, we’ve used the fact that \(p^rq^{-r} = (p/q)^r \ge 1^r = 1\) to upper bound the exponent. After this, the only factor of the above expression which contains r is

$$\begin{aligned} \sum _{r = 0}^\ell {\ell \atopwithdelims ()r} p^r q^{\ell - r} = 1. \end{aligned}$$

The resulting upper bound is maximized when \(\ell = j-1\), so that the second sum of (48) is upper bounded by

$$\begin{aligned} 4C'p|z|^{1+\varepsilon }je^{-q^{j}|z|\cos (\arg (z))}. \end{aligned}$$

Thus, we have an upper bound of

$$\begin{aligned} |\tilde{G}_j(z)| \le |z|^{1+\varepsilon }e^{-q^j|z|\cos (\theta )}(1 + 4C'pj). \end{aligned}$$

Now, since \(j \le C\), we finally have

$$\begin{aligned} |\tilde{G}_j(z)| \le |z|^{1+\varepsilon }e^{-q^C|z|\cos (\theta )}(1 + 4CC'p). \end{aligned}$$

\(\square \)

1.2.2 Superexponentially Decaying Bound on \(\mu _{n,k}\) for \(k = \varTheta (n)\)

We now aim to prove (10) of Theorem 1. The natural way to do this is by induction on m, using the recurrence for \(\mu _{m,j}\); but the inductive hypothesis cannot then be applied for all \(h < m\): terms of the form \(\mu _{h,j-1}\) appear in the recurrence, and it is sometimes the case that

$$\begin{aligned} Ch > (j-1), \end{aligned}$$

which happens precisely when \(h > m-1/C\). Thus, we must first prove a similar lemma which bounds \(\mu _{m,j}\) whenever \(m-j < \ell \), for any fixed \(\ell \ge 0\).

Lemma 15

For any \(C > 1\), there exist \(c_1, c_2 > 0\) such that, for n large enough,

$$\begin{aligned} \mu _{n,m} \le c_1n!e^{-c_2 m^2} \end{aligned}$$

whenever \(m \ge n - C\).

Proof

This is by induction on n.

Base Case

For the base case, we show that, for any \(M \ge 0\), we can find \(c_1\) and \(c_2\) such that the claimed inequality is satisfied whenever \(n \le M\). Given any \(M \ge 0\) and \(c_2 > 0\), we have, for \(n \le M\),

$$\begin{aligned} \mu _{n,m} \le M, \end{aligned}$$

and, provided that we take

$$\begin{aligned} c_1 \ge Me^{c_2 M^2}, \end{aligned}$$

this implies that

$$\begin{aligned} \mu _{n,m} \le c_1 n! e^{-c_2 m^2}, \end{aligned}$$

for all \(n,m \le M\).

Inductive Step

For the inductive step, we assume that, for appropriately chosen \(c_1, c_2\), the claimed inequality holds for \(\mu _{n', m'}\) for any \(n' < n\) with \(n > M\), and for any \(m'\). In what follows, we will derive a condition on \(c_2\) which must (and can) be satisfied in order for the induction to work. Now, by the recurrence for \(\mu _{n,m}\) and the fact that \(m \ge n-C\),

$$\begin{aligned} (1-T(-n))\mu _{n,m} = \sum _{j=n-C}^{n-1} {n\atopwithdelims ()j}p^jq^{n-j}\mu _{j,m-1}, \end{aligned}$$

and, since \(j \le n-1\), we have

$$\begin{aligned} m-1 \ge (n-1)-C \ge j-C, \end{aligned}$$

so that the inductive hypothesis can be applied to each term in the sum:

$$\begin{aligned} (1-T(-n))\mu _{n,m} \le \sum _{j=n-C}^{n-1} {n\atopwithdelims ()j}p^n \cdot c_1 j! e^{-c_2(m-1)^2} = c_1 n! p^n\sum _{j=n-C}^{n-1} \frac{e^{-c_2(m-1)^2}}{(n-j)!} \end{aligned}$$

We can further upper bound this by

$$\begin{aligned} c_1 C n! p^n e^{-c_2(m-1)^2}, \end{aligned}$$

and our goal is now to choose \(c_2\) such that

$$\begin{aligned} Cp^n e^{2 c_2 m - c_2}/(1-T(-n)) \le 1 - \varepsilon , \end{aligned}$$

for some positive constant \(\varepsilon \).

We need

$$\begin{aligned} e^{c_2(2m - 1)} \le \frac{(1-\varepsilon )(1 - T(-n))}{Cp^n}. \end{aligned}$$

Taking logarithms and dividing both sides by \(2m-1\), we must have, equivalently,

$$\begin{aligned} c_2 \le \frac{\log (1-\varepsilon ) - \log C + n\log (1/p) + \log (1 - T(-n))}{2m-1}. \end{aligned}$$

The required upper bound is lower bounded by

$$\begin{aligned} \frac{\log (1 - \varepsilon ) - \log C + n\log (1/p) + \log (1 - T(-n))}{2n-1}, \end{aligned}$$

which tends to \(\log (1/p)/2\) as \(n\rightarrow \infty \). Thus, provided n is sufficiently large (depending only on C and \(\varepsilon \); this can be enforced by choosing a sufficiently large M), the required upper bound is clearly positive, so that a \(c_2\) which satisfies it can be chosen. Then \(c_1\) can be chosen as dictated by the base case, and we have

$$\begin{aligned} \mu _{n,m} \le c_1n! e^{-c_2 m^2}, \end{aligned}$$

as desired. \(\square \)
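Numerically, the superexponential decay asserted by Lemma 15 is already visible along the extreme diagonal \(m = n-1\). A sketch (ours) under the same assumed recurrence for \(\mu _{n,k}\), with \(T(-n) = p^n + q^n\) and an illustrative bias:

```python
import math
from functools import lru_cache
from math import comb, lgamma

p, q = 0.7, 0.3

@lru_cache(maxsize=None)
def mu(n, k):
    # E[B_{n,k}] from the recurrence used in this appendix (assumed form),
    # with T(-n) = p^n + q^n
    if n <= 1:
        return 1.0 if (n == 1 and k == 0) else 0.0
    if k == 0:
        return 0.0
    s = sum(comb(n, j) * p**j * q**(n - j) * (mu(j, k - 1) + mu(n - j, k - 1))
            for j in range(1, n))
    return s / (1 - p**n - q**n)

# log(mu_{n,n-1} / n!) along the extreme diagonal m = n - 1
logs = [math.log(mu(n, n - 1)) - lgamma(n + 1) for n in range(2, 14)]
steps = [b - a for a, b in zip(logs, logs[1:])]
```

The increments of \(\log (\mu _{n,n-1}/n!)\) themselves keep decreasing, consistent with a bound of the form \(c_1 n! e^{-c_2 m^2}\) rather than a merely exponential one.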

We can now prove (10).

Proof (Proof of (10))

Throughout, we suppress floor and ceiling functions, which are insignificant to the analysis.

The proof is again by induction on n.

Base Case

By the same argument as in the proof of Lemma 15, for any M and \(c_2\), \(c_1\) can be chosen appropriately so as to ensure that the claimed inequality holds for \(n, m \le M\).

Inductive Step

We now proceed with the induction. Again using the recurrence and the fact that \(m \ge Cn\), we have

$$\begin{aligned} (1-T(-n))\mu _{n,m} \le \sum _{j=Cn}^{n-1} {n\atopwithdelims ()j} p^n (\mu _{j,m-1} + \mu _{n-j,m-1}). \end{aligned}$$

Now, we only know that \(m-1 \ge Cn - 1 = C(n-1/C)\). That is, for some of the terms in the sum, we cannot apply the induction hypothesis. To circumvent this problem, we split the sum into two parts, one of which we handle by the induction hypothesis, and the other by Lemma 15. That is, we will upper bound by

$$\begin{aligned} \sum _{j=Cn}^{n-1/C} {n\atopwithdelims ()j} p^n (\mu _{j,m-1} + \mu _{n-j,m-1}) + \sum _{j=n-1/C + 1}^{n-1} {n\atopwithdelims ()j} p^n (\mu _{j,m-1} + \mu _{n-j,m-1}). \end{aligned}$$

We now upper bound the first sum: applying the induction hypothesis, we can upper bound it by

$$\begin{aligned}&n!p^n c_1 e^{-c_2(m-1)^2} \sum _{j=Cn}^{n-1/C} \left[ \frac{1}{(n-j)!} + \frac{1}{j!} \right] \\&\le c_1 n! p^n e^{-c_2(m-1)^2} \left[ \frac{1}{(1/C)!} + \frac{1}{(Cn)!} \right] n(1-C). \end{aligned}$$

We thus require that

$$\begin{aligned} D n p^n e^{c_2(2m-1)}/(1 - T(-n)) \le 1 - \varepsilon , \end{aligned}$$

where D is some positive constant, and \(\varepsilon \) is any positive constant less than 1. Just as in the proof of Lemma 15, we can choose \(c_2\) small enough so that this holds for any n sufficiently large.

The second sum is handled analogously, and we choose the minimum of the two resulting constants for \(c_2\). We then choose \(c_1\) sufficiently large, and this completes the proof. \(\square \)

1.2.3 Bounds on \(|\tilde{C}_k(z)|\) and \(|\tilde{V}_k(z)|\) as \(z\rightarrow \infty \)

Proof of Lemma 10

Proof of (i)

We will prove a slightly stronger claim, because it will help in the implementation of the induction. In particular, we claim that the inequality holds for any z in the cone, regardless of magnitude.

To establish the claim for z in a compact region of the cone including the origin, we prove the following: the upper bound (uniform in k) of

$$\begin{aligned} |\tilde{C}_k(z)| \le |z|^2 e^{|z| - \mathfrak {R}(z)}, \end{aligned}$$

which holds for any \(z \in {\mathbb {C}}\), shows that there is some positive constant C for which the inequality holds for any k whenever \(|z| \le R\). The proof is as follows:

$$\begin{aligned} |e^{-z}| \left| \sum _{m\ge 0} c_{m,k} \frac{z^m}{m!} \right|&\le e^{-|z|\cos (\arg (z))} \sum _{m\ge 0} |c_{m,k}| \frac{|z|^m}{m!} \\&\le e^{-|z|\cos (\arg (z))} \sum _{m\ge 0} m(m-1)\frac{|z|^m}{m!} \\&\le |z|^2e^{|z|-|z|\cos (\arg (z))}, \end{aligned}$$

where we’ve used the fact that \(c_{m,k} \le m(m-1)\), itself a consequence of the bound

$$\begin{aligned} B_{m,k} \le m, \end{aligned}$$

which holds for all m. The remaining task is to demonstrate the polynomial upper bound for \(|z| > R\).

Base Case

For the base case, \(C_0(z) = 0\), and the inequality is trivially true throughout the cone.

Inductive Step

We now assume that the claimed inequality holds for \(k-1\), and we demonstrate it for k. We have, by the recurrence for \(\tilde{C}_k(z)\) and the inductive hypothesis,

$$\begin{aligned} |\tilde{C}_k(z)|&\le (|e^{-qz}| + |1 - e^{-qz}|)Cp^{2+\varepsilon }|z|^{2+\varepsilon } + (|e^{-pz}| + |1 - e^{-pz}|)Cq^{2+\varepsilon }|z|^{2+\varepsilon } \\&\quad + C_2p^{2+\varepsilon }|z|^{2+\varepsilon }, \end{aligned}$$

where \(C_2 > 0\) and we’ve used the fact that we can make R large enough so that

$$\begin{aligned} |\tilde{G}_{k-1}(z)| \le C_3|z|^{1+\varepsilon /2}, \end{aligned}$$

for some constant \(C_3\), by Lemma 1.

Provided that we choose C large enough, we have \(C_2p^{2+\varepsilon } \le \varepsilon 'C\), for any positive \(\varepsilon '\). The rest of the proof is as in the expected value case, so we omit it.

Proof of (ii)

This follows from an easy modification of the proof of Lemma 1, part (ii), so we only sketch the proof.

We note that, as a result of part (i), which gives a polynomial upper bound (in |z|) on the growth of \(|\tilde{C}_\ell (z)|\) for all \(\ell \le C\), and Lemma 1, part (ii), we can write, for some constants \(C', C'' > 0\),

$$\begin{aligned} \tilde{V}_j(z) = L[\tilde{V}]_{j-1}(z) + C'e^{-C''|z|}, \end{aligned}$$

and iterating the recurrence shows that \(\tilde{V}_j(z)\) is a sum of terms which are exponentially decaying in |z|. \(\square \)

1.2.4 Superexponentially Decaying Bound on \(c_{n,k}\) for \(k = \varTheta (n)\)

Here we aim to prove (15) of Theorem 2. We start with the recurrence

$$\begin{aligned} c_{n,k}(1 - T(-n))&= \sum _{j=k}^{n-1}{n\atopwithdelims ()j}p^jq^{n-j}(c_{j,k-1} + c_{n-j,k-1} + 2\mu _{j,k-1}\mu _{n-j,k-1}) \\&\le \sum _{j=k}^{n-1} {n\atopwithdelims ()j} p^jq^{n-j} (c_{j,k-1} + c_{n-j,k-1}) \\&\quad + n\sum _{j=k}^{n-1} {n\atopwithdelims ()j} p^jq^{n-j}(\mu _{j,k-1} + \mu _{n-j,k-1}) \\&= \sum _{j=k}^{n-1} {n\atopwithdelims ()j}c_{j,k-1}(p^jq^{n-j} + p^{n-j}q^j) + n\mu _{n,k} (1 - T(-n)), \end{aligned}$$

so that

$$\begin{aligned} c_{n,k} \le \frac{2p^n \sum _{j=k}^{n-1} {n\atopwithdelims ()j}c_{j,k-1}}{1 - T(-n)} + n\mu _{n,k}. \end{aligned}$$
(50)

When n is sufficiently large, we can upper bound \(T(-n) = p^n + q^n\) by

$$\begin{aligned} T(-n) \le 2p^{n} \le \frac{1}{2}, \end{aligned}$$

so that

$$\begin{aligned} \frac{1}{1 - T(-n)} \le 2. \end{aligned}$$
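The recurrence above can be exercised numerically. The sketch below (ours) assumes that \(c_{n,k}\) is the second factorial moment \( {\mathbb {E}}[B_{n,k}(B_{n,k}-1)]\), consistent with the cross term \(2\mu _{j,k-1}\mu _{n-j,k-1}\), and that \(T(-n) = p^n + q^n\); the bias is illustrative:

```python
from functools import lru_cache
from math import comb

p, q = 0.7, 0.3

@lru_cache(maxsize=None)
def mu(n, k):
    # E[B_{n,k}] from the mean recurrence, with T(-n) = p^n + q^n (assumed)
    if n <= 1:
        return 1.0 if (n == 1 and k == 0) else 0.0
    if k == 0:
        return 0.0
    s = sum(comb(n, j) * p**j * q**(n - j) * (mu(j, k - 1) + mu(n - j, k - 1))
            for j in range(1, n))
    return s / (1 - p**n - q**n)

@lru_cache(maxsize=None)
def c(n, k):
    # c_{n,k} = E[B_{n,k}(B_{n,k} - 1)] via the recurrence displayed above;
    # the cross term 2*mu*mu reflects the independence of the two subtrees
    if n <= 1 or k == 0:
        return 0.0
    s = sum(comb(n, j) * p**j * q**(n - j)
            * (c(j, k - 1) + c(n - j, k - 1)
               + 2 * mu(j, k - 1) * mu(n - j, k - 1))
            for j in range(1, n))
    return s / (1 - p**n - q**n)
```

For instance, \(c_{2,1} = 2\) since \(B_{2,1} = 2\) deterministically, and one can check both the bound \(c_{n,k} \le n(n-1)\) used earlier and the nonnegativity of \(c_{n,k} + \mu _{n,k} - \mu _{n,k}^2 = {\mathrm {Var}}[B_{n,k}]\).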

First, we will need a simpler lemma.

Lemma 16

For all fixed \(\ell \in {\mathbb {Z}}^{\ge 0}\), there exist positive constants \(C_1, C_2\) such that, for all n and \(k \ge n-\ell \),

$$\begin{aligned} c_{n,k} \le C_1n!e^{-C_2 k^2}. \end{aligned}$$

To prove this, we need another bound on \(\mu _{n,k}\).

Lemma 17

There exist positive constants \(C^*_1, C^*_2\) such that, for all fixed \(\ell \in {\mathbb {Z}}^{\ge 0}\), all n, and \(k\ge n-\ell \),

$$\begin{aligned} n\mu _{n,k} \le C^*_1n!e^{-C^*_2 k^2}. \end{aligned}$$

Proof

This is an easy consequence of (10). \(\square \)

Proof of Lemma 16

The proof is by induction on n.

Base Case

By the initial conditions, \(c_{n,k}\le n^2\) for any n, k. Thus, fixing some particular \(n_*\) and considering \(k < n_*\), we can fix a sufficiently large \(C_1\) and a \(C_2 > 0\) for which the claimed inequality holds for \(c_{n',k}\), \(n' \le n_*\).

Induction

Here we assume that the claim is true for \(n' < n\), with \(n\ge n_*\). We have

$$\begin{aligned} c_{n,k}&\le 4p^n \sum _{j=k}^{n-1} \frac{n!}{j!(n-j)!}C_1 j! e^{-C_2 (k-1)^2} + n\mu _{n,k} \end{aligned}$$
(51)
$$\begin{aligned}&\le 4p^n \sum _{j=k}^{n-1} \frac{n!}{(n-j)!} C_1 e^{-C_2 (k-1)^2} + n\mu _{n,k}. \end{aligned}$$
(52)

We were able to apply the induction hypothesis because \(j-(k-1) \le (n-1)-(k-1) \le \ell \) for all j over which the sum is taken.

To handle \(n\mu _{n,k}\), we apply Lemma 17. Now,

$$\begin{aligned} -C_2(k-1)^2 = -C_2k^2 + 2C_2k - C_2, \end{aligned}$$

so we require that

$$\begin{aligned} \frac{4p^n}{(n-k)!} e^{2C_2k - C_2} + \frac{1}{2} \le 1. \end{aligned}$$

It is easy to see that \(C_2\) can be chosen to satisfy this inequality for all n, \(k \ge n-\ell \):

$$\begin{aligned} e^{C_2(2k-1)} \le \frac{(n-k)!}{8p^n} \iff C_2 \le \frac{n\log \frac{1}{p} - \log 8 + \log ( (n-k)!)}{2k-1}. \end{aligned}$$
(53)

The first term of the numerator and the denominator are both \(\varTheta (n)\) as \(n\rightarrow \infty \), while the rest are bounded above and below by constants, so that, at least asymptotically (i.e., provided we’ve chosen \(n_*\) large enough), they are the only two that matter. It is thus sufficient to have

$$\begin{aligned} C_2 \le \frac{1}{2}\log (1/p) > 0. \end{aligned}$$

Furthermore, if we choose \(C_1 > 2C^*_1\), we have the claimed inequality. \(\square \)

Now we begin the proof of (15) of Theorem 2.

Proof of (15) of Theorem 2

The proof is similar to that of the lemma. It is by induction on n.

Base Case

The base case is exactly as in the proof of Lemma 16.

Inductive Step

We now assume that the claim is true for \(n' < n\), with \(n \ge n_* \ge 2\), where \(n_*\) is as in the proof of Lemma 16. Let \(k \ge Cn\). Then, by the inequality (50),

$$\begin{aligned} c_{n,k} \le 4p^n\sum _{j=k}^{n-1} {n\atopwithdelims ()j} c_{j,k-1} + n\mu _{n,k}. \end{aligned}$$

To upper bound the terms of the sum, we note that we can apply the inductive hypothesis for any j such that \(k-1 \ge Cj\). Since \(k \ge Cn\), this means that any j satisfying \(j \le n-1/C\) is amenable to this approach. This gives, for such j,

$$\begin{aligned} c_{j,k-1} \le C_1 j! e^{-C_2(k-1)^2}. \end{aligned}$$

For \(j \in \{n-1/C, \dots , n-1\}\), we apply Lemma 16 to conclude that there exist \(C_1^*\) and \(C_2^*\) such that

$$\begin{aligned} c_{j,k-1} \le C_1^* j! e^{-C_2^*(k-1)^2}. \end{aligned}$$

Provided that we choose \(C_1 \ge C_1^*\) and \(C_2 \le C_2^*\), we can replace \(C_1^*\) and \(C_2^*\) by \(C_1\) and \(C_2\) in the above, so that the first sum is upper bounded by

$$\begin{aligned} 4p^n \sum _{j=k}^{n-1} {n\atopwithdelims ()j} C_1j!e^{-C_2(k-1)^2}. \end{aligned}$$

Next, to upper bound \(n\mu _{n,k}\), we appeal to the superexponentially decaying bound (10), and the rest of the proof proceeds as in that of Lemma 16. \(\square \)

1.3 Proof of Lemma 14

We will approach this by proving an upper and a lower bound on \(|\tilde{Q}_j(w,x)|\); that is, for some functions a(x) and b(x) satisfying certain growth properties (to be explained), we will prove that, for all sufficiently small positive constants \(\varepsilon \), for large enough |x|,

$$\begin{aligned} e^{\varepsilon b(|x|)} \le |\tilde{Q}_j(w, x)| \le e^{\varepsilon a(|x|)}. \end{aligned}$$
(54)

Provided that \(a(|x|)\) and \(b(|x|)\) are \(O(|x|)\) as \(x\rightarrow \infty \), we will then have

$$\begin{aligned} |Q_j(w, x)| = |e^{x}||\tilde{Q}_j(w, x)| = e^{|x|\cos (\arg (x))}|\tilde{Q}_j(w,x)|, \\ e^{|x|\cos (\arg (x)) + \varepsilon b(|x|)} \le |Q_j(w,x)| \le e^{|x|\cos (\arg (x)) + \varepsilon a(|x|)}, \end{aligned}$$

so that \( |Q_j(w, x)| = e^{|x|\cos (\arg (x)) + o(|x|)}. \) We propose \(a(x) = x-1\) and \(b(x) = -a(x)\).

As before, we derive a useful bound on \(|\tilde{Q}_j(w, x)|\) by setting \( \xi = 1 + |w-1|, \) so that \( {\mathbb {E}}[|w|^{B_{m,j}}] \le \xi ^m\), and plugging this into the definition of \(\tilde{Q}_j(w, x)\) gives

$$\begin{aligned} |\tilde{Q}_j(w, x)| \le e^{|x|(\xi - \cos \theta )}. \end{aligned}$$
(55)

We will use this inequality in what follows.

We now prove the claimed bounds on \(|\tilde{Q}_j(w, x)|\), for arbitrarily small fixed \(\varepsilon \). We do this by induction on j. The idea is as follows: we have, by Lemma 12, that \(Q_j(w, x) \sim e^{x}\), uniformly for all \(j \le k\), when \(x = O(1)\). In particular, there is some large enough fixed \(x_*\) in the cone for which the claimed inequalities on \(\tilde{Q}_j(w, x_*)\) hold; indeed, they hold for x inside the cone with \(|x| \in (|x_*|, |x_*|/q]\), again for all \(j \le k\). To prove the inequalities for the rest of the cone (i.e., for \(|x| \in (|x_*|/q, \infty )\)), we then apply the recurrence and the inductive hypothesis.

Base Case

Recall that \( \tilde{Q}_0(w, x) = 1 - xe^{-x}(1-w) = 1 + o(1), \) where the o is with respect to \(x\rightarrow \infty \). The claimed decay of the second term is because \(|xe^{-x}|\) remains bounded inside the cone, while \(1-w \xrightarrow {n\rightarrow \infty } 0\). Then, \( e^{\varepsilon a(|x|)} = e^{\varepsilon (|x|-1)} \rightarrow \infty . \) Furthermore, \( e^{\varepsilon b(|x|)} = e^{-\varepsilon (|x|-1)} \rightarrow 0. \) Thus, for sufficiently large |x| (depending on \(\varepsilon \)), the claimed inequality holds.
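The base-case sandwich is easy to check numerically. In the sketch below (ours), the values of \(\varepsilon \), \(\theta \), and w are illustrative stand-ins:

```python
import cmath
import math

eps, theta = 0.1, 0.3   # epsilon and cone half-angle (illustrative values)
w = 1 - 0.01            # w -> 1 as n -> infinity; a nearby test value

def Q0_tilde(x):
    # base case: Q~_0(w, x) = 1 - x e^{-x} (1 - w)
    return 1 - x * cmath.exp(-x) * (1 - w)

checks = []
for t in (5.0, 10.0, 40.0):
    for ang in (-theta, 0.0, theta):
        x = t * cmath.exp(1j * ang)        # a point in the cone, |x| = t
        lo = math.exp(-eps * (t - 1))      # e^{eps b(|x|)}, b(x) = -(x - 1)
        hi = math.exp(eps * (t - 1))       # e^{eps a(|x|)}, a(x) = x - 1
        checks.append(lo < abs(Q0_tilde(x)) < hi)
```

Since \(|xe^{-x}|\) stays bounded in the cone while \(1-w\) is small, \(|\tilde{Q}_0(w,x)|\) stays near 1, well inside the widening sandwich.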

Inductive Step

For the induction, we assume that the claim is true for all \(h < j\), and we prove it for j. By the observation above, the inequalities hold for \(\tilde{Q}_j(w, x)\) when \(|x| \in (|x_*|, |x_*|/q]\), and it remains to establish that they hold for larger |x|, so we assume from here onward that \(|x| \in (|x_*|/q, \infty )\).

Recall the recurrence for \(\tilde{Q}_j(w, x)\), which holds for all \(j \ge 1\):

$$\begin{aligned} \tilde{Q}_j(w, x)&= \tilde{Q}_{j-1}(w, px)\tilde{Q}_{j-1}(w, qx) + e^{-qx}(\tilde{Q}_{j} - \tilde{Q}_{j-1})(w, px) \\&\quad + e^{-px}(\tilde{Q}_j - \tilde{Q}_{j-1})(w, qx). \end{aligned}$$

Upper Bound Inductive Step

We first handle the induction step for the upper bound. The first step is to upper bound \(|\tilde{Q}_j(w, x)|\) using the triangle inequality. Next, we handle the product: by the inductive hypothesis (applicable here because \(|x_*|< |qx| < |px|\)), we have

$$\begin{aligned} |\tilde{Q}_{j-1}(w, px)\tilde{Q}_{j-1}(w, qx)| \le e^{\varepsilon (a(p|x|) + a(q|x|))} = e^{\varepsilon a(|x|) - \varepsilon }, \end{aligned}$$

where the equality is easy algebra based on the definition of a(x). To handle the terms of the form \( |e^{-(1-c)x}| (|\tilde{Q}_j(w, cx)| + |\tilde{Q}_{j-1}(w, cx)|), \) we apply the bound (55) to both terms. This gives

$$\begin{aligned} |e^{-(1-c)x}|(|\tilde{Q}_j(w, cx)| + |\tilde{Q}_{j-1}(w, cx)|)&\le 2e^{|cx|(\xi - \cos \theta ) - (1-c)|x|\cos \theta } \\&= 2e^{|x|(c\xi - \cos \theta )} \le 2e^{|x|(p\xi - \cos \theta )}. \end{aligned}$$

Provided \(|\xi - 1|\) is sufficiently small (with respect to \(\cos \theta \)) and \(|\theta |\) sufficiently small with respect to p, the quantity in the exponent is negative and bounded away from 0. This can be done by making n sufficiently large.

Then,

$$\begin{aligned} |\tilde{Q}_j(w, x)|&\le e^{\varepsilon (a(p|x|) + a(q|x|))} \left( 1 + 4e^{|x|(p\xi - \cos \theta ) - \varepsilon (a(p|x|) - a(q|x|))} \right) \\&\le e^{\varepsilon (a(p|x|) + a(q|x|))} \left( 1 + 4e^{|x|(p\xi - \cos \theta )}\right) , \end{aligned}$$

where the second inequality is because \(a(c|x|) > 0\) when |x| is large enough (depending only on c). The factor in parentheses can be written as \( e^{\log (1 + 4e^{|x|(p\xi - \cos \theta )})} = e^{4e^{|x|(p\xi - \cos \theta )}(1 + o(1))}, \) since, by a previous observation, \(p\xi - \cos \theta < 0\). Thus, the upper bound becomes \( e^{\varepsilon a(|x|) - \varepsilon + 4e^{|x|(p\xi - \cos \theta )}}. \) Since the second term in the exponent is a negative constant and the third decays exponentially as \(|x| \rightarrow \infty \), we can further upper bound by \( e^{\varepsilon a(|x|)}, \) provided |x| is sufficiently large. This concludes the proof of the upper bound.

Lower Bound Inductive Step

We now give the inductive step of the lower bound. First, we use the lower bound version of the triangle inequality (noting that, for \(c > 0\), \(e^{-c|x|\cos (\arg (x))} \le e^{-c|x|\cos \theta }\), since \(\cos \theta \le \cos (\arg (x))\) and \(y \mapsto e^y\) is increasing):

$$\begin{aligned} |\tilde{Q}_j(w, x)|&\ge |\tilde{Q}_{j-1}(w, px)\tilde{Q}_{j-1}(w, qx)| \\&- e^{-p|x|\cos \theta } | \tilde{Q}_j - \tilde{Q}_{j-1}|(w, qx) - e^{-q|x|\cos \theta } | \tilde{Q}_j - \tilde{Q}_{j-1}|(w, px). \end{aligned}$$

We apply the inductive hypothesis to the product (justified by the same reasoning as in the upper bound proof) to get

$$\begin{aligned} |\tilde{Q}_{j-1}(w, px)\tilde{Q}_{j-1}(w, qx)| \ge e^{\varepsilon (b(p|x|) + b(q|x|))}. \end{aligned}$$

For the other two terms, we require an upper bound on expressions of the form \( e^{-(1-c)|x|\cos \theta } |\tilde{Q}_j - \tilde{Q}_{j-1}|(w, cx). \) Applying the triangle inequality and then the bound (55), we get that the above expression is upper bounded by

$$\begin{aligned} 2e^{-(1-c)|x|\cos \theta }e^{c|x|(\xi - \cos \theta )} = 2e^{|x|(c\xi - \cos \theta )} \le 2e^{|x|(p\xi - \cos \theta )}. \end{aligned}$$

Thus, we get

$$\begin{aligned} |\tilde{Q}_j(w, x)| \ge e^{\varepsilon (b(p|x|) + b(q|x|))}\left( 1 - 4e^{|x|(p\xi - \cos \theta ) - \varepsilon (b(p|x|) + b(q|x|))}\right) . \end{aligned}$$
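The factoring behind the last display is elementary; writing it out once:

```latex
% With A = \varepsilon\,(b(p|x|) + b(q|x|)) and B = |x|(p\xi - \cos\theta),
\begin{aligned}
e^{A} - 4e^{B} \;=\; e^{A}\left(1 - 4e^{B - A}\right),
\end{aligned}
% which is the form of the lower bound above.
```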

Provided \(0< \varepsilon < \cos \theta - p\xi \) (which can hold if we choose \(\theta \) close enough to 0), we can see by substituting in the definition of b(x) that there exists a positive number \(\tau \) (depending only on \(\varepsilon , \theta , p, \xi \)) such that

$$\begin{aligned} |\tilde{Q}_j(w, x)| \ge e^{\varepsilon (b(p|x|) + b(q|x|))}(1 - 4e^{-\tau |x|}). \end{aligned}$$

As in the inductive step for the upper bound, we rewrite the second factor: \( (1 - 4e^{-\tau |x|}) = e^{-4e^{-\tau |x|}(1 + o(1))}. \) Then the bound becomes

$$\begin{aligned} |\tilde{Q}_j(w, x)| \ge e^{\varepsilon (b(p|x|) + b(q|x|)) - 4e^{-\tau |x|}(1 + o(1))}. \end{aligned}$$
(56)

Now, by definition of b(x), \( b(p|x|) + b(q|x|) = -a(|x|) + 1. \) As in the upper bound proof, after applying this identity the second term in the exponent of the right-hand side of (56) (namely \(\varepsilon \)) is a positive constant, while the third term decays exponentially, so the exponent can be lower bounded by \(-\varepsilon a(|x|) = \varepsilon b(|x|)\) for |x| sufficiently large. This concludes the proof. \(\square \)

1.4 De-Poissonization

1.4.1 De-Poissonization of \(\tilde{G}_k(z)\)

Here we recover the asymptotics of \(\mu _{n,k}\) from \(\tilde{G}_k(z)\). For this, we apply Theorem 1 of [10]. The inner condition follows immediately from Lemma 1. We capture the outer condition in the following lemma.

Lemma 18

Let \(\theta \in (0, \pi /2)\). Then there exist some \(\phi < 1\) and \(C > 0\) such that, for z outside \( {\mathscr {C}}(\theta )\) and any \(k \ge 0\),

$$\begin{aligned} |\tilde{G}_k(z)e^{z}| \le Ce^{\phi |z|}. \end{aligned}$$

Proof

We start by recalling the uniform upper bound on \(|\tilde{G}_k(z)|\): for any \(k\ge 0\) and \(z \in {\mathbb {C}}\),

$$\begin{aligned} |\tilde{G}_k(z)| \le |z|e^{|z| - \mathfrak {R}(z)}. \end{aligned}$$

This implies that, for any fixed \(R > 0\), we can choose a \(C > 0\) such that the claimed inequality holds whenever \(|z| \le R\), for every \(k \ge 0\). It thus remains to check that it holds for \(|z| > R\), \(z \notin {\mathscr {C}}(\theta )\). This we do by induction on k.

Base Case

For \(k=0\), \(e^{z}\tilde{G}_0(z) = z\), and, for any positive \(\phi \), an appropriate R can be chosen such that the claimed inequality holds for \(|z| > R\), \(z \notin {\mathscr {C}}(\theta )\). More specifically, given \(\phi \), we choose R large enough so that

$$\begin{aligned} |z| \le e^{\phi |z|} \end{aligned}$$

whenever \(|z| > R\). Next, we choose \(C > 1\) such that \(|z|e^{|z| - \mathfrak {R}(z)} \le Ce^{\phi |z|}\) for \(|z| \le R\). This implies that \(|e^{z}\tilde{G}_0(z)| \le Ce^{\phi |z|}\) for any |z|, as required.
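The base-case bookkeeping, namely that \(|z| \le e^{\phi |z|}\) holds once |z| clears a threshold R depending on \(\phi \), is easy to check numerically; the value \(\phi = 0.1\) below is an arbitrary illustration:

```python
import math

phi = 0.1  # arbitrary illustrative value in (0, 1)

# For small phi the inequality |z| <= e^{phi |z|} fails in a middle range...
assert not (10 <= math.exp(phi * 10))

# ...but holds for all |z| beyond a suitable threshold R; here R = 60 works,
# since e^{phi r} - r is increasing once r > (1/phi) log(1/phi) ~ 23 and is
# positive at r = 60.
R = 60.0
for r in [R, 2 * R, 10 * R, 100 * R]:
    assert r <= math.exp(phi * r)
```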

Inductive Step

Now, assuming that the claimed inequality is true for \(0\le j < k\), we demonstrate that it holds for k. In fact, since the recurrence for \(\tilde{G}_k(z)\) can be put in the form

$$\begin{aligned} \tilde{G}_k(z) = \gamma _1(z)\tilde{G}_{k}(pz) + \gamma _2(z)\tilde{G}_{k}(qz) + t(z), \end{aligned}$$

with

$$\begin{aligned} \gamma _1(z)&= e^{-qz}, \gamma _2(z) = e^{-pz},\\ t(z)&= \tilde{G}_{k-1}(pz)(1-e^{-qz}) + \tilde{G}_{k-1}(qz)(1-e^{-pz}), \end{aligned}$$

it is sufficient to check the outer conditions required by Theorem 10 of [10]: in particular, we need to show that, for |z| sufficiently large and some \(\phi < 1\),

$$\begin{aligned} |\gamma _1(z)|e^{q\mathfrak {R}(z)}\le \frac{1}{3}e^{\phi q |z|},&|\gamma _2(z)|e^{p\mathfrak {R}(z)}\le \frac{1}{3}e^{\phi p |z|},&|t(z)|e^{\mathfrak {R}(z)} \le \frac{1}{3}e^{\phi |z|}. \end{aligned}$$
(57)

The first two inequalities easily hold: for \(c\in \{p,q\}\),

$$\begin{aligned} |e^{-cz}|e^{c\mathfrak {R}(z)} = e^{-c\mathfrak {R}(z) + c\mathfrak {R}(z)} = 1, \end{aligned}$$

and the claimed inequalities hold for any positive \(\phi \) and sufficiently large \(|z|\) (in particular, any \(|z| \ge \frac{\log 3}{q\phi }\) suffices).
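As a quick numerical sanity check of the claimed threshold (the values of q and \(\phi \) below are arbitrary illustrations, not from the paper):

```python
import math

q, phi = 0.4, 0.9  # illustrative parameters: q in (0, 1/2), phi in (0, 1)

# At the claimed threshold |z| = log(3)/(q*phi), the first inequality in (57)
# holds with equality: (1/3) e^{phi q |z|} = (1/3) e^{log 3} = 1.
z_threshold = math.log(3) / (q * phi)
assert abs((1 / 3) * math.exp(phi * q * z_threshold) - 1.0) < 1e-12

# Beyond the threshold the right-hand side only grows, so the bound persists.
for z_abs in [z_threshold + 0.1, 2 * z_threshold, 10 * z_threshold]:
    assert 1.0 <= (1 / 3) * math.exp(phi * q * z_abs)
```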

For the third inequality, we apply the induction hypothesis:

$$\begin{aligned} |t(z)|e^{\mathfrak {R}(z)} \le Ce^{\phi p |z|} |e^{qz} - 1| + Ce^{\phi q |z|} |e^{pz} - 1|. \end{aligned}$$

Choosing \(\phi = \cos (\theta ) + \varepsilon \), for any positive constant \(\varepsilon \), we have, for any positive c,

$$\begin{aligned} |e^{cz}| = e^{c\mathfrak {R}(z)} = e^{c|z|\cos ( {\mathrm {arg}}(z))} \le e^{c|z|(\phi -\varepsilon )}, \end{aligned}$$

since \(z \notin {\mathscr {C}}(\theta )\). This implies that

$$\begin{aligned} |t(z)|e^{\mathfrak {R}(z)}&\le C\left[ e^{\phi p |z| + \phi q |z| - q|z|\varepsilon } + e^{\phi q|z| + \phi p|z| - p|z|\varepsilon } + e^{\phi p |z|} + e^{\phi q |z|} \right] \\&= Ce^{(\phi - q\varepsilon ) |z|}(1 + o(1)), \end{aligned}$$

so that, for sufficiently large |z| (depending only on \(\phi , p\)),

$$\begin{aligned} |t(z)|e^{\mathfrak {R}(z)} \le \frac{1}{3}e^{\phi |z|}, \end{aligned}$$

which completes the proof. \(\square \)
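The bookkeeping identifying the dominant exponential among the four terms in the bound on \(|t(z)|e^{\mathfrak {R}(z)}\) can also be checked numerically; the parameter values below are illustrative assumptions:

```python
import math

# Illustrative parameters: p > 1/2, q = 1 - p, theta and eps small.
p, theta, eps = 0.6, 0.5, 0.05
q = 1 - p
phi = math.cos(theta) + eps  # the choice made in the proof
assert phi < 1               # required for de-Poissonization

# The four exponents (coefficients of |z|) in the bound on |t(z)| e^{Re(z)}:
exponents = [
    phi - q * eps,  # from e^{phi p |z| + phi q |z| - q |z| eps}
    phi - p * eps,  # from e^{phi q |z| + phi p |z| - p |z| eps}
    phi * p,        # from e^{phi p |z|}
    phi * q,        # from e^{phi q |z|}
]

# Since q < p and eps < phi, the first exponent dominates, so the sum is
# C e^{(phi - q eps)|z|} (1 + o(1)), and phi - q eps < phi gives the
# (1/3) e^{phi |z|} bound for |z| large.
assert max(exponents) == phi - q * eps
assert phi - q * eps < phi
```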

1.4.2 De-Poissonization of Variance

We now de-Poissonize using the following theorem from [10] (rephrased in our notation and simplified):

Theorem 8

(De-Poissonization of variance) Suppose that there is some \(\theta \in (0, \pi /2)\) such that the following conditions hold:

  • There is some \(\phi \in (0, 1)\) such that, for z outside the cone \( {\mathscr {C}}(\theta )\), \(e^{z}\tilde{G}_k(z)\) and \(e^{z}(\tilde{V}_k(z) + \tilde{G}_k(z)^2)\) are both \(O(e^{\phi |z|})\).

  • There is some \(\beta \le 1\) such that, for z inside \( {\mathscr {C}}(\theta )\), \(\tilde{G}_k(z)\) and \(\tilde{V}_k(z)\) are both \(O(z^{\beta })\).

Then

$$\begin{aligned} V_{n,k} = \tilde{V}_k(n) - n[\tilde{G}'_k(n)]^2 + O(\max \{ n^{\beta -1}, n^{2\beta -2} \}). \end{aligned}$$

Next we check that the hypotheses of this theorem are satisfied.

Conditions on \(\tilde{G}_k(z)\)

The inner and outer conditions on \(\tilde{G}_k(z)\) were already verified in the de-Poissonization in the expected value case.

Outer Condition on \(\tilde{V}_k(z) + \tilde{G}_k(z)^2\)

We now demonstrate that the outer condition holds for \(\tilde{V}_k(z) + \tilde{G}_k(z)^2 = \tilde{C}_k(z) + \tilde{G}_k(z)\). For this, it is sufficient to show that the same outer condition holds for \(\tilde{C}_k(z)\). We prove it by induction on k.

Base Case for Outer Condition on \(\tilde{V}_k(z) + \tilde{G}_k(z)^2\)

The base case, \(k=0\), is trivial, since \(\tilde{C}_0(z) = 0\).

Inductive Step for Outer Condition on \(\tilde{V}_k(z) + \tilde{G}_k(z)^2\)

Now we assume that the claim holds for \(k-1\), and we prove it for k. We first note a bound for \(e^z\tilde{C}_j(z)\) which is uniform in j: in the proof of Lemma 10, we proved that, for all \(j \ge 0\) and \(z \in {\mathbb {C}}\),

$$\begin{aligned} |\tilde{C}_j(z)| \le |z|^2e^{|z| - \mathfrak {R}(z)}, \end{aligned}$$

which immediately implies that

$$\begin{aligned} |e^z\tilde{C}_j(z)| \le |z|^2e^{|z|}. \end{aligned}$$

Thus, for a given R and \(\phi \in (0, 1)\), there is some \(C > 0\) such that, whenever \(|z| \le R\), for any \(j \ge 0\),

$$\begin{aligned} |e^{z}\tilde{C}_j(z)| \le Ce^{\phi |z|}. \end{aligned}$$

Now we demonstrate that the same bound holds for \(|z| > R\). Recall that \(\phi \) in the case of \(\tilde{G}_k(z)\) is given by \(\cos (\theta ) + \varepsilon \), for any sufficiently small fixed \(\varepsilon > 0\). We define \(\hat{\phi }\) to be slightly smaller:

$$\begin{aligned} \hat{\phi } = \cos (\theta ) + \varepsilon /2, \end{aligned}$$

and we note that the de-Poissonization result for \(\tilde{G}_k(z)\) implies that there is some \(\hat{R} > 0\) such that, whenever \(z \notin {\mathscr {C}}(\theta )\) and \(|z| > \hat{R}\), for any \(j \ge 0\),

$$\begin{aligned} |e^{z}\tilde{G}_j(z)| \le e^{\hat{\phi }|z|}. \end{aligned}$$
(58)

We will use this fact in the induction step for \(\tilde{C}_k(z)\) as follows: we adopt the same approach as in the expected value case, this time defining

$$\begin{aligned} \gamma _1(z)&= e^{-qz}, \\ \gamma _2(z)&= e^{-pz}, \\ t(z)&= \tilde{C}_{k-1}(pz)(1 - e^{-qz}) + \tilde{C}_{k-1}(qz)(1 - e^{-pz}) + 2 \tilde{G}_{k-1}(pz)\tilde{G}_{k-1}(qz). \end{aligned}$$

The conditions required of \(\gamma _1(z)\) and \(\gamma _2(z)\) were already verified, so we proceed to show that

$$\begin{aligned} e^{\mathfrak {R}(z)}|t(z)| \le \frac{1}{3}e^{\phi |z|}, \end{aligned}$$

for \(|z| > R\), for some \(R > 0\) independent of k. We again choose \(\phi = \cos (\theta ) + \varepsilon \), and applying the induction hypothesis and inequality (58) gives

$$\begin{aligned} |e^{z}t(z)| \le C\left[ e^{\phi p |z|}|e^{qz} - 1| + e^{\phi q |z|} |e^{pz} - 1| + C_2e^{\hat{\phi }|z|} \right] . \end{aligned}$$

The rest of the induction step goes exactly as in the expected value case, so we omit it.

Inner Condition on \(\tilde{V}_k(z)\)

As for the inner conditions, both follow from the asymptotic expansions for \(\tilde{G}_k(z)\) and \(\tilde{V}_k(z)\) derived by inverting their respective Mellin transforms. Both derivations are readily extended to \(z\rightarrow \infty \) inside the cone.

Since all conditions of the theorem are satisfied, the remaining task is to show that \(n[\tilde{G}'_k(n)]^2 = o(\tilde{V}_k(n))\). We do this using the Cauchy integral formula for derivatives, followed by upper bounding the resulting integral expression (the main task will then be to choose an appropriate radius for the integration contour): for a circle \( {\mathscr {C}}\) of any radius R centered at n,

$$\begin{aligned} \tilde{G}'_k(n) = \frac{1}{2\pi i}\oint _ {\mathscr {C}}\frac{\tilde{G}_k(\xi )}{(\xi - n)^2} {\;\mathrm {d}\xi }, \end{aligned}$$

which implies

$$\begin{aligned} |\tilde{G}'_k(n)| \le \frac{2\pi R}{2\pi } \frac{|\tilde{G}_k(\xi _*)|}{R^2} = |\tilde{G}_k(\xi _*)| / R, \end{aligned}$$

where \(\xi _* = {\mathrm {argmax}}_{x \in {\mathscr {C}}} |\tilde{G}_k(x)|\). Now, since \(\tilde{G}_k(n) = O(n^{\beta (\alpha )}/\sqrt{\log n})\), and \(\xi _*\) is not too different from n, we expect that \(\tilde{G}_k(\xi _*) = O(n^{\beta (\alpha )}/\sqrt{\log n})\) as well. Provided that we can show this, if we choose

$$\begin{aligned} R = n^{\varDelta }/\varPsi (n) \end{aligned}$$

for some \(\varDelta > 0\) and slowly growing function \(\varPsi (n)\) which we will determine later, our bound becomes

$$\begin{aligned} n\tilde{G}'_k(n)^2 = O(n^{1 + 2\beta (\alpha ) - 2\varDelta }\varPsi (n)^2/\log n), \end{aligned}$$

and we would like to enforce the conditions

$$\begin{aligned} 1 + 2\beta (\alpha ) - 2\varDelta \le \beta (\alpha ) \end{aligned}$$

and

$$\begin{aligned} \varDelta \le 1, \end{aligned}$$

along with

$$\begin{aligned} \varPsi (n)^2/\log n = o(1/\sqrt{\log n}) \end{aligned}$$

and \(\varPsi (n) \xrightarrow {n\rightarrow \infty } \infty \) (so that, for any \(\varDelta \), \(R = o(n)\)). Choosing \(\varPsi (n)\) to satisfy these conditions is easy: we simply require that

$$\begin{aligned} \varPsi (n)^2 = o(\sqrt{\log n}) \implies \varPsi (n) = o( (\log n)^{1/4}), \end{aligned}$$

so that we can choose, say, \(\varPsi (n) = \log \log n\). It is easy to see that, for any \(\alpha \), there exists some \(\varDelta \) which satisfies both conditions simultaneously:

$$\begin{aligned} 1 + 2\beta (\alpha ) - 2\varDelta \le \beta (\alpha ) \iff \frac{1 + \beta (\alpha )}{2} \le \varDelta , \end{aligned}$$

and

$$\begin{aligned} \frac{1 + \beta (\alpha )}{2} \le 1 \iff \beta (\alpha ) \le 1. \end{aligned}$$

This last inequality is true for any \(\alpha \) within the range under consideration. With these choices,

$$\begin{aligned} n\tilde{G}'_k(n)^2 = o(n^{\beta (\alpha )}/\sqrt{\log n}), \end{aligned}$$

so \(n\tilde{G}'_k(n)^2 = o(\tilde{V}_k(n))\), provided that we can show that \(\tilde{G}_k(\xi _*) = O(\tilde{G}_k(n))\). To do this, the plan is to show that Theorem 1 applies to give asymptotics for \(\tilde{G}_k(\xi _*)\). First, we verify that \(\xi _*\) remains within a cone around the positive real axis. Fix some \(\theta \in (0, \pi /2)\) for the angle made with respect to the positive real axis, and let A denote the point of the form \(n+it\), for some \(t \in {\mathbb {R}}^+\), which lies on the boundary of the cone. Furthermore, let B denote the point on the boundary of the cone which lies above the real axis and is nearest to n. Then we have

$$\begin{aligned} |A - n| = t, \end{aligned}$$

and

$$\begin{aligned} \frac{t}{n} = \tan (\theta ) \implies t = \tan (\theta )n = \varTheta (n). \end{aligned}$$

Next, we note that the angle between the segment connecting n and B and the segment connecting 0 and B must be \(\pi /2\), and we denote by \(\phi \) the angle between the segment connecting 0 and n and the segment connecting n and B. Clearly \(\phi = \pi /2 - \theta \), and the angle between the segments from n to B and from n to A is \(\theta \). Thus, the length of the segment connecting n and B (i.e., the radius of the largest ball centered at n contained in the cone) is given by

$$\begin{aligned} \cos (\theta ) = \frac{|B - n|}{|A - n|} \implies |B - n| = |A - n|\cos (\theta ) = \varTheta (n)\cos (\theta ) = \varTheta (n), \end{aligned}$$

so that, since \(R = o(n)\), \(\xi _*\) must be inside the cone. We then examine the relationship between k and \(\xi _*\). Since \(n = \xi _*(1 + o(1))\),

$$\begin{aligned} k \sim \alpha \log n = \alpha \log (\xi _*(1 + o(1))) = \alpha \log \xi _* + o(1), \end{aligned}$$

so that \(k \sim \alpha \log \xi _*\). Applying Theorem 1 then shows that \(\tilde{G}_k(\xi _*) = O(\tilde{G}_k(n))\).

This completes the proof.
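The derivative bound via the Cauchy integral formula that drives this argument is standard; the sketch below checks it numerically for the stand-in function \(f(z) = e^z\) (an arbitrary choice for illustration, since \(\tilde{G}_k\) itself cannot be evaluated here):

```python
import cmath

def cauchy_derivative(f, center, radius, num_points=20000):
    """Approximate f'(center) as (1/(2 pi i)) * contour integral of f(xi)/(xi - center)^2."""
    total = 0j
    for j in range(num_points):
        phi = 2 * cmath.pi * j / num_points
        xi = center + radius * cmath.exp(1j * phi)
        dxi = radius * cmath.exp(1j * phi) * 1j * (2 * cmath.pi / num_points)
        total += f(xi) / (xi - center) ** 2 * dxi
    return total / (2j * cmath.pi)

f = cmath.exp
center, radius = 2.0, 0.5

deriv = cauchy_derivative(f, center, radius)
assert abs(deriv - cmath.exp(center)) < 1e-6  # f'(2) = e^2

# The modulus bound |f'(n)| <= max_{|xi - n| = R} |f(xi)| / R from the text:
max_on_circle = max(abs(f(center + radius * cmath.exp(2j * cmath.pi * j / 1000)))
                    for j in range(1000))
assert abs(deriv) <= max_on_circle / radius
```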

1.4.3 De-Poissonization for the Central Limit Theorem

The final step of the proof is inversion of the Poisson transform to recover a central limit theorem for the Bernoulli model. That is, knowing asymptotic information about \(\tilde{Q}_k(u, z)\), our goal is to recover \(Q_{n,k}(u)\). The Cauchy integral formula gives

$$\begin{aligned} Q_{n,k}(u) = \frac{n!}{2\pi i} \oint _{|z|=n} e^z \tilde{Q}_k(u, z) z^{-n-1} {\;\mathrm {d}z}, \end{aligned}$$
(59)

where the integration contour (we denote it by \( {\mathscr {C}}\)) is the circle centered at 0 with radius n. The evaluation of this integral will proceed in two stages. We expect that the main contribution will come from a small arc around the positive real axis, so we fix a cone around the positive real axis, and we show that the contribution outside the cone is negligible (by a lemma which we will soon state). Next, we break the remaining part of the contour into inner tails and a central region. The inner tails we show to be negligible using Lemma 14, the Taylor expansion for the cosine function around 0, and a careful choice of the split into the inner tails and the central region. Finally, the central region is evaluated using the expansion for \(\tilde{Q}_k(u, z)\) derived above, as well as the fact that

$$\begin{aligned} \frac{1}{\sqrt{2\pi }} \int _{-\infty }^{\infty } e^{-x^2/2} {\;\mathrm {d}x} = 1. \end{aligned}$$
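The Gaussian normalization invoked at the end of the argument can be confirmed by a crude Riemann sum:

```python
import math

# Approximate (1/sqrt(2 pi)) * integral of exp(-x^2/2) over the real line by a
# midpoint Riemann sum on [-20, 20] (the tails beyond are negligible).
a, b, m = -20.0, 20.0, 400000
h = (b - a) / m
total = sum(math.exp(-((a + (j + 0.5) * h) ** 2) / 2) for j in range(m))
integral = h * total / math.sqrt(2 * math.pi)

assert abs(integral - 1.0) < 1e-9
```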

Let \(\theta \) be an angle in \((0, \pi /2)\) for which

$$\begin{aligned} \tilde{l}_k(u, z) = z + \tilde{G}_k(z)\frac{\tau }{\sigma _{n,k}} + \tilde{V}_k(z)\frac{\tau ^2}{2\sigma _{n,k}^2} + R[\tilde{l}]_k(u, z)\frac{\tau ^3}{3!\sigma _{n,k}^3} + O(\sigma _{n,k}^{-1}), \end{aligned}$$

with \(R[\tilde{l}]_k(u, z) = O(n^{\beta (\alpha )})\). This \(\theta \) is guaranteed to exist by the analysis in Sect. 6.

We require a final estimate on the growth of \(Q_j(u, x)\) in order to upper bound the outer tails of (59):

Lemma 19

(Growth of \(Q_j(u, x)\) outside a cone) Let \(\theta \in (0, \pi /2)\). Then there is some \(\alpha \in (0, 1)\) and \(x_0 > 0\) such that, provided \(x \notin {\mathscr {C}}(\theta )\) and \(|x| \ge x_0\),

$$\begin{aligned} |Q_j(u, x)| \le e^{\alpha |x|}, \end{aligned}$$

uniformly in \(j \le k\).

Proof

We prove a slightly different claim: that, for each \(\theta \) with \(|\theta | \in (0, \pi /2)\), there is some \(\alpha < 1\) and \(x_0 > 0\) such that, for all \(j \le k\), if \(x \notin {\mathscr {C}}(\theta )\) and \(|x| \ge x_0\), then

$$\begin{aligned} |Q_j(u, x)| \le e^{\alpha |x| - 1}. \end{aligned}$$

Note the additional term of \(-1\) in the exponent. We prove this by induction on j. For each j, we first prove that the inequality holds for \(|x| \in [x_0', x_0)\) with \(x_0' = qx_0\), and we then use induction on increasing domains to prove that it holds for all \(|x| \ge x_0\).

Base Case for j Induction

For the base case, recall that \(Q_0(u, x) = e^{x} - x(1-u)\), so that

$$\begin{aligned} |Q_0(u, x)| \le e^{|x|\cos (\arg (x))} + |x||1-u|. \end{aligned}$$

For appropriately chosen \(\alpha \) (say, \(\cos (\theta ) + \varepsilon \), for any small enough positive \(\varepsilon \)), |x| can be made large enough so that this satisfies the claimed property. That is, there is some \(x_0'\) for which the stated inequality holds whenever \(|x| \ge x_0'\). We define \(x_0\) to be \(x_0'/q\).

Induction on j, Base Case for Increasing Domains Induction

For the induction on j, we assume that the claim holds for \(j-1\), and we aim to prove it for j. To do this, we use induction on increasing domains. To verify the claim for \(|x| \in [x_0', x_0)\), we apply Lemma 12, which is justified because \(|x| < x_0\), to conclude that

$$\begin{aligned} Q_j(u, x) \sim e^{x}, \end{aligned}$$

so that

$$\begin{aligned} |Q_j(u, x)| \sim |e^{x}| = e^{|x|\cos (\arg (x))} \le e^{|x|\cos \theta }, \end{aligned}$$

and, provided \(x_0\) is sufficiently large,

$$\begin{aligned} |Q_j(u, x)| \le e^{\alpha |x| - 1}, \end{aligned}$$

which gives us the base case of the increasing domains induction.

Increasing Domains Inductive Step

We now proceed to the inductive step. Applying the functional equation and the triangle inequality, then the inductive hypotheses,

$$\begin{aligned} |Q_{j}(u, x)| \le e^{\alpha |x| - 2} + 4e^{\alpha p|x| - 1} = e^{\alpha |x| - 2}\left( 1 + 4e^{-\alpha q|x| + 1} \right) . \end{aligned}$$

Next, note that, since \(e^{-\alpha q|x| + 1} = o(1)\) as \(|x|\rightarrow \infty \), the second factor in the above product satisfies

$$\begin{aligned} 1 + 4e^{-\alpha q|x| + 1} \sim e^{4e^{-\alpha q|x| + 1}} = e^{o(1)}, \end{aligned}$$

so that, provided we choose |x| large enough,

$$\begin{aligned} |Q_j(u, x)| \le e^{\alpha |x| - 1}, \end{aligned}$$

which completes the proof. \(\square \)
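The last step, that the factor \(1 + 4e^{-\alpha q|x| + 1}\) drops below \(e\) (so that multiplying by \(e^{\alpha |x| - 2}\) recovers \(e^{\alpha |x| - 1}\)) once |x| is large, can be checked with illustrative parameters:

```python
import math

alpha, q = 0.95, 0.4  # illustrative assumptions, not values from the paper

# e^{alpha|x| - 2} (1 + 4 e^{-alpha q |x| + 1}) <= e^{alpha|x| - 1}
# is equivalent to 1 + 4 e^{-alpha q |x| + 1} <= e; solving with equality:
threshold = (1 + math.log(4 / (math.e - 1))) / (alpha * q)

for x_abs in [threshold + 0.01, 2 * threshold, 100.0]:
    factor = 1 + 4 * math.exp(-alpha * q * x_abs + 1)
    assert factor <= math.e
    assert math.exp(alpha * x_abs - 2) * factor <= math.exp(alpha * x_abs - 1)
```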

Bounding the Outer Tails

The outer tails of (59) then become

$$\begin{aligned} \frac{n!}{2\pi i} \int _{|\arg (z)|> \theta } e^z \tilde{Q}_k(u, z) z^{-n-1} {\;\mathrm {d}z}&\sim \frac{\sqrt{2\pi n}n^ne^{-n}}{2\pi i} \int _{|\arg (z)|> \theta } e^{z}\tilde{Q}_k(u, z) z^{-n-1} {\;\mathrm {d}z} \\&= \frac{n^{n+1/2}e^{-n}}{\sqrt{2\pi } i} \int _{|\arg (z)| > \theta } e^{z}\tilde{Q}_k(u, z) z^{-n-1} {\;\mathrm {d}z}, \end{aligned}$$

where we used Stirling’s formula. Taking absolute values and applying Lemma 19 gives an upper bound of

$$\begin{aligned} \frac{n^{n+1/2}e^{-n + \alpha n}}{\sqrt{2\pi }} 2\pi n n^{-n-1} = n^{O(1)}e^{-n(1-\alpha )}, \end{aligned}$$

which is exponentially decaying in n, since \(\alpha < 1\).

Bounding the Inner Tails

Now we bound the inner tails. Specifically, we let \(\psi = n^{-\delta }\), for some \(\delta > 0\) to be determined, and the inner tails consist of that part of the contour where \(|\arg (z)| \in (\psi , \theta ]\). The choice of \(\psi \) is dictated by two opposing forces: it must be large enough that the inner tails are negligible but small enough so that the central part is easy to estimate precisely. In the range of integration of the inner tails, we have the estimate

$$\begin{aligned} |e^{z}\tilde{Q}_k(u, z)| = e^{n \cos (\arg (z))(1 + o(1))} \le e^{n\cos (\psi )}, \end{aligned}$$

by Lemma 14. Taylor expanding \(\cos (\psi )\) around 0 gives

$$\begin{aligned} \exp \left( n(1 - \frac{\psi ^2}{2!} + O(\psi ^4))\right) . \end{aligned}$$

We will require that \(n\psi ^2 = n^{1-2\delta } \xrightarrow {n\rightarrow \infty } \infty \), which translates to

$$\begin{aligned} 1 - 2\delta > 0 \implies \delta < 1/2. \end{aligned}$$

Then we can upper bound the inner tails by

$$\begin{aligned} \frac{n^{n+1/2} e^{-n}}{\sqrt{2\pi }} 2\pi n |e^{z}\tilde{Q}_k(u, z)| n^{-n-1}\le & {} n^{O(1)} e^{-n + n(1 - \frac{\psi ^2}{2!} + O(\psi ^4))}\\= & {} n^{O(1)} e^{-\frac{1}{2}n^{1-2\delta }(1 + o(1))}, \end{aligned}$$

which is exponentially decaying in n, so negligible.
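The cosine expansion driving the inner-tail estimate, with \(\psi = n^{-\delta }\), can be sanity-checked numerically (\(\delta = 0.3\) is an illustrative choice satisfying \(\delta < 1/2\)):

```python
import math

delta = 0.3  # illustrative; any delta < 1/2 makes n * psi^2 -> infinity

for n in [10 ** 4, 10 ** 6, 10 ** 8]:
    psi = n ** (-delta)
    # n (1 - cos psi) agrees with n psi^2 / 2 up to n O(psi^4):
    assert abs(n * (1 - math.cos(psi)) - n * psi ** 2 / 2) <= n * psi ** 4
    # and n psi^2 = n^{1 - 2 delta} grows without bound:
    assert abs(n * psi ** 2 / n ** (1 - 2 * delta) - 1) < 1e-9
```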

Estimating the Central Region

Now we estimate the central part. Inside the integral, letting \(\phi \) denote \(\arg (z)\), we can expand \(e^z z^{-n-1}\) as

$$\begin{aligned} e^{z}z^{-n-1}= & {} e^{ne^{i\phi } -(n+1)\log (ne^{i\phi })} = e^{n(1 + i\phi - \frac{\phi ^2}{2} + O(\phi ^3)) - (n+1)\log n - (n+1)i\phi }\\= & {} e^{n}n^{-n-1}e^{-\frac{n\phi ^2}{2}(1 + o(1))}e^{-i\phi }. \end{aligned}$$

Since \(|\phi | \le \psi = o(1)\), the phase factor \(e^{-i\phi }\) is \(1 + o(1)\). Multiplying by the factor \(e^{-n}n^{n+1/2}/(\sqrt{2\pi } i)\) outside the integral, the coefficient of \(\tilde{Q}_k(u, z)\) in the integrand (with respect to \( {\mathrm {d}z}\)) becomes

$$\begin{aligned} \frac{1 + o(1)}{\sqrt{2\pi } i \sqrt{n}} e^{-\frac{n\phi ^2}{2}(1 + o(1))}. \end{aligned}$$
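The modulus computation for \(e^z z^{-n-1}\) on \(|z| = n\) can be verified in logarithmic form (to avoid overflow); the values of n and \(\phi \) below are illustrative:

```python
import math

n = 10 ** 6
for phi in [1e-3, 3e-3, 1e-2]:  # small angles, of the order allowed in the central region
    # log |e^z z^{-n-1}| with z = n e^{i phi} equals Re(z) - (n+1) log |z|:
    log_modulus = n * math.cos(phi) - (n + 1) * math.log(n)
    # Predicted by the expansion: n - (n+1) log n - n phi^2 / 2, up to n O(phi^4).
    predicted = n - (n + 1) * math.log(n) - n * phi ** 2 / 2
    assert abs(log_modulus - predicted) <= n * phi ** 4
```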

Applying the analysis of \(\tilde{Q}_k(u, z)\),

$$\begin{aligned} \tilde{Q}_k(u, z)&= \exp \left( \tilde{G}_k(z)\frac{\tau }{\sigma _{n,k}} + \frac{\tau ^2}{2} + O(\sigma _{n,k}^{-1})\right) \\&= \exp \left( \tilde{G}_k(n)\frac{\tau }{\sigma _{n,k}} + \frac{\tau ^2}{2} + O(\tilde{G}_k'(n)(z - n))\frac{\tau }{\sigma _{n,k}} + O(\sigma _{n,k}^{-1}) \right) , \end{aligned}$$

where we note that \(\tilde{G}_k(n) = \varTheta (n^{\beta (\alpha )}/\sqrt{\log (n)})\), while \(\tilde{G}_k'(n) = \tilde{O}(n^{\beta (\alpha )-1})\). Since \(\beta (\alpha ) - 1 \le 0\) and \(z - n = O(\psi ) = n^{-\varOmega (1)}\), the third term is \(o(\sigma _{n,k}^{-1})\). That is,

$$\begin{aligned} \tilde{Q}_k(u, z) = \exp \left( \tilde{G}_k(n)\frac{\tau }{\sigma _{n,k}} + \frac{\tau ^2}{2} + O(\sigma _{n,k}^{-1}) \right) . \end{aligned}$$

Putting these estimates together, and using \( {\mathrm {d}z} = ine^{i\phi } {\;\mathrm {d}\phi }\) on the contour \(|z| = n\), we see that the contribution of the central region is given by

$$\begin{aligned} \exp \left( \tilde{G}_k(n)\frac{\tau }{\sigma _{n,k}} + \frac{\tau ^2}{2} + O(\sigma _{n,k}^{-1}) \right) \sqrt{\frac{n}{2\pi }} \int _{-\psi }^{\psi } e^{-\frac{n\phi ^2}{2}(1+o(1))} {\;\mathrm {d}\phi }. \end{aligned}$$

It is easy to see that we can complete the tails, and then we make the substitution \(x = n^{1/2}\phi \) (so that \( {\mathrm {d}\phi } = n^{-1/2} {\;\mathrm {d}x}\)), which gives

$$\begin{aligned} \frac{\exp \left( \tilde{G}_k(n)\frac{\tau }{\sigma _{n,k}} + \frac{\tau ^2}{2} + O(\sigma _{n,k}^{-1}) \right) }{\sqrt{2\pi }} \int _{-\infty }^{\infty } e^{-x^2/2} {\;\mathrm {d}x}. \end{aligned}$$

Since the integral, along with the factor \(\frac{1}{\sqrt{2\pi }}\), becomes 1, we have, finally,

$$\begin{aligned} {\mathbb {E}}[e^{B_{n,k}\frac{\tau }{\sigma _{n,k}}}] = \exp \left( \tilde{G}_k(n)\frac{\tau }{\sigma _{n,k}} + \frac{\tau ^2}{2} + O(\sigma _{n,k}^{-1}) \right) , \end{aligned}$$

and applying the Lévy continuity theorem shows that the claimed central limit theorem holds for properly normalized \(B_{n,k}\).


About this article


Cite this article

Magner, A., Szpankowski, W. Profiles of PATRICIA Tries. Algorithmica 80, 331–397 (2018). https://doi.org/10.1007/s00453-016-0261-5
