Use of global interactions in efficient quantum circuit constructions

Dmitri Maslov; Yunseong Nam

doi:10.1088/1367-2630/aaa398

1. Introduction

Trapped atomic ions [1] and superconducting circuits [2] are two examples of quantum information processing (QIP) approaches that have delivered small yet already universal and fully programmable machines. In superconducting circuits qubit interactions are enabled through custom designed electronic hardware involving Josephson junctions and microwave resonators [2]. Different interactions can be controlled individually to invoke the two-qubit gates. A global coupling, however, would not necessarily be natural to such a system, due to the difficulty of placing and connecting $O({n}^{2})$ individual resonators in the same area as n qubits. This said, it is possible to couple Josephson junction qubits to a single resonator mode, thereby enabling global interactions [3]. In trapped ion QIP, on the other hand, global interactions are more naturally realized as an extension of common two-qubit gate interactions [4–8]. In fact, the ability to implement arbitrary selectable two-qubit interactions generally requires a higher level of control, with individually focused external fields addressing each qubit [1]. Given the ease of implementing a global interaction over these two leading QIP approaches, we consider the use of global entangling gates, particularly applied to the trapped ions technology. We note that the results are technology-independent and therefore apply to any QIP approach, so long as proper global entangling operations are constructible.

One particular interaction available in the trapped ions approaches [1, 7, 8] to quantum computing is the so-called Mølmer–Sørensen gate [9], also known as the XX coupling or Ising gate. To achieve computational universality, the Mølmer–Sørensen gate (either locally addressable or global) is complemented by arbitrary single-qubit operations. These may come in different flavors, including the addressable $R(\theta ,\phi )$ rotations [1], of which at most two are needed to implement an arbitrary single-qubit gate [10], or the addressable RZ rotation, which together with global RX and RY rotations also gives single-qubit universality [7, 8]. Depending on the specifics, the control apparatus may allow the application of an XX gate to a selectable pair of qubits [1], globally [4–8], or globally to a subset of qubits [11, 12]. We furthermore note that the existing control apparatus described in reference [1] allows the application of the global Mølmer–Sørensen (GMS) gates [11]; however, to date, this approach has not been studied in detail. In each case above, the XX gate comes at a higher cost (expressed in terms of the duration and/or average fidelity) than the single-qubit gates.

In this paper, we focus on minimizing the number of times an XX gate is called—be it addressable local or global, thereby targeting the most expensive resource in quantum computations using trapped ions QIP. Specifically, we center our efforts on finding the instances of quantum computations that admit a more efficient implementation using global entangling gates compared to what may be accomplished using local entangling gates. Given that the control by global entangling operators applies a certain operation to multiple data, it can be thought of as a quantum analog of classical SIMD (Single Instruction, Multiple Data) architecture. Our goal in this paper is thus to demonstrate practical advantages of quantum SIMD architecture beyond those examples already known.

Previous work demonstrated how to implement the parity function (fan-in gate in our terminology) using only two global entangling pulses, regardless of the number of qubits [13]. We revisit the implementation of fan-in in section 3.1, since it is relevant to our more advanced constructions. Reference [14], figure 5 shows a two-GMS gate decomposition of the number excitation operator used in quantum chemistry simulations [15, 16]. References [7, 17] study the ways to implement quantum algorithms efficiently on a trapped ion quantum computer with the two-qubit gates enabled by the global entangling operator, concentrating on the case featuring between two and four qubits. Reference [10] focuses on quantum circuit compiling in the scenario when local addressable two-qubit gates are available. Reference [18] revisits the two-GMS gate parity measurement implementation of [13] and reduces the number of global pulses needed to just one (this construction can be inferred from figure 2), and shows how to measure the eigenvalue of a product of Pauli matrices using only a constant number of global entangling pulses. In contrast, here, we determine a set of important quantum circuits, focusing on the computations of arbitrary size, that can be accomplished using fewer entangling pulses in cases when global entangling control is available. The new circuits developed in our work include stabilizer circuits, Toffoli-4 gate, Toffoli-n gate, Quantum Fourier Transformation, and Quantum Fourier Adder circuits, thereby substantively extending the set of known efficient circuitry based on the global entangling pulse. The results are directly accessible for implementation over trapped ions approaches featuring global control, and make a case for mixed local/global entangling control.

Computational universality of the control given by selectable two-qubit couplings and arbitrary single-qubit gates was the subject of an early foundational study establishing the upper bound of $O({n}^{3}{4}^{n})$ and the lower bound of ${\rm{\Omega }}({4}^{n})$ on the number of CNOT gates required to implement an arbitrary unitary [19]. The upper bound was later improved to $O({4}^{n})$ in [20, 21], at which point it asymptotically met the lower bound, settling the question of asymptotically optimal control by entangling CNOT gates. For logical-level fault-tolerant circuits one more step is needed, namely that of decomposing all gates into a discrete fault-tolerant library, such as the one given by the Clifford and T gates. With the CNOT being a Clifford gate, the remaining step on top of the asymptotically optimal constructions of [20, 21] is to decompose arbitrary single-qubit unitaries into Clifford+T circuits. Euler angle decomposition may be used to express an arbitrary single-qubit unitary as a circuit with no more than three axial rotations [22], and z rotations can be synthesized optimally as single-qubit Clifford+T circuits [23, 24]. Given additional resources such as in-circuit measurement and classical feedback, even better solutions exist [25].

The upper bound of $O({4}^{n})$ on the number of CNOT gates [20, 21] gives rise to the upper bound of $O({4}^{n})$ on the number of GMS gates, since it can be easily established that a CNOT gate can be obtained using no more than constantly many GMS gates. Indeed, a 4-GMS implementation of the CNOT gate can be obtained by applying the two-GMS construction illustrated in figure 1 to n qubits with $\chi =\pi /2$ , and then again to $n-1$ qubits with $\chi =-\pi /2$ , selecting one specific qubit-to-qubit interaction that remains active. With the use of the maximal size GMS gates, this may be a slightly larger construction, relying on figure 8 to express smaller GMS gates in terms of the maximal size GMS gate, but one with constantly many GMS gates nonetheless.

**Figure 1.** Example of the usefulness of global gates. GMS4 denotes a GMS gate defined according to (2), applied to all four qubits shown in the figure. GMS3 denotes a three-qubit GMS gate, applied to qubit numbers 1, 2, and 3. The common argument χ of the GMS gates specifies that all ${\chi }_{{ij}}$ 's are equal to χ. The XX_ij(χ) gate denotes a local XX gate, applied to qubits i and j with the angle χ, see (1).

In most practical cases, one may desire to implement a specific well-structured computation, and those frequently come with known implementations relying on fewer than $O({4}^{n})$ entangling gates.

Control by locally addressable operations is clearly easier to work with as far as implementing quantum computations is concerned. Firstly, most quantum algorithms are expressed in terms of local operations. Secondly, the number of arbitrarily selectable two-qubit operations for an n-qubit computation, $\tfrac{(n-1)n}{2}$ (recall that the XX coupling does not distinguish between a gate's control and its target), is higher than 1, the number of individual full-size global gates. Thirdly, an arbitrary circuit over two-qubit local control experiences only a constant-factor blow-up if it needs to be implemented as a circuit over global control (this is no longer true if global control needs to be expressed in terms of local control). These observations suggest that the local control is overall more nimble when it comes to implementing arbitrary quantum algorithms. However, it is not always the case that the implementations using local addressable gates are more efficient than those over global entangling operators. Indeed, it is known how to implement the 3-qubit Toffoli gate with only three size-3 GMS gates [7, 8], whereas the best known implementation over two-qubit local addressable control requires five entangling gates [22]. Motivated by this example, we look into what other important unitary transformations benefit from the control by global gates.

2. Global MS Gate

A local MS gate (XX) [9], acting on the $i\mathrm{th}$ and $j\mathrm{th}$ qubits, is defined as

$\begin{eqnarray}{\mathrm{XX}}_{{ij}}({\chi }_{{ij}}) & = & {{\rm{e}}}^{-{\rm{i}}{({\hat{\sigma }}_{x}^{(i)}+{\hat{\sigma }}_{x}^{(j)})}^{2}{\chi }_{{ij}}/4}={{\rm{e}}}^{-{\rm{i}}{\hat{\sigma }}_{x}^{(i)}{\hat{\sigma }}_{x}^{(j)}{\chi }_{{ij}}/2}\\ & = & \left(\begin{array}{cccc}\cos ({\chi }_{{ij}}/2)\quad & 0\quad & 0\quad & -{\rm{i}}\sin ({\chi }_{{ij}}/2)\\ 0\quad & \cos ({\chi }_{{ij}}/2)\quad & -{\rm{i}}\sin ({\chi }_{{ij}}/2)\quad & 0\\ 0\quad & -{\rm{i}}\sin ({\chi }_{{ij}}/2)\quad & \cos ({\chi }_{{ij}}/2)\quad & 0\\ -{\rm{i}}\sin ({\chi }_{{ij}}/2)\quad & 0\quad & 0\quad & \cos ({\chi }_{{ij}}/2)\end{array}\right),\end{eqnarray} \tag{ 1 }$

where ${\hat{\sigma }}_{x}^{(i)}$ denotes the Pauli-x operator acting on the $i\mathrm{th}$ qubit, and the first equality holds up to the global phase ${{\rm{e}}}^{-{\rm{i}}{\chi }_{{ij}}/2}$ . In comparison, a global MS (GMS) gate for an n-qubit system is defined according to the equation

$\begin{eqnarray}\mathrm{GMS}({\chi }_{12},{\chi }_{13},\,\ldots ,\,{\chi }_{1n},{\chi }_{23},\,\ldots ,\,{\chi }_{n-1n}) & = & \exp \left(-{\rm{i}}\displaystyle \sum _{i=1}^{n}\displaystyle \sum _{j=i+1}^{n}{({\hat{\sigma }}_{x}^{(i)}+{\hat{\sigma }}_{x}^{(j)})}^{2}{\chi }_{{ij}}/4\right)\\ & = & \exp \left(-{\rm{i}}\displaystyle \sum _{i=1}^{n}\displaystyle \sum _{j=i+1}^{n}{\hat{\sigma }}_{x}^{(i)}{\hat{\sigma }}_{x}^{(j)}{\chi }_{{ij}}/2\right),\end{eqnarray} \tag{ 2 }$

which is equivalent to the application of local XX gates to all $\tfrac{(n-1)n}{2}$ pairs of qubits for an n-qubit system. Since any two local XX gates always commute, the GMS gate is uniquely defined. For simplicity, we will first focus on the GMS gate where ${\chi }_{12}={\chi }_{13}\,=\,\ldots \,=\,{\chi }_{1n}={\chi }_{23}\,=\,\ldots \,=\,{\chi }_{n-1n}$ [4–6, 8], and next consider other variants.
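As a numerical sanity check of definitions (1) and (2), the following NumPy sketch builds the local XX gate and the GMS gate on a small register and confirms that the exponential in (2) coincides with the product of local XX gates over all pairs. The helper names (`pauli_x_pair`, `xx`, `gms_from_exponential`), the 0-based qubit labels, and the randomly drawn angles are illustrative assumptions and do not come from the original text.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

def pauli_x_pair(n, i, j):
    """X_i X_j embedded in an n-qubit register (qubit 0 is the leftmost factor)."""
    out = np.array([[1.0 + 0j]])
    for q in range(n):
        out = np.kron(out, X if q in (i, j) else I2)
    return out

def xx(n, i, j, chi):
    """Local gate of equation (1): exp(-i chi/2 X_i X_j)."""
    return np.cos(chi / 2) * np.eye(2 ** n) - 1j * np.sin(chi / 2) * pauli_x_pair(n, i, j)

def gms_from_exponential(n, chi):
    """Second form of equation (2), evaluated through an eigendecomposition."""
    h = sum(chi[i, j] / 2 * pauli_x_pair(n, i, j)
            for i, j in itertools.combinations(range(n), 2))
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * w)) @ v.conj().T

# The 4x4 matrix written out in equation (1), for an arbitrary test angle.
c, s = np.cos(0.3 / 2), np.sin(0.3 / 2)
matrix_in_eq_1 = np.array([[c, 0, 0, -1j * s],
                           [0, c, -1j * s, 0],
                           [0, -1j * s, c, 0],
                           [-1j * s, 0, 0, c]])
matches_equation_1 = np.allclose(xx(2, 0, 1, 0.3), matrix_in_eq_1)

# GMS with arbitrary (non-uniform) angles on three qubits.
n = 3
rng = np.random.default_rng(0)
chi = rng.uniform(-np.pi, np.pi, size=(n, n))  # only entries with i < j are used

product_of_xx = np.eye(2 ** n, dtype=complex)
for i, j in itertools.combinations(range(n), 2):
    product_of_xx = product_of_xx @ xx(n, i, j, chi[i, j])

gms_matches_product = np.allclose(gms_from_exponential(n, chi), product_of_xx)
xx_gates_commute = np.allclose(xx(n, 0, 1, 0.3) @ xx(n, 1, 2, 0.7),
                               xx(n, 1, 2, 0.7) @ xx(n, 0, 1, 0.3))
```

All three flags evaluate to true, which reflects the fact that every XX generator is a product of Pauli-x operators and therefore commutes with every other one.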

Intuitively, the availability of the GMS gate allows for an efficient implementation of a single-qubit–to–many-qubits coupling gate. Consider, for instance, a 4-qubit system as shown in figure 1. Applying the GMS gate on all four qubits and then applying the GMS gate to the top three qubits with the negative sign of the rotation parameter, results in a selective set of the XX gates acting between qubit number 4 and the rest, as shown in figure 1 on the right. This means that, together with the ability of leaving out a qubit of choice, we need only two (global) entangling operators to perform the desired transformation. Note that because qubit number 4 participates in all three XX gates as shown in figure 1 on the right hand side, even with the possibility of parallel operations acting on disjoint pairs of qubits at least three time steps would be required if we restrict ourselves to the local XX couplings.
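The cancellation described above can be checked numerically. The sketch below, with assumed helper names `xx` and `gms`, 0-based qubit labels (so qubit number 4 of figure 1 is index 3), and an arbitrary test angle, compares the two-GMS sequence with the three local XX gates that couple the last qubit to the rest.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

def xx(n, i, j, chi):
    """XX_ij(chi) = exp(-i chi/2 X_i X_j) on an n-qubit register (0-based labels)."""
    xixj = np.array([[1.0 + 0j]])
    for q in range(n):
        xixj = np.kron(xixj, X if q in (i, j) else I2)
    return np.cos(chi / 2) * np.eye(2 ** n) - 1j * np.sin(chi / 2) * xixj

def gms(n, qubits, chi):
    """Uniform-angle GMS on the listed qubits: product of XX over all pairs."""
    u = np.eye(2 ** n, dtype=complex)
    for i, j in itertools.combinations(qubits, 2):
        u = u @ xx(n, i, j, chi)
    return u

n, chi = 4, 0.37  # chi is an arbitrary test value

# GMS4(chi) on all qubits, then GMS3(-chi) on the top three qubits.
two_global_gates = gms(n, [0, 1, 2, 3], chi) @ gms(n, [0, 1, 2], -chi)

# Only the couplings that involve the last qubit should survive.
three_local_gates = xx(n, 0, 3, chi) @ xx(n, 1, 3, chi) @ xx(n, 2, 3, chi)

figure_1_identity_holds = np.allclose(two_global_gates, three_local_gates)
```

Because all XX gates commute, the order in which the two global gates are applied does not matter in this check.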

In the rest of the paper, we rely on the standard [22] single-qubit gates, including Hadamard (H in formulas and circuit diagrams), axial rotations RX, RY, and RZ (X, Y, and Z in circuit diagrams), as well as the two-qubit CNOT gate.

3. Efficient circuits using the GMS gate

In this section, we present a suite of quantum transformations, where GMS gates may be handily used to increase circuit efficiency. We lay out the specific implementation details by explicitly constructing corresponding quantum circuits, and compare them to those obtained using only local entangling gates to highlight the efficiency gain.

3.1. Consecutive CNOTs: single-control many-target CNOT (fan-out), and many-control single-target CNOT (fan-in)

Consider a set of CNOT gates with a shared control qubit, also known as the fan-out gate. As illustrated in figure 2 for the sample case of n = 4, we can use a pair of GMS gates, together with single-qubit rotations $\mathrm{RX}(\theta )={{\rm{e}}}^{-{\rm{i}}{\hat{\sigma }}_{x}\theta /2}$ and $\mathrm{RY}(\theta )={{\rm{e}}}^{-{\rm{i}}{\hat{\sigma }}_{y}\theta /2}$ , to implement the entire set of such $n-1$ CNOT gates. In particular, we require a total of two GMS gates, one over n qubits with uniform angles $\pi /2$ and the other over $n-1$ qubits with the angle $-\pi /2$ , singling out the control qubit.

**Figure 2.** Four-qubit case of multiple CNOT gates sharing a single control qubit and targeting the rest of the qubits. Only two GMS gates are required to implement a total of $n-1$ local XX gates, corresponding to $n-1$ CNOT gates.

An n-qubit fan-in gate (a set of CNOTs sharing a target) can be implemented as a layer of n Hadamard gates, followed by the fan-out gate, followed by a second layer of n Hadamard gates. This means that a fan-in gate of arbitrary size can also be implemented using just two GMS gates. We note that these implementations were known to [13, 18] (fan-in was explicitly studied, and fan-out can be easily obtained from the fan-in). Observe that to measure the outcome of the parity function on the top qubit (see figure 2), the second GMS gate in the construction outlined in figure 2 need not be applied, as it does not affect the qubit being measured.
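The two-GMS fan-out can be verified numerically for small n. The sketch below uses one possible choice of single-qubit dressing, derived here from the identity $\mathrm{CNOT}={{\rm{e}}}^{-{\rm{i}}\pi /4}\,\mathrm{RZ}_{c}(-\pi /2)\,\mathrm{RX}_{t}(-\pi /2)\,H_{c}\,\mathrm{XX}_{ct}(\pi /2)\,H_{c}$; it is an assumption and need not coincide with the rotations drawn in figure 2. The helper names and the choice of qubit 0 as the control are likewise illustrative.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = (X + Z) / np.sqrt(2)

def embed(n, ops):
    """Tensor product on n qubits; `ops` maps a qubit index to its 2x2 matrix."""
    out = np.array([[1.0 + 0j]])
    for q in range(n):
        out = np.kron(out, ops.get(q, I2))
    return out

def rot(pauli, theta):
    """exp(-i theta/2 * pauli) for a 2x2 Pauli matrix."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * pauli

def gms(n, qubits, chi):
    """Uniform-angle GMS on `qubits`: product of XX_ij(chi) over all pairs."""
    u = np.eye(2 ** n, dtype=complex)
    for i, j in itertools.combinations(qubits, 2):
        xixj = embed(n, {i: X, j: X})
        u = u @ (np.cos(chi / 2) * np.eye(2 ** n) - 1j * np.sin(chi / 2) * xixj)
    return u

def fan_out(n):
    """Reference fan-out: qubit 0 controls a NOT on each of qubits 1..n-1."""
    dim = 2 ** n
    f = np.zeros((dim, dim), dtype=complex)
    target_mask = 2 ** (n - 1) - 1  # qubit 0 is the most significant bit
    for idx in range(dim):
        f[idx ^ target_mask if idx >> (n - 1) else idx, idx] = 1
    return f

def fan_out_from_gms(n):
    """Two GMS gates plus single-qubit gates (assumed dressing, see lead-in)."""
    targets = list(range(1, n))
    h_control = embed(n, {0: H})
    # GMS over all n qubits with angle pi/2, GMS over the n-1 targets with -pi/2.
    two_gms = gms(n, list(range(n)), np.pi / 2) @ gms(n, targets, -np.pi / 2)
    dressing = embed(n, {0: rot(Z, -(n - 1) * np.pi / 2),
                         **{t: rot(X, -np.pi / 2) for t in targets}})
    return dressing @ h_control @ two_gms @ h_control

def equal_up_to_phase(a, b):
    """True if a = (unit-modulus scalar) * b."""
    k = np.unravel_index(np.argmax(np.abs(b)), b.shape)
    phase = a[k] / b[k]
    return bool(np.isclose(abs(phase), 1.0) and np.allclose(a, phase * b))

fan_out_ok = all(equal_up_to_phase(fan_out_from_gms(n), fan_out(n)) for n in (2, 3, 4))
```

The comparison is made up to a global phase, since the construction reproduces the fan-out only up to the factor ${{\rm{e}}}^{{\rm{i}}(n-1)\pi /4}$.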

An immediate application of this efficient implementation ( $n-1$ local XX gates replaced by a pair of GMS gates) may be observed, for instance, in stabilizer circuit constructions.

Figure 3 shows the encoding circuit for the 15-qubit Reed–Muller code [26], figure 12. This circuit containing a total of 34 CNOT gates may be implemented with 5 pairs of GMS gates, which would otherwise require 34 local XX gates. Since the $[[15,1,3]]$ encoding circuit is used to distill the $| A\rangle$ state [26], its efficient GMS-enabled implementation may potentially be used to synthesize the logical-level T gate efficiently, constituting an important optimization for fault-tolerant quantum computing. We note, however, that GMS gates may be difficult to use fault-tolerantly [14].

**Figure 3.** Encoding circuit *Tdistill* of the $[[15,1,3]]$ code [26], used to distill the $| A\rangle$ state. It relies on a set of 34 CNOT gates, that can be implemented using only 10 GMS gates.

GMS gates can furthermore be employed to obtain an implementation of an arbitrary n-qubit stabilizer unitary using at most $12n-18$ entangling pulses. To establish this, consider the 9-stage layered decomposition -C-P-C-P-H-P-C-P-C- of [27]. Observe that two of the -C- stages (each -C- stage is a CNOT-based circuit) are given by upper triangular Boolean matrices. This means that each can naturally be implemented as a set of $n-1$ fan-out gates. Of these, the smallest fan-out is a single CNOT, which can be implemented using a single GMS gate, while each of the remaining $n-2$ fan-outs takes two GMS gates. The total number of GMS gates required to implement an upper/lower triangular linear reversible transformation is therefore $2(n-2)+1=2n-3$ . The other two -C- stages are arbitrary linear transformations. Using LU decomposition, each can be implemented as a circuit over $2(2n-3)=4n-6$ GMS gates. The total GMS count required to implement an arbitrary stabilizer unitary is thus $2(2n-3)+2(4n-6)=12n-18$ .

The number of GMS gates required to implement an arbitrary stabilizer unitary, $12n-18$ , is significantly smaller than the ${\rm{\Omega }}\left(\tfrac{{n}^{2}}{\mathrm{log}n}\right)$ two-qubit CNOT gates required to accomplish the same [28]. The comparison, however, is not fair. This is because the number of different functions computed by the CNOT gates spanning n qubits is $(n-1)n$ , whereas the number of GMS gates with the fixed rotation angle of $\pi /2$ and an arbitrary set of inputs is ${2}^{n}$ , which is asymptotically larger than $(n-1)n$ . A fairer approach is to compare the GMS count of $12n-18$ to the CNOT depth of $14n-4$ over the Linear Nearest Neighbor (LNN) architecture [27]. This is because the number of functions computed by depth-1 CNOT circuits over LNN is given by the formula $\tfrac{{2}^{n+1}+{(-1)}^{n}}{3}$ , and this number is similar to ${2}^{n}$ . The comparison reveals that our GMS-based construction still gives a slight advantage.
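The counting formula for depth-1 LNN circuits can be confirmed by brute force for small n. The sketch below is written under the assumption that a depth-1 circuit is any set of disjoint nearest-neighbor CNOTs, each in either orientation, with the empty layer included; the function names are illustrative.

```python
from itertools import product

def count_depth1_lnn_cnot_layers(n):
    """Brute-force count of depth-1 CNOT layers on a line of n qubits.

    Each of the n-1 neighboring pairs is either idle (0) or carries a CNOT in
    one of two orientations (1 or 2). Two adjacent pairs share a qubit, so they
    cannot both be active. The empty layer (identity) is counted as well.
    """
    count = 0
    for edges in product(range(3), repeat=n - 1):
        if all(not (edges[k] and edges[k + 1]) for k in range(n - 2)):
            count += 1
    return count

def closed_form(n):
    """The formula (2^(n+1) + (-1)^n) / 3 quoted in the text."""
    return (2 ** (n + 1) + (-1) ** n) // 3

counts = {n: count_depth1_lnn_cnot_layers(n) for n in range(1, 11)}
formula_matches = all(counts[n] == closed_form(n) for n in counts)
```

For instance, four qubits admit 11 such layers, compared with ${2}^{4}=16$ choices of the qubit subset for a fixed-angle GMS gate, which illustrates why the two counts are regarded as similar.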

3.2. Toffoli-n

We next consider the multiply-controlled gates, and specifically the multiple control Toffoli gates. We first focus on the 3-qubit (Toffoli-3) and the 4-qubit (Toffoli-4) cases.

The efficient use of GMS gates in the case of multiply-controlled NOT (Toffoli) has previously been shown for the Toffoli-3 [7], figure 2 and Toffoli-4 [17], equation (9) gates. In particular, reference [17] presents a GMS-based circuit decomposition for the triply-controlled Z gate, equivalent to the Toffoli-4 through conjugating the target by a pair of Hadamard gates. For convenience, we show the respective constructions in figures 4(a) and (b). One may observe that in the case of the Toffoli-3 only 3 GMS gates are needed, compared to 5 local two-qubit gates [22, 29] and, for the Toffoli-4, only 7 GMS gates are needed, compared to 11 local two-qubit gates [29]. We note that, unlike in [29], the 7-GMS Toffoli-4 construction of [17] furthermore does not require an ancillary qubit.

**Figure 4.** GMS-based implementation of (a) Toffoli-3 [7] and (b) Toffoli-4 [17] without ancillary qubits.

In pursuit of further gate count reduction, we consider employing ancillary qubits in our GMS-based construction of the n-qubit Toffoli gate. The employment of ancillary qubits to reduce the gate counts in constructing the Toffoli-n gate has in fact been extensively investigated in [19, 30], but in the context of relying on the local entangling gates. Using ancillae turns out to be helpful in the case of quantum circuits employing the GMS gate, as well. In the following, we show a step-by-step construction of the GMS-based ancilla-aided Toffoli-4 gate (we report no improvements to the Toffoli-3 circuit).

We start with the simple observation that the Toffoli-4 gate is equivalent to the CCCZ gate up to conjugation of the target qubit by Hadamard gates, as in the following illustration

The CCCZ $(a,b,c,d)$ gate performs the transformation

$\begin{eqnarray*}&&| {abcd}\rangle \mapsto {(-1)}^{{abcd}}| {abcd}\rangle ={({{\rm{e}}}^{{\rm{i}}\pi /8})}^{8{abcd}}| {abcd}\rangle ={w}_{16}^{8{abcd}}| {abcd}\rangle ,\end{eqnarray*}$

where ${w}_{16}={{\rm{e}}}^{{\rm{i}}\pi /8}$ is a primitive $16\mathrm{th}$ root of unity. Using the mixed arithmetic equality $2{xy}=x+y-(x\oplus y)$ three times allows us to rewrite the above formula as

$\begin{eqnarray*}&&| {abcd}\rangle \mapsto {w}_{16}^{a+b+c+d-(a\oplus b)-(a\oplus c)-(a\oplus d)-(b\oplus c)-(b\oplus d)-(c\oplus d)+(a\oplus b\oplus c)+(a\oplus b\oplus d)+(a\oplus c\oplus d)+(b\oplus c\oplus d)-(a\oplus b\oplus c\oplus d)}| {abcd}\rangle .\end{eqnarray*}$

This function can thus be implemented as a CNOT and $\mathrm{RZ}(\pm \pi /8)$ circuit by applying Z rotation with the positive sign to the linear terms $\{a,b,c,d,a\oplus b\oplus c,a\oplus b\oplus d,a\oplus c\oplus d,b\oplus c\oplus d\}$ and Z rotation with the negative sign to the terms $\{a\oplus b,a\oplus c,a\oplus d,b\oplus c,b\oplus d,c\oplus d,a\oplus b\oplus c\oplus d\}$ , with each such linear term obtainable by the CNOT gates. Next, we show how to induce all necessary CNOT gates to allow the application of the necessary RZ gates, using only a few GMS gates.
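The rewriting of $8{abcd}$ as a signed sum of parities can be checked exhaustively over the 16 Boolean inputs. The short sketch below does so; the function name `parity_expansion` is an illustrative choice, and the sign rule (plus for odd-size subsets, minus for even-size subsets) is read off from the two sets of linear terms listed above.

```python
from itertools import combinations, product

def parity_expansion(bits):
    """Signed sum of the parities of all non-empty subsets of `bits`.

    Subsets of odd size enter with a plus sign (lengths 1 and 3),
    subsets of even size with a minus sign (lengths 2 and 4).
    """
    total = 0
    for size in range(1, len(bits) + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(bits, size):
            total += sign * (sum(subset) % 2)
    return total

# Building block: 2xy = x + y - (x XOR y) for Boolean x, y.
building_block_holds = all(2 * x * y == x + y - (x ^ y)
                           for x, y in product((0, 1), repeat=2))

# Full identity: the 15-term expansion equals 8abcd on every input.
expansion_holds = all(parity_expansion((a, b, c, d)) == 8 * a * b * c * d
                      for a, b, c, d in product((0, 1), repeat=4))
```

The equality holds over the integers, not merely modulo 16, so the phase ${w}_{16}^{8{abcd}}$ is reproduced exactly by the 15 rotations.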

We first note that the linear functions with a single literal each, $\{a,b,c,d\}$ , are the original qubits provided to us on the input side of the circuit. Therefore, all length-1 linear terms may be implemented by simply applying $\mathrm{RZ}(\tfrac{\pi }{8})$ single-qubit rotation gates to each respective qubit. By doing so, we construct the circuit

using no GMS gates and implementing the transformation $| {abcd}\rangle \mapsto {w}_{16}^{a+b+c+d}| {abcd}\rangle$ . We next have to find how to apply as few GMS gates as possible in a way that enables the remaining 11 Z rotations.

To apply the Z rotation to the length-4 linear term, $a\oplus b\oplus c\oplus d$ , we introduce an ancillary qubit in the $| 0\rangle$ state, EXOR all four qubits into it using a set of four CNOTs sharing the target, and then uncompute those CNOTs. This allows us to apply one new Z rotation between the two layers of CNOT gates, and the number of GMS gates required to implement this construction is two, see figure 5. The circuit constructed thus far performs the transformation $| {abcd}\rangle \mapsto {w}_{16}^{a+b+c+d-(a\oplus b\oplus c\oplus d)}| {abcd}\rangle$ . Observe that each of the two sets of CNOT gates on the left hand side of the circuit equality in figure 5 requires two GMS gates to be implemented (both are fan-in gates, considered earlier), for a total of four GMS gates, two GMS5 and two GMS4. However, the two GMS4 gates can be chosen with opposite signs, and since they commute with all other gates we are about to introduce in the middle, they cancel out. This means that only two GMS5 gates are needed in our construction.

**Figure 5.** Obtaining phase ${w}_{16}^{-(a\oplus b\oplus c\oplus d)}$ .

We next need to apply the remaining 10 Z rotations to obtain the desired CCCZ gate. To do so, consider the following circuit identity

where the left hand side, trivially, performs a phase rotation by the angle θ applied to the linear function $x\oplus y$ and the right hand side reports an equivalent circuit based on the XX gate, up to a global phase. We can generalize this construction to n qubits by replacing the XX gate with the GMS on the right hand side, while conjugating by a layer of Hadamards before and after. What this accomplishes is the application of phases to the EXORs of all pairs of participating variables, as described by the circuit on the left hand side. Formally, up to a global phase,

$\begin{eqnarray*}&&H[{x}_{1}]H[{x}_{2}]...H[{x}_{n}]\mathrm{GMS}n[{x}_{1},{x}_{2},\,\ldots ,\,{x}_{n}](\theta )H[{x}_{1}]H[{x}_{2}]...H[{x}_{n}]:\\ &&\quad | {x}_{1}{x}_{2}...{x}_{n}\rangle \mapsto {{\rm{e}}}^{{\rm{i}}\theta \displaystyle \sum _{j\lt k}{x}_{j}\oplus {x}_{k}}| {x}_{1}{x}_{2}...{x}_{n}\rangle .\end{eqnarray*}$
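The Hadamard-conjugated GMS identity can be confirmed numerically, including the global phase that the formula leaves implicit. The sketch below assumes the uniform-angle GMS of equation (2); the helper names, the register size, and the test angle are illustrative.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def kron_all(mats):
    """Tensor product of a list of 2x2 matrices, first entry leftmost."""
    out = np.array([[1.0 + 0j]])
    for m in mats:
        out = np.kron(out, m)
    return out

def gms(n, chi):
    """Uniform-angle GMS on all n qubits: product of exp(-i chi/2 X_j X_k)."""
    u = np.eye(2 ** n, dtype=complex)
    for j, k in itertools.combinations(range(n), 2):
        xjxk = kron_all([X if q in (j, k) else I2 for q in range(n)])
        u = u @ (np.cos(chi / 2) * np.eye(2 ** n) - 1j * np.sin(chi / 2) * xjxk)
    return u

n, theta = 4, 0.3  # arbitrary test values
pairs = list(itertools.combinations(range(n), 2))

hadamards = kron_all([H] * n)
conjugated = hadamards @ gms(n, theta) @ hadamards

# Expected diagonal: exp(i theta * sum over pairs of x_j XOR x_k).
# itertools.product enumerates bit strings in the same order as the kron basis.
phases = [np.exp(1j * theta * sum(x[j] ^ x[k] for j, k in pairs))
          for x in itertools.product((0, 1), repeat=n)]

# Each pair contributes a factor exp(-i theta/2) that does not depend on x.
global_phase = np.exp(-1j * theta * len(pairs) / 2)

identity_holds = np.allclose(conjugated, global_phase * np.diag(phases))
```

The global phase arises because the eigenvalue of $Z_{j}Z_{k}$ on a basis state is $1-2({x}_{j}\oplus {x}_{k})$ , so each pair contributes a constant factor in addition to the parity-dependent phase.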

We next apply the above identity over GMS to our ongoing construction of the Toffoli-4 gate. To obtain length-3 linear functions, we may insert Hadamard-conjugated GMS5 $(\pi /8)$ in the middle of our current circuit (figure 5). The effect this has is the introduction of phase $\tfrac{\pi }{8}$ applied to the EXOR of every pair of qubits participating in the construction. In the middle of the circuit the qubits we have are described by the linear functions $\{a,b,c,d,a\oplus b\oplus c\oplus d\}$ . Thus, the set of EXOR pairs is $\{a\oplus b,a\oplus c,a\oplus d,b\oplus c,b\oplus d,c\oplus d,a\oplus b\oplus c,a\oplus b\oplus d,a\oplus c\oplus d,b\oplus c\oplus d\}$ . This means that the overall action performed by the circuit with 3 GMS gates can be written as

$\begin{eqnarray*}&&| {abcd}\rangle \mapsto {w}_{16}^{a+b+c+d-(a\oplus b\oplus c\oplus d)+(a\oplus b)+(a\oplus c)+(a\oplus d)+(b\oplus c)+(b\oplus d)+(c\oplus d)+(a\oplus b\oplus c)+(a\oplus b\oplus d)+(a\oplus c\oplus d)+(b\oplus c\oplus d)}| {abcd}\rangle .\end{eqnarray*}$

Observe that the signs of the length-2 terms are not the ones we wanted to have. This may, however, be corrected by applying Hadamard-conjugated GMS4 $(-\pi /4)$ to the qubits $\{a,b,c,d\}$ , resulting in the phase correction by the product ${w}_{16}^{-2(a\oplus b)-2(a\oplus c)-2(a\oplus d)-2(b\oplus c)-2(b\oplus d)-2(c\oplus d)}$ , and leading to the desired transformation

$\begin{eqnarray*}&&| {abcd}\rangle \mapsto {w}_{16}^{a+b+c+d-(a\oplus b)-(a\oplus c)-(a\oplus d)-(b\oplus c)-(b\oplus d)-(c\oplus d)+(a\oplus b\oplus c)+(a\oplus b\oplus d)+(a\oplus c\oplus d)+(b\oplus c\oplus d)-(a\oplus b\oplus c\oplus d)}| {abcd}\rangle ,\end{eqnarray*}$

accomplished as a 4-GMS circuit shown in figure 6.
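The phase bookkeeping of the 4-GMS construction can be replayed step by step. In the sketch below each wire is represented by a bitmask naming the linear function it holds, and the exponent of ${w}_{16}$ is accumulated in the order the steps were introduced above; the bitmask encoding and function names are illustrative assumptions, and the sketch tracks phases only rather than simulating the full unitary.

```python
from itertools import combinations

# Bitmask labels for the input variables a, b, c, d.
A, B, C, D = 1, 2, 4, 8

def parity(mask, x):
    """Value of the linear function `mask` on the input assignment `x`."""
    return bin(mask & x).count("1") % 2

def exponent(x):
    """Exponent of w_16 acquired by |abcd>, with x encoding the bits a, b, c, d."""
    data = [A, B, C, D]
    ancilla = A ^ B ^ C ^ D  # content of the ancilla between the two fan-in gates

    e = sum(parity(m, x) for m in data)          # RZ on each data qubit: +1 per term
    e -= parity(ancilla, x)                      # RZ on the ancilla: -(a+b+c+d mod 2)
    # Hadamard-conjugated GMS5(pi/8): +1 for the EXOR of every pair of the 5 wires.
    e += sum(parity(p ^ q, x) for p, q in combinations(data + [ancilla], 2))
    # Hadamard-conjugated GMS4(-pi/4): -2 for the EXOR of every pair of data wires.
    e -= 2 * sum(parity(p ^ q, x) for p, q in combinations(data, 2))
    return e

# CCCZ applies w_16^8 = -1 only when a = b = c = d = 1 (x == 15).
cccz_reproduced = all(exponent(x) % 16 == (8 if x == 15 else 0) for x in range(16))
```

The EXOR of a data wire with the ancilla is a length-3 function, which is how the single GMS5 gate supplies all four length-3 terms at once.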

Using a similar approach, we can obtain a 3-GMS circuit implementing the CCZ gate on qubits a, b, and c, as follows

It is different from those reported in [7, 8].

In the remaining part of this subsection, we briefly outline an implementation of the n-qubit Toffoli gate using $3n-9$ GMS gates and $\tfrac{n-2}{2}$ ancillae for even n, and $3n-6$ GMS gates and $\tfrac{n-1}{2}$ ancillae for odd n, $n\geqslant 6$ . This beats the count of $6n-12$ local CNOT gates reported in [30], while using a comparable number of ancillae. Our construction relies on nesting efficient 3-GMS Toffoli-4 gates (shown in figure 9; we describe how to get to this construction in the next section), as illustrated in figure 7, to obtain larger multiple control Toffoli gates. For odd n, one pair of 3-GMS Toffoli-3 gates needs to be used (equivalently, a set of two relative-phase Toffoli-3 gates, requiring 3 local entangling operations each [30]), explaining the difference between gate counts for odd and even n.

**Figure 7.** Ancilla-aided construction of the Toffoli-6 using a set of three Toffoli-4, each of which is constructible using three GMS gates.

4. GMS with other parameters

So far, we focused on using GMS gates with all equal rotation angles ${\chi }_{{ij}}$ , and an arbitrarily selectable subset of qubits those global gates apply to. Such gates may not always be possible to obtain directly in an experiment. Indeed, one possible experimental setup [7] allows for the application of GMS gates affecting all n qubits participating in the computation. As such, an $(n-1)$ -qubit GMS gate may not be directly available on an n-qubit system. To circumvent this and enable smaller GMS gates, we propose the following. First, start with the circuit identity

Using the identity recursively, as in the next illustration,

we arrive at the conclusion that the $\mathrm{RZ}(\pi )$ gate effects a spin echo on the identical XX gates to its left and right, provided that the qubit that the $\mathrm{RZ}(\pi )$ applies to also participates in the XX gates, and as a result cancels out the respective XX interactions. Based on this property, figure 8 illustrates how to obtain an $(n-1)$ -qubit GMS gate out of two n-qubit GMS gates. The construction can be used iteratively to obtain global gates spanning arbitrarily selectable subsets of qubits, enabling all constructions described in section 3 in the case when only the maximal size GMS gate is available. In fact, this inspired the construction of a more efficient Toffoli-4 implementation. Specifically, the Toffoli-4 gate may be obtained using only 3 maximal size GMS gates on a 5-qubit machine. This is because substituting $\mathrm{GMS}4(-\pi /4)$ (figure 8) into the GMS-enabled implementation of the Toffoli-4 gate (figure 6) results in a circuit over 5 GMS5 gates; however, the $\mathrm{GMS}5(\pi /8)$ used in figure 6 meets one of the newly introduced $\mathrm{GMS}5(-\pi /8)$ gates and they cancel out, reducing the GMS gate count to 3. This improved construction is illustrated in figure 9. Our optimized 3-GMS Toffoli-4 construction relies on notably fewer entangling pulses than the 11 two-qubit gates in [29] or the 7 GMS gates in [17].
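The spin-echo reduction from an n-qubit GMS gate to an $(n-1)$ -qubit one can be checked numerically. The sketch below assumes a specific arrangement, namely two half-angle GMSn gates interleaved with $\mathrm{RZ}(\pi )$ on the qubit to be decoupled; the exact gate placement in figure 8 may differ, and the helper names are illustrative. The test case is the substitution used for the Toffoli-4 gate, $\mathrm{GMS}4(-\pi /4)$ built from two $\mathrm{GMS}5(-\pi /8)$ gates.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def embed(n, ops):
    """Tensor product on n qubits; `ops` maps a qubit index to its 2x2 matrix."""
    out = np.array([[1.0 + 0j]])
    for q in range(n):
        out = np.kron(out, ops.get(q, I2))
    return out

def gms(n, qubits, chi):
    """Uniform-angle GMS on `qubits`: product of XX_ij(chi) over all pairs."""
    u = np.eye(2 ** n, dtype=complex)
    for i, j in itertools.combinations(qubits, 2):
        xixj = embed(n, {i: X, j: X})
        u = u @ (np.cos(chi / 2) * np.eye(2 ** n) - 1j * np.sin(chi / 2) * xixj)
    return u

def equal_up_to_phase(a, b):
    """True if a = (unit-modulus scalar) * b."""
    k = np.unravel_index(np.argmax(np.abs(b)), b.shape)
    phase = a[k] / b[k]
    return bool(np.isclose(abs(phase), 1.0) and np.allclose(a, phase * b))

n, echoed_qubit, chi = 5, 4, -np.pi / 4

# RZ(pi) = exp(-i pi Z / 2) on the qubit to be removed from the interaction.
rz_pi = embed(n, {echoed_qubit: np.cos(np.pi / 2) * I2 - 1j * np.sin(np.pi / 2) * Z})

# Assumed echo sequence: GMSn(chi/2), RZ(pi), GMSn(chi/2), RZ(pi).
half = gms(n, range(n), chi / 2)
echo_sequence = half @ rz_pi @ half @ rz_pi

# Target: GMS(n-1)(chi) acting on every qubit except the echoed one.
others = [q for q in range(n) if q != echoed_qubit]
smaller_gms = gms(n, others, chi)

echo_gives_smaller_gms = equal_up_to_phase(echo_sequence, smaller_gms)
```

Couplings among the remaining qubits add up to the full angle, while every coupling that involves the echoed qubit changes sign between the two halves and cancels.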

**Figure 8.** GMS $(n-1)$ using two GMSn, illustrated in the case of n = 5.

**Figure 9.** Optimized implementation of the CCCZ gate using three GMS gates. Right hand side shows two Hadamard gates, removing which transforms CCCZ gate into Toffoli-4.

The signs of ${\chi }_{{ij}}$ may furthermore be determined by the experiment, as is the case in [1], disallowing their uniform assignment. It is, however, expected [11] that future trapped ions experiments will feature a fully controllable sign of the interaction, and this will not be an issue. Should the signs be uncontrollable, this provides an additional challenge, since the constructions in figures 1 and 2 rely on the ability to apply GMS gates with the inverted sign of the rotation angle. In the case where the signs cannot be controlled individually, the inverse GMS gate can, in fact, be induced by single-qubit corrections applied to the GMS gates with uncontrollable parameter signs, as follows.

First, start with the following identity

$\begin{eqnarray*}&&{\mathrm{XX}}_{{ij}}^{\dagger }(\chi ):= {\mathrm{XX}}_{{ij}}(-\chi )=-i\cdot {\mathrm{RX}}_{i}(\pi ){\mathrm{RX}}_{j}(\pi ){\mathrm{XX}}_{{ij}}(\pi -\chi ),\end{eqnarray*}$

where ${\mathrm{RX}}_{k}$ denotes the RX gate applied to the ${k}^{\mathrm{th}}$ qubit. Using this identity $\tfrac{(n-1)n}{2}$ times, once per pair of qubits, allows us to construct the ${\mathrm{GMS}}^{\dagger }$ using only one GMS gate, as follows:

$\begin{eqnarray*}&&{\mathrm{GMS}}^{\dagger }(\chi ):= \displaystyle \prod _{i=1}^{n}\displaystyle \prod _{j=i+1}^{n}{\mathrm{XX}}_{{ij}}(-\chi )\\ &&\quad =\,{(-i)}^{n(n-1)/2}\displaystyle \prod _{i=1}^{n}\displaystyle \prod _{j=i+1}^{n}{\mathrm{RX}}_{i}(\pi ){\mathrm{RX}}_{j}(\pi ){\mathrm{XX}}_{{ij}}(\pi -\chi )\\ &&\quad =\,{(-i)}^{n(n-1)/2}{\left(\displaystyle \prod _{i=1}^{n}{\mathrm{RX}}_{i}(\pi )\right)}^{n-1}\left(\displaystyle \prod _{i=1}^{n}\displaystyle \prod _{j=i+1}^{n}{\mathrm{XX}}_{{ij}}(\pi -\chi )\right)\\ &&\quad =\,{(-i)}^{n(n-1)/2}\left(\displaystyle \prod _{i=1}^{n}{\mathrm{RX}}_{i}((n-1)\pi )\right)\mathrm{GMS}(\pi -\chi ).\end{eqnarray*}$

In other words, whenever ${\mathrm{GMS}}^{\dagger }$ is not directly available due to the inability to invert the sign of the interactions, i.e., $\chi \in [0,\pi ]$ , the ${\mathrm{GMS}}^{\dagger }$ may still be constructed with the use of a single GMS gate by taking the parameter value of $(\pi -\chi )\in [0,\pi ]$ and performing single-qubit corrections. This enables the constructions from figures 1 and 2 in the scenario with uncontrollable signs of the individual interactions within GMS.
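The final line of the derivation of ${\mathrm{GMS}}^{\dagger }$ can be verified numerically for several register sizes, including the scalar prefactor. The sketch below uses illustrative helper names and an arbitrary test angle $\chi \in [0,\pi ]$ .

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

def embed(n, ops):
    """Tensor product on n qubits; `ops` maps a qubit index to its 2x2 matrix."""
    out = np.array([[1.0 + 0j]])
    for q in range(n):
        out = np.kron(out, ops.get(q, I2))
    return out

def rx(theta):
    """RX(theta) = exp(-i theta/2 X)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * X

def gms(n, chi):
    """Uniform-angle GMS on all n qubits: product of XX_ij(chi) over all pairs."""
    u = np.eye(2 ** n, dtype=complex)
    for i, j in itertools.combinations(range(n), 2):
        xixj = embed(n, {i: X, j: X})
        u = u @ (np.cos(chi / 2) * np.eye(2 ** n) - 1j * np.sin(chi / 2) * xixj)
    return u

def gms_dagger_from_gms(n, chi):
    """Right-hand side of the derivation: one GMS(pi - chi) plus RX corrections."""
    prefactor = (-1j) ** (n * (n - 1) // 2)
    corrections = embed(n, {q: rx((n - 1) * np.pi) for q in range(n)})
    return prefactor * corrections @ gms(n, np.pi - chi)

chi = 0.7  # arbitrary test value in [0, pi]
inverse_reproduced = all(np.allclose(gms(n, -chi), gms_dagger_from_gms(n, chi))
                         for n in (2, 3, 4, 5))
```

For n = 2 the check reduces to the two-qubit identity for ${\mathrm{XX}}_{{ij}}^{\dagger }(\chi )$ stated earlier, and for larger n each qubit collects one $\mathrm{RX}(\pi )$ per pair it participates in, hence the angle $(n-1)\pi$ .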

4.1. Quantum Fourier arithmetic

Previously, we considered the case where all $| {\chi }_{{ij}}|$ are equal, regardless of the choice of i and j. However, it is possible that $| {\chi }_{{ij}}|$ drops off as a function of the distance $| i-j|$. This may be natural, given that physical interaction strengths typically scale with the distance between qubits [31].

When the strength of the interaction falls off exponentially fast, as ${\chi }_{{ij}}\sim {2}^{-| i-j| }$, it can be shown that the quantum Fourier transform (QFT) may be constructed efficiently using such global pulses. Specifically, the efficient implementation uses just $2n$ global pulses, as opposed to $\tfrac{(n-1)n}{2}$ local two-qubit gates, for a QFT of size n. This also enables an implementation of the quantum Fourier adder (QFA) [32] with only a linear number of global gates, rather than superlinear, making the Fourier-based arithmetic circuits more competitive than their Boolean counterparts [33]. Figures 10(a) and (b) show the QFT and QFA circuits. Figure 10(c) illustrates how the GMS gates may be used to deliver the reduced gate count scaling in constructing the Fourier circuits.
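To make the scaling concrete, the two counts quoted above can be tabulated directly. This is an illustrative sketch only; the function names are hypothetical, and the counts are simply the $2n$ and $\tfrac{(n-1)n}{2}$ expressions from the text.

```python
def local_gate_count(n):
    """Two-qubit gates in the QFT with local control: n(n-1)/2."""
    return n * (n - 1) // 2

def global_pulse_count(n):
    """Global pulses in the GMS-based QFT with exponential drop off: 2n."""
    return 2 * n

def first_advantage(max_n=100):
    """Smallest n for which the global construction uses strictly fewer entangling operations."""
    for n in range(2, max_n + 1):
        if global_pulse_count(n) < local_gate_count(n):
            return n
    return None

if __name__ == "__main__":
    for n in (4, 5, 6, 10, 15):
        print(n, local_gate_count(n), global_pulse_count(n))
    print("first n with an advantage:", first_advantage())
```

The two counts coincide at n = 5, and the global construction is cheaper for every larger n.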

**Figure 10.** GMS-based QFT and QFA circuits. (a) shows the QFT circuit, where ${\theta }_{d}$ denotes a phase rotation gate with the rotation angle $\pi /{2}^{d}$. (b) shows the QFA circuit, where ${\theta }_{d}^{{a}_{[j]}}$ denotes a phase rotation gate with the rotation angle $\pi /{2}^{d}$, and ${a}_{[j]}$ denotes the control qubit that corresponds to the $j\mathrm{th}$ bit value of the integer a of the input state $| a\rangle$. (c) shows a subcircuit of the shape that repeatedly appears in (a) and (b), and how it may be implemented using only two GMS gates. The subscript EXP of ${\mathrm{GMS}}_{\mathrm{EXP}}$ denotes the exponential drop off in the strength of the interaction, i.e., ${\chi }_{{ij}}\sim \pi /{2}^{| i-j| }$.

Unfortunately, the exponential drop off in the strength of the interaction appears to be unnatural. Instead, a decrease in the strength of the interaction as a power law in the distance d, as $1/{d}^{p}$ with $p\in [0,3]$ [11, 31], seems more realistic. This leads to the question of how well the desired exponential drop off can be approximated with such physical-level global gates. Reference [34] provides an answer to this question. Specifically, the quality of such Fourier circuits (QFT, QFA) is well preserved even when we alter the fundamental form of the signal strength. For instance [34], when one replaces the exponential drop off, $\pi /{2}^{d}$, where d is the distance between qubits, with a power law hierarchy, such as $\pi /{d}^{p}$, one may choose the power p so as to obtain the best possible quality of approximation. In fact, it has been calculated numerically that the power ${p}_{\mathrm{opt}}=1.4$ renders the maximum quality for the set of parameters considered in [34]. Fortunately, p = 1.4 is within the limits $p\in [0,3]$ [11, 31].

Motivated by the previous discussion, we next suggest an extended method of the power law approximation of the exponential drop off that is useful for quantum Fourier arithmetic circuits. Specifically, we propose using a few GMS gates to approximate a single stage of the exponentially dropping interaction strength (see figure 10(c)). This is in contrast to a simple replacement of the single-stage exponential drop off with a single-stage power law drop off, as was done in [34]. In particular, we numerically approximate the exponential drop off with a set of m power law drop offs, as follows,

$\begin{eqnarray}&&\displaystyle \frac{\pi }{{2}^{j}}\approx \displaystyle \sum _{i=1}^{m}\displaystyle \frac{\pi }{{b}_{i}{j}^{{p}_{i}}}.\end{eqnarray} \tag{ 3 }$

Since the circuit realization of each power law requires two GMS gates, the approximation by m power laws amounts to a cost of $2m$ GMS gates.

**Figure 11.** Fidelity of power law QFT with m = 2 near its analytically predicted optimum, ${b}_{1}=0.4$, ${b}_{2}=-0.5$, ${p}_{1}=2.5$, ${p}_{2}=3.4$. In panels (a)–(d), all parameters are held fixed except for (a) ${b}_{1}$, (b) ${b}_{2}$, (c) ${p}_{1}$, and (d) ${p}_{2}$. Symbols: n = 10 (pluses), n = 12 (crosses), and n = 14 (asterisks).

Our goal is to numerically determine a set of ${b}_{i}$ and ${p}_{i}$ such that the right-hand side of (3) best matches the exact exponential drop off used in the quantum Fourier arithmetic circuits. A straightforward generalization of the crude, yet efficient, analytical treatment in [34] reveals that the fidelity of the QFT may be approximated by

$\begin{eqnarray}&&{F}_{{QFT}}\approx \exp \left\{-{\pi }^{2}\displaystyle \sum _{j=1}^{n}\displaystyle \frac{3(n-j)}{64}{\left[\displaystyle \frac{1}{{2}^{j}}-\left(\displaystyle \sum _{i=1}^{m}\displaystyle \frac{1}{{b}_{i}{j}^{{p}_{i}}}\right)\right]}^{2}\right\},\end{eqnarray} \tag{ 4 }$

meaning that we obtain the best fidelity by minimizing the sum in the exponent of (4). Minimizing this sum analytically is a non-trivial task, and we thus resort to a numerical search. In particular, we restricted the search to $| {b}_{i}| \leqslant 0.6$ and $1.5\leqslant {p}_{i}\leqslant 4$, closely following what may be achievable in the lab. We find that for m = 2 the values ${b}_{1}=0.4$, ${b}_{2}=-0.5$, ${p}_{1}=2.5$, ${p}_{2}=3.4$ minimize the exponent in (4), which is consistent with the peaks in fidelity observed in figure 7 for the sample cases of the QFT with $n=10,12,$ and 14 qubits; see figure 11 for detail. The peak fidelity ${F}_{{peak}}\approx 1$ demonstrates the high quality of the double-power approximation of the exponential drop off, making the efficient GMS-based construction an attractive alternative in experiments.
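The estimate (4) is straightforward to evaluate at the quoted m = 2 parameters. The sketch below is only an illustration of expressions (3) and (4), not the numerical search used in the paper; the function and constant names are hypothetical.

```python
import math

# Parameters quoted in the text for m = 2.
B = (0.4, -0.5)
P = (2.5, 3.4)

def power_law_sum(j, b=B, p=P):
    """Right-hand side of (3) divided by pi: sum_i 1 / (b_i * j**p_i)."""
    return sum(1.0 / (bi * j**pw) for bi, pw in zip(b, p))

def qft_fidelity_estimate(n, b=B, p=P):
    """Fidelity estimate (4) for an n-qubit QFT."""
    total = sum(
        3 * (n - j) / 64 * (2.0**-j - power_law_sum(j, b, p)) ** 2
        for j in range(1, n + 1)
    )
    return math.exp(-math.pi**2 * total)

if __name__ == "__main__":
    # Compare the exponential drop off with its two-term power law approximation.
    for j in range(1, 6):
        print(j, 2.0**-j, power_law_sum(j))
    for n in (10, 12, 14):
        print(n, qft_fidelity_estimate(n))
```

At j = 1 the two power laws reproduce the exponential value exactly, since $1/0.4-1/0.5=1/2$; for larger j the match is approximate, and the residuals enter (4) quadratically.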

We also conducted a similar numerical investigation for the QFA. This time, the j = 0 term, a π rotation, needs to be implemented in the QFA shown in figure 10, and the power law in (3) diverges at j = 0. We therefore modify our power law expansion according to

$\begin{eqnarray}&&\displaystyle \frac{\pi }{{2}^{j}}\approx \displaystyle \sum _{i=1}^{m}\displaystyle \frac{\pi }{{b}_{i}{(j+1)}^{{p}_{i}}}.\end{eqnarray} \tag{ 5 }$

Once again, we numerically found the best choices of (b, p) pairs, as in the previous case of the QFT with m = 2, that result in the best performance.

5. Summary of the results

In table 1 we summarize the advantage of the global entangling pulse enabled constructions developed in this work over the best known circuitry relying on local entangling control and, when available, global entangling control. Columns #q and #e.g. show the number of physical qubits and the number of entangling gates needed to implement the operation specified in the column 'Operation', with the approach to the entangling control given by the column group. The benchmark functions used are Toffoli-n, the maximal size multiple control Toffoli gate over n qubits; AQFT-n, the approximate QFT over n qubits; AQFA-n, the approximate QFA of two n-bit numbers; and Tdistill, the encoding circuit for the [[15, 1, 3]] code [26]. We selected between 2-GMS gate and 4-GMS gate approximations of the circuit layers (see figure 10(c)) to best demonstrate the advantage over two-qubit local control, and obtained the two-qubit gate counts for the AQFT and AQFA circuits over local control so as to match the performance of the GMS-enabled constructions.

We broke down the set of operations that benefit from GMS gates into three subsets: those suitable for near-term demonstration (selected by virtue of relying on an already available number of qubits and using a small number of entangling pulses; top circuit in the table), those targeted for next-generation machines (roughly, 10 to 15 qubits; second third of the table), and those applying to arbitrary n (bottom third of the table). Observe that for circuits suitable for implementation on near-term and next-generation machines, the advantage in the entangling gate count enabled by the global control is substantial, ranging from a factor of 1.39 to a factor of 3.67. The minimal advantage, a factor of 1.39, is shown for the circuit AQFA-5. This circuit adds two 5-bit numbers and relies on circuit layers with at most 5 two-qubit gates (see figure 10(b)). Such layers are too short to show a significant advantage when approximated by 2 or 4 GMS gates, and the advantage becomes more pronounced as the number of qubits grows. Specifically, the 20-qubit AQFA-10 is already reduced from 94 two-qubit local gates down to 53 entangling gates using a mix of global and local control, i.e., by a factor of 1.77.

Table 1. Advantages to the use of GMS gates. The numbers for the global control enabled implementations of the Toffoli-8..10 were obtained by combining the 3-GMS implementation of the Toffoli-3 from [8] with the nested construction illustrated in [22], figure 4.10. The two-qubit gate counts for AQFT-10..15 and AQFA-5..7 circuits were obtained using known standard constructions, so as to match the approximation quality of our GMS-enabled implementations while using the minimal number of local gates. The large number of N/A entries in the table may suggest that not enough effort has yet been put into developing implementations over global control, highlighting one of the main messages of our paper.

| Operation | Local control: #q | Local control: #e.g. | Global control (best known): #q | Global control (best known): #e.g. | Global and local control (ours): #q | Global and local control (ours): #e.g. |
|---|---|---|---|---|---|---|
| Toffoli-4 | 5 | 11 [29] | 4 | 7 [17] | 5 | 3 |
| Toffoli-8 | 11 | 35 [29, 30] | 13 | 33 [8, 22] | 11 | 15 |
| Toffoli-9 | 12 | 41 [29, 30] | 15 | 39 [8, 22] | 13 | 21 |
| Toffoli-10 | 14 | 47 [29, 30] | 17 | 45 [8, 22] | 14 | 21 |
| AQFT-10 | 10 | 30 | N/A | N/A | 10 | 17 |
| AQFT-11 | 11 | 34 | N/A | N/A | 11 | 19 |
| AQFT-12 | 12 | 38 | N/A | N/A | 12 | 21 |
| AQFT-13 | 13 | 42 | N/A | N/A | 13 | 23 |
| AQFT-14 | 14 | 46 | N/A | N/A | 14 | 25 |
| AQFT-15 | 15 | 50 | N/A | N/A | 15 | 27 |
| AQFA-5 | 10 | 32 | N/A | N/A | 10 | 23 |
| AQFA-6 | 12 | 42 | N/A | N/A | 12 | 29 |
| AQFA-7 | 14 | 58 | N/A | N/A | 14 | 35 |
| Tdistill | 15 | 34 [26] | N/A | N/A | 15 | 10 |
| Toffoli-n | $\lceil \tfrac{3n-3}{2}\rceil$ | $6n-13$ [29, 30] | $2n-3$ | $6n-15$ [8, 22] | $\lceil \tfrac{3n-2}{2}\rceil$ | $6\lceil \tfrac{n}{2}\rceil -9$ |
| Stabilizer | n | $O\left(\tfrac{{n}^{2}}{\mathrm{log}(n)}\right)$ [27, 28] | N/A | N/A | n | $O(n)$ |
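The advantage factors quoted in this section follow directly from the entangling gate counts in table 1. The short sketch below recomputes them as the ratio of the local-control count to the count for our constructions, and checks the Toffoli rows against the closed-form Toffoli-n expressions; the dictionary and function names are hypothetical.

```python
import math

# (local control, global and local control) entangling gate counts from table 1.
ENTANGLING_GATES = {
    "Toffoli-4": (11, 3), "Toffoli-8": (35, 15), "Toffoli-9": (41, 21),
    "Toffoli-10": (47, 21),
    "AQFT-10": (30, 17), "AQFT-11": (34, 19), "AQFT-12": (38, 21),
    "AQFT-13": (42, 23), "AQFT-14": (46, 25), "AQFT-15": (50, 27),
    "AQFA-5": (32, 23), "AQFA-6": (42, 29), "AQFA-7": (58, 35),
    "Tdistill": (34, 10),
}

def advantage(name):
    """Ratio of local-control gate count to the GMS-enabled gate count."""
    local, ours = ENTANGLING_GATES[name]
    return local / ours

def toffoli_local(n):
    """Toffoli-n entangling gate count with local control: 6n - 13."""
    return 6 * n - 13

def toffoli_ours(n):
    """Toffoli-n entangling gate count with global and local control: 6*ceil(n/2) - 9."""
    return 6 * math.ceil(n / 2) - 9

if __name__ == "__main__":
    ratios = {name: advantage(name) for name in ENTANGLING_GATES}
    for name, r in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{name:11s} {r:.2f}")
```

Sorting by ratio places AQFA-5 at the low end and Toffoli-4 at the high end, matching the range of factors stated in the text.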

6. Conclusion

In this paper, we studied the efficient use of a global entangling operator in realizing quantum circuitry of practical interest. We developed a number of circuit equalities using GMS gates, improving the accessibility of global entangling gates in quantum circuit constructions. Using various versions of the global entangling operator, we demonstrated the advantage in implementing stabilizer circuits, the Toffoli-4 gate, the Toffoli-n gate, and the quantum Fourier transform and quantum Fourier adder circuits. In each of the above, our circuits outperform the best known circuitry in the scenario where the control is given by two-qubit local addressable gates. We conclude that control by a global entangling gate (an analog of the single instruction, multiple data classical architecture) could be a helpful complement to control by addressable two-qubit local gates.

Acknowledgments

The authors thank Professor Kenneth Brown from Georgia Institute of Technology and Professor Christopher Monroe from the University of Maryland, College Park for their discussions and help in the preparation of this manuscript.

This material was partially based on work supported by the National Science Foundation during DM's assignment at the Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

YN acknowledges support from ARO MURI award W911NF-16-1-0349.
