Letter

Unitary quantum perceptron as efficient universal approximator(a)

E. Torrontegui and J. J. García-Ripoll

Published 4 March 2019 • Copyright © EPLA, 2019
Focus Issue: The Physics of Quantum Engineering and Quantum Technologies
Citation: E. Torrontegui and J. J. García-Ripoll 2019 EPL 125 30004
DOI: 10.1209/0295-5075/125/30004

Abstract

We demonstrate that it is possible to implement a quantum perceptron with a sigmoid activation function as an efficient, reversible many-body unitary operation. When inserted in a neural network, the perceptron's response is parameterized by the potential exerted by other neurons. We prove that such a quantum neural network is a universal approximator of continuous functions, with at least the same power as classical neural networks. While engineering general perceptrons is a challenging control problem —also defined in this work— the ubiquitous sigmoid-response neuron can be implemented as a quasi-adiabatic passage with an Ising model. In this construct, the scaling of resources is favorable with respect to the total network size and is dominated by the number of layers. We expect that our sigmoid perceptron will also find applications in quantum sensing and in the variational estimation of many-body Hamiltonians.


Quantum computing and machine learning are two computing paradigms that fight the limitations of procedural programming. While the first one is based on a physically different model of computation, the second one reuses von Neumann architectures to build sophisticated approximation models that outperform traditional algorithms. Quantum machine learning merges ideas from both paradigms [1,2], to create new quantum algorithms such as engine ranking [3], data fitting [4], autoencoders [5,6], or autonomous agents [7].

In this work we challenge the notion of quantum neural networks, a term claimed by quantum machine learning works [8–19], which is far from settled [20]. A feed-forward neural network is made of perceptrons [21] that generate signals, $s_j=f(x_j)$, as a nonlinear response to the weighted influence of other neurons, with some intrinsic biases $x_j=\sum_{k<j} w_{jk}s_{k}-\theta_j$ (cf. fig. 1(b)). Classical feed-forward networks are universal approximators of continuous functions [22] and are trained using reduced information to solve complex problems. A quantum analog of a neural network faces the need of i) encoding the network in a Hilbert space, ii) defining a physical operation for the neuron activation, iii) designing an algorithm to train the network and, most importantly, iv) finding real-world applications of the quantum version.


Fig. 1: (a) Quantum perceptron as a qubit that excites coherently according to (1) with a probability $P_j=\frac{1}{2}(1+\langle\hat{\sigma}^z_j\rangle)=f(x_j)$ that grows nonlinearly with the activation potential xj. (b) When this perceptron is integrated in a feed-forward neural network, the potential depends on neurons in earlier layers, e.g., $x_{6}=\sum_{k=1}^{4} w_{6,k}\hat{\sigma}^z_k-\theta_{6}$.


We address these problems with a quantum perceptron that is a qubit with a nonlinear excitation response to an input field (cf. fig. 1(a))

$$\hat{U}_j(\hat{x}_j;f)\,|0\rangle_j = \sqrt{1-f(\hat{x}_j)}\;|0\rangle_j + \sqrt{f(\hat{x}_j)}\;|1\rangle_j. \qquad (1)$$

In a feed-forward network, the perceptron gate is conditioned on the field generated by neurons in earlier layers, $\hat{x}_j = \sum_{k<j} w_{jk}\hat{\sigma}^z_k - \theta_j$, with weights wjk and biases $\theta_j$ similar to those of classical networks. This allows us to prove that a network based on this perceptron is a universal approximator of arbitrary continuous functions. We also prove that the perceptron gate $\hat{U}_j(\hat{x}_j;f)$ has an efficient hardware implementation as a quasiadiabatic passage in an Ising model of interacting spins, with an implementation time $\mathcal{O}(L\times \log(\varepsilon/N)/\Omega_f)$ that scales favorably with the number of layers L, the number of neurons N, the gate error ε and the activation step size $\Omega_f$ (cf. fig. 1(a)). In addition to reproducing classical neural networks using quantum states, other applications of this perceptron include the design of multiqubit conditioned quantum gates, or the design of more general perceptrons with sophisticated response functions that can be applied in quantum sensing or classification of quantum states. Our perceptron is intimately related to a recent proposal by Cao et al. [23], which implements the nonlinear activation of a qubit using repeat-until-success quantum gates. As discussed later, our perceptron shares the same potential applications with various advantages: universality, scaling of resources, avoidance of phase wrapping (it works for arbitrarily large $|x|$) and utility for general nonlinear sensing.

Classical neural networks

Classical neurons are modeled as mathematical systems that may become active $(s=1)$ or remain resting $(s=0)$ in response to the state of n other neurons. The neuron activation or perceptron [24] mechanism is the update rule

$$s_i' = f\!\left(\sum_{j} w_{ij}\,s_j - \theta_i\right), \qquad (2)$$

which determines the probability $s_i'$ of the neuron being active. This rule involves an activation function f(x), the network topology induced by the weights wij and the intrinsic biases $\theta_i$. When the activation f(x) is a step function, the neuron's response is bistable and reproduces the McCulloch and Pitts [25] model. However, it is more interesting to work with sigmoid functions —e.g., the logistic function $f(x_j)=1/(1+e^{-x_j})$ in fig. 1(a)— because they satisfy the conditions of the "universal approximation theorem" [22]. More precisely (see the Supplementary Material (SM)), any continuous function of N input bits $Q(s_1,\ldots,s_N)$ can be approximated using the response of M additional neurons to those input bits, as $Q\simeq \sum_{k=N+1}^{N+M}\alpha_k s_k'$. The weights α and w, and the biases θ, can be optimized or trained to minimize the approximation error, even when the function Q itself is unknown, as in data classification and inference tasks. Even though the universal approximation theorem only requires two layers, the power of neural networks can be significantly enhanced using deep, nested architectures with multiple hidden layers. In particular, the final sum of the approximation theorem can be performed by one neuron, as shown in fig. 1(b), with $w \propto \alpha$, to reconstruct the output function $Q \propto s_{final}'$.
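As a concrete illustration of the update rule (2) and of the need for hidden layers, the following minimal sketch (our own, not code from the paper) evaluates a two-layer logistic network with hand-picked weights that reproduces XOR, a function beyond any single perceptron [33].

```python
# Illustrative sketch (not from the paper): the update rule (2) with a
# logistic activation, and a hand-picked two-layer network computing XOR,
# which no single perceptron can represent [33].
import numpy as np

def f(x):
    """Logistic activation f(x) = 1/(1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def perceptron(s, w, theta):
    """Update rule (2): s' = f(sum_j w_j s_j - theta)."""
    return f(np.dot(w, s) - theta)

for s1 in (0, 1):
    for s2 in (0, 1):
        s = np.array([s1, s2])
        h1 = perceptron(s, np.array([20.0, 20.0]), 10.0)     # ~OR
        h2 = perceptron(s, np.array([-20.0, -20.0]), -30.0)  # ~NAND
        out = perceptron(np.array([h1, h2]), np.array([20.0, 20.0]), 30.0)  # ~AND
        print(s1, s2, int(round(out)))   # prints 0, 1, 1, 0 -> XOR
```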

Quantum perceptron

Similarly to ref. [23], we implement a perceptron as a qubit that undergoes an SU(2) rotation (1) parameterized by an external input field $\hat{x}_j$:

$$\hat{U}_j(\hat{x}_j; f) = \exp\!\left[i\,\hat{\sigma}^y_j \arcsin\!\left(\sqrt{f(\hat{x}_j)}\right)\right]. \qquad (3)$$

The perceptron qubit is characterized by quantum observables $(\hat{\sigma}^x,\hat{\sigma}^y,\hat{\sigma}^z)$ that rotate as

$$\hat{\sigma}^{z\prime}_j = \hat{U}_j^\dagger\,\hat{\sigma}^z_j\,\hat{U}_j = C(\hat{x}_j)\,\hat{\sigma}^z_j + S(\hat{x}_j)\,\hat{\sigma}^x_j, \qquad \hat{\sigma}^{x\prime}_j = C(\hat{x}_j)\,\hat{\sigma}^x_j - S(\hat{x}_j)\,\hat{\sigma}^z_j, \qquad (4)$$

and $\hat{\sigma}^{y\prime}_j=\hat{\sigma}^y_j$, with nonlinear functions $C(\hat{x}_j)=1-2f(\hat{x}_j)$, $S(\hat{x}_j)=2\sqrt{f(\hat{x}_j)[1-f(\hat{x}_j)]}$, that depend on the quantum field $\hat{x}_j$ generated by earlier neurons. This relation can be arbitrarily nested by the application of additional perceptron gates, which entangle those perceptrons with the input neurons and with earlier perceptrons in a deep-learning scheme. In this context, notice that when $w_{lj}\neq 0$, perceptron l > j will be affected both by the diagonal elements $\hat{\sigma}^z_j$ and by the quantum fluctuations $\hat{\sigma}^x_j$ of the j-th perceptron, adding generalization power to the network.
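These relations are easy to verify numerically. The snippet below (a sketch in our own sign convention, taking the resting state $|0\rangle$ as the $\hat{\sigma}^z=-1$ eigenstate) builds the rotation (3) for one value of $f(x_j)$ and checks the excitation probability of eq. (1) and the rotated observables (4).

```python
# Numerical check of eqs. (1), (3), (4); sign conventions are ours.
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
resting = np.array([0, 1], dtype=complex)   # |0>: sigma_z eigenvalue -1

fx = 0.3                                    # a sample value of f(x_j)
U = expm(1j * np.arcsin(np.sqrt(fx)) * sy)  # perceptron rotation (3)

psi = U @ resting                           # eq. (1): excitation prob. = f
print(abs(psi[0])**2)                       # -> 0.3

C, S = 1 - 2*fx, 2*np.sqrt(fx*(1 - fx))     # eq. (4)
print(np.allclose(U.conj().T @ sz @ U, C*sz + S*sx))  # -> True
print(np.allclose(U.conj().T @ sy @ U, sy))           # -> True
```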

The quantum perceptron contains the classical neural network as a limit and therefore satisfies the universal approximation theorem (see SM). Let us assume a three-layer setup such as the one in fig. 1(b), with the following conditions: i) we have N input qubits, M internal perceptrons and 1 output perceptron; ii) all perceptrons are initially in the unexcited state and the input layer is initialized to a classical input, $|s_1,s_2\ldots s_N\rangle |0_{N+1}\ldots 0_{N+M}0_{N+M+1}\rangle$; iii) the final perceptron's weights and threshold are tuned to explore only the linear part of the sigmoid activation function $f(x)\propto x$. Then, the output perceptron will be excited with a probability $s_{out}=\frac{1}{2}(\langle\hat{\sigma}^z_{N+M+1}\rangle+1)$,

$$s_{out} \simeq \sum_{k=N+1}^{N+M} \alpha_k\, f\!\left(\sum_{i=1}^{N} w_{ki}\,\hat{\sigma}^z_i - \theta_k\right) + \text{const}, \qquad \alpha_k \propto w_{N+M+1,k}. \qquad (5)$$

This output probability is a linear combination of sigmoid functions of the input neurons: by virtue of the universal approximation theorem, this implies that sout can be used to approximate any function $Q(\hat{\sigma}^z_1,\ldots,\hat{\sigma}^z_N)$ of the input neurons (see SM). This is true even when we do not measure the intermediate neurons —indeed, measuring those neurons introduces shot noise in the estimate of $\langle\hat{\sigma}^z_{N+M+1}\rangle$, deteriorating the approximation.
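A brute-force state-vector simulation makes this construction concrete. The sketch below (ours; weights and biases are arbitrary placeholders rather than trained values) applies perceptron gates to two hidden qubits and one output qubit, each conditioned on the $\hat{\sigma}^z$ of the preceding qubits, and reads out $s_{out}$ without measuring the intermediate neurons.

```python
# State-vector sketch of the feed-forward network of fig. 1(b), with
# 2 input, 2 hidden and 1 output qubits; all parameters are placeholders.
import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))

def perceptron_gate(n, j, w, theta):
    """Gate (3) on qubit j, conditioned on sigma^z of qubits k < j.
    Qubit 0 is the leftmost bit of the basis-state index."""
    U = np.zeros((2**n, 2**n))
    for b in range(2**n):
        bits = [(b >> (n - 1 - k)) & 1 for k in range(n)]
        # sigma^z eigenvalues: -1 resting, +1 active
        x = sum(w[k] * (2*bits[k] - 1) for k in range(j)) - theta
        g = np.arcsin(np.sqrt(f(x)))
        b0 = b & ~(1 << (n - 1 - j))         # qubit j set to |0>
        b1 = b | (1 << (n - 1 - j))          # qubit j set to |1>
        if bits[j] == 0:
            U[b0, b0], U[b1, b0] = np.cos(g), np.sin(g)
        else:
            U[b1, b1], U[b0, b1] = np.cos(g), -np.sin(g)
    return U

n = 5
psi = np.zeros(2**n)
psi[0b10000] = 1.0                           # input |1,0>, perceptrons resting
psi = perceptron_gate(n, 2, [1.5, -0.7], 0.3) @ psi
psi = perceptron_gate(n, 3, [-0.9, 1.1, 0.0], -0.2) @ psi
psi = perceptron_gate(n, 4, [0.0, 0.0, 0.8, 0.6], 0.1) @ psi
s_out = sum(p for b, p in enumerate(np.abs(psi)**2) if b & 1)
print(s_out)                                 # excitation probability of the output qubit
```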

Implementation

The second and most important result in this work is that the perceptron gate can be implemented as a single (fast) adiabatic passage in a model of interacting spins, which opens the door to specialized hardware implementation of the quantum neural network. We construct the perceptron gate evolving a qubit with the Ising Hamiltonian

$$\hat{H}(t) = -\frac{\Omega(t)}{2}\,\hat{\sigma}^x_j - \frac{1}{2}\,\hat{x}_j\,\hat{\sigma}^z_j, \qquad \hat{x}_j = \sum_{k<j} w_{jk}\,\hat{\sigma}^z_k - \theta_j. \qquad (6)$$

The qubit is controlled by an external transverse field $\Omega(t)$ , has a tuneable energy gap and interacts with other neurons through $\hat{x}_j$ . The instantaneous ground state of this Hamiltonian

$$|\Phi(x_j/\Omega)\rangle = \sqrt{1-f(x_j/\Omega)}\,|0\rangle + \sqrt{f(x_j/\Omega)}\,|1\rangle \qquad (7)$$

has a sigmoid excitation probability (cf. fig. 1(a), solid line)

$$P_j = |\langle 1|\Phi\rangle|^2 = f\!\left(\frac{x_j}{\Omega}\right) = \frac{1}{2}\left(1 + \frac{x_j}{\sqrt{x_j^2 + \Omega^2}}\right). \qquad (8)$$

This suggests implementing the gate (1) in three steps: i) set the perceptron to the superposition $|+\rangle =\mathcal{H}|0\rangle =\frac{1}{\sqrt{2}}(|0\rangle +|1\rangle )$ with a Hadamard gate; ii) instantaneously boost the magnetic field $\Omega(0)=\Omega_0\gg |\hat{x}_j|$ ; iii) adiabatically ramp down the transverse field $\Omega(t_f)=\Omega_f$ in a time tf, to do the transformation $\mathcal{A}(\hat{x}_j)|+\rangle \simeq |\Phi(\hat{x}_j/\Omega_f)\rangle$ .
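This passage is straightforward to simulate for a fixed classical field. The sketch below (ours, with illustrative parameter values rather than the paper's) integrates the Schrödinger equation along the linear ramp and compares the final state with the target ground state (7).

```python
# Simulation sketch of steps (i)-(iii) for a fixed classical field x_j;
# Omega_0, Omega_f and t_f are illustrative values, not the paper's.
import numpy as np
from scipy.integrate import solve_ivp

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)   # basis order: (|1>, |0>)
x_j, Omega0, Omegaf, tf = 0.5, 100.0, 1.0, 500.0

def rhs(t, psi):
    Omega = Omega0 * (1 - t/tf) + Omegaf * t/tf   # linear ramp
    H = -0.5 * Omega * sx - 0.5 * x_j * sz        # Hamiltonian (6)
    return -1j * (H @ psi)

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)   # steps (i)+(ii)
phi = solve_ivp(rhs, (0, tf), plus, rtol=1e-10, atol=1e-10).y[:, -1]

fz = 0.5 * (1 + x_j / np.sqrt(x_j**2 + Omegaf**2))    # sigmoid (8)
target = np.array([np.sqrt(fz), np.sqrt(1 - fz)])     # ground state (7)
print(abs(np.vdot(target, phi))**2)   # fidelity -> close to 1 for slow ramps
```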

As sketched in fig. 2, the energy gap in this protocol is larger than $|\Omega(t)|$, allowing for many quasiadiabatic strategies $\Omega(t)$ that approximate $\hat{U}_j(\hat{x}_j;f)\simeq\mathcal{A}(\hat{x}_j)\mathcal{H}$ for $|\hat{x}_j|\leq |\hat{x}_\text{max}|\ll|\Omega_0|$. We designed two: a linear ramp $\Omega(t) = \Omega_0(1-t/t_f)+\Omega_f t/t_f$, and a FAQUAD (fast quasiadiabatic passage) control [26] that limits non-adiabatic errors (see SM). As figure of merit we use the average fidelity

$$\bar{\mathcal{F}} = \frac{1}{2\,x_\text{max}} \int_{-x_\text{max}}^{x_\text{max}} \mathcal{F}(\Phi,\phi)\,\mathrm{d}x_j, \qquad (9)$$

with $\mathcal{F}(\Phi,\phi) = |\langle\Phi(\hat{x}_j/\Omega_f)|\phi\rangle|^2$ and ϕ the final dynamical state driven by $\Omega (t)$ .


Fig. 2: Energy levels of the two-level system (6) as a function of the activation potential xj. The perceptron gate begins with a large transverse field, $\Omega_0\gg |x_j|$ , such that the ground state is the approximate superposition $|+\rangle \propto |0\rangle + |1\rangle$ . When the transverse field is decreased, the state converges to $|\Phi(x_j/\Omega_f)\rangle$ given by (7).


Figure 3(a) compares the linear and FAQUAD strategies to modify the transverse field. In fig. 3(b) we observe that for the same time tf the FAQUAD protocol is more accurate; alternatively, given an error tolerance $\varepsilon=1-\mathcal{\bar{F}}$, the FAQUAD design is 2–3 orders of magnitude faster than the linear ramp. The quantum perceptron is also robust against non-adiabatic effects and against deviations in the control schedule well beyond experimental errors (see SM). From approximate fits, using the adiabatic passage as reference (see fig. 3(b)), we estimate that the total time for a perceptron gate to have an error ε scales as $t_{f,\varepsilon}=\mathcal{O}(\log(\varepsilon)^{1/0.15}\Omega_f^{-1})$. When we have multiple neurons N spread over L layers, the gates of a single layer can be parallelized, keeping the total time bounded, but control errors accumulate exponentially with the number of qubits. A more realistic scaling that takes this into account is $T_{f,\varepsilon}=\mathcal{O}(L\times\log(\varepsilon/N)^{1/0.15}\Omega_f^{-1})$.
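As a back-of-the-envelope illustration (our own sketch, using only the fitted constants from fig. 3(b)), the error model $1-\bar{\mathcal{F}}\simeq c_0\exp[-c_1(\Omega_f t_f)^{c_2}]$ can be inverted to estimate the ramp time that achieves a target infidelity.

```python
# Inverting the FAQUAD fit of fig. 3(b) to estimate gate times (a sketch).
import numpy as np

c0, c1, c2 = 26.838, 6.577, 0.150

def t_f(eps, Omega_f=1.0):
    """Ramp time with average infidelity eps: eps = c0*exp(-c1*(W*t)^c2)."""
    return (np.log(c0 / eps) / c1)**(1 / c2) / Omega_f

for eps in (1e-2, 1e-3, 1e-4):
    print(eps, t_f(eps))     # grows like log(1/eps)^(1/0.15)
```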


Fig. 3: (a) Transverse field $\Omega(t)$ for the linear ramp (dashed line) and FAQUAD (solid line) protocols to implement the perceptron gate. (b) Average infidelity $1-\bar{\mathcal{F}}$ as a function of the total ramp time tf, for the two ramp protocols. The FAQUAD process is fitted by $\sim c_0\exp[-c_1(\Omega_ft_f)^{c_2}]$ , $c_0=26.838, c_1= 6.577$ , and $c_2=0.150$ (black circles).


We can compare this performance with a proposal for implementing a quantum perceptron using auxiliary qubits, conditioned rotations and measurements [23]. The gate implemented in that work is a rotation $\hat{U} = \exp[iq^{(k)}(x)\hat{\sigma}^y]$ with a nonlinear angle $q^{(k)}(x) = 2\arctan[\tan^{2^k}(x)]$ that converges to a step-wise function in the interval $x\in [-\pi/4,\pi/4]$. This gate requires about k auxiliary qubits, a circuit depth $\mathcal{O}(14^k)$ and a total gate time that scales polynomially, $\mathcal{O}((n/\delta)^2)$, with the number of neurons per layer n and the step width $\delta\simeq \Omega_f$ of the network. An important point in the work by Cao et al. is that it demonstrates algorithmic applications for neural networks that are perfectly discriminating (rotation angles take values close to $\pi/2$ or 0 and Pj is either 0 or 1, as in the McCulloch and Pitts [25] model): those applications can also be reproduced with our Ising model perceptron by a suitable design of the final transverse field $\Omega_f$ and the biases $\theta_j$. Finally, we have to remark that our perceptron's sigmoid response is easily tuned —whereas the step size of q(k) only takes the fixed values $\simeq 2^{-k}$— and it does not have wraparound problems. These advantages are relevant for broader applications such as sensing of unconstrained input fields x and are required for the perceptron to approximate arbitrary operations.
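A quick numerical sketch (ours) makes the contrast visible: evaluating $q^{(k)}(x)$ on a grid shows it sharpening into a step at $x=\pi/4$ as k grows, with a width fixed by k instead of being continuously tunable as in our eq. (8).

```python
# Sketch: the repeat-until-success activation of ref. [23] converges to a
# step as k grows; its transition width is set by k (~2^-k), not by a knob.
import numpy as np

x = np.linspace(0.1, np.pi/2 - 0.1, 8)
for k in (1, 2, 4):
    q = 2 * np.arctan(np.tan(x)**(2**k))
    print(k, np.round(q / np.pi, 3))   # -> ~0 below x = pi/4, ~1 above
```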

Parameterized quantum control

The quantum perceptron is an instance of a new problem in optimal control theory [27,28]: to design a family of unitary operations that depend on a single parameter, $\hat{U}_x: x\in[-x_\text{max},x_\text{max}]\to SU(2)$, using a single control $\Omega(t)$ that does not have any knowledge of this parameter. The closest problem that we know of appears in NMR protocols for suppressing decoherence [29–32]: the external field x is created by an environment or residual cross-talk, and the goal is to preserve the quantum state $\hat{U}_x\sim 1$ or to implement the same unitary operation for any x. However, the quantum perceptron is far more general and includes other multiqubit gates.

For instance, the quantum perceptron can achieve multiqubit conditional quantum gates that have the form $\hat{W}_{mqb}=\exp[i Q(\hat{\sigma}^z_1,\ldots,\hat{\sigma}^z_{j-1})\hat{\sigma}^y_j]$ , with general continuous activation functions Q. The idea is to decompose the function Q as a linear combination of sigmoid excitation profiles $Q(\hat{\sigma}^z_1,\ldots,\hat{\sigma}^z_{j-1})\sim \sum_n \arcsin[f(\sum_{k<j}w_{jk}^{(n)}\hat{\sigma}^z_k-\theta_j^{(n)})]$ , reconstructing the multiqubit gate by several applications of perceptron gates with different parameters

$$\hat{W}_{mqb} \simeq \prod_n \hat{U}_j\!\left(\hat{x}^{(n)}_j; f\right), \qquad \hat{x}^{(n)}_j = \sum_{k<j} w^{(n)}_{jk}\,\hat{\sigma}^z_k - \theta^{(n)}_j. \qquad (10)$$

Take for instance a XOR-like gate that flips a bit when the number of excited qubits is within a given range

$$Q = \frac{\pi}{2}\,\Theta(\hat{n} - M_1)\,\Theta(M_2 - \hat{n}), \qquad \hat{n} = \sum_{k<j} \frac{1}{2}\left(1 + \hat{\sigma}^z_k\right), \qquad (11)$$

with $\Theta$ the Heaviside step function and $\hat{n}$ the number of excited qubits.

The ordinary XOR gate has N = 2 inputs and thresholds $M_1=0$ and $M_2=2$, but cannot be implemented using a single classical perceptron [33]. We can nevertheless implement the conditional logic (11) quantum mechanically, using two adiabatic passages with two different biases $\theta_j^{(n)}$ and opposite signs of $\Omega_{0,f}$ for each passage, thus achieving upper and lower excitation thresholds (cf. fig. 4).
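The construction can be checked with a few lines of code (ours, with illustrative thresholds and widths; the trained parameters live in the SM): two sigmoid profiles (8) with opposite signs combine into the rectangular response of eq. (11).

```python
# Two sigmoid passages (8) with opposite signs build a rectangular response:
# the target flips only when M1 < (number of excited inputs) < M2.
# Thresholds/widths below are illustrative, not the paper's trained values.
import numpy as np

def f_sig(x, Omega):
    """Excitation probability (8) with step width |Omega|."""
    return 0.5 * (1 + x / np.sqrt(x**2 + Omega**2))

M1, M2, Omega_f = 0.0, 2.0, 0.1
n_exc = np.arange(-1, 4)                       # number of excited inputs
Q = (np.pi/2) * (f_sig(n_exc - M1 - 0.5, Omega_f)
                 - f_sig(n_exc - M2 + 0.5, Omega_f))
print(np.round(np.sin(Q)**2, 3))   # flip probability: ~1 only for n_exc = 1
```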


Fig. 4: Perceptron responses that result from two applications of the nonlinear gate with different shifts and widths: we show ideal, non-differentiable curves (solid line) and optimized fits to the gate (dots) following eq. (10).


Quantum sensing

Using the same ideas as for the design of multiqubit gates, we can engineer quantum sensors with responses that go beyond interference patterns. Such sensors would overcome the problems of phase wrapping, working as threshold or range sensors. As examples, fig. 4 shows two possible activations that are reconstructed with just two cycles of the perceptron gates: the rectangular shape (cf. fig. 4) required for the XOR gate (11), and a peaked response. Both examples were created using a machine-learning training algorithm in TensorFlow, recognizing that the product of unitaries in eq. (10) can be written as a single exponential whose rotation angle is an instance of a neural network.
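The same training idea can be reproduced with standard fitting tools. As a rough stand-in for that TensorFlow code (ours, not the paper's), the sketch below fits the shifts and widths of two passages, eq. (10), to a peaked target response.

```python
# Fitting two sigmoid passages, eq. (10), to a peaked target response.
# A rough stand-in (SciPy) for the TensorFlow training used in the paper.
import numpy as np
from scipy.optimize import curve_fit

def f_sig(x, Omega):
    return 0.5 * (1 + x / np.sqrt(x**2 + Omega**2))

def response(x, t1, t2, O1, O2):
    Q = (np.pi/2) * (f_sig(x - t1, O1) - f_sig(x - t2, O2))
    return np.sin(Q)**2

x = np.linspace(-4, 4, 200)
target = np.exp(-x**2)                       # peaked response to learn
popt, _ = curve_fit(response, x, target, p0=(-1.0, 1.0, 1.0, 1.0))
print(np.round(popt, 2))                     # fitted shifts and widths
print(np.max(np.abs(response(x, *popt) - target)))   # residual of the fit
```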

Another application of the perceptron gate would be to reconstruct global properties from the signals sensed by multiple quantum sensors. Let us assume that we have an object with a property χ —a dipolar moment, a quadrupolar moment, a charge, etc. This object is the source of an electromagnetic field $\phi(x,t;\chi)$ that is ultimately detected by a set of N quantum sensors, whose state is changed: $\hat{\sigma}_{n}^{z\prime} \to \hat{U}_n^\dagger \hat{\sigma}^z_n \hat{U}_n$, with $\hat{U}_n=\exp[-i\phi(x_n,t;\chi)\hat{\sigma}^y_n]$. If the sensors are initially polarized, all in the same state, there will be a mapping from the values of the transformed $\hat{\sigma}^z_n$ to the desired property. In other words, $\tilde{\chi} \simeq Q(\hat{\sigma}^z_1,\ldots,\hat{\sigma}^z_n)$. This suggests adopting a scheme such as the one in fig. 1(b), where the first layer would be the sensors and the final qubit would provide an approximation of the detected property, $s_{out}\simeq \tilde{\chi}$. Note that, by measuring neither the sensors nor the intermediate qubits, we achieve an enhanced sensitivity with respect to the classical estimate $\tilde{\chi} \simeq Q(\langle\hat{\sigma}^z_1\rangle,\ldots,\langle\hat{\sigma}^z_n\rangle)$.

Conclusion

Summing up, we have introduced a quantum perceptron as a two-level system that exists in a superposition of resting and active states, and which reacts nonlinearly to the field generated by other neurons. When combined with other perceptrons in a neural network configuration, this nonlinear transformation acts as a universal approximator of arbitrary computable functions and as a generator of sophisticated multiqubit operations beyond the Mølmer-Sørensen gate [34]. In the Supplementary Material we attach numerical notebooks (see Perceptrons.ipynb and Prime_number_tests.ipynb) that construct and train quantum perceptrons, illustrating their approximation power and nesting by classically training a small quantum network to detect prime numbers.

Our second main result is an implementation of the quantum perceptron gate as a quasiadiabatic passage in an Ising-type spin model. The resources in this implementation scale favorably with the network size and the total circuit error, and the adiabatic procedure has already been demonstrated in highly connected architectures with superconducting qubits [35–37], trapped ions [38,39] and nuclear magnetic resonance [40].

The perceptron gate is a multiqubit primitive that can be integrated in quantum computing environments —as a primitive for the approximation of general discrete functions, as an approximate classifier of complex datasets, or as the implementation of a quantum oracle. The model of a quantum perceptron that we have introduced has other important ramifications, such as the design of complex controlled operations or the connection to quantum sensing sketched above. In particular, the image of the multi-layer perceptron circuit as a quantum sensor opens many interesting questions. For instance, how should we define and optimize the sensitivity of these sensors? Can these threshold sensors be combined with other unitary operations, quantum states, etc.? If so, what are the quantum limits of threshold sensing vs. ordinary sensing of classical fields? We expect to address these and other questions in future works.

Acknowledgments

We acknowledge funding from MINECO/FEDER Project FIS2015-70856-P, CAM PRICYT project QUITEMAD+CM S2013-ICE2801, and Basque Government (Grant No. IT986-16).

Footnotes

  • (a) Contribution to the Focus Issue The Physics of Quantum Engineering and Quantum Technologies edited by Roberta Citro, J. Gonzalo Muga and Bart A. van Tiggelen.
