1 Introduction

In 1961, Landauer identified a fundamental energetic requirement to perform logically-irreversible computations on nonvolatile memory [1]. Focusing on arguably the simplest case—erasing a bit of information—he found that one must supply at least \(k_\text {B}T \ln 2\) of work (\(\approx 10^{-21}\) J at room temperature), eventually expelling this energy as heat. (Here, \(k_\text {B}\) is Boltzmann’s constant and T is the temperature of the computation’s ambient environment.)

Notably, though still underappreciated, Landauer had identified a thermodynamically-reversible transformation. And so, no entropy actually need be produced—energy is not irrevocably dissipated—at least in the quasistatic, thermodynamically-reversible limit required to meet Landauer’s bound.

Landauer’s original argument appealed to equilibrium statistical mechanics. Since his time, though, advances in nonequilibrium thermodynamics have shown that his bound on the required work follows from a modern version of the Second Law of thermodynamics [2]. (And, when the physical substrate’s dynamics are taken into account, it follows from the information processing Second Law (IPSL) [3].) These modern laws clarified many connections between information processing and thermodynamics, such as dissipation bounds due to system-state coarse-grainings [4], nanoscale information-heat engines [5], the relation of dissipation and fluctuating currents [6], and memory design [7].

Additional scalings recently emerged between computation time, space, reliability, thermodynamic efficiency, and robustness of information storage [8,9,10]. In contrast to Landauer’s bound, these tradeoffs involve thermodynamically-irreversible processes, implying that entropy production, and therefore true heat dissipation, is generally required by practical constraints or design goals.

In addition to these tradeoffs, it is now clear that substantial energetic costs are incurred when using logic gates and allied information-processing modules to construct a computer, especially when compared to custom-designing hardware to optimally implement a particular computation [11].

Taken altogether, these costs constitute a veritable Landauer’s Stack of the information-energy requirements for thermodynamic computing. Figure 1 illustrates Landauer’s Stack in the light of historical trends in the thermodynamic costs of performing elementary logic operations in CMOS technology. The units there are joules dissipated per logic operation. We take Landauer’s Stack to be the overhead including Landauer’s bound (\(k_\text {B}T \ln 2\) joules) up to the current (year 2020) energy dissipations due to information processing. Thus, the Stack is a hierarchy of energy expenditures that underlie contemporary digital computing—an arena of theoretically-predicted and as-yet unknown thermodynamic phenomena awaiting detailed experimental exploration.

Fig. 1

Historical trends in thermodynamic costs of performing elementary logic operations in CMOS technology quoted in energy dissipated (joules) per logic operation. Contemporary experimentally-accessible thermal resolution is approximately \(10^{-24}\) joules. Landauer’s Stack—Thermodynamic hierarchy of predicted “overhead” energy expenditures due to information processing that underlie contemporary digital computing, including Landauer’s Principle of logical irreversibility [1] (which is now seen as a consequence of the broader information processing Second Law \(\langle W \rangle \le k_\text {B}T \Delta h_\mu \) [3]): (a) Nonreciprocity [12]; (b) Computation rate \(1/\tau \) [9, 10]; (c) Accuracy: \(- \ln \epsilon \) [12]; (d) Storage stability; (e) Circuit modularity [11]; (f) Mismatched expectations [13, 14]; (g) Transitions between nonequilibrium steady-state storage states [15, 16]; and (h) Quantum coherence [17]. (2015 and prior portion of figure courtesy M. L. Roukes, data compiled from [18, and citations therein]. Landauer’s Stack cf. Table I in Ref. [19].) The change of CMOS technology to 3D device nodes around 2015 makes linear feature size and its relation to energy costs largely incomparable afterwards [20,21,22,23]. There are, of course, other sources of energy dissipation in CMOS, such as leakage currents that arise when electrons tunnel from gate to drain through a thin gate-oxide dielectric. Thermodynamically, this source is a kind of “housekeeping heat”, necessary to support the substrate’s electronic properties but not directly due to information processing.

To account for spontaneous deviations that arise in small-scale systems, the Second Laws are now most properly expressed by exact equalities on probability distributions of possible energy fluctuations. These are the fluctuation theorems [24], from which the original Laws (in fact, inequalities) can be readily recovered. Augmenting the Stack, fluctuation theorems apply directly to information processing, elucidating further thermodynamic restrictions and structure [25,26,27,28].

The result is a rather more complete accounting for the energetic costs of thermodynamic computation, captured in the refined Landauer’s Stack of Fig. 1. In this spirit, here we report new bounds on the work required to compute in the very important case of computations driven externally by time-symmetric control protocols [12]. In surprising contrast to the finite energy cost of erasure identified by Landauer, here we demonstrate that the minimum required energy diverges as a function of accuracy and so can dominate Landauer’s Stack. Our main goal in the following is to validate Ref. [12]’s thermodynamic bounds and demonstrate their tightness, doing so in Landauer’s original setting of information erasure.

In essence, our argument is as follows. Energy dissipation in thermodynamic transformations is strongly related to entropy production. The fluctuation theorems establish that entropy production depends on both forward and reverse dynamics. Thus, when determining bounds on dissipation in thermodynamic computing, one must examine the control protocol applied both forward and in reverse. By considering time-symmetric protocols we substantially augment Landauer and Bennett’s dissipation bound on logical irreversibility [29] with dissipation due to logical nonselfinvertibility (also known as nonreciprocity). Our results therefore complement recent work on the consequences of logical and thermodynamic reversibility [30]. Parallel work on thermodynamic bounds for information processing in finite time, and bit-erasure in particular, includes the use of optimized control in the linear response regime [31,32,33] and transport theory [34,35,36]. However, the cost of nonreciprocity necessarily goes beyond the cost of finite-time computing, because time-symmetrically driven computations incur this additional dissipation regardless of the rate at which they are executed.

Why time-symmetric protocols? Modern digital computers are driven by sinusoidal line voltages and square-wave clock pulses. These control signals function as control parameters, directly altering the energetics and thereby guiding the dynamics of the computer components. Since these are time-symmetric control signals, modern digital computers must obey Ref. [12]’s error-dissipation trade-off. Moreover, the costs apply to even the most basic of computational tasks—such as bit erasure. Here, we present protocols for time-symmetrically implementing erasure in two different frameworks and demonstrate that both satisfy the new bounds. Indeed, many protocols approach the bounds quite closely, indicating that they may in fact be broadly achievable.

After a brief review of the general theory, we begin with an analysis of erasure implemented with the simple framework of two-state rate equations, demonstrating the validity of the bound for different protocols of increasing reliability. We then expand our framework to fully simulated collections of particles erased in an underdamped Langevin double-well potential, seeing the same faithfulness to the bound for a wide variety of different erasure protocols. We conclude with a call for follow-on efforts to analyze even more efficient computing that can arise from time-asymmetric protocols.

2 Dissipation in Thermodynamic Computing

Consider a universe consisting of a computing device—the system under study (SUS)—a thermal environment at fixed inverse temperature \(\beta = 1 / k_\text {B} T\), and a laboratory device (lab) that includes a work reservoir. The set of possible microstates for the SUS is denoted \(\varvec{{\mathcal {S}}}\), with \(s\) denoting an individual SUS microstate. The SUS is driven by a control parameter \(x\) generated by the lab. The SUS is also in contact with the thermal environment.

The overall evolution occurs from time \(t = 0\) to \(t = \tau \) and is determined by two components. The first is the SUS’s Hamiltonian \({\mathcal {H}}_{SL}(s, x)\) that specifies its interaction with the lab device and determines (part of) the rates of change of the SUS coordinates consistent with Hamiltonian mechanics. We refer to the possible values of the Hamiltonian as the SUS energies. The second component is the thermal environment which exerts a stochastic influence on the system dynamics.

We design the lab to guarantee that a specific control parameter value x(t) is applied to the SUS at every time t over the time interval \(t \in (0, \tau )\). That is, the control parameter evolves deterministically as a function of time. The deterministic trajectory taken by the control parameter x(t) over the computation interval is the control protocol, denoted by \(\overrightarrow{x}\). The SUS microstate s(t) exhibits a response to the control protocol over the interval, following a stochastic trajectory denoted \(\overrightarrow{s}\).

For a given microstate trajectory \(\overrightarrow{s}\), the net energy transferred from the lab to the SUS is defined as the work, which has the following form [5]:

$$\begin{aligned} W(\overrightarrow{s}) = \int _0^\tau dt\, {{\dot{x}}} (t) \frac{\partial {\mathcal {H}}_{SL}}{\partial x}\Big |_{(s(t), x(t))} ~. \end{aligned}$$

This is the energy accumulated in the SUS directly caused by changes in the control parameter.
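
This work functional can be evaluated numerically along any discretized trajectory. The following is a minimal sketch under illustrative assumptions of our own: a toy quadratic Hamiltonian \({\mathcal {H}}_{SL}(s,x) = k(s-x)^2/2\) (a particle tethered to a movable trap at position x) and a hypothetical trajectory in which the particle lags at \(s = 0\) while the trap is dragged.

```python
import numpy as np

# Toy model (an illustrative assumption, not from the text):
# H_SL(s, x) = k (s - x)^2 / 2, a particle tethered to a movable trap at x.
k = 1.0
dHdx = lambda s, x: -k * (s - x)       # partial derivative dH_SL/dx

# Discretize the protocol x(t) and a given microstate trajectory s(t).
tau, n = 1.0, 1001
t = np.linspace(0.0, tau, n)
x = t / tau                            # hypothetical linear drag of the trap
s = np.zeros(n)                        # hypothetical response: particle stays at s = 0

# W = \int_0^tau dt xdot(t) dH_SL/dx |_(s(t), x(t)), via the trapezoid rule.
xdot = np.gradient(x, t)
W = np.trapz(xdot * dHdx(s, x), t)     # here \int_0^1 k*x dx = k/2 = 0.5
```

For this lagging trajectory the integrand reduces to \(k\,x\,\dot{x}\), so the work done on the particle is \(k/2\), as the last line notes.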

Given an initial microstate \(s_0\), the probability of a microstate trajectory \(\overrightarrow{s}\) conditioned on starting in \(s_0\) is denoted:

$$\begin{aligned} P(\overrightarrow{s}| s_0) = \Pr _{ \overrightarrow{x}}(\overrightarrow{\mathcal {S}}=\overrightarrow{s}| {\mathcal {S}}_0=s_0) ~. \end{aligned}$$

With the SUS initialized in microstate distribution \(\varvec{\mu }_{0}\), the unconditioned forward process gives the probability of trajectory \(\overrightarrow{s}\):

$$\begin{aligned} P(\overrightarrow{s}) = P(\overrightarrow{s}| s(0)) \varvec{\mu }_{0}(s(0)) ~. \end{aligned}$$

Detailed fluctuation theorems (DFTs) [37, 38] determine thermodynamic properties of the computation by comparing the forward process to the reverse process. This requires determining the conditional probability of trajectories under time-reversed control:

$$\begin{aligned} R(\overrightarrow{s}| s_0) = \Pr _{ \overleftarrow{x}}(\overrightarrow{\mathcal {S}}=\overrightarrow{s}| {\mathcal {S}}_0=s_0) ~. \end{aligned}$$

The reverse control protocol is \(\overleftarrow{x}\), with \(\overleftarrow{x}(t) = x^\dagger (\tau - t)\), where \(x^\dagger \) is x, but with all time-odd components (e.g., magnetic field) flipped in sign. And, the reverse process results from the application of this dynamic to the final distribution \(\varvec{\mu }_{\tau }\) of the forward process with microstates conjugated:

$$\begin{aligned} R(\overrightarrow{s})&= R(\overrightarrow{s}| s(0)) \varvec{\mu }_{\tau }(s(0)^\dagger ) ~. \end{aligned}$$

The Crooks DFT [38] then gives an equality on both the dissipated work (or entropy production) that is produced as well as the required work for a given trajectory induced by the protocol:

$$\begin{aligned} e^{\beta W_\text {diss}(\overrightarrow{s})} = \frac{P(\overrightarrow{s})}{R(\overrightarrow{s}^\dagger )} ~, \end{aligned}$$

where \(\overrightarrow{s}^\dagger \) here is itself a SUS microstate trajectory, with \(\overrightarrow{s}^\dagger (t) = s(\tau - t)^\dagger \).

Due to their practical relevance, we consider protocols that are symmetric under time reversal: \(\overleftarrow{x} = \overrightarrow{x}\). That is, the reverse-process probability of trajectory \(\overrightarrow{s}\) conditioned on starting in microstate \(s_0\) is the same as that of the forward process:

$$\begin{aligned} R(\overrightarrow{s}| s_0) = P(\overrightarrow{s}| s_0) ~. \end{aligned}$$

However, the unconditional reverse-process probability of the trajectory \(\overrightarrow{s}\) is then:

$$\begin{aligned} R(\overrightarrow{s})&= P(\overrightarrow{s}| s(0)) \varvec{\mu }_{\tau }(s(0)^\dagger ) ~. \end{aligned}$$

This leads to a version of Crooks’ DFT that can be used to set modified bounds on a computation’s dissipation:

$$\begin{aligned} e^{\beta W_\text {diss}(\overrightarrow{s})} = \frac{P(\overrightarrow{s}| s(0)) \, \varvec{\mu }_{0}(s(0))}{P(\overrightarrow{s}^\dagger | s(\tau )^\dagger ) \, \varvec{\mu }_{\tau }(s(\tau ))} ~. \end{aligned}$$
(1)

Suppose, now, that the final and initial SUS Hamiltonian configurations \({\mathcal {H}}_{SL}(s,x(\tau ))\) and \({\mathcal {H}}_{SL}(s,x(0))\) are both designed to store the same information about the SUS. The SUS microstates are partitioned into locally-stable regions that are separated by large energy barriers in these energy landscapes. On some time scale, a state initialized in one of these regions has a very low probability of escape and instead locally equilibrates to its locally-stable region. These regions can thus be used to store information for periods of time controlled by the energy barrier heights. Collectively, we refer to these regions as memory states \(\varvec{{\mathcal {M}}}\).

Then the probability of the system evolving to a memory state \(m'\in \varvec{{\mathcal {M}}}\) given that it starts in a memory state \(m\in \varvec{{\mathcal {M}}}\) under either the forward or reverse process is:

$$\begin{aligned} P'(m\rightarrow m')&= \frac{\int d\overrightarrow{s}\bigl [ \!\! \bigl [s(0) \in m\wedge s(\tau ) \in m'\bigr ] \!\! \bigr ]P(\overrightarrow{s})}{\int d\overrightarrow{s}\bigl [ \!\! \bigl [s(0) \in m\bigr ] \!\! \bigr ]P(\overrightarrow{s})} ~, \end{aligned}$$

where \(\bigl [ \!\! \bigl [E \bigr ] \!\! \bigr ]\) evaluates to one if expression E is true and zero otherwise.

To simplify the development, suppose that the energy landscape of each memory state looks the same locally. That is, up to translation and possibly reflection and rotation, each memory state spans the same volume in microstate space and has the same energies at each of those states. Further, suppose that the SUS starts and ends in a metastable distribution, differing from global equilibrium only in the weight that each memory state is given in the distribution. Otherwise, the distributions look identical to the global equilibrium distribution at the local scale of any memory state. This ensures that the average change in SUS energy is zero, simplifying the change \(\Delta \) in nonequilibrium free energy \({\mathcal {F}}\) [12]:

$$\begin{aligned} \Delta {\mathcal {F}}= -\beta ^{-1} \Delta H({\mathcal {M}}_t) ~, \end{aligned}$$

where \(H( \cdot )\) is the Shannon entropy (in nats), and \({\mathcal {M}}_t\) is the random variable for the memory state at time t. Finally, suppose that the time reversal of a microstate changes neither the memory state it exists in nor its equilibrium probability, for any time during the protocol. This holds for memory states distinguished primarily by the positions of the system particles and system Hamiltonians that are unchanging under time reversal. See Ref. [12] for details behind these assumptions and generalized bounds without them.

Then we have the following inequality:

$$\begin{aligned} \beta \langle W_\text {diss} \rangle&\ge \Delta H({\mathcal {M}}_t) + \!\! \sum _{m\in \varvec{{\mathcal {M}}}} \!\! \varvec{\mu }_{0}'(m) \!\! \sum _{m'\in \varvec{{\mathcal {M}}}} \!\! d(m, m') , \end{aligned}$$
(2)

where:

$$\begin{aligned} \varvec{\mu }_{0}'(m)&= \int ds\bigl [ \!\! \bigl [s\in m\bigr ] \!\! \bigr ]\varvec{\mu }_{0}(s)&\text {and} \\ d(m, m')&= P'(m\rightarrow m') \ln \frac{P'(m\rightarrow m')}{P'(m'\rightarrow m)} ~. \end{aligned}$$

See Appendix A for a proof sketch.

Recalling that \(\beta \langle W_\text {diss}(\overrightarrow{s})\rangle = \beta ( \langle W (\overrightarrow{s})\rangle - \Delta {\mathcal {F}} )\) and appealing to the inequality in Eq. (2), we find a simple bound on the average work over the protocol:

$$\begin{aligned} \beta \langle W \rangle&\ge \sum _{m\in \varvec{{\mathcal {M}}}} \varvec{\mu }_{0}'(m) \sum _{m'\in \varvec{{\mathcal {M}}}} d(m, m') \\&\equiv \beta \langle W \rangle _\text {min}^\text {t-sym}\nonumber ~. \end{aligned}$$
(3)

This provides a bound on the work that depends solely on the logical operation of the computation, but goes beyond Landauer’s bound.
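
Since the bound depends only on the memory-level transition probabilities and the initial memory distribution, it can be evaluated directly. A minimal sketch (the function name and the illustrative error values are our own):

```python
import numpy as np

def tsym_work_bound(P, mu0):
    """Eq. (3): beta <W>_min^t-sym = sum_m mu0'(m) sum_m' d(m, m'),
    with d(m, m') = P'(m -> m') ln[ P'(m -> m') / P'(m' -> m) ]."""
    P = np.asarray(P, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        d = P * np.log(P / P.T)        # elementwise: d[m, m']
    d = np.nan_to_num(d)               # convention: 0 * ln(0/...) = 0
    return float(np.asarray(mu0) @ d.sum(axis=1))

# Erasure channel on two memory states (L = 0, R = 1), illustrative errors:
eps_L, eps_R = 1e-3, 1e-3
P = [[1 - eps_L, eps_L],               # L -> {L, R}
     [1 - eps_R, eps_R]]               # R -> {L, R}
bound = tsym_work_bound(P, [0.5, 0.5])
```

For this two-state channel the result matches the closed form \((1/2 - \langle \epsilon \rangle) \ln [(1-\epsilon_\text {R})/\epsilon_\text {L}]\) derived for erasure later in the text.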

Since we are addressing modern computing, we consider processes that approximate deterministic computations on the memory states. For such computations there exists a computation function \(C: \varvec{{\mathcal {M}}}\rightarrow \varvec{{\mathcal {M}}}\) such that the physically-implemented stochastic map approximates the desired function up to some small error. That is, \(P'(m\rightarrow C(m)) = 1 - \epsilon _m\) where \(0 < \epsilon _m\ll 1\). In fact, we require all relevant errors to be bounded by a small error-threshold \(\epsilon \ll 1\). That is, for all \(m' \ne C(m)\) let \(P'(m\rightarrow m') = \epsilon _{{m}\rightarrow {m'}}\) so that \(0 \le \sum _{m'\ne C(m)} \epsilon _{{m}\rightarrow {m'}} = \epsilon _m\le \epsilon \ll 1\).

We can then simplify Eq. (3)’s bound in the limit of small \(\epsilon \). First, we show that \(d(m, m') \ge 0\) for any pair of \(m, m'\) in the small \(\epsilon \) limit, where we have:

$$\begin{aligned} d(m, m')&= P'({m} \rightarrow {m'}) \ln \frac{P'({m} \rightarrow {m'})}{P'({m'} \rightarrow {m})} \\&\ge P'({m} \rightarrow {m'}) \ln P'({m} \rightarrow {m'}) ~. \end{aligned}$$

If \(C(m) = m'\), then \(P'({m} \rightarrow {m'}) = 1 - \epsilon _m\ge 1 - \epsilon \), so that:

$$\begin{aligned} d(m, m')&\ge (1-\epsilon ) \ln (1 - \epsilon ) ~, \end{aligned}$$

which vanishes as \(\epsilon \rightarrow 0\). And, if \(C(m) \ne m'\), then \(P'({m} \rightarrow {m'}) = \epsilon _{{m}\rightarrow {m'}}\), so that:

$$\begin{aligned} d(m, m')&\ge \epsilon _{{m}\rightarrow {m'}} \ln \epsilon _{{m}\rightarrow {m'}} ~, \end{aligned}$$

which also vanishes as \(\epsilon \rightarrow 0\). Setting this asymptotic lower bound on the dissipation of each transition facilitates isolating divergent contributions, such as those we now consider.

An unreciprocated memory transition \(C(m) = m'\) is one that does not map back to itself: \(C(m') \ne m\). The contribution to the dissipation bound is:

$$\begin{aligned} d(m, m')&= (1-\epsilon _m) \ln \frac{1-\epsilon _m}{\epsilon _{{m'}\rightarrow {m}}} \\&\ge (1-\epsilon ) \ln \frac{1-\epsilon }{\epsilon } ~. \end{aligned}$$

As \(\epsilon \rightarrow 0\), this gives:

$$\begin{aligned} d(m, m') \ge \ln \epsilon ^{-1} ~. \end{aligned}$$
(4)

That is, as computational accuracy increases (\(\epsilon \rightarrow 0\)), \(d(m, m')\) diverges. This means the minimum-required work (Eq. (3)) must then also diverge.

We then arrive at our simplified bound for the small-\(\epsilon \) high-accuracy limit from Eq. (3)’s inequality on dissipation by only including the contribution from unreciprocated transitions \(m' = C(m)\) for which \(m \ne C(m')\):

$$\begin{aligned} \beta \langle W \rangle&\ge \ln (\epsilon ^{-1}) \sum _{m\in \varvec{{\mathcal {M}}}} \varvec{\mu }_{0}'(m) \bigl [ \!\! \bigl [C(C(m)) \ne m\bigr ] \!\! \bigr ]\\&\equiv \beta \langle W \rangle _\text {min}^\text {approx}~. \nonumber \end{aligned}$$
(5)

In this way, we see how computational accuracy drives a thermodynamic cost that diverges, overwhelming the Landauer-erasure cost. A similar logarithmic relationship between dissipated work and error was demonstrated in the context of the adaptation accuracy of Escherichia coli and other simple biological systems [39].
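
In the high-accuracy limit, Eq. (5) requires only the computation function \(C\), the initial memory distribution, and the error threshold. A small sketch (the names are ours):

```python
import math

def approx_work_bound(C, mu0, eps):
    """Eq. (5): beta <W>_min^approx = ln(1/eps) times the total initial
    probability of memory states whose transitions are unreciprocated,
    i.e. those m with C(C(m)) != m."""
    return math.log(1.0 / eps) * sum(
        p for m, p in mu0.items() if C[C[m]] != m)

# Bit erasure: C(L) = C(R) = L.  Only R's transition is unreciprocated,
# since C(C(R)) = L != R, while C(C(L)) = L.
bound = approx_work_bound({"L": "L", "R": "L"}, {"L": 0.5, "R": 0.5}, eps=1e-3)
# -> (1/2) ln(1000), approximately 3.45 k_B T
```

This makes the divergence explicit: halving \(\epsilon\) adds a fixed \(\tfrac{1}{2}\ln 2\) of work per unreciprocated state, without bound.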

The bound in Eq. (5) also applies to digital computing such as that performed with dynamic random-access memory (DRAM). We recognize that its operation places the device in a nonequilibrium steady state, appearing to negate the applicability of Crooks’ fluctuation theorem in Eq. (1). However, the remedy for systems whose steady states are nonequilibrium is simply to replace the equality with an inequality, implying that more work must be dissipated than in the case of a local-equilibrium steady state [40]. Thus, our derived bounds must still hold for these modern computing devices.

3 Erasure Thermodynamics

Inequalities Eqs. (3) and (5) place severe constraints on the work required to process information via time-symmetric control on memories. The question remains, though, whether or not these bounds can actually be met by specific protocols or if there might be still tighter bounds to be discovered.

To help answer this question, we turn to the case, originally highlighted by Landauer [1], of erasing a single bit of information. This remarkably simple case of computing has held disproportionate sway in the development of thermodynamic computing compared to other elementary operations. The following does not deviate from this habit, showing, in fact, that there remain fundamental issues. We explore this via two different implementations: The first described via two-state rate equations and the second with an underdamped double-well potential—Landauer’s original, preferred setting.

Suppose the SUS supports two (mesoscopic) memory states, labeled \(\text {L}\) and \(\text {R}\). The task of a time-symmetric protocol that implements erasure is to guide the SUS microscopic dynamics that starts with an initial \(50-50\) distribution over the two memory states to a final distribution as biased as possible onto the \(\text {L}\) state. The logical function \(C\) of perfect bit erasure is attained when \(C(\text {L}) = C(\text {R}) = \text {L}\), setting either memory state to \(\text {L}\). The probabilities of incorrectly sending an \(\text {L}\) state to \(\text {R}\) and of incorrectly leaving an \(\text {R}\) state in \(\text {R}\) are denoted \(\epsilon _\text {L}\) and \(\epsilon _\text {R}\), respectively.

Error generation is described by the binary asymmetric channel [41]—the erasure channel \({\mathcal {E}}\) with conditional probabilities:

$$\begin{aligned} P'(\text {L}\rightarrow \text {L})&= 1 - \epsilon _\text {L}~,&P'(\text {L}\rightarrow \text {R})&= \epsilon _\text {L}~, \\ P'(\text {R}\rightarrow \text {L})&= 1 - \epsilon _\text {R}~,&P'(\text {R}\rightarrow \text {R})&= \epsilon _\text {R}~. \end{aligned}$$
For any erasure implementation, this Markov transition matrix gives the error rate \(\epsilon _\text {L}=\epsilon _{\text {L}\rightarrow \text {R}}\) from initial memory state \({\mathcal {M}}_0=\text {L}\) and the error rate \(\epsilon _\text {R}=\epsilon _{\text {R}\rightarrow \text {R}}\) from the initial memory state \({\mathcal {M}}_0=\text {R}\).

Noting first that \(d(m, m) = 0\) generically, we then have:

$$\begin{aligned} d(\text {L}, \text {R})&= \epsilon _\text {L}\ln \frac{\epsilon _\text {L}}{1-\epsilon _\text {R}} ~, \\ d(\text {R}, \text {L})&= (1- \epsilon _\text {R}) \ln \frac{1-\epsilon _\text {R}}{\epsilon _\text {L}} ~. \end{aligned}$$

So, the bound of Eq. (3) simplifies to:

$$\begin{aligned} \beta \langle W \rangle _\text {min}^\text {t-sym}&= \frac{1}{2} \epsilon _\text {L}\ln \frac{\epsilon _\text {L}}{1 - \epsilon _\text {R}} + \frac{1}{2} (1 - \epsilon _\text {R}) \ln \frac{1 - \epsilon _\text {R}}{\epsilon _\text {L}} \nonumber \\&= \Big ( \frac{1}{2} - \langle \epsilon \rangle \Big ) \ln \frac{1-\epsilon _\text {R}}{\epsilon _\text {L}} ~, \end{aligned}$$
(6)

where \(\langle \epsilon \rangle = (\epsilon _\text {L}+ \epsilon _\text {R})/2\) is the average error for the process.
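The algebra collapsing Eq. (6)'s first line into its second can be checked numerically; a quick sketch (the function is ours):

```python
import math

def tsym_bound_erasure(eps_L, eps_R):
    # First line of Eq. (6): sum the two nonzero d(m, m') terms directly.
    direct = (0.5 * eps_L * math.log(eps_L / (1 - eps_R))
              + 0.5 * (1 - eps_R) * math.log((1 - eps_R) / eps_L))
    # Second line: collected form with <eps> = (eps_L + eps_R) / 2.
    collected = (0.5 - (eps_L + eps_R) / 2) * math.log((1 - eps_R) / eps_L)
    assert math.isclose(direct, collected)
    return collected

bound = tsym_bound_erasure(1e-3, 2e-3)
```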

Notice further that \(C(C(\text {L})) = \text {L}\) but \(C(C(\text {R})) \ne \text {R}\), indicating that only the computation on \(\text {R}\) is nonreciprocal. Therefore, the bound of Eq. (5) simplifies to:

$$\begin{aligned} \beta \langle W \rangle _\text {min}^\text {approx}&= \frac{1}{2} \ln (\epsilon ^{-1}) ~. \end{aligned}$$
(7)

Applying Eq. (7) to DRAM directly provides a quantitative comparison beyond a formal divergence of energy costs. Contemporary DRAM exhibits a range of “soft” error rates around \(10^{-22}\) failures per write operation [42]. In fact, each write operation is effectively an erasure. (The quoted statistic is an average of 4,000 correctable errors per 128 MB DIMM per year.) Using Eq. (7), this gives a thermodynamic cost of \(25~k_\text {B}T\), which is markedly larger than Landauer’s \(k_\text {B}T\ln 2\) factor. It is also, just as clearly, smaller by a factor of roughly 10 than the contemporary energy costs per logic operation displayed in Fig. 1. These numerical results support the conclusion that modern computers can still be improved in efficiency, though that efficiency is ultimately limited by the bounds we introduced. The conclusion is further reinforced by the numerical simulations in the following sections that nearly achieve our theoretical bounds.
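
The DRAM estimate follows directly from Eq. (7); a one-line check with the quoted error rate:

```python
import math

eps_dram = 1e-22                        # quoted soft-error rate per write [42]
bound_kT = 0.5 * math.log(1.0 / eps_dram)
# about 25.3 k_B T, versus Landauer's ln 2, about 0.69 k_B T
```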

3.1 Erasure with Two-state Rate Equations

A direct test of time-symmetric erasure requires only a simple two-state system that evolves under a rate equation:

$$\begin{aligned}&\frac{d \Pr ({\mathcal {M}}_t=m)}{dt}=\sum _{m'} \big [ r_{m' \rightarrow m}(t) \Pr ({\mathcal {M}}_t=m')-r_{m \rightarrow m'}(t) \Pr ({\mathcal {M}}_t=m) \big ], \end{aligned}$$
(8)

obeying the Arrhenius equations:

$$\begin{aligned} r_{\text {R}\rightarrow \text {L}}(t) = Ae^{-\Delta E_\text {R}(t)/ k_\text {B}T} ~\text {and } r_{\text {L}\rightarrow \text {R}}(t) = Ae^{-\Delta E_\text {L}(t) / k_\text {B}T} ~, \end{aligned}$$

where the states are labeled \(\{\text {L}, \text {R}\}\) and the terms \(\Delta E_\text {R}(t)\) and \(\Delta E_\text {L}(t)\) in the exponentials are the activation energies to transit over the energy barrier at time t for the Right and Left wells, respectively.

These dynamics can be interpreted as a coarse-graining of thermal motion in a double-well potential energy landscape V(q, t) over the positional variable q at time t. Above, A is an arbitrary constant, which is fixed for the dynamics. \(q^*_\text {R}\) and \(q^*_\text {L}\) are the locations of the Right and Left potential well minima, respectively. Thus, assuming that \(q=0\) is the location of the barrier’s maximum between them, we see that the activation energies can be expressed as \(\Delta E_\text {R}(t) =V(0,t)-V(q^*_\text {R},t)\) and \(\Delta E_\text {L}(t) =V(0,t)-V(q^*_\text {L},t)\). By varying the potential energy extrema \(V(q^*_\text {R},t)\), \(V(q^*_\text {L},t)\), and V(0, t) we control the dynamics of the observed variables \(\{ \text {L}, \text {R}\}\) in much the same way as is done with physical implementations of erasure, where barrier height and tilt are controlled in a double-well [43].
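
These coarse-grained dynamics are straightforward to integrate. The sketch below Euler-integrates the two-state master equation (Eq. (8)) under a hypothetical time-symmetric tilt-and-barrier schedule; the prefactor A, the temperature, and the energy profiles are illustrative stand-ins of ours, not the actual protocol parameters used later.

```python
import numpy as np

A, kT, tau, n = 1.0, 1.0, 100.0, 100_000
t = np.linspace(0.0, tau, n)
dt = t[1] - t[0]

# Hypothetical time-symmetric schedule: ramp the tilt up and the barrier
# down toward the protocol midpoint, then reverse.
tent = 1.0 - np.abs(2.0 * t / tau - 1.0)   # 0 -> 1 -> 0, time-symmetric
tilt = 10.0 * tent                         # V(q*_R, t) - V(q*_L, t), in kT
barrier = 8.0 * (1.0 - tent)               # V(0, t) above V(q*_R, t), in kT

# Activation energies: dE_R = V(0) - V(q*_R), dE_L = V(0) - V(q*_L),
# taking V(q*_R) = 0 and V(q*_L) = -tilt as the reference.
dE_R = barrier
dE_L = barrier + tilt

# Euler-integrate the master equation for p_R(t) = Pr(M_t = R).
p_R = 0.5                                  # 50-50 initial distribution
for i in range(n):
    r_RL = A * np.exp(-dE_R[i] / kT)       # Arrhenius rate R -> L
    r_LR = A * np.exp(-dE_L[i] / kT)       # Arrhenius rate L -> R
    p_R += dt * (r_LR * (1.0 - p_R) - r_RL * p_R)

# p_R now approximates the residual error eps_R of the erasure: small,
# since the tilt empties the R well while the barrier is down.
```

Sweeping the maximum tilt in such a schedule reproduces the qualitative behavior described next: the error falls roughly exponentially as the tilt grows.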

Deviating from previous investigations of efficient erasure, where Landauer’s bound was nearly achieved over long times [43, 44], here the constraint to time-symmetric driving over the interval \(t \in (0, \tau )\) results in additional dissipated work. As Landauer described [1], erasure can be implemented by turning on and off a tilt from \(\text {R}\) to \(\text {L}\)—a time-symmetric protocol. However, to achieve higher accuracy, we also lower the barrier while the system is tilted energetically towards the \(\text {L}\) well.

Consider a family of control protocols that fit the profile shown in Fig. 2. First, we increase the energy tilt from \(\text {R}\) to \(\text {L}\) via the energy difference \(V(q^*_\text {R},t)-V(q^*_\text {L},t)\), measured in units of \(k_\text {B}T\). This increases the relative probability of transitioning from \(\text {R}\) to \(\text {L}\). However, with the energy barrier at its maximum height, the transition takes quite some time. Thus, we reduce the energy barrier V(0, t) to its minimum height halfway through the protocol at \(t= \tau /2\). Then we reverse the protocol: we raise the barrier back to its default height, holding the probability distribution fixed in its well, and untilt so that the system resets to its default double-well potential.

Increasing the maximum tilt—given by \(V(q^*_\text {R},\tau /2)-V(q^*_\text {L},\tau /2)\) at the halfway time—increases erasure accuracy. Figure 3 shows that the maximum error \(\epsilon = \max \{ \epsilon _\text {R},\epsilon _\text {L}\}\) decreases nearly exponentially with increased maximum energy difference between left and right, going below 1 error in every 1000 trials for our parameter range. Note that \(\epsilon \) starts at a very high value (greater than 1/2) for zero tilt, since the probability \(\epsilon _\text {R}=\epsilon \) of ending in the \(\text {R}\) well starting in the \(\text {R}\) well is very high if there is no tilt to push the system out of the \(\text {R}\) well.

Figure 3 also shows the relationship between the work and the bounds described above. Given that our system consists of two states \(\{\text {L}, \text {R}\}\) and that we choose a control protocol that keeps the energy \(V(q^*_\text {L},t)\) on the left fixed, the work (marked by green \(+\)s in the figure) is [5]:

$$\begin{aligned} \langle W \rangle&= \int _0^\tau dt \sum _s \Pr ({\mathcal {S}}_t=s)\partial _t V(s,t) \\&= \int _0^\tau dt \Pr ({\mathcal {M}}_t=\text {R})\partial _tV(q^*_\text {R},t) ~. \end{aligned}$$

This work increases almost linearly as the error reduces exponentially.

Fig. 2

Time-symmetric control protocol for implementing moderately-efficient erasure. This should be compared to Landauer’s original time-symmetric protocol [1]. It starts by tilting—increasing the difference in potential energy \((V(q^*_\text {R},t)-V(q^*_\text {L},t))/ k_\text {B}T\) between \(\text {L}\) and \(\text {R}\). We increase this value such that transitions are more likely to go from \(\text {R}\) to \(\text {L}\). Then we reduce the barrier height V(0, t) to increase the total flow rate. Finally, we reverse the previous steps, cutting off the flow by raising the barrier, then untilting

Fig. 3

(Top) Maximum error \(\epsilon \) (blue dots) decreases approximately exponentially with increasing maximum tilt. The latter is given by the maximum energy difference between the right and left energy well \(V(q^*_\text {R},\tau /2)-V(q^*_\text {L},\tau /2)\). (Bottom) Work \(\langle W \rangle \) (green \(+\)s), scaled by the inverse temperature \(\beta =1/ k_\text {B}T\), increases with increasing maximum tilt and decreasing error. The Landauer work bound \(\langle W\rangle ^\text {Landauer}_\text {min}\) (orange \(\times \)s) is a very weak bound, asymptoting to a constant value rather than continuing to increase, as the work does. The bound \(\langle W\rangle ^{t\text {-sym}}_\text {min}\) (blue circles) on time-symmetrically driven protocols, on the other hand, is a very tight bound for lower values of maximum tilt. The work deviates from the time-symmetric bound for higher tilts. Finally, the approximate bound \(\langle W\rangle _\text {min}^\text {approx}\) (red \(+\)s), which scales as \(\ln \epsilon ^{-1}\), is not an accurate bound over the entire range, but it very closely matches the exact time-symmetric bound \(\langle W\rangle ^{t\text {-sym}}_\text {min}\) for small \(\epsilon \), as expected

As a first comparison, note that the Landauer bound \(\langle W \rangle ^\text {Landauer}_\text {min}=- k_\text {B}T\Delta H({\mathcal {M}}_t)\) (marked by orange \(\times \)s in the figure) is still valid. However, it is a very weak bound for this time-symmetric protocol. The Landauer bound saturates at \(k_\text {B}T \ln 2\). Thus, the dissipated work—the gap between orange \(\times \)s and green \(+\)s—grows approximately linearly with increasing tilt energy.

In contrast, Eq. (6)’s bound \(\langle W \rangle _\text {min}^{t\text {-sym}}\) for time-symmetric protocols is much tighter. The time-symmetric bound is valid: marked by blue circles that all fall below the calculated work (green \(+\)s). Not only is this bound much stricter, but it almost exactly matches the calculated work for a large range of parameters, with the work only diverging for higher tilts and lower error rates.

Finally, the approximate bound \(\langle W \rangle _\text {min}^\text {approx} = \frac{ k_\text {B}T}{2}\ln \epsilon ^{-1}\) (marked by red \(+\)s) of Eq. (7), which captures the error scaling, behaves as expected. The error-dependent work bound nearly exactly matches the exact bound for low error rates on the right side of the plot and effectively bounds the work. For lower tilts, this quantity does not bound the work and is not a good estimate of the true bound, but this is consistent with expectations for high error rates. This approximation should only be employed for very reliable computations, for which it appears to be an excellent estimate. Thus, the two-level model of erasure demonstrates that the time-symmetric control bounds on work and dissipation are reasonable in both their exact and approximate forms at low error rates.

3.2 Erasure with an Underdamped Double-well Potential

The physics in the rate equations above represents a simple model of a bistable thermodynamic system, which can serve as an approximation for many different bistable systems. One possible interpretation is a coarse-graining of the Langevin dynamics of a particle moving in a double-well potential. To explore the broader validity of the error–dissipation tradeoff, here we simulate the dynamics of a stochastic particle coupled to a thermal environment at constant temperature and a work reservoir via such a 1D potential. Again, we find that the time-symmetric bounds are much tighter than Landauer’s, reflecting the error–dissipation tradeoff of this control protocol class.

Consider a one-dimensional particle with position and momentum in an external potential and in thermal contact with the environment at temperature T. We consider a protocol architecture similar to that of Sect. 3.1, but with additional passive substages at the beginning, middle, and end: (i) hold the potential in the symmetric double-well form, (ii) positively tilt the potential, (iii) completely drop the potential barrier between the two wells, (iv) hold the potential while it is tilted with no barrier, (v) restore the original barrier, (vi) remove the positive tilt, restoring the original symmetric double-well, and (vii) hold the potential in this original form.

As a function of position q and time t, the potential then takes the form:

$$\begin{aligned} V(q, t) = a q^4 - b_0 b_f (t) q^2 + c_0 c_f(t) q ~, \end{aligned}$$

with constants \(a, b_0, c_0 > 0\). The protocol functions \(b_f(t)\) and \(c_f(t)\) evolve in a piecewise linear and time-symmetric manner according to Table 1, where \(t_0, t_1, \ldots , t_7 = 0, \tau /12, 3\tau /12, 5\tau /12, 7\tau /12, 9\tau /12, 11\tau /12, \tau \). The potential thus begins and ends in a symmetric double-well configuration with each well defining a memory state. During the protocol, though, the number of metastable regions is temporarily reduced to one. Figure 4 (top three panels) shows the protocol functions over time as well as the resultant potential function at key times for one such set of protocol parameters; see nondimensionalization in Appendix II. At any time, we label the metastable regions, from most negative to most positive position, as the \(\text {L}\) state and, if it exists, the \(\text {R}\) state.
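The protocol functions and potential above can be sketched compactly. A minimal Python illustration follows, assuming (since Table 1 is not reproduced here) that \(b_f\) ramps \(1 \rightarrow 0 \rightarrow 1\) as the barrier is dropped and restored, that \(c_f\) ramps \(0 \rightarrow 1 \rightarrow 0\) as the tilt is applied and removed, and using placeholder values for \(a\), \(b_0\), \(c_0\):

```python
import numpy as np

def protocol(t, tau=1.0):
    """Piecewise-linear, time-symmetric protocol functions b_f(t), c_f(t).

    Knot values are an assumed reading of the seven substages (i)-(vii);
    the actual values live in Table 1, which is not reproduced here.
    """
    knots = np.array([0, 1, 3, 5, 7, 9, 11, 12]) * tau / 12  # t_0 ... t_7
    b_knots = [1, 1, 1, 0, 0, 1, 1, 1]  # barrier dropped (iii), restored (v)
    c_knots = [0, 0, 1, 1, 1, 1, 0, 0]  # tilt applied (ii), removed (vi)
    return np.interp(t, knots, b_knots), np.interp(t, knots, c_knots)

def V(q, t, a=1.0, b0=1.0, c0=0.3, tau=1.0):
    """Potential V(q, t) = a q^4 - b0 b_f(t) q^2 + c0 c_f(t) q."""
    b_f, c_f = protocol(t, tau)
    return a * q**4 - b0 * b_f * q**2 + c0 * c_f * q
```

Because both knot sequences are palindromic about \(\tau /2\), the resulting potential satisfies \(V(q, t) = V(q, \tau - t)\), the time symmetry the protocol requires.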

Table 1 Erasure protocol
Fig. 4

Erasure via an underdamped double-well potential: Protocol functions b(t) (top panel, blue) and c(t) (second panel, orange) are symmetric in time, guaranteeing that the potential function (third panel) evolves symmetrically in time. Due to the spatial asymmetry of the potential over the majority of the protocol, however, erasure to state \(\text {L}\) (\(x<0\)) typically occurs, as evidenced by the evolution of the system position for 100 randomly-chosen trajectories (bottom panel, black). The \(\text {L}\) and \(\text {R}\) states merge into one between times \(t_2\) and \(t_3\) and separate again between times \(t_4\) and \(t_5\). A single trajectory (bottom panel, green) shows the typical behavior: falling into the \(x<0\) region by time \(t_3\) and remaining there for the rest of the protocol once the R state is reintroduced

We simulate the motion of the particle with underdamped Langevin dynamics:

$$\begin{aligned} dq&= v dt \\ m dv&= - \left( \frac{\partial }{\partial q} V(q, t) + \lambda v \right) dt + \sqrt{2 k_\text {B}T\lambda }\, r(t) \sqrt{dt} ~, \end{aligned}$$

where \(\lambda \) is the coupling between the thermal environment and particle, m is the particle’s mass, and r(t) is a memoryless Gaussian random variable with \(\langle r(t) \rangle = 0\) and \(\langle r(t) r(t') \rangle = \delta (t-t')\). The particle is initialized to be in global equilibrium over the initial potential \(V(\cdot , 0)\). Figure 4 (bottom panel) shows 100 randomly-chosen resultant trajectories for a choice of process parameters.
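These dynamics can be integrated with a minimal Euler–Maruyama sketch; the integrator choice and all parameter values below are illustrative placeholders, not those of Appendix II:

```python
import numpy as np

def simulate_langevin(dVdq, tau=1.0, dt=1e-4, m=1.0, lam=1.0, kT=1.0,
                      q0=0.0, v0=0.0, rng=None):
    """Euler-Maruyama sketch of the underdamped Langevin equations:
    dq = v dt,  m dv = -(dV/dq + lam v) dt + sqrt(2 kT lam) r sqrt(dt)."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(tau / dt))
    qs = np.empty(n + 1)
    q, v = q0, v0
    qs[0] = q
    for i in range(n):
        t = i * dt
        noise = np.sqrt(2.0 * kT * lam * dt) * rng.standard_normal()
        dv = (-(dVdq(q, t) + lam * v) * dt + noise) / m
        q += v * dt  # position update uses the pre-step velocity
        v += dv
        qs[i + 1] = q
    return qs

# Illustrative run in a static symmetric double-well V(q) = q^4 - q^2.
traj = simulate_langevin(lambda q, t: 4.0 * q**3 - 2.0 * q,
                         rng=np.random.default_rng(42))
```

Replacing the static gradient with the time-dependent \(\partial _q V(q, t)\) of the erasure protocol yields trajectories of the kind shown in Fig. 4 (bottom panel).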

The work done on a single particle over the course of the protocol with trajectory \(\bigl ( q(t) \bigr )_t\) is [5]:

$$\begin{aligned} W = \int _0^\tau dt \frac{\partial V(q, t)}{\partial t} \Big |_{q=q(t)} ~. \end{aligned}$$
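Along a sampled trajectory, this integral discretizes naturally as the change in potential energy at fixed position over each time step. A short sketch (the helper name is ours):

```python
import numpy as np

def work_along_trajectory(V, qs, ts):
    """Discretize W = int_0^tau dt dV/dt |_{q=q(t)} as the sum of
    potential changes at held position:
        W ~= sum_i [ V(q_i, t_{i+1}) - V(q_i, t_i) ]."""
    qs, ts = np.asarray(qs), np.asarray(ts)
    return float(np.sum(V(qs[:-1], ts[1:]) - V(qs[:-1], ts[:-1])))
```

For a particle held at \(q=1\) in an illustrative tilting potential \(V = q^4 - q^2 + t\,q\) over \(t \in [0, 1]\), the sum telescopes to \(V(1,1) - V(1,0) = 1\), as the integral requires.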

Figure 5 shows the net average work over time for an erasure process, comparing it to (i) the Landauer bound, (ii) the exact bound of Eq. (6), and (iii) the approximate bound of Eq. (7). Notice that the final net average work lies above all three, as it should, and that the time-symmetric bounds presented here are tighter than Landauer’s.

Fig. 5

Average work in \(k_\text {B}T\) over time for an erasure (black). Calculated from the simulation-estimated values \(\epsilon _\text {L}\) and \(\epsilon _\text {R}\), Landauer’s bound is given by the dashed yellow line and our approximate and exact bounds (Eqs. (7) and (6)) are given in dashed red and blue lines, respectively

We repeat this comparison for an array of different parameters for the erasure protocol. As described in Appendix II, we vary features of the dynamics—including mass m, temperature T, coupling to the heat bath \(\lambda \), duration of control \(\tau \), maximum depth of the potential energy wells, and maximum tilt between the wells. Nondimensionalization reduces the relevant parameters to just four, allowing us to explore a broad swath of possible physical erasures with 735 different protocols. For each protocol, we simulate 100,000 trajectories to estimate the work cost and errors \(\epsilon _\text {R}\) and \(\epsilon _\text {L}\) of the operation.
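The error estimate from sampled trajectories can be sketched as follows, assuming (our reading; the precise definition is not spelled out in this section) that \(\epsilon _m\) is the fraction of trajectories beginning in mesostate \(m\) that end in the \(\text {R}\) well rather than the erasure target \(\text {L}\):

```python
import numpy as np

def error_rates(q0, qf):
    """Estimate erasure error rates from initial and final positions.

    Assumption: erasure targets the L well (q < 0), and eps_m is the
    fraction of trajectories starting in well m that end in R (q >= 0).
    """
    start_L, start_R = q0 < 0, q0 >= 0
    end_R = qf >= 0
    eps_L = float(end_R[start_L].mean())
    eps_R = float(end_R[start_R].mean())
    return eps_L, eps_R
```

With 100,000 trajectories per protocol, these sample fractions estimate \(\epsilon _\text {L}\) and \(\epsilon _\text {R}\) to roughly three decimal places.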

Fig. 6

Reference bound \(\langle W \rangle _\text {ref}^{t\text {-sym}}\) (blue line) lower bounds all of the shifted works \(\langle W \rangle _\text {shift}\) (green markers), often quite tightly. The approximate bound \(\langle W \rangle _\text {min}^{\text {approx}}\) (red dashed line) rapidly converges with decreasing error to \(\langle W \rangle _\text {ref}^{t\text {-sym}}\). Time-asymmetric protocols can do better, needing only to satisfy Landauer’s bound \(\langle W \rangle _\text {min}^\text {Landauer}\) (orange dotted line)

Figure 6 compares the work spent for each of the 735 erasure protocols to the sampled maximum error \(\epsilon = \max (\epsilon _\text {L},\epsilon _\text {R})\). Each protocol corresponds to a green cross, whose vertical position corresponds to the shifted work \(\langle W\rangle _\text {shift}\), which accounts for inhomogeneities in the error rate. Note that the exact bound \(\langle W \rangle _\text {min}^{t\text {-sym}}\) from Eq. (6) reduces to a simple relationship between work and error tolerance \(\epsilon \) when the errors are homogeneous \(\epsilon _\text {R}=\epsilon _\text {L}=\epsilon \):

$$\begin{aligned} \langle W\rangle ^{t\text {-sym}}_\text {ref}= k_\text {B}T\left( \frac{1}{2} -\epsilon \right) \ln \frac{1-\epsilon }{\epsilon } ~, \end{aligned}$$

which we plot with the blue curve in Fig. 6. The cost of inhomogeneities in the error is evaluated by the difference between this reference bound and the exact work bound. This cost is added to the calculated work for each protocol to determine the shifted work:

$$\begin{aligned} \langle W\rangle _\text {shift}=\langle W \rangle +\langle W\rangle ^{t\text {-sym}}_\text {ref}-\langle W\rangle ^{t\text {-sym}}_\text {min} ~, \end{aligned}$$

such that the vertical distance between \(\langle W \rangle _\text {shift}\) and \(\langle W \rangle _\text {ref}^{t\text {-sym}}\) in Fig. 6 gives the true difference \(\langle W \rangle - \langle W \rangle _\text {min}^{t\text {-sym}}\) between the average sampled work and exact bound for the simulated protocol.

Figure 6 shows that the shifted average works (green markers, with error bars) for all simulated protocols lie above the reference work bound (blue). Thus, every simulated protocol satisfies the bound \(\langle W \rangle \ge \langle W \rangle _\text {min}^{t\text {-sym}}\). Furthermore, many protocols come quite close to their exact bound. Protocols with smaller errors incur larger average works. The error–dissipation tradeoff is clear.

The error–dissipation tradeoff is further illustrated in Fig. 6 by the red line, which describes the low-\(\epsilon \) asymptotic bound \(\langle W \rangle _\text {min}^\text {approx}\) given by Eq. (7). In this semi-log plot, it rather quickly becomes an accurate approximation for small error.

Finally, Fig. 6 plots the Landauer bound \(\langle W \rangle _\text {min}^\text {Landauer}\) as a dotted orange line. It is calculated using the final probability of the \(\text {R}\) mesostate. The bound is weaker than that set by \(\langle W \rangle _\text {ref}^{t\text {-sym}}\). As \(\epsilon \rightarrow 0\), the gap between \(\langle W \rangle _\text {ref}^{t\text {-sym}}\) and \(\langle W \rangle _\text {min}^\text {Landauer}\) in Fig. 6 relentlessly increases. The stark difference in the energy scale of the time-symmetric bounds developed here and that of the looser Landauer bound shows a marked tightening of thermodynamic bounds on computation.
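For homogeneous errors, the three bounds plotted in Fig. 6 are simple closed forms. A minimal numerical comparison (in units of \(k_\text {B}T\), with the Landauer bound evaluated for an unbiased initial bit; the function names are ours):

```python
import math

def w_ref(eps):
    """Time-symmetric reference bound: (1/2 - eps) ln((1 - eps)/eps)."""
    return (0.5 - eps) * math.log((1.0 - eps) / eps)

def w_approx(eps):
    """Low-error asymptotic bound of Eq. (7): (1/2) ln(1/eps)."""
    return 0.5 * math.log(1.0 / eps)

def w_landauer(eps):
    """Landauer bound for erasure leaving R-probability eps: ln 2 - H(eps)."""
    h = -eps * math.log(eps) - (1.0 - eps) * math.log(1.0 - eps)
    return math.log(2.0) - h

# The time-symmetric bounds diverge as eps -> 0, while Landauer's
# saturates: w_ref(0.01) is about 2.25, yet w_landauer never exceeds
# ln 2, about 0.69.
```

The widening gap between `w_ref` and `w_landauer` as \(\epsilon \rightarrow 0\) is exactly the divergence visible in Fig. 6.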

Notably, the protocol Landauer originally proposed requires significantly more work than his bound \(k_\text {B}T \ln 2\) to reliably erase a bit. This extra cost is a direct consequence of his protocol’s time symmetry. Time-asymmetric protocols for bit erasure have, in fact, been used in experiments that more nearly approach Landauer’s bound [45, 46]. It is not clear, though, to what extent time asymmetry was an intentional design constraint in their construction, since until now there was no general theoretical guidance for why time symmetry or asymmetry should matter. Figures 3 and 6 confirm that Ref. [46]’s time-asymmetric protocol for bit erasure—where the barrier is lowered before the tilt, but then raised before untilting—is capable of reliable erasure that is more thermodynamically efficient than any time-symmetric protocol could ever be.

These underdamped simulations drive home the point that our bounds are independent of the details of the dynamics used for computation. Our results are very general in that regard. As long as the system starts metastable and is then driven by a time-symmetric protocol, the error–dissipation tradeoff quantifies the minimal dissipation that will be incurred (for a desired level of computational accuracy) by the time the system relaxes again to metastability.

4 Conclusion

We adapted Ref. [12]’s thermodynamic analysis of time-symmetric protocols to give a detailed analysis of the trade-offs between accuracy and dissipation encountered in erasing information.

Reference [12] showed that time symmetry and metastability together imply a generic error–dissipation tradeoff. The minimal work expected for a computation \({\mathcal {C}}\) is the average nonreciprocity. In the low-error limit—where the probability of error must be much less than unity (\(\epsilon \ll 1\))—the minimum work diverges according to:

$$\begin{aligned} \beta \langle W \rangle _\text {min}^\text {approx} = \bigl \langle \bigl [ \!\! \bigl [{\mathcal {C}}({\mathcal {C}}({\mathcal {M}}_0) ) \ne {\mathcal {M}}_0 \bigr ] \!\! \bigr ]\bigr \rangle _{{\mathcal {M}}_0} \ln (\epsilon ^{-1} ) ~. \end{aligned}$$

Of all of this work, only the meager Landauer cost \(\Delta {\text {H}}({\mathcal {M}}_t)\), which saturates to some finite value as \(\epsilon \rightarrow 0\), can be thermodynamically recovered in principle. Thus, irretrievable dissipation scales as \(\ln (\epsilon ^{-1} )\). The reciprocity coefficient \( \bigl \langle \bigl [ \!\! \bigl [{\mathcal {C}}({\mathcal {C}}({\mathcal {M}}_0))\ne {\mathcal {M}}_0 \bigr ] \!\! \bigr ]\bigr \rangle _{{\mathcal {M}}_0}\) depends only on the deterministic computation to be approximated. This points out likely energetic inefficiencies in current instantiations of reliable computation. It also suggests that time-asymmetric control may allow more efficient computation—but only when time-asymmetry is a free resource, in contrast to modern computer architecture.
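As a concrete instance of this coefficient (our illustration, using the definitions above): bit erasure maps both memory states to \(\text {L}\), so the indicator is nonzero exactly when \({\mathcal {M}}_0 = \text {R}\), and for an unbiased initial bit:

```latex
\begin{aligned}
{\mathcal {C}}(\text {L}) = {\mathcal {C}}(\text {R}) &= \text {L}
  \quad \Rightarrow \quad {\mathcal {C}}({\mathcal {C}}({\mathcal {M}}_0)) = \text {L} ~, \\
\bigl \langle \bigl [\!\!\bigl [ {\mathcal {C}}({\mathcal {C}}({\mathcal {M}}_0)) \ne {\mathcal {M}}_0 \bigr ]\!\!\bigr ] \bigr \rangle _{{\mathcal {M}}_0}
  &= \Pr ({\mathcal {M}}_0 = \text {R}) = \tfrac{1}{2} ~, \\
\beta \langle W \rangle _\text {min}^\text {approx} &= \tfrac{1}{2} \ln (\epsilon ^{-1}) ~,
\end{aligned}
```

recovering the \(\frac{k_\text {B}T}{2} \ln \epsilon ^{-1}\) scaling used throughout Sects. 3.1 and 3.2.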

The results here verified these general conclusions for erasure, showing in detail how tight the bounds can be and, for high-reliability thermodynamic computing, how they overwhelm Landauer’s. It may be fruitful to explore the ideas behind our results in explicitly quantum, finite, and even zero-temperature systems. Refined versions of Landauer’s bound and other thermodynamic results can be obtained for such models [47, 48]. Also, explicit consideration of finite-time protocols can reveal efficiency advantages when treating ensembles of systems under majority-logic decoding [49,50,51]. Perhaps analogous refinements of the results presented here can be found as well.

Despite the almost universal focus on information erasure as a proxy for all of computing, we now see that there is a wide diversity of costs in thermodynamic computing. Looking to the future, these costs must be explored in detail if we are to design and build more capable and energy-efficient computing devices. Beyond engineering and sustainability concerns, explicating Landauer’s Stack will go a long way toward understanding the fundamental physics of computation—one of Landauer’s primary goals [52]. In this way, we now better appreciate the suite of thermodynamic costs—what we called Landauer’s Stack—that underlies modern computing.