
Thermodynamics and the structure of quantum theory


Published 19 April 2017 © 2017 IOP Publishing Ltd and Deutsche Physikalische Gesellschaft
Citation: Marius Krumm et al 2017 New J. Phys. 19 043025. DOI: 10.1088/1367-2630/aa68ef


Abstract

Despite its enormous empirical success, the formalism of quantum theory still raises fundamental questions: why is nature described in terms of complex Hilbert spaces, and what modifications of it could we reasonably expect to find in some regimes of physics? Here we address these questions by studying how compatibility with thermodynamics constrains the structure of quantum theory. We employ two postulates that any probabilistic theory with reasonable thermodynamic behaviour should arguably satisfy. In the framework of generalised probabilistic theories, we show that these postulates already imply important aspects of quantum theory, like self-duality and analogues of projective measurements, subspaces and eigenvalues. However, they may still admit a class of theories beyond quantum mechanics. Using a thought experiment by von Neumann, we show that these theories admit a consistent thermodynamic notion of entropy, and prove that the second law holds for projective measurements and mixing procedures. Furthermore, we study additional entropy-like quantities based on measurement probabilities and convex decomposition probabilities, and uncover a relation between one of these quantities and Sorkin's notion of higher-order interference.


Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Quantum mechanics has existed for about 100 years now, but despite its enormous success in experiment and application, the meaning and origin of its counterintuitive formalism is still widely considered to be difficult to grasp. Many attempts to put quantum mechanics on a more intuitive footing have been made over the decades, including the development of a variety of interpretations of quantum physics (such as the many-worlds interpretation [1], Bohmian mechanics [2], QBism [3], and many others [4]), and a thorough analysis of its departure from classical physics (as in Bell's theorem [5] or in careful definitions of notions of contextuality [6]). In more recent years, researchers, mostly coming from and inspired by the field of quantum information processing (early examples include [21, 22, 51]), have taken as a starting point the set of all probabilistic theories. Quantum theory is one of them and can be uniquely determined by specifying some of its characteristic properties [53] (as in e.g. [19, 43, 51, 54, 55, 57–61]).

While the origins of this framework date back at least to the 1960s [15, 16, 18], it was the development of quantum information theory with its emphasis on simple operational setups that led to a new wave of interest in 'generalised probabilistic theories' (GPTs) [51, 52]. This framework turned out to be very fruitful for fundamental investigations of quantum theory's information-theoretic and operational properties. For example, GPTs make it possible to contrast quantum information theory with other possible theories of information processing, and in this way to gain a deeper understanding of its characteristic properties in terms of computation or communication.

In a complementary approach, there has been a wave of attempts to find simple physical principles that single out quantum correlations from the set of all non-signalling correlations in the device-independent formalism [70]. These include non-trivial communication complexity [71], macroscopic locality [72], or information causality [73]. However, none of these principles so far turns out to yield the set of quantum correlations exactly. This led to the discovery of 'almost quantum correlations' [75] which are more general than those allowed by quantum theory, but satisfy all the aforementioned principles. Almost quantum correlations seem to appear naturally in the context of quantum gravity [77].

A relation to other fields of physics can also be drawn from information causality, which can be understood as the requirement that a notion of entropy [6669] exists which has some natural properties like the data-processing inequality [74]. These emergent connections to entropy and quantum gravity are particularly interesting since they point to an area of physics where modifications of quantum theory are well-motivated: Jacobson's results [78] and holographic duality [79] relate thermodynamics, entanglement, and (quantum) gravity, and modifying quantum theory has been discussed as a means to overcome apparent paradoxes in black-hole physics [80].

While GPTs provide a way to generalise quantum theory and to study more general correlations and physical theories, they still leave open the question as to which principles should guide us in applying the GPT formalism for this purpose. The considerations above suggest taking, as a guideline for such modifications, the principle that they support a well-behaved notion of thermodynamics. As A Einstein [32] put it,

'A theory is the more impressive the greater the simplicity of its premises, the more different kinds of things it relates, and the more extended its area of applicability. Therefore the deep impression that classical thermodynamics made upon me. It is the only physical theory of universal content which I am convinced will never be overthrown, within the framework of applicability of its basic concepts.'

Along similar lines, A Eddington [33] argued that 'The law that entropy always increases holds, I think, the supreme position among the laws of Nature. If someone points out to you that your pet theory of the Universe is in disagreement with Maxwell's equations—then so much the worse for Maxwell's equations. If it is found to be contradicted by observation—well, these experimentalists do bungle things sometimes. But if your theory is found to be against the second law of thermodynamics I can give you no hope; there is nothing for it but to collapse in deepest humiliation.'

Here we take this point of view seriously. We investigate what kinds of probabilistic theories, including but not limited to quantum theory, could peacefully coexist with thermodynamics. We present two postulates that formalise important physical properties which can be expected to hold in any such theory. On the one hand, these two postulates allow for a class of theories more general than quantum or classical theory, which thus describes potential alternative physics consistent with important parts of thermodynamics as we know it. Indeed, by considering a thought experiment originally conceived by von Neumann, we show that these theories all give rise to a unique, consistent form of thermodynamical entropy. Furthermore, we show that this entropy satisfies several other important properties, including two instances of the second law. On the other hand, we show that these postulates already imply many structural properties which are also present in quantum theory, for example self-duality and the existence of analogues of projective measurements, observables, eigenvalues and eigenspaces.

In summary, our analysis shows that important structural aspects of quantum and classical theory are already implied by these aspects of thermodynamics, but on the other hand it suggests that there is still some 'elbow room' for modification within these limits dictated by thermodynamics.

Thermodynamics in GPTs has been considered in some earlier works. In [35, 36], the authors introduced a notion of (Rényi-2-)entanglement entropy, and studied the phenomenon of thermalisation by entanglement [3739] and the black-hole information problem (in particular the Page curve [40]) in generalisations of quantum theory. Hänggi and Wehner [46] have related the uncertainty principle to the second law in the framework of GPTs. Chiribella and Scandolo ([45, 47], see also [48]) have considered the notion of diagonalization and majorization in general theories, leading to a resource-theoretic approach to thermodynamics in GPTs. There are various connections between their results and ours, but there are essential differences. In particular, they assume the purification postulate (which is arguably a strong assumption that in particular excludes classical thermodynamics), whereas we are not making any assumption on composition of systems whatsoever, and in this sense work in a more general framework. Furthermore, while Chiribella and Scandolo take a resource-theoretic approach motivated by quantum information theory, our analysis relies on a more traditional thermodynamical thought experiment (namely von Neumann's). We presented results related to some of those in the present paper in the conference proceedings [31]; here we use different assumptions and obtain additional results.

Our paper is organised as follows. We start with an overview of the framework of GPTs. Then we present von Neumann's thought experiment on thermodynamic entropy, and a modification of it due to Petz [42]. Although it relies on very mild assumptions, it already rules out all theories that admit a state space known as the gbit or squit (a square-shaped state space that can be used to describe one of the two local subsystems of a composite system known as the PR-box [83], exhibiting stronger-than-quantum correlations). Then we present our two postulates, and show that they imply many structural features of quantum theory. We show that theories that satisfy both postulates behave consistently in von Neumann's thought experiment and admit a notion of thermodynamic entropy which satisfies versions of the second law.

Because entropies are an important bridge between information theory and thermodynamics, in the final section we investigate the consequences of our postulates for generalisations of quantities of known significance in quantum thermodynamics [30], defined by applying Rényi entropies to probabilities in convex decompositions of a state, or of measurements made on a state. In particular, we show a relation between max-entropy and Sorkin's notion of higher-order interference [76]: equality of the preparation and measurement based max-entropies implies the absence of higher-order interference. Most proofs are deferred to the appendix. Several results of this paper have been announced in the Master Thesis of one of the authors [34].

2. The mathematical framework

Our results are obtained in the framework of GPTs [51, 52, 55, 85, 88]. The goal of this framework is to capture all probabilistic theories, i.e. all theories that use states to make predictions for probabilities of measurement outcomes. Although the framework is based on very weak and natural assumptions, we can only provide a short introduction of the main notions and results here. For more detailed explanations of the framework, see e.g. [34, 51, 52, 55, 86, 87]. The framework contains quantum theory and also the application of probability theory to classical physics, often referred to as classical probability theory, as special cases. It also contains theories which differ substantially from classical or quantum probability theory, for example boxworld [52], which allows superstrong nonlocality, and theories that allow higher-order interference [76].

A central notion is that of the state and the set of states, the state space ${{\rm{\Omega }}}_{A}$. A state contains all information necessary to calculate all probabilities for all outcomes of all possible measurements. One possible and convenient representation would be to simply list the probabilities of a set of 'fiducial' measurement outcomes which is sufficient to calculate all outcome probabilities for all measurements [51, 52]. An example is given in figure 1.


Figure 1. An example state space, A, modelling a so-called 'gbit' [52] which is often used to describe one half of a PR-box. The operational setup is depicted on the left, and the mathematical formulation is sketched on the right. An agent ('Alice') holds a black box ω into which she can input one bit, $a\in \{0,1\}$, and obtains one output, $x\in \{1,2\}$. The box is described by a conditional probability $p(x| a)$. In the GPT framework, ω becomes an actual state, i.e. an element of some state space Ω. Concretely, $\omega =(1,p(1| 0),p(1| 1))\in {{\mathbb{R}}}^{3}$, where the first entry 1 is used to describe the normalisation, $p(1| 0)+p(2| 0)=p(1| 1)+p(2| 1)=1$. In this case, all probabilities are allowed by definition, so that the state space Ω becomes the square, i.e. the points $(1,s,t)$ with $0\leqslant s,t\leqslant 1$. Alice's input a is interpreted as a 'choice of measurement', and the two measurements are ${e}_{x=1}^{(a=0)},{e}_{x=2}^{(a=0)}$ resp. ${e}_{x=1}^{(a=1)},{e}_{x=2}^{(a=1)}$ such that ${\sum }_{x=1}^{2}{e}_{x}^{(a)}(\omega )=1$ for all states $\omega \in {\rm{\Omega }}$. If we describe effects by vectors via the standard inner product, we have, for example, ${e}_{x=1}^{(a=0)}=(0,1,0)$, since ${e}_{x=1}^{(a=0)}(\omega )=p(1| 0)=(0,1,0)\cdot \omega $. There are four pure states, labelled ${\omega }_{1},\ldots ,{\omega }_{4}$. Every pure state ${\omega }_{i}$ is perfectly distinguishable from every other pure state ${\omega }_{j}$ for $j\ne i$, but no more than two of them are jointly distinguishable in a single measurement. More generally, every state on one side of the square is perfectly distinguishable from every state on the opposite side. The unit effect is ${u}_{A}=(1,0,0)$.


It is possible to create statistical mixtures of states: let us assume a black box device randomly prepares a state ${\omega }_{1}$ with probability p1 and a state ${\omega }_{2}$ with probability p2. In agreement with the representation of states as lists of probabilities and the law of total probability, the appropriate state to describe the resulting measurement statistics is $\omega ={p}_{1}{\omega }_{1}+{p}_{2}{\omega }_{2}$. This means that the state space ${{\rm{\Omega }}}_{A}$ is convex and is embedded into a real vector space A (to be described below). Due to the interpretation of states as lists of probabilities (which are between 0 and 1) we demand that ${{\rm{\Omega }}}_{A}$ is bounded. Any state that cannot be written as a convex combination of other states is called a pure state. As pure states cannot be interpreted as statistical mixtures of other states, they are also called states of maximal knowledge. Furthermore, there is no physical distinction between states that can be prepared exactly, and states that can be prepared to arbitrary accuracy. Thus, we also assume that ${{\rm{\Omega }}}_{A}$ is topologically closed. In order to not obscure the physics by the mathematical technicalities introduced by infinite dimensions, we will assume that A is finite-dimensional. Thus ${{\rm{\Omega }}}_{A}$ is compact. Consequently, every state can be obtained as a statistical mixture of finitely many pure states [89].

Furthermore, it turns out to be convenient to introduce unnormalised states ω, defined as the non-negative multiples of normalised states. They form a closed convex cone ${A}_{+}:= {{\mathbb{R}}}_{\geqslant 0}\cdot {{\rm{\Omega }}}_{A}$. For simplicity of description, we choose the vector space containing the cone of states to be of minimal dimension, i.e. $\mathrm{span}({A}_{+})=A$.

We introduce the normalisation functional ${u}_{A}\,:A\to {\mathbb{R}}$ which attains the value one on all normalised states, i.e. ${u}_{A}(\omega )=1$ for all $\omega \in {{\rm{\Omega }}}_{A}$. It is linear, non-negative on the whole cone, zero only for the origin, and $\omega \in {A}_{+}$ is an element of ${{\rm{\Omega }}}_{A}$ if and only if ${u}_{A}(\omega )=1$. The normalisation ${u}_{A}(\omega )$ can be interpreted as the probability of success of the preparation procedure. For states with ${u}_{A}(\omega )\lt 1$, the preparation succeeds with probability ${u}_{A}(\omega )$. The states with normalisation $\gt 1$ do not have a physical interpretation, but adding them allows us to take full advantage of the notion of cones from convex geometry.

Effects are functionals that map (sub)normalised states to probabilities, i.e. into $[0,1]$. To each measurement outcome we assign an effect that calculates the outcome probability for any state. Effects have to be linear for consistency with the statistical mixture interpretation of convex combinations of states. A measurement (with n outcomes) is a collection of effects ${e}_{1},\ldots ,{e}_{n}$ such that ${e}_{1}\,+\,\ldots \,+\,{e}_{n}={u}_{A}$. Its interpretation is that performing the measurement on some state $\omega \in {{\rm{\Omega }}}_{A}$ yields outcome i with probability ${e}_{i}(\omega )$.
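The state and effect formalism just described can be illustrated with the gbit of figure 1. The following is a minimal numerical sketch (the particular state and the variable names are our own choices, for illustration): states are vectors $(1,s,t)$ and effects act via the standard inner product.

```python
import numpy as np

# Gbit states are vectors (1, s, t) with s = p(1|0), t = p(1|1); see figure 1.
u = np.array([1.0, 0.0, 0.0])        # unit effect u_A = (1, 0, 0)
e1_0 = np.array([0.0, 1.0, 0.0])     # effect e_{x=1}^{(a=0)}
e2_0 = u - e1_0                      # e_{x=2}^{(a=0)}; the pair sums to u_A
e1_1 = np.array([0.0, 0.0, 1.0])     # effect e_{x=1}^{(a=1)}
e2_1 = u - e1_1

omega = np.array([1.0, 0.8, 0.3])    # an arbitrary normalised gbit state

# Outcome probabilities of the two binary measurements on omega:
probs_a0 = (e1_0 @ omega, e2_0 @ omega)   # measurement a = 0
probs_a1 = (e1_1 @ omega, e2_1 @ omega)   # measurement a = 1

# Each measurement is normalised on every state, since its effects sum to u_A:
assert abs(sum(probs_a0) - 1.0) < 1e-12
assert abs(sum(probs_a1) - 1.0) < 1e-12
```

The same vectors reproduce the distinguishability structure described in the caption of figure 1: the corner states $(1,0,0)$ and $(1,1,0)$, for instance, are perfectly distinguished by the measurement $({e}_{x=1}^{(a=0)},{e}_{x=2}^{(a=0)})$.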

A set of states ${\omega }_{1},\ldots ,{\omega }_{n}$ is called perfectly distinguishable if there exists a measurement ${e}_{1},\ldots ,{e}_{n}$ such that ${e}_{i}({\omega }_{j})={\delta }_{{ij}}$, that is, 1 if i = j and 0 otherwise. A collection of n perfectly distinguishable pure states is called an n-frame, and a frame is called maximal if it has the maximal number n of elements possible in the given state space. In quantum theory, for example, the maximal frames are exactly the orthonormal bases of Hilbert space. In more detail, a maximal frame on an N-dimensional quantum system is given by ${\omega }_{1}=| {\psi }_{1}\rangle \langle {\psi }_{1}| ,\ldots ,{\omega }_{N}=| {\psi }_{N}\rangle \langle {\psi }_{N}| $, where $| {\psi }_{1}\rangle ,\ldots ,| {\psi }_{N}\rangle $ are orthonormal basis vectors.
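The quantum case can be checked directly: for a frame built from an orthonormal basis, the basis projectors serve both as the states and as the distinguishing effects, with $e_i(\omega_j)=\mathrm{tr}(P_i\omega_j)=\delta_{ij}$. A small sketch of this check (our own toy example, with a randomly generated basis):

```python
import numpy as np

# Maximal frame on a 3-dimensional quantum system: omega_i = |psi_i><psi_i|
# for an orthonormal basis. QR of a random real matrix yields such a basis.
basis = np.linalg.qr(np.random.default_rng(0).normal(size=(3, 3)))[0]
states = [np.outer(v, v) for v in basis.T]   # rank-1 projectors omega_i

# The distinguishing measurement uses the same projectors as effects,
# with e_i(omega) = Tr(P_i omega); they sum to the identity, i.e. to u_A.
gram = np.array([[np.trace(P @ w) for w in states] for P in states])

# e_i(omega_j) = delta_ij: the frame is perfectly distinguishable.
assert np.allclose(gram, np.eye(3))
```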

Transformations are maps $T\,:A\to A$ that map states to states, i.e. $T({A}_{+})\subseteq {A}_{+}$. Like effects, they have to be linear in order to preserve statistical mixtures. They cannot increase the total probability, but are allowed to decrease it (as is the case, for example, for a filter); thus ${u}_{A}\circ T(\omega )\leqslant {u}_{A}(\omega )$ for all $\omega \in {A}_{+}$.

Instruments [84] are collections of transformations Tj such that ${\sum }_{j}{u}_{A}\circ {T}_{j}={u}_{A}$. If an instrument is applied to a state ω, one obtains outcome j (and post-measurement state ${T}_{j}(\omega )/{p}_{j}$) with probability ${p}_{j}:= {u}_{A}({T}_{j}(\omega ))$. Each instrument corresponds to a measurement given by the effects ${u}_{A}\circ {T}_{j}$. We will say it 'induces' this measurement.
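Quantum theory, viewed as a GPT, provides a concrete instrument: the projective (Lüders) instrument $T_j(\rho)=P_j\rho P_j$. The sketch below (our own illustration, on the sample state $|+\rangle\langle+|$) checks the defining normalisation ${\sum }_{j}{u}_{A}\circ {T}_{j}={u}_{A}$ and computes the post-measurement states:

```python
import numpy as np

# Lüders instrument on a qubit: T_j(rho) = P_j rho P_j for projectors P_j.
P = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
rho = 0.5 * np.ones((2, 2))                      # the pure state |+><+|

branches = [Pj @ rho @ Pj for Pj in P]           # unnormalised outputs T_j(rho)
probs = [np.trace(b) for b in branches]          # p_j = u_A(T_j(rho)) = Tr T_j(rho)
post = [b / q for b, q in zip(branches, probs)]  # post-measurement states T_j(rho)/p_j

# The induced measurement u_A ∘ T_j is normalised: probabilities sum to 1.
assert abs(sum(probs) - 1.0) < 1e-12
```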

The framework of GPTs does not assume a priori that all mathematically well-defined states, transformations and measurements can actually be physically implemented. Here, we will assume that a measurement constructed from physically allowed effects is also physically allowed. Moreover, we assume that the set of allowed effects has the same dimension as ${A}_{+}$, because otherwise there would be distinct states that could not be distinguished by any measurement.

3. von Neumann's thought experiment

The following thought experiment was used by von Neumann [41] to find a notion of thermodynamic entropy for quantum states ρ. The result turns out to equal the von Neumann entropy, $H(\rho )=-\mathrm{tr}(\rho \mathrm{log}\rho )$. We apply the thought experiment to a wider class of probabilistic theories.

We adopt the physical picture used by von Neumann [41] to describe the thought experiment; we will comment on some idealisations used in this model at the end of this section. We consider a GPT ensemble $[{S}_{1},\ldots ,{S}_{N}]$, where Si denotes the ith physical system, and Nj of the systems are in state ${\omega }_{j}$, where j = 1,...,n and ${\sum }_{j}{N}_{j}=N$. This ensemble is described by the state $\omega ={\sum }_{j=1}^{n}{p}_{j}{\omega }_{j}$, where ${p}_{j}={N}_{j}/N$, which is the effective state of a system drawn uniformly at random from the ensemble.

We introduce N small, indistinguishable, hollow boxes, and we put each ensemble system Si into one of the boxes such that the system is completely isolated from the outside. Furthermore, we assume that the boxes form an ideal gas, which will allow us to use the ideal gas laws in the following derivation. This gas will be called the ω-gas. We will denote the total thermodynamic entropy of a system by H, with a subscript which may indicate whether it is the total entropy of a gas, which potentially depends both on the states of the GPT systems in the boxes and on the classical degrees of freedom (positions, momenta) of the boxes, or just the entropy of the GPT degrees of freedom or of the classical degrees of freedom.

First we need to investigate how the entropies of the gas and of the ensemble are related to each other, because later on we will only consider the gas. So we consider also a second GPT ensemble $[{S}_{1}^{\prime },\ldots ,{S}_{N}^{\prime }]$ (described by $\omega ^{\prime} \in {{\rm{\Omega }}}_{A}$), embedded into a gas in the same way. At temperature T = 0, the movement of the boxes freezes out and we are left with the GPT ensembles. In this case, the thermodynamic entropies of the gases and the GPT ensembles must satisfy ${H}_{\omega \mbox{-} \mathrm{gas}}-{H}_{\omega ^{\prime} \mbox{-} \mathrm{gas}}={H}_{\omega \mbox{-} \mathrm{ensemble}}-{H}_{\omega ^{\prime} \mbox{-} \mathrm{ensemble}}$. Recall that the heat capacity is $C=\delta Q/{\rm{d}}T$, and as the gases only differ in their internal systems, which are isolated, C is the same for both gases. With ${\rm{d}}H=\delta Q/T$ we thus find that ${H}_{\omega \mbox{-} \mathrm{gas}}-{H}_{\omega ^{\prime} \mbox{-} \mathrm{gas}}$ is constant in T, i.e. ${H}_{\omega \mbox{-} \mathrm{gas}}-{H}_{\omega ^{\prime} \mbox{-} \mathrm{gas}}={H}_{\omega \mbox{-} \mathrm{ensemble}}-{H}_{\omega ^{\prime} \mbox{-} \mathrm{ensemble}}$ for all temperatures.

The central tool for the thought experiment is a semi-permeable membrane. Whenever a box reaches the membrane, the membrane opens that box and measures the internal system. Depending on the result, a window is opened to let the box pass, or the window remains closed. It is crucial to note that this membrane will not cause problems in the style of Maxwell's demon, as was already discussed by von Neumann himself, because the membrane does not distinguish between its two sides.

Now we begin with the experiment itself; see figure 2. We consider a state $\omega ={\sum }_{j=1}^{n}{p}_{j}{\omega }_{j}$ where the ${\omega }_{j}$ are perfectly distinguishable pure states, and ${p}_{j}={N}_{j}/N$, where Nj boxes contain a system in the state ${\omega }_{j}$. We assume that the ω-gas is confined in a container of volume V. Let there be a second container which is identical to the first one, but empty. The containers are merged, and the wall separating them is replaced by a semi-permeable membrane which lets only ${\omega }_{1}$ pass. At the opposite wall of the non-empty container we insert a semi-permeable membrane which only blocks ${\omega }_{1}$. The solid wall in the middle and the outer semi-permeable membrane are then moved in parallel, at constant separation, until the solid wall reaches the far end.


Figure 2. This figure shows von Neumann's thought experiment, as described in the main text. Stages (1)–(5) also feature in Petz' version.


Once this is accomplished, i.e. in stage (4) in figure 2, one container holds all the ${\omega }_{1}$-boxes and the other one contains all the rest. Note that this procedure is possible without performing any work, as can be seen via Dalton's law [90]: the work needed to push the semi-permeable membrane against the ${\omega }_{1}$-gas can be recollected at the other side from the moving solid wall, which is pushed by the ${\omega }_{1}$-gas into empty space. Thus we have separated the ${\omega }_{1}$-boxes from the rest. We repeat a similar procedure until all the ${\omega }_{j}$-gases are held in separate containers of volume V.

Next we compress the containers isothermally to the volumes ${p}_{j}V$, respectively. Denoting the pressure by P, and using the ideal gas law ${PV}={{Nk}}_{B}T$, we obtain the required work

$W=-{\sum }_{j}{N}_{j}{k}_{B}T\mathrm{log}{p}_{j}=-{{Nk}}_{B}T{\sum }_{j}{p}_{j}\mathrm{log}{p}_{j},$

where log denotes the natural logarithm. As the temperature and thus the internal energy remain constant, the heat ${{Nk}}_{B}T{\sum }_{j}{p}_{j}\mathrm{log}{p}_{j}\leqslant 0$ is absorbed by the gas, i.e. we extract the heat $-{{Nk}}_{B}T{\sum }_{j}{p}_{j}\mathrm{log}{p}_{j}$ from it.
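The compression work can be checked numerically against the ideal gas law. In the sketch below (with hypothetical parameter values, $k_B=T=V=1$), the work of the isothermal compression of each sub-gas is obtained by integrating $P\,{\rm{d}}V'$ and compared with the closed form $-{{Nk}}_{B}T{\sum }_{j}{p}_{j}\mathrm{log}{p}_{j}$:

```python
import math

kB, T, N, V = 1.0, 1.0, 1000.0, 1.0   # hypothetical parameter values
p = [0.5, 0.3, 0.2]                   # ensemble weights p_j = N_j / N

def compression_work(Nj, V_final, steps=100_000):
    """Work done on an ideal gas of Nj particles compressed from V to V_final."""
    dv = (V - V_final) / steps
    v = V - dv / 2                    # midpoint rule for the integral of P dV'
    work = 0.0
    for _ in range(steps):
        work += Nj * kB * T / v * dv  # P = Nj kB T / v (ideal gas law)
        v -= dv
    return work

W_numeric = sum(compression_work(pj * N, pj * V) for pj in p)
W_formula = -N * kB * T * sum(pj * math.log(pj) for pj in p)
assert abs(W_numeric - W_formula) / W_formula < 1e-6
```

The agreement reflects the per-gas identity $\int_{p_jV}^{V}N_jk_BT/v\,{\rm{d}}v=-N_jk_BT\log p_j$.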

At this point, we have achieved that every container contains a pure state ${\omega }_{j}$. We now transform every ${\omega }_{j}$ to another pure state $\omega ^{\prime} $ which we choose to be the same for all containers. This is achieved by opening the boxes and applying a reversible transformation Tj in every container j which satisfies ${T}_{j}{\omega }_{j}=\omega ^{\prime} $. These transformations exist due to postulate 1. Since the same transformation Tj is applied to all small boxes in any given container j (without conditioning on the content of the small box), this operation is thermodynamically reversible.

Now we merge the containers, ending with a pure $\omega ^{\prime} $-gas in the same condition as the initial ω-gas. This merging is reversible, because the density is not changed and because all states are the same, so one can just put in the walls again. The only step that caused an entropy difference was the isothermal compression. Thus, the difference of the entropies, ${H}_{\omega ^{\prime} \mbox{-} \mathrm{gas}}-{H}_{\omega \mbox{-} \mathrm{gas}}$ (which equals the difference of the entropies of the respective GPT ensembles), is ${{Nk}}_{B}{\sum }_{j}{p}_{j}\mathrm{log}{p}_{j}$. Therefore ${H}_{\omega \mbox{-} \mathrm{ensemble}}={H}_{\omega ^{\prime} \mbox{-} \mathrm{ensemble}}-{{Nk}}_{B}{\sum }_{j}{p}_{j}\mathrm{log}{p}_{j}$. If we assume that pure states have entropy zero, we thus end up with

${H}_{\omega \mbox{-} \mathrm{ensemble}}=-{{Nk}}_{B}{\sum }_{j}{p}_{j}\mathrm{log}{p}_{j},$   (1)

and with the following entropy per system of the ensemble:

$H(\omega )=-{k}_{B}{\sum }_{j}{p}_{j}\mathrm{log}{p}_{j}.$   (2)

In summary, we have made the following assumptions to arrive at this notion of thermodynamic entropy:

Assumptions 1. 

  • (a)  
    Every (mixed) state can be prepared as an ensemble/statistical mixture of perfectly distinguishable pure states.
  • (b)  
    A measurement that perfectly distinguishes those pure states can be implemented as a semi-permeable membrane, which in particular does not disturb the pure states that it distinguishes.
  • (c)  
    All pure states can be reversibly transformed into each other.
  • (d)  
    Thermodynamical entropy H is continuous in the state. (Since ensembles must have rational coefficients ${p}_{j}={N}_{j}/N$, we need this to approximate arbitrary states in the thought experiment.)
  • (e)  
    All pure states have entropy zero.

A generalised version of the thought experiment presented by Petz [42] is applicable to more general decompositions: suppose that ${\omega }_{1},\ldots ,{\omega }_{n}\in {{\rm{\Omega }}}_{A}$ are perfectly distinguishable, but not necessarily pure. Let ${p}_{1},\ldots ,{p}_{n}$ be a probability distribution. Then Petz' thought experiment implies that

$H\left({\sum }_{j}{p}_{j}{\omega }_{j}\right)={\sum }_{j}{p}_{j}H({\omega }_{j})-{k}_{B}{\sum }_{j}{p}_{j}\mathrm{log}{p}_{j}.$   (3)

The main idea is that steps (1)–(5) of von Neumann's thought experiment can be run even if the perfectly distinguishable states ${\omega }_{1},\ldots ,{\omega }_{n}$ are mixed and not pure (as long as the membrane will still keep them undisturbed). Then the entropy of the state in (5) can be computed by making an additional extensivity assumption: denote the GPT entropy of an ω-ensemble of N particles in a volume V by ${H}_{\omega \mbox{-} \mathrm{ensemble}}(N,V)$; then this assumption is that

${H}_{\omega \mbox{-} \mathrm{ensemble}}(\lambda N,\lambda V)=\lambda \,{H}_{\omega \mbox{-} \mathrm{ensemble}}(N,V)$

for $\lambda \geqslant 0$. Assuming in addition that the entropy of the n containers adds up, the total entropy of the configuration in step (5) is $N{\sum }_{j}{p}_{j}H({\omega }_{j})$, from which Petz obtains (3). While this approach needs this additional extensivity assumption, it does not need to postulate that all pure states can be reversibly transformed into each other (in contrast to von Neumann's version). Under the assumption that all pure states have entropy zero, it reproduces equation (2) as a special case.
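In classical probability theory, where perfectly distinguishable states are distributions with disjoint supports and H is the Shannon entropy (in units with $k_B=1$), equation (3) holds as an exact identity. A quick sketch of this sanity check (the particular distributions are our own example):

```python
import math

def shannon(dist):
    """Shannon entropy in nats, H = -sum_i q_i log q_i."""
    return -sum(q * math.log(q) for q in dist if q > 0)

# Perfectly distinguishable classical states: disjoint supports.
omega_1 = [0.7, 0.3, 0.0, 0.0]
omega_2 = [0.0, 0.0, 0.6, 0.4]
p = [0.25, 0.75]

mixture = [p[0] * a + p[1] * b for a, b in zip(omega_1, omega_2)]

# Equation (3): H(sum_j p_j omega_j) = sum_j p_j H(omega_j) - sum_j p_j log p_j
lhs = shannon(mixture)
rhs = sum(pj * shannon(w) for pj, w in zip(p, [omega_1, omega_2])) + shannon(p)
assert abs(lhs - rhs) < 1e-12
```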

We conclude this section with a few comments on the idealisations used in the thought experiments above. The use of gases in which the exact numbers of particles with each internal state are known parallels von Neumann's argument in [41]. We rarely if ever have such precise knowledge of particle numbers in real physical gases, so our argument involves a strong idealisation, but one that is common in thermodynamics and that has also been made by von Neumann.

Although fluctuations in work are significant for small particle numbers, in the thermodynamic limit of large numbers of particles there is concentration about the expected value given, in von Neumann's protocol, by the von Neumann entropy, and therefore our arguments (and von Neumann's) have the most physical relevance in this large-N situation. This is of course true for classical thermodynamics as well—indeed, the uses made of the ideal gas law and Dalton's law in von Neumann's argument are additional places where large N is needed if one wants fluctuations to be negligible. We expect finer-grained considerations to be required for a thorough study of fluctuations in finite systems, which is one reason for interest in the additional entropic measures studied in section 5.6, but von Neumann's argument does not concern these finer-grained aspects of the thermodynamics of finite systems.

4. Why the 'gbit' is ruled out

In section 2, we have introduced the 'gbit', a system for which the state space Ω is a square. Gbits are particularly interesting because they correspond to 'one half' of a Popescu–Rohrlich box [83] which exhibits correlations that are stronger than those allowed by quantum theory [70]. One might wonder whether the thought experiments of section 3 allow us to define a notion of thermodynamic entropy for the gbit. We will now show that this is not the case, which can be seen as a thermodynamical argument for why we do not see superstrong correlations of the Popescu–Rohrlich type in our universe.

Since not all states of a gbit can be written as a mixture of perfectly distinguishable pure states, von Neumann's original thought experiment cannot be of direct use here. However, we may resort to Petz' version: every mixed state ω of a gbit can be written as a mixture of perfectly distinguishable mixed states, as illustrated in figure 3. Furthermore, the other crucial assumption on the state space is satisfied, too: for every pair of perfectly distinguishable mixed states, there is an instrument (a 'membrane') that distinguishes those states without disturbing them. We even have that all pure states can be reversibly transformed into each other (namely by a rotation of the square).


Figure 3. In an attempt to define a notion of thermodynamic entropy for the gbit, we can decompose any state into perfectly distinguishable states. This is done in two steps, as explained in the main text.


Thus, we can analyse the behaviour of a gbit state space in Petz' version of the thought experiment. Any continuous notion of thermodynamic entropy H consistent with this thought experiment would thus have to satisfy (3). However, we will now show that the gbit does not admit any notion of entropy that satisfies (3). Consider different decompositions of the state $\omega =\tfrac{1}{2}{\omega }_{a}+\tfrac{1}{2}{\omega }_{b}$ in the centre of the square, where ${\omega }_{a}=p{\omega }_{1}+(1-p){\omega }_{2}$ as well as ${\omega }_{b}=p{\omega }_{3}+(1-p){\omega }_{4}$. It is geometrically clear that every choice of $0\lt p\lt 1$ corresponds to a valid decomposition. We find (applying equation (3) to ω for the first equality, and to ${\omega }_{a}$ and ${\omega }_{b}$ for the second):

$H(\omega )=\tfrac{1}{2}H({\omega }_{a})+\tfrac{1}{2}H({\omega }_{b})+{k}_{B}\mathrm{log}2=\tfrac{p}{2}\left(H({\omega }_{1})+H({\omega }_{3})\right)+\tfrac{1-p}{2}\left(H({\omega }_{2})+H({\omega }_{4})\right)-{k}_{B}\left(p\mathrm{log}p+(1-p)\mathrm{log}(1-p)\right)+{k}_{B}\mathrm{log}2.$

This expression can never be constant in p, no matter what values of the entropies $H({\omega }_{i})$ of the four pure states we assume. Thus, the entropy $H(\omega )$ of the centre state ω is not well-defined, since it depends on the choice of decomposition.
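The p-dependence is easy to exhibit numerically. The following sketch (with $k_B=1$ and the natural assignment $H(\omega_i)=0$ for the pure states; both choices are ours, for illustration) evaluates the candidate entropy of the centre state for several decompositions:

```python
import math

def h2(p):
    """Binary Shannon entropy in nats."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def H_centre(p, H_pure=(0.0, 0.0, 0.0, 0.0)):
    """Entropy that Petz' argument assigns to the centre state of the gbit
    via the decomposition omega = (omega_a + omega_b)/2 with
    omega_a = p*omega_1 + (1-p)*omega_2, omega_b = p*omega_3 + (1-p)*omega_4."""
    H1, H2, H3, H4 = H_pure
    return (p / 2) * (H1 + H3) + ((1 - p) / 2) * (H2 + H4) + h2(p) + math.log(2)

values = [H_centre(p) for p in (0.1, 0.5, 0.9)]
# The candidate entropy varies with the decomposition parameter p,
# so it is not a function of the state alone.
assert max(values) - min(values) > 0.3
```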

In other words, the structure of the gbit state space enforces that any meaningful notion of thermodynamic entropy H will not only be a function of the state, but a function of the ensemble that represents the state. If a state ω is represented by different ensembles, then this will in general give different values of entropy.

So what goes wrong for the gbit? Clearly, all we can say with certainty is that the combination of assumptions made in von Neumann's thought experiment turns out not to yield a unique notion of entropy, while a deeper physical interpretation seems only possible under further assumptions on the interplay between the gbit and the thermodynamic operations. However, a comparison with quantum theory motivates at least one further speculative attempt at interpretation. In the example above, we have decomposed a state ω into two perfectly distinguishable states ${\omega }_{a}$ and ${\omega }_{b}$, which can themselves be decomposed into pairs of perfectly distinguishable states ${\omega }_{1}$ and ${\omega }_{2}$, or ${\omega }_{3}$ and ${\omega }_{4}$ respectively. In quantum theory, this would only be possible if ${\omega }_{a}$ and ${\omega }_{b}$ are orthogonal, which would then imply that all four states ${\omega }_{1},\ldots ,{\omega }_{4}$ are pairwise orthogonal. This would enforce that there exists a unique projective measurement (a 'membrane') that distinguishes all these four states jointly. This membrane could feature in von Neumann's thought experiment (or other similar thermodynamical settings), yielding a unique notion of thermodynamic entropy.

On the other hand, in the gbit, the four pure states ${\omega }_{1},\ldots ,{\omega }_{4}$ are not jointly perfectly distinguishable. Hence there is no canonical choice of 'membrane' that could be used in the thought experiment to define a unique natural notion of entropy for the gbit states. Entropy will be 'contextual', depending on the choice of membrane (equivalently, of ensemble decomposition) that is used in any given thermodynamical setting. Therefore, the implication 'pairwise distinguishability $\Rightarrow $ joint distinguishability', which is true for quantum theory, has thermodynamic relevance. This implication, if suitably interpreted, leads to the 'exclusivity principle' [7, 8, 91], namely that the sum of the probabilities of pairwise exclusive propositions cannot exceed 1 (in this case these propositions correspond to the outcomes of the jointly distinguishing measurement). This suggests that the exclusivity principle, which has so far been considered only in the realm of contextuality, may be thermodynamically relevant. This observation is also closely related to the notion of 'dimension mismatch' described in [82], and to orthomodularity in quantum logic (see for example [23]).

5. A class of theories with consistent thermodynamic behaviour

5.1. The two postulates

In this section we introduce the two postulates that express key operational concepts from thermodynamics. The first postulate is motivated by the universality of thermodynamics and the distinction between microscopic and macroscopic behaviour. We first consider the universality of thermodynamics: thermodynamics is a very general theory whose basic principles apply to many possible implementations, as already noted by N Carnot [44]:

'In order to consider in the most general way the principle of the production of motion by heat, it must be considered independently of any mechanism or any particular agent. It is necessary to establish principles applicable not only to steam engines but to all imaginable heat-engines, whatever the working substance and whatever the method by which it is operated.'

Recalling von Neumann's thought experiment in the case of quantum theory, we can think of thermodynamical protocols (which will ultimately also include heat engines) as acting on a given ensemble, defined as a probabilistic mixture of pure states chosen from a fixed basis. If we interpret ensembles with different choices of basis as different 'working substances', then Carnot's principle should apply: protocols that can be implemented on one ensemble (say, ensemble 1) can also be implemented on the other (say, ensemble 2). In quantum theory, this universality is ensured by the existence of unitary transformations: all orthonormal bases can be translated into each other by a unitary and therefore reversible map. In this sense, the state of ensemble 1 can in principle be transferred to ensemble 2, then the thermodynamic protocol of ensemble 2 can be performed (if we have also transformed the projectors describing the membranes accordingly), and then one can transform back. Even if this cannot always be achieved in practice, the corresponding unitary symmetry of the quantum state space (considered as passive transformations between different descriptions) enforces the aforementioned universality.

This universality of implementation, as well as independence of the choice of labels and descriptions, should continue to hold in all generalised theories that we consider. An orthonormal basis from quantum theory is nothing else than a set of perfectly distinguishable pure states, i.e. an n-frame. Therefore, in our generalised theories, we expect that this universality of implementation is achieved by the existence of reversible transformations that, in analogy to unitary maps, transform any given n-frame into any other:

Postulate 1. For each $n\in {\mathbb{N}}$, all sets of n perfectly distinguishable pure states are equivalent. That is, if $\{{\omega }_{1},\ldots ,{\omega }_{n}\}$ and $\{{\varphi }_{1},\ldots ,{\varphi }_{n}\}$ are two such sets, then there exists a reversible transformation T with $T{\omega }_{j}={\varphi }_{j}$ for all j.

Furthermore, postulate 1 expresses a physical property that is crucial for thermodynamics: that of microscopic reversibility. Many characteristic properties of thermodynamics arise from limited experimental access to the microscopic degrees of freedom, which by themselves undergo reversible time evolution. This reversibility, for example, forbids evolving two microstates into one, which is at the heart of the non-decrease of entropy. If the experimenter had full access to the microscopic degrees of freedom, then he or she could convert any state of maximal knowledge to any other one as long as he or she preserved distinguishability. Postulate 1 formalises this microscopic basis of thermodynamics by demanding the existence of 'enough' distinguishability-preserving, microscopic transformations T, which can be understood as reversible time evolutions.

Postulate 1 has substantial information-theoretical justifications and consequences. The basic concepts of both thermodynamics and information processing are independent of the choice of implementation. For information processing this is formalised by the Turing machine which admits a multitude of physical realisations. Perfectly distinguishable pure states can be taken as bits, and postulate 1 expresses that all bits (or their higher-dimensional analogues) are equivalent. It is for this reason that postulate 1 was called generalised bit symmetry in [34], and its restriction to pairs of distinguishable states was called bit symmetry in [64]. Starting with Landauer's principle, 'thermodynamics of computation' [92] has become a fruitful paradigm that relates the two apparently disjoint fields. The two complementary interpretations of postulate 1 are one instance of this.

Now we turn to our second postulate. We are looking for theories very similar to the thermodynamics we are used to; thus it is essential that we can adopt basic notions of standard thermodynamics unchanged or with only very small alterations. Two such notions of great importance are (Shannon) entropy $S=-{k}_{B}{\sum }_{j}{p}_{j}\mathrm{log}{p}_{j}$ and majorization theory. In classical and quantum thermodynamics, these notions operate on the coefficients in a decomposition of a state into perfectly distinguishable pure states (in quantum theory, the eigenvalues). In order not to change thermodynamic theory too much, we would like this to be possible in our more general state spaces as well. Thus, we demand that every state has a convex decomposition into perfectly distinguishable pure states.

Note that this was indeed one of our assumptions in von Neumann's thought experiment in section 3. There, it allowed us to realise any state ω as a 'quasiclassical ensemble', i.e. as an ensemble of states that behave like classical labels. This gives us a further justification of our second postulate: thermodynamic (thought) experiments require that states have an ensemble interpretation. An unambiguous notion of 'counting of microstates' demands that the ensembles consist of perfectly distinguishable, pure states. Without this, obtaining a phenomenological thermodynamics for which the theory is the underlying microscopic theory seems problematic. Thus, our second postulate is

Postulate 2. Every state $\omega \in {{\rm{\Omega }}}_{A}$ has a convex decomposition $\omega ={\sum }_{j}{p}_{j}{\omega }_{j}$ into perfectly distinguishable pure states ${\omega }_{j}$.

It is tempting to interpret the two postulates as reflecting the microscopic and the macroscopic aspects of thermodynamics, respectively: while postulate 1 describes microscopic reversibility of the pure states that may describe single particles in thermodynamics, postulate 2 ensures that mixed states can be interpreted macroscopically as descriptions of quasiclassical ensembles, composed of a large number of particles that are separately in unknown but distinguishable microstates.

We will not introduce any further postulates. In particular, we will not make any assumptions about the composition of systems. All our results are therefore independent of notions like tomographic locality [51] (which is arguably dispensable in many important situations [81]) or purification [56] (which is a rather strong assumption); we do not assume either of the two.

5.2. Some consequences of postulates 1 and 2

Postulates 1 and 2 have been analysed in [43], but in a different context: instead of investigating thermodynamics, the goal in [43] was to obtain a reconstruction of quantum theory, by supplementing postulates 1 and 2 with further postulates. Some of the insights from [43] will be important here, and are therefore briefly discussed below. Starting with section 5.4, we will also obtain new results that are interesting in a thermodynamic context.

In contrast to Hilbert space, there is no a priori notion of inner product for GPTs. However, as shown in [64], we get a natural inner product $\langle \cdot ,\,\cdot \rangle $ as a consequence of postulates 1 and 2: it satisfies $\langle T\omega ,T\varphi \rangle =\langle \omega ,\varphi \rangle $ for all reversible transformations T, and $0\leqslant \langle \omega ,\varphi \rangle \leqslant 1$ for all states $\omega ,\varphi \in {{\rm{\Omega }}}_{A}$. Furthermore, $\langle \omega ,\omega \rangle =1$ for all pure $\omega \in {{\rm{\Omega }}}_{A}$ and $\langle \varphi ,\varphi \rangle \lt 1$ for all mixed $\varphi \in {{\rm{\Omega }}}_{A}$, and $\langle \omega ,\varphi \rangle =0$ if $\omega ,\varphi \in {{\rm{\Omega }}}_{A}$ are perfectly distinguishable. Thus, all perfectly distinguishable states are orthogonal, as in quantum theory.

Moreover, the cone of unnormalized states becomes self-dual with this choice of inner product. In particular, every effect e can be taken as a vector in ${A}_{+}$, such that $e(\omega )=\langle e,\omega \rangle $. In standard quantum theory, this is the Hilbert-Schmidt inner product on the real vector space of Hermitian matrices: $\langle X,Y\rangle =\mathrm{tr}({XY})$ for $X={X}^{\dagger }$, $Y={Y}^{\dagger }$.
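In the quantum case, the listed properties of this inner product are easy to verify directly. A minimal numerical sketch (Python with numpy; the helper name hs is ours):

```python
import numpy as np

def hs(X, Y):
    """Hilbert-Schmidt inner product <X, Y> = tr(XY) on Hermitian matrices."""
    return float(np.trace(X @ Y).real)

pure = np.diag([1.0, 0.0])    # a pure qubit state
mixed = np.eye(2) / 2         # the maximally mixed qubit state
other = np.diag([0.0, 1.0])   # perfectly distinguishable from `pure`

print(hs(pure, pure))    # 1.0  (pure states have norm 1)
print(hs(mixed, mixed))  # 0.5  (< 1 for mixed states)
print(hs(pure, other))   # 0.0  (perfectly distinguishable states are orthogonal)
```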

Quantum theory has more structure: the convex set of density matrices ${{\rm{\Omega }}}_{A}$ has faces, and these faces are in one-to-one correspondence with subspaces of Hilbert space (namely, a face F contains all density matrices that have support on the corresponding Hilbert subspace). To every face F, we can associate a number $| F| $, the dimension of the corresponding Hilbert subspace, and $F\subsetneq G$ implies $| F| \lt | G| $. Every face F is generated by $| F| $ pure and perfectly distinguishable states in F (an $| F| $-frame in F), and every (smaller) frame that is a subset of F can be completed, or extended, to a frame which has $| F| $ elements and thus generates F.

In all theories that satisfy postulates 1 and 2, all these properties hold in complete analogy [43]. However, since faces do not any more correspond to Hilbert spaces, the numbers $| F| $ do not have an interpretation as the dimension of a subspace. Instead, we call $| F| $ the rank of F. If von Neumann's thought experiment is supposed to make sense for these theories, we need a way to formalise the working of a semipermeable membrane, which in quantum theory is done via projective measurements.

Since we are dealing with unnormalized states, the corresponding analogue in GPTs will be formulated in terms of the set of unnormalized states ${A}_{+}$. As one can see in the case of the gbit, it is not automatic that we have any notion of 'projective measurements' for any given state space. However, postulates 1 and 2 turn out to ensure that projective measurements exist. For any face F of ${A}_{+}$ (the non-negative multiples of the corresponding face of ${{\rm{\Omega }}}_{A}$), consider the orthogonal projector ${P}_{F}$ onto the span of F. One can show that ${P}_{F}$ is positive, i.e. maps (unnormalized) states to (unnormalized) states [43]. Moreover, ${P}_{F}$ does not disturb the states in the face F.

Thus, to a given set of mutually orthogonal faces ${F}_{1},\ldots ,{F}_{m}$ such that $| {F}_{1}| \,+\,\ldots \,+\,| {F}_{m}| ={N}_{A}$, we can associate an instrument with transformations ${T}_{i}:= {P}_{{F}_{i}}$, which describes a projective measurement, as in a semipermeable membrane. The transformation ${T}_{i}$ leaves the states in the face ${F}_{i}$ unperturbed, but fully blocks out states in the other faces, i.e. ${T}_{i}\omega =0$ for $\omega \in {F}_{j}$, $i\ne j$. In standard quantum theory, these transformations are ${P}_{{F}_{i}}\rho ={\pi }_{i}\rho {\pi }_{i}$, where ${\pi }_{i}$ is the orthogonal Hilbert space projector onto the ith Hilbert subspace. The rank condition becomes $\mathrm{tr}({\pi }_{1})+\,\ldots \,+\,\mathrm{tr}({\pi }_{m})={N}_{A}$ (the total Hilbert space dimension), and mutual orthogonality is ${\pi }_{i}{\pi }_{j}={\delta }_{{ij}}{\pi }_{i}$. We will show in section 5.4 that the mutually orthogonal faces replace the eigenspaces from quantum theory and that the projective measurement described here can be interpreted as measuring an observable.
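In the quantum case, the defining properties of such an instrument can be checked directly. A small sketch (Python with numpy; the matrices and face choices are arbitrary illustrative examples):

```python
import numpy as np

# A projective instrument on C^4 with two mutually orthogonal faces of rank 2:
# tr(pi_1) + tr(pi_2) = 4 = N_A, and pi_i pi_j = delta_ij pi_i.
pi1 = np.diag([1.0, 1.0, 0.0, 0.0])
pi2 = np.diag([0.0, 0.0, 1.0, 1.0])

def T(pi, rho):
    """Transformation T_i(rho) = pi_i rho pi_i of the instrument."""
    return pi @ rho @ pi

rho_in_F1 = np.diag([0.5, 0.5, 0.0, 0.0])  # a state in the face F_1
rho_in_F2 = np.diag([0.0, 0.0, 0.3, 0.7])  # a state in the orthogonal face F_2

print(np.allclose(T(pi1, rho_in_F1), rho_in_F1))  # True: states in F_1 are undisturbed
print(np.allclose(T(pi1, rho_in_F2), 0))          # True: F_2 is fully blocked out
```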

The Hilbert space projector ${\pi }_{i}$ therefore also has an interpretation as an effect in standard quantum theory: it yields the probability of outcome i in the projective measurement on a state ρ, namely $\mathrm{tr}({\pi }_{i}\rho )$. The analogous effect in a GPT that satisfies postulates 1 and 2, corresponding to a face F, is

${u}_{F}:= {P}_{F}{u}_{A}$

(identifying the effect ${u}_{A}$ with a vector via the inner product). The effect ${u}_{F}$ is sometimes called the 'projective unit' of F. In quantum theory, we can write ${\pi }_{i}={\sum }_{j}| {\psi }_{j}\rangle \langle {\psi }_{j}| $, where the $| {\psi }_{j}\rangle $ are an orthonormal basis of the corresponding Hilbert subspace. The same turns out to be true in our GPTs: we have

${u}_{F}={\sum }_{j=1}^{| F| }{\omega }_{j}\qquad (4)$

where ${\omega }_{1},\ldots ,{\omega }_{| F| }$ is any frame that generates F. Therefore, the probability to obtain outcome i in the projective measurement above on state ω is $\langle {u}_{{F}_{i}},\omega \rangle =\langle {u}_{A},{P}_{{F}_{i}}\omega \rangle $.

5.3. State spaces satisfying postulates 1 and 2

It is easy to see that both quantum and classical state spaces satisfy postulates 1 and 2. By a 'classical state space', we mean a state space that consists of discrete probability distributions. Concretely, for any number $N\in {\mathbb{N}}$ of mutually exclusive alternatives, consider the state space

${{\rm{\Omega }}}_{A}=\left\{({p}_{1},\ldots ,{p}_{N})\,\left|\,{p}_{i}\geqslant 0,\ {\sum }_{i=1}^{N}{p}_{i}=1\right.\right\}.$

Any pure state is given by a deterministic probability vector, i.e. ${\omega }_{i}=(0,\ldots ,0,1,0,\ldots ,0)$ (where the 1 is in the ith place). If we have two equally sized sets of such vectors (as in postulate 1), then there is always a permutation that maps one set to the other. In fact, the reversible transformations correspond to the permutations of the entries. Postulate 2 is then simply the statement that every probability vector $({p}_{1},\ldots ,{p}_{N})$ is the convex combination ${\sum }_{i}{p}_{i}{\omega }_{i}$ of the deterministic distributions ${\omega }_{i}$.
Which state spaces are there, in addition to standard complex quantum theory and classical probability theory, that satisfy postulates 1 and 2? We think that this question is very difficult to answer. Thus, we formulate the following

Open problem 1. Classify all state spaces that satisfy postulates 1 and 2.

From the results in [43], we know which state spaces satisfy postulates 1 and 2 and one additional property: the absence of third-order interference. The notion of higher-order interference was introduced by Sorkin [76], and has since been the subject of intense theoretical [93, 95, 96] and experimental [97–102] interest.

For the main idea, think of three mutually exclusive alternatives in quantum theory (such as three slits in a triple-slit experiment), described by orthogonal projectors ${\pi }_{1},{\pi }_{2},{\pi }_{3}$. The event that alternative 1 or alternative 2 takes place is described by the projector ${\pi }_{12}={\pi }_{1}+{\pi }_{2};$ similarly, we have ${\pi }_{13}$, ${\pi }_{23}$ and ${\pi }_{123}$. Their actions on density matrices are described by superoperators

${P}_{12}\rho ={\pi }_{12}\rho {\pi }_{12},\qquad {P}_{i}\rho ={\pi }_{i}\rho {\pi }_{i}$

(and similarly for the other projectors). As a consequence, we obtain that ${P}_{12}\ne {P}_{1}+{P}_{2}$, which expresses the phenomenon of interference. However, it is easy to check that

${P}_{123}={P}_{12}+{P}_{13}+{P}_{23}-{P}_{1}-{P}_{2}-{P}_{3},\qquad (5)$

which means that interference over three alternatives can be reduced to contributions from interferences of pairs of alternatives. Similar identities hold for an arbitrary number $n\geqslant 4$ of alternatives: quantum theory admits only pairwise interference, and no 'third-order interference', which would be characterised by a violation of this equality.
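Equation (5) can be verified numerically for an arbitrary density matrix. The following sketch (Python with numpy; a three-level system stands in for the three slits) checks both the presence of second-order interference and the absence of third-order interference:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three "slits": orthogonal rank-1 projectors pi_1, pi_2, pi_3 on C^3.
pi = [np.zeros((3, 3), dtype=complex) for _ in range(3)]
for i in range(3):
    pi[i][i, i] = 1.0

def P(S, rho):
    """Superoperator P_S(rho) = pi_S rho pi_S, with pi_S the sum of the projectors in S."""
    p = sum(pi[i] for i in S)
    return p @ rho @ p

# A random density matrix (generically with nonzero off-diagonal coherences).
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = A @ A.conj().T
rho /= np.trace(rho).real

# Second-order interference: P_12 differs from P_1 + P_2.
interference = not np.allclose(P([0, 1], rho), P([0], rho) + P([1], rho))

# Sorkin's identity (5): no third-order interference in quantum theory.
lhs = P([0, 1, 2], rho)
rhs = (P([0, 1], rho) + P([0, 2], rho) + P([1, 2], rho)
       - P([0], rho) - P([1], rho) - P([2], rho))
print(interference, np.allclose(lhs, rhs))  # True True
```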

In the context of postulates 1 and 2, we have an analogous notion of orthogonal projectors, and thus we can consider (5) and its generalisation to $n\geqslant 4$ alternatives on a state space with $N\geqslant n$ perfectly distinguishable states. Postulating this 'absence of third-order interference' in addition to postulates 1 and 2 gives us the following:

Theorem 2 (Lemma 33 in [43]). The possible state spaces which satisfy postulates 1 and 2 and which do not admit third-order interference are, in addition to classical state spaces, the following. First, for $N\geqslant 4$ perfectly distinguishable states, there are only three possibilities:

  • Standard complex quantum theory.
  • Quantum theory over the real numbers. That is, only real entries are allowed in the $N\times N$ density matrices.
  • Quantum theory over the quaternions. The state spaces are the self-adjoint $N\times N$ quaternionic matrices of unit trace.

For $N=3$ perfectly distinguishable states, all of the above and one exceptional solution are possible, namely quantum theory over the octonions (but only for the case of $3\times 3$ unit trace density matrices).

For $N=2$ (the 'bit' case), we have the $d$-dimensional Bloch ball state spaces ${{\rm{\Omega }}}_{d}:= \{{(1,r)}^{T}| r\in {{\mathbb{R}}}^{d},\parallel r\parallel \leqslant 1\}$ with $d\geqslant 2$. They are analogous to the standard Bloch ball ${{\rm{\Omega }}}_{3}$ of quantum theory, with very similar descriptions of effects etc. Their group of reversible transformations may either be $\mathrm{SO}(d)$ (which corresponds to $\mathrm{PU}(2)$ for $d=3$), or some subgroup of ${\rm{O}}(d)$ which is transitive on the sphere (such as $\mathrm{SU}(2)$ for $d=4$).

Mathematically, these examples correspond to the state spaces of the finite-dimensional irreducible formally real Jordan algebras [24, 43]. We do not know whether there are theories that satisfy postulates 1 and 2 but admit higher-order interference and therefore do not appear on this list. In theorem 12, we will show that the question whether a theory has third-order interference is related to the properties of its Rényi entropies.

5.4. Observables and diagonalization

Observables, and how they can be measured, are a central part of physics. In standard quantum theory, observables can be introduced in two different ways, both of which lead to the prescription that observables are described by Hermitian operators/matrices.

First, in finite dimensions, we can characterise observables as those objects that linearly assign real expectation values to states. In the case of quantum theory it follows that observables are represented by matrices X, and Hermiticity $X={X}^{\dagger }$ implies that expectation values $\mathrm{tr}(\rho X)$ are always real. Linearity is enforced by the statistical interpretation of states, for the same reason that effects in GPTs are linear.

Second, we can introduce observables by saying that there is a projective measurement ${\pi }_{1},\ldots ,{\pi }_{n}$ that measures this observable, and which has outcomes ${x}_{1},\ldots ,{x}_{n}\in {\mathbb{R}}$. This leads to the Hermitian operator $X={\sum }_{i=1}^{n}{x}_{i}{\pi }_{i}$. Since every Hermitian operator can be diagonalized, these two definitions are equivalent.

Our two postulates provide the structure to introduce observables in a completely analogous way. First, using the inner product, we can define observables as linear maps of the form

$\omega \mapsto \langle x,\omega \rangle $

and thus identify them with elements $x\in A$ of the vector space that carries the states (as in quantum theory, where this vector space is the space of Hermitian matrices). As noticed in [62], every such vector has a representation of the form

$x={\sum }_{i}{x}_{i}{u}_{i},\qquad (6)$

where the ${u}_{i}$ are projective units corresponding to mutually orthogonal faces ${F}_{i}$, ${x}_{i}\in {\mathbb{R}}$, and ${x}_{i}\ne {x}_{j}$ for $i\ne j$. The analogy with quantum theory goes even further: due to (4), we have $x={\sum }_{i}{x}_{i}{\sum }_{j}{\omega }_{i}^{(j)}$ whenever ${\omega }_{i}^{(1)},\ldots ,{\omega }_{i}^{(| {F}_{i}| )}$ is a frame on ${F}_{i}$. This corresponds to the identity $X={\sum }_{i}{x}_{i}{\sum }_{j}| {\psi }_{i}^{(j)}\rangle \langle {\psi }_{i}^{(j)}| $ in standard quantum theory. In analogy to quantum theory, we will call the ${F}_{i}$ eigenfaces and the ${x}_{i}$ eigenvalues. To further justify this terminology, note that the ${x}_{i}$ are eigenvalues of the map ${\sum }_{i}{x}_{i}{P}_{i}$, where the ${P}_{i}$ are the orthogonal projectors onto the spans of the faces ${F}_{i}$.

Theorem 3. If postulates 1 and 2 hold, then every element $x\in A$ has a representation of the form $x={\sum }_{j=1}^{n}{x}_{j}{u}_{j}$ where ${x}_{j}\in {\mathbb{R}}$ are pairwise different and the ${u}_{j}$ are the projective units of pairwise orthogonal faces ${F}_{j}$ such that ${\sum }_{j}{u}_{j}={u}_{A}$. This decomposition $x={\sum }_{j=1}^{n}{x}_{j}{u}_{j}$ is unique up to relabelling. In analogy to quantum theory, we will call the ${x}_{j}$ eigenvalues and the ${F}_{j}$ eigenfaces.

Furthermore, for every real function $f$ with suitable domain of definition, we can define

$f(x):={\sum }_{j=1}^{n}f({x}_{j}){u}_{j}\qquad (7)$

as in spectral calculus.
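In the quantum case, equation (7) reduces to ordinary spectral calculus for Hermitian matrices. A minimal sketch (Python with numpy; the helper name spectral_apply is ours):

```python
import numpy as np

def spectral_apply(f, X):
    """Apply f to a Hermitian matrix via its spectral decomposition,
    f(X) = sum_j f(x_j) pi_j: the quantum instance of equation (7)."""
    vals, vecs = np.linalg.eigh(X)
    # Scale each eigenvector column by f(eigenvalue), then recombine.
    return (vecs * f(vals)) @ vecs.conj().T

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
X = (A + A.T) / 2  # a real symmetric "observable"

# For f(t) = t^2, spectral calculus must reproduce the matrix square.
print(np.allclose(spectral_apply(lambda t: t * t, X), X @ X))  # True
```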

If ${P}_{j}$ is the orthogonal projector onto the span of ${F}_{j}$, then $({P}_{1},\ldots ,{P}_{n})$ is a well-defined instrument with induced measurement $({u}_{1},\ldots ,{u}_{n})$ which leaves the elements of $\mathrm{span}({F}_{j})$ invariant:

${P}_{j}v=v\quad \text{for all}\ v\in \mathrm{span}({F}_{j}).$

In analogy to quantum theory, we will call this instrument the projective measurement of the observable $x$.

We will give a proof in the appendix. Equation (7) allows us to define a notion of entropy, in full analogy to quantum mechanics.

Definition 4 (Spectral entropy). If A is a state space that satisfies postulates 1 and 2, we define the spectral entropy of any state $\omega \in {{\rm{\Omega }}}_{A}$ as

$S(\omega ):=-{\sum }_{i}{p}_{i}\mathrm{log}{p}_{i},$

where $\omega ={\sum }_{i}{p}_{i}{\omega }_{i}$ is any convex decomposition of ω into pure and perfectly distinguishable states ${\omega }_{i}$, and $0\,\mathrm{log}0:= 0$.

Theorem 3 tells us that this definition is independent of the choice of decomposition: it is easy to check that

$S(\omega )=-\langle \omega ,\mathrm{log}\omega \rangle ,$

where $\mathrm{log}\omega $ is understood in the sense of spectral calculus as in (7). The right-hand side is manifestly independent of the decomposition. It can also be written $S(\omega )={u}_{A}(\eta (\omega ))$, where $\eta (x)=-x\mathrm{log}x$ for $x\gt 0$ and $\eta (0)=0$. In particular,

$S(\omega )=0\quad \text{if and only if}\ \omega \ \text{is pure}.\qquad (8)$

To see this, note that any pure state ${\omega }_{1}=\omega $ can be extended to a set of perfectly distinguishable pure states ${\omega }_{1},{\omega }_{2},\ldots ,{\omega }_{{N}_{A}}$ such that $\omega =1\cdot {\omega }_{1}+0\cdot {\omega }_{2}\,+\,\ldots \,+\,0\cdot {\omega }_{{N}_{A}}$. Conversely, if $S(\omega )=0$, then any decomposition of ω must have coefficients $(1,0,\ldots ,0)$.
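In standard quantum theory, definition 4 reduces to the von Neumann entropy of the eigenvalue distribution, and (8) is easy to check numerically (Python with numpy; the function name is ours):

```python
import numpy as np

def spectral_entropy(rho):
    """Von Neumann entropy S = -sum_i p_i log p_i of the eigenvalues:
    the quantum instance of definition 4 (natural logarithm)."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]  # drop zero eigenvalues, 0 log 0 := 0
    return float(-np.sum(p * np.log(p)))

pure = np.diag([1.0, 0.0, 0.0])   # a pure state: S = 0
mixed = np.eye(3) / 3             # maximally mixed: S = log 3
print(spectral_entropy(pure), spectral_entropy(mixed))
```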

5.5. Thermodynamics in the context of postulates 1 and 2

If a state space satisfies postulates 1 and 2, then it also satisfies all the assumptions that we have made in von Neumann's thought experiment. It is easy to check all items in assumptions 1: (a) is simply postulate 2, and (c) is a consequence of postulate 1. As we have seen in the previous section, our two postulates imply that we have orthogonal projectors sharing important properties with those of standard quantum theory. If we make the physical assumption that we can actually implement them by means of semipermeable membranes (as in quantum theory), we obtain (b). Item (e) is the same as (8). Note that assumption (d) is not a mathematical assumption about the state space, but a physical assumption about thermodynamic entropy. This shows part of the following (the full proof will be given in the appendix):

Observation 5. von Neumann's thought experiment, as explained in section 3, can be run for every state space that satisfies postulates 1 and 2. The notion of thermodynamic entropy $H$ that one obtains from that thought experiment turns out to equal spectral entropy $S$ as given in definition 4,

$H(\omega )=S(\omega ).$

This is consistent with assumptions 1. Furthermore, it is also consistent with Petz' version of the thought experiment, because spectral entropy satisfies

$S(\omega )={\sum }_{j}{p}_{j}S({\omega }_{j})-{\sum }_{j}{p}_{j}\mathrm{log}{p}_{j}\qquad (9)$

for every convex decomposition $\omega ={\sum }_{j}{p}_{j}{\omega }_{j}$ of ω into perfectly distinguishable, not necessarily pure, states ${\omega }_{j}$.

Thus, spectral entropy S gives meaningful and consistent physical predictions in situations like von Neumann's and Petz' thought experiments. However, we clearly do not know whether S is a consistent notion of physical entropy in all thermodynamical situations.

It turns out that there are further properties of S that encourage its physical interpretation as a thermodynamical entropy. In particular, we will now show that the second law holds in two important situations. We start by considering projective measurements ${P}_{1},\ldots ,{P}_{n}$. Projective measurements can model semipermeable membranes as in von Neumann's thought experiment, or they describe the measurement of an observable as explained in section 5.4. Consider the action of this measurement on a given state ω. With probability $({u}_{A}\circ {P}_{j})(\omega )$, this measurement yields the outcome j with post-measurement state ${\omega }_{j}:= {P}_{j}\omega /({u}_{A}\circ {P}_{j})(\omega )$. Performing this measurement on every particle of an ensemble (without learning the outcomes) yields a new ensemble, described by the post-measurement state

$\omega ^{\prime} ={\sum }_{j}({u}_{A}\circ {P}_{j})(\omega )\,{\omega }_{j}={\sum }_{j}{P}_{j}\omega .$

Projective measurements do not decrease the entropy of the ensemble:

Theorem 6. Suppose postulates 1 and 2 are satisfied. Let ${P}_{1},\ldots ,{P}_{n}$ be orthogonal projectors which form a valid instrument. Then the induced measurement with post-measurement ensemble state $\omega ^{\prime} ={\sum }_{j}{P}_{j}\omega $ does not decrease entropy: $S(\omega ^{\prime} )\geqslant S(\omega )$.

The proof will be given in the appendix. As in standard quantum theory, projectors ${P}_{j}$ form a valid instrument if and only if they are mutually orthogonal, i.e. ${P}_{i}{P}_{j}={\delta }_{{ij}}{P}_{i}$, and complete: ${\sum }_{i}{u}_{A}\circ {P}_{i}={u}_{A}$.
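In the quantum case, theorem 6 is the familiar statement that a pinching map does not decrease von Neumann entropy. A quick numerical check (Python with numpy; the rank-2/rank-2 measurement is an arbitrary illustrative choice):

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy of a density matrix (natural logarithm)."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log(p)))

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = A @ A.conj().T
rho /= np.trace(rho).real

# A projective measurement with two rank-2 outcomes (pi1 + pi2 = identity).
pi1 = np.diag([1.0, 1.0, 0.0, 0.0]).astype(complex)
pi2 = np.eye(4) - pi1
rho_post = pi1 @ rho @ pi1 + pi2 @ rho @ pi2  # ensemble state after measuring

print(entropy(rho_post) >= entropy(rho) - 1e-12)  # True: entropy does not decrease
```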

Another important manifestation of the second law is in mixing procedures as in figure 4. Consider tanks that are separated by walls. Similarly to von Neumann's thought experiment, let the jth tank contain an Nj -particle gas that represents an ${\omega }_{j}$-ensemble. Furthermore, assume that all the gases are at the same pressure and density. Identifying thermodynamic entropy H with spectral entropy S (as suggested by observation 5), the entropy of the GPT-ensemble in tank j is ${N}_{j}S({\omega }_{j})$, where S is the entropy per system. Thus the total GPT-entropy is ${\sum }_{j}{N}_{j}S({\omega }_{j})$. We remove the walls and let the gases mix. Then we put the walls back in. Now all the tanks contain gases hosting ${\sum }_{j}\tfrac{{N}_{j}}{N}{\omega }_{j}$ ensembles at the same conditions as before, where $N={\sum }_{j}{N}_{j}$. The total GPT-entropy in the end is given by ${\sum }_{j}{N}_{j}S\left({\sum }_{k}\tfrac{{N}_{k}}{N}{\omega }_{k}\right)={NS}\left({\sum }_{k}\tfrac{{N}_{k}}{N}{\omega }_{k}\right)$. As the gases in the tanks have the same density, volume, temperature and pressure as before, the only difference in entropy is due to the GPT-ensembles. The second law requires that the entropy does not decrease in this process, i.e. that ${\sum }_{j}{N}_{j}S({\omega }_{j})\leqslant {NS}\left({\sum }_{j}\tfrac{{N}_{j}}{N}{\omega }_{j}\right)$ and thus ${\sum }_{j}\tfrac{{N}_{j}}{N}S({\omega }_{j})\leqslant S\left({\sum }_{j}\tfrac{{N}_{j}}{N}{\omega }_{j}\right)$. The following theorem shows that our two postulates guarantee that this is true:


Figure 4. A process mixing gases by removing a separating wall. Theorem 7 ensures that this process does not decrease entropy, i.e. ${\rm{\Delta }}H\geqslant 0$, if thermodynamic entropy H is identified with spectral entropy S as suggested by von Neumann's thought experiment.


Theorem 7. Assume postulates 1 and 2. Then entropy is concave, i.e. for ${\omega }_{1},\ldots ,{\omega }_{n}\in {{\rm{\Omega }}}_{A}$ and ${p}_{1},\ldots ,{p}_{n}$ a probability distribution, we have

$S\left({\sum }_{j}{p}_{j}{\omega }_{j}\right)\geqslant {\sum }_{j}{p}_{j}S({\omega }_{j})\qquad (10)$

Thus, the second law automatically holds for mixing processes. One way to prove (10) is to observe that S equals the 'measurement entropy' (as we will show in section 5.6), which was proven to be concave in [66, 67]. However, there is a simpler proof that uses a notion of relative entropy, which is an important notion in its own right.

Definition 8. For state spaces A that satisfy postulates 1 and 2, we define the ${relative}\ {entropy}$ of two states $\omega ,\varphi \in {{\rm{\Omega }}}_{A}$ as

$S(\omega \parallel \varphi ):=\langle \omega ,\mathrm{log}\omega -\mathrm{log}\varphi \rangle .$

Here, for $\varphi ={\sum }_{j}{q}_{j}{\varphi }_{j}$ any decomposition into a maximal frame, $\mathrm{log}\varphi ={\sum }_{j}\mathrm{log}({q}_{j}){\varphi }_{j}$ according to theorem 3. (As in quantum theory, this can be infinite if there is some ${q}_{j}=0$ with $\langle \omega ,{\varphi }_{j}\rangle \ne 0$.)

A notion of relative entropy in GPTs has also been defined in Scandolo's Master Thesis [48], but under different assumptions, as discussed in the introduction. Relative entropy continues to satisfy Klein's inequality, a fact that is useful in proving theorem 7. The proof is similar to that in standard quantum theory and is deferred to the appendix.

Theorem 9 (Klein's inequality). For all $\omega ,\varphi \in {{\rm{\Omega }}}_{A}$,

$S(\omega \parallel \varphi )\geqslant 0,$

with equality if and only if $\omega =\varphi $.

Klein's inequality can be used to give a simple proof of theorem 7: for $\omega ={\sum }_{j}{p}_{j}{\omega }_{j}$, bilinearity of the inner product gives ${\sum }_{j}{p}_{j}S({\omega }_{j}\parallel \omega )=S(\omega )-{\sum }_{j}{p}_{j}S({\omega }_{j})$, and the left-hand side is non-negative by Klein's inequality.
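For standard quantum theory, Klein's inequality can be checked numerically on random states (Python with numpy; the helper names are ours):

```python
import numpy as np

def logm_h(rho):
    """Matrix logarithm of a full-rank Hermitian matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(rho)
    return (vecs * np.log(vals)) @ vecs.conj().T

def rel_entropy(rho, sigma):
    """Quantum relative entropy S(rho || sigma) = tr[rho (log rho - log sigma)]."""
    return float(np.trace(rho @ (logm_h(rho) - logm_h(sigma))).real)

rng = np.random.default_rng(3)

def rand_state(n=3):
    """A random full-rank density matrix."""
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = A @ A.conj().T
    return rho / np.trace(rho).real

rho, sigma = rand_state(), rand_state()
print(rel_entropy(rho, sigma) >= 0.0)       # True (Klein's inequality)
print(abs(rel_entropy(rho, rho)) < 1e-8)    # True (equality at rho = sigma)
```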

Given all the calculations in this subsection in terms of orthogonal projections, it may seem at first sight as if every statement or calculation in quantum theory can be analogously made in the more general state spaces that satisfy postulates 1 and 2. However, this may not quite be true, as the fact that the following is an open problem shows:

Open problem 2. For state spaces satisfying postulates 1 and 2, if ω is a pure state, and $P$ an orthogonal projection, then is $P\omega $ also (up to normalisation) a pure state?

In classical and quantum state spaces, the answer is 'yes', but we do not know if a positive answer follows from postulates 1 and 2 alone. We will return to this problem in theorem 12.

Note that Chiribella and Scandolo [45, section 7] (see also [48]) have applied similar techniques and found beautiful results, some of which are comparable to ours. They derive diagonalizability of states from a very different set of postulates.

5.6. Information-theoretic entropies and their relation to physics

So far we have considered entropy from a thermodynamic perspective. But entropies also arise in information theory, and since the GPT framework is mostly studied in quantum information theory, there have been many results on entropy from an information-theoretic perspective. Our exposition mainly follows [66], but a similar treatment in a slightly different formalism appears in [67].

Let $e=({e}_{1},\ldots ,{e}_{n})$ and $f=({f}_{1},\ldots ,{f}_{m})$ be two measurements such that there exists a map $M\,:\{1,\ldots ,n\}\to \{1,\ldots ,m\}$ with
${f}_{k}={\sum }_{j:M(j)=k}{e}_{j}\ \ \forall k\in \{1,\ldots ,m\}.$
If M is bijective, then the measurement f is simply a re-labelling of e. If there exists a k with $M(j)\ne k\ \ \forall j$, then because of the normalisation of the e-measurement, fk = 0, i.e. fk corresponds to a trivial outcome that never happens. If M is not injective, then f is a coarse-graining of e (or vice versa, e a refinement of f) in the sense that f is obtained from e by collecting several outcomes of e and giving them a common new outcome label (and by possibly adding the 0-effect a few times), see figure 5. In this sense, we do not care about which of the ej triggered the new effect.
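As an illustration of the label map M, the following sketch (not from the paper; the effect vectors and the three-outcome measurement are made up, and outcomes are indexed from 0) coarse-grains a purely classical measurement, where effects are vectors and outcome probabilities are dot products with a state vector:

```python
# Illustrative sketch: coarse-graining a finite classical measurement
# via a label map M, in the sense f_k = sum of e_j over all j with M(j) = k.

def coarse_grain(e, M, m):
    """Given effects e = [e_1, ..., e_n] (lists of floats) and a label map
    M (a list with M[j] in {0, ..., m-1}), return the coarse-grained
    measurement f; labels k not in the image of M get the 0-effect."""
    dim = len(e[0])
    f = [[0.0] * dim for _ in range(m)]
    for j, effect in enumerate(e):
        for i, x in enumerate(effect):
            f[M[j]][i] += x
    return f

# A three-outcome classical measurement on a 2-level system;
# component-wise the effects sum to (1, 1), i.e. to the unit effect:
e = [[0.5, 0.0], [0.5, 0.2], [0.0, 0.8]]
M = [0, 0, 1]                     # merge outcomes 0 and 1
f = coarse_grain(e, M, 2)
print(f)                          # [[1.0, 0.2], [0.0, 0.8]]
```

The coarse-grained f is again a normalised measurement, since the merged effects simply add up.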


Figure 5. A coarse-graining of a measurement is created by having several measurement results trigger the same output.


However, there exist trivial refinements/coarse-grainings: for those, ${e}_{j}\propto {f}_{M(j)}\ \ \forall j$. We write ${e}_{j}={p}_{j}{f}_{M(j)}$. Then such a measurement can be obtained by performing f, and if outcome k is triggered, we activate a classical random number generator which generates the final outcome j among those j with $M(j)=k$ with probability
${p}_{j},\ \ \text{where}\ {\sum }_{j:M(j)=k}{p}_{j}=1.$
Thus, a trivial refinement does not yield any additional information about the GPT-system. We call a measurement fine-grained if it does not have any non-trivial refinements. The set of fine-grained measurements on any state space A is denoted ${{ \mathcal E }}_{A}^{* }$.

Now we consider the Rényi entropies [65], which are defined for probability distributions ${\bf{p}}=({p}_{1},\ldots ,{p}_{n})$ as

${H}_{\alpha }({\bf{p}}):= \frac{1}{1-\alpha }\mathrm{log}{\sum }_{j}{p}_{j}^{\alpha },$

where $\alpha \in (0,\infty )\setminus \{1\}$. Furthermore,

${H}_{0}({\bf{p}}):= \mathrm{log}| \mathrm{supp}({\bf{p}})| ,$

where $\mathrm{supp}({\bf{p}})=\{{p}_{j}\ | \ {p}_{j}\gt 0\}$, is called the max-entropy, and

${H}_{\infty }({\bf{p}}):= -\mathrm{log}\mathop{\max }\limits_{j}{p}_{j}$

is called the min-entropy. Also,

${H}_{1}({\bf{p}}):= \mathop{\mathrm{lim}}\limits_{\alpha \to 1}{H}_{\alpha }({\bf{p}})=-{\sum }_{j}{p}_{j}\mathrm{log}{p}_{j}$

is just the regular Shannon entropy H.
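The classical Rényi family, including its limiting cases, can be computed directly. A minimal sketch in Python (the function name and the example distribution are ours, not from the paper), using base-2 logarithms:

```python
import math

def renyi_entropy(p, alpha):
    """Classical Rényi entropy H_alpha in bits, with the limiting cases
    alpha = 0 (max-entropy), alpha = 1 (Shannon), alpha = inf (min-entropy)."""
    p = [x for x in p if x > 0]                      # restrict to supp(p)
    if alpha == 0:
        return math.log2(len(p))                     # log |supp(p)|
    if alpha == 1:
        return -sum(x * math.log2(x) for x in p)     # Shannon entropy
    if alpha == math.inf:
        return -math.log2(max(p))                    # min-entropy
    return math.log2(sum(x ** alpha for x in p)) / (1 - alpha)

p = [0.5, 0.25, 0.25]
print(renyi_entropy(p, 0))         # log2(3) = 1.584962500721156
print(renyi_entropy(p, 1))         # 1.5
print(renyi_entropy(p, 2))         # -log2(0.375), approx. 1.415
print(renyi_entropy(p, math.inf))  # 1.0
```

As the example shows, ${H}_{\alpha }$ is non-increasing in α for a fixed distribution.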

For $\alpha \in [0,\infty ]$ and GPTs satisfying postulates 1 and 2, we generalise the classical Rényi entropies:
${S}_{\alpha }(\omega ):= {H}_{\alpha }({p}_{1},\ldots ,{p}_{n}),$
where $\omega ={\sum }_{j}{p}_{j}{w}_{j}$ is any decomposition into perfectly distinguishable pure states. According to theorem 3, the result is independent of the choice of decomposition. We have ${S}_{1}=S$, the spectral entropy of definition 4.

Following [66], for every $\alpha \in [0,\infty ]$ and $\omega \in {{\rm{\Omega }}}_{A}$, we define the order-α Rényi measurement entropy as
${\widehat{S}}_{\alpha }(\omega ):= \mathop{\inf }\limits_{e\in {{ \mathcal E }}_{A}^{* }}{H}_{\alpha }({e}_{1}(\omega ),\ldots ,{e}_{n}(\omega )),$
where Hα on the right-hand side denotes the classical Rényi entropy. The order-α Rényi decomposition entropy is defined as

${\check{S}}_{\alpha }(\omega ):= \mathop{\inf }\limits_{\omega ={\sum }_{j}{p}_{j}{\varphi }_{j}}{H}_{\alpha }({p}_{1},{p}_{2},\ldots ),$ (11)

where the infimum is over all convex decompositions of ω into pure states ${\varphi }_{j}\in {{\rm{\Omega }}}_{A}$.

The idea of measurement entropy is to characterise the state before a measurement. For example, in quantum theory, particles prepared in a state $| \psi \rangle $ which all give the same result in energy measurements would be said to be in an energy eigenstate. If instead we performed a position measurement, the resulting distribution of positions would have non-zero entropy. However, this entropy would arguably not come from the initial state, but from the measurement process itself due to the uncertainty principle.

Suppose we would like to prepare a state ω by using states of maximal knowledge (i.e. pure states) ${\varphi }_{j}$, and a random number generator which gives output j with probability ${p}_{j}$. Then the decomposition entropy quantifies the smallest information content (entropy) of a random number generator that would be necessary to build such a device. For more detailed operational interpretations of measurement and decomposition entropy, in particular for $\alpha =1$, see [66, 67]. Note that in quantum theory, measurement, decomposition and spectral Rényi entropies all coincide, with the $\alpha =1$ case giving von Neumann entropy, $S(\omega )=-\mathrm{tr}(\omega \mathrm{log}\omega )$.

Our first result is that the spectral and measurement definitions of the entropies agree:

Theorem 10. Consider any state space $A$ which satisfies postulates 1 and 2. Then the Rényi entropies ${S}_{\alpha }$ and the Rényi measurement entropies ${\widehat{S}}_{\alpha }$ coincide, and upper-bound the Rényi decomposition entropy ${\check{S}}_{\alpha }$, i.e.
${\check{S}}_{\alpha }(\omega )\leqslant {S}_{\alpha }(\omega )={\widehat{S}}_{\alpha }(\omega )\ \ \forall \omega \in {{\rm{\Omega }}}_{A},\ \alpha \in [0,\infty ].$
In particular, for $\alpha =1$, the measurement entropy $\widehat{S}$ is the same as the spectral entropy S from definition 4, which we have identified with thermodynamical entropy $H$ in observation 5.

The inequality ${\check{S}}_{\alpha }\leqslant {S}_{\alpha }$ is easy to see: a decomposition $\omega ={\sum }_{i}{p}_{i}{\omega }_{i}$ into perfectly distinguishable pure states ${\omega }_{i}$ is in particular a decomposition into pure states, and the corresponding effects $\langle {\omega }_{i},\cdot \rangle $ form a fine-grained measurement with outcome probabilities ${p}_{i}$. So the infimum over all decompositions is at most ${H}_{\alpha }({\bf{p}})={S}_{\alpha }(\omega )$. The equality between ${S}_{\alpha }$ and ${\widehat{S}}_{\alpha }$ is shown in the appendix.

We do not know in general whether postulates 1 and 2 imply that ${\check{S}}_{\alpha }={S}_{\alpha }$ for all α. Interestingly, we know it for $\alpha =2$ and $\alpha =\infty $:

Theorem 11. If a state space satisfies postulates 1 and 2, then ${\check{S}}_{2}(\omega )={S}_{2}(\omega )$ and ${\check{S}}_{\infty }(\omega )={S}_{\infty }(\omega )$ for all states ω.

Proof. To give the reader an idea of the kind of arguments involved, we present the proof for S2, but defer the proof for ${S}_{\infty }$ to the appendix. If $\omega ={\sum }_{j}{p}_{j}{\omega }_{j}$ is any convex decomposition into a maximal set of perfectly distinguishable pure states (without loss of generality ${p}_{1}\geqslant {p}_{2}\geqslant \ldots $), and $\omega ={\sum }_{j}{q}_{j}{\varphi }_{j}$ any (other) convex decomposition into pure states ${\varphi }_{j}$ (also with ${q}_{1}\geqslant {q}_{2}\geqslant \ldots $,) then ${\sum }_{j}{p}_{j}^{2}=\langle \omega ,\omega \rangle ={\sum }_{j}{q}_{j}^{2}+{\sum }_{j\ne k}{q}_{j}{q}_{k}\langle {\varphi }_{j},{\varphi }_{k}\rangle \geqslant {\sum }_{j}{q}_{j}^{2}$ since $\langle {\varphi }_{j},{\varphi }_{k}\rangle \geqslant 0$. Thus, we have
${S}_{2}(\omega )=-\mathrm{log}{\sum }_{j}{p}_{j}^{2}\leqslant -\mathrm{log}{\sum }_{j}{q}_{j}^{2}={H}_{2}({\bf{q}}),$
and since ${\check{S}}_{2}(\omega )$ is defined as the infimum over the right-hand side, we obtain that ${\check{S}}_{2}(\omega )\geqslant {S}_{2}(\omega );$ we find the converse inequality in theorem 10.□
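As a sanity check of this argument within standard quantum theory (a sketch under our own parametrisation, not the paper's GPT proof), one can verify for a qubit that any pure-state decomposition $\omega ={\sum }_{j}{q}_{j}{\varphi }_{j}$ satisfies ${\sum }_{j}{q}_{j}^{2}\leqslant {\sum }_{j}{p}_{j}^{2}=\langle \omega ,\omega \rangle $, where $\langle \omega ,\omega \rangle $ corresponds to the purity $\mathrm{tr}({\rho }^{2})$, computable from the Bloch vector:

```python
import math

# Qubit sketch: pure states are unit Bloch vectors u_i; a decomposition of
# rho (Bloch vector r) into pure states means sum_i q_i u_i = r. The
# spectral decomposition maximises sum of squared weights, i.e. minimises H_2.

def purity(r):
    """tr(rho^2) for a qubit with Bloch vector r: (1 + |r|^2) / 2."""
    return 0.5 * (1 + sum(x * x for x in r))

r = (0.0, 0.0, 0.5)                          # eigenvalues p = (0.75, 0.25)
spectral = 0.75 ** 2 + 0.25 ** 2             # sum_j p_j^2
assert abs(spectral - purity(r)) < 1e-12

# A non-spectral pure decomposition: two unit Bloch vectors averaging to r.
s = math.sqrt(0.75)
u1, u2 = (s, 0.0, 0.5), (-s, 0.0, 0.5)
assert abs(s * s + 0.25 - 1) < 1e-12         # u1, u2 are (numerically) pure
q = (0.5, 0.5)
mix = tuple(q[0] * a + q[1] * b for a, b in zip(u1, u2))
assert all(abs(a - b) < 1e-12 for a, b in zip(mix, r))

print(sum(x * x for x in q), "<=", spectral)   # 0.5 <= 0.625
# Hence H_2(q) = -log2(0.5) exceeds S_2 = -log2(0.625), as theorem 11 predicts.
```

The spectral weights give the largest possible sum of squares, so the infimum in ${\check{S}}_{2}$ is attained there.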

We do not know whether the same identity holds for the most interesting case $\alpha =1$, the case of standard thermodynamic entropy $S={S}_{1}$. In the max-entropy case $\alpha =0$, however, we have a surprising relation to higher-order interference:

Theorem 12. Consider a state space satisfying postulates 1 and 2. Then the following statements are all equivalent:

  • (i)  
    The state space does not have third-order interference.
  • (ii)  
    The measurement and decomposition versions of max-entropy coincide, i.e. ${\check{S}}_{0}(\omega )={S}_{0}(\omega )$ for all states ω.
  • (iii)  
    The state space is either classical, or one on the list of theorem 2.
  • (iv)  
    If ω is a pure state and ${P}_{F}$ any orthogonal projection onto any face $F$, then ${P}_{F}\omega $ is a multiple of a pure state.
  • (v)  
    The 'atomic covering property' of quantum logic holds.

The equivalences $({\rm{i}})\iff ({\rm{iii}})\iff ({\rm{iv}})\iff ({\rm{v}})$ are shown in [43]; our new result is the equivalence to (ii), which is shown in the appendix.

Absence of third-order interference is meant in the sense of equation (5), as introduced originally by Sorkin [76]: only pairs of mutually exclusive alternatives can possibly interfere. It is interesting that this is related to an information-theoretic property of max-entropy S0, as given in (ii). We do not currently know whether S0 (or, in particular, the identity of ${\check{S}}_{0}$ and S0) has any thermodynamic relevance in the class of theories that we are considering, but it certainly does within quantum theory, where it attains operational meaning in single-shot thermodynamics [28, 29].

As (iii) shows, this theorem is closely related to open problem 1: it gives properties of conceivable state spaces that satisfy postulates 1 and 2, but are not on the list of known examples (namely, they do not satisfy any of $({\rm{i}})\mbox{--}({\rm{v}})$). Similarly, (iv) shows the relation of higher-order interference to open problem 2, and (v) relates all these items to quantum logic. In fact, one can show that postulates 1 and 2 imply that the set of faces of the state space has the structure of an orthomodular lattice, which is often seen as the definition of quantum logic. For readers who are familiar with the terminology of quantum logic, we give some additional remarks in section A.3 in the appendix.

6. Conclusions

As discussed in the introduction, many works (dating back at least to the 1950s) have considered quantum theory as just one particular example of a probabilistic theory: a single point in a large space of theories that contains classical probability theory, as well as many other possibilities that are non-quantum and non-classical. More recent works have focused on the information-theoretic properties of quantum theory, for example deriving quantum theory as the unique structure that satisfies a number of information-theoretic postulates.

Rather than attempt a derivation of quantum theory from postulates, this paper has examined the thermodynamic properties of quantum theory and of those theories that are similar enough to quantum theory to admit a good definition of thermodynamic entropy, and of some version of the second law. Postulate 1 states that there is a reversible transformation between any two sets of n distinguishable pure states. This can be thought of as an expression of the universality of the representation of information, in particular that a choice of basis is arbitrary, and also allows for reversible microscopic dynamics, as is crucial for thermodynamics. Postulate 2 states that every state can be written as a convex mixture of perfectly distinguishable pure states. This ensures that a mixed state describing an ensemble of many particles can be treated as if each particle has an unknown microstate, drawn from a set of distinguishable possibilities.

Much follows from postulates 1 and 2, without needing to assume any other aspects of the standard formalism of quantum theory. In order to derive thermodynamic conclusions, we considered the argument originally employed by von Neumann in his derivation of the mathematical expression for the thermodynamic entropy of a quantum state. The argument involves a thought experiment with a gas of quantum particles in a box, and semi-permeable membranes that allow a particle to pass or not depending on the outcome of a quantum measurement. By applying the same thought experiment, we showed that given any theory satisfying postulates 1 and 2, there is a unique expression for the thermodynamic entropy, equal to both the spectral entropy and the measurement entropy. By way of contrast, a fictitious system defined by a square state space, which arises as Alice's local system of an entangled pair producing stronger-than-quantum 'PR box' correlations, does not satisfy either postulate. This system (the gbit) does not admit a sensible notion of thermodynamic entropy, at least not one that is given to it by the von Neumann or Petz arguments. While many works have discussed the inability of quantum theory to produce arbitrarily strong nonlocal correlations, this connection with thermodynamics deserves further investigation. It would be very interesting, for example, if Tsirelson's bound on the strength of quantum nonlocal correlations could be derived from a thermodynamic argument.

There are many other consequences of postulates 1 and 2 for both thermodynamic and information-theoretic entropies. For example, a form of the second law holds in that neither projective measurements nor mixing procedures can decrease the thermodynamic entropy. The spectral and measurement order-α Rényi entropies coincide for any α. The spectral and decomposition order-α Rényi entropies coincide for $\alpha =2$ or $\infty $. An open question is whether any theory satisfying postulates 1 and 2 is completely satisfactory from the thermodynamic point of view. While the von Neumann and Petz arguments can be run with no trouble in the presence of postulates 1 and 2, as we have shown, there could still be a different physical scenario in which theories would fail to exhibit sensible behaviour unless they have even more of the structure of quantum theory.

Finally, another major open question is whether quantum-like theories exist, satisfying postulates 1 and 2, that are distinct from quantum theory in that they admit higher-order interference. Roughly speaking, this means that three or more possibilities can interfere in order to produce an overall amplitude, unlike in quantum theory, where different possibilities only interfere in pairs. We extend the results of [43], where it was shown that in the context of postulates 1 and 2 the existence of higher-order interference is equivalent to each of three other statements. We provide an equivalent entropic condition: there is higher-order interference if and only if the measurement and decomposition versions of the max entropy do not coincide.

Our understanding of quantum theory would be greatly improved if higher-order interference could be ruled out by simple information-theoretic, thermodynamic, or other physical arguments. On the other hand, if theories with higher-order interference exist and are eminently sensible, an immediate question is whether an experimental test could be performed to distinguish such a theory from quantum theory. While previous experiments [97102] only tested for a zero versus non-zero value of higher-order interference, sensible higher-order theories that satisfy postulates 1 and 2 (if they exist) could help to inform future experiments by supplying concrete models that can be tested against standard quantum theory.

Acknowledgments

We would like to thank Matt Leifer for many useful discussions, and we are grateful to the participants of the 'Foundations of Physics working group' at Western University for helpful feedback. We would also like to thank Giulio Chiribella and Carlo Maria Scandolo for coordinating the arXiv posting of their work with us. This research was supported in part by Perimeter Institute for Theoretical Physics. Research at Perimeter Institute is supported by the Government of Canada through the Department of Innovation, Science and Economic Development Canada and by the Province of Ontario through the Ministry of Research, Innovation and Science. This research was undertaken, in part, thanks to funding from the Canada Research Chairs programme. This research was supported by the FQXi Large Grant 'Thermodynamic versus information theoretic entropies in probabilistic theories'. HB thanks the Riemann Center for Geometry and Physics at the Institute for Theoretical Physics, Leibniz University Hannover, for support as a visiting fellow during part of the time this paper was in preparation.

Appendix

A.1. Proofs

A.1.1. Proof that observables are well-defined

In this appendix, a decomposition of a state into perfectly distinguishable pure states (which always exists due to postulate 2) will be called a 'classical decomposition'.

Lemma 13. Assume postulates 1 and 2. Let $F\ne \{0\}$ be a face of ${A}_{+}$ and $\omega \in {{\rm{\Omega }}}_{A}\cap F$. Then there exists a classical decomposition $\omega ={\sum }_{j}{p}_{j}{\omega }_{j}$ with ${\omega }_{j}\in F$ for all j.

Proof. Let $\omega ={\sum }_{j}{p}_{j}{\omega }_{j}$ be a classical decomposition with ${p}_{j}\ne 0$. As $\omega \in F$ and F a face, ${\omega }_{j}\in F$ for all j.□

Proof of theorem 3. Let $x\in A$ be arbitrary. By lemma 5.46 from [62] there exists a frame $\{{\omega }_{j}\}$ and ${x}_{j}^{\prime }\in {\mathbb{R}}$ such that $x={\sum }_{j}{x}_{j}^{\prime }\,{\omega }_{j}$. We extend $\{{\omega }_{j}\}$ to a maximal frame by adding ${x}_{j}^{\prime }:= 0$ for the new indices j. Now we group together the j with the same ${x}_{j}^{\prime }$ value, and by relabelling we find that $x={\sum }_{k=1}^{n}{x}_{k}{\sum }_{i}{\omega }_{k;i}$ where the xk are pairwise different values of the ${x}_{j}^{\prime }$ and the ${\omega }_{k;i}$ are the ${\omega }_{j}$ that belong to this ${x}_{j}^{\prime }$ value. For any given k, the ${\omega }_{k;i}$ generate a face Fk with projective unit ${u}_{k}={\sum }_{i}{\omega }_{k;i}$.

Therefore we find a decomposition $x={\sum }_{k=1}^{n}{x}_{k}{u}_{k}$ with xk pairwise different real numbers and uk order units of faces Fk and ${\sum }_{k=1}^{n}{u}_{k}={u}_{A}$.

Now we show that the faces Fk are mutually orthogonal:

Let $\omega \in {F}_{k}$ be an arbitrary normalised state. By lemma 13 it has a classical decomposition $\omega ={\sum }_{j}{p}_{j}{\omega }_{j}^{(k)}$ which uses only pure states ${\omega }_{j}^{(k)}\in {F}_{k}$. W.l.o.g. we assume that these pure states form a generating frame of Fk, by extending the frame and adding pj = 0 to the decomposition. Consider another face Fm, i.e. $m\ne k$. Likewise to ω, let $\omega ^{\prime} \in {F}_{m}$ be an arbitrary normalised state and $\omega ^{\prime} ={\sum }_{j}{q}_{j}{\omega }_{j}^{(m)}$ be a classical decomposition with ${\omega }_{j}^{(m)}$ a generating frame for Fm. For the other faces define ${\omega }_{j}^{(i)}:= {\omega }_{i;j}$. Then ${u}_{i}={\sum }_{j}{\omega }_{j}^{(i)}$ and in total ${u}_{A}={\sum }_{i}{u}_{i}={\sum }_{i}{\sum }_{j}{\omega }_{j}^{(i)}$. As $\langle \nu ,\nu \rangle =1$ for all pure states $\nu \in {{\rm{\Omega }}}_{A}$, this implies that the ${\omega }_{j}^{(i)}$ are mutually orthogonal:
$1={u}_{A}({\omega }_{j}^{(i)})={\sum }_{g,h}\langle {\omega }_{h}^{(g)},{\omega }_{j}^{(i)}\rangle =\langle {\omega }_{j}^{(i)},{\omega }_{j}^{(i)}\rangle +{\sum }_{(g,h)\ne (i,j)}\langle {\omega }_{h}^{(g)},{\omega }_{j}^{(i)}\rangle =1+{\sum }_{(g,h)\ne (i,j)}\langle {\omega }_{h}^{(g)},{\omega }_{j}^{(i)}\rangle ,$
and therefore $\langle {\omega }_{j}^{(i)},{\omega }_{h}^{(g)}\rangle \geqslant 0$ implies $\langle {\omega }_{j}^{(i)},{\omega }_{h}^{(g)}\rangle =0$ for all $(i,j)\ne (g,h)$. Thus we find $\langle \omega ,\omega ^{\prime} \rangle ={\sum }_{j}{\sum }_{b}{p}_{j}{q}_{b}\langle {\omega }_{j}^{(k)},{\omega }_{b}^{(m)}\rangle =0$ because $m\ne k$. As $\omega \in {F}_{k}$ and $\omega ^{\prime} \in {F}_{m}$ were arbitrary (normalised) states, this implies that Fk and Fm are orthogonal. As $k\ne m$ were arbitrary, all the faces are mutually orthogonal.

Now we will show that the decomposition $x={\sum }_{j}{x}_{j}{u}_{j}$ is unique. So assume there are two decompositions $x={\sum }_{j=1}^{{n}_{a}}{a}_{j}{u}_{j}^{(a)}={\sum }_{j=1}^{{n}_{b}}{b}_{j}{u}_{j}^{(b)}$ with ${a}_{j}\in {\mathbb{R}}$ pairwise different and projective units ${u}_{j}^{(a)}$ that add up to the order unit (analogously for b) and belong to pairwise orthogonal faces ${F}_{j}^{(a)}$. W.l.o.g. we assume that the aj and bj are ordered by size, i.e. ${a}_{1}\lt {a}_{2}\lt ...\,\lt \,{a}_{{n}_{a}}$. We want to show ${a}_{1}={b}_{1}$. The ${u}_{j}^{(a)}$ generate the faces ${F}_{j}^{(a)}$. Let ${\omega }_{j;i}^{(a)}$ be a generating frame for the face ${F}_{j}^{(a)}$, especially ${\sum }_{i}{\omega }_{j;i}^{(a)}={u}_{j}^{(a)}$. As the faces are mutually orthogonal and the projective units add up to uA, the ${\omega }_{j;i}^{(a)}$ form a maximal frame; in particular they add up to uA (likewise for b). Therefore:
${a}_{1}={\sum }_{j=1}^{{n}_{a}}{a}_{j}\langle {u}_{j}^{(a)},{\omega }_{1;i}^{(a)}\rangle =\langle x,{\omega }_{1;i}^{(a)}\rangle ={\sum }_{j=1}^{{n}_{b}}{b}_{j}\langle {u}_{j}^{(b)},{\omega }_{1;i}^{(a)}\rangle \geqslant {b}_{1}{\sum }_{j=1}^{{n}_{b}}\langle {u}_{j}^{(b)},{\omega }_{1;i}^{(a)}\rangle ={b}_{1}{u}_{A}({\omega }_{1;i}^{(a)})={b}_{1}.$
Analogously one shows ${b}_{1}\geqslant {a}_{1}$, hence ${b}_{1}={a}_{1}$.

Now suppose there was a $k\gt 1$ and an i with $\langle {\omega }_{1;j}^{(a)},{\omega }_{k;i}^{(b)}\rangle \ne 0$, i.e. $\langle {\omega }_{1;j}^{(a)},{\omega }_{k;i}^{(b)}\rangle \gt 0$. Then
${a}_{1}=\langle x,{\omega }_{1;j}^{(a)}\rangle ={\sum }_{m=1}^{{n}_{b}}{b}_{m}\langle {u}_{m}^{(b)},{\omega }_{1;j}^{(a)}\rangle \geqslant {b}_{1}+({b}_{k}-{b}_{1})\langle {\omega }_{k;i}^{(b)},{\omega }_{1;j}^{(a)}\rangle \gt {b}_{1}={a}_{1}.$
This is a contradiction. Thus $\langle {\omega }_{1;j}^{(a)},{\omega }_{k;i}^{(b)}\rangle =0$ for all $k\gt 1$ and i. Therefore we find ${u}_{1}^{(b)}({\omega }_{1;i}^{(a)})={\sum }_{j}\langle {\omega }_{1;j}^{(b)}\,,$ ${\omega }_{1;i}^{(a)}\rangle ={\sum }_{j,k}\langle {\omega }_{k,j}^{(b)},\,$ ${\omega }_{1;i}^{(a)}\rangle ={u}_{A}({\omega }_{1;i}^{(a)})=1$ and analogously ${u}_{1}^{(a)}({\omega }_{1;i}^{(b)})=1$. By proposition 5.29 from [62], we have ${{\rm{\Omega }}}_{A}\cap F=\{\omega \in {{\rm{\Omega }}}_{A}\ | \ {u}_{F}(\omega )=1\}$. Therefore a generating frame of ${F}_{1}^{(a)}$ is contained in ${F}_{1}^{(b)}$ and vice versa. Thus we find ${F}_{1}^{(a)}={F}_{1}^{(b)}$ and ${u}_{1}^{(a)}={u}_{1}^{(b)}$.

For the remaining indices, we construct an inductive proof: choose $L\in {\mathbb{R}}$ large enough such that ${a}_{1}+L\gt \max \{{a}_{{n}_{a}},{b}_{{n}_{b}}\}$, and define $x^{\prime} := x+L\cdot {u}_{1}^{(a)}$, i.e. $x^{\prime} ={\sum }_{j=1}^{{n}_{a}}({a}_{j}+{\delta }_{j,1}\cdot L)\,$ ${u}_{j}^{(a)}={\sum }_{j=1}^{{n}_{b}}({b}_{j}+{\delta }_{j,1}\cdot L){u}_{j}^{(b)}$. Furthermore defining ${a}_{1}^{\prime }:= {a}_{2}$, ${a}_{2}^{\prime }:= {a}_{3}$ ,..., ${a}_{{n}_{a}}^{\prime }:= {a}_{1}+L$, ${u}_{1}^{(a^{\prime} )}:= {u}_{2}^{(a)}$, ${u}_{2}^{(a^{\prime} )}:= {u}_{3}^{(a)}$,..., ${u}_{{n}_{a}}^{(a^{\prime} )}:= {u}_{1}^{(a)}$ and likewise for ${b}_{j}^{\prime }$, we find $x^{\prime} ={\sum }_{j=1}^{{n}_{a}}{a}_{j}^{\prime }{u}_{j}^{(a^{\prime} )}={\sum }_{j=1}^{{n}_{b}}{b}_{j}^{\prime }{u}_{j}^{(b^{\prime} )}$ with ${a}_{1}^{\prime }\lt {a}_{2}^{\prime }\lt ...\,\,\lt \text{}{a}_{{n}_{a}}^{\prime }$ and ${b}_{1}^{\prime }\lt {b}_{2}^{\prime }\lt ...\,\lt \,{b}_{{n}_{b}}^{\prime }$. Repeating the exact same procedure as before, we obtain ${a}_{1}^{\prime }={b}_{1}^{\prime }$ and ${u}_{1}^{(a^{\prime} )}={u}_{1}^{(b^{\prime} )}$, i.e. ${a}_{2}={b}_{2}$ and ${u}_{2}^{(a)}={u}_{2}^{(b)}$. We iterate to find aj = bj and ${u}_{j}^{(a)}={u}_{j}^{(b)}$ for all j. Note that as all maximal frames have the same size and as the projective units add up to uA, necessarily na = nb.

At last we construct the projective measurement that corresponds to measuring the observable x: for Fk, let Pk be the orthogonal projector onto the span of Fk (in particular, ${P}_{k}\,:A\to \mathrm{span}({F}_{k})$ surjective). We know that these projectors are positive and linear and satisfy ${u}_{A}\circ {P}_{k}={u}_{k}$. Furthermore $0\leqslant {u}_{k}={u}_{A}\circ {P}_{k}\leqslant {u}_{A}$ and ${\sum }_{k}{u}_{A}\circ {P}_{k}={\sum }_{k}{u}_{k}={u}_{A}$, i.e. we obtain a well-defined measurement; therefore the Pk form a well-defined instrument. As they are projectors, the Pk leave the elements of Fk unchanged.□

A.1.2. Proof of observation 5

In order to show that $H(\omega )=S(\omega )$ is consistent with assumptions 1, we only have to show that $\omega \mapsto S(\omega )$ is continuous, to comply with assumption (d). According to theorem 10 (which we will prove below), the spectral entropy $S(\omega )$ equals measurement entropy $\widehat{S}(\omega )$. But it is well-known [67] and easy to see from its definition that $\widehat{S}$ is continuous.

It remains to show equation (9). So let $\omega ={\sum }_{j}{p}_{j}{\omega }_{j}$ be any decomposition of ω into perfectly distinguishable, not necessarily pure states ${\omega }_{j}$. Decompose each ${\omega }_{j}$ into perfectly distinguishable pure states ${\omega }_{j}^{(i)}$, i.e. ${\omega }_{j}={\sum }_{i}{q}_{j}^{(i)}{\omega }_{j}^{(i)}$. Perfectly distinguishable states live in orthogonal faces, thus $\langle {\omega }_{i},{\omega }_{j}\rangle =0$ for $i\ne j$ (note that this is a conclusion that follows from postulates 1 and 2, but could not be drawn from bit symmetry alone in [64]). Thus, we also have $\langle {\omega }_{j}^{(i)},{\omega }_{l}^{(k)}\rangle =0$ for $(i,j)\ne (k,l)$, and so $\omega ={\sum }_{{ij}}{p}_{j}{q}_{j}^{(i)}{\omega }_{j}^{(i)}$ is a decomposition of ω into perfectly distinguishable pure states. Define the real function $\eta :[0,1]\to {\mathbb{R}}$ via $\eta (x):= -x\mathrm{log}x$ for $x\gt 0$ and $\eta (0)=0$. Due to theorem 3 and $\eta ({xy})=-{xy}\mathrm{log}x-{xy}\mathrm{log}y$, we have
$S(\omega )={\sum }_{i,j}\eta ({p}_{j}{q}_{j}^{(i)})={\sum }_{j}\eta ({p}_{j}){\sum }_{i}{q}_{j}^{(i)}+{\sum }_{j}{p}_{j}{\sum }_{i}\eta ({q}_{j}^{(i)})$
and therefore
$S(\omega )={\sum }_{j}\eta ({p}_{j})+{\sum }_{j}{p}_{j}S({\omega }_{j})=H({p}_{1},\ldots ,{p}_{n})+{\sum }_{j}{p}_{j}S({\omega }_{j}).$
This completes the proof of observation 5.□
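The identity $\eta ({xy})=y\eta (x)+x\eta (y)$ used above is the classical entropy grouping (chain) rule $H(\text{joint})=H({\bf{p}})+{\sum }_{j}{p}_{j}H({{\bf{q}}}^{(j)})$ in disguise. A quick numerical check of that rule (a sketch; the distributions are made up for illustration):

```python
import math

def eta(x):
    """eta(x) = -x log2(x), with eta(0) = 0."""
    return 0.0 if x == 0 else -x * math.log2(x)

def shannon(p):
    return sum(eta(x) for x in p)

# Outer distribution p and conditional distributions q[j]:
p = [0.5, 0.5]
q = [[0.9, 0.1], [0.25, 0.25, 0.5]]

# Joint distribution with weights p_j * q_j^(i), as in the decomposition above:
joint = [p[j] * qji for j in range(len(p)) for qji in q[j]]
lhs = shannon(joint)
rhs = shannon(p) + sum(p[j] * shannon(q[j]) for j in range(len(p)))
print(abs(lhs - rhs) < 1e-12)   # True: H(joint) = H(p) + sum_j p_j H(q_j)
```

This is exactly the bookkeeping performed term by term with η in the proof.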

A.1.3. Proof of the second half of theorem 11

Use the notation of the first half of the proof. We claim that ${\max }_{\varphi \in {\rm{\Omega }}}\langle \omega ,\varphi \rangle ={p}_{1}$. The inequality '$\geqslant $' is trivial (consider the special case $\varphi ={\omega }_{1}$). To see the inequality '$\leqslant $', note that $\langle \omega ,\varphi \rangle ={\sum }_{j}{p}_{j}{\lambda }_{j}$, where ${\lambda }_{j}:= \langle {\omega }_{j},\varphi \rangle \in [0,1]$ satisfies ${\sum }_{j}{\lambda }_{j}=\langle {\sum }_{j}{\omega }_{j},\varphi \rangle =\langle u,\varphi \rangle =1$, and so $\langle \omega ,\varphi \rangle \leqslant {p}_{1}$ for all φ. Thus
${S}_{\infty }(\omega )=-\mathrm{log}{p}_{1}=-\mathrm{log}\mathop{\max }\limits_{\varphi \in {{\rm{\Omega }}}_{A}}\langle \omega ,\varphi \rangle .$
Similarly as in the first part of the proof, we obtain ${\check{S}}_{\infty }(\omega )\geqslant {S}_{\infty }(\omega )$. The converse inequality from theorem 10 for $\alpha =\infty $ concludes the proof. □

A.1.4. Proof of Klein's inequality and the second law for projective measurements

We consider an ensemble of systems described by an arbitrary state $\omega \in {{\rm{\Omega }}}_{A}$. To all systems of this ensemble we apply a projective measurement described by orthogonal projectors Pa which form an instrument, resulting in a new ensemble state $\omega ^{\prime} $. The Pa project onto the linear span of faces Fa that replace the eigenspaces from quantum theory. We want to show that the measurement cannot decrease the entropy of the ensemble, i.e.
$S(\omega ^{\prime} )\geqslant S(\omega ),\ \ \ \text{where}\ \omega ^{\prime} ={\sum }_{a}{P}_{a}\omega .$
We decompose the proof into several steps. Our basic idea follows the proof of a similar statement for quantum theory in [50]: we reduce the proof of the second law to Klein's inequality. But as we do not have access to an underlying pure state Hilbert space, we will need to use a different argument for why Klein's inequality implies the second law for projective measurements.

So at first we prove Klein's inequality, adapting the proof of [50]. We note that a similar proof has also been found by Scandolo [48], albeit under different assumptions.

Proof of theorem 9. We consider two arbitrary states $\omega ,\nu $ with classical decompositions $\omega ={\sum }_{j}{p}_{j}{\omega }_{j}$, $\nu ={\sum }_{k}{q}_{k}{\nu }_{k}$, where w.l.o.g. the ${\omega }_{j}$ and the ${\nu }_{k}$ form maximal frames. We define the matrix ${P}_{{jk}}:= \langle {\omega }_{j},{\nu }_{k}\rangle $. All its components are non-negative, i.e. ${P}_{{jk}}\geqslant 0$, because the scalar product itself is non-negative for all states. As all maximal frames have the same size, the matrix is a square matrix; as maximal frames sum to uA, the rows and columns sum to one: ${\sum }_{j}{P}_{{jk}}={\sum }_{k}{P}_{{jk}}=1$. Thus, we get
$\langle \omega ,\mathrm{log}\nu \rangle ={\sum }_{j,k}{p}_{j}\mathrm{log}({q}_{k})\langle {\omega }_{j},{\nu }_{k}\rangle ={\sum }_{j,k}{p}_{j}{P}_{{jk}}\mathrm{log}{q}_{k}.$
We define ${r}_{j}:= {\sum }_{k}{P}_{{jk}}{q}_{k}$. Note that the rj form a probability distribution: ${r}_{j}\geqslant 0$ and ${\sum }_{j}{r}_{j}={\sum }_{k}{\sum }_{j}{P}_{{jk}}{q}_{k}={\sum }_{k}{q}_{k}=1$. Using the strict concavity of the logarithm, we find:
${\sum }_{k}{P}_{{jk}}\mathrm{log}{q}_{k}\leqslant \mathrm{log}{\sum }_{k}{P}_{{jk}}{q}_{k}=\mathrm{log}{r}_{j}.$
Therefore we get
$S(\omega \parallel \nu )=-S(\omega )-\langle \omega ,\mathrm{log}\nu \rangle \geqslant {\sum }_{j}{p}_{j}\mathrm{log}{p}_{j}-{\sum }_{j}{p}_{j}\mathrm{log}{r}_{j}={\sum }_{j}{p}_{j}\mathrm{log}\frac{{p}_{j}}{{r}_{j}}.$
We recognise the last expression as the classical relative entropy of the probability distributions pj and rj. This classical relative entropy has the important property that it is never negative. Thus:
$S(\omega \parallel \nu )\geqslant 0.$□
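The core of this argument is classical: ${P}_{{jk}}$ is doubly stochastic, ${\bf{r}}=P{\bf{q}}$ is again a probability distribution, and the classical relative entropy ${\sum }_{j}{p}_{j}\mathrm{log}({p}_{j}/{r}_{j})$ is non-negative. A small numerical sketch (the matrix and distributions here are made up for illustration):

```python
import math

def kl(p, r):
    """Classical relative entropy sum_j p_j log2(p_j / r_j), in bits."""
    return sum(x * math.log2(x / y) for x, y in zip(p, r) if x > 0)

# A doubly stochastic matrix (rows and columns sum to 1), e.g. a convex
# mixture of the identity and the swap permutation:
P = [[0.7, 0.3],
     [0.3, 0.7]]
p = [0.9, 0.1]
q = [0.4, 0.6]
r = [sum(P[j][k] * q[k] for k in range(2)) for j in range(2)]

assert all(x >= 0 for x in r) and abs(sum(r) - 1) < 1e-12
print(kl(p, r) >= 0)   # True: the term bounding S(omega||nu) from below
```

Here the GPT structure only enters through the fact that the overlap matrix of two maximal frames is doubly stochastic.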
To keep the main proof from getting convoluted, we state some technical parts as lemmas.

Lemma 14. Assume postulates 1 and 2. Consider orthogonal projectors ${P}_{j}$ which form an instrument. Then the ${P}_{j}$ are mutually orthogonal:

${P}_{k}{P}_{j}={\delta }_{{kj}}{P}_{j}.$
Proof. We prove ${P}_{k}{P}_{j}\omega =0$ for all $\omega \in A$, $j\ne k$. If ${P}_{j}\omega =0$ this is trivial, so from now on assume ${P}_{j}\omega \ne 0$. As the cone is generating (i.e. $\mathrm{Span}({A}_{+})=A$) and the projectors are linear, it is sufficient to show ${P}_{k}{P}_{j}\omega =0$ for all $\omega \in {A}_{+}$. As ${P}_{j}$ is positive, ${P}_{j}\omega \ne 0$ implies that $({u}_{A}\circ {P}_{j})(\omega )\gt 0$ because only the zero-state is normalised to 0. Using ${u}_{A}={u}_{A}\circ ({\sum }_{j}{P}_{j})={\sum }_{j}{u}_{A}\circ {P}_{j}$ and ${P}_{j}{P}_{j}={P}_{j}$:
${u}_{A}({P}_{j}\omega )={\sum }_{k}({u}_{A}\circ {P}_{k})({P}_{j}\omega )={u}_{A}({P}_{j}{P}_{j}\omega )+{\sum }_{k\ne j}{u}_{A}({P}_{k}{P}_{j}\omega )={u}_{A}({P}_{j}\omega )+{\sum }_{k\ne j}{u}_{A}({P}_{k}{P}_{j}\omega ),\ \ \text{i.e.}\ {\sum }_{k\ne j}{u}_{A}({P}_{k}{P}_{j}\omega )=0.$
As the projectors are positive and only the zero-state is normalised to 0, this shows ${P}_{k}{P}_{j}\omega =0$ for $k\ne j$.□

Lemma 15. Assume postulates 1 and 2. Consider an orthogonal projector $P$ which projects onto the linear span of a face $F$ of ${A}_{+}$. Then for all states $\omega \in {A}_{+}$ we find $P\omega \in F$.

Proof. From basic convex geometry (see e.g. proposition 2.10 in [63]), we know that $F=\mathrm{span}(F)\cap {A}_{+}$. Since P is positive, we have $P\omega \in {A}_{+};$ furthermore, since P projects onto F, we have $P\omega \in \mathrm{span}(F)$, thus $P\omega \in F$.□

Proof of theorem 6. We know that $S(\omega \parallel \omega ^{\prime} )=-S(\omega )-\langle \omega ,\mathrm{log}\omega ^{\prime} \rangle \geqslant 0$. As in theorem 11.9 from [50], we claim $-\langle \omega ,\mathrm{log}\omega ^{\prime} \rangle =S(\omega ^{\prime} )$ and therefore $-S(\omega )+S(\omega ^{\prime} )\geqslant 0$. Thus we only have to prove $-\langle \omega ,\mathrm{log}\omega ^{\prime} \rangle =S(\omega ^{\prime} )$. But as we do not have access to an underlying pure state Hilbert space, our proof is different from [50].

By lemma 14, the Pa are mutually orthogonal, i.e. ${P}_{a}{P}_{b}={\delta }_{{ab}}{P}_{b}$. By symmetry of the Pa also the ${P}_{a}\omega $ are mutually orthogonal: $\langle {P}_{a}\omega ,{P}_{b}\omega \rangle =\langle \omega ,{P}_{a}{P}_{b}\omega \rangle =0$ for $a\ne b$. This also shows that the Fa are mutually orthogonal. If ${P}_{a}\omega =0$ we use the decomposition ${P}_{a}\omega ={u}_{A}({P}_{a}\omega ){\sum }_{k}{r}_{{ak}}{\omega }_{{ak}}$ with ${r}_{{ak}}={\delta }_{{ak}}$ and ${\omega }_{{ak}}$ an arbitrary generating frame of Fa. If ${P}_{a}\omega \ne 0$, then $\tfrac{{P}_{a}\omega }{{u}_{A}({P}_{a}\omega )}\in {F}_{a}\cap {{\rm{\Omega }}}_{A}$ and by lemma 13, there is a classical decomposition $\tfrac{{P}_{a}\omega }{{u}_{A}({P}_{a}\omega )}={\sum }_{k}{r}_{{ak}}{\omega }_{{ak}}$ with ${\omega }_{{ak}}\in {F}_{a}$. We complete the ${\omega }_{{ak}}$ to generating frames of the Fa by adding terms with ${r}_{{ak}}=0$. As we are using classical decompositions/frames, we know $\langle {\omega }_{{aj}},{\omega }_{{ak}}\rangle ={\delta }_{{jk}}$. Furthermore, as the Fa are mutually orthogonal, we know $\langle {\omega }_{{aj}},{\omega }_{{bk}}\rangle =0$ for $a\ne b$.

We note that the ${\omega }_{{aj}}$ form a maximal frame:

For $a\ne b$ we have ${P}_{b}{\omega }_{{aj}}={P}_{b}{P}_{a}{\omega }_{{aj}}=0$, so we have a classical decomposition
$\omega ^{\prime} ={\sum }_{a}{P}_{a}\omega ={\sum }_{a,k}{u}_{A}({P}_{a}\omega ){r}_{{ak}}{\omega }_{{ak}}$
with ${\omega }_{{aj}}$ a maximal frame that satisfies ${P}_{a}{\omega }_{{bj}}={\delta }_{{ab}}{\omega }_{{bj}}$. Note that we do not need to normalise $\omega ^{\prime} $ as the measurement itself is required to be normalised. Using
$\mathrm{log}\omega ^{\prime} ={\sum }_{a,k}\mathrm{log}({u}_{A}({P}_{a}\omega ){r}_{{ak}}){\omega }_{{ak}}$
and

$\langle {P}_{a}\omega ,{\omega }_{{ak}}\rangle ={u}_{A}({P}_{a}\omega ){\sum }_{l}{r}_{{al}}\langle {\omega }_{{al}},{\omega }_{{ak}}\rangle ={u}_{A}({P}_{a}\omega ){r}_{{ak}},$ (12)

as well as the symmetry of the Pa we finally find:

$-\langle \omega ,\mathrm{log}\omega ^{\prime} \rangle =-{\sum }_{a,k}\mathrm{log}({u}_{A}({P}_{a}\omega ){r}_{{ak}})\langle \omega ,{P}_{a}{\omega }_{{ak}}\rangle =-{\sum }_{a,k}\mathrm{log}({u}_{A}({P}_{a}\omega ){r}_{{ak}})\langle {P}_{a}\omega ,{\omega }_{{ak}}\rangle =-{\sum }_{a,k}{u}_{A}({P}_{a}\omega ){r}_{{ak}}\mathrm{log}({u}_{A}({P}_{a}\omega ){r}_{{ak}})=S(\omega ^{\prime} ).$ (13)□

A.1.5. Proof that measurement and spectral entropies are identical

In the main text we encountered different ways to define the entropy. One of them is to adapt classical entropy definitions by using the coefficients of a classical decomposition. Another is to adapt classical entropy definitions by using measurement probabilities and minimising over all fine-grained measurements. Here we will show that in the context of postulates 1 and 2, these two concepts yield the same Rényi entropies.

To prove this, we will first analyse fine-grained measurements in further detail. The results will allow us to reproduce the quantum proof found in [66] for our GPTs.
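Before turning to the general argument, the claim can be illustrated numerically in the familiar quantum special case. The following sketch (our illustration only; it assumes complex quantum theory on a qubit, with fine-grained measurements taken to be rank-one projective, and an arbitrary example state) scans a one-parameter family of measurement bases and confirms that the minimal measurement Shannon entropy is attained in the eigenbasis, where it equals the spectral entropy:

```python
import numpy as np

# Illustration only (qubit quantum theory, not the general GPT argument):
# the minimal Shannon entropy over rank-one projective measurements is
# attained in the eigenbasis of the state, where it equals the entropy of
# the classical (spectral) decomposition coefficients.

def shannon(v):
    v = v[v > 1e-12]                       # drop zero outcomes
    return float(-np.sum(v * np.log(v)))

rho = np.diag([0.8, 0.2])                  # state; eigenbasis = computational
spectral = shannon(np.array([0.8, 0.2]))   # entropy of spectral coefficients

entropies = []
for theta in np.linspace(0.0, np.pi / 2, 50):
    b0 = np.array([np.cos(theta), np.sin(theta)])     # rotated basis vectors
    b1 = np.array([-np.sin(theta), np.cos(theta)])
    q = np.array([b @ rho @ b for b in (b0, b1)])     # outcome probabilities
    entropies.append(shannon(q))

assert np.isclose(min(entropies), spectral)  # minimum equals spectral entropy
assert max(entropies) > spectral             # other bases give larger entropy
```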

Lemma 16. Assume postulates 1 and 2. Consider an arbitrary fine-grained measurement $({e}_{1},\ldots ,{e}_{n})$. Then for all $j$ there exist some ${c}_{j}\in [0,1]$ and a pure state ${\omega }_{j}\in {{\rm{\Omega }}}_{A}$ such that ${e}_{j}={c}_{j}\langle {\omega }_{j},\cdot \rangle $.

Proof. If ej = 0, we can just take cj = 0 and any pure state ${\omega }_{j}$. So from now on assume ${e}_{j}\ne 0$.

Because of self-duality there exists some $\omega ^{\prime} \in {A}_{+}$ such that $\langle \omega ^{\prime} ,\cdot \rangle ={e}_{j}$. As ${e}_{j}\ne 0$ also $\omega ^{\prime} \ne 0$ and therefore ${u}_{A}(\omega ^{\prime} )\ne 0$. With ${A}_{+}={{\mathbb{R}}}_{\geqslant 0}\cdot {{\rm{\Omega }}}_{A}$ and ${c}_{j}:= {u}_{A}(\omega ^{\prime} )\gt 0$ there exists an $\omega \in {{\rm{\Omega }}}_{A}$ such that $\omega ^{\prime} ={c}_{j}\cdot \omega $. We want to prove that ω is pure, so assume it was not pure. Then it has a classical decomposition $\omega ={\sum }_{k=0}^{N}{p}_{k}{\omega }_{k}$ with ${p}_{k}\gt 0$ and $N\geqslant 1$. By relabelling we can assume j = n, i.e. we consider ${e}_{n}={c}_{j}{\sum }_{k=0}^{N}{p}_{k}\langle {\omega }_{k},\cdot \rangle $. Define a measurement $({e}_{1}^{\prime },\ldots ,{e}_{n+N}^{\prime })$ by ${e}_{k}^{\prime }:= {e}_{k}$ for all $k=1,2,\ldots ,n-1$ and ${e}_{n+i}^{\prime }:= {c}_{j}{p}_{i}\langle {\omega }_{i},\cdot \rangle $ for all $i=0,1,\ldots ,N$. Because of $0\leqslant {c}_{j}{p}_{i}\langle {\omega }_{i},\cdot \rangle ={e}_{n+i}^{\prime }$ and ${\sum }_{k=1}^{n+N}{e}_{k}^{\prime }={\sum }_{k=1}^{n-1}{e}_{k}+{\sum }_{i=0}^{N}{c}_{j}{p}_{i}\langle {\omega }_{i},\cdot \rangle ={\sum }_{k=1}^{n}{e}_{k}={u}_{A}$ this is a well-defined measurement.

Now define $M:\{1,\ldots ,n+N\}\to \{1,\ldots ,n\}$ by $M(i):= i$ for all $i=1,\ldots ,n-1$ and $M(i):= n$ for all $i=n,\ldots ,n+N$. Then we get
${\sum }_{\{k| M(k)=i\}}{e}_{k}^{\prime }={e}_{i}^{\prime }={e}_{i}\ \mathrm{for}\ i\lt n,\qquad {\sum }_{\{k| M(k)=n\}}{e}_{k}^{\prime }={\sum }_{i=0}^{N}{c}_{j}{p}_{i}\langle {\omega }_{i},\cdot \rangle ={c}_{j}\langle \omega ,\cdot \rangle ={e}_{n}.$
Thus the measurement $({e}_{1}^{\prime },\ldots ,{e}_{n+N}^{\prime })$ is a refinement of $({e}_{1},\ldots ,{e}_{n})$. With ${e}_{n}^{\prime }({\omega }_{0})={c}_{j}{p}_{0}={e}_{n}({\omega }_{0})$ and ${e}_{n}^{\prime }({\omega }_{1})=0\ne {e}_{n}({\omega }_{1})$ we find that ${e}_{n}^{\prime }$ is not proportional to en, thus the fine-graining is non-trivial. This is in contradiction to our assumptions. Thus ω has to be pure. Furthermore $1={u}_{A}(\omega )\geqslant {e}_{j}(\omega )={c}_{j}\langle \omega ,\omega \rangle ={c}_{j}$.

So in total we have found that ${e}_{j}={c}_{j}\langle \omega ,\cdot \rangle $ with $\omega \in {{\rm{\Omega }}}_{A}$ pure and ${c}_{j}\in [0,1]$.□
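In the quantum special case, lemma 16 is the familiar statement that the effects of a fine-grained POVM are sub-normalised rank-one projectors. A minimal numerical sketch (our illustration, not part of the proof; the coefficients ${c}_{j}$ and pure states below are arbitrary choices):

```python
import numpy as np

# Quantum special case of lemma 16 (illustration only): fine-grained effects
# have the form e_j = c_j <omega_j, .> with omega_j pure and c_j in [0, 1],
# which for a qubit means e_j = c_j |w_j><w_j|.

c = np.array([1.0, 0.5, 0.5])                        # coefficients c_j
w = [np.array([1.0, 0.0]),                           # pure states omega_j
     np.array([0.0, 1.0]),
     np.array([0.0, 1.0])]
effects = [cj * np.outer(wj, wj) for cj, wj in zip(c, w)]

# Normalisation: the effects sum to the order unit u_A (the identity).
total = sum(effects)
assert np.allclose(total, np.eye(2))

# The self-dual pairing <omega_j, rho> is the Hilbert-Schmidt inner product:
rho = np.diag([0.5, 0.5])                            # maximally mixed state
probs = np.array([np.trace(e @ rho).real for e in effects])
assert np.isclose(probs.sum(), 1.0)
```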

Lemma 17. Assume postulates 1 and 2. Let $\omega \in {{\rm{\Omega }}}_{A}$ and $\omega ={\sum }_{j=1}^{d}{p}_{j}{\omega }_{j}$ be a decomposition into a maximal frame. Then the measurement that perfectly distinguishes the ${\omega }_{j}$ (i.e. ${e}_{k}({\omega }_{j})={\delta }_{{jk}}$) can be chosen to be fine-grained.

Proof. Define ${e}_{j}:= \langle {\omega }_{j},\cdot \rangle $. As maximal frames add up to the order unit, this is a well-defined measurement and it satisfies ${e}_{j}({\omega }_{k})={\delta }_{{jk}}$. It remains to show that this measurement is fine-grained.

Consider a fine-graining ${e}_{k}^{\prime }$ with ${e}_{i}={\sum }_{\{j| M(j)=i\}}{e}_{j}^{\prime }$. By self-duality, there exist ${c}_{j}\geqslant 0$ and ${\omega }_{j}^{\prime }\in {{\rm{\Omega }}}_{A}$ such that ${e}_{j}^{\prime }={c}_{j}\langle {\omega }_{j}^{\prime },\cdot \rangle $ and therefore ${\sum }_{\{j| M(j)=k\}}{c}_{j}{\omega }_{j}^{\prime }={\omega }_{k}$. As $1={u}_{A}({\omega }_{k})={\sum }_{\{j| M(j)=k\}}{c}_{j}{u}_{A}({\omega }_{j}^{\prime })={\sum }_{\{j| M(j)=k\}}{c}_{j}$ we find that ${\sum }_{\{j| M(j)=k\}}{c}_{j}{\omega }_{j}^{\prime }={\omega }_{k}$ is a convex decomposition of a pure state. This requires cj = 0 or ${\omega }_{j}^{\prime }={\omega }_{k}$. In both cases ${e}_{j}^{\prime }={c}_{j}\langle {\omega }_{k},\cdot \rangle ={c}_{j}{e}_{k}$ holds true for all j with $M(j)=k$. Therefore, the fine-graining is trivial.□

Lemma 18. Assume postulates 1 and 2. Consider a fine-grained measurement ${\bf{e}}=({e}_{1},\ldots ,{e}_{N})\in {{ \mathcal E }}^{* }$. Then the maximal number of perfectly distinguishable states $d$ (often denoted as ${N}_{A}$) satisfies $d\leqslant N$.

Furthermore, consider a state $\omega \in {{\rm{\Omega }}}_{A}$ with classical decomposition $\omega ={\sum }_{j=1}^{d}{p}_{j}{\omega }_{j}$ into a maximal frame. Define the vector ${\bf{q}}:= {({e}_{j}(\omega ))}_{1\leqslant j\leqslant N}$ of outcome probabilities and the $N$-component vector ${\bf{p}}=({p}_{1},\ldots ,{p}_{d},0,\ldots ,0)\in {{\mathbb{R}}}^{N}$. Then ${\bf{q}}\,\prec \,{\bf{p}}$, i.e. there exists a bistochastic $N\times N$-matrix $M$ with ${\bf{q}}=M{\bf{p}}$.

Proof. By lemma 16 there exist ${c}_{j}\in [0,1]$ and pure ${\omega }_{j}^{\prime }\in {{\rm{\Omega }}}_{A}$ such that ${e}_{j}={c}_{j}\langle {\omega }_{j}^{\prime },\cdot \rangle $. Define ${q}_{l}:= {e}_{l}(\omega )={c}_{l}\langle {\omega }_{l}^{\prime },\omega \rangle $. Using ${\sum }_{j=1}^{N}{e}_{j}={u}_{A}$ and ${\sum }_{j=1}^{d}{\nu }_{j}={u}_{A}$ for an arbitrary maximal frame $({\nu }_{1},\ldots ,{\nu }_{d})$ we find:
$d={\sum }_{k=1}^{d}{u}_{A}({\nu }_{k})={\sum }_{k=1}^{d}{\sum }_{j=1}^{N}{e}_{j}({\nu }_{k})={\sum }_{j=1}^{N}{c}_{j}\langle {\omega }_{j}^{\prime },{\sum }_{k=1}^{d}{\nu }_{k}\rangle ={\sum }_{j=1}^{N}{c}_{j}\langle {\omega }_{j}^{\prime },{u}_{A}\rangle ={\sum }_{j=1}^{N}{c}_{j},$ where $\langle {\omega }_{j}^{\prime },{u}_{A}\rangle ={u}_{A}({\omega }_{j}^{\prime })=1$ by self-duality.
As ${c}_{j}\in [0,1]$, ${\sum }_{j=1}^{N}{c}_{j}=d$ shows $d\leqslant N$.

Set ${q}_{l| j}:= {e}_{l}({\omega }_{j})$, introduce the N-component vector ${\bf{p}}:= ({p}_{1},\ldots ,{p}_{d},0,\ldots ,0)$ and use that measurement effects and states of maximal frames add up to the order unit:
${\sum }_{l=1}^{N}{q}_{l| j}={\sum }_{l=1}^{N}{e}_{l}({\omega }_{j})={u}_{A}({\omega }_{j})=1,\qquad {\sum }_{j=1}^{d}{q}_{l| j}={e}_{l}({\sum }_{j=1}^{d}{\omega }_{j})={c}_{l}\langle {\omega }_{l}^{\prime },{u}_{A}\rangle ={c}_{l}{u}_{A}({\omega }_{l}^{\prime })={c}_{l}.$
For $j\leqslant d$ we define ${M}_{l,j}:= {q}_{l| j}$. If $d\lt N$ we also define ${M}_{l,j}:= \tfrac{1-{c}_{l}}{N-d}$ for $N\geqslant j\gt d$. M is an N × N-matrix and it is bistochastic: first of all, ${M}_{l,j}\geqslant 0$ for all $l,j$. Furthermore:
For $j\leqslant d$ we have ${\sum }_{l=1}^{N}{M}_{l,j}={\sum }_{l=1}^{N}{q}_{l| j}=1$; for $j\gt d$ we have ${\sum }_{l=1}^{N}{M}_{l,j}={\sum }_{l=1}^{N}\tfrac{1-{c}_{l}}{N-d}=\tfrac{N-d}{N-d}=1$. The row sums are ${\sum }_{j=1}^{N}{M}_{l,j}={\sum }_{j=1}^{d}{q}_{l| j}+(N-d)\tfrac{1-{c}_{l}}{N-d}={c}_{l}+1-{c}_{l}=1.$
This bistochastic matrix maps ${\bf{p}}$ to ${\bf{q}}$, i.e. $M\cdot {\bf{p}}={\bf{q}}$:
${(M{\bf{p}})}_{l}={\sum }_{j=1}^{d}{q}_{l| j}{p}_{j}={e}_{l}({\sum }_{j=1}^{d}{p}_{j}{\omega }_{j})={e}_{l}(\omega )={q}_{l}.$□
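The construction of M in this proof can be made concrete in the quantum special case. The following sketch (our illustration only; the qubit state, the $N=3$ outcome POVM and all coefficients are arbitrary choices, not data from the paper) builds M exactly as in lemma 18 and checks that it is bistochastic and maps ${\bf{p}}$ to ${\bf{q}}$:

```python
import numpy as np

# Concrete qubit instance (d = 2) of the bistochastic matrix built in the
# proof of lemma 18, for an N = 3 outcome fine-grained POVM.  All numerical
# choices are illustrative assumptions, not data from the paper.

d, N = 2, 3
c = np.array([1.0, 0.5, 0.5])                        # coefficients c_l
w = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.0, 1.0])]
effects = [cl * np.outer(wl, wl) for cl, wl in zip(c, w)]

p_spec = np.array([0.7, 0.3])                        # classical decomposition
frame = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
rho = sum(pj * np.outer(vj, vj) for pj, vj in zip(p_spec, frame))

q = np.array([np.trace(e @ rho).real for e in effects])   # q_l = e_l(omega)
p = np.concatenate([p_spec, np.zeros(N - d)])             # padded vector p

# M_{l,j} = q_{l|j} = e_l(omega_j) for j <= d, else (1 - c_l)/(N - d):
M = np.zeros((N, N))
for l, e in enumerate(effects):
    for j, vj in enumerate(frame):
        M[l, j] = np.trace(e @ np.outer(vj, vj)).real
    M[l, d:] = (1.0 - c[l]) / (N - d)

assert np.allclose(M.sum(axis=0), 1.0)   # columns sum to 1 (bistochastic)
assert np.allclose(M.sum(axis=1), 1.0)   # rows sum to 1
assert np.allclose(M @ p, q)             # q = Mp, i.e. q is majorised by p
```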
Now we come to the proof of the theorem:

Proof of theorem 10. Consider an arbitrary fine-grained measurement $({e}_{1},\ldots ,{e}_{N})$ and an arbitrary state $\omega \in {{\rm{\Omega }}}_{A}$ with classical decomposition $\omega ={\sum }_{j=1}^{d}{p}_{j}{\omega }_{j}$ into a maximal frame. Define ${q}_{l}:= {e}_{l}(\omega )$ and the N-component vector ${\bf{p}}=({p}_{1},\ldots ,{p}_{d},0,\ldots ,0)$. Let M be the bistochastic matrix from lemma 18 with ${\bf{q}}=M\cdot {\bf{p}}$. By Birkhoff's theorem, it is a convex combination of permutation matrices, i.e. $M={\sum }_{\sigma \in {S}_{N}}{a}_{\sigma }{P}_{\sigma }$ for a probability distribution aσ and permutation matrices Pσ. W.l.o.g. we only consider the Shannon entropy; the proof for the Rényi entropies works exactly the same way. As the Shannon entropy is Schur-concave and invariant under permutations:
$H({\bf{q}})=H(M{\bf{p}})=H({\sum }_{\sigma }{a}_{\sigma }{P}_{\sigma }{\bf{p}})\geqslant {\sum }_{\sigma }{a}_{\sigma }H({P}_{\sigma }{\bf{p}})={\sum }_{\sigma }{a}_{\sigma }H({\bf{p}})=H({\bf{p}}).$
Furthermore $H({\bf{p}})=-{\sum }_{j=1}^{d}{p}_{j}\mathrm{log}{p}_{j}=S(\omega )$ is the entropy of a measurement that perfectly distinguishes the ${\omega }_{j}$, i.e. ${e}_{j}({\omega }_{k})={\delta }_{{jk}}$. Because of lemma 17, such a measurement can be chosen to be fine-grained. Therefore we find:

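The majorisation step in this proof can be checked numerically. The sketch below (our illustration, not part of the argument; the distribution and mixing weights are arbitrary) builds a bistochastic matrix explicitly as a Birkhoff mixture of permutation matrices and verifies that it cannot decrease the Shannon entropy:

```python
import numpy as np

# Numerical sanity check (illustration only) of the Schur-concavity step:
# mixing p through a bistochastic matrix -- here a convex combination of
# permutation matrices, i.e. Birkhoff's theorem run in reverse -- cannot
# decrease the Shannon entropy, by concavity and permutation invariance.

def shannon(v):
    v = v[v > 1e-12]                      # drop zero outcomes
    return float(-np.sum(v * np.log(v)))

p = np.array([0.7, 0.2, 0.1])

perms = [np.eye(3)[list(s)] for s in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]]
a = np.array([0.5, 0.3, 0.2])                 # mixing weights a_sigma
M = sum(ai * Pi for ai, Pi in zip(a, perms))  # bistochastic by construction

q = M @ p
assert np.allclose(M.sum(axis=0), 1.0) and np.allclose(M.sum(axis=1), 1.0)
assert shannon(q) >= shannon(p) - 1e-12       # H(q) >= H(p)
```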
A.1.6. Proof of theorem 12

As mentioned in the main text, the equivalences $({\rm{i}})\iff ({\rm{iii}})\iff ({\rm{iv}})\iff ({\rm{v}})$ are shown in [43]. We will now prove the equivalence $({\rm{ii}})\iff ({\rm{v}})$, which proves theorem 12. Taking into account theorem 10, and formulating the atomic covering property in the context of theories that satisfy postulates 1 and 2, it remains to show the equivalence of the following two statements:

  • $({\rm{ii}}^{\prime} )$ For all states $\omega \in {{\rm{\Omega }}}_{A}$, we have ${\check{S}}_{0}(\omega )\geqslant {S}_{0}(\omega )$.
  • $({\rm{v}}^{\prime} )$ If F is any face of ${A}_{+}$, and ω is any pure state, then the smallest face G that contains both F and ω has rank $| G| \leqslant | F| +1$. (Note that this is trivial if $\omega \perp F$.)

We will first prove that $({\rm{ii}}^{\prime} )\Rightarrow ({\rm{v}}^{\prime} )$, which is equivalent to $\neg ({\rm{v}}^{\prime} )\Rightarrow \neg ({\rm{ii}}^{\prime} )$. So suppose that there exists some face F of ${A}_{+}$ and a pure state ω such that the face G generated by both has rank $| G| \geqslant | F| +2$. Let ${\omega }_{1},\ldots ,{\omega }_{| F| }$ be a frame that generates the face F. Then F is also generated by $\nu := \tfrac{1}{| F| }{\sum }_{j=1}^{| F| }{\omega }_{j}$, i.e. the normalised projective unit of F. This is because every face containing ν also contains all the ${\omega }_{j}$ (and vice versa), and F is the smallest face with this property.

Now consider the state $\tfrac{1}{2}\omega +\tfrac{1}{2}\nu $. The smallest face that contains this state must be G. If this state had a decomposition into $| F| +1$ or fewer perfectly distinguishable pure states, then these would also generate G, and so $| G| \leqslant | F| +1$, in contradiction to our assumption. Thus any decomposition of $\tfrac{1}{2}\omega +\tfrac{1}{2}\nu $ into perfectly distinguishable pure states uses at least $| F| +2$ states with non-zero coefficients, i.e. ${S}_{0}(\tfrac{1}{2}\omega +\tfrac{1}{2}\nu )\geqslant \mathrm{log}(| F| +2)$. But $\tfrac{1}{2}\omega +\tfrac{1}{2}\nu =\tfrac{1}{2| F| }{\sum }_{j=1}^{| F| }{\omega }_{j}+\tfrac{1}{2}\omega $ is a convex decomposition into $| F| +1$ pure states, thus
${\check{S}}_{0}(\tfrac{1}{2}\omega +\tfrac{1}{2}\nu )\leqslant \mathrm{log}(| F| +1)\lt \mathrm{log}(| F| +2)\leqslant {S}_{0}(\tfrac{1}{2}\omega +\tfrac{1}{2}\nu ),$ i.e. $({\rm{ii}}^{\prime} )$ is violated.
It remains to show that $({\rm{v}}^{\prime} )\Rightarrow ({\rm{ii}}^{\prime} )$. So suppose that (v') holds, but that there is a state $\omega \in {{\rm{\Omega }}}_{A}$ with ${\check{S}}_{0}(\omega )\lt {S}_{0}(\omega );$ we will show that this leads to a contradiction. By definition of ${\check{S}}_{0}$, if this is the case, then there exist pure states ${\omega }_{1},\ldots ,{\omega }_{n}$ with $n=\exp ({\check{S}}_{0}(\omega ))$ and ${p}_{1},\ldots ,{p}_{n}\geqslant 0$, ${\sum }_{i}{p}_{i}=1$, such that $\omega ={\sum }_{i=1}^{n}{p}_{i}{\omega }_{i}$. Using property (v'), and recursively looking at the faces generated by ${\omega }_{1}$, generated by ${\omega }_{1},{\omega }_{2}$, generated by ${\omega }_{1},{\omega }_{2},{\omega }_{3}$ and so forth, shows that the rank of the face G generated by ${\omega }_{1},\ldots ,{\omega }_{n}$ can be at most n. Since $\omega \in G$, this shows that ω can be decomposed into n or fewer pure perfectly distinguishable states. Therefore ${S}_{0}(\omega )\leqslant \mathrm{log}n={\check{S}}_{0}(\omega )$.□
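In the quantum special case, both quantities in this equivalence reduce to the logarithm of the rank of the density matrix: ${S}_{0}$ counts the nonzero spectral coefficients, and no convex decomposition into pure states can use fewer states than the rank, since the pure states must span the support. A small sanity check (our illustration only, with an arbitrary rank-2 state on a 3-level system):

```python
import numpy as np

# Quantum sanity check (illustration only): S_0 of a density matrix is the
# log of the number of nonzero spectral coefficients, i.e. log rank(rho).

rho = np.diag([0.5, 0.5, 0.0])                      # rank-2 state, 3 levels
eigvals = np.linalg.eigvalsh(rho)
rank = int(np.count_nonzero(eigvals > 1e-12))
S0 = np.log(rank)

assert rank == 2
assert np.isclose(S0, np.log(2))
```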

A.3. Some additional remarks on theorem 12 and quantum logic

The GPT framework has a close relation to quantum logic. This is not surprising, since much of the terminology of GPTs has appeared much earlier, in work on quantum logic and beyond. The approach via convex sets of states and observables can be traced back to Mackey [18] (who immediately made connections to quantum logic), and was developed further through the 1960s and beyond. A partial list of references includes [14, 20, 88], the last two of which offer axiomatic characterisations of quantum theory. Interaction with the quantum logic tradition continued, with the orthomodularity of the lattice of faces of the state and/or effect spaces often providing a point of contact, especially in Ludwig's work [20]. Also closely related to the convex sets approach was the work of Foulis and Randall [1013] who, for example, studied ways of combining probabilistic systems.

Since postulates 1 and 2 imply that the state cone ${A}_{+}$ is self-dual (so coincides with the effect cone), and that each face of this cone is the intersection of the cone with the image of a filter (equivalently, given self-duality, a compression in the sense of [24]), we have from these postulates alone, via e.g. [24], theorem 8.10, that the face lattice is orthomodular. The notion of orthomodular partially ordered set, or its special case, the notion of orthomodular lattice, is often taken to define the notion of quantum logic, so we can say that postulates 1 and 2 imply that the face lattice of the cone of states (equivalently of the cone of effects, or of the set of normalised states) is a quantum logic.
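The orthomodular law can be illustrated concretely in the quantum special case, where the face lattice is the lattice of subspace projections. The sketch below (our illustration only; the subspaces are arbitrary choices) checks that for projections $P\leqslant Q$ one has $Q=P\vee (Q\wedge {P}^{\perp })$, with join = span and meet = intersection:

```python
import numpy as np

# Illustrative check of the orthomodular law on the quantum face lattice
# (projections onto subspaces): if P <= Q then Q = P v (Q ^ P_perp).

P = np.diag([1.0, 0.0, 0.0])              # projection onto span{e1}
Q = np.diag([1.0, 1.0, 0.0])              # projection onto span{e1, e2}

# Q ^ P_perp: the intersection of ran(Q) with ran(1 - P) is span{e2} here.
meet = np.diag([0.0, 1.0, 0.0])

# P v meet: projection onto the span of the union of the two ranges,
# obtained from an SVD of the stacked columns.
stacked = np.hstack([P, meet])
U, s, _ = np.linalg.svd(stacked)
cols = U[:, s > 1e-12]                    # orthonormal basis of the joint span
join = cols @ cols.T

assert np.allclose(join, Q)               # orthomodular law: Q = P v (Q^P')
```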

The covering property, in its most common variant the atomic covering property, states that for any element x of the lattice and any atom a not below or equal to x, $a\vee x$ covers x17. The equivalence of (iv) and (v), in a setting more general than postulates 1 and 2, is proposition 9.7 of [24] (first appearing in proposition 4.2 in [25] and the discussion preceding it). Along with orthomodularity, the covering law was one of the assumptions of Piron's famous lattice-theoretic characterisation ([94]; also [17], and see the discussion in [27] or, for more detail and proofs, [26, pp 18–38, 114–122]) of a class of lattices close to, although larger than, that of real, complex, and quaternionic quantum theory. A generalisation of the covering law was also used in Ludwig's axiomatisation (see e.g. [20]18) of quantum theory within the convex sets framework, in which the relevant lattice is a lattice of faces of the state space (equivalent to a lattice of extremal effects in his context), and the result characterised real, complex, and quaternionic quantum theory19.

Footnotes

  • Some authors have recently begun referring to instruments as operations, but long-standing convention in quantum information theory (including [50]) uses the term 'operation' for the quantum case of what we are calling transformations (which are completely positive maps). Also, Davies and Lewis [84] define instrument more generally, to allow for continuously-indexed transformations, where we only consider finite collections Tj.

  • 10 Our thought experiment is identical to von Neumann's, up to two differences: first, we translate all quantum notions to more general GPT notions; second, while von Neumann implements the transition from (5) to (6) in figure 2 via sequences of projections, we implement this transition directly via reversible transformations.

  • 11 For a more detailed discussion of the physical properties of these small boxes, we refer the reader to von Neumann's original work [41].

  • 12 Here, von Neumann's thought experiment is formulated in terms of a frequentist view on probabilities, which is standard in most treatments of thermodynamics. A treatment involving a finite ensemble where the frequencies (and perhaps the total particle number) are stochastic might seem more suitable from a Bayesian point of view; it would likely raise issues about whether the amount of work extracted from a finite system is subject to fluctuations. For systems that are finite or out of equilibrium, measures such as Shannon's are known not to be the whole story (see [30] and references therein). But even for finite systems with a more realistic treatment of uncertainty about particle numbers, the von Neumann entropy still gives the expected work in the protocol he considers. We defer these issues to future work, although we note that [30] suggests the operational entropies discussed in section 5.6 are among the relevant tools for tackling them.

  • 13 Here we only consider ensembles of identical Hilbert space dimensions. If the dimensions are different (say, 2 versus 3), then one can implement different sets of protocols on the ensembles (say, ones involving semipermeable membranes that distinguish 3 alternatives in the latter, but not the former case). One could then still discuss a notion of universality in Carnot's spirit, by referring to the equivalence of, say, a state space with N = 3 alternatives to a subspace of a state space with $N=2\times 2$ alternatives, but we will not discuss this further here.

  • 14 In classical thermodynamics, the analogue of a choice of basis is the labelling of the distinguishable configurations. Clearly, the availability of thermodynamic protocols does not change under relabelling.

  • 15 A face of a convex set C is a convex subset $F\subseteq C$ with the property that $\lambda x+(1-\lambda )y\in F$ with $0\lt \lambda \lt 1$ and $x,y\in C$ implies $x,y\in F$ [89]. We say that F is generated by ${\omega }_{1},\ldots ,{\omega }_{n}$ if F is the smallest face that contains ${\omega }_{1},\ldots ,{\omega }_{n}$.

  • 16 This can also be obtained by combining the fact that postulates 1 and 2 imply the state space is projective (first part of theorem 17 in [43]) and self-dual (proposition 3 in [43]) with results such as theorem 8.64 in [24].

  • 17 Here 'y covers x' means '$y\geqslant x$, $y\ne x$, and there is no z distinct from x and y such that $y\geqslant z\geqslant x$', i.e. 'y is above x with nothing in between'. An atom is an element that covers 0.

  • 18 The results in [20] were mostly obtained in a series of papers in the late 1960s and early 1970s.

  • 19 Both Piron and Ludwig also made an atomicity assumption (which may be considered more technical than substantive, and always holds in finite dimension) and also assumed lattice dimension 4 or greater, so Hilbert spaces of dimension 3 or less were not dealt with, nor were spin factors or the exceptional Jordan algebra. These low-dimensional cases also satisfy Piron's and Ludwig's premises, but a theorem ruling out other instances satisfying them appears to be lacking.
