Research Paper
The perception of octave pitch affinity and harmonic fusion have a common origin
Introduction
Humans enjoy melody, "the essential basis of music" in the words of Helmholtz (1863/1954). A melody is a sequence of periodic sounds with specific frequency ratios, forming musical intervals that are perceived as pitch relations. The precision with which these intervals are perceived is of course limited; it depends on the listener's musical training, the intervals themselves, and other factors (Burns and Ward, 1978; Rakowski, 1990; Perlman and Krumhansl, 1996; McDermott et al., 2010; McClaskey, 2017; Graves and Oxenham, 2017). However, in the Western world at least, even people with no substantial musical education readily detect an error of only one semitone (corresponding to a frequency change of about 6%) in the production of one note of a familiar melody (Dowling and Fujitani, 1971; Trainor and Trehub, 1994).
Throughout the human auditory system, up to the cortical level, frequency is represented tonotopically, along unidimensional neural maps (Romani et al., 1982; Talavage et al., 2004). A straightforward hypothesis, therefore, is that the representation of a melodic interval in the auditory system is simply a distance between neural excitations along an axis representing pitch as a logarithmic function of frequency. This would suggest that there is no "physiologically special" melodic interval (apart from the unison). Psychophysical results in line with this hypothesis were obtained by Kallman (1982, experiment 1), who required ordinary Western students to rate the similarity of successive pure tones as a function of their frequency ratio. The ratings smoothly decreased as the frequency ratio varied in small logarithmic steps from 1:1 to about 5:1. Remarkably, no local peak was observed for the simple frequency ratio 2:1, i.e., one octave, even though in the Western musical system two notes forming an octave interval bear the same name and are treated as equivalent sounds (Krumhansl and Shepard, 1979). Analogous findings were reported by Hoeschele et al. (2012, experiment 1).
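The arithmetic behind this "log-frequency axis" view can be sketched in a few lines. The snippet below is only an illustration: it assumes the conventional equal-tempered semitone (one twelfth of an octave) as the unit of distance, and the function names are ours, not the authors'. It shows that one semitone corresponds to a frequency change of about 6%, and that the ratios 2:1 and 5:1 are simply two points on the same continuous axis.

```python
import math


def interval_in_semitones(ratio: float) -> float:
    """Distance between two tones on a log-frequency axis, in semitones."""
    return 12.0 * math.log2(ratio)


def semitone_ratio(n_semitones: float = 1.0) -> float:
    """Frequency ratio spanned by n equal-tempered semitones."""
    return 2.0 ** (n_semitones / 12.0)


# Percentage change in frequency produced by a one-semitone step.
one_semitone_percent = (semitone_ratio(1.0) - 1.0) * 100.0

# Distances for the two ratios mentioned in the text (2:1 and about 5:1).
octave_distance = interval_in_semitones(2.0)
five_to_one_distance = interval_in_semitones(5.0)

if __name__ == "__main__":
    print(f"1 semitone = {one_semitone_percent:.2f}% frequency change")
    print(f"2:1 ratio  = {octave_distance:.2f} semitones")
    print(f"5:1 ratio  = {five_to_one_distance:.2f} semitones")
```

On such an axis nothing distinguishes the octave from neighbouring intervals, which is the sense in which the hypothesis predicts no "physiologically special" interval.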
However, at odds with these results, a number of other experiments have suggested that for a substantial proportion of human listeners, two tones forming a small-integer frequency ratio have a distinctive affinity (or similarity) in pitch. Ratios such as 3:2, 4:3, or 5:4 have been used in some of these experiments (Cohen et al., 1987; Schellenberg and Trehub, 1994, 1996), but the ratio most often used was 2:1, one octave. The demonstrations of octave pitch affinity (OPA) have been based on a variety of methodologies (Deutsch, 1973; Idson and Massaro, 1978; Kallman and Massaro, 1979; Massaro et al., 1980; Demany and Armand, 1984; Hoeschele et al., 2012; Borra et al., 2013; Jacoby et al., 2019). In the eight studies that we just cited, OPA was observed using pure-tone stimuli. This is an important detail since the peripheral auditory system behaves as a spectrum analyzer (Schnupp et al., 2012). Ordinary periodic sounds are instead complex tones, and thus consist of a sum of harmonics with frequencies equal to integer multiples of a given fundamental frequency. Consequently, two complex tones one octave apart typically have common spectral components. In addition, the pitch of certain complex tones is subject to octave ambiguities (Terhardt et al., 1982, 1986), which could explain the perception of an affinity between such tones when their fundamental frequencies are one octave apart (Regev et al., 2019). The phenomenon of OPA is more intriguing when it is observed for sounds with no common spectral component and an unambiguous pitch, such as pure tones.
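The point about shared spectral components can be made concrete with a small sketch. The fundamental frequencies (200 and 400 Hz) and the 2000-Hz upper limit below are hypothetical values chosen only for illustration; they are not stimulus parameters from the study.

```python
def harmonic_frequencies(f0_hz: int, max_hz: int) -> set:
    """Return the harmonics (integer multiples of f0_hz) not exceeding max_hz."""
    return {n * f0_hz for n in range(1, max_hz // f0_hz + 1)}


# Hypothetical values, for illustration only.
LOW_F0_HZ = 200
HIGH_F0_HZ = 2 * LOW_F0_HZ  # one octave above the low tone
MAX_HZ = 2000

# Two complex tones one octave apart: every harmonic of the upper tone
# is also a harmonic of the lower tone.
low_complex = harmonic_frequencies(LOW_F0_HZ, MAX_HZ)
high_complex = harmonic_frequencies(HIGH_F0_HZ, MAX_HZ)
shared_complex = sorted(low_complex & high_complex)

# Two pure tones one octave apart: a single component each, nothing shared.
shared_pure = {LOW_F0_HZ} & {HIGH_F0_HZ}

if __name__ == "__main__":
    print("Shared components (complex tones):", shared_complex)
    print("Shared components (pure tones):   ", sorted(shared_pure))
```

With complex tones, spectral overlap alone could produce a sense of similarity; with pure tones that explanation is unavailable, which is why pure-tone demonstrations of OPA are the more informative ones.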
The origin of OPA, for sounds such as pure tones, is the subject of a basic controversy. On one side of the debate, it is contended that OPA is essentially the consequence of an acculturation process (Burns and Ward, 1982; Sergeant, 1983; Jacoby et al., 2019). According to this culturalist hypothesis, Western listeners exhibit OPA because they have learned, consciously or unconsciously, a musical grammar in which tones one octave apart are functionally equivalent. Arbitrary musical grammars can be learned quite rapidly, by mere passive exposure to sound sequences constructed from these grammars (Loui et al., 2010; Rohrmeier et al., 2011). The musical rule of octave equivalence is certainly not arbitrary, because this rule is culturally widespread (Dowling and Harwood, 1986; Brown and Jordania, 2011). However, its main origin might be unrelated to the perception of pitch relations (Burns and Ward, 1982; McPherson et al., 2020). The rule might originate from the mere fact that the sum of two complex tones one octave apart is a single complex tone, with the same period as one of the two added tones. The culturalist explanation of OPA is consistent with the fact that, within the Western adult population, sensitivity to OPA appears to be stronger in musicians than in non-musicians (Allen, 1967; Demany and Armand, 1984; Jacoby et al., 2019), although this could of course be due to an influence of sensitivity to OPA on the willingness to become a musician. Jacoby et al. (2019) suggested in addition that the Tsimane', an Amazonian population living in isolation from Western culture, are completely insensitive to OPA. Western children tested by Sergeant (1983) showed a similar insensitivity and this led the author to assert that OPA was a "concept" rather than a percept. In line with such a view, Regev et al. (2019) found that musically educated listeners who were able to identify an octave interval as such did not manifest a sensitivity to OPA when their brain response to pitch changes was assessed via the "mismatch negativity" evoked potential.
On the other side of the debate, it is contended that OPA originates from physiological processes that are essentially independent of the cultural environment. The experimental evidence supporting this general hypothesis is currently very limited. Sensitivity to OPA has been found in two studies on non-human animals (Blackwell and Schlosberg, 1943; Wright et al., 2000), but the stimuli used by Wright et al. were spectrally rich periodic sounds. Using instead pure tones, Demany and Armand (1984) obtained results suggesting that OPA exists, and is even strong, in 3-month-old human infants. Another argument was put forth by Terhardt (1971, 1974, 1987). In his view, OPA originates from a learning process, but not from musical acculturation: what is learned is the harmonic structure of natural periodic sounds, such as human vocalizations. Due to this learning process, the pitch interval corresponding to a subjectively perfect melodic octave is the pitch interval of harmonics with a frequency ratio of 2:1 in natural periodic sounds. A well-established fact is that when musically educated listeners are requested to set two successive pure tones exactly one octave apart by adjusting their frequency ratio, the obtained ratio is generally slightly larger than 2:1 (Ward, 1954; Ohgushi, 1983; Demany and Semal, 1990; Hartmann, 1993; Rosner, 1999). Terhardt argued that this apparent anomaly, often called the "octave enlargement" effect, can be explained by small repulsive interactions between the representations of simultaneous pure tones in the periphery of the auditory system. He found confirmation of this hypothesis in precise measurements of the pitch of individual spectral components of complex tones. However, Peters et al. (1983) and Hartmann and Doty (1996) failed to replicate Terhardt's observations: they found that the pitch of a complex tone component is not significantly affected by the other components. Their work thus cast serious doubt on the validity of Terhardt's ideas about OPA.
Here, we report new evidence that OPA has a natural basis. More precisely, our study indicates that even for musically educated Western listeners, the pitch interval defining a subjectively perfect melodic octave is largely determined by universal auditory processes rather than by cultural factors. Our essential finding is that the perception of OPA is closely linked to the auditory phenomenon of harmonic fusion. A periodic complex tone is normally heard as a single sound, with a single pitch (related to the fundamental frequency). Yet, it is initially represented in the auditory system as a set of harmonics that, in isolation, evoke different pitches. Their subsequent fusion involves a detection of small-integer frequency ratios ("harmonicity"). When, for example, an 800-Hz harmonic is mistuned by 5% in a complex tone with a 400-Hz fundamental frequency, adult Western listeners perceive two sounds rather than one: the mistuned harmonic is heard as a pure tone standing out of a complex tone (Moore et al., 1986; Hartmann et al., 1990). Harmonic fusion is thought to be helpful in everyday life because real-world acoustic scenes often include simultaneous periodic sounds, produced by separate sources and differing in fundamental frequency; the perceptual segregation of such sounds requires a grouping of their respective spectral components (Bregman, 1990; de Cheveigné, 1997; Kidd et al., 2003; Carlyon and Gockel, 2008; Micheyl and Oxenham, 2010; Popham et al., 2018). Harmonic fusion is apparently operative in newborn infants (Bendixen et al., 2015), in Amazonian listeners isolated from Western culture (McDermott et al., 2016; McPherson et al., 2020), and in at least some non-human mammals (Tomlinson and Schwarz, 1988; Kalluri et al., 2008; Song et al., 2016). Moreover, neural correlates of this perceptual phenomenon have been found in the auditory cortex of monkeys (Fishman and Steinschneider, 2010; Fishman et al., 2014; Feng and Wang, 2017). Thus, harmonic fusion clearly has a natural basis. This should also be the case for OPA if OPA is closely linked to harmonic fusion.
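The mistuned-harmonic example given above (an 800-Hz harmonic of a 400-Hz fundamental, mistuned by 5%) reduces to simple arithmetic, sketched below. This is not a model of harmonic fusion; it only computes how far a component lies from the nearest exact harmonic. The function names are hypothetical, and the upward direction of the mistuning is an assumption, since the text does not state a direction.

```python
def nearest_harmonic_number(freq_hz: float, f0_hz: float) -> int:
    """Rank of the harmonic of f0_hz that lies closest to freq_hz (at least 1)."""
    return max(1, round(freq_hz / f0_hz))


def mistuning_percent(freq_hz: float, f0_hz: float) -> float:
    """Signed deviation of freq_hz from the nearest exact harmonic, in percent."""
    target_hz = nearest_harmonic_number(freq_hz, f0_hz) * f0_hz
    return 100.0 * (freq_hz - target_hz) / target_hz


F0_HZ = 400.0                      # fundamental frequency from the text
in_tune_hz = 2 * F0_HZ             # second harmonic: 800 Hz
mistuned_hz = in_tune_hz * 1.05    # assumption: mistuning applied upward

if __name__ == "__main__":
    print(f"In-tune component: {in_tune_hz:.0f} Hz, "
          f"mistuning {mistuning_percent(in_tune_hz, F0_HZ):+.1f}%")
    print(f"Mistuned component: {mistuned_hz:.0f} Hz, "
          f"mistuning {mistuning_percent(mistuned_hz, F0_HZ):+.1f}%")
```

Under these assumptions the mistuned component lies near 840 Hz, so its ratio to the fundamental is 2.1:1 rather than the small-integer ratio 2:1 that harmonicity detection relies on.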
In all but three of the past studies concerning OPA and harmonic fusion, these two phenomena have been investigated in isolation. Interestingly, a similar asymmetry was observed in both cases. First, the "octave enlargement" effect mentioned above suggests that OPA is generally stronger slightly above the physical octave (2:1) than slightly below it. Second, when the listeners' task was to detect small octave mistunings in stimuli consisting of simultaneous pure tones, performance was found to be generally poorer when the octave was stretched than when it was compressed, thus suggesting that harmonic fusion is more tolerant to stretchings than to compressions (Demany et al., 1991; Borchert et al., 2011; Bonnard et al., 2013, 2017). From this resemblance, one could suspect the existence of a link between OPA and harmonic fusion. However, the three studies in which the two phenomena were examined jointly, in the same listeners, did not provide evidence for such a link (Demany and Semal, 1990; Bonnard et al., 2013, 2016). In the present study, OPA and harmonic fusion were again investigated in the same listeners, but with a new methodology. We provide evidence that the two phenomena are linked by showing that the perception of OPA by a given listener is highly correlated with the perception of harmonic fusion by the same listener.
Section snippets
Conditions and stimuli
In this experiment, as well as experiments 2, 5, and 6, we measured the perceptual detectability of octave mistunings, i.e., deviations from a frequency ratio of 2:1, in cyclical sound sequences. Each sequence was built from two short stimuli: (1) a pure tone (T1); (2) a sum of two simultaneous pure tones with higher frequencies that were always exactly one octave apart (T2+T3). Each stimulus had a total duration of 130 ms and was gated on and off with 5-ms raised-cosine amplitude ramps. There
Method
In experiment 1, mistuning detection was easier in SIM than in ALT, as revealed by the fact that Δ had to be larger in ALT than in SIM to get a similar level of performance. Experiment 2 confirmed that the SIM condition was easier than the ALT condition, and determined whether the perceptual advantage provided by a simultaneous presentation of T1 and T2+T3 could be obtained if the simultaneity was illusory rather than real.
Four conditions were employed. In two of them, the sound sequences were
Experiment 3
To check that T1 and T2+T3 were perceived as simultaneous in ALTnoise, we first verified that the noise bands were of a sufficiently high level to elicit a continuity illusion. In experiment 3, the 12 listeners who had completed experiment 2 were presented with ALTnoise and SIMnoise sequences in which the level difference between the noise bands and the tones (+8 dB in experiment 2) was now adjustable. The task was to set the noise bands (as a whole) to the level just sufficient for the
Rationale and method
In the experiments described above, mistuning detection was investigated in a limited frequency register: the frequency of T1 varied between 300 and 600 Hz. Experiment 6 essentially replicated the ALTnoise and SIMnoise conditions of experiment 2 with two new ranges of T1 frequency: a "low" register, 200–300 Hz, and a "high" register, 1200–1800 Hz. In the low register, there was no a priori reason to expect results very different from those of experiment 2. However, previous research suggested
General discussion
In the present study, we investigated the perceptual detectability of octave mistunings via two subjectively quite different cues: OPA (for tones presented sequentially) and harmonic fusion (for tones presented simultaneously). Our results demonstrate, in a population of musically educated Western listeners, the existence of an intimate link between OPA and harmonic fusion. Since harmonic fusion undoubtedly originates from physiological processes taking place in every human auditory system, we
Declaration of Competing Interests
None.
Acknowledgements
We thank Josh McDermott, Peter Cariani, Alain de Cheveigné, and an anonymous reviewer for discussions and/or comments on a previous version of the manuscript. This research was partly funded by an MRC Core Award G101400 to author RPC.
References (103)
- et al. Harmonic fusion and pitch affinity: is there a direct link? Hear. Res. (2016)
- et al. Intervals, scales, and tuning
- et al. Pitch perception: a difference between right- and left-handed listeners. Neuropsychologia (1998)
- Standard-interval size affects interval-discrimination thresholds for pure-tone melodic pitch intervals. Hear. Res. (2017)
- et al. Pitch, harmonicity and concurrent sound segregation: psychoacoustical and neurophysiological findings. Hear. Res. (2010)
- et al. Incidental and online learning of melodic structure. Conscious. Cogn. (2011)
- et al. The upper frequency limit for the use of phase locking to code temporal fine structure in humans: a compilation of viewpoints. Hear. Res. (2019)
- Octave discriminability of musical and non-musical subjects. Psychon. Sci. (1967)
- et al. A model of perceptual segregation based on clustering the time series of the simulated auditory nerve firing probability. Biol. Cybern. (2007)
- et al. Newborn infants detect cues of concurrent sound segregation. Dev. Neurosci. (2015)