INTRODUCTION
For many decades, models and accounts of masking were based on the power spectrum of the masker and signal, as processed by the amplitude characteristics of a bank of putative auditory filters, and on the signal-to-noise ratio at the outputs of those filters (Fletcher, 1940; Patterson, 1976; Glasberg and Moore, 1990). Although such models account for a wide range of data, more recent accounts have focussed on a number of important failures of the power spectrum model. These include the effects of masker uncertainty and other higher-level aspects of sound (“informational masking”), the effects of masker-to-signal onset delay in simultaneous masking (i.e. the “overshoot effect”), and the non-linear growth of masking observed when the signal has a frequency substantially higher than that of the masker (Zwicker, 1965; Neff and Callaghan, 1987; Jennings et al., 2011; Yasin et al., 2013). In contrast to the operational flavour of early models, contemporary accounts of auditory processing are often inspired by, and explicitly include, biological phenomena such as cochlear non-linearity, neural adaptation, and, sometimes, the operation of the efferent system (Patterson et al., 1995; Lopez-Poveda and Meddis, 2001; Meddis and O'Mard, 2006; Wojtczak and Oxenham, 2009; Jennings et al., 2011).
The present study is concerned with an important phenomenon that has inspired considerable experimentation aimed at unearthing its physiological basis (Smith et al., 1986; Kohlrausch and Sander, 1995). Smith et al. (1986) compared the simultaneous masking of a pure tone by two maskers that had identical power spectra and consisted of N consecutive components of a harmonic complex. The components of the two maskers were summed in so-called positive and negative Schroeder phase (Schroeder, 1970), corresponding to values of +1 and −1 for the parameter C in Eq. 1, where N is the total number of components and θ_n is the phase of the nth component:
$$ \theta_n = C\pi n(n-1)/N \tag{1} $$
For sinusoidal signals longer than the period of the masker, the masker with the positive curvature (C = 1; “S+”) produced less masking than the one with the negative curvature (C = −1; “S−”). Similar findings were subsequently obtained by Kohlrausch and Sander (1995), who also found that thresholds for brief tones varied markedly throughout the period of the S+ but not the S− masker. They concluded that the S+ masker had a phase curvature opposite to that of the auditory filter centred on the signal frequency, and that the phase curvature of this filter must therefore be negative. They proposed that this led to components having roughly equal phase at the output of the auditory filter, leading to a “peaked” response. This result has since been replicated and extended to other levels and probe frequencies (Carlyon and Datta, 1997b; Summers and Leek, 1998; Summers, 2000; Oxenham and Dau, 2001a, b, 2004). Subsequent studies have obtained a more fine-grained estimate of auditory filter phase curvature by measuring masking for a number of values of C in Eq. 1 (Lentz and Leek, 2001; Oxenham and Dau, 2001b; Oxenham and Ewert, 2005).
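As an illustration of Eq. 1, the following minimal Python sketch synthesises one period of an equal-amplitude harmonic complex for a given value of C. It is not taken from any of the cited studies: the function names, the 100-Hz F0, the 48-kHz sampling rate, the choice of 20 components, and the assumption that n runs over harmonic numbers 1 to N are all illustrative. Under these assumptions, the S+ and S− waveforms have the same RMS level (as expected for identical power spectra), are time-reversed copies of each other, and have a much flatter envelope than a cosine-phase complex (C = 0).

```python
import math

def schroeder_complex(c, n_components=20, f0=100.0, fs=48000):
    """One period of an equal-amplitude harmonic complex with Eq. 1 phases.

    Assumptions (illustrative only): harmonics 1..N of f0 are used, cosine
    components are summed, and fs / f0 is an integer number of samples.
    """
    period = round(fs / f0)  # samples in one masker period
    phases = [c * math.pi * n * (n - 1) / n_components
              for n in range(1, n_components + 1)]
    wave = []
    for k in range(period):
        t = k / fs
        wave.append(sum(math.cos(2 * math.pi * n * f0 * t + phases[n - 1])
                        for n in range(1, n_components + 1)))
    return wave

def rms(x):
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def crest_factor(x):
    """Peak absolute amplitude divided by RMS; large values mean a peaky waveform."""
    return max(abs(v) for v in x) / rms(x)

s_plus = schroeder_complex(+1)   # C = +1, "S+"
s_minus = schroeder_complex(-1)  # C = -1, "S-"
cosine = schroeder_complex(0)    # C = 0, all components in cosine phase

for name, wave in [("S+", s_plus), ("S-", s_minus), ("cosine", cosine)]:
    print(f"{name}: rms = {rms(wave):.3f}, crest factor = {crest_factor(wave):.2f}")
```

Note that this sketch describes the acoustic waveforms only. The peaked response discussed above arises at the output of the auditory filter, whose own phase curvature is not modelled here.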
When the masker and signal are presented simultaneously, differences in masking produced by S+ and S− maskers could arise for several reasons. These include listening in the dips in the modulated neural response to the S+ masker and the operation of cochlear non-linearities such as the greater suppression of the probe tone by S− than by S+ maskers (Recio and Rhode, 2000; Oxenham and Dau, 2004). Carlyon and Datta (1997a) argued that forward masking would not be influenced by dip listening, and that a comparison of forward masking by S+ and S− complexes could provide an estimate of the average amount of excitation produced by each masker in the auditory filter centred on the signal and therefore of fast-acting compression in the auditory system. They reasoned that, when the auditory filter output was highly peaked, compression would reduce the amplitude of those peaks, and that this effect would be larger than for a stimulus where the auditory filter outputs had a flat envelope. Consistent with this prediction, forward-masked thresholds were substantially lower for the S+ masker than for the S− masker. The size of this difference was greatest at the highest masker level (69-dB SPL/component) tested and decreased at lower masker levels. Carlyon and Datta noted that this was consistent with a role for basilar-membrane compression, which is also reduced at low levels. However, they also noted that auditory filter bandwidths are narrowest at low levels, and that the peaked filter outputs require the interaction of many components and therefore a wide filter bandwidth. They additionally pointed out that their results could be influenced by fast-acting compression at any stage of auditory processing, provided that the compression was faster than the 10-ms period of their maskers and prior to the processing stage at which detection occurred. Subsequently, Gockel et al. (2003) reported greater forward masking by a random-phase than by a cosine-phase harmonic complex and interpreted their results in terms of the cosine-phase complex undergoing greater compression and, additionally, greater mutual suppression between the masker components. Because compression and suppression are manifestations of the same basilar membrane (BM) non-linearity, we will use the term “BM non-linearity” throughout most of the rest of this article and will discuss the relationship between the two phenomena in the “Discussion.”
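The reasoning that fast-acting compression reduces the average excitation produced by a peaked filter output more than that produced by a flat one can be illustrated with a toy calculation. The sketch below is not a model from any of the cited studies: it assumes an instantaneous power-law compression applied to envelope intensity, a hypothetical exponent of 0.2, and two artificial envelopes with the same mean intensity, one constant and one concentrated in 10 % of the period.

```python
def mean_compressed(intensity, exponent=0.2):
    """Average of the envelope intensity after instantaneous power-law compression.

    The exponent is an assumed, illustrative value (less than 1 means compressive).
    """
    return sum(i ** exponent for i in intensity) / len(intensity)

n = 100  # samples per masker period (arbitrary)

# Flat envelope: constant intensity of 1 throughout the period.
flat = [1.0] * n

# Peaky envelope: same mean intensity, but concentrated in 10 % of the period.
peaky = [10.0 if k < n // 10 else 0.0 for k in range(n)]

mean_flat = mean_compressed(flat)
mean_peaky = mean_compressed(peaky)

print(f"mean input intensity: flat = {sum(flat) / n:.2f}, peaky = {sum(peaky) / n:.2f}")
print(f"mean compressed output: flat = {mean_flat:.3f}, peaky = {mean_peaky:.3f}")
```

With these assumed values, the two envelopes have equal mean intensity before compression, but the peaky envelope yields a smaller mean after compression. This is the direction of the effect invoked above to explain lower forward-masked thresholds for the masker that produces the more peaked filter output; the size of the difference here depends entirely on the illustrative numbers chosen.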
More recently, Wojtczak and Oxenham (2009) proposed an alternative mechanism that could lead to effects of masker phase on forward-masked thresholds, without those phase effects necessarily influencing the amount of masker excitation at the signal frequency. They measured the effect of phase curvature for signal frequencies of 1, 2, and 6 kHz, both for on-frequency maskers, where the masker spectrum encompassed the signal frequency, and off-frequency maskers, where the frequencies of the masker components were all below the signal frequency. The results for the on-frequency maskers replicated and extended Carlyon and Datta’s (1997a) results. The off-frequency maskers also showed a strong effect of phase curvature but only for the 6-kHz signal frequency, even though the highest masker component was more than half an octave below that of the signal. This effect was reduced when the masker duration was reduced from 200 to 30 ms. Wojtczak and Oxenham (2009) argued that this off-frequency phase effect could not be explained by BM compression, because of evidence that the BM response at a given place is linear when driven by sufficiently lower-frequency components (e.g. Ruggero et al., 1997). They also dismissed an explanation in terms of fast-acting compression central to the BM, because this could not explain the fact that no phase effect was observed for off-frequency maskers at low signal frequencies. Instead, they attributed the effect of phase on off-frequency masking to the operation of the medial olivocochlear reflex (MOCR), whose activation was proposed to be dependent on the response of auditory nerve fibres tuned to the masker. They assumed that this activation was greater when that “internal response” had a flat envelope than when it had a peaky envelope, and that, for the 200-ms masker, this MOCR activation reduced the neural response to the signal. The fact that the phase of off-frequency maskers affected forward masking only at the 6-kHz signal frequency was explained by citing evidence for greater efferent activation at high frequencies (Kawase et al., 1993; Kawase and Liberman, 1993). Note that, according to this explanation, masker phase could affect the response to the signal without necessarily affecting the average amount of neural activity elicited by the masker in the auditory filter centred on the signal. Another potential mechanism, the middle-ear muscle (MEM) reflex, could in principle also allow masker phase to influence thresholds by virtue of its effect on masker excitation in a frequency region remote from that of the signal. A subsequent study (Wojtczak et al., 2015) combined behavioural measures with recordings of oto-acoustic emissions. The results were complex to interpret, and the authors concluded that there was no strong evidence for an effect of either the MOCR or the MEM reflex.
The present study provides a further investigation of the effects of phase curvature on non-simultaneous masking and, in particular, investigates the potential role of the efferent system. It imposes some further constraints on the possible physiological mechanisms and reaches conclusions that differ from those of Wojtczak and Oxenham (2009). Experiment 1 replicated their main findings, including the effect of masker phase curvature and duration on the off-frequency masking of a 6-kHz signal in forward masking. This off-frequency effect was variable across listeners but statistically significant. Experiment 2 re-examined the conclusion that off-frequency effects only occur at high signal frequencies. By manipulating the fundamental frequency (F0) of off-frequency maskers, it showed that a substantial phase effect can be obtained at lower signal frequencies, provided that there are enough components in the off-frequency masker. Experiment 3 measured the effect of phase curvature of an off-frequency masker on the detection of a 6-kHz signal in backward masking. The phase effect was also variable across listeners but correlated with the effect of phase on forward masking in the same group of listeners. Because the signal precedes the masker in backward masking, this phase effect cannot be due to the operation of the MOCR or of the MEM reflex, both of which would have to be activated by the masker. We conclude that although the exact mechanism underlying the effect of masker phase on forward-masked thresholds is not known, the data are consistent with a combination of the BM non-linearity and, possibly, more central compression.