Audio sampling rates and the Fourier transform
Christopher Montgomery (“Monty”) recently posted an excellent argument against distributing music in 192 kHz, 24-bit form, as opposed to the usual 44.1 kHz (or 48 kHz), 16-bit form. I think, however, that many of the people who are inclined to doubt this sort of thing are going to doubt it at a much more fundamental level than the level he’s addressed it at. And I don’t just mean the math-phobic; I know I would have doubted it, once. For years, and even after finishing an undergraduate degree in electrical engineering, I wondered whether speaking of signals in terms of their frequency content was really something that could be done as glibly and freely as everyone seemed to assume it could be. It’s an assumption that pervades Monty’s argument — for instance, when he states that “all signals with content entirely below the Nyquist frequency (half the sampling rate) are captured perfectly and completely by sampling”. If you don’t believe in speaking of signals in terms of their frequency content, you won’t know what to make of that sentence.
As it happens, the assumption is completely correct, and the glibness and freeness with which people talk of the frequency domain is completely justified; but it originally took some serious proving by mathematicians. To summarize the main results, first of all, the Fourier transform of a signal is unique. When you’ve found one series of sine waves and cosine waves that when added together are equal to your signal, there is no other; you’ve found the only one. (Fourier transforms are usually done in terms of complex exponentials, but when one is dealing with real signals, they all boil down to sines and cosines; the imaginary numbers disappear in the final results.) If you construct a signal from sinusoids of frequencies below 20 kHz, there’s no possibility of someone else analyzing it some other way and finding frequencies higher than that in it — unless, of course, he does it wrong (an ever-present danger).
Also, the Fourier representation is complete: any signal can be exactly represented as a sum of sinusoids (generally an infinite sum of them, or an integral which is the limit of an infinite sum of them). There are no signals out there which defy Fourier analysis, and which might be left out entirely when one speaks of the “frequency content” of a signal. Even signals that look nothing like sine waves can be constructed from sine waves, though in that case it takes more of them to approximate the signal well.
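As a rough illustration of that last point, here is a minimal sketch in plain Python that builds a square wave (about as un-sine-like as signals get) out of more and more sinusoids and measures how far the sum is from the real thing. The function names, the choice of a square wave, and the 2000-point grid are assumptions made for the example, not anything from Monty's essay.

```python
import math

def square_wave(t):
    """Ideal square wave of period 1: +1 on the first half-cycle, -1 on the second."""
    return 1.0 if (t % 1.0) < 0.5 else -1.0

def partial_sum(t, n_terms):
    """Sum of the first n_terms sinusoids in the square wave's Fourier series."""
    total = 0.0
    for i in range(n_terms):
        k = 2 * i + 1  # only odd harmonics appear in a square wave
        total += (4.0 / (math.pi * k)) * math.sin(2.0 * math.pi * k * t)
    return total

def rms_error(n_terms, n_points=2000):
    """Root-mean-square difference between the square wave and its partial sum."""
    acc = 0.0
    for j in range(n_points):
        t = (j + 0.5) / n_points  # midpoints, so no sample lands exactly on a jump
        acc += (square_wave(t) - partial_sum(t, n_terms)) ** 2
    return math.sqrt(acc / n_points)

term_counts = (1, 3, 10, 30, 100)
errors = [rms_error(n) for n in term_counts]
for n, e in zip(term_counts, errors):
    print(f"{n:4d} sinusoids: RMS error {e:.4f}")
```

The error shrinks steadily as sinusoids are added, but slowly: the sharp edges are what take so many terms to approximate.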
But the main thing that makes it possible to be so glib about the frequency domain is that the Fourier transform is orthogonal. (Or in its complex-exponential variants, unitary, which is the corresponding concept for complex numbers.) What it means for a transform to be orthogonal can be illustrated by the example of coordinate transforms in three-dimensional space. In general, a coordinate transform of a three-dimensional object may twist it, bend it, or stretch it, but an orthogonal transform can only rotate it and possibly flip it over to its mirror image. When viewing 3D objects on a computer screen, applying an orthogonal transform just results in looking at the same object from a different angle; it doesn’t fundamentally change the object. At most it might flip the ‘handedness’, changing a right hand into a left hand or vice versa. In the Fourier transform there are not just three numbers (the three coordinates) being transformed but an infinite number of them: one continuous function (the signal) is being transformed into another continuous function (its spectrum); but again, orthogonality means that sizes are preserved. The “size”, in this case, is the total energy of the signal (or its square root — what mathematicians call the L2 norm, and engineers call the root-mean-square). Applying that measure to the signal yields the same result as does applying the same measure to its spectrum. This means that one can speak of the energy in different frequency bands as being something that adds together to give the total energy, just as one speaks of the energy in different time intervals as being something that adds up to give the total energy — which of course is the same whether one adds it up in the time domain or the frequency domain. This also applies, of course, to differences between signals: if you make a change to a signal, the size of the change is the same in the frequency domain as in the time domain. 
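The discrete transform that computers use has the same property when scaled appropriately, which makes it easy to check numerically. The following is a minimal sketch, assuming a naive discrete Fourier transform with a 1/sqrt(N) scaling (the scaling that makes it unitary); the test signal and the size of the tweak are arbitrary choices for illustration.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform with the unitary 1/sqrt(N) scaling."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def energy(values):
    """Sum of squared magnitudes (the square of the L2 norm)."""
    return sum(abs(v) ** 2 for v in values)

n = 64
signal = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.cos(2 * math.pi * 10 * t / n)
          for t in range(n)]
tweaked = list(signal)
tweaked[5] += 0.01  # a small change to a single sample

spectrum = dft(signal)
spectrum_tweaked = dft(tweaked)

# Energy of the signal, measured in each domain.
time_energy = energy(signal)
freq_energy = energy(spectrum)

# Size of the change, measured in each domain.
time_change = energy([a - b for a, b in zip(tweaked, signal)])
freq_change = energy([a - b for a, b in zip(spectrum_tweaked, spectrum)])

print(time_energy, freq_energy)
print(time_change, freq_change)
```

Both pairs agree to within floating-point rounding: the energy is the same in either domain, and so is the size of the change.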
With a transform that was not orthogonal, a small change to the signal might mean a large change in its transform, or vice versa. This would make it much harder to work with the transform; you would constantly have to be looking over your shoulder to make sure that the math was not about to stab you in the back. As it is, it’s a reliable servant that can be taken for granted. As in the case of 3D coordinate transforms, but in a vaguer sense, the Fourier transform is just a different way of looking at the same signal (“looking at it in the frequency domain”), not something that warps or distorts it.
Engineers these days seem to go mostly by shared experience, in feeling comfortable with the Fourier transform: it hasn’t stabbed any of their fellow-professionals in the back, so it probably won’t do so for them, either. But as a student, I didn’t feel comfortable until I’d seen proofs of the results described above. In general, learning from experience means learning a lot of things the hard way; that just happens not to be so in this particular case: there are no unpleasant surprises lurking.
Now, when trying to use the Fourier transform on a computer, things do get somewhat more complicated, and there can be unpleasant surprises. Computers don’t naturally do the Fourier transform in its continuous-function version; instead they do discrete variants of it. When it comes to those discrete variants, it is possible to feed them a sine wave of a single frequency and get back an analysis saying that it contains not that frequency but all sorts of other frequencies: all you have to do is to make the original sine wave not be periodic on the interval you’re analyzing it on. But that is a practical problem for numerical programmers who want to use the Fourier transform in their algorithms; it’s not a problem with the continuous version of the Fourier transform, in which one always considers the entire signal, rather than chopping it at the beginning and end of some interval. It is that chopping which introduces the spurious frequencies; and in contexts where this results in a practical problem, there are usually ways to solve it, or at least greatly mitigate it; these commonly involve phasing the signal in and out slowly, rather than abruptly chopping it. In any case, it’s a limitation of computers doing Fourier transforms, not a limitation of computers playing audio from digital samples — a process which need not involve the computation of any Fourier transforms.
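The spurious frequencies, and the effect of phasing the signal in and out, can be seen in a small experiment. This is only a sketch: the naive transform, the 128-sample interval, the Hann window (one common choice of fade among many), and the "more than two bins from the peak" measure of leakage are all assumptions made for the example.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of the non-negative-frequency bins of a naive DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
            for k in range(n // 2 + 1)]

def leakage_fraction(mags, peak_halfwidth=2):
    """Fraction of spectral energy lying more than peak_halfwidth bins from the peak."""
    peak = max(range(len(mags)), key=lambda k: mags[k])
    total = sum(m * m for m in mags)
    far = sum(m * m for k, m in enumerate(mags) if abs(k - peak) > peak_halfwidth)
    return far / total

n = 128
# Exactly 8 cycles in the interval: periodic on it, so no chopping artifacts.
periodic = [math.sin(2 * math.pi * 8.0 * t / n) for t in range(n)]
# 8.5 cycles: the interval cuts the wave off mid-cycle.
chopped = [math.sin(2 * math.pi * 8.5 * t / n) for t in range(n)]
# The same 8.5 cycles, faded in and out with a Hann window.
hann = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
faded = [w * s for w, s in zip(hann, chopped)]

leak_periodic = leakage_fraction(dft_magnitudes(periodic))
leak_chopped = leakage_fraction(dft_magnitudes(chopped))
leak_faded = leakage_fraction(dft_magnitudes(faded))

print(f"periodic: {leak_periodic:.2e}")
print(f"chopped:  {leak_chopped:.2e}")
print(f"faded:    {leak_faded:.2e}")
```

The periodic sine lands in a single bin; the chopped one smears a noticeable share of its energy across distant bins; fading it in and out pulls most of that energy back toward the true frequency, at the cost of a somewhat wider peak.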
Much more could be said about the Fourier transform, of course, but the above are some of the main reasons why it is so useful in such a wide variety of applications (of which audio is just one).
Having explained why sentences like
“All signals with content entirely below the Nyquist frequency (half the sampling rate) are captured perfectly and completely by sampling”
are meaningful, and not merely some sort of mathematical shell game, I’ll add a few words about Monty’s essay itself. As regards the ability of modern computer audio systems to reproduce everything up to the Nyquist limit, I happen to have been sending sine waves through an audio card recently; and not any kind of fancy audio device, just five-year-old motherboard audio, albeit motherboard audio for which I’d paid a premium of something like $4 over a nearly-equivalent motherboard of the same brand with lesser audio. This particular motherboard audio does 192 kHz sample rates, and I was testing it with sine waves of up to the Nyquist frequency (96 kHz). Graphed in Audacity, which shows signals by drawing straight lines between the sample points, the signals looked very little like a sine wave. But when I looked at the output on an oscilloscope with a much higher sample rate, it was a perfect sine wave. Above 75 kHz, the signal’s amplitude started decreasing, until at 90 kHz it was only about a third of normal; but it still looked like a perfect sine wave. Reproducing a sine wave given only two or three points per wavelength is something of a trick, but it’s a trick my system can and does pull off, exactly as per Monty’s claims. Accurate reproduction of things only dogs can hear, in case one wants to torture the neighborhood pooch with extremely precise torturing sounds! (Or in my case, in case one wants to do some capacitor ESR testing.)
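The difference between Audacity's straight lines and what the oscilloscope shows can be imitated in a few lines of Python. This is an idealized sketch, not a model of any real audio hardware: it assumes a 75 kHz tone sampled at 192 kHz, and stands in for the card's reconstruction filter with textbook sinc (Whittaker-Shannon) interpolation over a finite window of 2001 samples, so a small truncation error remains.

```python
import math

SAMPLE_RATE = 192_000.0  # Hz
TONE = 75_000.0          # Hz, below the 96 kHz Nyquist frequency

def tone(t):
    """The 'true' continuous signal at time t (seconds)."""
    return math.sin(2.0 * math.pi * TONE * t)

def sinc(x):
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def sinc_interpolate(samples, first_index, u):
    """Whittaker-Shannon interpolation at time u, measured in sample periods."""
    return sum(s * sinc(u - (first_index + i)) for i, s in enumerate(samples))

def linear_interpolate(samples, first_index, u):
    """Straight line between the two neighbouring samples, as in a waveform plot."""
    i = math.floor(u) - first_index
    frac = u - math.floor(u)
    return samples[i] * (1.0 - frac) + samples[i + 1] * frac

HALF_SPAN = 1000
samples = [tone(n / SAMPLE_RATE) for n in range(-HALF_SPAN, HALF_SPAN + 1)]

# Compare both reconstructions with the true tone halfway between samples,
# near the middle of the window where truncation of the sinc sum matters least.
check_points = [k + 0.5 for k in range(-10, 10)]
sinc_err = max(abs(sinc_interpolate(samples, -HALF_SPAN, u) - tone(u / SAMPLE_RATE))
               for u in check_points)
linear_err = max(abs(linear_interpolate(samples, -HALF_SPAN, u) - tone(u / SAMPLE_RATE))
                 for u in check_points)

print(f"worst sinc-interpolation error:   {sinc_err:.4f}")
print(f"worst linear-interpolation error: {linear_err:.4f}")
```

The straight-line version is off by a large fraction of the amplitude, which is why the plot looks nothing like a sine wave; the sinc version recovers the tone closely, with the residue coming from using finitely many samples.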
The limits of audio perception are not something where I’ve looked into the literature much, but I have no reason to doubt what Monty says about it. Something I did wonder, after reading his essay, though, was: what about intermodulation distortion in the ear itself? That is, distortion of the same sort that he describes in amplifiers and speakers. Being made of meat, the human ear is not perfectly linear; and pretty much any nonlinearity gives some amount of intermodulation distortion. Unlike in the case of intermodulation distortion in audio equipment, though, this would be natural intermodulation distortion: if, for instance, one heard a violin being played in the same room, one would be hearing whatever intermodulation distortion resulted in the ear from its ultrasonic frequencies; those would thus comprise part of the natural sound of a violin, and reproducing them thus could be useful. Also, nonlinearities can be complicated: any given audio sample might not excite some particular nonlinearities that might nevertheless be excited by a different sort of music. But as the hypothetical language (“could”, “would”) indicates, these are theoretical possibilities, which can be put to rest by appropriate experiments. Monty links to one such test, which was “constructed to maximize the possibility of detection by placing the intermodulation products where they’d be most audible”, and which nevertheless found that ultrasonics made no audible difference. I only took note of that sentence on re-reading; but this nonlinearity-in-the-ear idea is what that test was designed to check for.
Poking around at the Hydrogen Audio forums, the explanation for why nonlinearity in the ear doesn’t produce audible lower frequencies seems to be that:
- Ultrasonics get highly attenuated in the outer parts of the ear, before they could do much in the way of intermodulation distortion. (It’s quite common for higher frequencies to get attenuated more, even in air; this is why a nearby explosion is heard as a “crack”, but a far-off one is more of a boom.)
- Intermodulation distortion then imposes a further attenuation, the spurious frequencies introduced by distortion having much less energy than the original frequencies.
- Generally in music the ultrasonic parts are at a lower volume than the audible parts to begin with.
Multiply these three effects together, or even just the first two of them, and perhaps one always gets something too small to be heard. In any case, as Monty states, it’s impossible to absolutely prove that nobody can hear ultrasonics even in the most specially-constructed audio tracks. But when one is considering this sort of thing as a commercial proposition, the question is not whether exceptional freaks might exist, but what the averages are.
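How the first two effects compound can be seen in a toy model. The sketch below is emphatically not a model of the ear: it assumes two hypothetical ultrasonic tones at 30 kHz and 33 kHz, an arbitrary attenuation factor, and a weak quadratic nonlinearity y = x + a·x², and then measures the 3 kHz difference tone that the nonlinearity creates.

```python
import math

def tone_amplitude(signal, freq, sample_rate):
    """Amplitude of the component at freq, by correlating with cosine and sine."""
    n = len(signal)
    c = sum(s * math.cos(2 * math.pi * freq * i / sample_rate) for i, s in enumerate(signal))
    q = sum(s * math.sin(2 * math.pi * freq * i / sample_rate) for i, s in enumerate(signal))
    return 2.0 * math.hypot(c, q) / n

def difference_tone(attenuation, nonlinearity, sample_rate=192_000, n=1920):
    """Amplitude of the audible f2 - f1 product from two unit-amplitude ultrasonic tones.

    attenuation  -- hypothetical gain applied before the nonlinearity
    nonlinearity -- hypothetical coefficient a in y = x + a * x**2
    """
    f1, f2 = 30_000.0, 33_000.0  # hypothetical ultrasonic tones
    out = []
    for i in range(n):  # 1920 samples = 10 ms, a whole number of cycles of every product
        t = i / sample_rate
        x = attenuation * (math.cos(2 * math.pi * f1 * t) + math.cos(2 * math.pi * f2 * t))
        out.append(x + nonlinearity * x * x)
    return tone_amplitude(out, f2 - f1, sample_rate)

full = difference_tone(attenuation=1.0, nonlinearity=0.01)
reduced = difference_tone(attenuation=0.1, nonlinearity=0.01)
print(f"difference tone, no attenuation:  {full:.6f}")
print(f"difference tone, 10x attenuation: {reduced:.6f}")
```

With this kind of nonlinearity the difference tone has amplitude a times the square of the input amplitude, so attenuating the ultrasonics tenfold before they reach the nonlinearity cuts the audible product a hundredfold, on top of the nonlinearity's own small coefficient.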
(Update: Monty tells me that contrary to what I’d originally stated above, “by most measures the ear is quite linear”, and “exhibits low harmonic distortion figures and so virtually no intermodulation.” The text above has been corrected accordingly. I’d seen references to nonlinearity in the hair cells; and it’d be hard to avoid it in neurons; but those are after the frequencies have been sorted out.)
The book Democracy in America, by Alexis de Tocqueville, is widely recommended to those wishing to know about the US political system. Personally, I tried to read it at one point, but found it boring, and only got through fifty pages or so. Yet I devoured from cover to cover Tocqueville’s later book The Old Regime and the Revolution, about pre-revolutionary France. It’s a much better book, in a lot of ways.
Tocqueville was a young man when he wrote Democracy in America, after spending eighteen months traveling through America and talking to the best-informed people he could find. Though probably wiser at age 26 (when he started his journey) than 99.9% of people twice that age, he was still no match for his later self; in the later book, he alludes several times to errors of youthful enthusiasm that he committed in the earlier one. Also, in the earlier book, he was writing about a foreign country, not about his own. The language and mode of expression were not his native ones; fine nuances and things that were left unsaid must have escaped him in some cases. His later book, besides being about his own country, was researched in much greater detail; he delved deeply into formerly-private government records, much as modern historical researchers do. The book is heavily footnoted.
But it’s not just that it’s a better book in the abstract; it’s a lot more relevant than the earlier book to the US political system as it is today. In writing of the old regime of France, he was writing of a formerly decentralized system that had gradually, over the century or so prior to the Revolution, turned into a centralized one, in which bureaucrats from Paris (or answering to Paris) poked into all manner of details of people’s lives. The change had been little remarked, since the old institutions of local control had been left intact, but had been bypassed. Before Tocqueville’s book appeared, Frenchmen had been in the habit of speaking of centralization as one of the benefits of the Revolution, but he showed that that particular change was more in appearance than in reality.
That is the resemblance in the large; in details, there are also a surprising number of resemblances. The courts of the old regime, for instance, he describes as increasingly interjecting themselves into politics, yet on the other hand increasingly abandoning their role as legal arbiters to special administrative courts. (That this is a resemblance may come as a surprise to readers who are unfamiliar with the number of special court systems established in the US government today, and the number of “administrative hearings” of various sorts that are conducted. Tax courts, which resolve matters relating to the IRS, are perhaps the most widely known; but even the National Transportation Safety Board has its own courts with its own judges.) Tocqueville also describes the middle classes, in pre-revolutionary society, as being divided into squabbling groups, each trying to chisel favors out of the government in its own way, but more similar to one another than they realized.
Of course there are unsurprising resemblances too, such as that the old regime was a system that, to support itself, levied an increasing number and variety of taxes, and nevertheless was going bankrupt. But those are not what make the book interesting.
Taking Advantage of the Placebo Effect
The placebo effect is well known for interfering with medical experiments. It’s not just that if you tell patients that a drug is going to have an effect they tend to believe it has had that effect. It’s that it tends to actually have that effect, when measured by objective measures such as blood tests. Thus the use of double-blindness in experiments, where not only the patients but also the doctors dealing with them have no idea whether they’ve been given the active drug or a placebo. Something about wishing really does make it so, somewhat, when it comes to health; the placebo effect is not just in patients’ minds but also in their bodies.
But besides the nuisances it causes to experimenters, the placebo effect can also be taken advantage of. Doctors sometimes prescribe drugs as placebos; since it genuinely helps the patients, it’s hard to argue with that practice. It’s deception, but in a good cause. (The deception had better not be too obvious, though, or it’ll do no good. Indeed, if the doctor himself believes in the placebo, that’s best.) Christian Scientists use nothing but the placebo effect. But what’s a skeptic to do, to get some of this goodness? If a doctor prescribes something for me, I’m going to look it up on the net and find out how it works; if it doesn’t, that’ll be apparent, and the fact that the doctor prescribed it will not impress me. As for faith healers, starting by insulting one’s own intelligence doesn’t seem like the way to proceed in harnessing the powers of the mind. So what to do?
I believe in the placebo effect directly. I cut out the middlemen, and the foolishness, and just take the thing straight. The placebo effect is going to help me because I know it will; because it is an established principle of medical science that it will if I believe it will; and I do so believe. Whatever boost my mind can give to my body, in getting better from whatever ailments might afflict me (not particularly much, at the moment), it’ll give.
(Update, June 3 2016: If it seems a bit odd to you that this effect can be gamed in such a way, you’ll be relieved to learn that it’s not all that powerful in the first place; most of what people call “the placebo effect” is just regression to the mean.)
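For anyone unfamiliar with regression to the mean, a small simulation shows how it can masquerade as a treatment effect. Everything here is made up for illustration: the severity scale, the amount of day-to-day noise, and the rule of enrolling the worst-scoring tenth are assumptions, and nobody in the simulation receives any treatment at all.

```python
import random

def simulate(n_patients=20000, seed=1):
    """Return (mean score at enrollment, mean score at follow-up) for the enrolled group."""
    rng = random.Random(seed)
    # Each simulated patient has a stable underlying severity...
    baseline = [rng.gauss(50.0, 10.0) for _ in range(n_patients)]
    # ...plus independent day-to-day noise on each measurement.
    first = [b + rng.gauss(0.0, 10.0) for b in baseline]
    second = [b + rng.gauss(0.0, 10.0) for b in baseline]
    # Enroll only those who looked sickest on the first measurement (worst 10%).
    cutoff = sorted(first)[int(0.9 * n_patients)]
    enrolled = [i for i in range(n_patients) if first[i] >= cutoff]
    before = sum(first[i] for i in enrolled) / len(enrolled)
    after = sum(second[i] for i in enrolled) / len(enrolled)
    return before, after

before, after = simulate()
print(f"mean score when enrolled: {before:.1f}")
print(f"mean score at follow-up:  {after:.1f}")
```

The enrolled group scores noticeably better at follow-up simply because people were selected on a day when their noise happened to run high, and the next measurement draws fresh noise.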