<p>Norman's blog (http://yarchive.net/blog/physics/index.atom), by Norman Yarvin. Copyright 2011-2012 Norman Yarvin.</p>
<p>Audio sampling rates and the Fourier transform (2012-04-22, http://yarchive.net/blog/2012/04/21/fourier_transform)</p>
<p>Christopher Montgomery (“Monty”) recently posted an <a href="http://people.xiph.org/~xiphmont/demo/neil-young.html">excellent
argument</a> against distributing music in 192 kHz, 24-bit form, as
opposed to the usual 44.1 kHz (or 48 kHz), 16-bit form. I think,
however, that many of the people who are inclined to doubt this sort of
thing are going to doubt it at a much more fundamental level than the
level he’s addressed it at. And I don’t just mean the math-phobic; I
know I would have doubted it, once. For years, and even after finishing
an undergraduate degree in electrical engineering, I wondered whether
speaking of signals in terms of their frequency content was really
something that could be done as glibly and freely as everyone seemed to
assume it could be. It’s an assumption that pervades Monty’s argument —
for instance, when he states that “all signals with content entirely
below the Nyquist frequency (half the sampling rate) are captured
perfectly and completely by sampling”. If you don’t believe in speaking
of signals in terms of their frequency content, you won’t know what to
make of that sentence.</p>
<p>As it happens, the assumption is completely correct, and the glibness and
freeness with which people talk of the frequency domain is completely
justified; but it originally took some serious proving by mathematicians.
To summarize the main results, first of all, the Fourier transform of a
signal is unique. When you’ve found one series of sine waves and cosine
waves that when added together are equal to your signal, there is no
other; you’ve found the only one. (Fourier transforms are usually done
in terms of <a href="http://en.wikipedia.org/wiki/Euler%27s_formula">complex exponentials</a>, but when one is dealing with real
signals, they all boil down to sines and cosines; the imaginary numbers
disappear in the final results.) If you construct a signal from
sinusoids of frequencies below 20 kHz, there’s no possibility of someone
else analyzing it some other way and finding frequencies higher than that
in it — unless, of course, he does it wrong (an ever-present danger).</p>
<p>Also, the Fourier representation is <em>complete</em>: any signal can be exactly
represented as a sum of sinusoids (generally an infinite sum of them, or
an integral which is the limit of an infinite sum of them). There are no
signals out there which defy Fourier analysis, and which might be left
out entirely when one speaks of the “frequency content” of a signal.
Even signals that look nothing like sine waves can be constructed from
sine waves, though in that case it takes more of them to approximate the
signal well.</p>
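<p>As a rough illustration of that last point, here is a minimal Python sketch (a toy example, not taken from Monty's essay; the function names are made up for the purpose). It builds a square wave, which looks nothing like a sine wave, from its odd sine harmonics, and measures the RMS error as more terms are added.</p>

```python
import math

def square_wave(t):
    """Ideal square wave with period 1: +1 on the first half, -1 on the second."""
    return 1.0 if (t % 1.0) < 0.5 else -1.0

def partial_sum(t, n_terms):
    """Sum of the first n_terms odd sine harmonics of the square wave."""
    total = 0.0
    for k in range(n_terms):
        h = 2 * k + 1  # harmonic number: 1, 3, 5, ...
        total += (4.0 / math.pi) * math.sin(2.0 * math.pi * h * t) / h
    return total

def rms_error(n_terms, n_points=2000):
    """RMS difference between the square wave and its partial Fourier sum."""
    err2 = 0.0
    for i in range(n_points):
        t = (i + 0.5) / n_points  # midpoints, so no sample sits on a jump
        d = square_wave(t) - partial_sum(t, n_terms)
        err2 += d * d
    return math.sqrt(err2 / n_points)

# RMS error with 1, 10 and 100 sine terms
errors = [rms_error(n) for n in (1, 10, 100)]
```

<p>The error keeps shrinking as terms are added, but slowly: a signal with sharp jumps needs many sinusoids before the approximation gets good.</p>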
<p>But the main thing that makes it possible to be so glib about the
frequency domain is that the Fourier transform is <em>orthogonal</em>. (Or in
its complex-exponential variants, <em>unitary</em>, which is the corresponding
concept for complex numbers.) What it means for a transform to be
orthogonal can be illustrated by the example of coordinate transforms in
three-dimensional space. In general, a coordinate transform of a
three-dimensional object may twist it, bend it, or stretch it, but an
orthogonal transform can only rotate it and possibly flip it over to its
mirror image. When viewing 3D objects on a computer screen, applying an
orthogonal transform just results in looking at the same object from a
different angle; it doesn’t fundamentally change the object. At most it
might flip the ‘handedness’, changing a right hand into a left hand or
vice versa. In the Fourier transform there are not just three numbers
(the three coordinates) being transformed but an infinite number of them:
one continuous function (the signal) is being transformed into another
continuous function (its spectrum); but again, orthogonality means that
sizes are preserved. The “size”, in this case, is the total energy of
the signal (or its square root — what mathematicians call the
<a href="http://mathworld.wolfram.com/L2-Norm.html"><em>L<sup>2</sup></em> norm</a>, and engineers call the
root-mean-square). Applying that measure to the signal yields the same
result as does applying the same measure to its spectrum. This means
that one can speak of the energy in different frequency bands as being
something that adds together to give the total energy, just as one speaks
of the energy in different time intervals as being something that adds up
to give the total energy — which of course is the same whether one adds
it up in the time domain or the frequency domain. This also applies, of
course, to differences between signals: if you make a change to a signal,
the size of the change is the same in the frequency domain as in the time
domain. With a transform that was not orthogonal, a small change to the
signal might mean a large change in its transform, or vice versa. This
would make it much harder to work with the transform; you would
constantly have to be looking over your shoulder to make sure that the
math was not about to stab you in the back. As it is, it’s a reliable
servant that can be taken for granted. As in the case of 3D coordinate
transforms, but in a vaguer sense, the Fourier transform is just a
different way of looking at the same signal (“looking at it in the
frequency domain”), not something that warps or distorts it.</p>
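<p>The size-preserving property can be checked numerically on the discrete version of the transform. The following is a minimal sketch, assuming a naive discrete Fourier transform scaled by 1/sqrt(n) so that it is unitary; the test signal and all the names are illustrative choices, not anything from Monty's essay.</p>

```python
import cmath
import math

def unitary_dft(x):
    """Naive O(n^2) discrete Fourier transform with 1/sqrt(n) scaling (unitary)."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def energy(v):
    """Sum of squared magnitudes (the squared L2 norm)."""
    return sum(abs(c) ** 2 for c in v)

n = 64
signal = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.cos(2 * math.pi * 10 * t / n)
          for t in range(n)]
changed = list(signal)
changed[5] += 0.01  # a small change made in the time domain

spec_signal = unitary_dft(signal)
spec_changed = unitary_dft(changed)

time_energy = energy(signal)        # energy added up over time samples
freq_energy = energy(spec_signal)   # energy added up over frequency bins
time_diff = energy([a - b for a, b in zip(signal, changed)])
freq_diff = energy([a - b for a, b in zip(spec_signal, spec_changed)])
```

<p>Up to rounding error, <code>time_energy</code> equals <code>freq_energy</code>, and the size of the change is the same whichever domain it is measured in.</p>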
<p>Engineers these days seem to go mostly by shared experience, in feeling
comfortable with the Fourier transform: it hasn’t stabbed any of their
fellow-professionals in the back, so it probably won’t do so for them,
either. But as a student, I didn’t feel comfortable until I’d seen
proofs of the results described above. In general, learning from
experience means learning a lot of things the hard way; that just happens
not to be so in this particular case: there are no unpleasant surprises
lurking.</p>
<p>Now, when trying to use the Fourier transform on a computer, things do
get somewhat more complicated, and there can be unpleasant surprises.
Computers don’t naturally do the Fourier transform in its
continuous-function version; instead they do discrete variants of it.
When it comes to those discrete variants, it is possible to feed them a
sine wave of a single frequency and get back an analysis saying that it
contains not that frequency but all sorts of other frequencies: all you
have to do is to make the original sine wave not be periodic on the
interval you’re analyzing it on. But that is a practical problem for
numerical programmers who want to use the Fourier transform in their
algorithms; it’s not a problem with the continuous version of the Fourier
transform, in which one always considers the entire signal, rather than
chopping it at the beginning and end of some interval. It is that
chopping which introduces the spurious frequencies; and in contexts where
this results in a practical problem, there are usually ways to solve it,
or at least greatly mitigate it; these commonly involve phasing the
signal in and out slowly, rather than abruptly chopping it. In any case,
it’s a limitation of computers doing Fourier transforms, not a limitation
of computers playing audio from digital samples — a process which need
not involve the computation of any Fourier transforms.</p>
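<p>The spurious frequencies are easy to reproduce. Here is a minimal sketch using a naive discrete Fourier transform; the Hann window stands in for “phasing the signal in and out slowly”, and that choice, along with the bin counts and function names, is an assumption made for the example.</p>

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of the naive DFT for bins 0..n/2."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def far_leakage(x, peak_bin, guard=4):
    """Fraction of spectral energy lying more than `guard` bins from peak_bin."""
    mags = dft_magnitudes(x)
    total = sum(m * m for m in mags)
    far = sum(m * m for k, m in enumerate(mags) if abs(k - peak_bin) > guard)
    return far / total

n = 128
# Exactly 10 cycles fit in the interval: periodic on it.
periodic = [math.sin(2 * math.pi * 10.0 * t / n) for t in range(n)]
# 10.5 cycles: the sine is chopped mid-cycle at the end of the interval.
chopped = [math.sin(2 * math.pi * 10.5 * t / n) for t in range(n)]
# Hann window: fades the signal in and out instead of chopping it abruptly.
hann = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
windowed = [c * w for c, w in zip(chopped, hann)]

leak_periodic = far_leakage(periodic, 10)
leak_chopped = far_leakage(chopped, 10)
leak_windowed = far_leakage(windowed, 10)
```

<p>The periodic sine puts essentially nothing into distant bins; the chopped one spreads a noticeable fraction of its energy across the spectrum; and fading it in and out cuts that fraction by a large factor, at the cost of a somewhat wider main peak.</p>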
<p>Much more could be said about the Fourier transform, of course, but the
above are some of the main reasons why it is so useful in such a wide
variety of applications (of which audio is just one).</p>
<p>Having explained why sentences like</p>
<blockquote>
<p>“All signals with content entirely below the Nyquist frequency
(half the sampling rate) are captured perfectly and completely by
sampling”</p>
</blockquote>
<p>are meaningful, and not merely some sort of mathematical shell game, a
few words about Monty’s essay itself. As regards the ability of modern
computer audio systems to reproduce everything up to the Nyquist limit, I
happen to have been sending sine waves through an audio card recently —
and not any kind of fancy audio device, just five-year-old motherboard
audio, albeit motherboard audio for which I’d paid a premium of something
like $4 over a nearly-equivalent motherboard of the same brand with
lesser audio. This particular motherboard audio does 192 kHz sample
rates, and I was testing it with sine waves of up to the Nyquist
frequency (96 kHz). Graphed in <a href="http://audacity.sourceforge.net/">Audacity</a>, which shows signals by
drawing straight lines between the sample points, the signals looked very
little like a sine wave. But when I looked at the output on an
oscilloscope with a much higher sample rate, it was a perfect sine wave.
Above 75 kHz, the signal’s amplitude started decreasing, until at 90 kHz
it was only about a third of normal; but it still looked like a perfect
sine wave.  Reproducing a sine wave given fewer than three points per
wavelength is something of a trick, but it’s a trick my system can and
does pull off, exactly as per Monty’s claims. Accurate reproduction of
things only dogs can hear, in case one wants to torture the neighborhood
pooch with extremely precise torturing sounds! (Or in my case, in case
one wants to do some capacitor ESR testing.)</p>
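<p>The trick can be imitated in software. What follows is a minimal sketch of ideal (Whittaker-Shannon, or “sinc”) reconstruction, compared with the straight-line drawing that a waveform display does. It is not a model of what the audio hardware actually does internally; the 75 kHz test tone, the truncation length, and the function names are assumptions made for the example.</p>

```python
import math

FS = 192_000.0      # sample rate in Hz
F_TONE = 75_000.0   # test tone in Hz, below the 96 kHz Nyquist frequency
HALF_SPAN = 4000    # samples kept on each side of the region inspected

def tone(t):
    """The 'analog' signal: a unit sine wave at F_TONE, with t in seconds."""
    return math.sin(2.0 * math.pi * F_TONE * t)

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

# The only thing the reconstructions get to see: the sampled values.
samples = {n: tone(n / FS) for n in range(-HALF_SPAN, HALF_SPAN + 1)}

def sinc_reconstruct(t):
    """Whittaker-Shannon interpolation, truncated to the stored samples."""
    return sum(v * sinc(FS * t - n) for n, v in samples.items())

def linear_reconstruct(t):
    """Straight lines between neighboring samples."""
    pos = FS * t
    n0 = math.floor(pos)
    frac = pos - n0
    return (1.0 - frac) * samples[n0] + frac * samples[n0 + 1]

# Compare both reconstructions halfway between samples, near the center.
test_times = [(n + 0.5) / FS for n in range(-20, 20)]
sinc_err = max(abs(sinc_reconstruct(t) - tone(t)) for t in test_times)
linear_err = max(abs(linear_reconstruct(t) - tone(t)) for t in test_times)
```

<p>The straight-line version is off by a large fraction of the amplitude, which is why the graph looks so little like a sine wave. The sinc version is off only by a small truncation error, which shrinks further as more samples are included.</p>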
<p>The limits of audio perception are not something where I’ve looked into
the literature much, but I have no reason to doubt what Monty says about
it. Something I did wonder, after reading his essay, though, was: what
about intermodulation distortion in the ear itself? That is, distortion
of the same sort that he describes in amplifiers and speakers. Being
<a href="http://www.terrybisson.com/page6/page6.html">made of meat</a>, the human ear is <del>far from</del><ins>not</ins>
perfectly linear; and pretty much any nonlinearity gives some amount of
intermodulation distortion. Unlike in the case of intermodulation
distortion in audio equipment, though, this would be natural
intermodulation distortion: if, for instance, one heard a violin being
played in the same room, one would be hearing whatever intermodulation
distortion resulted in the ear from its ultrasonic frequencies; those
would thus comprise part of the natural sound of a violin, and
reproducing them thus could be useful. Also, nonlinearities can be
complicated: any given audio sample might not excite some particular
nonlinearities that might nevertheless be excited by a different sort of
music. But as the hypothetical language (“could”, “would”) indicates,
these are theoretical possibilities, which can be put to rest by
appropriate experiments.  Indeed, a test Monty links to was
“constructed to maximize the possibility of detection by placing the
intermodulation products where they’d be most audible”, and it
nevertheless found that ultrasonics made no audible difference.  I only
took note of that sentence on re-reading; but this
nonlinearity-in-the-ear idea is what that test was designed to check for.</p>
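<p>To make the mechanism concrete, here is a minimal sketch of how a nonlinearity turns two ultrasonic tones into an audible difference tone. The quadratic distortion term and its strength are arbitrary choices for illustration, not a model of the ear or of any real equipment, and the tone frequencies are likewise made up.</p>

```python
import cmath
import math

FS = 192_000              # sample rate in Hz
N = 192                   # one millisecond, so every tone here fits whole cycles
F1, F2 = 24_000, 27_000   # two ultrasonic tones in Hz (arbitrary choices)
A2 = 0.1                  # strength of the quadratic nonlinearity (made-up value)

def amplitude_at(signal, freq):
    """Amplitude of the sinusoidal component of `signal` at `freq` Hz."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * n / FS)
              for n, s in enumerate(signal))
    return 2.0 * abs(acc) / len(signal)

# Two pure ultrasonic tones, and the same signal after a mild nonlinearity.
clean = [math.sin(2 * math.pi * F1 * n / FS) + math.sin(2 * math.pi * F2 * n / FS)
         for n in range(N)]
distorted = [x + A2 * x * x for x in clean]

diff_freq = F2 - F1  # 3 kHz, well inside the audible range
amp_clean = amplitude_at(clean, diff_freq)
amp_distorted = amplitude_at(distorted, diff_freq)
```

<p>The clean signal has nothing at 3 kHz. The distorted one has a 3 kHz component whose amplitude equals the nonlinearity strength, since the product of the two sines contains a cosine at the difference frequency.</p>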
<p>Poking around at the Hydrogen Audio forums, the explanation for why
nonlinearity in the ear doesn’t produce audible lower frequencies seems
to be that:</p>
<ul>
<li>Ultrasonics get highly attenuated in the outer parts of the ear, before
they could do much in the way of intermodulation distortion. (It’s quite
common for higher frequencies to get attenuated more, even in air; this
is why a nearby explosion is heard as a “crack”, but a far-off one is
more of a boom.)</li>
<li>Intermodulation distortion then imposes a further attenuation, the
spurious frequencies introduced by distortion having much less energy
than the original frequencies.</li>
<li>Generally in music the ultrasonic parts are at a lower volume than the
audible parts to begin with.</li>
</ul>
<p>Multiply these three effects together, or even just the first two of
them, and perhaps one always gets something too small to be heard. In
any case, as Monty states, it’s impossible to absolutely prove that
nobody can hear ultrasonics even in the most specially-constructed audio
tracks. But when one is considering this sort of thing as a commercial
proposition, the question is not whether exceptional freaks might exist,
but what the averages are.</p>
<p>(Update: Monty tells me that contrary to what I’d originally stated
above, “by most measures the ear is quite linear”, and “exhibits low
harmonic distortion figures and so virtually no intermodulation.” The
text above has been corrected accordingly. I’d seen references to
nonlinearity in the hair cells; and it’d be hard to avoid it in neurons;
but those are after the frequencies have been sorted out.)</p>
<p>Entropy is not chaos (2011-10-05, http://yarchive.net/blog/2011/10/05/entropy_is_not_chaos)</p>
<p>Mediocre physics teachers who are trying to explain the concept of
entropy often say that entropy is a sort of measure of chaos, with
increases in entropy meaning increased chaos. I found that claim
confusing from the first time I heard it; once I got a grip on the
concept of entropy, I realized that it’s simply false: entropy has little
to do with chaos. Consider, for instance, a bucket into which
different-color paints have been slopped, forming a chaotic mess of
colors.  That mess has less entropy than it will after you mix it to an
orderly, uniform color, which is the opposite of the way the
entropy-means-chaos idea would have it. Likewise, a room filled with a
chaotic mixture of air at different temperatures has less entropy than it
will after the temperatures all equilibrate to the same value. Or take a
situation in which you have two cylinders, one filled with air and the
other evacuated, and connected by a pipe with a valve. Once you open the
valve, half the air will rush from the full cylinder to the empty; this
will increase the entropy. But which situation is more chaotic than the
other? Relative to the everyday meaning of chaos, it’d be hard to say.</p>
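<p>Two of these examples can be put into numbers with standard textbook formulas. This is a minimal sketch under idealizing assumptions that go beyond the text: the air is treated as an ideal gas, the two cylinders have equal volume, and the two parcels of air have equal, constant heat capacity. The particular quantities are made up.</p>

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)

def free_expansion_entropy(n_mol, v_initial, v_final):
    """Entropy change of an ideal gas expanding into vacuum (temperature unchanged)."""
    return n_mol * R * math.log(v_final / v_initial)

def equilibration_entropy(heat_capacity, t_a, t_b):
    """Entropy change when two equal bodies at t_a and t_b (in kelvin) equilibrate."""
    t_final = 0.5 * (t_a + t_b)  # equal heat capacities meet in the middle
    return heat_capacity * (math.log(t_final / t_a) + math.log(t_final / t_b))

# One mole of gas doubling its volume when the valve is opened.
ds_expansion = free_expansion_entropy(1.0, 1.0, 2.0)
# Two parcels of air, 100 J/K each, starting at 280 K and 300 K.
ds_equilibrate = equilibration_entropy(100.0, 280.0, 300.0)
```

<p>Both results come out positive, in joules per kelvin: about 5.76 for the expansion and about 0.12 for the equilibration. The more uniform final state is the one with more entropy.</p>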
<p>As for what entropy is, if it’s not chaos — well, as with other things
in physics, a definition could be given simply enough, but wouldn’t mean
much to anyone who didn’t already know how to put it in context. (“The
logarithm of <em>what</em>?”) The concept takes a lot of understanding; I
didn’t really get a grip on it until I spent a lot of quality time with
Enrico Fermi’s book <em>Thermodynamics</em>. That book explains it probably as
simply as it can be explained, but it’s still not easy.</p>
<p>It’s a worthwhile concept, though. One can get the impression from
casual physics talk that entropy is only good for making gloomy
statements about the heat death of the universe, and how everything is
doomed to run down and deteriorate. (Or in the above case, how it’s
easier to mix paints than to unmix them.) There is that aspect of it,
but entropy is also a practical tool.  Using it, one can, for instance,
derive the Clausius-Clapeyron equation, which relates the vapor pressure
of a liquid to its heat of vaporization. Or one can use it to calculate
the exhaust velocity of a rocket engine, under the assumption of shifting
equilibrium.</p>
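<p>As a small taste of the practical side, here is a minimal sketch of the Clausius-Clapeyron equation in its simplest integrated form, applied to water. It assumes the vapor is an ideal gas, the liquid's volume is negligible, and the heat of vaporization is constant; the heat of vaporization used is an approximate handbook figure.</p>

```python
import math

R = 8.314462618      # molar gas constant, J/(mol K)
L_VAP = 40_650.0     # approximate molar heat of vaporization of water, J/mol
T_BOIL = 373.15      # normal boiling point of water, K
P_BOIL = 101_325.0   # vapor pressure at T_BOIL, Pa

def vapor_pressure(t_kelvin):
    """Integrated Clausius-Clapeyron relation with constant heat of vaporization."""
    return P_BOIL * math.exp(-(L_VAP / R) * (1.0 / t_kelvin - 1.0 / T_BOIL))

p_90c = vapor_pressure(363.15)  # estimated vapor pressure at 90 degrees Celsius, Pa
```

<p>The estimate lands near 70 kPa, close to the tabulated vapor pressure of water at that temperature, which is decent for a formula with a single material constant in it.</p>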
<p>While on the subject of chaos, it’s also worth mentioning that the
“chaos” defined in the branch of mathematics known as “chaos theory” also
isn’t chaos in the usual sense of the English language. In chaos theory,
water dripping from a faucet is a “chaotic process”. That’s because the
exact size of each drip and the exact interval between drips are hard to
predict, even though to the eye it looks like a steady drip, drip, drip,
and though the average person would say you were nuts to call it chaotic.
This has rendered scientific papers a bit more difficult to read, since
it can be hard to tell whether “chaotic” is meant in the ordinary sense
or in the chaos-theory sense. Unlike in the case of entropy, I have
difficulty labeling this technical concept of “chaotic” worthwhile, since
I’ve never encountered anyone making any practical use of it, and since I
don’t know why labeling something “chaotic” would help with anything: you
couldn’t predict it precisely before, and you still can’t predict it
precisely.</p>
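<p>For the curious, the technical sense of “chaotic” can be seen in a few lines. This is a minimal sketch using the logistic map, a standard textbook example that stands in for the dripping faucet (it is not a model of a faucet); the starting values are arbitrary.</p>

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate the logistic map x -> r x (1 - x) and return all visited values."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two starting points that differ by one part in ten billion.
a = logistic_orbit(0.2, 60)
b = logistic_orbit(0.2 + 1e-10, 60)
gaps = [abs(x - y) for x, y in zip(a, b)]

early_gap = max(gaps[:6])   # largest difference over the first few steps
late_gap = max(gaps[40:])   # largest difference over the last twenty steps
```

<p>The two runs agree closely at first and bear no resemblance to each other by the end, even though the rule generating them is simple and fully deterministic. That sensitivity to the starting point is what the technical term refers to.</p>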