MPEG Audio Coding
CD Bit Rate
Everybody knows that CDs are capable of producing
high-quality audio reproduction. However, the problem with CDs is that they use
a bit rate of:
CD bit rate = 16 bits per sample x 44,100 samples per second x 2 stereo channels = 1.41 Mbps
This is fine for a dedicated CD player but unsuitable for digital radio
transmission. Such a bit rate requires a very wide bandwidth, and because
terrestrial bandwidth is very limited, it would severely restrict the number of
radio stations that could fit into the narrow bandwidth made available for radio
systems such as FM or DAB. For example, DAB is presently transmitted in Band III
on 7 channels, named 11B through to 12D, with carrier frequencies from
approximately 218 MHz to 229 MHz. Each of these channels carries a maximum
useful data rate of 1.6 Mbps in about 1.5 MHz of bandwidth, which is enough for
only one 1.41 Mbps CD-quality stream. Therefore, in the approximately 12 MHz of
bandwidth presently available to DAB in the UK, only 8 radio stations could be
broadcast if DAB used the same audio encoding format as CDs.
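The arithmetic above can be checked with a short sketch. The figures are the ones quoted in the text; the variable names are illustrative only, and the calculation assumes that a channel can only carry whole CD-rate streams.

```python
# Figures quoted in the text; names are illustrative, not from any standard.
BITS_PER_SAMPLE = 16
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2

cd_bit_rate = BITS_PER_SAMPLE * SAMPLE_RATE_HZ * CHANNELS  # bits per second

DAB_CHANNEL_BANDWIDTH_MHZ = 1.5   # bandwidth of one DAB channel
DAB_CHANNEL_RATE_BPS = 1_600_000  # maximum useful data rate of one channel
UK_DAB_SPECTRUM_MHZ = 12          # approximate spectrum available to DAB

# How many DAB channels fit in the spectrum, and how many whole
# CD-rate streams fit in one channel.
channels_available = int(UK_DAB_SPECTRUM_MHZ // DAB_CHANNEL_BANDWIDTH_MHZ)
cd_streams_per_channel = DAB_CHANNEL_RATE_BPS // cd_bit_rate
stations = channels_available * cd_streams_per_channel

print(f"CD bit rate: {cd_bit_rate / 1e6:.2f} Mbps")   # CD bit rate: 1.41 Mbps
print(f"Stations at CD bit rate: {stations}")         # Stations at CD bit rate: 8
```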
Perceptual Audio Coding
The solution to this problem is to use perceptual audio coding as used by MPEG
(Moving Picture Experts Group). As well as designing coders/decoders (codecs)
for encoding video, MPEG has also designed encoding schemes for the transmission
or storage of high-quality audio at reduced bit rates. DAB uses MPEG-1 Layer 2
or MPEG-2 Layer 2 encoding, which I will refer to as MP2. This format is the
predecessor to MP3, which is Layer 3.
Perceptual coding uses the hypothesis that if the ear
cannot perceive some sounds then there is no point in encoding these sounds. Not
encoding the sounds that we cannot hear allows a reduction in the overall number
of bits needed to encode the signal and therefore the bit rate can be reduced.
Reducing the number of bits that are needed to encode a signal or file is
generally called data compression, and the ratio of the original number of bits
to the number of bits after compression is called the compression ratio. For
audio, the CD bit rate is taken as the original bit rate and the bit rate of the
perceptually coded version as the compressed bit rate.
Perceptual coding has made great advances in recent years.
In the following table, the MPEG standards are listed in chronological order
showing the bit rate required to achieve “near CD-quality”:
| Codec | Bit Rate | Compression Ratio |
|---|---|---|
| Layer 1 (MP1) | 384 kbps | 3.7 |
| Layer 2 (MP2) | 256 kbps | 5.5 |
| Layer 3 (MP3) | 192 kbps | 7.3 |
| AAC* | 128 kbps | 11.0 |
*Advanced Audio Coding
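As a rough check, the compression ratios in the table follow from dividing the CD bit rate by each codec's bit rate. The helper name `compression_ratio` below is illustrative only.

```python
# CD bit rate in kbps: 16 bits x 44,100 samples/s x 2 channels.
CD_BIT_RATE_KBPS = 16 * 44_100 * 2 / 1000  # 1411.2 kbps

def compression_ratio(compressed_kbps, original_kbps=CD_BIT_RATE_KBPS):
    """Ratio of the original bit rate to the compressed bit rate."""
    return original_kbps / compressed_kbps

# Bit rates taken from the table above.
codecs = {
    "Layer 1 (MP1)": 384,
    "Layer 2 (MP2)": 256,
    "Layer 3 (MP3)": 192,
    "AAC": 128,
}

for name, kbps in codecs.items():
    print(f"{name}: {compression_ratio(kbps):.3f}")
```

The printed values agree with the table to within rounding.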
Perceptual coding is the combination of psychoacoustics and digital signal
processing (DSP). DSP is the branch of engineering concerned with processing
sampled signals numerically, and it is often carried out on DSP chips that are
optimized to perform arithmetic operations very quickly. DSP chips are
microprocessors, but they differ from those found inside a PC because they are
designed for a different purpose.
Psychoacoustics
Psychoacoustics is the scientific study of the perception
of sound.
Through extensive listening tests using different sounds, psychoacoustic experts
have come up with a model of how humans hear. It has been discovered, for
example, that when a high amplitude tone (a tone is a sinewave at a single
frequency) is present in a sound, lower amplitude tones at frequencies close to
the tone’s frequency cannot be perceived. Through exhaustive listening tests,
curves have been drawn which plot how large an amplitude a second tone must have
in order to be perceived when a tone of a given frequency is present in the
signal. These curves are called masking curves and are the basis on which
perceptual encoders reduce the number of bits needed to encode an audio signal.
As well as the masking curves, a curve has been plotted that shows the amplitude
at which a tone can just be perceived when no other sounds are present. This
curve is called the ‘threshold of hearing’ curve. An example of a masking curve
and the threshold of hearing curve is shown in the figure below:
The figure above shows the threshold of hearing curve and a single tone
(sinewave) with a frequency of 1 kHz. The green curve is the masking curve due
to that tone. The band of noise shown in yellow, at a centre frequency of about
1.5 kHz, cannot be perceived by the human ear because of the masking effect of
the tone at 1 kHz.
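The situation in the figure can be sketched numerically. The threshold of hearing below uses a widely quoted approximation (Terhardt's formula), which is an assumption of this sketch and not something the text specifies. The masking curve is a purely hypothetical triangular shape whose offset and slopes are made-up illustrative values, not those of any MPEG psychoacoustic model.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Approximate threshold of hearing in dB SPL (Terhardt's approximation)."""
    k = f_hz / 1000.0
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)

def toy_masking_db(f_hz, masker_hz, masker_db,
                   offset_db=10.0, lower_slope=27.0, upper_slope=15.0):
    """Hypothetical triangular masking curve around a single tone.

    offset_db and the slopes (dB per octave) are illustrative values only.
    """
    octaves = math.log2(f_hz / masker_hz)
    slope = upper_slope if octaves >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(octaves)

def is_audible(f_hz, level_db, masker_hz, masker_db):
    """A component is audible if it exceeds both curves at its frequency."""
    threshold = max(threshold_in_quiet_db(f_hz),
                    toy_masking_db(f_hz, masker_hz, masker_db))
    return level_db > threshold

# An 80 dB tone at 1 kHz; is a component at 1.5 kHz audible?
print(is_audible(1500, 40, masker_hz=1000, masker_db=80))  # False (masked)
print(is_audible(1500, 70, masker_hz=1000, masker_db=80))  # True
```

With these made-up parameters the toy masking curve sits at roughly 61 dB at 1.5 kHz, so the 40 dB component is masked while the 70 dB component is not.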
An example of the way in which an audio signal is encoded
using MPEG Audio encoding as might occur in a DAB radio station is as follows.
Say the source material is a CD. The sample amplitudes of the audio signal on
the CD are sent to the MPEG encoder. The encoder then stores the samples in
memory until there is a full block of samples. Then a fast Fourier transform (FFT)
is performed on the samples to find the frequency domain representation
(frequency content) of these samples. These frequency domain values are then
sent to the psychoacoustic model so that the appropriate masking curves can be
calculated. Then the amplitudes of the frequency samples (frequency components)
are compared with the masking curves and any frequency components that have
amplitudes that fall below the masking curves are not transmitted. Frequency
components whose amplitudes are above the masking curves are then encoded using
a bit allocation algorithm, which allocates bits so as to keep the quantization
noise as far below the masking curve as possible at each frequency. As well as
the curves that are derived from the
frequency content in the signal, the threshold of hearing curve is also applied
so that any tones with amplitudes below this threshold are not encoded.
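The steps described above (collect a block, transform it, compare with a mask, drop what falls below) can be sketched as follows. This is a minimal illustration of the simplified description in the text, not the actual MPEG algorithm. The mask values are hypothetical constants standing in for the output of a psychoacoustic model, and a naive DFT is used in place of an FFT to keep the example dependency-free.

```python
import cmath
import math

def dft_levels_db(block):
    """Naive DFT; returns the level in dB of each bin up to Nyquist."""
    n = len(block)
    levels = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(block))
        amplitude = 2 * abs(acc) / n
        # Clamp to avoid log10(0) for empty bins.
        levels.append(20 * math.log10(max(amplitude, 1e-12)))
    return levels

def select_components(block, mask_db):
    """Keep only the bins whose level exceeds the mask for that bin."""
    levels = dft_levels_db(block)
    return {k: lvl for k, lvl in enumerate(levels) if lvl > mask_db[k]}

# Test block: a loud tone in bin 8 plus a much weaker tone (-60 dB) in bin 10.
N = 64
block = [math.sin(2 * math.pi * 8 * i / N)
         + 0.001 * math.sin(2 * math.pi * 10 * i / N)
         for i in range(N)]

# Hypothetical mask: -40 dB around the loud tone (bins 6 to 12), and a
# -90 dB floor elsewhere standing in for the threshold of hearing.
mask_db = [-40.0 if 6 <= k <= 12 else -90.0 for k in range(N // 2 + 1)]
floor_only = [-90.0] * (N // 2 + 1)

print(sorted(select_components(block, mask_db)))     # [8]
print(sorted(select_components(block, floor_only)))  # [8, 10]
```

With the mask in place only the loud tone survives. With the floor alone, the weak tone is above the threshold and would also have to be encoded.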
As well as large amplitude tones masking nearby, smaller amplitude frequency
components, there are also time domain effects. For example, when a loud tone
is present at a certain frequency, a quieter tone at that frequency that occurs
shortly afterwards cannot be perceived by the listener, and so it is not
encoded, which saves bits.
The DAB system is specified so that the decoders in receivers do not need to be
upgraded when improvements in psychoacoustic modelling are applied to the
encoder’s psychoacoustic model at the broadcaster’s end. This is a sensible
decision, although for a given encoder such as MP2 only limited improvements can
be expected. Major improvements come when an encoder is redesigned altogether,
as happened with MP3 and then with AAC. It is a shame that AAC cannot be used
with DAB, because it achieves far better audio quality than MP2 at half the bit
rate, but unfortunately we’re stuck with MP2.