MP2 vs AAC+
The most important single technology in a digital radio system is the audio
codec, because the efficiency of the audio codec is the main determinant of
the overall efficiency of the system.
The efficiency of an audio codec is determined by the bit rate it needs in order to
provide a given level of audio quality: the lower the required bit rate, the more
efficient the audio codec is.
The efficiency of the audio codec is so important to a digital radio system
because the amount of capacity available is fixed (e.g. the capacity on a DAB
multiplex). The lower the bit rate level required, the more radio stations can
transmit on a multiplex, and fitting in more stations eases the pressure on
the bit rate levels, which makes it more likely that better audio quality will
be provided.
Also, the cost of transmitting a multiplex is almost constant irrespective
of the number of radio stations that are carried on it, so the higher the
number of radio stations that can be carried the lower the transmission costs
will be per radio station, which also increases the likelihood that
broadcasters will choose to provide better audio quality.
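As a rough illustration of the two points above, the following sketch divides a fixed multiplex capacity between stations and shares a fixed transmission cost among them. The 1152 kbps capacity and the cost figure are assumptions chosen for the example rather than values taken from this page, and the function names are hypothetical.

```python
def stations_per_multiplex(capacity_kbps, bitrate_kbps):
    """Number of equal-bit-rate stations that fit into a fixed capacity."""
    return capacity_kbps // bitrate_kbps


def cost_per_station(multiplex_cost, n_stations):
    """Each station's share of a roughly constant multiplex transmission cost."""
    return multiplex_cost / n_stations


CAPACITY_KBPS = 1152         # assumed usable capacity, for illustration only
MULTIPLEX_COST = 1_000_000   # hypothetical annual cost in arbitrary units

for bitrate in (128, 64, 48):
    n = stations_per_multiplex(CAPACITY_KBPS, bitrate)
    share = cost_per_station(MULTIPLEX_COST, n)
    print(f"{bitrate} kbps: {n} stations, cost share {share:.0f}")
```

Under these assumed figures, halving the bit rate from 128 kbps to 64 kbps doubles the number of stations from 9 to 18 and halves each station's share of the cost.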
The way to compare the efficiency and audio quality performance of
different audio codecs or different bit rate levels is by carrying out
listening tests that follow BS.1116, which is an ITU
(International Telecommunication Union) standard for blind listening tests.
The standard's full title is: BS.1116: Methods for the subjective assessment of small impairments in audio systems including multichannel sound systems.
To explain how BS.1116 listening tests are carried out, it's probably best
to use a screenshot of ABC/HR, the piece of software
that people on the Hydrogen Audio forums use when they carry out listening
tests. The following image shows a small part of the ABC/HR user interface
that is used to test a single compressed sample:
The ABC/HR software randomly assigns the uncompressed original sample to
either the left or the right arrowed buttons near the bottom and the
compressed sample of the same sample to the other of the left or right arrowed
buttons. Beneath these is a button labelled 'Ref', which allows the user to
listen to the uncompressed original at any time.
The tester does not know which of the left or right buttons the compressed
sample has been assigned to, so they must first identify which is the
compressed sample and then assign a grade to that sample on a scale from 1.0 to
5.0. The sample they take to be the uncompressed original is given a grade of 5.0. The grade assigned
to the compressed sample depends on how badly its audio quality
has been impaired relative to the uncompressed original. The
impairment scale used in BS.1116 is as follows:

Once all testers have finished grading the samples, the results are
compiled using a statistical analysis methodology that is laid out in the
BS.1116 standard.
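The blind assignment and grading step described above can be sketched roughly as follows. This is only an illustration and not how ABC/HR is actually implemented: the function names and the dictionary layout are made up for the example, and the diff-grade shown is simply the grade given to the compressed sample minus the grade given to the hidden original.

```python
import random


def assign_buttons(rng):
    """Randomly put the hidden original and the compressed sample on the
    left and right buttons (hypothetical helper, not ABC/HR code)."""
    samples = ["original", "compressed"]
    rng.shuffle(samples)
    return {"left": samples[0], "right": samples[1]}


def diff_grade(assignment, grades):
    """Grade given to the compressed sample minus the grade given to the
    hidden original; negative when the tester heard an impairment."""
    by_sample = {assignment[button]: grades[button] for button in ("left", "right")}
    return by_sample["compressed"] - by_sample["original"]


rng = random.Random(2005)          # seeded only to make the example repeatable
assignment = assign_buttons(rng)   # the tester never sees this mapping

# Suppose the tester decides 'right' is the compressed sample and grades it 3.5,
# so 'left' is taken to be the original and gets 5.0.
grades = {"left": 5.0, "right": 3.5}
print(diff_grade(assignment, grades))
```

If the tester picked the wrong button, the diff-grade comes out positive rather than negative, which is how a wrong identification shows up in the results.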
The testers must first demonstrate that their hearing is good enough to
identify audio artefacts. This, along with the use of a standardised
statistical analysis methodology, allows the BS.1116 testing procedure to
produce repeatable results, which means that it allows objective rather
than subjective comparisons to be made between audio codecs and between
bit rate levels.
However, the listening test results are heavily dependent on how difficult
the audio samples are to compress, so there will always be some variability
between tests that use different samples. BS.1116 tests are meant to use
'killer samples' that try to 'break' the codecs under test, although some of
the listening tests that the UK DAB industry have carried out have used
easy-to-encode samples in order to produce favourable results.
BS.1116 is the most appropriate listening test for assessing the audio
quality of audio codecs used on digital radio systems, because the testing
methodology was designed to assess 'small impairments' to the audio.
Unfortunately, there has been a trend towards using the BS.1534 MUSHRA listening test
standard, because it is easier to carry out. The problem with MUSHRA is
that it was designed to assess 'medium to large impairments' to the audio
(it was originally designed to test audio codecs used for Internet radio
distribution at very low bit rates), so it is an unsuitable testing
methodology for digital radio, because audio on digital radio should only
suffer from small impairments.
Therefore, only results from listening tests that conform (or
conform as closely as they can) to the BS.1116 standard will be considered below.
(For more technical information about MP2, see the DAB
vs DAB+ page.)
The following figure shows the listening test results for a few different
audio codecs at different bit rate levels. For example, the curve for MP2 is
labelled 'LII', the single point for 128 kbps MP3 is labelled 'LIII' and you
can also see the performance of AAC.
Although this test is a bit old now, it clearly shows how badly MP2
performs: at 128 kbps it was classified as providing
"Annoying" audio quality.

Unfortunately, the listening test from which the above results were taken
didn't use the BS.1116 impairment grading scale. The following figures show the
two different impairment scales:
[Figures: the BS.1116 impairment scale, and the impairment scale used for the above test results]
Therefore the above test results are all offset with respect to the BS.1116
scale as follows:
| Impairment | BS.1116 | Scale for the above test results |
| --- | --- | --- |
| Imperceptible | 5.0 | 5.0 |
| Perceptible, but not annoying | 4.0 | 4.5 |
| Slightly annoying | 3.0 | 3.5 |
| Annoying | 2.0 | 2.5 |
| Very annoying | 1.0 | 1.5 |
Therefore, the above test results overestimate the audio quality
level, because the results are all shifted upwards by 0.5 with respect to the
BS.1116 impairment scale. The above test also subtracts 5.0 from all
results, but this doesn't alter the quality level relative to 'imperceptible'.
The above listening test results for MP2 are tabulated below after
correcting for the above changes so that comparisons can be made between MP2
and AAC+ in subsequent sections.
| Bit rate (kbps) | Diff-grade | Add 5.0 | Subtract 0.5 to equate to BS.1116 scale |
| --- | --- | --- | --- |
| 192 | -1.17 | 3.83 | 3.33 |
| 160 | -1.85 | 3.15 | 2.65 |
| 128 | -2.1 | 2.9 | 2.4 |
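The correction applied in the table can be written as a short calculation. The sketch below simply repeats the two steps described above (add 5.0, then subtract the 0.5 offset) for the three MP2 diff-grades; the function name is made up for the example.

```python
def to_bs1116_scale(diff_grade):
    """Map a diff-grade from the test above onto the BS.1116 scale.

    Follows the two steps in the table: add 5.0 to undo the subtraction,
    then subtract the 0.5 offset between the two impairment scales.
    """
    return round(diff_grade + 5.0 - 0.5, 2)


# MP2 diff-grades from the listening test results above, keyed by bit rate in kbps
mp2_diff_grades = {192: -1.17, 160: -1.85, 128: -2.1}

mp2_bs1116 = {kbps: to_bs1116_scale(d) for kbps, d in mp2_diff_grades.items()}
for kbps, score in mp2_bs1116.items():
    print(f"{kbps} kbps MP2: {score:.2f}")
```

Note that this applies the same flat 0.5 shift to every result, as the table does, even though the two scales coincide at 5.0.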
(For more technical information about AAC+, see the DAB
vs DAB+ page.)
AAC stands for Advanced Audio Coding, and it is an audio codec that was
designed in the early to mid 1990s after tests had shown that removing the constraint
of being backward compatible with MP2 and MP3 allowed much better performance to
be achieved. AAC was standardised in 1997, so it could have been adopted by
DAB, but it was ignored, and here we are ten years later and DAB has only
just adopted AAC+, which is simply an extension of AAC.
'AAC+' is actually a brand name, and the official name for this audio codec
is HE-AAC, or High-Efficiency AAC. But as AAC+ is an extension and superset of
AAC, I think that the name AAC+ sums up what the audio codec is very well.
The figures in this section are from listening tests carried out by
audio coding enthusiasts on the Hydrogen
Audio forum. These tests conform as closely to the BS.1116 standard as
they can and they do use the BS.1116 impairment scale.
The following figure shows the test results from a 128
kbps listening test carried out in December 2005 where various audio
codecs were tested:

This figure shows how well modern audio codecs perform these days, as iTunes AAC,
Lame MP3, Ogg Vorbis and WMA Pro all show very good scores. If you compare the
above results with those for MP2 you can see that 128 kbps AAC scores 1.4
points higher than 192 kbps MP2 and 2.3 points higher than 128 kbps MP2.
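The gaps quoted above can be checked with a few lines of arithmetic. The sketch below uses the corrected MP2 scores tabulated earlier and the 4.74 score for 128 kbps AAC that appears in the summary table on this page; the variable names are only for illustration.

```python
aac_128 = 4.74                      # 128 kbps AAC score from the summary table
mp2_scores = {192: 3.33, 128: 2.40}  # corrected MP2 scores, keyed by kbps

# Difference between 128 kbps AAC and MP2 at each bit rate, in grade points
gaps = {kbps: round(aac_128 - score, 2) for kbps, score in mp2_scores.items()}

for kbps, gap in gaps.items():
    print(f"128 kbps AAC vs {kbps} kbps MP2: +{gap:.2f}")
```

The differences come out at 1.41 and 2.34 points, which round to the 1.4 and 2.3 quoted in the text.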
Shine has such a low score because it was used as a 'low
anchor', which is a codec that is expected to perform much worse
than the other codecs. It acts as a sort of sanity check for the results and
allows comparisons with previous test results.
The following figure shows the results
from a 64 kbps listening test carried out in July 2007:

'HE-AAC' is the official name for AAC+, and Vorbis AoTuV refers to Ogg Vorbis.
The High Anchor used was 96 kbps AAC and the Low Anchor was 48 kbps AAC.
As with the Shine MP3 codec mentioned above that was used as a
low anchor, a high anchor is a bit rate/codec combination that is expected to
perform significantly better than the best-performing codec under test, so that
results can be compared with previous test results.
AAC+ is still a new codec, as it only came out in about 2003, and it is
under intense development due to it being the most efficient audio codec that
exists today, so its performance should improve significantly over the next
few years, as audio (and video) codecs always improve a lot in their early
years of existence.
The figure below shows the results
from a 48 kbps listening test carried out in February 2006:

(CT stands for Coding Technologies)
'HEv1' and 'HEv2' stand for HE-AAC versions 1 and 2, respectively. The
difference between versions 1 and 2 is that HE-AACv2 uses a technology called
'parametric stereo', which only uses 2 to 3 kbps to encode the stereo
information (i.e. the rest of the bit rate is used to encode the mono audio).
However, the results clearly show that at 48 kbps, HE-AACv1 provides better
quality than HE-AACv2 for both the Coding Technologies and Nero AAC+
implementations, so joint stereo (using mid/side and intensity stereo
coding) performs better than parametric stereo even though parametric
stereo frees up more bits for encoding the other audio information.
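To put the parametric stereo figures into numbers, the sketch below works out how much of a 48 kbps stream is left for the mono audio when 2 or 3 kbps goes on the stereo information. It is plain arithmetic on the figures quoted above, and the function name is hypothetical.

```python
def mono_budget_kbps(total_kbps, stereo_info_kbps):
    """Bit rate left for the mono audio once the parametric stereo
    side information has been paid for."""
    return total_kbps - stereo_info_kbps


for stereo_info in (2, 3):
    remaining = mono_budget_kbps(48, stereo_info)
    print(f"{stereo_info} kbps stereo info leaves {remaining} kbps for mono audio")
```

So at 48 kbps roughly 45 to 46 kbps is available for the mono signal, yet the listening test still favoured HE-AACv1.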
The following table shows the best results for AAC, AAC+ and MP2 from all
of the listening tests mentioned above:
| Bit rate (kbps) | MP2 | AAC | HE-AACv1 |
| --- | --- | --- | --- |
| 192 | 3.33 | - | - |
| 160 | 2.65 | - | - |
| 128 | 2.40 | 4.74 | - |
| 96 | - | 4.59 | - |
| 64 | - | - | 3.74 |
| 48 | - | - | 3.30 |
The following figure displays the results from the above table graphically:

The above results clearly show that AAC and AAC+ are vastly superior to MP2
in terms of both efficiency and the absolute level of audio quality that can
be provided (for bit rate levels that are likely to be used on digital radio).
It is a shame that those working on DAB during the 1990s
completely ignored AAC, considering that development of AAC began in 1993 and
it was standardised in 1997. If they had adopted it then, DAB wouldn't
be in the ridiculous mess that it is in now, where millions of DAB receivers
will be made obsolete over the coming years and the UK will have to go
through a long process of transmitting new AAC+ stations in parallel with the
existing MP2 stations. Quality will actually have to get worse before it
gets better in order to fit in any new AAC+ stations.