Friday, March 14, 2014

xiph: 24/192 Music Downloads ...and why they make no sense

Another article (this one from March 2012) about high-definition audio, sampling rates, human hearing, and why the CD standard (16-bit/44.1kHz) is a very good choice for distributing audio.

Some notable quotes:

Unfortunately, there is no point to distributing music in 24-bit/192kHz format. Its playback fidelity is slightly inferior to 16/44.1 or 16/48, and it takes up 6 times the space.
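The "6 times the space" figure in that quote is easy to check with a bit of arithmetic. Here's a quick Python sketch. It assumes uncompressed stereo PCM with no container overhead and no compression, so real files will differ somewhat:

```python
# Uncompressed PCM data rate = bits per sample * sample rate * channels.
# Assumption: plain stereo PCM, no container overhead, no (lossless) compression.

def pcm_bitrate(bits: int, rate_hz: int, channels: int = 2) -> int:
    """Return the raw PCM data rate in bits per second."""
    return bits * rate_hz * channels

hi_res = pcm_bitrate(24, 192_000)  # 24-bit/192kHz
cd = pcm_bitrate(16, 44_100)       # 16-bit/44.1kHz (CD)
dat = pcm_bitrate(16, 48_000)      # 16-bit/48kHz

print(f"24/192:  {hi_res / 1e6:.3f} Mbit/s")      # 9.216 Mbit/s
print(f"16/44.1: {cd / 1e6:.4f} Mbit/s")          # 1.4112 Mbit/s
print(f"ratio vs 16/44.1: {hi_res / cd:.2f}x")    # 6.53x
print(f"ratio vs 16/48:   {hi_res / dat:.2f}x")   # 6.00x
```

So 24/192 carries about 6.5 times the raw data of 16/44.1 and exactly 6 times that of 16/48.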

Yes, this surprised me too. Unnecessary, I agree, but inferior? As it turns out, inaudible ultrasonic frequencies can cause intermodulation distortion that may be audible, unless you've got very high-end equipment that can reproduce them cleanly.

The concept is similar to how high/mid/low frequencies can interfere with each other, which is why most good amplifiers and speakers use separate discrete components (along with appropriate filters and crossovers) to process these ranges independently.
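To see how inaudible content can turn into audible distortion, here's a toy Python sketch. It assumes a hypothetical playback chain with a small second-order nonlinearity (y = x + 0.1*x^2), which is a made-up curve and not a model of any real amplifier or speaker, and feeds it two ultrasonic tones at 30kHz and 33kHz:

```python
import math

FS = 192_000   # sample rate high enough to represent the ultrasonic tones
N = FS // 10   # 0.1 s of audio: every frequency used below completes whole cycles

def tone_amplitude(signal, freq_hz, fs=FS):
    """Estimate the amplitude at one frequency with a single-bin DFT.
    Exact (up to rounding) when freq_hz completes whole cycles in the signal."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / fs) for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / fs) for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

# Two purely ultrasonic tones, both inaudible on their own.
clean = [0.5 * math.sin(2 * math.pi * 30_000 * n / FS) +
         0.5 * math.sin(2 * math.pi * 33_000 * n / FS) for n in range(N)]

# Hypothetical nonlinear playback chain: y = x + 0.1 * x^2.
distorted = [x + 0.1 * x * x for x in clean]

# The x^2 term creates a difference tone at 33 - 30 = 3 kHz, well inside the audible band.
print(f"3 kHz in clean signal:    {tone_amplitude(clean, 3_000):.4f}")      # 0.0000
print(f"3 kHz after nonlinearity: {tone_amplitude(distorted, 3_000):.4f}")  # 0.0250
```

The input has nothing at 3kHz, but after the nonlinearity there's a clearly audible 3kHz tone that was never in the recording. Real equipment has its own (usually much smaller) nonlinearities, but the mechanism is the same.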

The article also talks about how 16 bits really is enough to capture the full dynamic range a human ear is capable of hearing:

16 bit linear PCM has a dynamic range of 96dB according to the most common definition, which calculates dynamic range as (6*bits)dB. Many believe that 16 bit audio cannot represent arbitrary sounds quieter than -96dB. This is incorrect.
With use of shaped dither, which moves quantization noise energy into frequencies where it's harder to hear, the effective dynamic range of 16 bit audio reaches 120dB in practice, more than fifteen times deeper than the 96dB claim.

120dB is greater than the difference between a mosquito somewhere in the same room and a jackhammer a foot away.... or the difference between a deserted 'soundproof' room and a sound loud enough to cause hearing damage in seconds.
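The numbers in that quote are easy to verify, and a small experiment shows why 16 bits can carry sounds below -96dB. This Python sketch first does the dB arithmetic, then quantizes a 1kHz sine with a peak of only 0.4 LSB (about -98dBFS, taking full scale as 32768) with and without dither. Note that it uses plain TPDF (triangular) dither, not the noise-shaped dither the quote refers to. Shaping is what pushes the perceived range toward 120dB, and this sketch doesn't attempt that:

```python
import math
import random

# Textbook dynamic range of N-bit linear PCM: 20*log10(2**N), about 6.02 dB per bit.
print(f"16-bit theoretical range: {20 * math.log10(2 ** 16):.2f} dB")  # 96.33 dB
# 120 dB vs 96 dB is a 24 dB difference; as an amplitude ratio:
print(f"24 dB as an amplitude ratio: {10 ** (24 / 20):.2f}x")           # 15.85x

def quantize(samples, dither):
    """Round samples (in LSB units) to integers, optionally adding TPDF dither first.
    TPDF dither = sum of two independent uniform(-0.5, 0.5) values (unshaped)."""
    out = []
    for x in samples:
        d = (random.random() - 0.5) + (random.random() - 0.5) if dither else 0.0
        out.append(round(x + d))
    return out

def tone_amplitude(samples, freq_hz, fs):
    """Single-bin DFT amplitude estimate at freq_hz."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / fs) for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / fs) for n, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)

random.seed(1)  # fixed seed so the run is repeatable
FS = 48_000
# One second of a 1 kHz sine whose peak is only 0.4 LSB.
quiet = [0.4 * math.sin(2 * math.pi * 1_000 * n / FS) for n in range(FS)]
print(f"signal level: {20 * math.log10(0.4 / 32768):.1f} dBFS")  # -98.3 dBFS

plain = quantize(quiet, dither=False)    # every sample rounds to 0
dithered = quantize(quiet, dither=True)  # noisy, but the tone survives on average

print(f"undithered 1 kHz amplitude: {tone_amplitude(plain, 1_000, FS):.3f} LSB")
print(f"dithered 1 kHz amplitude:   {tone_amplitude(dithered, 1_000, FS):.3f} LSB")
```

Without dither every sample rounds to zero and the tone is simply gone. With dither the tone is still there, buried in noise but measurable, which is the point the article is making.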

120dB is so much range that any recording actually using it would be annoying to listen to in a normal room. I'd be constantly turning the volume up (to hear the quiet parts) and down (to avoid pain during the loud parts), just like I sometimes have to do when listening to music in the car or when trying to watch a movie while others are talking in the same room.

I would contend that 16 bits is more than sufficient, because audio that is comfortable to listen to in an uncontrolled environment should have its dynamic range compressed to even less than 96dB. (That said, high dynamic range is definitely welcome when I'm in a quiet room listening on good equipment.)

Finally, I want to quote a section on how loudness can affect preferences. I always knew that people tend to prefer louder sound, but I didn't realize how subtle the differences can be, or why you can't eliminate this factor from audio testing without specialized equipment:

The human ear can consciously discriminate amplitude differences of about 1dB, and experiments show subconscious awareness of amplitude differences under .2dB. Humans almost universally consider louder audio to sound better, and .2dB is enough to establish this preference. Any comparison that fails to carefully amplitude-match the choices will see the louder choice preferred, even if the amplitude difference is too small to consciously notice. Stereo salesmen have known this trick for a long time.

The professional testing standard is to match sources to within .1dB or better. This often requires use of an oscilloscope or signal analyzer. Guessing by turning the knobs until two sources sound about the same is not good enough.
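For a sense of scale, here's the dB arithmetic for those thresholds, plus a minimal level-matching check in Python. The RMS-based comparison, the 440Hz test tone, and the 0.1dB default tolerance are my own simplifying assumptions for illustration; as the quote says, real listening tests measure levels with proper equipment:

```python
import math

def db_to_ratio(db: float) -> float:
    """Convert a level difference in dB to an amplitude (voltage) ratio."""
    return 10 ** (db / 20)

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_difference_db(a, b):
    """Level of signal a relative to signal b, in dB, based on RMS."""
    return 20 * math.log10(rms(a) / rms(b))

def matched(a, b, tolerance_db=0.1):
    """True if the two signals are within tolerance_db of each other."""
    return abs(level_difference_db(a, b)) <= tolerance_db

# How big are the thresholds from the quote, as plain amplitude changes?
for db in (1.0, 0.2, 0.1):
    print(f"{db:.1f} dB = {100 * (db_to_ratio(db) - 1):.1f}% more amplitude")
# 1.0 dB = 12.2%, 0.2 dB = 2.3%, 0.1 dB = 1.2%

FS = 48_000
a = [0.5 * math.sin(2 * math.pi * 440 * n / FS) for n in range(FS)]  # source A: test tone
b = [1.02 * s for s in a]                                           # source B: 2% hotter

print(f"B vs A: {level_difference_db(b, a):+.2f} dB, matched: {matched(b, a)}")  # +0.17 dB, False

# Trim B by the measured difference so the comparison is fair.
gain = db_to_ratio(-level_difference_db(b, a))
b_trimmed = [gain * s for s in b]
print(f"after trim, matched: {matched(b_trimmed, a)}")  # True
```

A 2% difference in amplitude is already about 0.17dB, which is outside the .1dB matching standard and easy to miss by ear.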

Please read the complete article including the footnotes. There's a lot of great information in here.

If people want to switch to a format that sounds better than the MP3s they have now, they would be best off simply re-ripping their CDs into a lossless format like FLAC or Apple Lossless. That eliminates lossy-compression artifacts entirely, and the result will sound great if your playback equipment is up to the task.


Drew said...

I agree for the most part, except for the part about the intermodulation distortion. Most modern mastering is done at 24/192, so that is where the distortion is going to occur. The mastering engineer can try to filter it out then, or leave it in. If it is in the audible range at that point, it will still be in the audible range when it is downsampled. My point is I don't think 24/192 would ever sound worse than 16/44. Best bet is to record/mix/master on 2" analog and listen on vinyl :P

Shamino said...

You are correct. The intermodulation distortion is not the result of mastering at high bit-rates, but of trying to actually record and preserve ultrasonic frequencies in the final mix presented to end customers.

The distortion is an artifact of your playback technology not being able to respond in a linear fashion to such a huge frequency range, not of problems with the recorded material itself. It will primarily manifest itself if you get a playback device capable of ultrasound and try to run such a signal through normal amplifiers and speakers. They will either filter out the ultrasound (eliminating any theoretical advantage) or will introduce distortion in the audible frequencies.

If you read the article in detail, it talks about different ways to eliminate the distortion. One way is to get better amplifiers and speakers that can reproduce ultrasound without creating distortion in the lower frequencies. Another is to just filter out the ultrasound and not try to reproduce it at all. Another is to filter out those frequencies during the final mix-down so they never get into your recording. Another is to filter them out of the mix and record at the existing 44.1kHz sample rate. The author's point is that all of these will sound the same, but the latter approaches cost a lot less, so you can spend that money on more effective ways of improving what you hear.
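The "filter the ultrasound out before it reaches the final format" approach can be sketched in Python: a 20kHz low-pass filter applied at 192kHz, followed by downsampling. To keep the sketch short it decimates by 4 to 48kHz; getting to 44.1kHz would need a rational 147/640 resampler instead. The Hamming-windowed sinc filter and its 255-tap length are arbitrary choices for illustration, not what any particular mastering tool uses. For comparison it also decimates naively, without filtering, which folds a 30kHz tone down to an audible 18kHz alias:

```python
import math

FS_IN = 192_000
FACTOR = 4                 # 192 kHz -> 48 kHz, an integer ratio to keep things simple
FS_OUT = FS_IN // FACTOR

def lowpass_fir(cutoff_hz, fs, taps=255):
    """Hamming-windowed sinc low-pass FIR, normalised to unity gain at DC.
    An odd tap count gives a symmetric, linear-phase filter."""
    m = taps // 2
    fc = cutoff_hz / fs  # cutoff as a fraction of the sample rate
    h = []
    for n in range(taps):
        k = n - m
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        h.append(ideal * window)
    total = sum(h)
    return [c / total for c in h]

def filter_and_decimate(x, h, factor):
    """Convolve x with h, computing only every factor-th output sample.
    Edges where the filter would run off the end of the signal are skipped."""
    m = len(h) // 2
    return [sum(h[j] * x[i + m - j] for j in range(len(h)))
            for i in range(m, len(x) - m, factor)]

def tone_amplitude(signal, freq_hz, fs):
    """Single-bin DFT amplitude estimate at freq_hz."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / fs) for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / fs) for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

# 50 ms at 192 kHz: an audible 1 kHz tone plus an ultrasonic 30 kHz tone.
mix = [0.5 * math.sin(2 * math.pi * 1_000 * n / FS_IN) +
       0.5 * math.sin(2 * math.pi * 30_000 * n / FS_IN) for n in range(FS_IN // 20)]

naive = mix[::FACTOR]  # no filtering: 30 kHz folds down to 48 - 30 = 18 kHz
filtered = filter_and_decimate(mix, lowpass_fir(20_000, FS_IN), FACTOR)

for name, sig in (("naive", naive), ("filtered", filtered)):
    print(f"{name:8s} 1 kHz: {tone_amplitude(sig, 1_000, FS_OUT):.3f}   "
          f"18 kHz: {tone_amplitude(sig, 18_000, FS_OUT):.4f}")
```

Expect the naive version to show an 18kHz alias about as strong as the original ultrasonic tone, while the filtered version keeps the 1kHz tone and leaves only a tiny residue at 18kHz.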