Lossy compression formats like mp3 and AAC are known to create artifacts such as clipping. In order to avoid clipping, it is typically recommended to lower the signal peak levels before compression. This article explains how peak levels are affected by lossy compression and shows how to avoid clipping in compressed files.
Most music nowadays is distributed in compressed formats: either mp3 (MPEG-1 Layer III) or AAC (Advanced Audio Coding). These compression algorithms reduce the size of CD-quality audio by a factor of 5–10, depending on the chosen bitrate. This is far stronger compression than the typical 2× ratio achievable by lossless codecs such as FLAC or ALAC. Therefore, a signal encoded by mp3 or AAC cannot be preserved exactly. These algorithms create an approximation of the signal that sounds as close to the original as possible.
Lossy encoding can be viewed as a low-bit-depth quantization of a signal. The precision of this quantization depends on the selected bitrate, while quantization noise (a compression error: the difference between the original and the decoded signal) is spectrally shaped to be minimally audible — this is achieved by a psychoacoustic model.
The amplitude of quantization noise depends on the chosen bitrate and on signal complexity. Slowly changing tonal signals are easy to approximate, while random noise is hard (see the examples below). The amplitude of compression noise is often proportional to the signal level, much as with a 32-bit floating-point sample format. The noise of a 32-bit float format is always about 150 dB below the signal level, while the noise of mp3 or AAC compression is usually only 15–30 dB below the signal level.
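The idea of noise that follows the signal level can be checked numerically. The sketch below is an illustration only, with a Gaussian test signal and helper names that are assumptions rather than anything from a codec. It rounds the same signal to 32-bit float and to 16-bit fixed point at two levels and measures the signal-to-noise ratio in each case.

```python
import numpy as np

rng = np.random.default_rng(2)
# float64 stand-in for the "exact" signal (assumed test signal).
reference = rng.normal(0.0, 0.1, 100_000)

def snr_db(exact, approx):
    """Signal-to-noise ratio in dB, where noise = approx - exact."""
    noise = approx - exact
    return 10 * np.log10(np.sum(exact ** 2) / np.sum(noise ** 2))

float32_snr = {}
int16_snr = {}
for level_db in (0, -40):
    scaled = reference * 10 ** (level_db / 20)
    # Floating point: the rounding step shrinks together with the signal.
    as_float32 = scaled.astype(np.float32).astype(np.float64)
    # Fixed point: the rounding step stays the same at any signal level.
    as_int16 = np.round(scaled * 32767) / 32767
    float32_snr[level_db] = snr_db(scaled, as_float32)
    int16_snr[level_db] = snr_db(scaled, as_int16)
    print(f"{level_db:>4} dB: float32 SNR {float32_snr[level_db]:.1f} dB, "
          f"16-bit SNR {int16_snr[level_db]:.1f} dB")
```

The float SNR stays nearly constant when the signal is turned down, while the fixed-point SNR drops by roughly the amount of attenuation. Lossy codecs resemble the float case in this respect, only with a far smaller distance between signal and noise.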
When quantization noise is added to the waveform, it can change the peak levels. If the waveform has been brickwall-limited to a certain level, roughly half of the waveform peaks will rise in level, because the added noise is about as likely to push a peak up as to pull it down.
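A quick simulation illustrates this. It is only a rough model: the "track" is hard-clipped Gaussian noise, and the codec error is replaced by independent white noise 20 dB below the signal, whereas real codec noise is spectrally shaped and depends on the signal. All names and constants are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a loud master: Gaussian noise hard-limited at -1 dBFS.
ceiling = 10 ** (-1.0 / 20)
signal = np.clip(rng.normal(0.0, 0.5, 200_000), -ceiling, ceiling)

# Stand-in for codec error: white noise 20 dB below the signal RMS.
signal_rms = np.sqrt(np.mean(signal ** 2))
noise = rng.normal(0.0, signal_rms * 10 ** (-20 / 20), signal.size)
decoded = signal + noise

# Look only at the samples that sit exactly on the limiter ceiling.
at_ceiling = np.abs(signal) >= ceiling
fraction_rose = np.mean(np.abs(decoded[at_ceiling]) > ceiling)

def peak_db(x):
    """Sample peak level in dBFS."""
    return 20 * np.log10(np.max(np.abs(x)))

print(f"samples at the ceiling: {at_ceiling.sum()}")
print(f"fraction that rose:     {fraction_rose:.2f}")
print(f"peak before: {peak_db(signal):.2f} dBFS, after: {peak_db(decoded):.2f} dBFS")
```

Because the simulated noise is symmetric around zero, close to half of the limited samples end up above the ceiling, and the overall peak rises even though nothing resembling resampling took place.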
This increase in levels is often wrongly attributed to intersample peaks (ISPs), also known as true peaks. In fact, it has little to do with ISPs. In the waveform above, true peaks have been limited to −1 dBTP, but after lossy compression, both sample peaks and true peaks are significantly higher. The cause of this increase is the quantization that happens during lossy compression.
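For contrast, here is what an intersample peak looks like on its own, with no codec involved. The sketch estimates the true peak by band-limited interpolation using FFT zero-padding, which assumes a periodic signal. The function name `true_peak_db` and the 4× oversampling factor are choices made for this example; it is not a standards-compliant true-peak meter.

```python
import numpy as np

def true_peak_db(x, oversample=4):
    """Estimate the true peak (dBTP) via FFT zero-padding interpolation."""
    n = x.size
    spectrum = np.fft.rfft(x)
    if n % 2 == 0:
        spectrum[-1] *= 0.5  # split the Nyquist bin so it is not counted twice
    # Asking irfft for a longer output zero-pads the spectrum, which
    # interpolates the waveform between the original samples.
    upsampled = np.fft.irfft(spectrum, n * oversample) * oversample
    return 20 * np.log10(np.max(np.abs(upsampled)))

def sample_peak_db(x):
    """Peak of the stored samples in dBFS."""
    return 20 * np.log10(np.max(np.abs(x)))

# A full-scale sine at a quarter of the sample rate, sampled 45 degrees
# away from its crests: every stored sample is about 0.707.
k = np.arange(1024)
sine = np.sin(np.pi / 2 * k + np.pi / 4)

print(f"sample peak: {sample_peak_db(sine):.2f} dBFS")
print(f"true peak:   {true_peak_db(sine):.2f} dBTP")
```

The gap between the two readings exists before any encoding. The rise described above is a different effect: it lifts the sample peaks and the true peaks together.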
Lossy compression is often easy to identify by looking at the spectrogram. The upper frequencies are completely cut (the psychoacoustic model finds them inaudible) and the cutoff line is serrated, with occasional “black holes” below the cutoff. Signals at middle frequencies are typically preserved much better because they matter more for perception. The goal of a psychoacoustic model is to allocate more bits to the spectrogram bins that have a higher chance of being audible and to shape quantization noise so that it stays below the masking threshold.
There is no clear answer to the question of how much headroom is enough: it depends on the encoder, the bitrate, and the signal itself. When mastering your recording, it is important to assume that it will very likely go through lossy compression somewhere down the line, completely outside of your control. The most typical recommendation, from “Mastering for iTunes”, is to keep true peak levels at or under −1 dBTP. As can be seen from the sample above, this is not always sufficient to prevent clipping: in that example, true peaks rose by 1.73 dB after mp3 encoding. A mastering engineer's typical goal is to prevent most of the clipping, not all of it. In fact, there are some pathological cases where peak levels rise by as much as 10 dB after lossy compression. Below is a sample of white noise with a binary p.d.f. (distribution of sample amplitudes) that experiences a dramatic rise in peak levels after either mp3 or AAC compression.
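Why binary noise is such a hard case can be seen without a real encoder. The sketch below models only one thing a codec does, removing the top of the spectrum, and ignores quantization entirely. The 16 kHz cutoff and all names are assumptions made for the example, so the number it prints is an illustration and not a prediction for any particular encoder.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 44_100
n = 1 << 16

# White noise with a binary p.d.f.: every sample is either +a or -a,
# so the peak level equals the RMS level.
a = 10 ** (-1.0 / 20)                      # peaks at -1 dBFS
binary_noise = a * rng.choice([-1.0, 1.0], size=n)

# Crude stand-in for a codec's high-frequency cutoff (assumed 16 kHz).
spectrum = np.fft.rfft(binary_noise)
freqs = np.fft.rfftfreq(n, d=1 / fs)
spectrum[freqs > 16_000] = 0.0
band_limited = np.fft.irfft(spectrum, n)

def peak_db(x):
    """Sample peak level in dBFS."""
    return 20 * np.log10(np.max(np.abs(x)))

rise_db = peak_db(band_limited) - peak_db(binary_noise)
print(f"peak before: {peak_db(binary_noise):.2f} dBFS")
print(f"peak after:  {peak_db(band_limited):.2f} dBFS (rise of {rise_db:.1f} dB)")
```

A binary waveform keeps its peaks low only because of a very particular relationship between its frequency components. Any processing that disturbs that relationship produces a waveform with ordinary, much taller peaks.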
Interestingly, any clipping that happens because of a peak level increase during lossy encoding is reversible! Clipping happens during file decoding, while the internal representation of an mp3 (or AAC) file is not clipped. This is much like a floating-point file, which is clipped upon playback but may contain valid signals above 0 dBFS. Some decoders are smart and able to apply some negative gain to prevent clipping. Others, like RX 7.01, can decode to a non-clipping 32-bit float format, where you can manually take care of any overshoots. Unfortunately, most decoders are dumb: they decode to a 16-bit sample format and clip. So, the safest way to prevent clipping of mp3 or AAC files is to leave some headroom below 0 dBFS. Even half a decibel of headroom will eliminate most of the audible clipping.
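The difference between the two kinds of decoder can be sketched as follows. The "decoder output" here is just a sine that overshoots full scale by 1.73 dB, and `to_int16` and the −0.1 dBFS target are names and values assumed for the illustration.

```python
import numpy as np

FULL_SCALE = 32767

def to_int16(x):
    """Convert float samples to 16-bit PCM; anything beyond full scale is clipped."""
    return np.clip(np.round(x * FULL_SCALE), -FULL_SCALE, FULL_SCALE).astype(np.int16)

# Hypothetical float output of a decoder: a sine peaking 1.73 dB over full scale.
t = np.arange(4410) / 44_100
decoded = 10 ** (1.73 / 20) * np.sin(2 * np.pi * 441 * t)
peak_db = 20 * np.log10(np.max(np.abs(decoded)))

# Decoder that writes 16-bit directly: the overshoots are flattened.
clipped = to_int16(decoded)

# Decoder that keeps float: apply negative gain first, then convert.
target_db = -0.1
gain = 10 ** ((target_db - peak_db) / 20)
trimmed = to_int16(decoded * gain)

# Compare both results with the float signal (undoing the gain for the second).
clipped_error = np.max(np.abs(clipped / FULL_SCALE - decoded))
trimmed_error = np.max(np.abs(trimmed / FULL_SCALE / gain - decoded))
print(f"decoded peak: {peak_db:+.2f} dBFS")
print(f"worst-case error, clipped: {clipped_error:.4f}")
print(f"worst-case error, trimmed: {trimmed_error:.6f}")
```

The trimmed version differs from the float signal only by 16-bit rounding, while the clipped version has lost the top of every crest.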
The new update of RX introduces a unique feature for automatic prevention of clipping during lossy encoding. It is available directly in the File – Export window and has two new modes of operation: Limiter and Normalize.
When clipping prevention is enabled, RX automatically finds the correct level adjustment for the file depending on the amount of clipping occurring in the codec. This guarantees that your encoded file does not clip upon decoding. When clipping prevention is off, lossy formats are encoded in the old way that does not protect from codec clipping.
The Limiter will leave larger sections of the file unchanged in level and will only attenuate sections that would experience clipping. However, like any dynamic processing, this may create pumping.
The Normalize mode completely avoids pumping at the expense of slightly reducing the overall level of the file. Both of these options make encoding slower than before, because they dynamically adjust file levels to ensure that no clipping occurs in the codec.
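The trade-off between the two approaches can be sketched with toy versions of each. This is not RX's implementation, which is not described here: `normalize` and `limit` are hypothetical helpers, the limiter has no look-ahead, and both are applied to a plain float signal rather than inside a codec.

```python
import numpy as np

def normalize(x, ceiling=1.0):
    """One gain for the whole file: no pumping, but everything gets quieter."""
    peak = np.max(np.abs(x))
    return x * min(1.0, ceiling / peak)

def limit(x, ceiling=1.0, release=0.999):
    """Toy peak limiter: instant attack, exponential release, no look-ahead."""
    out = np.empty_like(x)
    gain = 1.0
    for i, sample in enumerate(x):
        needed = min(1.0, ceiling / max(abs(sample), 1e-12))
        # Drop the gain at once when required, otherwise recover slowly.
        gain = min(needed, 1.0 - release * (1.0 - gain))
        out[i] = sample * gain
    return out

def unchanged_fraction(processed, original):
    """Share of samples left within 0.1% of their original value."""
    return np.mean(np.isclose(processed, original, rtol=1e-3, atol=0.0))

# One second of a quiet sine with a short burst that overshoots full scale.
fs = 44_100
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 440 * t)
x[2000:2100] *= 2.4

normalized = normalize(x)
limited = limit(x)
print(f"input peak:            {np.max(np.abs(x)):.3f}")
print(f"unchanged (normalize): {unchanged_fraction(normalized, x):.3f}")
print(f"unchanged (limiter):   {unchanged_fraction(limited, x):.3f}")
```

Normalization touches every sample by the same amount, so the balance within the file is preserved. The limiter leaves most of the file alone, but its gain moves around the burst, which is where pumping comes from.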
In addition, RX can fix codec clipping that has occurred in files compressed elsewhere. Upon decoding, all lossy files are decompressed to a 32-bit floating-point format without clipping. This means that peaks above 0 dBFS are preserved and can be brought back under 0 dBFS either by normalization or with a peak limiter such as the one in Ozone.
When encoding in lossy formats, care has to be taken to avoid clipping, because lossy codecs do increase the peak levels of the signal. The traditional way of addressing the problem is to leave around 1 dB of headroom. A better way, offered in RX since version 7.01, is to use the “Prevent clipping” option, which guarantees no overs during lossy encoding with the minimum possible amount of headroom.