Most leading streaming services have implemented some form of loudness normalization, turning the act of mastering for streaming platforms into a bit of a minefield.
In this article we’ll take a look at some myths surrounding how the different services normalize audio, whether there are ideal LUFS and peak levels, how normalization impacts song-to-song level balances in albums, and why some songs still sound quieter than others, even with normalization turned on.
Probably the most common myth out there about mastering for streaming services is that there is a single correct loudness to hit; I see it all the time. Usually, the conversation goes something like this:
Intrepid Engineer: “How loud should I master for Spotify?”
Half of Facebook: “-14 LUFS!”
Me: “Hold on, it’s actually not quite that simple…”
So how did we all get so confused? Why isn’t it that simple, and what level should you master to? Before we tackle that head-on, I want to rewind a little and address a systemic misconception that is at the root of a lot of this.
In my view, this started with the language we’ve been using to talk about normalization, particularly one word: “target.” Time and time again I’ve heard people mention a “normalization target,” usually with the accompanying value of -14 LUFS. But think about what a target is. It’s something you aim for with the goal of hitting as precisely as possible. That’s not what normalization is about though, and it’s not really how it works.
The goal behind loudness normalization was never to force, or even encourage, mastering engineers to work toward a specific level. That’s what broadcast standards like EBU R128 and ATSC A/85, or laws like the CALM Act are for. Loudness normalization, on the other hand, is purely for the benefit of the end-user. It is there so that when an end-user is listening to program material from a variety of sources (e.g. a playlist) they don’t have to constantly reach to adjust their volume control. That’s it.
As such, I’d like to reframe the notion of a normalization or loudness “target” to that of a reference level. Functionally, the loudness of the program material is measured, compared to the reference level, and a precise gain offset is applied to match the measured loudness to the reference level. With this in mind, there’s less need to aim so specifically; just get it in the ballpark and the algorithm will work out the rest.
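To make the "measure, compare, offset" idea concrete, here is a minimal sketch of that arithmetic. The function name and the -14 reference value are illustrative assumptions, not any platform's actual code or current setting.

```python
def normalization_gain_db(measured_loudness_db: float, reference_db: float) -> float:
    """Gain offset (in dB) that moves a track's measured loudness onto the reference level."""
    return reference_db - measured_loudness_db


# Hypothetical masters played back against an assumed reference level of -14.
for measured in (-8.0, -14.0, -18.0):
    gain = normalization_gain_db(measured, reference_db=-14.0)
    print(f"measured {measured:+.1f} -> gain {gain:+.1f} dB")
```

A loud master simply receives a larger negative offset and a quiet one a positive offset, which is why landing exactly on the reference level is not something the engineer needs to do by hand.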
At this point you’d be forgiven for thinking, “That’s nice and all, but semantics aside, how loud should I be mastering?” Hopefully though, you’re beginning to see that in a lot of ways it’s really up to you. However, there are a few best practices, so let’s look at some specifics.
First, let’s talk about peak level. Since most streaming platforms are still using lossy compression codecs, it’s important to leave a bit of peak headroom to avoid distortion during the encode and decode process. We’ve discussed this in more detail in a previous post, but a good rule of thumb is to leave at least 1 dB of true peak headroom. Sometimes though, more can sound better, especially with louder material or lower bitrates.
One good way to audition this is by using the Codec Preview module in Ozone. Not all streaming platforms use MP3 or AAC all the time, but those two codecs can certainly give you a good idea of where others might overshoot.
Next, let’s discuss the trickier matter of reference levels. Not only do the services use different reference levels, many of them don’t even use the same normalization method. In fact, as of this writing, only Tidal, Amazon Music, and YouTube use ITU-R BS.1770 (the measurement behind LUFS). Others, such as Spotify, use ReplayGain (often with a modified reference level), while others still have developed their own normalization methods. Apple’s Sound Check is a good example of the latter case.
To muddy the waters further, there’s nothing to prevent any of the streaming services from changing their reference level, their normalization method, or both down the road. In fact, Spotify has done this in the past. A few years ago they lowered their reference level by 3 dB. They have also stated that in the future they plan to use BS.1770, likely with a reference level of -14 LUFS integrated, but of course, that’s subject to change, and when this switch will take place is anybody’s guess.
Once we let go of the notion of a “target” that we’re responsible for hitting and accept that each platform will adjust the gain appropriately, it frees us to master to the level that best suits the music. If you want to really crush something, you’re free to do that; it will just get turned down more. If you want to leave a higher crest factor, you’re free to do that too. The extra punch of the wider peak-to-average ratio may even help it stand out from the crowd a bit.
The one caveat is that if you’re below the reference level of a particular platform, your song may get turned up (depending on the service). Spotify will apply extra limiting of its own design to do this, while Pandora will allow clipping. If this is a concern to you, tools like Loudness Penalty can show you more precisely what each service will do.
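As a rough illustration of why turning a quiet master up is the awkward case, the sketch below works out how far the peaks would overshoot full scale after a positive gain offset. The function name, the 0 dB ceiling, and the example numbers are assumptions for illustration only; it does not model any service's actual limiter.

```python
def boost_outcome(measured_loudness_db, peak_db, reference_db, ceiling_db=0.0):
    """Return (gain_db, overshoot_db) for a track normalized to reference_db.

    overshoot_db is how far the boosted peak would exceed ceiling_db.
    Anything above 0 has to be handled somehow, e.g. by limiting or clipping.
    """
    gain_db = reference_db - measured_loudness_db
    peak_after_db = peak_db + gain_db
    overshoot_db = max(0.0, peak_after_db - ceiling_db)
    return gain_db, overshoot_db


# Hypothetical quiet, dynamic master: loudness -20, peaks at -3, assumed reference -14.
print(boost_outcome(-20.0, -3.0, -14.0))   # boosted by 6 dB, peaks 3 dB over the ceiling
# Hypothetical loud master: loudness -9, peaks at -1. Turned down, so no overshoot.
print(boost_outcome(-9.0, -1.0, -14.0))
```

Only the first case forces the platform to choose between limiting and clipping; a master that gets turned down never runs into this.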
While this isn’t strictly a myth for every streaming service, it is for any of the major ones that encourage album-based playback. Out of Spotify, Apple Music, YouTube, Pandora, and Tidal, only YouTube and Pandora use track normalization exclusively. For platforms like these, where users predominantly listen to singles or radio-type streams, this makes some sense.
Spotify and Apple Music, on the other hand, both have an album mode. The technique employed for album normalization is to use either the level of the loudest song on an album (or EP), or the average level of the entire album, and set that equal to the platform reference level. Then the same gain offset is applied to all other songs on the album. For Spotify and Apple Music this kicks in when two or more songs from an album are played consecutively.
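Here is a minimal sketch of the album technique just described: one offset is derived from the album as a whole and then applied to every track. The function name and the track values are hypothetical. The "average" branch averages in the power domain without weighting by track length, which is an assumption, since the text does not say how any service computes its album average.

```python
import math


def album_gain_db(track_loudness_db, reference_db, mode="loudest"):
    """Single gain offset (dB) shared by every track on an album."""
    if mode == "loudest":
        album_level_db = max(track_loudness_db)
    elif mode == "average":
        # Assumption: unweighted power-domain average of the track levels.
        mean_power = sum(10 ** (level / 10) for level in track_loudness_db) / len(track_loudness_db)
        album_level_db = 10 * math.log10(mean_power)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return reference_db - album_level_db


# Hypothetical three-song EP against an assumed reference level of -14.
tracks = [-10.0, -13.0, -18.0]
gain = album_gain_db(tracks, reference_db=-14.0)
normalized = [level + gain for level in tracks]
print(gain, normalized)
```

Because every track receives the same offset, the 3 dB and 5 dB steps between the songs are preserved exactly as they were mastered.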
Interestingly, Tidal has elected to use album normalization for all songs, even when they’re in a playlist. This method was implemented after Eelco Grimm published research on the matter in 2017, presenting strong evidence that album normalization is preferred for both album and playlist listening by a majority of users. If we analyze this, it points to another important fact: we shouldn’t let normalization reference levels dictate how we level songs on an album, but rather let the artistic intent and natural flow of the music be our guide.
This one surprised me, but I saw it twice just last week, and those weren’t the first times either. It seems to arise from the experience of hearing a song you’ve mastered next to a bigger commercial release and still having the impression that your song is quieter. This is often despite the fact that: A. you mastered to a level equal to or greater than the platform reference level, and B. normalization is turned on.
Naturally, this feels counterintuitive. If anything breeds conspiracy theories, it’s not fully understanding something that you think you do. Because normalization doesn’t behave quite as people expect, some seem to think that if you know the right people at Spotify, or have the force of a major label behind you, Spotify will work some magic behind the scenes and make your song just a little bit louder. This, of course, is demonstrably untrue.
A full explanation of the technical details of all normalization methods currently in use is outside the scope of this article, but let me briefly summarize how ReplayGain works, and then show you what I feel is a powerful visual example.
ReplayGain is essentially calculated in three steps. First, a loudness filter that emulates the sensitivity of the human ear is applied, rolling off below 150 Hz and accentuating frequencies around 3–4 kHz. Second, the file is sliced into 50 ms long blocks and the RMS level of each block is calculated and stored. Third, these RMS levels are sorted from softest to loudest on a scale from 1 to 100% and the value at 95% is chosen as the representative loudness of the whole file. If you’re interested, you can read the full spec here.
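The second and third steps can be sketched in a few lines. This is a simplified illustration, not the reference implementation: it skips the loudness filter from step one entirely, and the function names, the silence floor, and the exact way the 95% position is turned into a list index are all assumptions.

```python
import math


def block_rms_db(samples, sample_rate, block_ms=50):
    """Step two: slice into 50 ms blocks and return each block's RMS level in dB."""
    block_len = int(sample_rate * block_ms / 1000)
    levels = []
    for start in range(0, len(samples) - block_len + 1, block_len):
        block = samples[start:start + block_len]
        mean_square = sum(x * x for x in block) / block_len
        # Floor the value so digital silence does not produce log10(0).
        levels.append(10 * math.log10(max(mean_square, 1e-10)))
    return levels


def representative_loudness_db(levels, percentile=95):
    """Step three: sort softest to loudest and pick the value at the 95% position."""
    ordered = sorted(levels)
    index = min(len(ordered) - 1, int(len(ordered) * percentile / 100))
    return ordered[index]


# Toy signal at a 1 kHz sample rate: 96 quiet blocks followed by 4 loud ones.
signal = [0.1] * (50 * 96) + [0.5] * (50 * 4)
print(representative_loudness_db(block_rms_db(signal, sample_rate=1000)))
```

In the toy signal the loud passage occupies less than 5% of the blocks, so the value picked at the 95% position still belongs to the quiet section.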
If this seems a bit convoluted to you, don’t worry, you’re not alone. The upshot is that it only takes the loudest 5% of a song to offset the entire level to a softer point than you might expect. For example, songs with soft intros, or which are relatively soft throughout with the exception of one big chorus, may initially sound quiet compared to a song that maintains a fairly consistent level throughout.
To illustrate this in an admittedly artificial way, let’s take a look at two very similar files. Both consist of 100 seconds of pink noise, with a small difference in the level profiles. The first file plays at -16 dBFS RMS for 96 seconds before changing to -8 dBFS RMS for the final 4 seconds, while the other file plays at -16 dBFS RMS for 94 seconds before switching to -8 dBFS RMS for the final 6 seconds.
A LUFS reading of the 96% file reveals an integrated level of -15.7 LUFS.
The same measurement of the 94% file yields an integrated level only 0.4 dB louder, at -15.3 LUFS. Intuitively, this feels more or less right.
Now, what about ReplayGain? Measuring the 96% file shows that on Spotify it would be turned up by 5.5 dB.
But when we measure the 94% file we find that it will be turned down by 1.7 dB.
This is a measured difference of a whopping 7.2 dB! Again, this is an admittedly manufactured example, but I think it serves to illustrate a point. For the prototypical three-and-a-half-minute pop song, it only takes a little over two seconds in either direction to change the outcome.
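The mechanism behind that jump can be reproduced with an idealized model of the two test files. The sketch below assumes every 50 ms block sits at exactly -16 or -8 dB and applies no loudness filter or gating, so its figures will not match the measured readings above; it only shows why an energy average barely moves while a 95% pick flips. The function name and index convention are assumptions.

```python
import math


def idealized_readings(quiet_db, loud_db, total_s, loud_s, block_s=0.05, percentile=95):
    """Return (energy_average_db, percentile_pick_db) for a two-level test file."""
    n_blocks = round(total_s / block_s)
    n_loud = round(loud_s / block_s)
    levels = [quiet_db] * (n_blocks - n_loud) + [loud_db] * n_loud

    # Energy average over the whole file (unweighted, ungated).
    mean_power = sum(10 ** (level / 10) for level in levels) / n_blocks
    average_db = 10 * math.log10(mean_power)

    # Value at the 95% position of the sorted block levels.
    ordered = sorted(levels)
    pick_db = ordered[min(n_blocks - 1, int(n_blocks * percentile / 100))]
    return average_db, pick_db


for loud_seconds in (4, 6):
    average_db, pick_db = idealized_readings(-16.0, -8.0, total_s=100, loud_s=loud_seconds)
    print(f"{loud_seconds} s loud: average {average_db:.2f} dB, 95% pick {pick_db:.1f} dB")
```

In this idealized model the energy averages of the two files differ by well under half a decibel, while the 95% pick jumps the full 8 dB from the quiet level to the loud one as soon as the loud passage crosses the 5% mark.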
I get it, it’s a lot to absorb. I know I’ve not given many concrete numbers or rules of thumb, but hopefully, you see that it’s because there are a lot of variables that have the potential to change at any time. Still, since you’ve made it this far, let me share with you a few personal axioms that guide my day to day work:
Hopefully these four parting tips, along with a better understanding of the forces at work on loudness normalized streaming platforms, will better equip you to make masters that translate well not only today, but for years to come.