We engineers spend a lot of time polishing our vocals, and we know exactly why we do it. However, some people might look at the work involved—the editing, the manual de-essing, the noise-reducing (if needed)—and wonder, why go through all of that before the mixing process?
Certainly our media landscape conditions us to forgive more glaring errors. Look on audio forums today and you’ll see all sorts of pros with real cred lamenting the clicks, pops, and bad edits in the latest pop hit.
But is that any reason for us, as audio engineers, to lower our bar? I would argue no, because the virtues of vocal polishing far outweigh the time it takes. They even outweigh their immediate, apparent benefits: in polishing vocals before the mix, you’ll find psychological and creative rewards as well.
Let’s dive into what vocal polishing will get you, and let me give you a peek into my workflow, which centers around one principle:
I learned this one the hard way, but boy, when I got it through my thick skull, it made a difference. Cutting up the vocal into separate regions and properly placing the vocal line against the music or programmatic material went a long way towards minimizing CPU-intensive automation and sparing me time-based effects that were no longer needed.
How? Well, if the vocal already had the groove it needed before I touched a plug-in or a piece of gear, it inevitably required less processing to aid in its feel.
Similarly, a vocal already balanced for dynamics—whether by clip-gaining in the DAW, or in a program like RX or RX Elements—necessitates less compression later, and thus, allows for more versatility in how I choose to use compression in the mix.
It frees me up to use color compression (like an optical emulation) without worrying about unnatural dynamic effects, or conversely, to tighten the dynamic range without too much gain reduction, and thus, too much unwanted color.
The same goes for de-essing. If I make the choice to manually de-ess before the mixing process, sure, it’ll be grueling as all get out, but it also means I don’t need to use the de-esser in the chain, and I won’t need to automate a de-esser depending on the strength of the sibilance. All of this frees up CPU resources for more creative tasks.
EQ is also impacted by a good polish. De-essing before mixing, a modicum of leveling, de-noising, de-clipping, de-clicking—all of this renders my EQ choices more consistent and less drastic. I won’t need to use/automate a different EQ in response to a louder or softer phrase (which might have a different harmonic character), or one with more sibilance (ditto), or one with extraneous noise attached (double ditto).
You can do this in your DAW, where you can accomplish a lot with simple editing and RX Elements, or you could use something like RX 7. You can also use Nectar Elements to give your vocals professional clarity and polish:
The process of cutting the vocals into regions, and of moving these regions infinitesimally around, helps create a sense of groove, and can easily be done within your DAW. When you decide to do this is a matter of careful consideration: if the song has a swing feel, but the singer anticipated the beat, sounding square in the process, you can reposition the phrases around the grid to engender more groove from the vocal performance. That would be one example.
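If it helps to picture what a nudge actually does to the audio, here is a minimal sketch in plain Python. It is not how any particular DAW implements region nudging; the function names (`ms_to_samples`, `nudge_region`) and the list-of-floats track are assumptions made purely for illustration.

```python
def ms_to_samples(offset_ms, sample_rate):
    """Convert a nudge in milliseconds to a whole number of samples."""
    return round(offset_ms * sample_rate / 1000)


def nudge_region(track, start, end, offset_ms, sample_rate=48000):
    """Return a copy of `track` with track[start:end] moved by offset_ms.

    Hypothetical helper: negative offsets pull the phrase earlier,
    positive offsets push it later. The region's old position is
    silenced, and the moved region overwrites whatever it lands on.
    """
    shift = ms_to_samples(offset_ms, sample_rate)
    region = track[start:end]
    out = list(track)
    out[start:end] = [0.0] * (end - start)  # silence where the region was
    new_start = start + shift
    for i, sample in enumerate(region):
        j = new_start + i
        if 0 <= j < len(out):  # drop anything pushed past the track edges
            out[j] = sample
    return out
```

At 48 kHz, a 10 ms nudge works out to 480 samples, which gives you a sense of how small these moves are compared with the length of a phrase.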
You can also use clip-gaining within the DAW to massage strange level shifts, and to do a certain amount of manual dynamics processing. If someone is saying the “chock” of “chock full” too loudly, you can lower the level with clip-gaining.
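Under the hood, clip gain is just a level change applied to one region. The sketch below shows the arithmetic under that assumption; `db_to_linear` and `apply_clip_gain` are hypothetical names, and the track is a plain list of float samples rather than anything DAW-specific.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


def apply_clip_gain(samples, start, end, gain_db):
    """Return a copy of `samples` with samples[start:end] scaled by gain_db.

    Hypothetical helper: everything outside the region is left untouched,
    which is the whole point of clip gain versus a track-wide fader move.
    """
    factor = db_to_linear(gain_db)
    out = list(samples)
    for i in range(max(start, 0), min(end, len(out))):
        out[i] = out[i] * factor
    return out
```

Pulling an overly loud syllable down by 6 dB roughly halves its amplitude, while the words on either side stay exactly where they were.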
You can also de-ess with clip-gaining in the DAW simply by pulling down the obvious sibilances (they look like footballs in the waveform). A combination of clip-gaining and fades can help you minimize extraneous breaths, and, to some extent, you can even mitigate plosives by zooming into the waveform drastically and cutting out the offending “pop,” then joining the resulting waveforms with a well-placed crossfade.
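To make the cut-and-crossfade idea concrete, here is a rough sketch using a linear crossfade. DAWs offer several fade shapes and this is only one of them; the function name `cut_with_crossfade` and its parameters are assumptions for illustration.

```python
def cut_with_crossfade(samples, cut_start, cut_end, fade_len):
    """Remove samples[cut_start:cut_end] and join the two sides.

    Hypothetical helper: the last `fade_len` samples before the cut are
    overlapped with the first `fade_len` samples after it, using a
    linear crossfade so the join does not click.
    """
    left = samples[:cut_start]
    right = samples[cut_end:]
    fade_len = min(fade_len, len(left), len(right))
    if fade_len == 0:
        return left + right  # hard splice, no material to fade across
    head = left[:-fade_len]
    tail = right[fade_len:]
    blended = []
    for i in range(fade_len):
        w = (i + 1) / (fade_len + 1)  # weight of the incoming audio
        outgoing = left[len(left) - fade_len + i]
        incoming = right[i]
        blended.append(outgoing * (1 - w) + incoming * w)
    return head + blended + tail
```

Note that the result is shorter than the original by the length of the cut plus the fade, so everything after the edit lands slightly earlier in time.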
Unfortunately, one cannot provide a tutorial that fits all platforms, because how these operations work depends on your DAW. For instance, Logic Pro X provides no automatic clip-gaining shortcut, so it’s a bit of a pain, but Pro Tools does, and more: Pro Tools gives you powerful key commands for crossfades, region separation, region nudging, and other useful things.
Certain operations cannot be accomplished in your DAW, but are handled easily in RX Elements. Noise can be minimized with the noise-reduction module. Clicks and pops can be removed with the De-click module, and if you’re using Pro Tools, you can do this selectively with AudioSuite rendering. Hum from ground loops can be taken out, and distortions can be repaired with the De-clip module.
I encourage you to try Elements, because not only is it a good introduction to the RX ecosystem, but it might also pique your interest in moving wholeheartedly to RX 7, as I did. I recommend the move, because RX 7 is a powerful, comprehensive tool.
This is one of those great moments where, as a working engineer, I am pleased to recommend the product I’m writing about as a wholehearted fanboy: whether I’m editing vocals for short films like Ryker, web-series like BKPI, podcasts like Startalk Live, singles for various bands, or my own pet projects, I use iZotope RX with wanton abandon, and here’s how.
First, I position the vocal for groove/timing within the DAW as described above. Then, I render that track and bring it over to RX in standalone mode.
Leveling: I use the Leveler module for clip-gaining, using rather dynamic settings, and I’m not ashamed to admit it. Yes, I’ll need to go through with a fine-tooth comb and make sure nothing has been overcorrected, but the dialogue settings provided in the Leveler get me in the ballpark of musical consistency in a non-destructive way. I tend to go phrase by phrase instead of globally, with some (not a lot) of the de-essing parameter engaged.
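For a feel of what phrase-by-phrase leveling means numerically, here is a small sketch. To be clear, this is not the algorithm inside RX’s Leveler, which is not described here; it is a simple stand-in that measures a phrase’s RMS level and suggests a clip-gain change toward a target. The names `rms_db` and `phrase_gain_db`, the target level, and the 6 dB limit are all assumptions.

```python
import math


def rms_db(samples):
    """RMS level of a phrase in dB relative to full scale (1.0)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square)


def phrase_gain_db(samples, target_db, max_change_db=6.0):
    """Suggest a gain change (in dB) that moves a phrase toward target_db.

    Hypothetical helper: the change is clamped to +/- max_change_db so a
    very quiet or very loud phrase is not overcorrected, and silence is
    left alone.
    """
    level = rms_db(samples)
    if level == float("-inf"):
        return 0.0
    change = target_db - level
    return max(-max_change_db, min(max_change_db, change))
```

The clamp plays the role of the fine-tooth comb pass: it keeps an automatic suggestion from dragging a whisper all the way up to the level of a belt.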
De-essing: In RX 5 and below, I used manual de-essing all of the time, pulling down all the “footballs” and balancing them by ear after leveling. This worked, but was ever so slow. Thankfully RX 7 has a de-esser that feels almost the same as manual de-essing.
Plosives: If I’m working to sync, like with a movie, I can’t use the in-DAW plosive-editing technique described previously, as it might throw off the sync. Well, I can; there’re ways around it, sure, but the Plosive module gets rid of all the nasty P sounds with minimal effort on my part. It also sounds quite transparent.
Mouth De-click: Finally! A module sorely needed for podcasts. I don’t really need to say much about Mouth De-click beyond “it works.” It helps people stay engaged in vocals without ASMR-inducing or chalkboard-like side effects.
De-clip: When Eugene Mirman goes full laugh on Startalk Live, his mic clips. That’s not a spoiler—that’s inevitable for someone with such a robust laugh. De-clip fixes that issue 90 percent of the time in a way that’s transparent. Sometimes I need to marry the process with de-crackle for better results. Thankfully I have the choice.
EQ Match: If a singer or vocalist is talking off axis for a bit, or sounds inconsistent enough to take me out of the piece, EQ Match does a great job of getting them close.
Dialogue de-noise: A huge go-to, and all over Future Genesis, a movie I just did that was featured at the New York International Short Film Festival. All sorts of background noise cropped up in that film, completely unbefitting of two people standing in a field. Because these noises were static enough, this module eliminated the issues. It usually does. It’s also great in musical scenarios; you’d be surprised how many pro studios I’ve had to remove the walls from.
Spectral Repair: A singer hands me the golden take. Just one problem: it’s a ballad and there’s a motorcycle plainly audible in the money note. This is a job for Spectral Repair, outlined in tutorials like this one.
Dialogue Isolate: You can also use Dialogue Isolate similarly to Spectral Repair, and with a bit less time on your end. Here’s a recent example of me using Dialogue Isolate and some other basic RX processing to zap a motorcycle right out of an actor’s take in a recent project.
This is the raw track:
And here’s the RX’d version:
It might sound a little processed now, but once the music bed, foley, and room ambiance are added, you’ll be hard pressed to know a motorcycle was ever there.
That’s about all I need in a given session of vocal editing/vocal polishing. Yes, it seems daunting at first, but it always benefits my process, for two essential reasons.
When you take the time to polish vocals before inserting them into a mixing session, you’re studying the ins and outs of the vocal itself. You’re learning what makes it tick. This renders the mixing process a much quicker, more fun affair, because your ear automatically knows what’s working with the surrounding track and what isn’t.
If there’s too much of a problematic frequency competing with an instrument, notching that frequency takes far less time, because you don’t need to go hunting for it. You also don’t need to guess at how much you should lower the offending frequency: you already know what degree of processing will take the vocal to an undesirable place by virtue of your by-now familiarity. Because you’ve become familiar with its structural integrity, recognizing potential problems around the vocal in the mix becomes much easier.
Likewise, as you’re polishing, ideas will percolate—nay, they will present themselves. You’ll hear a phrase and think, “this will sound lovely with a wash of reverb on this one word,” or “a delay throw on this line will definitely do the trick.”
You’ll feel the dynamics of the lines and have an innate idea of how to mix around them, automatically judging which phrases deserve more of an intimate treatment, and which could use bombastic gestures from the accompanying instruments.
Also, in culling away all the technical glitches and faults, you may surprise yourself with gold: sometimes, a mistake in the vocal, or a bit of noise that needs to be eliminated, can be reintroduced into the track as an effect. This has happened many times in my practice, far more often than you’d think.
On the song “Today Tomorrow’s Coming” from the musical Get Got, a whine from a pair of noise-cancelling headphones wound its way into a recorded track. I had to get rid of it, yes. But in the chorus, timed right, it made for a great piece of ear candy. In the following example, it shows up buried in the mix, once 10 seconds in, and then again more prominently in the right headphone at 13 seconds.
Granted, when you’re working for an employer, you don’t have as much leeway as when you’re working on your own projects, but if you discuss the move with the producer, or slip it in tastefully, you’d be surprised what you can add in to the betterment of the music. I call it the Bob Ross school of mixing, where you turn your “happy accidents” into assets.
Vocal polishing is a necessity in my practice, and one I used to think of with some dread. It was not where my passions lay when it came to mixing. However, over the years, I got better at it, and I got faster at it. But that’s not what’s important. No, I’d wager that things really changed when I learned to appreciate the process. Instead of dreading its banality, I began to welcome the learning opportunity, as it would free me up for more creative pursuits in the mix. It is this realization, as well as the tips and tricks I’ve outlined above, that I really hope to impress upon you.