Welcome back! You’ve made it to the final episode of our audio post-production series. So far, we’ve talked about the kind of work that’s out there, approaches to post projects, tools and technologies, and some of the wacky stuff you’ll see on the front lines of mixing audio for image.
We’ll wrap up this series with some tips for mixing, deliverable specs, and some dire media warnings. Onward to the scary stuff, just in time for Dia de los Muertos.
Your output is only as good as the quality of your input.
Let’s presume for a moment that your source audio is well captured, or that you’ve already done the work required to get things sounding at least passable. Review some of the tips in the previous articles to fine-tune your audio before the balancing begins, particularly the plug-in discussion in Part 3.
Again, do not start working on the project until you have a finished and locked video edit. Of course, you may be forced to mix without the final edit, and in some cases, you may even need to mix without picture. Huh? Yes. I’ve had to mix without video numerous times, usually because the editor is adding B-roll or redoing CGI, or the director wants to color correct at the 11th hour. (Color correction is the video equivalent of mastering audio.) The video sync may not change, but as deadlines approach, everyone kicks into high gear to get ‘er done. Try to avoid doing this if you can, but you never know what twists and turns you’ll encounter.
If you haven’t already, group your audio tracks into submix categories: typically dialogue, narration, music, sound effects, and ambience.
Create a submaster for each group so you can add bus processing to control dynamics and shape EQ overall. This also gives you five faders to manipulate, rather than 100 or however many.
It’s good to control the dynamics of each bus so you can maintain consistent levels on your main output. I like using a maximizer-type limiter, like the one found in Ozone mastering software. You have to be kind of careful, because you can really hear the artifacts if you hit these things too hard. No more than 2 dB or so of gain reduction, in my experience.
Start by identifying the loudest part of the soundtrack and use that as the upper limit of your dynamic range. Zoom out on your session and look for the largest concentration of big fat waveforms. Loop this section and check dynamics processing for clips or over-compression, that sort of thing. This is where you’ll want to check your main mix level meter to make sure you comply with the delivery spec.
In mixing for image, dialogue is king and/or queen. Don’t let other elements bury the speech, unless the action dictates that treatment. It doesn’t matter how much you like the music or the clever sound design. It’s all about balance; everything in its place. Rather than guessing where the dialogue should sit in the mix, there are tools, such as the Intelligibility Meter on iZotope’s Insight 2 plug-in that can help you find the sweet spot.
This isn’t rock ‘n roll, so don’t use the compressors to crush your signal. Rather than make mixes sound “punchy,” this will just crank up the noise floor. Set your compressors for no more than 6 dB of gain reduction.
Dialogue, in particular, is sensitive to over-compression. The challenge here is that many amateur actors will start each line of dialogue with confidence and projection, then peter out at the end of a phrase. This is especially true of interviews. Use clip gain to raise the level gradually toward the end of a line delivered fortepiano (loud, then soft). That little upward slope will make noise floor changes more subtle while keeping the input gain constant in your signal processing chain.
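That upward clip-gain slope is easy to picture in code. Here’s a minimal sketch in Python (the function and its parameters are hypothetical, not any DAW’s API), assuming the clip arrives as a NumPy array of float samples:

```python
import numpy as np

def ramp_clip_gain(samples, sr, ramp_start_s, max_boost_db):
    """Gradually boost gain from ramp_start_s to the end of the clip.

    A linear-in-dB ramp from 0 dB up to max_boost_db, mimicking a
    clip-gain line drawn upward over a trailing-off phrase.
    """
    out = samples.astype(np.float64).copy()
    start = int(ramp_start_s * sr)
    n = len(out) - start
    if n <= 0:
        return out
    ramp_db = np.linspace(0.0, max_boost_db, n)   # 0 dB -> max boost
    out[start:] *= 10.0 ** (ramp_db / 20.0)       # dB -> linear gain
    return out
```

Because the ramp is linear in decibels, the boost creeps up smoothly rather than jumping, which is exactly what you want over a fading phrase.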
Mixing music vs. dialogue. I am inherently lazy, so on the music sub bus I use a sidechain compressor input fed by a send from the dialogue bus to create an intelligent music ducker that will save me from having to automate every single fader move. Let the dialogue modulate the levels for you, it’s easy!
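For illustration, here’s a crude stand-in for that sidechain ducker in Python with NumPy. All names and thresholds are my own, and a real compressor would apply a ratio and smooth the gain changes rather than hard-switching, but the signal flow is the same: the dialogue feeds an envelope follower, and the envelope pulls the music down.

```python
import numpy as np

def duck_music(music, dialogue, sr, max_duck_db=-9.0,
               attack_ms=10.0, release_ms=250.0, threshold=0.02):
    """Duck the music bus whenever the dialogue signal is present.

    Follows the dialogue envelope with a fast attack and slow release,
    then attenuates the music by max_duck_db while the envelope sits
    above the threshold.
    """
    atk = np.exp(-1.0 / (sr * attack_ms / 1000.0))
    rel = np.exp(-1.0 / (sr * release_ms / 1000.0))
    env = 0.0
    gain = np.ones(len(music))
    duck_lin = 10.0 ** (max_duck_db / 20.0)
    for i, x in enumerate(dialogue):
        coeff = atk if abs(x) > env else rel
        env = coeff * env + (1.0 - coeff) * abs(x)
        if env > threshold:
            gain[i] = duck_lin          # dialogue present: pull music down
    return music * gain
```

In a DAW you get all of this for free by inserting a compressor on the music bus and feeding its sidechain input from a dialogue send.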
When EQing music, you need to preserve the energy of the track while simultaneously making room in the frequency spectrum for the VO. Usually, this means turning down the 1–4 kHz band in the music. The problem there is what happens to the music when there is no VO present: it sounds muffled and loses energy.
One fix for this is to use a multiband compressor set to compress only the upper mids of the music between 1.5–4 kHz, where vocal intelligibility happens. Wait, not done yet. As in the example above, create a send from the dialogue bus to feed the multiband compressor sidechain input. This will only trigger the frequency-dependent compressor when there is VO present, and the amplitude of the VO signal will modulate the depth of the compression effect on that narrow frequency band in the music track. Voila! Problem solved using science.
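Here’s a toy version of that frequency-dependent ducking in Python, using a blockwise FFT instead of a real multiband compressor’s crossover filters (all names and values are illustrative assumptions):

```python
import numpy as np

def band_duck(music, dialogue, sr, lo=1500.0, hi=4000.0,
              block=2048, threshold=0.02, duck_db=-6.0):
    """Attenuate only the 1.5-4 kHz band of the music when dialogue is present.

    Per block: if the dialogue RMS exceeds the threshold, pull down the
    music's spectral bins between lo and hi by duck_db, leaving the rest
    of the spectrum (and the dialogue-free blocks) untouched.
    """
    out = music.astype(np.float64).copy()
    duck_lin = 10.0 ** (duck_db / 20.0)
    freqs = np.fft.rfftfreq(block, 1.0 / sr)
    band = (freqs >= lo) & (freqs <= hi)
    for start in range(0, len(music) - block + 1, block):
        seg = out[start:start + block]
        dia = dialogue[start:start + block]
        if np.sqrt(np.mean(dia ** 2)) > threshold:   # VO present in this block
            spec = np.fft.rfft(seg)
            spec[band] *= duck_lin                   # duck only the speech band
            out[start:start + block] = np.fft.irfft(spec, n=block)
    return out
```

A real multiband compressor also crossfades between blocks and scales the ducking depth with the VO level, but the core idea is the same: attenuation that is both frequency-dependent and dialogue-triggered.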
If you’ve done your job correctly with each track, there should be very little EQ work needed for the final mix stage. If you need a little more sumthin’ sumthin’, you can EQ the submasters as necessary.
This is entirely dependent on the production, so I’ll refrain from telling you to flange the hi hats. One thing to watch out for is ADR—you’ll need to match the ambience of the original scene, so if the scene is a basketball gym, for instance, you may need to add reverb to the ADR lines to match the visual.
Tweaking audio tracks to sound like a radio or CB or airport PA announcements is called “futzing.” There are some cool plug-ins out there to help recreate the sound of walkie talkies or what have you, but you can do a lot of those things with EQ, reverb, and distortion plug-ins. A very common effect is “music from next door,” in which you roll off all the highs and bump up the lows to simulate the neighbor’s annoying disco party at 2:00 a.m.
One of my favorite ADR stories involves futzing a line to sound as though the famous actor were inside a barrel. As much as I wanted to put this guy into the actual barrel, I made do by coercing him into holding a garbage can over the top half of his head while he spoke. It sounded perfect, and we all chuckled over getting him to do it. There’s a picture somewhere...
Once you get levels and balance under control, it’s time to print your mix. Don’t “bounce” these mixes, mix in real time. Why not bounce, you might ask? For one, if you bounce, you don’t get to listen to the whole program, which means you can’t judge the dynamics of your mix, or identify problems that you may have missed. Yes, bouncing saves loads of time, but it’s not worth gambling on leaving a mistake in your mix. Like accidentally leaving a track muted, or forgetting to un-solo an effects track. Almost sounds like I’m speaking from experience, right?
Instead, create a stereo destination track (or 5.1, 7.1 or whatever you need) in your session, and route all bus outputs to those destination tracks. Arm those tracks to record, and roll!
It is not unusual to mix a soundtrack in smaller increments, as short as a few seconds or as long as a few minutes, depending on the length of the scene. You simply start at the beginning, recording your mix until you hear a mistake or something that needs to be repaired. At that point you pause, fix the problem, rehearse the section, backspace an appropriate amount of time, and punch in on your record tracks.
Automation is very helpful in this process, but judicious use of dynamic signal processing can help compensate for a multitude of level changes.
When you’ve finished mixing the program, consolidate the entire track (if you have punch-ins) to a single stereo or multichannel audio file that is exactly the same length as the video. Same start time, same end time, same overall length.
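That conforming step can be sketched in a few lines of Python, assuming the mix is a NumPy array and you know the video’s duration (the function name is hypothetical):

```python
import numpy as np

def conform_length(mix, sr, video_duration_s):
    """Pad with silence or trim so the mix exactly matches the video length."""
    target = round(video_duration_s * sr)
    if len(mix) < target:
        return np.concatenate([mix, np.zeros(target - len(mix))])
    return mix[:target]
```

In practice your DAW’s consolidate or export command does this for you; the point is simply that the delivered file must be sample-accurate to the video’s duration.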
You may never encounter time code as a mixer, and be all the better for it. It’s been a very long time since I’ve had to sync to tape, but I remember the challenges and shortcomings of time code very clearly. These days we get AAF/OMF files that neatly line up with the video, and stay in sync for the duration of the project. So good.
A few words on TC use—it’s still an important part of the production (shooting, editing) process for syncing video and audio takes, and invaluable if you’re working on a long form film project. You can use TC to determine which reel you’re working on, and learn if the project is designed for broadcast (Hint—25 frames per second (fps) is the PAL broadcast frame rate. We use 29.97 fps drop-frame TC for NTSC broadcast in the US).
You will also encounter 23.976 fps, 24 fps, and 59.94 fps. Make sure that the frame rate of your DAW session matches the frame rate of the incoming video file, or you will be in a world of hurt when it doesn’t sync up. The “Get Info” command in the Finder (or your OS’s equivalent) should yield the pertinent video file details.
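As a sanity check on the math, here’s a small Python helper that converts a timecode to a sample offset (a hypothetical function, not any DAW’s API; it deliberately punts on drop-frame and fractional rates, which need their own frame-skipping arithmetic):

```python
def tc_to_samples(tc, fps, sr=48000):
    """Convert a non-drop HH:MM:SS:FF timecode to a sample offset.

    Handles integer non-drop rates (24, 25, 30). Fractional rates
    (23.976, 29.97) and drop-frame counting need extra care, so don't
    use this sketch for them.
    """
    hh, mm, ss, ff = (int(x) for x in tc.split(":"))
    frames = ((hh * 60 + mm) * 60 + ss) * fps + ff
    return frames * sr // fps
```

For example, at 24 fps a timecode of 00:00:01:00 lands exactly one second, or 48,000 samples, into the session.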
Oh yes, don’t change the time code rate in the middle of a mix. Don’t even ask why, it brings back painful memories.
Nearly every project and producer will have different delivery specs for audio, based on the final destination of your mix. Here are some of the common specs you’ll encounter...
I’m a little old fashioned, so I like stereo 24-bit 48 kHz BWAV files. Just about any DAW can output that format, and just about every video editor can accept that format. You may see variations, like 16-bit/48 kHz WAV, or 24-bit/48 kHz AIF. You should get suspicious if you see a request for a 44.1 kHz spec, MP3s, or anything below 48 kHz. That may indicate the need for a “teachable moment” with your producer.
Q: When would you NOT want to make a 24/48 BWAV mix? A: When you need to deliver multichannel files, like a 5.1 or 7.1 mix, in which case you would deliver an AIF file. Why? Because WAV files sometimes lose their channel assignments when being laid back to video. Something in the way the file header data is stored. Don’t ask me why, but I have encountered this too many times to take chances. It’s all the same to video editors, so I prefer to be safe and deliver AIFs for multi-channel mixes.
As for marrying the audio to the video (“layback”), we aren’t video specialists, so, unless YOU happen to be one, leave the layback to the video editor. In a pinch we can use QuickTime 7 to do the job, but there are some caveats in doing so. Don’t trust your DAW’s “Bounce to QuickTime” option—it bounces faster than real time, so you’re not hearing the soundtrack, and you’re never sure what the QT codec is doing to the video.
Since the passage of the CALM Act in 2010, broadcasters and producers are responsible for adhering to very specific overall audio levels, measured against short-term, long-term, and peak specs.
Loudness is now measured in LUFS (Loudness Units relative to Full Scale) or LKFS (Loudness, K-weighted, relative to Full Scale); the two are numerically equivalent. Broadcast entities will usually ask for something in the -23 to -25 LKFS range, with peaks in the range of -2 to -10 dBTP, depending on the format (stereo vs. surround) and the network.
Netflix, Amazon, and other online VOD services have their own deliverable specs, ranging from -15 to -18 LKFS with peaks of -2 to -10 dBTP. As you can see, that’s quite a difference in range and peak specs, so be sure to check with your producer for the actual spec you need to adhere to, or you may find yourself remixing your program.
My clients are simple—they usually ask for -16 LKFS for stereo mixes and -24 LKFS for multichannel mixes. I like that; it’s easy to remember.
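If you’re curious what a loudness meter is doing under the hood, here’s a deliberately rough Python approximation. Real LKFS/LUFS metering per ITU-R BS.1770 applies K-weighting filters and a two-stage gate, both of which this sketch skips, so treat it strictly as a ballpark and trust a proper meter like Insight for delivery:

```python
import numpy as np

def rough_loudness_lufs(samples):
    """Very rough integrated-loudness estimate for a mono float signal.

    Skips the K-weighting and gating that real BS.1770 metering
    requires; just mean-square power mapped to the LUFS scale.
    """
    ms = np.mean(samples.astype(np.float64) ** 2)
    return -0.691 + 10.0 * np.log10(ms)
```

Even this crude version shows the essential idea: loudness specs are about average power over the whole program, not momentary peaks, which is why a limiter alone won’t get you to spec.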
So, how do you meter this strange beast? There are lots of options out there in the world, but one that I often use is the excellent iZotope Insight plug-in, with Spectrogram, Sound Field, Spectrum Analyzer and the all-important LUFS/LKFS metering module.
Oy. More files. Some producers will require you to do an M&E (music and effects) mix, which is essentially everything BUT dialogue. This is typically used for foreign distributors so they can ADR the foreign languages right into your pre-mix. Some will require you to deliver stems (separate stereo dialogue, music, and effects mixes). Knowing this up front, I created a mix template that accommodates busses and print tracks for ALL of those options, PLUS a 5.1 mix and 4 channel live show mix. The routing is ridiculous, but it saves tons of time in setting up a session. All the processing, metering, and labelling is built into the template, so it comes together very quickly.
Name the mix record tracks according to your prescribed naming conventions, allowing for version numbers or variations. Save a copy of your session to coincide with the number of the mix. So, a mix named 2018.10.10_Stereo_MixedAudio_V2_-16LKFS.wav would live within a session named 2018.10.10_FinalMix_V2. Adapt this to your own nomenclature, or to that specified by the producer.
In a word—no. Okay, yes, but only in emergencies. (Sorry, my video brethren and sistren.)
This is mostly because their video editing software doesn’t have the resolution to make critical audio edits. They have one-frame resolution, or about 1/30th of a second, generally speaking. DAWs have single-sample resolution, or 1/48,000th of a second at 48 kHz. One sample is roughly 1,600 times finer than one frame at 48 kHz, which means we can edit breaths and noises with far greater accuracy than video editors.
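The arithmetic behind that claim fits in a couple of lines of Python:

```python
# How much finer is one audio sample than one video frame?
sr = 48000            # audio sample rate (samples per second)
fps = 30              # video frame rate, rounded (NTSC is really 29.97)
samples_per_frame = sr / fps
print(samples_per_frame)   # 1600.0 samples inside a single frame
```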
In terms of mixing, video editors may have access to some of the more sophisticated audio processing tools we use, but many do not. Then there’s that pesky audio learning curve, not to mention the mountain of deliverable options. Thankfully, this is what keeps us audio mixers in business.
Yes, but again, only in emergencies. Many DAWs give you the ability to do simple butt cut video edits, but this should only be done when there’s no other option. We don’t have the sophisticated color correction tools or format export options that the video folk use.
That said, blessed be the editors who can do BOTH with equal proficiency.
Whew. We have covered a lot of ground in these four articles. For the sake of brevity, I have left out a good many details which you will simply have to research on your own. There are many excellent resources out there for prospective audio post editors and mixers, beginning with the great video series from iZotope featuring techniques and tips from expert practitioners. Check them out as soon as you can, you’ll learn a ton from them.
Best of luck with your travails, I hope you endeavor to make audio better for the video and film world. Don’t forget, turn those TV commercials up as loud as you can so they blast people off their living room sofas! (I kid.) Thanks for reading, I hope you found this to be useful!