August 22, 2018

Mixing Audio for Video, Part 1: Getting Started

In part one of this mini-series, “Mixing Audio for Video,” we cover the basics: types of audio post formats, typical workflows, and some key terminology.

Do you ever wonder why your favorite movies or TV shows sound so good? Or why TV commercials are so much louder than your favorite movies or TV shows? Or why some internet videos sound so bad?

In this mini-series we’re going to discuss the creation of soundtracks for video and film, also known as audio post-production. Mixing audio for video is a fairly deep subject, so you get four articles instead of one. Covering all the bases would take a book (or books) and would need to be revised every 15 minutes, due to the ever-changing nature of business and technology.

This first article, “Getting Started,” covers the basics, a little background, some terminology, and hopefully gets you interested in diving deeper. Subsequent articles will address workflow, standards, deliverables, and careers in audio post. Read on, and stay with us for the whole series, if you can tear yourself away from YouTube.

Getting started in the world of audio for video

Way back in the dark ages of the 20th century, budding recording engineers often had to make a career choice between producing music or producing audio for visuals, like film or TV soundtracks. Specialized tasks meant using specialized tools to get the job done, and about the only thing in common between the two gigs was the use of tape machines and synchronizers.

The introduction of computerized digital audio editing systems in the late 1980s made it easier to use the same recording and editing tools to produce both music and soundtracks. I’m talking about nonlinear editing systems like AVID AudioVision, AMS Audiofile, NED PostPro, and early Pro Tools. That era was the first time I heard the term “convergence,” in this case referring to the blurring of lines between the worlds of audio and video production. As a result of this convergence, engineers at my recording studios had to learn to do post sessions by day and music sessions at night in order to take advantage of all the work opportunities. Music was (and is) fun, but post paid the bills.

The process has evolved a great deal in the intervening years, so, for those just getting started, let’s take a look at the modern process for audio post-production.

Types of audio post

We should make a distinction here between audio post and other soundtracks, like podcasts, radio commercials, or audio books. When we talk about audio post, we are referring to audio created for a visual component. Such as…

Television

Television shows can be nearly any length, but most US broadcast programs are designed for the 30- or 60-minute format.

Many TV shows are produced by highly experienced TV studio production teams in LA. Reality TV can be shot just about anywhere, but requires a good deal of post-production (both audio and video) in order to create a professional result.

Film

Short films can run a few minutes; long-form films can run a couple of hours, or even longer if we’re talking Gone With The Wind, which clocks in a hair short of four hours. This category includes productions for HBO, Netflix, and Amazon, as well as the traditional major film studios.

At the other end of the financial spectrum, independent filmmakers producing small- or no-budget projects still need audio post, and working on these films can be a great way to get some on-the-job training.

Commercials

Commercial projects can include TV commercials, infomercials, PSAs, promos for other programming, and political ads. Commercials run in very short formats: :05, :10, :15, :30, or :60 in length. There are longer commercials, but buying airtime for a 2:00 spot tends to get expensive.

These can run on TV, in movie theaters, before your favorite kitten videos on YouTube, and just about anyplace that features streaming video content. They are usually created by advertising agencies and top-notch video production teams working with a dedicated audio person or two.

Corporate video

When some big company needs to train its employees or customers how to do something, it makes a video. Likewise if it’s rolling out a new product, talking about HR policies, crowing about quarterly financial performance, and so on.

These are supposed to be shot by professional videographers, but often, in an effort to save money, they neglect to hire a professional audio person for field recordings. This can result in good-looking video with unusable audio. Unless they hire an audio post professional to salvage their noisy, distant, boomy, reverby, lip-smacky, Skype-y, dropout-y, -60 dB, horrible audio captured with a camera mic next to a cement mixer. You think I’m kidding.

Games

Games are fun. Well, game audio is fun…in moderation. Most AAA games have a dedicated audio team to create and capture sounds that are absolutely unique to the game they’re building. It can also be a tremendous amount of work, requiring thousands of audio files authored into a game engine such as Unity or Unreal using middleware like Wwise or FMOD.

Creating soundtracks in different languages multiplies the number of files to be managed and increases the time it takes to create game audio assets.

Because of the specialized workflow, we’re not going to discuss game audio in detail here. But if you want to learn more about some of the top trends for creating sounds and music for virtual reality and game applications, check out this interview with game industry veteran Brian Schmidt.


Olivia de Havilland, Leslie Howard, and Vivien Leigh in 'Gone with the Wind'

The audio post workflow

According to the dictionary, workflow is “the sequence of processes through which a piece of work passes from initiation to completion.” Let’s look at the audio post workflow steps in order and see how they apply to different productions.

Pre-production

A pre-production meeting gets you together with the production company, director, or advertising agency before the actual production begins. If you’re lucky enough to get in on this meeting, you can bring your expert opinions to the production team, potentially saving them time and effort. If they are open to creative input, you could also help shape the soundtrack at the concept stage. It also means you can have some impact on determining the audio budget, always a good thing.

Up for discussion: timeline, original music vs. library music, sound design, ADR, versioning, localization, and, of course, budget.

Remember: an hour of pre-production will save you 10 hours of flailing.

Production

Makeup is applied, craft services are consumed, lights are lit, actors act, video is shot, audio is captured, takes are logged, computers animate the action sequences, and most of the budget is spent during this part of the process.

Video editing

After all of the visuals have been captured or created, the director works with a video editor to select the best footage and assemble it in such a way as to create a good story. When the editing is complete, you will receive a finished version of the project that (in theory) cannot be changed. That’s called “picture lock.” Picture lock can only be achieved when the deadlines have passed and the budget has been exceeded. (Kidding. Sort of.)

Creating an audio session, importing data

Your video editor should give you an AAF or OMF export from their system, containing the edits and media needed to re-create their audio timeline in your DAW. Importing the file will give you a starting place to review the audio work the video editor has completed.

At this point you will also import the edited video, making sure your picture is in sync with the audio from AAF/OMF. Take care to maintain the correct time code frame rate, per the editor’s notes.

A word or two about time code—learn it! If you want to geek out on TC, check this link for starters.

Common time code formats include:

  • 24 frames per second, for film production (sometimes 23.976 in the video realm)
  • 25 frames, for countries using 50 Hz AC power
  • 29.97 frames, for most video production in the US
  • 29.97 drop frame, for US broadcast production
  • 30 frames, for black & white TV production, and sometimes CG animation

Most workstations will give you a warning message when your session TC frame rate does not match that of the video. You will definitely want to use the same TC rate as the video; otherwise, while things may seem to be in sync for a while, any mix or “layback” will be out of sync with the final picture. (“Layback” is the process of embedding your mixed audio track into the final video file.)
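Since we’re geeking out: the drop-frame bookkeeping behind the 29.97 DF format in the list above is simple enough to sketch in a few lines of Python. This is a rough illustration, not broadcast-grade code (the function name is mine):

```python
def to_drop_frame(frame_count):
    """Convert a raw frame count at 29.97 fps to drop-frame timecode.
    Frame *labels* 00 and 01 are skipped at the start of every minute,
    except for minutes divisible by 10."""
    DROP = 2
    FRAMES_PER_MIN = 30 * 60 - DROP                 # 1798 real frames per DF minute
    FRAMES_PER_10MIN = FRAMES_PER_MIN * 10 + DROP   # 17982 real frames per 10 minutes
    tens, rem = divmod(frame_count, FRAMES_PER_10MIN)
    if rem > DROP:
        frame_count += DROP * 9 * tens + DROP * ((rem - DROP) // FRAMES_PER_MIN)
    else:
        frame_count += DROP * 9 * tens
    ff = frame_count % 30
    ss = frame_count // 30 % 60
    mm = frame_count // 1800 % 60
    hh = frame_count // 108000 % 24
    return f"{hh:02d}:{mm:02d}:{ss:02d};{ff:02d}"  # the ';' marks drop-frame

# why it matters: a 29.97 fps program counted at a flat 30 fps slips
# about 3.6 seconds of sync over an hour
drift_per_hour = 3600 * (30 - 29.97) / 30  # seconds
```

Note the semicolon before the frames field; that is the conventional way drop-frame timecode is displayed.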

The spotting session

This is where you sit down with the director or producer and they tell you exactly what they want and where they want it. You’ll view the entire film/video in order to take notes on dialogue, ambience, sound effects, and music. If you have an audio team, you will divide those tasks into groups.

Dialogue editing

Where there is dialogue, it will always be the most important part of the soundtrack. An experienced editor will give you their dialogue edits on different tracks, one per actor. In cases where a location audio person was hired to record production sound, you may receive two tracks of audio per actor—a lavalier (clip-on mic) and a boom (shotgun) mic. Your job is to select the track that sounds the best and is most consistent throughout the shoot. If you end up having to use both audio tracks, make sure that correct phase is observed between the two tracks.
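The phase problem comes from the boom’s longer acoustic path: its copy of the dialogue arrives a handful of samples late. A toy sketch of finding that offset by brute-force cross-correlation (pure Python, with made-up signals; in practice you would nudge the track in your DAW or use an alignment plug-in):

```python
import random

def best_lag(lav, boom, max_lag=100):
    """Return the offset (in samples) of `boom` relative to `lav`
    that maximizes their correlation, searching +/- max_lag samples."""
    def corr(lag):
        total = 0.0
        for i, s in enumerate(lav):
            j = i + lag
            if 0 <= j < len(boom):
                total += s * boom[j]
        return total
    return max(range(-max_lag, max_lag + 1), key=corr)

# Toy signals: the "boom" is the "lav" arriving 12 samples late,
# as if the mic were a few feet farther from the actor.
random.seed(1)
lav = [random.uniform(-1.0, 1.0) for _ in range(400)]
boom = [0.0] * 12 + lav[:-12]

offset = best_lag(lav, boom)  # the lag at which the two mics line up
```

Once you know the offset, you slide the boom earlier by that many samples before summing the pair; if the correlation peak is negative, one track’s polarity is flipped and should be inverted.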

If there is noise on a dialogue track, you may have to use noise reduction or other software tools to repair the audio so it can be used in the final mix. The noise may come from something simple, such as 60-cycle AC line hum, constant background noise from a fan or air conditioner, or other steady sources. This kind of noise is easily removed using iZotope RX as part of an offline or real-time process.
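RX automates far more than this, but the classic DSP fix for steady AC hum is a narrow notch filter at the hum frequency (and, in practice, its harmonics). A bare-bones sketch using the standard Audio-EQ-Cookbook biquad notch, run here on a synthetic hum signal standing in for a noisy dialogue track:

```python
import math

def notch_coeffs(f0, fs, q=30.0):
    """Biquad notch (Audio-EQ-Cookbook) centered at f0 Hz."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = 1.0, -2 * math.cos(w0), 1.0
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]

def biquad(samples, b, a):
    """Direct-form-I biquad filter."""
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out

FS = 48000
# one second of pure 60 Hz hum standing in for the problem track
hum = [0.5 * math.sin(2 * math.pi * 60 * n / FS) for n in range(FS)]
b, a = notch_coeffs(60, FS)
cleaned = biquad(hum, b, a)
# once the filter settles, the 60 Hz component is almost entirely gone
```

A high Q keeps the notch narrow so it removes the hum without dulling nearby dialogue frequencies; real hum also needs notches at 120, 180 Hz, and so on.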

For other noises, the RX standalone app has some features that are nothing short of miraculous in their ability to remove or reduce mouth clicks, clothes rustling, complex or intermittent background noise, plosives, and harsh sibilance, and it can even fix clipping! Every audio post engineer I know uses this software daily to get ’er done. Check it out!

ADR

If there are problems with audio captured on set that can’t be repaired with RX, you may have to use audio from another take, or perform ADR (automated dialogue replacement, also called “looping”). ADR means getting the talent into a recording studio to:

  • Replace missing audio or flubbed lines
  • Replace dialogue obscured by noise on set
  • Provide a dialogue bridge for an edited plot line

Actors will watch their scene with pre-roll and re-do their lines when cued by three countdown beeps (“3… 2… 1… Go!”). They will usually do three or four takes in a row as the video loops over and over; hence the name “looping.” The producer or engineer will pick the best performance and replace the original dialogue section with the new performance. Even though you may use the same mic as the original recording, you will likely have to use EQ, compression, and reverb to get this new performance to match the timbre of the original.

Ambience

Whenever you have dialogue edits, there will be gaps in the background ambient sound. There is nothing more unsettling than listening to a soundtrack where the ambience is inconsistent—or nonexistent—from shot to shot. This means you will need to edit the background sound to fill those holes, or to make a scene feel contiguous.

If your production sound recordist captured room tone on location, your problems are solved by filling holes with room tone. If they didn’t, there are tools to recreate random room tone based on noise samples excerpted from existing dialogue recordings.

iZotope’s Ambience Match module in RX 6 Advanced can analyze a sample of ambient noise, then generate a matching stretch of room tone or other ambience at whatever length you need. This has to be one of the most frequently used tools in my kit.
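When you do have real room tone, the manual version of the fill is just looping the clip with short crossfades so the seams don’t click. A toy sketch (the helper name and the stand-in room tone are mine):

```python
import math

def fill_gap(room_tone, gap_len, fade=100):
    """Tile `room_tone` out to `gap_len` samples, crossfading each seam."""
    out = []
    while len(out) < gap_len:
        if not out:
            out.extend(room_tone)
            continue
        # linear crossfade: tail of what we have into the head of the next tile
        for i in range(fade):
            g = i / fade
            out[-fade + i] = out[-fade + i] * (1 - g) + room_tone[i] * g
        out.extend(room_tone[fade:])
    return out[:gap_len]

# Stand-in room tone: 1,000 samples of faint rumble (a real clip would
# come from the location recording). Fill a 4,800-sample hole with it.
tone = [0.01 * math.sin(2 * math.pi * 7 * n / 1000) for n in range(1000)]
patched = fill_gap(tone, 4800)
```

In a real session you would also randomize the loop start point each pass, since an audible repeat in the background is exactly the kind of inconsistency you are trying to hide.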

Sound effects (SFX)

Whether finding the perfect car crash sound in a library, creating footsteps on a Foley stage, using synthesizers to craft other-worldly soundscapes, or just grabbing a mic and recorder and heading outside to capture the sounds of nature, this is the sound editor’s opportunity to get creative.

SFX libraries are great for low-budget projects, but you shouldn’t lean on them in broadcast productions or films. Some sounds are just too recognizable. (See “red-tailed hawk,” or “Wilhelm Scream.”)

Custom SFX. Major film and TV productions use teams to gather and create their own vocabulary of SFX, which becomes as much a part of a series as the music itself. I’m thinking about the soundtrack for Stranger Things, or just about any sound in the Star Wars universe.

Foley is the art of creating sounds on a stage full of junk (basically), which can include slamming doors, footsteps in pits of sand/gravel/leaves, sloshing water, breaking glass, you name it. Foley artists are the pros who create those sounds in real time as they watch the action on a large screen. Need the sound of bones breaking? Twist some celery. Walking on snow? Pound your fists into a bowl of corn starch. You get the picture. (Here are more ideas for recording foley.)

Music

Whether using royalty-free library music or a custom score, the director or producer will have final say over what music is used, and where/when the music is present. Sometimes video editors will create music edits to fit a scene, and occasionally it even makes musical sense. Other times, it’s up to us mixers to make sure it has an appropriate number of beats in a bar. The key thing is to make sure the beats and accents coincide with the rhythm of the on-screen action as the director intended, and that music begins and ends when it’s supposed to.

A video editing workstation typically resolves to one-frame increments (about 1/30th of a second), which does not always fall directly on a beat or hit. An audio DAW running at a 48 kHz sample rate can place an edit with 1/48,000th-of-a-second precision, roughly 1,600 times finer, which is why video editors hand music edits to audio editors to finesse.
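To put back-of-the-envelope numbers on that resolution gap:

```python
SAMPLE_RATE = 48000  # a typical audio post session rate

# audio samples spanned by a single video frame at common frame rates
samples_per_frame = {fps: SAMPLE_RATE / fps for fps in (23.976, 24, 25, 29.97, 30)}
# at 29.97 fps one frame covers about 1,602 samples, so an audio editor
# can place a music edit roughly 1,600 times more precisely than a
# frame-quantized video edit
```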

The mix

Assembling all of the above elements into a DAW timeline and balancing each group of sounds into a cohesive soundtrack is an art unto itself. Big-time post facilities use multiple workstations and operators online simultaneously to feed sound elements to one or more sound re-recording engineers, who balance all of the sub-grouped sounds into a mix.

On smaller productions, one individual is responsible for wrangling all the sonic elements in a single mix session. We’ll discuss how to handle that part of the workflow in a later segment.

Deliverables

Your director/producer should give you a spec sheet for deliverables, and they will differ greatly from one medium to the next. We’ll go over deliverables in more detail in Part 4!

Wrapping up

Understanding the workflow is key to a successful beginning in audio post, especially if you want to do this on a regular basis. This is a big topic with lots of elements to master, so review this first chunk to get familiar with the process. Next outing we’ll dig deeper into TV, film, commercial, and corporate work to see how they are similar, and we’ll look at ways in which the work can be very different.

Until then, go watch a Star Wars movie with the sound turned down and imagine what it must have been like for sound designer Ben Burtt to see this footage for the very first time—sans lightsaber sounds, blasters going peew-peew, or the sound of Darth Vader’s ship careening through the vacuum of space. He made all of that stuff up from scratch! In the process, he created a new vocabulary of sounds and raised the bar for sound designers and mixers forever.