What's the difference between audio file formats?

As technology advanced in the realm of recorded sound, so did the way that we listen to music. Within a few decades of the invention of sound recording, the record was born. The modern iteration of the vinyl record was fully established by the middle of the 20th century, and was the standard for many years. Analog formats ruled as the 1970s and 1980s brought us cassettes and 8-tracks…and then CDs arrived, and the digital revolution in audio took off in earnest.

Here we are in the 2020s, and we’ve grown used to the variety of audio file formats we come across while sharing, streaming, or working with music. If you’re an engineer, which format is best for your project? When you’re finished, on what service will it be heard? What format does that service use? What type of files should you give to your clients? The options may seem confusing, but we’ll break it down here.

General audio file types

When differentiating between audio files, the overarching categories are uncompressed audio, lossless audio, and lossy audio formats. Compression, in this case, is not the dynamic range compression we use when processing audio, but rather data compression.

Uncompressed audio refers to a file that has not been data compressed.

Lossless audio is a data compressed format that preserves all the original data. You can think of it similarly to a ZIP file. The file is encoded (compressed) and then when decoded all the information is retained.
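To make the ZIP comparison concrete, here is a minimal sketch of a lossless round trip. It uses Python's general-purpose zlib compressor as a stand-in, which is an assumption for illustration only: it is not FLAC or ALAC, and real audio codecs compress music far better. The test tone and variable names are hypothetical.

```python
import math
import struct
import zlib

# One second of a 440 Hz sine tone as 16-bit little-endian PCM samples.
SAMPLE_RATE = 44_100
samples = [
    int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(SAMPLE_RATE)
]
pcm_bytes = struct.pack(f"<{len(samples)}h", *samples)

# zlib stands in for a lossless codec here (illustration only, NOT FLAC/ALAC).
compressed = zlib.compress(pcm_bytes, 9)
restored = zlib.decompress(compressed)

print("original bytes:  ", len(pcm_bytes))
print("compressed bytes:", len(compressed))
print("identical after decoding:", restored == pcm_bytes)
```

The point is the last line: after decoding, every byte matches the original, which is what "lossless" means.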

Lossy audio uses data compression to make the size of the audio file smaller by removing material judged to be inaudible. This allows for easier transmission of the audio data (say, over email, or on a website). This compression is destructive; that is, after the audio has been compressed, it isn’t possible to regain that information.

Uncompressed audio file formats

WAV

AIFF

Lossless audio file formats

ALAC

FLAC

Lossy audio file formats

MP3

M4A

Defining audio file formats

Uncompressed audio

We understand tape, records, etc. as analog formats. Analog audio refers to a continuous-time signal where electrical voltages are analogous to sound pressure levels. Digital audio, by contrast, is a discrete-time signal captured as numerical samples by an analog-to-digital converter using pulse code modulation, or PCM. PCM samples the audio at uniform intervals. The process of quantization then converts each sample to the nearest available digital value. Linear PCM (or LPCM) is the form of PCM in which those quantization values are evenly spaced, so each stored value is directly proportional to amplitude. In most cases, PCM is used as the catchall term for both.
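As a rough illustration of quantization, the sketch below maps an amplitude between -1.0 and 1.0 to the nearest evenly spaced integer code, as linear PCM does. The function name and the scaling convention (full scale mapped to the largest positive code) are assumptions made for this example; real converters differ in the details.

```python
def quantize_lpcm(value: float, bit_depth: int = 16) -> int:
    """Map an amplitude in [-1.0, 1.0] to the nearest signed integer code."""
    max_code = 2 ** (bit_depth - 1) - 1    # 32767 for 16-bit audio
    clipped = max(-1.0, min(1.0, value))   # anything louder than full scale clips
    return round(clipped * max_code)


for amplitude in (0.0, 0.1, 0.2, 1.0, 1.7):
    print(amplitude, "->", quantize_lpcm(amplitude))
```

Because the codes are evenly spaced, doubling the amplitude roughly doubles the stored number, which is the "linear" in LPCM.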

A bit of backstory on “sampling,” for context. The Nyquist-Shannon sampling theorem states (broadly speaking) that to sample a signal accurately, the sampling rate must be greater than two times the highest frequency in the signal. So to faithfully recreate frequencies up to about 22 kHz (which is above the highest frequency nearly all humans can hear), we must sample at 44.1 kHz or more. That gives at least two samples per cycle, enough to capture both the minimum and maximum amplitude of the wave.
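The arithmetic behind the theorem can be sketched in a few lines. The helper names below are hypothetical, and the second function assumes an idealized pure tone sampled with no anti-aliasing filter, which real converters do include.

```python
def nyquist_frequency(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def apparent_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency a sampled pure tone appears to have once aliasing folds it back."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


print(nyquist_frequency(44_100))           # 22050.0
print(apparent_frequency(20_000, 44_100))  # 20000: below the limit, unchanged
print(apparent_frequency(30_000, 44_100))  # 14100: above the limit, aliased
```

A tone above half the sample rate does not disappear; it shows up at the wrong frequency, which is why the sample rate has to be chosen with the highest frequency of interest in mind.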

Another important thing to note is bit-depth. Digital systems store information in binary, as a series of 1s and 0s called bits. A byte is made up of 8 bits; a word is one or more bytes. A 16-bit word holds 16 binary digits and a 24-bit word holds 24, and each added bit doubles the number of values a sample can take. Bit-depth is how much data is included in each digital word. The more bits, the higher the resolution, which affects both the dynamic range of a file and its signal-to-noise ratio.
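To put numbers on that, the sketch below counts the values available at a given bit-depth and estimates the theoretical dynamic range using the common approximation of about 6 dB per bit. The function names are hypothetical, and real-world converters fall somewhat short of these ideal figures.

```python
import math


def quantization_levels(bit_depth: int) -> int:
    """Number of distinct values a sample can take at this bit-depth."""
    return 2 ** bit_depth


def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range: 20 * log10(levels), about 6.02 dB per bit."""
    return 20 * math.log10(quantization_levels(bit_depth))


for bits in (16, 24):
    print(f"{bits}-bit: {quantization_levels(bits):,} levels, "
          f"{dynamic_range_db(bits):.1f} dB")
# 16-bit: 65,536 levels, 96.3 dB
# 24-bit: 16,777,216 levels, 144.5 dB
```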

WAV files

.WAV files – Waveform Audio File Format – are uncompressed audio files, developed by Microsoft and IBM back in 1991. They utilize LPCM encoding. WAV files are one of the more popular digital audio formats and a gold standard in studio recording. WAV was one of the first digital audio formats, and quickly became a staple across all platforms. Despite decades of progress, it still maintains its position as one of the world’s leading pro audio formats.

WAV files capture and recreate an original audio waveform at the full quality of the chosen sample rate and bit-depth, without discarding anything. WAV uses PCM (Pulse Code Modulation) to encode the audio by sampling it at regular intervals. No data is thrown away along the way, so what gets captured and recorded is the closest digital representation of the original audio waveform, with no noticeable loss of audio quality in the process.

At minimum, you want to track 24-bit, 44.1 kHz files, to capture the full frequency range of human hearing, minimize noise, and allow for full representation of dynamic range.

Another related file is the .BWF – Broadcast Wave File. It has the same information as a .WAV file from an audio quality perspective, but contains extra header information that can be useful for broadcast. This may contain timecode, and other information about the file itself (max momentary and integrated loudness, date of origin, name of originator, etc). Not all systems can read this metadata, but it can be helpful in film and broadcast situations.

WAV files are also uncompressed, meaning that the data is stored as-is in full original format that doesn’t require decoding. This provides enormous versatility allowing for superb editing and manipulation.
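For a hands-on look, here is a small sketch that writes one second of a test tone as a WAV file and reads it straight back, using the wave module from Python's standard library. It writes to an in-memory buffer rather than to disk, and the tone, rates, and variable names are assumptions chosen for the example.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100   # samples per second
BIT_DEPTH = 16         # bits per sample
CHANNELS = 1           # mono

# One second of a 440 Hz sine tone as 16-bit little-endian LPCM frames.
frames = b"".join(
    struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)))
    for n in range(SAMPLE_RATE)
)

# Write the WAV data into memory instead of a file on disk.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_out:
    wav_out.setnchannels(CHANNELS)
    wav_out.setsampwidth(BIT_DEPTH // 8)   # sample width is given in bytes
    wav_out.setframerate(SAMPLE_RATE)
    wav_out.writeframes(frames)

# Read it back: the header describes the audio, the frames are stored as-is.
buffer.seek(0)
with wave.open(buffer, "rb") as wav_in:
    params = wav_in.getparams()
    restored = wav_in.readframes(wav_in.getnframes())

print(params.nchannels, params.sampwidth, params.framerate, params.nframes)
print("samples unchanged:", restored == frames)
```

Nothing needs decoding on the way back in, which is a large part of why uncompressed files are so convenient to edit.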

AIFF files

.AIFF (Audio Interchange File Format) files are functionally the same as .WAV files, but are encoded a bit differently. AIFF files are the Apple equivalent to .WAV files. They also use LPCM encoding, but have the ability to save more metadata into the header of the file. If you are working on a Mac with Logic or GarageBand, this is the file type you are most likely to encounter.
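One concrete encoding difference is byte order: standard AIFF stores each sample big-endian, while WAV stores it little-endian. The sketch below packs the same 16-bit sample value both ways with Python's struct module. The sample value is arbitrary, and this shows only the sample bytes, not either format's full file layout.

```python
import struct

sample = 1000  # an arbitrary 16-bit sample value (0x03E8)

wav_style = struct.pack("<h", sample)    # little-endian, as in WAV
aiff_style = struct.pack(">h", sample)   # big-endian, as in standard AIFF

print(wav_style.hex())    # e803
print(aiff_style.hex())   # 03e8

# Same number either way once decoded with the matching byte order.
print(struct.unpack("<h", wav_style)[0] == struct.unpack(">h", aiff_style)[0])
```

The bytes sit in a different order on disk, but the audio they describe is identical.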

For the most part, whether you use .AIFF or .WAV files comes down to personal preference, and the artists you collaborate with. In either case, these are the formats you want to be working with for tracking, mixing, or mastering. AIFF provides studio-grade audio recording and playback. Offering sample rate and bit-depth options just like WAV files, AIFF stores the audio waveform as PCM samples to offer the highest possible recording quality and sound replication. Just like WAV, AIFF stores data uncompressed, meaning you get no quality loss, just pure sonic happiness.

So what’s the difference between the two? It mainly boils down to history. AIFF was created by Apple in 1988, allowing full studio-quality audio recording and playback on Macintosh computers. WAV came out of a partnership between Microsoft and IBM in 1991, so WAV files played back natively on Windows machines. Nowadays both formats can be recorded and played back natively on any operating system, so they’re easily interchangeable, offering the same high-quality audio, regardless of format.

.WAV vs. .AIFF

So if WAV and AIFF both offer the same studio-quality audio, which one should you choose? Well, that will really depend on your use case. For starters, the historical split still holds today: WAV files are more popular on Windows, whereas AIFF files hold their ground on Macs. If you’re planning to send your audio files to the studio for further overdubbing or mixing, consistency with your session is important, so talk with your sound engineer about what format they plan to use in the session, and make sure your audio bounces match. The great news is, regardless of which of the two formats you choose, you will achieve exactly the same audio quality.

Lossless audio file formats

.FLAC file

FLAC stands for “Free Lossless Audio Codec” and is an open source, data compressed format that retains all of the file’s information through the encoding and decoding process. After compression, the file is usually reduced to between 50 and 70% of its original size. The amount of data compression chosen in the encoding process dictates the percentage, as well as how long it takes to encode. The code has been optimized so that decoding speeds stay about the same regardless of the compression level.

.ALAC file

Not wanting to be left out, Apple created its own format, ALAC, which stands for “Apple Lossless Audio Codec,” and is functionally similar to .FLAC. It is usually placed in the MP4 container, which has the extension .m4a. The same extension is used for Apple’s lossy audio codec: the container is the same, but the encoding inside it is different. This format is used for Apple Music’s Lossless Audio playback. ALAC files use more CPU power to decode than .FLAC files.

Lossy audio file formats

MP3 files

Uncompressed audio formats like WAV and AIFF provide gorgeous sound quality, but at the cost of high file size. With the boom of internet file-sharing in the mid-90s, people quickly realized that sending uncompressed files over dial-up connections was impractical, and oftentimes impossible. That is where MP3s (MPEG-1 Audio Layer III) came in.

The most common lossy audio file format is the .mp3. It began as the third audio layer of the MPEG-1 standard, and was then extended via the MPEG-2 standard into the .mp3 format we know today. It was developed mainly by the Fraunhofer Society. The files are encoded using perceptual encoding, which reduces the precision of, or eliminates entirely, information that has been determined to be beyond what most humans can hear.

While a three-minute song would average 30MB in WAV or AIFF format, that same song converted to MP3 would take up a tenth of the space—only around 3MB. With compression algorithms that were capable of achieving impressively small file sizes, MP3 became a staple of the internet era and has maintained its strong position to date.
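Those figures can be checked with some quick arithmetic. The sketch below assumes CD-style audio (16-bit, 44.1 kHz, stereo) for the uncompressed file and a constant 128 kbps for the MP3; both are assumptions, since the text doesn't specify them, and the function names are hypothetical. Headers and metadata are ignored.

```python
def pcm_size_bytes(seconds: int, sample_rate: int = 44_100,
                   bit_depth: int = 16, channels: int = 2) -> int:
    """Size of uncompressed PCM audio, ignoring the file header."""
    return seconds * sample_rate * (bit_depth // 8) * channels


def lossy_size_bytes(seconds: int, bitrate_kbps: int) -> int:
    """Size of a constant-bitrate lossy file, ignoring headers and tags."""
    return seconds * bitrate_kbps * 1000 // 8


song_seconds = 3 * 60
wav_bytes = pcm_size_bytes(song_seconds)
mp3_bytes = lossy_size_bytes(song_seconds, bitrate_kbps=128)

print(f"WAV/AIFF: {wav_bytes / 1_000_000:.1f} MB")
print(f"MP3:      {mp3_bytes / 1_000_000:.1f} MB")
print(f"ratio:    {wav_bytes / mp3_bytes:.1f} to 1")
```

Under these assumptions the three-minute song comes to roughly 31.8 MB uncompressed and 2.9 MB as an MP3, in line with the "a tenth of the space" rule of thumb.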

Like images, smaller audio files lose clarity and detail.

However, small file size came at the cost of sound quality. Take the pair of images above. On the left, you can see every little wrinkle and color vividly. A highly compressed image (on the right), however, becomes very pixelated and loses all of the clarity and detail. The same happens when you compress an audio file.

Different compression formats use varying methods to re-encode the data in a way that saves space. But this saving of space means some data has to get lost in the process. Usually, high frequencies are the first ones to go, as the majority of people can’t hear the details in really high frequencies. The lower the encoding quality, the more frequencies and details will get lost in your audio.

Having said that, modern compression algorithms, used at higher bitrates, are able to achieve high compression ratios with little noticeable loss to the quality of the audio. Bitrate represents the amount of data conveyed per second of audio content, with the general rule of thumb being: smaller bitrates = smaller file sizes, but also lower quality. So if you want to maintain good quality, yet still make use of the fact that MP3s are easy to share with friends and family, keep your bitrate above 128 kbps (kilobits per second).