Understanding Spectrograms

September 25, 2014

The key to successful audio restoration lies in your ability to correctly analyze the situation—much like a doctor recognizing symptoms that point to a certain illness.

Training the ear to distinguish the noises and audio events that need to be corrected can be a lifelong pursuit. Fortunately, spectrogram technology, which is included in our award-winning audio repair toolkit, RX, makes this task easier by providing a visual representation of audio.

The topics covered in this blog post include:

  • How a spectrogram works
  • How to examine your audio file
  • How to fine-tune the display

To learn more about spectrograms, read our blog post on how to fix common audio problems using a spectrogram.

How a spectrogram works

The aim of any good visualization tool for audio repair and restoration is to provide you with more information about an audible problem. This not only helps inform your editing decisions, but, in the case of a spectrogram display, can provide new, exciting ways to edit audio—especially when used in tandem with a waveform display.

So what is a spectrogram? A spectrogram is a detailed image of your audio, displayed in either 2D or 3D. Audio is plotted on a graph of time against frequency, with brightness (in 2D) or height (in 3D) indicating amplitude. Whereas a waveform shows how your signal’s overall amplitude changes over time, the spectrogram shows this change for every frequency component in the signal.
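To make the idea concrete, here is a minimal sketch of how a spectrogram can be computed. It is not how RX does it internally; it simply assumes NumPy, a Hann window, and a hypothetical helper named `spectrogram_db`, and the FFT size, hop, and test tone are arbitrary choices for illustration.

```python
import numpy as np

def spectrogram_db(signal, fft_size=1024, hop=256):
    """Return a 2D array of levels in dB, shaped [time frames, frequency bins].

    Hypothetical helper for illustration: slice the signal into overlapping
    frames, window each frame, and take the magnitude of its FFT.
    """
    window = np.hanning(fft_size)
    n_frames = 1 + (len(signal) - fft_size) // hop
    frames = np.stack([
        signal[i * hop : i * hop + fft_size] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # The small offset avoids taking the log of zero for silent bins.
    return 20 * np.log10(magnitude + 1e-12)

# One second of a steady 1 kHz tone at an assumed 8 kHz sample rate.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram_db(tone)
freqs = np.fft.rfftfreq(1024, d=1 / sample_rate)

# Averaged over time, the brightest row of the image sits at the tone's frequency.
peak_hz = freqs[spec.mean(axis=0).argmax()]
print(spec.shape, peak_hz)
```

Each row of `spec` is one moment in time and each column is one frequency, so drawing the array as an image with brightness mapped to the dB values gives the 2D view described above.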

How to examine your audio file

If you’re used to using the waveform display, it may take a while to get your head around this unique way to “see” the audio. As a start, let’s look at a few simple pieces of audio.

Here’s a picture of a sine wave moving up in pitch from 60 to 12,000 Hz as seen using a waveform view.

One thing you’ll notice when looking at the waveform display is that it’s good at showing audio amplitude, but less effective at showing what’s happening at different frequencies. For example, we can easily see here that the sine wave is the same level for the entire duration of the file. However, we can’t tell much about how the pitch or frequency changes over time.

Now let’s look at this same audio file using a spectrogram.

Now it’s very obvious that the pitch of the audio is moving up! The horizontal axis shows time, just like the waveform display. But now, the vertical axis shows us frequency in Hz—the pitch of the event that’s happening. We can see how loud events are by how bright the image is. The black background is silence, while the bright orange curve is the sine wave moving up in pitch.
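The same comparison can be sketched in code. The example below assumes NumPy and an exponential sweep lasting two seconds (the original file's length and sweep shape are not stated, so those are guesses), then measures what each view reveals: the peak level per frame, as a waveform would show, and the loudest frequency per frame, as a spectrogram would show.

```python
import numpy as np

sample_rate = 44100          # assumed sample rate
duration = 2.0               # assumed length in seconds
f_start, f_end = 60.0, 12000.0

# Exponential sine sweep: the frequency is multiplied by a constant factor per unit of time.
t = np.arange(int(sample_rate * duration)) / sample_rate
ratio = f_end / f_start
phase = 2 * np.pi * f_start * duration / np.log(ratio) * (ratio ** (t / duration) - 1)
sweep = np.sin(phase)

# Slice the sweep into overlapping frames.
fft_size, hop = 2048, 1024
n_frames = 1 + (len(sweep) - fft_size) // hop
raw_frames = np.stack([sweep[i * hop : i * hop + fft_size] for i in range(n_frames)])

# "Waveform view": the peak level of each frame stays essentially constant.
frame_peak = np.abs(raw_frames).max(axis=1)

# "Spectrogram view": the loudest frequency in each frame climbs steadily.
magnitude = np.abs(np.fft.rfft(raw_frames * np.hanning(fft_size), axis=1))
freqs = np.fft.rfftfreq(fft_size, d=1 / sample_rate)
peak_hz = freqs[magnitude.argmax(axis=1)]

for i in range(0, n_frames, 10):
    print(f"t = {i * hop / sample_rate:5.2f} s  level = {frame_peak[i]:.2f}  loudest = {peak_hz[i]:8.1f} Hz")
```

The level column barely changes from line to line, while the frequency column rises throughout, which is exactly the information the waveform hides and the spectrogram exposes.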

Now let’s look at something more complex: the human voice. Here’s a short, spoken phrase as seen through a waveform display:

What we’re seeing here is the amplitude of the spoken words over time. If we switch to the spectrogram view, we’ll see many things we can’t see in the waveform view:

The human voice is much more complex than it might seem from looking at the waveform view. Each word is made up of a fundamental frequency (at the bottom of the spectrogram), harmonics that extend above that frequency, sibilance (“S” sounds) that begin or end words, and more. And of course, you can now see more clearly the noise that is surrounding the voice.

This is why having a detailed spectrogram display is so important to doing audio restoration. It helps you clearly see the problems that you’re trying to fix.

How to fine-tune the display

Not all spectrograms are created equal. An algorithm known as the “Fast Fourier Transform,” or FFT for short, is used to compute this visual display: the audio is split into short blocks (usually overlapping), and the FFT measures how much energy each block contains at each frequency. Many products that feature a spectrogram display allow you to adjust the size of the FFT, that is, the number of samples in each block. But what does this mean for audio repair and restoration? Changing the FFT size changes how finely the spectrogram resolves time and frequency, so the same audio will look different. Depending on the type of audio you’re working with, one setting may reveal a problem more clearly than another.

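As a rule, larger FFT sizes give you more detail in frequency (frequency resolution), while smaller FFT sizes give you more detail in time (time resolution). This is a trade-off: a larger FFT analyzes a longer stretch of audio at once, so it can separate closely spaced frequencies but blurs exactly when an event occurs.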
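The arithmetic behind this rule is simple enough to sketch. Each frequency bin is (sample rate ÷ FFT size) Hz wide, and each analysis block lasts (FFT size ÷ sample rate) seconds. The snippet below assumes a 44.1 kHz sample rate and a hypothetical helper named `resolution`; the FFT sizes listed are just common powers of two, not the options of any particular product.

```python
def resolution(fft_size, sample_rate):
    """Return (width of one frequency bin in Hz, length of one block in ms)."""
    bin_width_hz = sample_rate / fft_size
    block_ms = 1000 * fft_size / sample_rate
    return bin_width_hz, block_ms

sample_rate = 44100  # assumed sample rate in Hz

for fft_size in (256, 1024, 4096, 16384):
    bin_width_hz, block_ms = resolution(fft_size, sample_rate)
    print(f"FFT size {fft_size:>5}: {bin_width_hz:7.2f} Hz per bin, {block_ms:7.2f} ms per block")
```

Multiplying the two numbers always gives the same constant, so any gain in frequency detail is paid for with an equal loss of time detail, and vice versa.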

If you’re trying to identify a plosive, mic handling noise, or other muddy low-frequency information, a larger FFT size in your spectrogram settings will help. If you’re trying to identify a high-frequency event, or working with a transient signal (such as a percussion or drum loop), choose a smaller FFT size.
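Here is a rough sketch of that trade-off using two synthetic test signals rather than real recordings. It assumes NumPy and an 8 kHz sample rate, and the helpers `tones_resolved` and `count_bursts` are hypothetical names with deliberately simple criteria: two tones count as "resolved" only if a clear dip appears between them, and clicks count as separate only if a silent frame falls between them.

```python
import numpy as np

def tones_resolved(signal, sample_rate, fft_size, f1, f2):
    """True if one FFT block shows a clear dip between the bins nearest f1 and f2."""
    mag = np.abs(np.fft.rfft(signal[:fft_size] * np.hanning(fft_size)))
    freqs = np.fft.rfftfreq(fft_size, d=1 / sample_rate)
    b1 = int(np.argmin(np.abs(freqs - f1)))
    b2 = int(np.argmin(np.abs(freqs - f2)))
    if b2 - b1 < 2:
        return False  # same or adjacent bins: no room for a dip between them
    valley = mag[b1 : b2 + 1].min()
    return bool(valley < 0.5 * min(mag[b1], mag[b2]))

def count_bursts(signal, fft_size):
    """Count separate runs of non-silent frames (hop is a quarter of the FFT size)."""
    hop = fft_size // 4
    window = np.hanning(fft_size)
    n_frames = 1 + (len(signal) - fft_size) // hop
    energy = np.array([
        np.sum((signal[i * hop : i * hop + fft_size] * window) ** 2)
        for i in range(n_frames)
    ])
    active = energy > 1e-9
    rising_edges = active[1:] & ~active[:-1]
    return int(active[0]) + int(rising_edges.sum())

sample_rate = 8000  # assumed sample rate in Hz

# Low-frequency detail: two steady tones only 20 Hz apart.
t = np.arange(sample_rate) / sample_rate
tones = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 120 * t)

# Transient detail: two clicks 20 ms (160 samples) apart.
clicks = np.zeros(4000)
clicks[1000] = 1.0
clicks[1160] = 1.0

for fft_size in (256, 4096):
    print(f"FFT {fft_size}: tones resolved = "
          f"{tones_resolved(tones, sample_rate, fft_size, 100, 120)}")
for fft_size in (64, 1024):
    print(f"FFT {fft_size}: separate click events = {count_bursts(clicks, fft_size)}")
```

With these particular signals, the large FFT separates the two low tones while the small one merges them, and the small FFT shows the clicks as two events while the large one smears them into a single event.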

The following image is of a drum loop in a live concert setting, with a member of the audience whistling. You can see how the different FFT sizes affect the way we see high vs. low frequencies, as well as transients vs. sustained notes.

Now that you understand how a spectrogram works, you’re on your way to being able to properly identify and diagnose common audio issues. And once you learn the best tool(s) for the job, you’ll be able to get the results you want, every time!

Copyright © 2001–2020 iZotope, Inc. All rights reserved.