May 18, 2017
Mouth clicks and lip smacks are a common problem for dialogue editors. iZotope RX 6 introduces a new module for their automatic reduction—Mouth De-click. In this article, Alexey Lukin, the developer of Mouth De-click, describes how it works and what makes it different from other declickers.
To understand how declickers work, it's helpful to first classify noises in audio signals by their duration. Impulse noise contaminates only a small fraction of signal samples; examples include clicks and pops on a vinyl record, clicks from a scratched audio CD, and electromagnetic interference from a cell phone. Stationary noises do not change over time: their power and spectrum stay constant. Examples include tape hiss, 50/60 Hz power line hum or buzz, and air conditioning noise. Intermittent noises come and go; they include disturbances like coughs, footsteps, squeaky chairs, and clothing rustle.
Click reduction algorithms aim to remove impulse noise. Many louder clicks are easy to find as spikes on the signal waveform, but minor clicks may be visible only on a spectrogram, as narrow vertical lines.
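The difference between a loud click and a minor one can be reproduced with a short NumPy sketch. It assumes a 44.1 kHz test signal: a loud 200 Hz tone with a small one-sample click added. A fixed amplitude threshold (set just above the tone's peak, an arbitrary choice) misses the click, while the energy above 5 kHz in each windowed frame, roughly what the upper part of a spectrogram shows, peaks in the frame that contains it. The frame size, split frequency, and function names are illustrative assumptions.

```python
import numpy as np

SR = 44100   # sample rate in Hz (assumed)
FRAME = 512  # analysis frame length in samples (arbitrary)

def make_test_signal(click_pos=10000, click_amp=0.05):
    """Half a second of a loud 200 Hz tone with a small one-sample click added."""
    t = np.arange(SR // 2) / SR
    x = 0.8 * np.sin(2 * np.pi * 200 * t)
    x[click_pos] += click_amp
    return x

def waveform_spikes(x, threshold=0.9):
    """Naive waveform detector: indices whose absolute value exceeds a threshold."""
    return np.flatnonzero(np.abs(x) > threshold)

def hf_energy_per_frame(x, split_hz=5000):
    """Energy above split_hz in each non-overlapping Hann-windowed frame."""
    n_frames = len(x) // FRAME
    frames = x[: n_frames * FRAME].reshape(n_frames, FRAME) * np.hanning(FRAME)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(FRAME, d=1 / SR)
    return power[:, freqs >= split_hz].sum(axis=1)

x = make_test_signal()
spikes = waveform_spikes(x)          # empty: the click never exceeds the threshold
hf = hf_energy_per_frame(x)
loudest_frame = int(np.argmax(hf))   # should be the frame containing sample 10000
print(spikes, loudest_frame, 10000 // FRAME)
```

Lowering the amplitude threshold enough to catch the click would also flag the peaks of the tone itself, which is why small clicks are easier to see in the frequency domain.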
Historically, clicks were repaired manually in the waveform using the “pencil” tool available in some DAWs. This is time-consuming and rarely effective, because hand-drawn waveform samples are usually not accurate enough for a seamless repair.
Click repair typically consists of two stages: click detection and waveform interpolation. Each stage can be manual or automatic. In iZotope RX, manual click repair is available through the Interpolate module: the user selects the range of samples corrupted by a click, and the interpolation algorithm replaces the selection with a synthesized signal. The synthesis ensures that the recovered signal connects smoothly to the signal on the left and right of the selection, while the spectrum of the recovered segment matches that of the surrounding signal.
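The Interpolate module's actual algorithm is not described here, but the idea of synthesizing a segment that joins both sides can be illustrated with a common textbook technique: fit an autoregressive (AR) model to the signal on each side of the gap, extrapolate forward from the left and backward from the right, and crossfade the two predictions. The NumPy sketch below is a simplified stand-in under that assumption; the model order, context length, and function names are arbitrary.

```python
import numpy as np

def fit_ar(x, order):
    """Least-squares AR coefficients a with x[n] ~ a[0]*x[n-1] + ... + a[order-1]*x[n-order]."""
    A = np.array([x[n - order:n][::-1] for n in range(order, len(x))])
    a, *_ = np.linalg.lstsq(A, x[order:], rcond=None)
    return a

def extrapolate(context, a, n):
    """Continue `context` by n samples using the AR recursion."""
    hist = list(context[-len(a):])          # oldest ... newest
    out = []
    for _ in range(n):
        nxt = float(np.dot(a, hist[::-1]))  # newest sample pairs with a[0]
        out.append(nxt)
        hist = hist[1:] + [nxt]
    return np.array(out)

def interpolate_gap(x, start, stop, order=16, context=256):
    """Replace x[start:stop] with a crossfade of forward and backward AR predictions."""
    n = stop - start
    left = x[max(0, start - context):start]
    right_rev = x[stop:stop + context][::-1]      # time-reversed right-hand context
    fwd = extrapolate(left, fit_ar(left, order), n)
    bwd = extrapolate(right_rev, fit_ar(right_rev, order), n)[::-1]
    w = np.linspace(0.0, 1.0, n)                  # 0 at the left edge, 1 at the right
    y = x.copy()
    y[start:stop] = (1.0 - w) * fwd + w * bwd
    return y

rng = np.random.default_rng(0)
t = np.arange(4096) / 44100
clean = (np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1250 * t)
         + 1e-3 * rng.standard_normal(t.size))
damaged = clean.copy()
damaged[2000:2040] = 0.0                          # simulate a 40-sample corrupted region
repaired = interpolate_gap(damaged, 2000, 2040)
err = np.max(np.abs(repaired[2000:2040] - clean[2000:2040]))
print(f"max reconstruction error: {err:.4f}")
```

The crossfade is what makes both edges join smoothly: near the left edge the forward prediction dominates, near the right edge the backward one does.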
For a recording with hundreds or thousands of clicks, such as a vinyl record, manual repair is impractical. Click detection algorithms can automatically find many types of clicks and pass them to the interpolation stage. In RX, the De-click and De-crackle modules contain such algorithms. De-click targets larger standalone clicks, while De-crackle attenuates smaller clicks that occur densely, as a continuous stream. The two modules are often used together, with De-click preceding De-crackle.
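A classic automatic detector from the restoration literature, offered here only as an illustration and not as a description of De-click or De-crackle, flags samples where an AR model's prediction error is unusually large. In this NumPy sketch, the threshold is `k` robust standard deviations of the residual, estimated with the median absolute deviation; `order`, `k`, and the function names are arbitrary assumptions. The flagged indices would then be handed to an interpolator.

```python
import numpy as np

def ar_residual(x, order=20):
    """Fit a least-squares AR model to the whole block and return its prediction error."""
    A = np.array([x[n - order:n][::-1] for n in range(order, len(x))])
    b = x[order:]
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    e = np.zeros_like(x)
    e[order:] = b - A @ a
    return e

def detect_clicks(x, order=20, k=6.0):
    """Indices where |residual| exceeds k robust standard deviations (MAD-based)."""
    e = ar_residual(x, order)[order:]
    sigma = 1.4826 * np.median(np.abs(e - np.median(e)))
    return np.flatnonzero(np.abs(e) > k * sigma) + order

rng = np.random.default_rng(1)
t = np.arange(4000) / 44100
x = 0.5 * np.sin(2 * np.pi * 300 * t) + 0.01 * rng.standard_normal(t.size)
x[3000] += 0.5                 # a single click
hits = detect_clicks(x)        # the click plus, possibly, some of the `order` samples after it
print(hits)
```

Samples just after the click can also be flagged, because the click sits in the predictor's history for the next `order` samples; practical detectors account for this before interpolating.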
When clicks are very dense and vary widely in amplitude, it often helps to run De-click a couple of times. The second pass takes care of smaller clicks that the first pass missed or that became apparent only after larger clicks were removed.
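One possible mechanism behind the benefit of a second pass can be shown with a deliberately simple detector whose threshold depends on global signal statistics. In the toy NumPy sketch below (not RX's detector), the threshold is `k` times the standard deviation of a neighbour-prediction residual, and flagged samples are repaired by linear interpolation. A large click inflates that standard deviation, so a smaller click stays below the threshold until the large one has been repaired. All names and values are illustrative assumptions.

```python
import numpy as np

def neighbour_residual(x):
    """Deviation of each sample from the mean of its two neighbours."""
    r = np.zeros_like(x)
    r[1:-1] = x[1:-1] - 0.5 * (x[:-2] + x[2:])
    return r

def declick_pass(x, k=5.0):
    """Toy single pass: flag samples whose residual exceeds k * (global std),
    then repair them by linear interpolation from the unflagged samples."""
    r = neighbour_residual(x)
    bad = np.abs(r) > k * r.std()
    idx = np.arange(len(x))
    y = x.copy()
    y[bad] = np.interp(idx[bad], idx[~bad], x[~bad])
    return y, np.flatnonzero(bad)

rng = np.random.default_rng(0)
t = np.arange(2000) / 44100
x = 0.5 * np.sin(2 * np.pi * 100 * t) + 0.01 * rng.standard_normal(t.size)
x[500] += 5.0    # large click
x[1500] += 0.3   # small click, masked by the large one during the first pass

y1, hits1 = declick_pass(x)    # finds the large click only
y2, hits2 = declick_pass(y1)   # the small click is now above the threshold
print(hits1, hits2)
```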
Because De-click and De-crackle detect clicks automatically, they need some surrounding context to work efficiently. Unlike manual interpolation, they should therefore be applied to wider time selections rather than to a single isolated click.
In 2015 we were contacted by Don Baarns, founder of the “Audio Rescue RX” Facebook group and a long-time RX beta tester. Don sent us a speech sample in which regular declickers, including RX’s De-click module, failed to meaningfully clean up some severe lip smacks. Even at higher Strength settings, which already made certain speech plosives lose definition, some mouth clicks were left largely untouched. Challenged by this case, we began analyzing speech recordings in search of clicks that De-click did not handle well.
Most click detection algorithms are tuned to detect very short clicks with a wide frequency range. Such clicks appear as vertical lines on a spectrogram; vinyl record clicks and digital clicks fall into this class. However, some clicks lack this broadband frequency profile. Lip smacks and mouth clicks, for example, are typically more narrow-band: they sound like drops of water and mostly occupy mid-high frequencies of 5 – 12 kHz. One could argue that such clicks are natural to speech and should not be interpolated by a declicker. But in many speech recordings, mouth clicks are excessive and need to be attenuated.
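The spectral difference can be quantified by the fraction of a click's energy that falls in the 5 – 12 kHz band. This NumPy sketch models a broadband click as a single-sample impulse and, as an assumption, models a mouth click as a short Hann-windowed 8 kHz burst; real lip smacks are more varied.

```python
import numpy as np

SR = 44100  # sample rate in Hz (assumed)
N = 1024    # analysis length in samples (arbitrary)

def band_energy_fraction(x, lo_hz=5000, hi_hz=12000):
    """Fraction of the (one-sided) spectral energy of x between lo_hz and hi_hz."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1 / SR)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    return power[band].sum() / power.sum()

# Broadband click: a single-sample impulse, whose spectrum is flat.
impulse = np.zeros(N)
impulse[N // 2] = 1.0

# Assumed "water drop" model of a mouth click: a 64-sample Hann-windowed 8 kHz burst.
t = np.arange(64) / SR
mouth_click = np.zeros(N)
mouth_click[N // 2 : N // 2 + 64] = np.hanning(64) * np.sin(2 * np.pi * 8000 * t)

print(f"impulse: {band_energy_fraction(impulse):.2f}, "
      f"mouth click: {band_energy_fraction(mouth_click):.2f}")
```

The impulse spreads its energy evenly over all frequencies, so only about a third of it lands in the band, whereas nearly all of the burst's energy does. A detector tuned to flat, broadband spectra can therefore easily overlook the burst.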
For RX 6, we developed a dedicated algorithm for reduction of mouth clicks. Unlike the previously available De-click algorithm, Mouth De-click detects clicks in the middle-frequency region, looking for specific spectral shapes characteristic of lip smacks. The detector is sensitive to short bursts of energy in the 1.5 – 15 kHz frequency range, but also ensures that “useful” plosives and transients are not detected as mouth clicks by inhibiting detection at 0.3 – 1.5 kHz and around transients.
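This article does not detail Mouth De-click's actual detector, but the two ingredients described above can be sketched at the frame level. These are a trigger on short bursts of 1.5 – 15 kHz energy, and an inhibition, interpreted here as suppressing detection when 0.3 – 1.5 kHz energy rises as well, as it would for a plosive. Everything in this NumPy example is a hypothetical simplification: the 256-sample frames, the median-based ratios, the threshold values, and the synthetic test signal. It also ignores the inhibition around transients.

```python
import numpy as np

SR = 44100   # sample rate in Hz (assumed)
FRAME = 256  # frame length in samples (arbitrary)

def band_energies(x, bands):
    """Per-frame energy in each (lo_hz, hi_hz) band, from non-overlapping Hann frames."""
    n = len(x) // FRAME
    frames = x[: n * FRAME].reshape(n, FRAME) * np.hanning(FRAME)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(FRAME, d=1 / SR)
    return [power[:, (freqs >= lo) & (freqs <= hi)].sum(axis=1) for lo, hi in bands]

def detect_mouth_clicks(x, burst_ratio=10.0, inhibit_ratio=10.0):
    """Hypothetical detector: flag frames whose 1.5-15 kHz energy jumps well above
    its median, unless the 0.3-1.5 kHz energy jumps too (plosive-like event)."""
    e_hi, e_lo = band_energies(x, [(1500, 15000), (300, 1500)])
    burst = e_hi > burst_ratio * np.median(e_hi)
    inhibited = e_lo > inhibit_ratio * np.median(e_lo)
    return np.flatnonzero(burst & ~inhibited)

rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(50 * FRAME)          # quiet background
t = np.arange(64) / SR
start = 10 * FRAME + 96
x[start:start + 64] += 0.3 * np.hanning(64) * np.sin(2 * np.pi * 8000 * t)  # click-like burst
x[30 * FRAME:31 * FRAME] += 0.3 * rng.standard_normal(FRAME)               # broadband, plosive-like
print(detect_mouth_clicks(x))   # only the narrow-band burst's frame is reported
```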
Once mouth clicks are identified, they are interpolated by an algorithm similar to Spectral Repair’s Attenuate mode.
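As a rough illustration of attenuation-style spectral repair, the sketch below assumes the signal is already available as a complex STFT array (frames by frequency bins). In each bin, it scales the affected frames down to at most the mean magnitude of nearby frames while keeping the original phase. This per-bin clamp is an assumption chosen for simplicity, not a description of how RX's Attenuate mode or Mouth De-click computes its gains; the function name and `context` parameter are hypothetical.

```python
import numpy as np

def attenuate_frames(stft, start, stop, context=4):
    """Per frequency bin, scale frames start..stop-1 so their magnitude does not
    exceed the mean magnitude of `context` frames on either side.
    stft: complex array of shape (n_frames, n_bins). Phases are preserved."""
    out = stft.copy()
    left = np.abs(stft[max(0, start - context):start])
    right = np.abs(stft[stop:stop + context])
    ref = np.concatenate([left, right]).mean(axis=0)       # per-bin reference level
    mag = np.abs(stft[start:stop])
    gain = np.minimum(1.0, ref / np.maximum(mag, 1e-12))   # attenuate only, never boost
    out[start:stop] *= gain
    return out

S = np.ones((20, 4), dtype=complex)
S[10] = np.array([8.0, 8.0, 8.0, 0.5]) * np.exp(1j * 0.3)  # loud in three bins, quiet in one
out = attenuate_frames(S, 10, 11)
print(np.abs(out[10]))   # loud bins pulled down to the surrounding level; quiet bin untouched
```

Because the gain is capped at 1, bins that are already quieter than their surroundings are left alone, so only the click's excess energy is removed.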
The set of user parameters is similar to De-click's. The Sensitivity slider adjusts how many mouth clicks are identified and repaired. This is the most important control affecting the amount of processing: removing too many clicks may also weaken natural speech plosives, like T, P, and K, and other transient sounds. The Frequency skew slider steers the detection stage toward higher- or lower-frequency clicks. The Click widening parameter allows wider segments to be repaired, for treating heavier lip smacks.
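The control names above are RX's, but this article does not document their internals. As a hypothetical illustration of how such controls could act on a detector, the sketch below maps a 0 – 10 Sensitivity value (the range is an assumption) to a detection threshold and implements click widening as a dilation of the detected-sample mask. Frequency skew is not modeled, and the function names and constants are illustrative.

```python
import numpy as np

def threshold_from_sensitivity(sensitivity, k_min=3.0, k_max=12.0):
    """Hypothetical mapping: Sensitivity 0..10 -> detection threshold k_max..k_min.
    Higher sensitivity means a lower threshold, so more clicks are flagged."""
    s = min(max(sensitivity, 0.0), 10.0) / 10.0
    return k_max - s * (k_max - k_min)

def widen_mask(mask, widen):
    """Hypothetical click widening: grow every flagged region by `widen` samples
    on each side (a binary dilation of the boolean mask)."""
    if widen <= 0:
        return mask.copy()
    kernel = np.ones(2 * widen + 1)
    full = np.convolve(mask.astype(float), kernel, mode="full")
    return full[widen : widen + len(mask)] > 0

mask = np.zeros(10, dtype=bool)
mask[5] = True
print(np.flatnonzero(widen_mask(mask, 2)))   # [3 4 5 6 7]
```

Widening the mask hands a longer segment to the repair stage, which matches the stated purpose of treating heavier lip smacks whose energy extends beyond the detected peak.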
In the spectrogram image below, you can see that the old De-click algorithm fails to identify and remove all of the mouth clicks, even with Strength set to 7. Meanwhile, the new Mouth De-click removes virtually all clicky sounds, with less impact on overall speech quality.
Mouth De-click is also designed as a low-latency alternative to the De-click module. Although it excels specifically at removing mouth clicks, it can also repair other click types in situations where real-time performance is important. The table below summarizes the latency and CPU load of the click reduction modules in RX. CPU load was measured by rendering a mono 44.1 kHz file and may not accurately reflect real-time CPU load in every DAW.
| Module | Latency | CPU load |
|---|---|---|
| De-click | 700 – 2800 ms | 1 – 4% |
| Mouth De-click | 60 ms | 3% |
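Because latency in milliseconds does not depend on sample rate, it can help to translate these figures into samples at a given rate. This small Python helper (its name is illustrative) performs the conversion at 44.1 kHz, the rate used for the CPU measurements, and at 48 kHz.

```python
def latency_samples(latency_ms, sample_rate=44100):
    """Convert a latency in milliseconds into a whole number of samples."""
    return round(latency_ms * sample_rate / 1000)

for name, ms in [("De-click, min", 700), ("De-click, max", 2800), ("Mouth De-click", 60)]:
    print(f"{name}: {ms} ms = {latency_samples(ms)} samples at 44.1 kHz, "
          f"{latency_samples(ms, 48000)} samples at 48 kHz")
```

At 44.1 kHz, Mouth De-click's 60 ms corresponds to 2646 samples, while De-click's 700 – 2800 ms corresponds to 30870 – 123480 samples.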
The quality of audio repair often depends on the correct sequence of processing steps. Like other impulse noise repair algorithms, Mouth De-click should run first, ahead of any hum or broadband noise reduction. When both impulse and stationary noises are present in the signal, removing the impulse noise first lets the stationary denoiser (such as the RX De-noise module) work most efficiently. Some users have reported that running De-crackle after Mouth De-click improves results on certain samples.
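One way to see why impulse removal should come first is to consider a simple stationary denoiser that learns its noise profile by averaging frame power spectra. This is an assumption for illustration; RX's De-noise is not claimed to work this way. Such a denoiser absorbs any clicks in the learning region into its profile. In the NumPy sketch below, a single click among 100 frames of hiss raises the averaged profile by roughly half, which would make the denoiser subtract more than the actual hiss. The function name and signal levels are illustrative.

```python
import numpy as np

FRAME = 512  # frame length in samples (arbitrary)

def noise_profile(x):
    """Average power spectrum over non-overlapping Hann-windowed frames,
    the way a simple stationary denoiser might 'learn' a noise profile."""
    n = len(x) // FRAME
    frames = x[: n * FRAME].reshape(n, FRAME) * np.hanning(FRAME)
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).mean(axis=0)

rng = np.random.default_rng(0)
hiss = 0.01 * rng.standard_normal(100 * FRAME)   # stationary noise only
with_click = hiss.copy()
with_click[50 * FRAME + FRAME // 2] += 1.0       # one click in the middle of frame 50

ratio = noise_profile(with_click).mean() / noise_profile(hiss).mean()
print(f"noise profile inflated by a factor of {ratio:.2f}")
```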
We suggest the following flowchart of processing steps for audio repair as a general guideline. It shows that many repair problems can be addressed by several RX modules, used either as alternatives or in sequence.