May 22, 2018 by Shahan Nercessian

The Making of Biovox in VocalSynth 2

New to VocalSynth 2 is a module for modulating signals called Biovox. In this blog post, Shahan Nercessian, the developer of Biovox, explains what it is, how it works, and the new manipulation controls that it brings to VocalSynth 2.

What is Biovox?

Biovox is the newest module in the VocalSynth family. It is based on a physical model of the human vocal tract, whose shape is continuously changing according to the input vocal signal.

You can find Biovox alongside the Vocoder, Compuvox, Talkbox, and Polyvox in VocalSynth 2. While each of the other synthesis modules in VocalSynth 2 imparts its own distinct character, the default settings of Biovox provide a relatively neutral, clean-sounding modulation. This gives you a clean slate for sculpting sound through different waveforms, LFOs, and effects.

In particular, it allows for some extremely warm sounds when filtering lush chords triggered in MIDI or Sidechain modes. Aspects of the underlying vocal tract model can also be manipulated, giving rise to new, wild effects and creative control.

Who is it for?

The Biovox module is for music producers, beat makers, and vocalists alike who want to experiment with unique effects and create new sounds. We designed it to be used on vocals, but that doesn't mean you can't get weirdly awesome sounds on other instruments, too.

Vowel replacement in Biovox makes cleaner-than-human, modern phrasing possible that sounds great across multiple genres, including electronic, rock, and hip-hop.

What does Biovox do?

The articulatory capabilities of our human vocal tract are just physics. To first order, voiced phonation is nothing more than the propagation of air pressure waves through tubes of varying shapes. Accordingly, the speech synthesis research community has made many strides in replicating aspects of the human vocal tract, both mechanically and digitally. Some of the resulting concoctions are equal parts fascinating and downright creepy. More recently, a crudely titled JavaScript applet called Pink Trombone demonstrated real-time digital speech synthesis in action, allowing the user to observe how vocal tract shape maps to different phonetic units.
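
To make the "tubes" picture a little more concrete, here is a minimal Python sketch of the simplest textbook case: a uniform tube that is closed at one end (the glottis) and open at the other (the lips). Such a tube resonates at odd multiples of its quarter-wavelength frequency. The function name, the 17.5 cm tract length, and the 350 m/s speed of sound are illustrative assumptions, not values taken from Biovox.

```python
def uniform_tube_formants(length_m, count=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Uses the quarter-wavelength formula F_n = (2n - 1) * c / (4 * L).
    The default speed of sound is an assumed round value for warm, moist air.
    """
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]


# An assumed adult tract length of 17.5 cm gives resonances near
# 500, 1500, and 2500 Hz, close to those of a neutral "uh" vowel.
neutral_formants = uniform_tube_formants(0.175)
```

A real vocal tract is not uniform, of course. Narrowing or widening the tube at different points is what moves these resonances around to form different vowels.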

Under the hood, Biovox models the flow of air as it travels through the oral and nasal cavities, and uses this to filter incoming audio in a way which matches the input audio content. It uses Wave Digital Filters, a digital signal processing methodology that is particularly well-suited for modeling physical systems. The various controls in Biovox alter the airways of the vocal tract model through different means to achieve different kinds of manipulation.
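
Biovox's Wave Digital Filter implementation is not shown here. As a loosely related illustration, the sketch below uses the classic Kelly-Lochbaum scattering junction, a different but closely related wave-based technique for tube models. It shows how a pressure wave splits into transmitted and reflected parts where two tube sections of different cross-sectional area meet. All names are hypothetical.

```python
def reflection_coefficient(area_left, area_right):
    """Pressure-wave reflection coefficient at the boundary of two tube sections."""
    return (area_left - area_right) / (area_left + area_right)


def scatter(forward_in, backward_in, area_left, area_right):
    """One Kelly-Lochbaum junction.

    forward_in  : pressure wave arriving from the left section
    backward_in : pressure wave arriving from the right section
    Returns (forward_out, backward_out): the wave continuing to the right
    and the wave sent back to the left.
    """
    k = reflection_coefficient(area_left, area_right)
    forward_out = (1.0 + k) * forward_in - k * backward_in
    backward_out = k * forward_in + (1.0 - k) * backward_in
    return forward_out, backward_out


# Equal areas: nothing reflects, the wave passes straight through.
passthrough = scatter(1.0, 0.0, 2.0, 2.0)

# A constriction (area drops from 3 to 1): part of the wave bounces back.
constricted = scatter(1.0, 0.0, 3.0, 1.0)
```

Chaining many such junctions, one per short tube section, yields a filter whose resonances follow the tube's shape. That is the general idea behind this family of physical models.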

The “Clarity” knob controls the overall amount of frequency modulation imparted on the input signal, and sculpts the estimated vocal tract shape in a way which reduces articulation and intelligibility in the processed output signal.

The “Shift” knob alters the formants of the incoming vocal signal, giving similar results to the formant shifting control heard in other synthesis engines like Vocoder and Talkbox. Unlike these modules, however, formants can be shifted without needing to analyze the frequency content of the signal, by instead changing the length of the underlying vocal tract model.
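
The reason a length change shifts formants can be sketched in a few lines of Python. If every section of a tube is stretched by the same factor, all of its resonances move down by that factor, and shrinking the tube moves them up. The function name and the formant values below are illustrative assumptions, and the real "Shift" control may map its knob position to length differently.

```python
def shift_formants_by_length(formants_hz, length_scale):
    """Scale resonances for a tube whose length is multiplied by length_scale.

    Resonant frequencies of a tube are inversely proportional to its length,
    so a shorter tract (scale < 1) raises every formant and a longer tract
    (scale > 1) lowers them.
    """
    if length_scale <= 0:
        raise ValueError("length_scale must be positive")
    return [f / length_scale for f in formants_hz]


neutral = [500.0, 1500.0, 2500.0]                   # assumed neutral-vowel formants
shorter = shift_formants_by_length(neutral, 0.8)    # formants move up
longer = shift_formants_by_length(neutral, 1.25)    # formants move down
```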

The “Nasal” knob controls the amount of nasality in the processed output. Within the vocal tract model, it controls the amount of coupling between the oral cavity and a very congested nasal cavity.

The “Breath” knob controls the amount of whisper-like breathiness to blend in with the carrier signal. Within the vocal tract model, it adds turbulence to the air flowing near the end of the throat.

Lastly, we wanted to add the equivalent of a formant filter to VocalSynth 2, i.e., the ability to manually select and impose vowel sounds onto the processed signal. Accordingly, the advanced view of Biovox features an International Phonetic Alphabet (IPA) vowel chart, which arranges vowels by tongue height and backness. In Biovox, moving around the chart allows the user to blend the currently estimated vocal tract shape with ones characterizing different vowels. A particularly awesome use case for this is to automate vowel replacements to give more of a modern, “vowel-ly” sound to a track.
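
One simple way to picture this blending is as a crossfade between two tract shapes, each described by a list of cross-sectional areas along the tube. The sketch below uses plain linear interpolation. This is an assumption made for illustration, since the actual blending rule in Biovox is not described here, and the area values are made up.

```python
def blend_tract_shapes(estimated, target, amount):
    """Linearly interpolate between two tract shapes.

    estimated, target : lists of cross-sectional areas, one per tube section
    amount            : 0.0 keeps the estimated shape, 1.0 imposes the target
    """
    if len(estimated) != len(target):
        raise ValueError("shapes must have the same number of sections")
    amount = min(1.0, max(0.0, amount))  # keep the blend within [0, 1]
    return [(1.0 - amount) * e + amount * t
            for e, t in zip(estimated, target)]


# Made-up area functions (arbitrary units), glottis first, lips last.
estimated_shape = [1.0, 2.0, 3.0, 2.0]
target_vowel = [3.0, 2.0, 1.0, 4.0]

halfway = blend_tract_shapes(estimated_shape, target_vowel, 0.5)
```

Automating `amount` over time would correspond to sweeping the processed signal toward and away from the chosen vowel.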

Why did iZotope develop Biovox?

In straight-up “dad-talk,” we thought that a module based on a physical model would be both fun and educational for our users. However, it was absolutely imperative that Biovox A) sound good, and B) have a reason to exist alongside the other already great-sounding modules in VocalSynth.

The first step in developing Biovox was to figure out a means of estimating a vocal tract shape directly from audio. After that, we brainstormed what kinds of new and intuitive controls could be created that both capitalized on the physical vocal tract model and offered effects not previously available in other modules. Ultimately, we settled on a series of manipulation controls that we think can be used pretty expansively, not only to sculpt and shift formants, but also to dial in some interesting human vocal imperfections.

What can you look forward to in future Biovox versions?

Biovox is a physical model of our own vocal tract, and in a sense provides a way to make a machine just a little more human. In the future, we would like to further humanize the machine, potentially incorporating machine learning-based transformations, to provide new kinds of control that we never imagined could live inside a knob. Advances in neural network-based audio style transfer and vocal identity transformation are just some of the exciting avenues of research that we would like to incorporate into future versions of Biovox.