The way we experience and listen to music has been evolving rapidly ever since Thomas Edison invented the cylinder phonograph in 1877. To consumers or end-point listeners, that innovation might not seem so rapid, but from the other side of the glass, where the people who make the music are seated, this forward propulsion is palpable. Amazon's Alexa-powered Echo Studio speaker now brings Dolby Atmos playback to any room in your home, and classic records are being remixed in 5.1 all the time.
Sonic imaging refers to the localization of sound sources in three-dimensional space. Whether it’s the number of speakers we have access to, the fidelity and resolution of a digital signal, or how something was captured in the first place, there seem to be more options, tools, and formats every day. With terms like binaural recording, Atmos, 360 sound, and spatial audio finding their way into more and more industry conversations, the question becomes, what’s going on and why? In order to properly postulate what the future sounds like, let's start from the beginning.
Live (Not the Ableton kind!)
Before the wax cylinder, music was only ever experienced live. The listener was in the same space as the musicians and the exchange happened in real time. A listener’s location and the dimensions and materials of a performance space dictated the experience.
Imagine being in the back of the Vatican listening to a choir sing “Ave Maria”, and juxtapose that with sitting on the back porch listening to someone perform “Sweet Baby James” on an acoustic guitar. One experience makes you feel like you’re in the proximity of something infinite, while the other gives you feelings of intimacy and closeness. Both experiences make us feel emotions which are directly related to where the exchange itself takes place.
Once technology made capturing and reproducing recordings possible, all bets were off, but this idea of giving the listener contextual experience is something that all sonic imaging has attempted to address in one way or another.
Mono

Monaural or monophonic sound (aka mono) is sound carried by a single channel, perceived as emanating from one position. The gramophone, an early evolution of the cylinder phonograph, had a single speaker which output sound in mono. Even if there were multiple microphones or a session was multitracked, all of the independent channels were summed to one, with everything dead center. Although stereo technology existed in the 1930s, most of the records produced and distributed until the late 1950s and early 60s were done in mono. The format took a long time to fade, since most of the playback systems people owned at the time were mono, and engineers were used to recording, mixing, and mastering for it.
Famously, Phil Spector was very much against the transition from mono to stereo, saying that it took control away from the producer. His argument was that in mono, the producer was always in control of how a recording was experienced, but once you had two speakers and the listener was in control of where they were placed, the producer relinquished that control and the recordings were compromised. It’s an intriguing argument, and a reality that most producers and engineers still grapple with today.
Interestingly, mono is still very much alive and well in several situations, and sound often gets summed to a mono output. Live clubs and venues will often sum the bass—or sometimes entire mixes—to mono in order to ensure a more uniform listening experience across a given space. With the ubiquitous rise of Bluetooth speakers and digital personal assistants such as Alexa, huge numbers of people are listening in mono every day. Because of that reality, most mix and mastering engineers check how their sessions sound in mono as well as in stereo so they can get an understanding of what a listener will experience when this inevitable summing takes place.
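To see why that mono check matters, here is a minimal sketch (in Python with NumPy; the function name is my own) of what summing a stereo signal to mono does, including the worst case, where out-of-polarity content cancels entirely:

```python
import numpy as np

def mono_sum(left, right):
    """Fold a stereo pair down to mono the way many playback chains do:
    average the two channels (a -6 dB pan-law sum is also common)."""
    return 0.5 * (np.asarray(left) + np.asarray(right))

t = np.linspace(0, 1, 48000, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)

# Content that is identical in both channels survives the fold-down
# at full level (peak stays close to 1.0)...
print(round(np.max(np.abs(mono_sum(tone, tone))), 3))

# ...but the same tone polarity-flipped in one channel cancels
# completely, which is exactly what engineers listen for in mono.
print(np.max(np.abs(mono_sum(tone, -tone))))
```

Wide stereo tricks that rely on out-of-phase information are the usual casualties of this summing, which is why a quick mono audition is standard practice before a mix leaves the studio.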
Stereo

Eventually, people reasoned that since we have two ears, two speakers would better mimic the way we naturally hear sound in space. Stereophonic sound was patented by Alan Blumlein in the early 1930s, but it didn’t become the standard format for recorded audio until the 60s, and not until the 70s for most film.
Here’s where the semantics get confusing. Stereophonic sound doesn’t only refer to a left-and-right pair, but to anything with more than one channel or speaker. Borrowing from music theory: just as monophonic means a single melodic line and polyphonic means multiple simultaneous notes, monaural means sound emitting from a single speaker and stereophonic means sound from more than one. By this logic, stereo covers all surround formats as well as the ubiquitous two-channel systems, though in conversation people are usually referring to the latter.
The majority of stereophonic recording presents sound in a 2D space: a lateral plane moving from left to right, and a sense of depth moving away from you. We place sounds in that space by panning them left to right, and we create depth through volume and time-based effects processors such as reverb. When we listen to a stereo signal in headphones, the image changes. The signal comes at you from two extremes, one on each side of your head, rather than from the way it was mixed, which was likely on two near-field monitors sitting in front of an engineer at a mixing desk.
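The panning described above can be made concrete with a small sketch. This implements one common equal-power pan law; the function name and the [-1, 1] pan convention are illustrative choices, not a standard API:

```python
import numpy as np

def constant_power_pan(mono, pan):
    """Place a mono signal in the stereo field with an equal-power pan law.
    pan = -1.0 is hard left, 0.0 is center, +1.0 is hard right."""
    theta = (pan + 1.0) * np.pi / 4.0   # map [-1, 1] onto [0, pi/2]
    left = np.cos(theta) * mono
    right = np.sin(theta) * mono
    return left, right

signal = np.ones(4)

# At center, each channel sits at cos(pi/4), about 0.707 (-3 dB),
# so perceived loudness stays constant as a sound sweeps across the field.
l, r = constant_power_pan(signal, 0.0)
print(round(l[0], 3), round(r[0], 3))
```

The -3 dB center is the point of the "equal power" name: a naive linear pan would make sounds dip in loudness as they pass through the middle of the image.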
3D Audio

3D audio is an umbrella term for all immersive playback formats that exist along X, Y, and Z axes, such as 360 audio, binaural, spatial audio, Atmos, immersive audio, etc. We’ve been experiencing 3D audio for quite some time now in different ways, whether on an amusement park ride, in a movie theater, through a virtual or augmented reality experience, or in several other iterations.
Surround sound formats, such as 5.1 and Atmos, give you a 360-degree sense of depth, but are largely viewed as more of a mixing process and playback experience since they dictate a specific number of speakers with specific placement. Outside of a theater or certain automobiles, you’re not often experiencing surround playback in your day-to-day life, although this is rapidly changing. It’s worth taking a closer look at a few of these formats to understand why they’re interesting, and the key differences between them.
Binaural

Our ears are complex instruments: how we perceive sound is dictated by their placement on our head, their shape, and their inner anatomy. We perceive a sound as coming from directly behind us because it reaches each ear with slightly different timing, level, and tone, and our brain decodes those differences into a location.
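One of those timing differences can even be approximated on paper. The sketch below uses Woodworth's classic spherical-head formula for interaural time difference, assuming a head radius of about 8.75 cm and a speed of sound of 343 m/s:

```python
import math

def itd_seconds(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Woodworth's spherical-head estimate of interaural time difference:
    ITD = (r / c) * (sin(theta) + theta), with theta measured from
    straight ahead. Real heads differ, so treat this as a rough model."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c) * (math.sin(theta) + theta)

# A source straight ahead arrives at both ears simultaneously...
print(itd_seconds(0))          # 0.0
# ...while a source 90 degrees to one side reaches the near ear
# roughly two thirds of a millisecond earlier.
print(round(itd_seconds(90) * 1000, 2))  # ~0.66 ms
```

Sub-millisecond differences like these, together with level and spectral cues shaped by the outer ear, are what binaural recording tries to capture directly.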
Binaural recordings are created through the use of two mics placed in the ears of a person or a dummy head, mimicking the placement of our ears and capturing audio as a human would in a given space. Because of this process, these recordings need to be experienced via headphones in order to hear them properly. Additionally, they don’t respond to head movement, so the sonic image remains static.
Binaural recordings are very cool and if you haven’t heard one yet, head over to YouTube for the quintessential Virtual Barber Shop video, which is most people’s first binaural experience. In the recording, the listener is receiving a haircut in a barbershop and you can sense the barber being directly behind you, moving across your head from left to right and up and over. Disney has been using binaural technology in its rides for decades, and Pearl Jam even released an entire album titled Binaural recorded with (you guessed it) binaural recording techniques. The audiobook of Stephen King’s novel The Mist can be experienced in a binaural format (available on Amazon), and there are several other binaural proofs of concept out there. Products such as Sennheiser’s AMBEO Smart Headset make capturing binaural recordings easy and relatively inexpensive.
The potential application of this technology seems large, since you can experience it via any pair of stereo headphones, but we have yet to see it find a place in the mainstream. That may be because humans are instinctively reactive to sounds coming from behind them, which can make the experience uncomfortable.
Spatial Audio

Spatial audio is a term loosely used to describe all audio experienced in 3D, but mostly boils down to a few key ideas. Some spatial audio is recorded with special 360-degree microphones that often consist of several mics inside of one piece of hardware, capturing audio in a sphere. Other recordings become spatial once we take sound sources and place them around the listener in a three-dimensional space.
For the everyday listener who doesn’t have access to a multichannel playback scenario, experiencing spatial audio is made possible via technology called Ambisonics. Via some fancy trigonometry, Ambisonics transmits a speaker-independent representation of a sound field called ‘B-format’, which gets decoded to whatever speaker setup a listener might have. This makes spatial audio very malleable and accessible to more people in more scenarios.
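To make that "fancy trigonometry" a little less mysterious, here is a toy sketch of horizontal-only first-order Ambisonics: a mono source is encoded into B-format (W, X, Y), then decoded to any speaker direction by pointing a virtual cardioid microphone at it. The function names and the -3 dB convention on the W channel are just one common choice, not a fixed standard:

```python
import math

def encode_b_format(sample, azimuth_deg):
    """Encode a mono sample into horizontal first-order B-format.
    W carries omnidirectional pressure; X and Y carry front/back
    and left/right directional information."""
    az = math.radians(azimuth_deg)
    w = sample / math.sqrt(2.0)   # common -3 dB convention for W
    x = sample * math.cos(az)
    y = sample * math.sin(az)
    return w, x, y

def decode_to_speaker(w, x, y, speaker_azimuth_deg):
    """Basic first-order decode: a virtual cardioid aimed at one
    speaker. Run once per speaker for any horizontal layout."""
    az = math.radians(speaker_azimuth_deg)
    return 0.5 * (math.sqrt(2.0) * w + x * math.cos(az) + y * math.sin(az))

# Encode a source at 45 degrees, then decode to a square of speakers.
# The matching speaker gets the full signal (1.0), the opposite one
# gets nothing (0.0), and the two side speakers each get 0.5.
w, x, y = encode_b_format(1.0, 45.0)
for spk in (45.0, 135.0, 225.0, 315.0):
    print(spk, round(decode_to_speaker(w, x, y, spk), 3))
```

The key property is that the B-format channels never mention speakers at all: the same three signals decode to a square, a hexagon, or headphones, which is exactly what makes the format so malleable.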
The real demarcation between most spatial audio experiences is whether or not they have head tracking technology incorporated, which allows the listener to move around in the sonic space by turning their head. Because our brains need what we see to line up with what we hear, dynamic spatial audio experiences like those in virtual and augmented realities require head tracking to keep the suspension of disbelief intact.
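The counter-rotation at the heart of head tracking reduces to simple angle arithmetic: as the head turns, the renderer rotates the sound field the opposite way so sources stay anchored in the room. A toy version (degrees, with a positive-to-one-side azimuth convention assumed):

```python
def apparent_azimuth(source_az_deg, head_yaw_deg):
    """Where a room-fixed source appears relative to the listener's
    nose after a head turn: room azimuth minus head yaw, wrapped
    into the (-180, 180] range."""
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

# A voice fixed 30 degrees off-center stays put in the room:
print(apparent_azimuth(30.0, 0.0))    # 30.0 -> off to one side
print(apparent_azimuth(30.0, 30.0))   # 0.0  -> now dead ahead
print(apparent_azimuth(30.0, 120.0))  # -90.0 -> now on the other side
```

Real renderers apply the equivalent rotation in three dimensions to the whole sound field many times per second, but the principle is the same: the world holds still while the head moves.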
Outside of dedicated virtual and augmented reality experiences, 360-degree video and images have been mainstream for quite some time already. Several clubs now run huge Dolby systems where people can DJ in Atmos, and platforms like Facebook and mobile apps like Pokémon GO go deeper into AR, VR, and 360 tech every day, so it’s only fitting that the audio component would follow.
Manufacturers and developers new and old alike are creating tools which allow you to capture, mix, and manipulate spatial audio. There’s definitely something to these spatial audio concepts, and many signs point to spatial experiences in both a visual and auditory sense being the future of many industries such as education, journalism, film, music, entertainment, art, etc.
Having looked at music experiences ranging from live performance in real time all the way to fully immersive, three-dimensional audio on demand, one thing seems clear: the success of every iteration still depends largely on context. In the same way that watching the newest Mission: Impossible film in mono would be quite the drag, none of these sonic images works well out of context or without the right delivery.
Stereo doesn’t seem to be going anywhere anytime soon, but the future of recorded audio most definitely is. Our world has a seemingly unending thirst for content and experiences that are more dynamic, more immersive, and more interactive. With that demand comes the need for those of us on the sonic side of things to embrace these new tools and create the supply. New jobs are emerging, new stories are waiting to be told, and new ideas are waiting to be hatched. Sounds pretty cool—pun intended!