Audio Developers Conference 2022

On November 12 I left for England to attend the Audio Developers Conference that was taking place in the London. ADC is the conference to attend if you are into Computers and Audio. As they state on their website: “ADC is an annual event celebrating all audio development technologies, from music applications and game audio to audio processing and embedded systems.”

This is the 4th ADC I’ve attended. I had first heard about the conference at CppCon 2015 when I met Julian Storer, Timur Doumler, and Jean-Baptiste Thiebaut. We found that there were a number of attendees at CppCon that year who were in the Audio Developer community, and we organized a Whiskey Dinner. It was there I learned about the first ADC conference (then called JUCE Summit) to be held that Novemenber in 2015. Unfortunately I was not able to attend, but I was able to work with my management to get Apple to sponsor the conference in the 2016. I attended and served on the Program Committee in 2016, 2017 and 2018.

This year’s ADC was under the stewardship of Raw Material Software, and they did a fabulous job! This year was a record attendance, with about 400 people in person, and a further 700 attendees online. In fact, it was a little too crowded! One could barely move around between sessions! It was incredible to see so many familiar faces.

The conference was 3 days, with the first day devoted to Workshops and two session packed days. Sessions were generaly 50 minutes long with short breaks in between. That works out to about 6 sessions a day!

Workshop: Generating Audio Code with Max/MSP

This was an excellent workshop where we dived into the newest offering from Cycling74, RNBO. It allows you to effectively export a Max Patch (with some modifications) to C++, WebASM, and a preformed VST plugin. In the demo we quickly created a command line tool that exported a Max Patch into C++, a web plugin, and a VST plugin. This is super powerful, and I think this is going to change the way we can quickly prototype audio.

Implementing Real-Time Parallel DSP on GPUs

This session was about implementing realtime DSP on the GPUs. The Goal is to get to 1ms audio processing, and to utilize the large amount of compute available. To use the power of the GPU, you need to have many active independent threads, and these threads have to be see *all * the memory at once. GPU Audio team helps solve this complicated problem with the things. On the CPU side they have a Scheduler that forms a “blueprint” of the audio that needs to be target. The GPU implements this blueprint. And then in order to avoid many round trips to the GPU, the data transfers are quantized to the 200 µs boundary. Additionally they have library fundamentals for FIR and IIR. IIR is rather remarkable because of the dependency on previous data. However, if you form the data into the way their library works then it can be moved over.

Thread synchronisation in real-time audio processing with RCU (Read-Copy-Update)

Timur Doumler gave another excellent C++ talk on audio. Specifically, this discussion was about how you can share data between realtime and non-realtime producers/consumers. This is a challenging problem when trying to share something that is non-trivial, and the naive approaches of using Lock-free FIFOs have some hidden challenges. RCU is a technique that has been used in the Linux kernel for years, and looks to be coming in C++26. Timur builds on the RCU proposal by exploring a per object RCU instead of a global RCU. Great talk!

Synchronizing audio on the Web

This talk is more about how to take existing Web APIs and create a way of having multiple audio elements play together. It doesn’t seem like there’s a consistent and good way to do this, so the general technique seems to be exploring a TimingObject to make sure the MediaController elements are tied together. In general it seems that the WebAPI community would like browser implementers to create low level APIs that have clear behaviors, but instead most APIs that emerge seem to be high level APIs that don’t generally work well together.

Announcing SoundStacks’ New Cmajor Platform

An exciting announcement from SoundStack about CMajor, a compact programming language for audio development. This isn’t just a programming language, but a whole suite of tools. The idea is that you have a compact language to describe the audio, and it compiles to an intermediate format to be delivered to a target. Then it JITs to the most effective form for that target. In addition to the compiler, JIT, and standard library, SoundStack also released a VS Code plugin to add development to make building your audio processor lighting fast.

It was striking to me about how similar this idea was to RNBO… The developer community is lower the barrier to allow fast development of audio to many platforms.

PANEL: Tabs or Spaces?

This was a playful discussion about a number of different topics among a panel of 4 developers. A focus for this year’s discussion was around “Successors”, what is the successor to C++, what is the successor to VST?

Real-time interactive synthesis with ML: differentiable DSP in a plugin

DDSP: Differential DSP. It’s sort of like giving the levers to a ML algorithm to tweak until they have a working model. An example of this was for Mawf Synth from TicTok (example) which can transforms the sound of one tonal instrument into a different one. This talk highlights an emerging trend of using ML to tune an algorithm instead of a more “academic” approach. Specifically:

DSP Approach	DDSP (ML) Approach
Carefully designed.	Throw a Deep Net at it
Interpretable	Generic, but black box
Can predict	Cannot make predictions

AI Driven Content Creation: Bringing Voice Synthesis to Audio Applications

Great technology overview of Supertone’s collection of Voice focused AI processing tools. Their voice isolation algorithm GOYO was very adapt, but the Voice Synth and “Voice Transformer” techniques were truly astonishing. The example was to a segment of “Amazing Grace” sung by Elvis, and transpose it on to a “How High the Moon” midi version by Frank Sinatra. Their voice transformation technique was a mind-melting demo of changing a voice in realtime from Male to Female. Check out their video.

This talk really brings forward the ethical challenges to the emerging audio technologies as AI and ML become more and more part of the development story.

Four Ways To Write A Pitch-Shifter

A lightning fast tour through how to change pitch and time. Pitch is really just time stretching, then sample rate conversions. And there are a number of techniques. Generally SOLA/WSOLA (Waveform-similarity based OLA) are great techniques, but Geraint Luff builds on these to explore a Phase Vocoder, and then goes even further by showing how to compensate for phase problems around the same frequency.

Optimising a Real-time Audio Processing Library

Dave Rowland outlines all the necessary steps for how to build work on optimizing your library. It’s vital to make sure you have success by setting up Micro benchmarks, a server to observe the results, and track historical trends. Always make sure you’ve got the right compile time settings, because it’s pointless to profile Debug builds.

Networked low-latency audio in the wild: crossing the LAN boundary

This was an overview of P2P over the internet. The first half focuses on how NAT (Network Address Translation) can allow the connection of nodes outside of a LAN. Choosing the right way to set up the topology (Relay server, mixing server) has tradeoffs of latency. The latter half focused on how to balance Latency and Reliability. While there are techniques for packet redundancy, such as RFC 2198, one can also attempt to do Packet loss concealment, with a variety of result.

CHOC: Why can’t I stop writing libraries or using backronyms?

Julian Storer talks about CHOC, a header-only with a variety of tools he uses. What I like is the outline of how to organize a library, with a consistent naming and namespace structure. It has all the common blocks one would expect in a library!

Allocations considered beneficial: How to allocate on the audio thread, and why you might want to

“Time water for nothing”. Allocating memory is possible on the realtime thread if you can guarantee that you will not block. This can be challenging, but the PMR allocation strategy in C++ makes this easier. The talk then goes in deep about how to use PMR containers.

KEYNOTE: Incompleteness Is a Feature Not a Bug

This Keynote capped an incredible conference. David Zicarelli gave a really deep discussion on the interleaving topics of Behavior and Perception. David has built many tools that “stack” and build on top of each other, and while they have constraints, these are “enabling constraints”. It is interesting to see that David’s thoughts about maximizing efficiency have given way to exploring more “chaotic” ideas — instead of trying to maximize the processing of 1 or 2 voices, why not just through in hundreds of voices to make random sounding audio where music emerges from the chaos.

Wrapping up.

I noticed 3 major trends over the conference:

Machine Learning is more and More in to audio.
“Raw” Audio programming is still realtime and difficult.
Tools are emerging to lower the entry to using compute resources.

Overall, this was an amazing conference, and I’m very much looking forward to next year!

Written on November 26, 2022