Linux Audio Primer

This document is written for people who are having trouble running Overtone on Linux, or who want to better understand how the different pieces of the audio system fit together. It's a bit of a read, but if you want to be able to figure out your own issues, you should read this carefully at least once.

This is written in late 2023, when Linux audio is once again in the midst of a transition. Hopefully in a few years' time PipeWire will have become the norm, replacing PulseAudio and Jack, and we can greatly reduce the amount of stuff you need to be aware of.

Overtone uses SuperCollider under the hood, a sound synthesis engine created in 1996 by James McCartney (who was also part of the team that created CoreAudio for Apple), and still seeing active development to this day. SuperCollider in turn makes use of Jack, a pro-audio sound server created in the early 2000s by Paul Davis (who also created Ardour). Jack allows low-latency connections between applications and audio devices for both audio and MIDI. Jack in turn makes use of ALSA (at least on a typical Linux setup), the audio system that's built into the Linux kernel, providing low-level access to audio devices.

So we need to, at a minimum, talk about ALSA, Jack, and SuperCollider. But for a complete picture we'll have to mention PulseAudio and PipeWire as well. We'll go bottom-to-top, starting with kernel-level ALSA devices, and working our way up to Overtone. At each step of the way we'll provide the tools to verify your setup.

ALSA

ALSA (Advanced Linux Sound Architecture) was created in the late nineties, replacing OSS (Open Sound System) as the common API to interface with audio device drivers. ALSA has been around for a long time and is generally rock solid, with excellent driver coverage. If you are troubleshooting an issue with sound, it's much more likely to be in the layers above ALSA than with ALSA itself. Still, we need to start at the beginning to have a good foundation.

You can check if ALSA recognized your soundcard with the aplay and arecord command line utilities. On Debian/Ubuntu systems these are part of the alsa-utils package.

aplay --list-devices

**** List of PLAYBACK Hardware Devices ****
card 0: HDMI [HDA Intel HDMI], device 3: HDMI 0 [HDMI 0]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: HDMI [HDA Intel HDMI], device 7: HDMI 1 [HDMI 1]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: HDMI [HDA Intel HDMI], device 8: HDMI 2 [HDMI 2]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 1: PCH [HDA Intel PCH], device 0: ALC3232 Analog [ALC3232 Analog]
  Subdevices: 0/1
  Subdevice #0: subdevice #0

arecord --list-devices

**** List of CAPTURE Hardware Devices ****
card 1: PCH [HDA Intel PCH], device 0: ALC3232 Analog [ALC3232 Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

This is the output for my laptop. It has found several HDMI connections (which carry an audio channel besides video), and the Intel device on the motherboard that drives the audio jack and built-in speakers.

Let's play a sound

aplay /usr/share/sounds/alsa/Front_Center.wav

This will most likely "just work". However, if I tell it to use a specific device, it fails. (hw:1,0 means "second hardware card, first device"; both numbers count from zero.)

aplay --device hw:1,0 /usr/share/sounds/alsa/Front_Center.wav
aplay: main:834: audio open error: Device or resource busy

One of the downsides of ALSA is that it only allows a single program at a time to access a given hardware sound device. On modern desktop environments that's really not good enough. So people have come up with a number of "sound servers" (you may also see the term "daemon"): userspace programs that can mix and route the audio from multiple client applications. Only the sound server uses the hardware device directly. That's why ALSA tells us Device or resource busy: the sound server is already holding hw:1,0.
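If you want the hw:CARD,DEVICE name for every device on your system, you can derive it from the aplay --list-devices output. The following is only a small sketch: the helper name list_hw_names is made up for this example, and it assumes the output format shown above.

```shell
# Turn `aplay --list-devices` output (on stdin) into hw:CARD,DEVICE names.
# list_hw_names is a hypothetical helper, not part of alsa-utils.
list_hw_names() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/hw:\1,\2/p'
}

# Only query the real system when aplay is installed.
if command -v aplay >/dev/null 2>&1; then
    aplay --list-devices 2>/dev/null | list_hw_names
fi
```

For the laptop listing above this would print hw:0,3, hw:0,7, hw:0,8, and hw:1,0, each on its own line.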

So how is it that omitting the --device parameter does work? Let's look at aplay --list-pcms. This lists all possible options for --device, including virtual ones.

Here's an abridged version of the output on my system.

aplay --list-pcms

default
    Playback/recording through the PulseAudio sound server
pulse
    PulseAudio Sound Server
jack
    JACK Audio Connection Kit
pipewire
    PipeWire Sound Server
hw:CARD=PCH,DEV=0
    HDA Intel PCH, ALC3232 Analog
    Direct hardware device without any conversions

What this shows is that the various sound servers (PulseAudio, Jack, PipeWire) come with their own ALSA plugins, which show up as virtual devices, so that applications that use the ALSA API can still be routed through the sound server. When we use --device default or --device pulse, aplay actually connects to PulseAudio, which then routes the audio to the hardware device.

Note that you may not see all of these options. We'll talk about the various sound servers next.
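To quickly see where the default device sends its audio on your machine, you can pull the description line that follows the default entry. This is a rough sketch that assumes the two-line layout shown above; default_pcm_description is a name invented for this example, and it only looks at the plain default entry, not at variants such as default:CARD=...

```shell
# Print the description line that follows the plain "default" PCM entry
# in `aplay --list-pcms` output (read from stdin).
# default_pcm_description is a hypothetical helper name.
default_pcm_description() {
    awk '$0 == "default" { getline; sub(/^[ \t]+/, ""); print; exit }'
}

# Only query the real system when aplay is installed.
if command -v aplay >/dev/null 2>&1; then
    aplay --list-pcms 2>/dev/null | default_pcm_description
fi
```

On the system shown above this would print "Playback/recording through the PulseAudio sound server".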

Jack Audio Connection Kit

The Jack Audio Connection Kit (JACK) was created by Paul Davis in the early 2000s. Davis was one of the early engineers at Amazon, and since leaving there has spent countless hours building the foundation for Linux as a music production platform. Besides Jack he's also the author of Ardour, a powerful open source ProTools-like DAW.

The Jack audio daemon (jackd) is a userspace process that enables multiple applications to send audio and MIDI signals to each other, and to hardware devices. It does this in a way that is suitable for pro-audio, with low latency, minimal overhead (in particular it avoids copying buffers wherever possible), single-frame precision, and arbitrary routing.

Jack also contains a "Transport" API, allowing multiple applications to synchronize. So you can hit "play" in your DAW, and your drum sequencer starts to roll as well, staying at the exact same bar, beat, and tempo (bpm), and even following tempo changes.

SuperCollider uses Jack for its audio output. It's possible to compile it with PortAudio instead, but the default builds, and anything you install from your distribution's package manager, will most likely depend on Jack. However, if you don't currently have Jack on your system, don't rush to install it just yet. We'll talk about PipeWire further down, which is compatible with the Jack API, so Jack-powered applications like SuperCollider can work just as well with PipeWire instead of Jack. If you are running an up-to-date distro this is likely the preferred option.

This is probably a good time to check your current situation.

$ ps ax | grep -E '(jackd|pipewire|pulseaudio)'
   2769 ?        S<sl   4:15 /usr/bin/pipewire
   2775 ?        Ssl    0:00 /usr/bin/pipewire -c filter-chain.conf
   2776 ?        S<sl   0:20 /usr/bin/pipewire-pulse

In my case we can see pipewire and pipewire-pulse. You may instead see pulseaudio, and possibly jackd or jackdbus.
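If you prefer a yes/no answer per daemon, a small loop around pgrep does the same job without also matching the grep process itself. This is just a sketch: the function name report_sound_servers is made up here, and the list of process names is the one used in this document, so adjust it if your distribution names things differently.

```shell
# Report, for each sound server discussed here, whether a process with
# exactly that name is running. report_sound_servers is a hypothetical name.
report_sound_servers() {
    for name in jackd jackdbus pipewire pipewire-pulse pulseaudio; do
        # pgrep -x only matches when the process name is exactly $name
        if pgrep -x "$name" >/dev/null 2>&1; then
            echo "$name: running"
        else
            echo "$name: not running"
        fi
    done
}

report_sound_servers
```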

If you do have Jack, some useful command line tools are:

# list ports and connections
jack_lsp -c
# see if jack is started or stopped
jack_control status

Useful GUI tools to start and stop Jack, and to see and manipulate the connection graph, are QJackCtl, Patchage, Patchance, and Catia (from KXStudio).

PulseAudio

Remember the issue with ALSA? It can only handle a single application at a time. That may have been acceptable in the nineties, but on modern desktop systems that doesn't cut it. So the various Linux desktop environments each came up with their own sound daemon to solve this. Gnome had esound (originally developed for Enlightenment). KDE had aRts. This meant it was tricky to get sound out of a KDE application when running Gnome, or vice versa.

So Fedora championed PulseAudio, one daemon to replace them all. And over the last decade or so PulseAudio has become the norm for Linux audio. That said, it certainly hasn't been without controversy. Some distributions, notably Ubuntu, adopted PulseAudio well before it was sufficiently mature. This has made many people very angry and has been widely regarded as a bad move. I think the botched PulseAudio rollout is one of the main reasons why "Linux Audio" has become such a running joke. (But certainly not the only one. As this document demonstrates, it is admittedly all a bit of a mess.)

It is notable that PulseAudio was written by the same person who created systemd, another daemon whose rollout by distributions was not without controversy (and that's a whopper of an understatement). It also seems the author had no significant prior experience with audio programming.

Eventually PulseAudio did stabilize, and for the last number of years it has generally worked just fine. It does what it needs to.

One of the most useful tools to interact with the PulseAudio daemon is pavucontrol, which lets you change input and output levels, see which devices it's found, and change some device settings.

One of the big differences between a desktop daemon like PulseAudio and a pro-audio daemon like Jack is that the former tries to automatically connect programs to audio outputs. It does clever things like switching over to your bluetooth headphones as soon as they connect. Jack does no such thing: either the programs or the user need to explicitly tell it how to wire things up.

Apart from that, though, the use cases are pretty similar. But PulseAudio's design is not suitable for professional audio: it has too much overhead and introduces too much latency. So people have kept using Jack. That introduced a new problem: if PulseAudio is already running, then it is already holding the output device (at the ALSA level), so Jack will fail to start. This was later solved by introducing a mechanism that lets Jack and PulseAudio negotiate access to the underlying device.

PipeWire

Is it really necessary to have separate audio servers for desktop and pro-audio? Evidently not. Consider the situation on the Mac, where there's a single operating system audio layer called CoreAudio, which consists of two parts: a device driver part in the kernel, and a user-space daemon process. It serves the needs of desktop programs, but is also good enough to handle pro-audio tasks, as demonstrated by the popularity of the Mac for music production.

Can Linux do the same thing? Turns out it can. PipeWire is a multimedia (audio, video, MIDI, and transport) daemon once again championed by Fedora, but this time they did get a true veteran of audio and video programming at the helm. Wim Taymans already had a good number of years under his belt working on GStreamer, a multimedia framework used by WebKit, Gnome, and dozens of media players.

PipeWire took inspiration from Jack and CoreAudio to create something that can handle the most demanding pro-audio applications, while also being suitable as a desktop sound daemon.

While PulseAudio and the desktop daemons that came before it all caused fragmentation, PipeWire is the first attempt to truly unify the Linux audio experience. It does this by being compatible with PulseAudio, Jack, and ALSA. In other words, the overwhelming majority of applications should "just work" with PipeWire.

Adoption by distributions is still a bit patchy at time of writing. Perhaps the scars of the PulseAudio adoption have made people more cautious. But PipeWire is now the default audio server on recent Ubuntu releases, and it seems on the whole this transition has gone by without a hitch.

On Debian/Ubuntu systems the package names are pipewire, pipewire-alsa, pipewire-pulse, and pipewire-jack.

There are still some things that could cause confusion, however. In particular it's possible to run Jack and PipeWire at the same time, with PipeWire becoming another Jack client. This is mostly intended for people using Jack for music production, who find that the Jack emulation in PipeWire does not yet fill their needs.

If you see both a pipewire and a jackdbus process, then that is likely what is happening.

$ ps ax | grep -E '(jackd|pipewire|pulseaudio)'
   .... /usr/bin/pipewire
   .... /usr/bin/jackdbus
   ....

Running a Jack client program (like SuperCollider) will now connect to Jack, not to PipeWire. Even if Jack is installed but not currently running, this might be an issue. Jack and PipeWire both install a shared library providing the Jack API, and when an application starts it will normally pick the one provided by Jack, not the one provided by PipeWire. To work around this there is a wrapper script called pw-jack, which prepends the directory containing PipeWire's version of the library to LD_LIBRARY_PATH.

pw-jack is just a shell script. We can use sh -x to see what it's doing.

sh -x `which pw-jack` true
+ DEFAULT_SAMPLERATE=48000
+ getopts hr:vs:p: param
+ shift 0
+ [ -n  ]
+ LD_LIBRARY_PATH=/usr/${LIB}/pipewire-0.3/jack:/usr/${LIB}/pipewire-0.3/jack
+ export LD_LIBRARY_PATH
+ exec true

So if you want to make sure SuperCollider uses PipeWire and not original Jack, then run it as:

pw-jack scsynth -u 1234

If you don't have original Jack installed then you should not need pw-jack, and the need for it should go away in the future.
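If you're unsure which Jack library scsynth will end up loading, ldd can tell you, and running ldd under pw-jack shows the effect of the changed LD_LIBRARY_PATH. The sketch below assumes that your scsynth is dynamically linked against libjack and that PipeWire's copy lives in a directory with "pipewire" in its path (as in the pw-jack trace above). The helper name jack_provider is invented for this example.

```shell
# Classify the libjack line(s) in `ldd` output (read from stdin) by the
# directory the library resolves to. jack_provider is a hypothetical helper.
jack_provider() {
    while IFS= read -r line; do
        case "$line" in
            *libjack*pipewire*)    echo "pipewire" ;;
            *libjack*"not found"*) echo "missing" ;;
            *libjack*)             echo "jack" ;;
        esac
    done
}

# Only inspect the real binary when the tools are installed.
if command -v scsynth >/dev/null 2>&1 && command -v ldd >/dev/null 2>&1; then
    echo "plain:   $(ldd "$(command -v scsynth)" | jack_provider)"
    if command -v pw-jack >/dev/null 2>&1; then
        echo "pw-jack: $(pw-jack ldd "$(command -v scsynth)" | jack_provider)"
    fi
fi
```

If the first line says jack and the second says pipewire, then you need pw-jack to route SuperCollider through PipeWire.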

To manage PipeWire's connection graph, you can use the same programs you used for Jack (possibly through pw-jack), like QJackCtl, jack_lsp, jack_connect, Patchage, Patchance, Catia. However you may not see the complete graph, since PipeWire also handles non-Jack clients. For a PipeWire-native connection graph editor you can check out qpwgraph.

SuperCollider

SuperCollider consists of a server (scsynth), and one or more clients (sclang, scide, Overtone), which communicate over OSC (Open Sound Control). OSC can be sent over UDP or TCP. We generally prefer UDP.

Overtone will try to start scsynth for you with all the necessary command line flags, although you really only need two: the UDP port number (-u), and the maximum number of logins, meaning clients that can connect (-l). (The latter defaults to 64, which causes issues for sclang.) For the rest the defaults are generally fine. The most commonly used port is 57110. You can also start it yourself, and tell Overtone how to connect to it.

scsynth -u 57110 -l 32
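Before moving on, it can help to confirm that scsynth is actually listening on that UDP port. One way is to look at the output of ss -lun (listening UDP sockets, numeric ports). The sketch below assumes ss from iproute2 is available; has_udp_listener is a helper name made up for this example.

```shell
# Succeed if any field in `ss -lun` output (read from stdin) ends in
# ":PORT". has_udp_listener is a hypothetical helper name.
has_udp_listener() {
    awk -v port="$1" '
        { for (i = 1; i <= NF; i++) if ($i ~ (":" port "$")) found = 1 }
        END { exit !found }
    '
}

# Only query the real system when ss is installed.
if command -v ss >/dev/null 2>&1; then
    if ss -lun 2>/dev/null | has_udp_listener 57110; then
        echo "something is listening on UDP port 57110"
    else
        echo "nothing is listening on UDP port 57110"
    fi
fi
```

Note that this only tells you that some process has the port open, not that it is scsynth.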

Now save this snippet as test.sc:

(
fork {
	var cond, runResponder;
	
	Server.default = s = Server.remote(\remote, NetAddr("127.0.0.1", 57110));
	
	cond = Condition({ s.serverRunning });
	
	// 'signal' will allow the forked routine to advance
	// only when 'serverRunning' finally becomes true
	runResponder = SimpleController(s).put(\serverRunning, { cond.signal });
	
	cond.wait;
	
	runResponder.remove;
	a = { SinOsc.ar(440, 0, 0.1).dup }.play;
};
)

And run it with

sclang test.sc

If you're hearing a clear 440 Hz sine wave (the A above middle C), then you're good. SuperCollider is all set up! If not, the most likely cause is that SuperCollider isn't hooked up to the audio device.

So use your Jack patch bay of choice to test that. I like using qpwgraph, but you may have to install that one from source. Patchance is one that's available in Ubuntu repos.

Connect SuperCollider outputs 1 and 2 to your audio device, and you should start hearing sound. Now that you've confirmed that SuperCollider is working you can move on to Overtone.
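If you'd rather wire things up from the command line, jack_lsp and jack_connect can do the same job (through pw-jack if needed). The sketch below only prints suggested jack_connect commands so you can check them first. It assumes SuperCollider's ports are named SuperCollider:out_1 and SuperCollider:out_2 and that your device's ports contain ":playback_" in their names. Both are assumptions, so compare against your own jack_lsp output. The function name suggest_connections is made up for this example.

```shell
# Read `jack_lsp` output (one port name per line) on stdin and print, but
# do not run, jack_connect commands that pair SuperCollider's first two
# outputs with the first two playback ports found.
# suggest_connections is a hypothetical helper name.
suggest_connections() {
    awk '
        /^SuperCollider:out_[12]$/ { outs[++n_out] = $0 }
        /:playback_/ && n_play < 2 { plays[++n_play] = $0 }
        END {
            for (i = 1; i <= 2; i++)
                if (i <= n_out && i <= n_play)
                    printf "jack_connect \"%s\" \"%s\"\n", outs[i], plays[i]
        }
    '
}

# Only query the real system when jack_lsp is installed.
if command -v jack_lsp >/dev/null 2>&1; then
    jack_lsp 2>/dev/null | suggest_connections
fi
```

Review the printed commands, then paste them into your shell to make the connections.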
