Apr 3, 2025

Merging Mono Tracks to Surround Sound with Ffmpeg

TheJoeCoder

Ah yes, Ffmpeg. It’s a glorious tool and I don’t know what we’d do without it. However, some of the filters are absolutely mad, and whether I understand any given command or not is a roulette wheel.

I came across a situation where I needed to merge 5 mono audio tracks representing a 5.0 Surround setup (C, FL, FR, SL, SR) into one surround file, in order to merge it with a video. This needed to be done for 12 languages, so it had to be automatable. That’s where ffmpeg comes in.

On the ffmpeg wiki, I found this section of a page on audio manipulation. This covers mixing 6 mono tracks into a 5.1 output track. The command was as such:

ffmpeg -i front_left.wav -i front_right.wav -i front_center.wav -i lfe.wav -i back_left.wav -i back_right.wav \
-filter_complex "[0:a][1:a][2:a][3:a][4:a][5:a]join=inputs=6:channel_layout=5.1:map=0.0-FL|1.0-FR|2.0-FC|3.0-LFE|4.0-BL|5.0-BR[a]" -map "[a]" output.wav

Why yes this is obtuse, thanks for asking! It took me a minute to understand what any of it did, let alone change it to work with only 5 channels.

After some trial and error (and a lot of nondescript Invalid argument errors), I landed on this:

ffmpeg -i .\English_L.wav -i .\English_C.wav -i .\English_R.wav -i .\English_Ls.wav -i .\English_Rs.wav -filter_complex "[0:a][1:a][2:a][3:a][4:a]join=inputs=5:channel_layout=5.0:map=0.0-FL|2.0-FR|1.0-FC|3.0-BL|4.0-BR[a]" -map "[a]" .\English_Merged.wav

First, we define all our inputs (with the -i arguments). Here I’m going with them in the order in the AC3 layout, so that’s: Left, Centre, Right, Left Surround, Right Surround, LFE - although as this is only 5.0 we don’t actually have an LFE channel.

Now comes the complex filter - it’s called complex for a reason. I think this is what it does:

We start off by selecting the audio stream from each of the input files with [0:a][1:a][2:a][3:a][4:a]
Then we join the 5 tracks together, with a channel layout of 5.0 corresponding to the 5.0 surround we’re going for. If you want a different layout like 6.1 or, I don’t know, hexagonal, then try referring to this documentation for the C header to see what you need to write.
Now time for the mapping. Each map entry pairs an input channel (like 0.0) with the output channel it feeds (like FL), joined by a hyphen, and the entries are separated by |.
- For the input channel, the first input is 0 and the first channel in that input is 0, hence 0.0. If you were using a stereo input file for the left and right channels, for example, and you listed it as the first input, you’d use 0.0 for the left channel of the input and 0.1 for the right.
- I have no idea where the names for the channels come from, but running ffmpeg -layouts prints the channel names and standard layouts that ffmpeg knows about. Again, this C header documentation might help.
And now for [a]. From what I can gather, this is assigning the output of the filter to a label called a which we use in a sec.
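
To make the anatomy of that filter string a bit less magical, here’s a rough sketch in plain shell that glues it together from its parts. build_join_filter is a made-up helper for illustration, not anything ffmpeg ships; it only prints the string and never calls ffmpeg.

```shell
# Hypothetical helper (not part of ffmpeg): assemble a join filter string.
# Usage: build_join_filter LAYOUT INPUT_COUNT MAP_ENTRY...
build_join_filter() {
  layout=$1
  inputs=$2
  shift 2

  # One [N:a] pad per input file: [0:a][1:a]...
  pads=""
  i=0
  while [ "$i" -lt "$inputs" ]; do
    pads="${pads}[$i:a]"
    i=$((i + 1))
  done

  # Join the map entries (e.g. 0.0-FL) with | between them.
  map=""
  for entry in "$@"; do
    map="${map:+$map|}$entry"
  done

  # [a] at the end is the label for the filter's output.
  printf '%sjoin=inputs=%s:channel_layout=%s:map=%s[a]\n' \
    "$pads" "$inputs" "$layout" "$map"
}

# The five mono files from this post (L, C, R, Ls, Rs as inputs 0 to 4).
build_join_filter 5.0 5 0.0-FL 2.0-FR 1.0-FC 3.0-BL 4.0-BR

# The stereo case described above: input 0 carries both front channels.
build_join_filter 5.0 4 0.0-FL 0.1-FR 1.0-FC 2.0-BL 3.0-BR
```

The first call prints exactly the filter used in the command above; the second shows how the map changes when one stereo file supplies both FL and FR.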

The -map [a] tells ffmpeg to use the audio track we just created in the output file, and then we place the output filename at the end to tell it where to send our new track.
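
If you want to sanity-check the result, something like the sketch below should do it. It assumes ffprobe (which normally ships alongside ffmpeg) is on your PATH; check_layout is a hypothetical wrapper name, and the filename is just the one from the example above.

```shell
# Hypothetical helper: print the channel count and layout of the
# first audio stream in a file, using ffprobe.
check_layout() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=channels,channel_layout \
    -of default=noprint_wrappers=1 "$1"
}

# Only run it when ffprobe and the merged file are actually present.
if command -v ffprobe >/dev/null 2>&1 && [ -f English_Merged.wav ]; then
  check_layout English_Merged.wav
fi
```

For the file made above I’d expect it to report five channels; if it reports something else, the map is the first place to look.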

As a bonus, here’s a little PowerShell script to automate making a bunch of these files:

# Define the languages we want to convert
$langs = @('Cantonese', 'English', 'Flemish', 'French', 'German', 'Japanese', 'Korean', 'Latin_Spanish', 'Mandarin', 'Russian', 'Spanish', 'Ukrainian')

# Do the Ffmpegging: one merged 5.0 file per language
$filter = '[0:a][1:a][2:a][3:a][4:a]join=inputs=5:channel_layout=5.0:map=0.0-FL|2.0-FR|1.0-FC|3.0-BL|4.0-BR[a]'
$langs | ForEach-Object {
    ffmpeg -i ".\${_}_L.wav" -i ".\${_}_C.wav" -i ".\${_}_R.wav" -i ".\${_}_Ls.wav" -i ".\${_}_Rs.wav" `
        -filter_complex $filter -map '[a]' ".\${_}_Merged.wav"
}
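
If you’re not on Windows, here’s a rough equivalent of that loop for a POSIX shell. It’s a sketch under the same assumptions as the PowerShell version (files named like English_L.wav in the current directory), and merge_command is a made-up helper name. It only prints the commands so you can eyeball them first; since these filenames have no spaces, piping the output into sh should run them.

```shell
# The same join filter as in the PowerShell script.
filter='[0:a][1:a][2:a][3:a][4:a]join=inputs=5:channel_layout=5.0:map=0.0-FL|2.0-FR|1.0-FC|3.0-BL|4.0-BR[a]'

# Hypothetical helper: print (not run) the merge command for one language.
merge_command() {
  lang=$1
  printf 'ffmpeg -i %s_L.wav -i %s_C.wav -i %s_R.wav -i %s_Ls.wav -i %s_Rs.wav -filter_complex "%s" -map "[a]" %s_Merged.wav\n' \
    "$lang" "$lang" "$lang" "$lang" "$lang" "$filter" "$lang"
}

# Print one command per language.
for lang in Cantonese English Flemish French German Japanese Korean \
            Latin_Spanish Mandarin Russian Spanish Ukrainian; do
  merge_command "$lang"
done
```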