Spleeter: not additive sum of components

Created on 1 Nov 2019 · 4 Comments · Source: deezer/spleeter

None of the 3 models, run with default parameters, produces components that sum back to the original mix.
I tested this in Audacity, and there is always a difference signal left over, like a residual.
Is there a parameter I can change to fix this, or does it need to be fixed in the code?

Thank you and awesome work on this framework!
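
The Audacity test described above (subtract the summed stems from the mix and look at what is left) can also be done numerically. Below is a minimal sketch using NumPy; the helper name residual_rms is hypothetical, and the arrays are synthetic stand-ins for audio that you would normally load from the mix and the separated files.

```python
import numpy as np


def residual_rms(mix, stems):
    """Return the RMS of (mix - sum of stems); 0.0 means the stems are additive."""
    total = np.sum(np.stack(stems, axis=0), axis=0)
    residual = mix - total
    return float(np.sqrt(np.mean(residual ** 2)))


# Synthetic stand-ins for one second of mono audio per stem.
rng = np.random.default_rng(0)
vocals = rng.standard_normal(44100)
accompaniment = rng.standard_normal(44100)
mix = vocals + accompaniment

additive = residual_rms(mix, [vocals, accompaniment])
lossy = residual_rms(mix, [vocals, 0.5 * accompaniment])
print("additive stems:", additive)  # essentially zero
print("lossy stems:", lossy)        # clearly non-zero residual
```

With real files, the arrays must have the same sample rate, length, and channel layout before they can be compared this way.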

help wanted model question

Most helpful comment

Hi @danielkorg,
Thank you for your message!
You're perfectly right: the default separation configuration does not produce components that add up to the mix. This is because the spectrogram models are trained only up to 11 kHz, and at separation time the default mask extension (above 11 kHz) is set to 0. As a result, all frequencies above 11 kHz are discarded in the separated files.

There is an option to change this default behaviour: mask_extension can be set to zeros (default) or to average (see the wiki). The latter fills the masks above 11 kHz with the average value of the masks below 11 kHz, so the masks sum to one over the whole frequency range and the separated tracks sum to the mix track.
The average option introduces some interference in the high frequencies (since the modelling of the sources is very rough in this frequency range), which is why we chose zeros as the default. You should therefore use this option only if the interference is less of an issue for you than components that do not sum to the mix.
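
To make the difference between the two settings concrete, here is a toy NumPy sketch. It is not Spleeter's actual implementation: the function extend_mask, the array shapes, and the bin counts are all assumptions chosen for illustration. It only shows why filling with per-source averages keeps the masks summing to one, while filling with zeros does not.

```python
import numpy as np


def extend_mask(mask, n_bins_total, mode="zeros"):
    """Extend a (n_sources, n_bins_model) mask to n_bins_total frequency bins.

    Hypothetical helper: "zeros" pads with 0, "average" pads each source
    with the mean of its mask over the modelled bins.
    """
    n_sources, n_bins_model = mask.shape
    n_extra = n_bins_total - n_bins_model
    if mode == "zeros":
        fill = np.zeros((n_sources, 1))
    elif mode == "average":
        fill = mask.mean(axis=1, keepdims=True)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return np.concatenate([mask, np.repeat(fill, n_extra, axis=1)], axis=1)


# Two sources, four modelled bins; masks are normalised to sum to 1 per bin.
rng = np.random.default_rng(0)
raw = rng.random((2, 4))
mask = raw / raw.sum(axis=0, keepdims=True)

for mode in ("zeros", "average"):
    full = extend_mask(mask, 8, mode)
    print(mode, full.sum(axis=0))  # per-bin sum of the masks over sources
```

With zeros, the per-bin sum drops to 0 in the extended range, so that part of the mix appears in no stem. With average, the sum stays at 1 everywhere because the per-source averages of masks that sum to one also sum to one.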

You can set this option by editing the configs/2stems/base_config.json file (the same applies to the 4stems and 5stems config files) and replacing "mask_extension":"zeros" with "mask_extension":"average". Then run the separation with the config file option (make sure you are in the spleeter folder, or replace the path to the config file with a valid one):

spleeter separate -i <path/to/input/audio/file.wav> -o <path/to/output> -p configs/2stems/base_config.json

The separated sources should then sum up to the mix!
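
If you would rather script the config edit than do it by hand, something like the following sketch should work. The function name set_mask_extension is hypothetical, and the demo runs on a throwaway JSON file that contains only the one key; a real base_config.json has many more entries, which this code leaves untouched.

```python
import json
import tempfile
from pathlib import Path


def set_mask_extension(config_path, value="average"):
    """Rewrite the "mask_extension" entry of a JSON config file in place."""
    if value not in ("zeros", "average"):
        raise ValueError(f"unsupported mask_extension: {value!r}")
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["mask_extension"] = value
    path.write_text(json.dumps(config, indent=4))
    return config


# Demo on a temporary stand-in file, not on a real Spleeter config.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "base_config.json"
    demo_path.write_text(json.dumps({"mask_extension": "zeros"}))
    updated = set_mask_extension(demo_path)
    reloaded = json.loads(demo_path.read_text())

print(reloaded["mask_extension"])
```

Point config_path at the same file you pass to -p, so that the edited file is the one the separation actually reads.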

All 4 comments

OK thank you very much, it works! :)

Can't wait to see more from you guys!

I tried changing "mask_extension":"zeros" to "mask_extension":"average", but the separated files still have the 11 kHz cutoff when I run the program. Running with -p configs/2stems/base_config.json gives me errors, so I have to use -p spleeter/configs/2stems/base_config.json for the program to run, and then I still get the same 11 kHz cutoff. Do you have to close the DOS prompt and reopen it for the cutoff to go away? Thanks for your help. Roger

Me too. Did you figure this out? Maybe we also have to change F, as stated in some other issues? But I am not sure.
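
For context on why F might matter: if F is the number of frequency bins the model processes, the cutoff follows from the STFT bin spacing. The sketch below assumes a frame length of 4096 samples, a sample rate of 44100 Hz, and F = 1024; these are believed to be the default config values but are assumptions here, so check them against your own config file. Whether changing F alone removes the cutoff is not confirmed in this thread.

```python
def cutoff_hz(n_bins, frame_length, sample_rate):
    """Upper frequency covered by the first n_bins STFT bins."""
    bin_spacing = sample_rate / frame_length  # Hz per frequency bin
    return n_bins * bin_spacing


# Assumed values; verify against your own base_config.json.
default_cutoff = cutoff_hz(n_bins=1024, frame_length=4096, sample_rate=44100)
wider_cutoff = cutoff_hz(n_bins=1536, frame_length=4096, sample_rate=44100)
print(default_cutoff)  # 11025.0
print(wider_cutoff)    # 16537.5
```

Under those assumptions, 1024 bins cover up to about 11 kHz, which matches the cutoff discussed above.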
