Spleeter: Improving config for MUSDB18

Created on 12 Nov 2019 · 15 comments · Source: deezer/spleeter

Description

Training spleeter on MUSDB18 using the provided configuration file produces very poor results. This is probably because the config was designed for evaluation only.

When using the provided config, I get the following results with museval:

vocals          ==> SDR:   1.058  SIR:  -5.229  ISR:   2.040  SAR:  12.087
drums           ==> SDR:   1.205  SIR:  -3.945  ISR:   1.987  SAR:  12.087
bass            ==> SDR:   0.680  SIR:  -6.822  ISR:   1.964  SAR:  12.087
other           ==> SDR:   1.063  SIR:  -5.320  ISR:   1.984  SAR:  12.087

Steps to reproduce

python -m spleeter train -p configs/musdb_config.json -d MUSDB18-WAV

Questions

To improve the config, I think the following things would need to be addressed:

n_chunks_per_song is set to 1. Shouldn't this be larger for full tracks like in MUSDB?
random_time_crop is using a fixed seed which is not updated during training. That means the chunks are deterministic and therefore only a very small fraction of the MUSDB18 dataset is actually used in training.
train_max_steps is set to 100000. Was this tested on MUSDB18? Should this be increased?
No early stopping is configured. On such a small dataset, the model will suffer from significant overfitting.

The desired behavior is to update the config/docs so that the model can be trained on MUSDB18.

bug evaluation model training

Source

faroit


@faroit How long did it take you to train on MUSDB18 with the default config file?
(I'm trying to do the same, and the training seems to finish surprisingly quickly!)

rohitma38 on 13 Nov 2019

@faroit, as you figured out, the provided musdb config file must be tweaked to train properly. It was only provided as a quick way to run a toy training on musdb.

n_chunks_per_song is set to 1. Shouldn't this be larger for full tracks like in MUSDB?

Yes, it should be set to something bigger for musdb. This is the number of 20 s audio chunks (the duration can be set with the chunk_duration parameter) taken from each song for training; a 12 s segment is then randomly cropped within these chunks. Chunking is used to ensure efficient spectrogram caching while keeping some randomness in the selected segments.
For musdb, I would recommend setting it to a value between 20 and 30 to cover a good span of each track; otherwise, the model is trained on only 20 s of each track, which is very little.
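To make the coverage argument concrete, here is a minimal sketch that assumes chunks are spread evenly over each track; spleeter's actual chunk placement may differ, and `chunk_starts`, `coverage` and `random_crop` are hypothetical helpers, not spleeter functions. With 20 s chunks and 12 s crops, a single chunk per (hypothetical) 240 s track covers only 20 s, whereas 25 chunks cover the whole track.

```python
import random

CHUNK_DURATION = 20.0    # seconds per chunk (chunk_duration)
SEGMENT_DURATION = 12.0  # seconds of the random crop inside a chunk


def chunk_starts(track_duration, n_chunks_per_song, chunk_duration=CHUNK_DURATION):
    """Start times (s) of chunks spread evenly over a track (assumed layout)."""
    if n_chunks_per_song <= 1:
        return [0.0] * max(n_chunks_per_song, 0)
    span = max(track_duration - chunk_duration, 0.0)
    step = span / (n_chunks_per_song - 1)
    return [k * step for k in range(n_chunks_per_song)]


def coverage(track_duration, n_chunks_per_song, chunk_duration=CHUNK_DURATION):
    """Seconds of the track that fall inside at least one chunk."""
    covered, prev_end = 0.0, 0.0
    for start in chunk_starts(track_duration, n_chunks_per_song, chunk_duration):
        end = min(start + chunk_duration, track_duration)
        # Only count the part not already covered by the previous chunk.
        covered += max(end - max(start, prev_end), 0.0)
        prev_end = max(prev_end, end)
    return covered


def random_crop(chunk_start, rng=random, chunk_duration=CHUNK_DURATION,
                segment_duration=SEGMENT_DURATION):
    """Pick a random (start, end) segment lying fully inside one chunk."""
    offset = rng.uniform(0.0, max(chunk_duration - segment_duration, 0.0))
    start = chunk_start + offset
    return start, start + segment_duration


print(coverage(240.0, 1))   # 20.0
print(coverage(240.0, 25))  # ~240.0, i.e. the whole track
print(random_crop(40.0, random.Random(0)))
```

Overlapping chunks are harmless here: they only add variety to the random crops, while the union of chunks decides how much of each track the model can ever see.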

random_time_crop is using a fixed seed which is not updated during training. That means the chunks are deterministic and therefore only a very small fraction of the MUSDB18 dataset is actually used in training.

I don't think so. As random_time_crop is called several times in the same TensorFlow session (the one used for training), each call should lead to a different random crop. Seeding just ensures that the sequence of cropping values is the same within a session. Feel free to correct me if I'm wrong :).

train_max_steps is set to 100000. Was this tested on MUSDB18? Should this be increased?

This was not tested on musdb, so a different value may work better. The best value likely depends heavily on the other training parameters, so training and validation metrics should be monitored during training.

No early stopping is configured. On such a small dataset, the model will suffer from significant overfitting.

Unfortunately, we don't provide an early stopping mechanism so far. An easy workaround is to increase the number of saved checkpoints and to select the model afterward based on validation metrics.
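The checkpoint-selection workaround can be sketched in a few lines. This is a minimal sketch, assuming the validation loss of each saved checkpoint has already been collected (for example from the evaluation summaries); `select_best_checkpoint` and `early_stop_index` are hypothetical helpers, and the losses below are made-up values for illustration only.

```python
def select_best_checkpoint(val_loss_by_step):
    """Return the checkpoint step with the lowest validation loss."""
    return min(val_loss_by_step, key=val_loss_by_step.get)


def early_stop_index(val_losses, patience=3):
    """Index of the evaluation at which patience-based early stopping would
    halt (no improvement for `patience` consecutive evaluations), or None."""
    best = float("inf")
    evals_since_best = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, evals_since_best = loss, 0
        else:
            evals_since_best += 1
            if evals_since_best >= patience:
                return i
    return None


# Made-up validation losses keyed by checkpoint step, for illustration only.
val_loss_by_step = {10000: 0.30, 20000: 0.25, 30000: 0.22, 40000: 0.24, 50000: 0.27}
print(select_best_checkpoint(val_loss_by_step))  # 30000
```

For a score that should be maximized, such as SDR, use `max` instead of `min` and flip the comparison in `early_stop_index`.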

romi1502 on 14 Nov 2019

@romi1502 Thanks, I will try tuning some parameters. Since @mmoussallam already noted that you don't want to release pre-trained weights for MUSDB18, would you be OK with maintaining a fork on sigsep? Once we find a good config, we could add it via a PR.

I don't think so. As random_time_crop is called several times in the same TensorFlow session (the one used for training), each call should lead to a different random crop. Seeding just ensures that the sequence of cropping values is the same within a session. Feel free to correct me if I'm wrong :).

I guess you are right, at least when it comes to randomness within a session. This short example should mimic the behavior:

```python
import tensorflow as tf

# Each dataset element gets a random integer added by a seeded op.
def add_random_int(sample):
    r = tf.random.uniform([1], minval=2, maxval=100, dtype=tf.int64, seed=42)
    return sample + r[0]

dataset = tf.data.Dataset.range(3)
dataset = dataset.repeat(3)  # 9 elements: 0, 1, 2, 0, 1, 2, 0, 1, 2
dataset = dataset.map(add_random_int)

# Iterate twice over the same dataset (TF2 eager mode).
print([x.numpy() for x in dataset])
print([x.numpy() for x in dataset])
```

Output:

```text
[50, 17, 26, 30, 44, 98, 65, 62, 35]
[50, 17, 26, 30, 44, 98, 65, 62, 35]
```

So, as long as the repeat is infinite (rather than the dataset being iterated again from the start), the crops are random.
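A loose analogy in plain Python (not TensorFlow's actual seeding rules) shows the same pattern: `crop_offsets` is a hypothetical generator that creates a freshly seeded `random.Random` each time it is started, much like starting a new pass over the seeded pipeline above. Restarting replays the same offsets, while a single long pass keeps producing new ones.

```python
import itertools
import random


def crop_offsets(seed=42, max_offset=8.0):
    """Endless stream of crop offsets (s) from a freshly seeded generator.

    Hypothetical helper: each call starts the seeded sequence from the
    beginning, like a new pass over a seeded data pipeline.
    """
    rng = random.Random(seed)
    while True:
        yield rng.uniform(0.0, max_offset)


first_pass = list(itertools.islice(crop_offsets(), 5))
second_pass = list(itertools.islice(crop_offsets(), 5))     # restarted: same values
one_long_pass = list(itertools.islice(crop_offsets(), 10))  # keeps drawing new values

print(first_pass == second_pass)        # True
print(one_long_pass[:5] == first_pass)  # True
print(one_long_pass[5:] == first_pass)  # False
```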

faroit on 14 Nov 2019

Hi @faroit, spleeter's code is still changing quite a lot, so it may be a bit premature to put a maintainable fork of Spleeter in sigsep. But we may consider it in the near future.

romi1502 on 21 Nov 2019

Hi @rohitma38, I'm seeing the same thing as you: the training finishes very quickly. Did you find out the reason?

syxu828 on 22 Nov 2019

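Hey @syxu828
It is probably because the training didn't go through at all; at least, that's how it was in my case. Can you try running the spleeter train command with '--verbose' added as an argument and look at the log messages at the end?
I get something like this:

WARNING:tensorflow:Training with estimator made no steps. Perhaps input is empty or misspecified.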

I suspect that the audio isn't getting loaded, but I haven't yet been able to fix it.
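One thing worth ruling out when training "made no steps" is that the training list points at files that do not exist under the dataset directory. This sketch assumes (not confirmed in this thread) that the training data is listed in a CSV file whose columns ending in `_path` hold paths relative to the `-d` directory; `missing_audio_files` is a hypothetical helper, not part of spleeter.

```python
import csv
import os


def missing_audio_files(csv_path, dataset_root):
    """List audio paths referenced in a training CSV that do not exist on disk.

    Assumption: every column whose name ends in "_path" holds a path
    relative to dataset_root (the directory passed with -d).
    """
    missing = []
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        path_columns = [c for c in (reader.fieldnames or []) if c.endswith("_path")]
        for row in reader:
            for column in path_columns:
                full_path = os.path.join(dataset_root, row[column])
                if not os.path.isfile(full_path):
                    missing.append(full_path)
    return missing
```

An empty result does not prove the pipeline is fine (the files could still fail to decode), but a long list of missing files would explain an empty training input.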

rohitma38 on 22 Nov 2019


@rohitma38 Your comment made me realize that I have the same issue when trying to train. My separation outputs are no different from the original audio, and my log also says that the estimator made no steps. I think verbose output should be shown by default.

GrahamboJangles on 23 Nov 2019

Hi @faroit, spleeter's code is still changing quite a lot, so it may be a bit premature to put a maintainable fork of Spleeter in sigsep. But we may consider it in the near future.

@romi1502 Okay, just keep me updated. I will leave this issue open until then.

faroit on 27 Nov 2019

@romi1502 I want to reproduce the results in the paper. Can you share the config files you used for it?

tuxzz on 18 Dec 2019

It would be nice if this issue could be addressed as part of the JOSS review. A reproducible MUSDB18 configuration would show how Spleeter's model and training pipeline compare against other models when trained only on MUSDB18.

faroit on 20 May 2020

Hi @faroit,
I pushed a modified musdb_config.json config that trains properly on musdb and provides decent results:

| Target | SDR (dB) | SAR (dB) | SIR (dB) | ISR (dB) |
|--------|---------:|---------:|---------:|---------:|
| Vocals | 4.69 | 4.94 | 12.24 | 9.64 |
| Drums | 4.59 | 4.79 | 9.21 | 8.60 |
| Bass | 3.71 | 5.02 | 6.44 | 9.31 |
| Other | 3.10 | 3.93 | 5.23 | 7.91 |

(Edit 25/05/2020: this table is erroneous due to a bug; see the next post.)

These results should be roughly reproducible using the command:

spleeter train -p configs/musdb_config.json -d <path to musdb>

Note that you might not get exactly the same results, as I'm not sure the data pipeline guarantees that ordering is preserved, but it should not change much.

Nothing was optimized in any way (data augmentation, hyper-parameter tuning, ...) to guarantee optimal performance of the model on musdb, though, but it gives an idea of what it can do when trained only on musdb.
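Since nothing was tuned, one simple way to try other values (for example the 20 to 30 chunks per song suggested earlier in the thread) is to write modified copies of the JSON config. This is a minimal sketch assuming those parameters are top-level keys of musdb_config.json; `write_config_variant` and the output filename below are hypothetical.

```python
import json


def write_config_variant(src_path, dst_path, **overrides):
    """Copy a JSON config to dst_path with some top-level keys overridden.

    Keys missing from the source config raise KeyError, so a typo does not
    silently add an unused parameter.
    """
    with open(src_path) as f:
        config = json.load(f)
    unknown = sorted(set(overrides) - set(config))
    if unknown:
        raise KeyError(f"keys not found in {src_path}: {unknown}")
    config.update(overrides)
    with open(dst_path, "w") as f:
        json.dump(config, f, indent=4)
    return config
```

For example, `write_config_variant("configs/musdb_config.json", "configs/musdb_config_25chunks.json", n_chunks_per_song=25)`, then `spleeter train -p configs/musdb_config_25chunks.json -d <path to musdb>`.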

romi1502 on 22 May 2020

Erratum
There was an error in the separation pipeline related to this issue which is in the process of being fixed.

Here is the corrected table of results for the model trained on musdb with the musdb_config.json config:

| Target | SDR (dB) | SAR (dB) | SIR (dB) | ISR (dB) |
|--------|---------:|---------:|---------:|---------:|
| Vocals | 5.10 | 5.44 | 12.45 | 9.58 |
| Drums | 5.15 | 5.25 | 10.68 | 8.89 |
| Bass | 4.27 | 5.42 | 7.23 | 9.38 |
| Other | 3.21 | 3.89 | 5.37 | 7.80 |

romi1502 on 25 May 2020

That's great! Make sure you update the tags for the JOSS paper.

faroit on 25 May 2020

Hey @syxu828
It is probably because the training didn't go through at all, at least that's how it was in my case. Can you try running the spleeter train command with '--verbose' added as an argument and look at the log messages at the end?
I get something like this -

WARNING:tensorflow:Training with estimator made no steps. Perhaps input is empty or misspecified.

I suspect that the audio isn't getting loaded, but I haven't yet been able to fix it.

I encountered the same problem. Did you fix it?

xiaozhuo12138 on 12 Jun 2020
