Training Spleeter on MUSDB18 using the provided configuration file produces very poor results. This is probably because the config was designed for evaluation only.
When using the provided config, I get the following results with museval:
vocals ==> SDR: 1.058 SIR: -5.229 ISR: 2.040 SAR: 12.087
drums ==> SDR: 1.205 SIR: -3.945 ISR: 1.987 SAR: 12.087
bass ==> SDR: 0.680 SIR: -6.822 ISR: 1.964 SAR: 12.087
other ==> SDR: 1.063 SIR: -5.320 ISR: 1.984 SAR: 12.087
python -m spleeter train -p configs/musdb_config.json -d MUSDB18-WAV
To improve the config, I think the following things would need to be addressed:
- n_chunks_per_song is set to 1. Shouldn't this be larger for full tracks like in MUSDB?
- random_time_crop is using a fixed seed which is not updated during training. That means the chunks are deterministic and therefore only a very small fraction of the MUSDB18 dataset is actually used in training.
- train_max_steps is set to 100000. Was this tested on MUSDB18? Should this be increased?
Desired behavior is to update the config/docs to be able to train using MUSDB18.
@faroit How long did it take you to train on musdb-18, with the default config file?
(I'm trying to do the same and the training seems to get over surprisingly quickly!)
@faroit, as you figured out, the provided musdb config file must be tweaked to train properly. It was only provided as a quick way to run a toy training on musdb.
n_chunks_per_song is set to 1. Shouldn't this be larger for full tracks like in MUSDB?
Yes, it should be set to something bigger for musdb. This is the total number of 20s audio chunks used for training per song (the length can be set with the chunk_duration parameter); a 12s sub-segment is then randomly cropped within each chunk. Chunking is used to ensure efficient spectrogram caching while keeping some randomness in the selected segments.
For musdb, I would recommend setting to a value between 20 and 30 to have a good span of each track, otherwise you only train the model on 20s of each track which is very small.
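To make the effect of n_chunks_per_song concrete, here is a minimal sketch of the arithmetic described above. The helper names and the `toy_config` dict are hypothetical stand-ins for illustration (not Spleeter code), and the result is only an upper bound, since chunks cannot cover more audio than a track actually contains.

```python
import json


def audio_seconds_per_track(n_chunks_per_song, chunk_duration=20.0):
    """Upper bound on the audio (in seconds) drawn from one track."""
    return n_chunks_per_song * chunk_duration


def override_config(config, **overrides):
    """Return a copy of a config dict with some keys replaced."""
    updated = dict(config)
    updated.update(overrides)
    return updated


# Hypothetical stand-in for the parsed configs/musdb_config.json;
# only the two keys named in this thread are shown.
toy_config = {"n_chunks_per_song": 1, "train_max_steps": 100000}
tuned = override_config(toy_config, n_chunks_per_song=25)

print(audio_seconds_per_track(toy_config["n_chunks_per_song"]))  # 20.0
print(audio_seconds_per_track(tuned["n_chunks_per_song"]))       # 500.0
print(json.dumps(tuned, indent=2))
```

With a single chunk, at most 20s of each track can ever be seen; a value in the suggested 20 to 30 range allows up to several minutes per track.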
random_time_crop is using a fixed seed which is not updated during training. That means the chunks are deterministic and therefore only a very small fraction of the MUSDB18 dataset is actually used in training.
I don't think so. As random_time_crop is called several times in the same tensorflow session (the one used for training), each call should lead to a different random crop. Seeding just ensures that the sequence of cropping values is the same in each session. Feel free to correct me if I'm wrong :).
train_max_steps is set to 100000. Was this tested on MUSDB18? Should this be increased?
This was not tested on musdb, so a different value may be better. The right value could depend heavily on the other training parameters, so training/validation metrics should be monitored during training.
No early stopping is configured. On such a small dataset the model will suffer from significant overfitting.
Unfortunately, we don't provide an early stopping mechanism so far. An easy workaround is to increase the number of saved checkpoints and to select a model afterward based on validation metrics.
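The workaround above amounts to picking the checkpoint with the best validation metric after training. A minimal sketch follows, assuming you have collected one validation loss per saved checkpoint step yourself; `best_checkpoint` is a hypothetical helper, not part of Spleeter, and the numbers are made up purely for illustration.

```python
def best_checkpoint(validation_loss_by_step):
    """Return the training step whose validation loss is lowest."""
    if not validation_loss_by_step:
        raise ValueError("no checkpoints to choose from")
    return min(validation_loss_by_step, key=validation_loss_by_step.get)


# Made-up values for illustration only (not measured results):
# the loss improves, then rises again as the model starts to overfit.
example_losses = {
    20000: 0.41,
    40000: 0.35,
    60000: 0.33,
    80000: 0.36,
    100000: 0.40,
}

print(best_checkpoint(example_losses))  # 60000
```

The more often checkpoints are saved, the finer the grid of candidate steps this selection can choose from.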
@romi1502 Thanks, I will try some parameters. Since @mmoussallam already noted that you don't want to release pre-trained weights on MUSDB18, are you ok with maintaining a fork on sigsep? Once we have found a good config we could add that via PR.
I don't think so. As random_time_crop is called several times in the same tensorflow session (the one used for training), each call should lead to a different random crop. Seeding just ensures that the sequence of cropping values is the same in each session. Feel free to correct me if I'm wrong :).
I guess you are right. At least when it comes to randomness within a session. This short example should mimic the behavior:
import tensorflow as tf
def add_random_int(sample):
    # Fixed op-level seed, mimicking the seeded random crop discussed above
    r = tf.random.uniform([1], minval=2, maxval=100, dtype=tf.int64, seed=42)
    return sample + r[0]
dataset = tf.data.Dataset.range(3)
dataset = dataset.repeat(3)
dataset = dataset.map(add_random_int)
# Iterate over the dataset twice to compare the two sequences
print([x.numpy() for x in dataset])
print([x.numpy() for x in dataset])
out:
[50, 17, 26, 30, 44, 98, 65, 62, 35]
[50, 17, 26, 30, 44, 98, 65, 62, 35]
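So as long as the repeat is infinite, the crops are random: the values differ from call to call within one pass, and the sequence only replays when iteration starts over.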
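The same behaviour can be illustrated without TensorFlow. The sketch below uses Python's standard `random.Random` as a stand-in for the seeded op, so it is an analogy rather than Spleeter's actual data pipeline; `crop_offsets` is a hypothetical helper, and the 8s maximum offset is simply the 20s chunk minus the 12s segment quoted earlier in this thread.

```python
import random


def crop_offsets(seed, n_calls, max_offset=8.0):
    """Draw n_calls crop start offsets (seconds) from one seeded generator."""
    rng = random.Random(seed)
    return [round(rng.uniform(0.0, max_offset), 3) for _ in range(n_calls)]


first_run = crop_offsets(seed=42, n_calls=5)
second_run = crop_offsets(seed=42, n_calls=5)

print(first_run)
print(first_run == second_run)   # True: a restart replays the same sequence
print(len(set(first_run)) > 1)   # True: calls within one run differ
```

A fixed seed therefore makes runs reproducible without making every crop identical.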
Hi @faroit, the code of spleeter is still moving quite a lot. So it may be a bit premature to put a maintainable fork of Spleeter in sigsep. But we may consider it in the near future.
Hi @rohitma38, I encountered the same thing as you. The training finishes very quickly. Did you find out the reason?
Hey @syxu828
It is probably because the training didn't go through at all, at least that's how it was in my case. Can you try running the spleeter train command with '--verbose' added as an argument and look at the log messages at the end?
I get something like this -
WARNING:tensorflow:Training with estimator made no steps. Perhaps input is empty or misspecified.
I suspect that the audio isn't getting loaded, but I haven't yet been able to fix it.
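One way to test the suspicion that no audio is being loaded is to check that every file listed in the training CSV actually exists. This is a rough sketch under assumptions: it supposes the CSV has a header row and that columns whose names end in `_path` hold file paths relative to the dataset directory. `missing_audio_files` is a hypothetical helper, not a Spleeter function.

```python
import csv
import os


def missing_audio_files(csv_path, audio_root):
    """List files referenced in the CSV that do not exist under audio_root.

    Assumption: columns ending in "_path" contain paths relative to
    audio_root; every other column is ignored.
    """
    missing = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            for column, value in row.items():
                if column and column.endswith("_path") and value:
                    full_path = os.path.join(audio_root, value)
                    if not os.path.isfile(full_path):
                        missing.append(full_path)
    return missing


# Example call (both arguments are placeholders for your own setup):
# print(missing_audio_files("path/to/train.csv", "path/to/MUSDB18-WAV"))
```

An empty list means every referenced file was found; any entries point to paths that the training input would fail to load.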
@rohitma38 Your comment made me realize that is the same issue I am having when trying to train. I am getting separation outputs that were no different than the original audio, and mine also says that the estimator made no steps. I think verbose should be shown by default.
Hi @faroit, the code of spleeter is still moving quite a lot. So it may be a bit premature to put a maintainable fork of Spleeter in sigsep. But we may consider it in the near future.
@romi1502 okay, just keep me updated. I will leave this issue open till then
@romi1502 I want to reproduce the results in the paper. Can you share the config files used in the paper?
It would be nice if this issue could be addressed as part of the JOSS review. A reproducible MUSDB18 configuration could show how Spleeter's model and training pipeline compare against other models when trained only on MUSDB18.
Hi @faroit,
I pushed a modified musdb_config.json config that trains properly on musdb and provides decent results:
| |SDR |SAR |SIR |ISR |
|-----------|--------|--------|---------|--------|
| Vocals |4.69 |4.94 |12.24 |9.64 |
| Drums |4.59 |4.79 |9.21 |8.60 |
| Bass |3.71 |5.02 |6.44 |9.31 |
| Other |3.10 |3.93 |5.23 |7.91 |
(Edit 25/05/2020: the table is erroneous due to a bug, see next post)
These results should be roughly reproducible using the command:
spleeter train -p configs/musdb_config.json -d <path to musdb>
Note that you might not get exactly the same results, as I'm not sure there is any guarantee that ordering is preserved in the data pipeline, but it should not change much.
Nothing was optimized in any way (data augmentation, hyper-parameter tuning...) to guarantee optimal performance of the model on musdb, but it gives an idea of what it can do when trained only on musdb.
Erratum
There was an error in the separation pipeline related to this issue which is in the process of being fixed.
Here is the actual table of results for the model trained on musdb with the musdb_config.json config:
| |SDR |SAR |SIR |ISR |
|-----------|--------|--------|---------|--------|
| Vocals |5.10 |5.44 |12.45 |9.58 |
| Drums |5.15 |5.25 |10.68 |8.89 |
| Bass |4.27 |5.42 |7.23 |9.38 |
| Other |3.21 |3.89 |5.37 |7.80 |
That's great! Make sure you update the tags for the JOSS paper
Hey @syxu828
It is probably because the training didn't go through at all, at least that's how it was in my case. Can you try running the spleeter train command with '--verbose' added as an argument and look at the log messages at the end?
I get something like this -
WARNING:tensorflow:Training with estimator made no steps. Perhaps input is empty or misspecified.
I suspect that the audio isn't getting loaded, but I haven't yet been able to fix it.
I encountered the same problem, did you fix it?
I think the issue is fixed now in the currently available version. I have since been able to train successfully on musdb.