Spleeter: Can this be used for noise cleansing of conversation audio files?

Created on 5 Nov 2019  ·  2 Comments  ·  Source: deezer/spleeter

Hello Team,
Thank you for your effort in building such a fantastic package. This is not a bug; it's more a question about what this package can do.
Please let me know whether spleeter can be used for removing noise from audio files. My analogy is this: if I treat the noise as the music and the conversation as the singer's voice, can I separate the noise from a recorded conversation using spleeter?
Your input will be helpful.

question

Most helpful comment

I am not affiliated with the project, but FWIW :

I grabbed 1:00 to 3:00 of the audio of this BMXer recording himself on a GoPro camera, riding a bike around Los Angeles and speaking to the camera.

https://www.youtube.com/watch?v=a0vgJ3TeJzA

I then ran it through spleeter's 2stems model:

spleeter separate -o ./audio_output/ -p spleeter:2stems -i input/00-YOUTUBE/a0vgJ3TeJzA.flac
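If you have a whole folder of recordings to try, the command above can be wrapped in a few lines of Python. This is only a sketch: the function names `build_spleeter_command` and `separate_all` are made up for illustration, and the flags are copied from the command in this comment. As far as I know, newer Spleeter releases take the input file as a positional argument instead of `-i`, so adjust the argument list to match your installed version.

```python
import subprocess
from pathlib import Path


def build_spleeter_command(input_file, output_dir="./audio_output/"):
    """Return the argument list for the 2-stem separation command shown above."""
    return [
        "spleeter", "separate",
        "-o", str(output_dir),
        "-p", "spleeter:2stems",
        "-i", str(input_file),  # assumption: CLI version that still accepts -i
    ]


def separate_all(input_dir, output_dir="./audio_output/", pattern="*.flac"):
    """Run the separation on every matching file (needs spleeter on PATH)."""
    for audio_file in sorted(Path(input_dir).glob(pattern)):
        subprocess.run(build_spleeter_command(audio_file, output_dir), check=True)
```

Calling `separate_all("input/00-YOUTUBE")` would then process each `.flac` file in that folder in turn and stop at the first failure.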

I then merged the two files into one stereo file, with the vocals panned hard left and the accompaniment panned hard right:

ffmpeg -i vocals.wav -i accompaniment.wav -filter_complex "[0:a][1:a]amerge=inputs=2,pan=stereo|c0<c0+c1|c1<c2+c3[a]" -map "[a]" stereo.split.accomp.and.vocals.wav
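For anyone without ffmpeg at hand, roughly the same merge can be sketched with Python's standard library. It downmixes each input to mono by averaging its channels (which is what the `<` in the `pan` filter does after gain normalization) and writes the first file to the left channel and the second to the right. The helper names `read_mono` and `write_hard_panned` are hypothetical, and the sketch assumes both inputs are 16-bit PCM WAV files with the same sample rate; check that your stem files match before relying on it.

```python
import array
import sys
import wave


def read_mono(path):
    """Read a 16-bit PCM WAV file and return (sample_rate, mono_samples)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Average the channels of each frame to get one mono sample per frame.
    mono = [sum(samples[i:i + channels]) // channels
            for i in range(0, len(samples), channels)]
    return rate, mono


def write_hard_panned(left_path, right_path, out_path):
    """Write a stereo WAV with left_path on the left and right_path on the right."""
    left_rate, left = read_mono(left_path)
    right_rate, right = read_mono(right_path)
    if left_rate != right_rate:
        raise ValueError("sample rates differ")
    interleaved = array.array("h")
    # zip() stops at the shorter input, so the output has the shorter length.
    for left_sample, right_sample in zip(left, right):
        interleaved.append(left_sample)
        interleaved.append(right_sample)
    if sys.byteorder == "big":
        interleaved.byteswap()
    with wave.open(out_path, "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(2)
        out.setframerate(left_rate)
        out.writeframes(interleaved.tobytes())
```

With the stems from the previous step, `write_hard_panned("vocals.wav", "accompaniment.wav", "stereo.split.accomp.and.vocals.wav")` should put the voice in the left ear and everything else in the right.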

The result was the bicyclist talking in my left ear and a bunch of background noise in my right ear. When I muted the right channel, I heard the speech quite clearly in my left ear. When I muted the left channel, I heard almost no speech in the right ear. The only time I heard his voice in the right ear was when it could be mistaken for a percussive drum sound. I sometimes heard people other than him speaking in the left ear, when he biked past people who were talking.

I believe this is a sufficient proof of concept for using spleeter's 2stems model to isolate spoken human voices from recordings with background noise. Try it yourself and see!

All 2 comments

Hi @AIGyan

We haven't done any sort of evaluation on this task, nor was our model trained on such examples. Since speech enhancement and denoising are an active research field, I assume there are more specialized tools out there for this.

That being said, I can only agree with @awesomer: feel free to try it out and let us know what you find!
