Spleeter: [Bug] Possible memory leak in save

Created on 10 Jan 2020 · 14 Comments · Source: deezer/spleeter

I have found a possible memory leak in the ffmpeg adapter in spleeter_utils/audio/ffmpeg.py, when calling save:

        process = subprocess.Popen(
            command,
            stdout=open(os.devnull, 'wb'),
            stdin=subprocess.PIPE,
            stderr=subprocess.PIPE)

        # Write data to STDIN.
        try:
            process.stdin.write(data.astype('<f4').tostring())
        except IOError:
            raise IOError(f'FFMPEG error: {process.stderr.read()}')

        # Clean process.
        process.stdin.close()
        if process.stderr is not None:
            process.stderr.close()
        process.wait()

        get_logger().info('File %s written', path)

        ################################################################
        current_mem, peak_mem = tracemalloc.get_traced_memory()
        overhead = tracemalloc.get_tracemalloc_memory()
        summary = "traced memory: %d KiB  peak: %d KiB  overhead: %d KiB" % (
            int(current_mem // 1024), int(peak_mem // 1024), int(overhead // 1024)
        )
        print( "after save", summary )
        ################################################################

After consecutive calls the memory is not de-allocated, and the amount retained matches the whole block of the input file (~30MB each time):

before save traced memory: 31558 KiB  peak: 32248 KiB  overhead: 30933 KiB
after save traced memory: 31560 KiB  peak: 32946 KiB  overhead: 30935 KiB
before save traced memory: 63630 KiB  peak: 64324 KiB  overhead: 58610 KiB
after save traced memory: 63632 KiB  peak: 65018 KiB  overhead: 58611 KiB
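
Independently of where the growth comes from, the cleanup in save could be tightened. What follows is a minimal sketch, assuming the only goal is to avoid leaving pipes or file handles open: communicate() writes the input, closes stdin, drains stderr and waits for the child. The names pipe_to_process and demo_command are hypothetical, and a child Python process stands in for ffmpeg so the sketch runs anywhere; this is not the spleeter implementation.

```python
import subprocess
import sys


def pipe_to_process(command, payload):
    """Feed `payload` (bytes) to `command` on stdin and wait for it to exit.

    communicate() writes the input, closes stdin, drains stderr and waits,
    so no pipe or file handle is left open afterwards.
    """
    process = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.DEVNULL,  # avoids an open(os.devnull) that is never closed
        stderr=subprocess.PIPE)
    _, stderr = process.communicate(input=payload)
    if process.returncode != 0:
        raise IOError(f'Process failed: {stderr.decode(errors="replace")}')
    return process.returncode


# Hypothetical stand-in for the ffmpeg command: a child that consumes stdin.
demo_command = [sys.executable, '-c', 'import sys; sys.stdin.buffer.read()']
print(pipe_to_process(demo_command, b'\x00' * 1024))
```

In save, the payload would be the raw float32 bytes of the waveform and the command the one assembled by the command builder.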

bug

loretoparisi

👍4

All 14 comments

I am having problems with TensorFlow and Flask.
I am using billiard instead of multiprocessing.
I get this error:
Allocation of 572063744 exceeds 10% of system memory

osvaldo1963 on 10 Jan 2020

👍1

@osvaldo1963 yes, of course: since the memory is not being released (by Popen, I assume), after some calls to save you will hit an OOM and the process will eventually be killed. To check your server's memory issues, please refer to https://github.com/tornadoweb/tornado/issues/2425
which shows how to implement a memory handler in order to trace the stack trace and memory...

loretoparisi on 10 Jan 2020

Here is the whole code to check this bug (replacing save in ffmpeg.py):

import tracemalloc
    def save(
            self, path, data, sample_rate,
            codec=None, bitrate=None):
        """ Write waveform data to the file denoted by the given path
        using FFMPEG process.

        :param path: Path of the audio file to save data in.
        :param data: Waveform data to write.
        :param sample_rate: Sample rate to write file in.
        :param codec: (Optional) Writing codec to use.
        :param bitrate: (Optional) Bitrate of the written audio file.
        :raise IOError: If any error occurs while using FFMPEG to write data.
        """

        ##### LP: START TRACING
        is_tracing = tracemalloc.is_tracing()
        if not is_tracing:
            nframe = 6
            tracemalloc.start(nframe)

        ################################################################
        current_mem, peak_mem = tracemalloc.get_traced_memory()
        overhead = tracemalloc.get_tracemalloc_memory()
        summary = "traced memory: %d KiB  peak: %d KiB  overhead: %d KiB" % (
            int(current_mem // 1024), int(peak_mem // 1024), int(overhead // 1024)
        )
        print( "before save", summary )
        ################################################################

        directory = os.path.split(path)[0]
        if not os.path.exists(directory):
            os.makedirs(directory)
        get_logger().debug('Writing file %s', path)
        # NOTE: Tweak.
        if codec == 'wav':
            codec = None
        command = (
            self._get_command_builder()
            .flag('-y')
            .opt('-loglevel', 'error')
            .opt('-f', 'f32le')
            .opt('-ar', sample_rate)
            .opt('-ac', data.shape[1])
            .opt('-i', '-')
            .flag('-vn')
            .opt('-acodec', codec)
            .opt('-ar', sample_rate)  # Note: why twice ?
            .opt('-strict', '-2')     # Note: For 'aac' codec support.
            .opt('-ab', bitrate)
            .flag(path)
            .command())

        process = subprocess.Popen(
            command,
            stdout=open(os.devnull, 'wb'),
            stdin=subprocess.PIPE,
            stderr=subprocess.PIPE)

        # Write data to STDIN.
        try:
            process.stdin.write(data.astype('<f4').tostring())
        except IOError:
            raise IOError(f'FFMPEG error: {process.stderr.read()}')

        # Clean process.
        process.stdin.close()
        if process.stderr is not None:
            process.stderr.close()
        process.wait()

        get_logger().info('File %s written', path)

        ################################################################
        current_mem, peak_mem = tracemalloc.get_traced_memory()
        overhead = tracemalloc.get_tracemalloc_memory()
        summary = "traced memory: %d KiB  peak: %d KiB  overhead: %d KiB" % (
            int(current_mem // 1024), int(peak_mem // 1024), int(overhead // 1024)
        )
        print( "after save", summary )
        ################################################################

Example output when called 3 times: the same amount of memory is allocated each time and never collected.

before save traced memory: 0 KiB  peak: 0 KiB  overhead: 0 KiB
after save traced memory: 2 KiB  peak: 1412 KiB  overhead: 4 KiB
before save traced memory: 27426 KiB  peak: 28119 KiB  overhead: 28280 KiB
after save traced memory: 27428 KiB  peak: 28837 KiB  overhead: 28281 KiB
before save traced memory: 52668 KiB  peak: 53361 KiB  overhead: 53119 KiB
after save traced memory: 52670 KiB  peak: 54080 KiB  overhead: 53121 KiB
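
The counters above show that traced memory grows, but not which lines allocate it. A minimal sketch, assuming two tracemalloc snapshots taken around the suspect call: compare_to ranks source lines by how much their allocations grew. The helper name top_growth and the `retained` list (a stand-in for a call that keeps memory alive) are hypothetical.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the `limit` source lines whose allocations grew the most."""
    stats = after.compare_to(before, 'lineno')
    return [
        (str(stat.traceback), stat.size_diff, stat.count_diff)
        for stat in stats[:limit]
    ]


tracemalloc.start(6)
before = tracemalloc.take_snapshot()

# Stand-in for a call that retains memory (3 blocks of 1 MiB each).
retained = [bytes(1024 * 1024) for _ in range(3)]

after = tracemalloc.take_snapshot()
for location, size_diff, count_diff in top_growth(before, after):
    print(f'{location}: {size_diff / 1024:+.1f} KiB in {count_diff:+d} blocks')
tracemalloc.stop()
```

Taking the first snapshot before save and the second after it would show whether the growth is attributed to ffmpeg.py or to another module.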

loretoparisi on 13 Jan 2020

Hi @loretoparisi

Thanks a lot for pointing this out. We will try to reproduce. What OS are you on?

mmoussallam on 13 Jan 2020

@mmoussallam hello, it runs within a Docker image using python:3.7.4-slim-buster, which is basically Debian buster, with TF 1.14 and Python 3.7.4. By the way, we are investigating another possible cause that could be here: https://github.com/tensorflow/tensorflow/issues/35084

loretoparisi on 13 Jan 2020

[UPDATE]
We have found that using

tf.reset_default_graph()
tf.keras.backend.clear_session()

after every call to estimator.predict seems to free up some memory, so the leak is still there, but it works better:

after prediction traced memory: 28681 KiB  peak: 28685 KiB  overhead: 29682 KiB
after load traced memory: 28813 KiB  peak: 28821 KiB  overhead: 29761 KiB
after prediction traced memory: 29551 KiB  peak: 37746 KiB  overhead: 29709 KiB
after load traced memory: 29674 KiB  peak: 37746 KiB  overhead: 29788 KiB
after prediction traced memory: 30221 KiB  peak: 40712 KiB  overhead: 29701 KiB
after load traced memory: 30352 KiB  peak: 40712 KiB  overhead: 29782 KiB
after prediction traced memory: 30967 KiB  peak: 44731 KiB  overhead: 37916 KiB
after load traced memory: 31084 KiB  peak: 44731 KiB  overhead: 37993 KiB
after prediction traced memory: 31662 KiB  peak: 44731 KiB  overhead: 29729 KiB
after load traced memory: 31779 KiB  peak: 44731 KiB  overhead: 29806 KiB
after prediction traced memory: 32354 KiB  peak: 44731 KiB  overhead: 29735 KiB
after load traced memory: 32471 KiB  peak: 44731 KiB  overhead: 29812 KiB
after prediction traced memory: 33046 KiB  peak: 44731 KiB  overhead: 29741 KiB
after load traced memory: 33163 KiB  peak: 44731 KiB  overhead: 29818 KiB

The ongoing leak is about 600-700KB, while without clearing the session it was about 24MB at every call.
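
The per-call growth can be read directly off the log. A minimal sketch that takes the "after prediction" traced-memory values from the output above and computes the difference between consecutive calls; the helper name per_call_growth is hypothetical.

```python
# "after prediction" traced memory values (KiB), copied from the log above.
after_prediction_kib = [28681, 29551, 30221, 30967, 31662, 32354, 33046]


def per_call_growth(values):
    """Return the difference between each value and the one before it."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


growth = per_call_growth(after_prediction_kib)
print(growth)                     # [870, 670, 746, 695, 692, 692]
print(sum(growth) / len(growth))  # 727.5
```

For this run the growth is roughly 670 to 870 KiB per call, the same order of magnitude as the figure quoted above.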

This is the stack trace:

{
    "message": {
        "traceback": [{
                "memory": 2925,
                "blocks": 29192,
                "stack": [
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py\", line 246",
                    "    allow_broadcast=True)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py\", line 290",
                    "    name=name).outputs[0]",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py\", line 507",
                    "    return func(*args, **kwargs)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 3616",
                    "    op_def=op_def)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 2005",
                    "    self._traceback = tf_stack.extract_stack()",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/tf_stack.py\", line 64",
                    "    ret.append((filename, lineno, name, frame_globals, func_start_lineno))"
                ]
            },
            {
                "memory": 2756,
                "blocks": 12,
                "stack": [
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 1145",
                    "    as_ref=False)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 1224",
                    "    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py\", line 305",
                    "    return constant(v, dtype=dtype, name=name)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py\", line 246",
                    "    allow_broadcast=True)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py\", line 254",
                    "    t = convert_to_eager_tensor(value, ctx, dtype)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py\", line 115",
                    "    return ops.EagerTensor(value, handle, device, dtype)"
                ]
            },
            {
                "memory": 2188,
                "blocks": 21506,
                "stack": [
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/error_interpolation.py\", line 319",
                    "    for frame in op.traceback:",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 2568",
                    "    return tf_stack.convert_stack(self._traceback)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/tf_stack.py\", line 123",
                    "    line = linecache.getline(filename, lineno, frame_globals)",
                    "  File \"/usr/local/lib/python3.7/linecache.py\", line 16",
                    "    lines = getlines(filename, module_globals)",
                    "  File \"/usr/local/lib/python3.7/linecache.py\", line 47",
                    "    return updatecache(filename, module_globals)",
                    "  File \"/usr/local/lib/python3.7/linecache.py\", line 137",
                    "    lines = fp.readlines()"
                ]
            },
            {
                "memory": 1715,
                "blocks": 19490,
                "stack": [
                    "  File \"<frozen importlib._bootstrap>\", line 983",
                    "  File \"<frozen importlib._bootstrap>\", line 967",
                    "  File \"<frozen importlib._bootstrap>\", line 677",
                    "  File \"<frozen importlib._bootstrap_external>\", line 724",
                    "  File \"<frozen importlib._bootstrap_external>\", line 857",
                    "  File \"<frozen importlib._bootstrap_external>\", line 525"
                ]
            },
            {
                "memory": 1620,
                "blocks": 16132,
                "stack": [
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/ops/gen_resource_variable_ops.py\", line 587",
                    "    \"ReadVariableOp\", resource=resource, dtype=dtype, name=name)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py\", line 788",
                    "    op_def=op_def)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py\", line 507",
                    "    return func(*args, **kwargs)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 3616",
                    "    op_def=op_def)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 2005",
                    "    self._traceback = tf_stack.extract_stack()",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/tf_stack.py\", line 64",
                    "    ret.append((filename, lineno, name, frame_globals, func_start_lineno))"
                ]
            },
            {
                "memory": 1370,
                "blocks": 13764,
                "stack": [
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/ops/gen_resource_variable_ops.py\", line 1503",
                    "    \"VarIsInitializedOp\", resource=resource, name=name)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py\", line 788",
                    "    op_def=op_def)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py\", line 507",
                    "    return func(*args, **kwargs)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 3616",
                    "    op_def=op_def)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 2005",
                    "    self._traceback = tf_stack.extract_stack()",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/tf_stack.py\", line 64",
                    "    ret.append((filename, lineno, name, frame_globals, func_start_lineno))"
                ]
            },
            {
                "memory": 1148,
                "blocks": 11418,
                "stack": [
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/ops/gen_control_flow_ops.py\", line 935",
                    "    \"Switch\", data=data, pred=pred, name=name)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py\", line 788",
                    "    op_def=op_def)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py\", line 507",
                    "    return func(*args, **kwargs)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 3616",
                    "    op_def=op_def)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 2005",
                    "    self._traceback = tf_stack.extract_stack()",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/tf_stack.py\", line 64",
                    "    ret.append((filename, lineno, name, frame_globals, func_start_lineno))"
                ]
            },
            {
                "memory": 1140,
                "blocks": 5830,
                "stack": [
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py\", line 788",
                    "    op_def=op_def)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py\", line 507",
                    "    return func(*args, **kwargs)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 3616",
                    "    op_def=op_def)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 2037",
                    "    for i, output_type in enumerate(output_types)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 2037",
                    "    for i, output_type in enumerate(output_types)",
                    "  File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 357",
                    "    self._consumers = []"
                ]
            }
        ]
    }
}
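
A report of this shape can be built from a tracemalloc snapshot grouped by traceback. This is a minimal sketch and not necessarily how the trace above was produced: the field names mirror the JSON above, the KiB unit for "memory" is an assumption, and the helper top_allocations and the `retained` list are hypothetical.

```python
import json
import tracemalloc


def top_allocations(snapshot, limit=3):
    """Summarise the largest allocation sites of a tracemalloc snapshot."""
    stats = snapshot.statistics('traceback')
    return [
        {
            'memory': stat.size // 1024,       # KiB held by this call stack (assumed unit)
            'blocks': stat.count,              # number of live memory blocks
            'stack': stat.traceback.format(),  # formatted frames, one string per line
        }
        for stat in stats[:limit]
    ]


tracemalloc.start(6)  # keep up to 6 frames per allocation, as in the save code
retained = [bytes(512 * 1024) for _ in range(4)]  # stand-in allocation
report = {'message': {'traceback': top_allocations(tracemalloc.take_snapshot())}}
tracemalloc.stop()
print(json.dumps(report, indent=4))
```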

loretoparisi on 13 Jan 2020

[UPDATE]
This issue seems to be definitely related to a bug in eager execution with tf.py_function:

results = tf.py_function(
    func=self.safe_load,
    inp=[audio_descriptor, offset, duration, sample_rate, dtype],
    Tout=(tf.float32, tf.bool)),  # trailing comma makes `results` a 1-tuple
waveform, error = results[0]

The workaround using

tf.reset_default_graph()
tf.keras.backend.clear_session()

reduced the leak (from roughly the whole input data, ~24MB per call, to a smaller ~600-700KB per call), but it is still there.

loretoparisi on 13 Jan 2020

Thanks a lot for the investigation. Since it's deep in the internals of TensorFlow, I guess we'll just wait for a fix on their side and bump the version.

mmoussallam on 13 Jan 2020

❤1

@mmoussallam yes it makes sense. I'm not sure if there is any other workaround than resetting the session and graph btw.

loretoparisi on 14 Jan 2020

@loretoparisi Could you please show exactly where you called tf.reset_default_graph() and tf.keras.backend.clear_session() in order to help with this issue? Would be really helpful! Thanks!

jujuvetus on 28 Jan 2020

In the latest version of the source code, I did it at the end of the separate_to_file method of the Separator class:

def separate_to_file(
            self, audio_descriptor, destination,
            audio_adapter=get_default_audio_adapter(),
            offset=0, duration=600., codec='wav', bitrate='128k',
            filename_format='{filename}/{instrument}.{codec}',
            synchronous=True):
        """ Performs source separation and export result to file using
        given audio adapter.

        Filename format should be a Python formattable string that could use
        following parameters : {instrument}, {filename} and {codec}.

        :param audio_descriptor:    Describe song to separate, used by audio
                                    adapter to retrieve and load audio data,
                                    in case of file based audio adapter, such
                                    descriptor would be a file path.
        :param destination:         Target directory to write output to.
        :param audio_adapter:       (Optional) Audio adapter to use for I/O.
        :param offset:              (Optional) Offset of loaded song.
        :param duration:            (Optional) Duration of loaded song.
        :param codec:               (Optional) Export codec.
        :param bitrate:             (Optional) Export bitrate.
        :param filename_format:     (Optional) Filename format.
        :param synchronous:         (Optional) True if it should be synchronous.
        """
        waveform, _ = audio_adapter.load(
            audio_descriptor,
            offset=offset,
            duration=duration,
            sample_rate=self._sample_rate)

        with self.tf_session.as_default():
            with self.tf_session.graph.as_default():
                sources = self.separate(waveform)

        filename = splitext(basename(audio_descriptor))[0]
        generated = []

        for instrument, data in sources.items():

            if instrument == 'vocals':
                path = join(destination, filename_format.format(
                    filename=filename,
                    instrument=instrument,
                    codec=codec))

                audio_adapter.save(path, data, self._sample_rate, codec, bitrate)

        # clean up things
        tf.reset_default_graph()
        tf.keras.backend.clear_session()
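
In the snippet above the cleanup only runs if separation and saving finish without raising. A minimal sketch of one way to make it unconditional, using a context manager with try/finally. The name cleanup_after is hypothetical, and plain callables stand in for the two TensorFlow calls so that the sketch runs without TensorFlow installed.

```python
from contextlib import contextmanager


@contextmanager
def cleanup_after(*cleanup_steps):
    """Run every cleanup step on exit, even if the body raises."""
    try:
        yield
    finally:
        for step in cleanup_steps:
            step()


# Stand-ins: in separate_to_file the steps would be
# tf.reset_default_graph and tf.keras.backend.clear_session.
calls = []
with cleanup_after(lambda: calls.append('reset_graph'),
                   lambda: calls.append('clear_session')):
    calls.append('separate')  # stand-in for the separation and save work

print(calls)  # ['separate', 'reset_graph', 'clear_session']
```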

loretoparisi on 28 Jan 2020

👍1

@loretoparisi Amazing! Cheers for the fast answer. I'll give this a shot.

jujuvetus on 28 Jan 2020

👍1

Hi everyone, I don't know if this is the right thread for my issue, but since it is memory related I thought I'd use this topic instead of creating a new issue.
I am currently running a Windows 10 machine with 4 GB of RAM.
I also have an Ubuntu system that I use from time to time. On Linux, with a 16 GB swap (virtual RAM), I can split very long songs (5 minutes plus) with no problem; on Windows, however, I run out of RAM at the 4-minute mark.
If the issue is related to my physical memory, how did I manage to split files on my Linux distro with no problem?
Thanks and regards.