I have found a possibile memory leak in the ffmpeg adapter in spleeter_utils/audio/ffmpeg.py, when calling to save:
process = subprocess.Popen(
command,
stdout=open(os.devnull, 'wb'),
stdin=subprocess.PIPE,
stderr=subprocess.PIPE)
# Write data to STDIN.
try:
process.stdin.write(data.astype('<f4').tostring())
except IOError:
raise IOError(f'FFMPEG error: {process.stderr.read()}')
# Clean process.
process.stdin.close()
if process.stderr is not None:
process.stderr.close()
process.wait()
get_logger().info('File %s written', path)
################################################################
current_mem, peak_mem = tracemalloc.get_traced_memory()
overhead = tracemalloc.get_tracemalloc_memory()
summary = "traced memory: %d KiB peak: %d KiB overhead: %d KiB" % (
int(current_mem // 1024), int(peak_mem // 1024), int(overhead // 1024)
)
print( "after save", summary )
################################################################
After consecutive calls the memory will not be de-allocated, and it consists of the whole block of the input file (~30MB each time):
before save traced memory: 31558 KiB peak: 32248 KiB overhead: 30933 KiB
after save traced memory: 31560 KiB peak: 32946 KiB overhead: 30935 KiB
before save traced memory: 63630 KiB peak: 64324 KiB overhead: 58610 KiB
after save traced memory: 63632 KiB peak: 65018 KiB overhead: 58611 KiB
I am having problems with TensorFlow and flask.
I using billiard instead of multiprocessing
I get this error
Allocation of 572063744 exceeds 10% of system memory
@osvaldo1963 yes of course since memory it is not being released, I assume by Popen after some calls to the save you will get a OOM and then Killed eventually. To check your server memory issues please refers to https://github.com/tornadoweb/tornado/issues/2425
how to implement a memory handler in order to trace the stacktrace and memory...
Here is the whole code to check this bug (replacing save in ffmpeg.py:
import tracemalloc
def save(
self, path, data, sample_rate,
codec=None, bitrate=None):
""" Write waveform data to the file denoted by the given path
using FFMPEG process.
:param path: Path of the audio file to save data in.
:param data: Waveform data to write.
:param sample_rate: Sample rate to write file in.
:param codec: (Optional) Writing codec to use.
:param bitrate: (Optional) Bitrate of the written audio file.
:raise IOError: If any error occurs while using FFMPEG to write data.
"""
##### LP: START TRACING
is_tracing = tracemalloc.is_tracing()
if not is_tracing:
nframe = 6
tracemalloc.start(nframe)
################################################################
current_mem, peak_mem = tracemalloc.get_traced_memory()
overhead = tracemalloc.get_tracemalloc_memory()
summary = "traced memory: %d KiB peak: %d KiB overhead: %d KiB" % (
int(current_mem // 1024), int(peak_mem // 1024), int(overhead // 1024)
)
print( "before save", summary )
################################################################
directory = os.path.split(path)[0]
if not os.path.exists(directory):
os.makedirs(directory)
get_logger().debug('Writing file %s', path)
# NOTE: Tweak.
if codec == 'wav':
codec = None
command = (
self._get_command_builder()
.flag('-y')
.opt('-loglevel', 'error')
.opt('-f', 'f32le')
.opt('-ar', sample_rate)
.opt('-ac', data.shape[1])
.opt('-i', '-')
.flag('-vn')
.opt('-acodec', codec)
.opt('-ar', sample_rate) # Note: why twice ?
.opt('-strict', '-2') # Note: For 'aac' codec support.
.opt('-ab', bitrate)
.flag(path)
.command())
process = subprocess.Popen(
command,
stdout=open(os.devnull, 'wb'),
stdin=subprocess.PIPE,
stderr=subprocess.PIPE)
# Write data to STDIN.
try:
process.stdin.write(data.astype('<f4').tostring())
except IOError:
raise IOError(f'FFMPEG error: {process.stderr.read()}')
# Clean process.
process.stdin.close()
if process.stderr is not None:
process.stderr.close()
process.wait()
get_logger().info('File %s written', path)
################################################################
current_mem, peak_mem = tracemalloc.get_traced_memory()
overhead = tracemalloc.get_tracemalloc_memory()
summary = "traced memory: %d KiB peak: %d KiB overhead: %d KiB" % (
int(current_mem // 1024), int(peak_mem // 1024), int(overhead // 1024)
)
print( "after save", summary )
################################################################
Example output when called 3 times, resulting in allocating three time the same memory, without collecting it
before save traced memory: 0 KiB peak: 0 KiB overhead: 0 KiB
after save traced memory: 2 KiB peak: 1412 KiB overhead: 4 KiB
before save traced memory: 27426 KiB peak: 28119 KiB overhead: 28280 KiB
after save traced memory: 27428 KiB peak: 28837 KiB overhead: 28281 KiB
before save traced memory: 52668 KiB peak: 53361 KiB overhead: 53119 KiB
after save traced memory: 52670 KiB peak: 54080 KiB overhead: 53121 KiB
Hi @loretoparisi
thanks a lot for pointing this out. We will try to reproduce. what OS are you on ?
@mmoussallam hello, it runs within a Docker image using python:3.7.4-slim-buster that is basically the debian buster, with with TF 1.14 and python 3.7.4. By the way we are investigating another possibile cause that could be here: https://github.com/tensorflow/tensorflow/issues/35084
[UPDATE]
We have found that using
tf.reset_default_graph()
tf.keras.backend.clear_session()
after every call to estimator.predict seems to free up some memory, so the leak is still there, but ti works better:
after prediction traced memory: 28681 KiB peak: 28685 KiB overhead: 29682 KiB
after load traced memory: 28813 KiB peak: 28821 KiB overhead: 29761 KiB
after prediction traced memory: 29551 KiB peak: 37746 KiB overhead: 29709 KiB
after load traced memory: 29674 KiB peak: 37746 KiB overhead: 29788 KiB
after prediction traced memory: 30221 KiB peak: 40712 KiB overhead: 29701 KiB
after load traced memory: 30352 KiB peak: 40712 KiB overhead: 29782 KiB
after prediction traced memory: 30967 KiB peak: 44731 KiB overhead: 37916 KiB
after load traced memory: 31084 KiB peak: 44731 KiB overhead: 37993 KiB
after prediction traced memory: 31662 KiB peak: 44731 KiB overhead: 29729 KiB
after load traced memory: 31779 KiB peak: 44731 KiB overhead: 29806 KiB
after prediction traced memory: 32354 KiB peak: 44731 KiB overhead: 29735 KiB
after load traced memory: 32471 KiB peak: 44731 KiB overhead: 29812 KiB
after prediction traced memory: 33046 KiB peak: 44731 KiB overhead: 29741 KiB
after load traced memory: 33163 KiB peak: 44731 KiB overhead: 29818 KiB
The ongoing leak is about 600-700KB, while without clearing the session it was about 24MB at every call.
This is the stack trace:
{
"message": {
"traceback": [{
"memory": 2925,
"blocks": 29192,
"stack": [
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py\", line 246",
" allow_broadcast=True)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py\", line 290",
" name=name).outputs[0]",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py\", line 507",
" return func(*args, **kwargs)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 3616",
" op_def=op_def)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 2005",
" self._traceback = tf_stack.extract_stack()",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/tf_stack.py\", line 64",
" ret.append((filename, lineno, name, frame_globals, func_start_lineno))"
]
},
{
"memory": 2756,
"blocks": 12,
"stack": [
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 1145",
" as_ref=False)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 1224",
" ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py\", line 305",
" return constant(v, dtype=dtype, name=name)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py\", line 246",
" allow_broadcast=True)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py\", line 254",
" t = convert_to_eager_tensor(value, ctx, dtype)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py\", line 115",
" return ops.EagerTensor(value, handle, device, dtype)"
]
},
{
"memory": 2188,
"blocks": 21506,
"stack": [
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/error_interpolation.py\", line 319",
" for frame in op.traceback:",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 2568",
" return tf_stack.convert_stack(self._traceback)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/tf_stack.py\", line 123",
" line = linecache.getline(filename, lineno, frame_globals)",
" File \"/usr/local/lib/python3.7/linecache.py\", line 16",
" lines = getlines(filename, module_globals)",
" File \"/usr/local/lib/python3.7/linecache.py\", line 47",
" return updatecache(filename, module_globals)",
" File \"/usr/local/lib/python3.7/linecache.py\", line 137",
" lines = fp.readlines()"
]
},
{
"memory": 1715,
"blocks": 19490,
"stack": [
" File \"<frozen importlib._bootstrap>\", line 983",
" File \"<frozen importlib._bootstrap>\", line 967",
" File \"<frozen importlib._bootstrap>\", line 677",
" File \"<frozen importlib._bootstrap_external>\", line 724",
" File \"<frozen importlib._bootstrap_external>\", line 857",
" File \"<frozen importlib._bootstrap_external>\", line 525"
]
},
{
"memory": 1620,
"blocks": 16132,
"stack": [
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/ops/gen_resource_variable_ops.py\", line 587",
" \"ReadVariableOp\", resource=resource, dtype=dtype, name=name)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py\", line 788",
" op_def=op_def)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py\", line 507",
" return func(*args, **kwargs)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 3616",
" op_def=op_def)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 2005",
" self._traceback = tf_stack.extract_stack()",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/tf_stack.py\", line 64",
" ret.append((filename, lineno, name, frame_globals, func_start_lineno))"
]
},
{
"memory": 1370,
"blocks": 13764,
"stack": [
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/ops/gen_resource_variable_ops.py\", line 1503",
" \"VarIsInitializedOp\", resource=resource, name=name)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py\", line 788",
" op_def=op_def)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py\", line 507",
" return func(*args, **kwargs)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 3616",
" op_def=op_def)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 2005",
" self._traceback = tf_stack.extract_stack()",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/tf_stack.py\", line 64",
" ret.append((filename, lineno, name, frame_globals, func_start_lineno))"
]
},
{
"memory": 1148,
"blocks": 11418,
"stack": [
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/ops/gen_control_flow_ops.py\", line 935",
" \"Switch\", data=data, pred=pred, name=name)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py\", line 788",
" op_def=op_def)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py\", line 507",
" return func(*args, **kwargs)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 3616",
" op_def=op_def)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 2005",
" self._traceback = tf_stack.extract_stack()",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/tf_stack.py\", line 64",
" ret.append((filename, lineno, name, frame_globals, func_start_lineno))"
]
},
{
"memory": 1140,
"blocks": 5830,
"stack": [
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py\", line 788",
" op_def=op_def)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py\", line 507",
" return func(*args, **kwargs)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 3616",
" op_def=op_def)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 2037",
" for i, output_type in enumerate(output_types)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 2037",
" for i, output_type in enumerate(output_types)",
" File \"/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\", line 357",
" self._consumers = []"
]
}
]
}
}
[UPDATE]
This issue seems to be definitively related to a bug in eager execution with tf.py_function:
results = tf.py_function(
func=self.safe_load,
inp=[audio_descriptor, offset, duration, sample_rate, dtype],
Tout=(tf.float32, tf.bool)),
waveform, error = results[0]
The workaround using
tf.reset_default_graph()
tf.keras.backend.clear_session()
saved some memory (the leaked memory went down from the whole data leaked about ~24MB at call to a smaller ~600-700KB), but it is still there.
thanks a lot for the investigation. Since it's deep into the internals of tensorflow I guess we'll just wait for a fix on their side and bump the version.
@mmoussallam yes it makes sense. I'm not sure if there is any other workaround than resetting the session and graph btw.
@loretoparisi Could you please show exactly where you called tf.reset_default_graph() and tf.keras.backend.clear_session() in order to help with this issue? Would be really helpful! Thanks!
In the last version of the source code I did it at the end of the api separate_to_file of class Separator:
def separate_to_file(
self, audio_descriptor, destination,
audio_adapter=get_default_audio_adapter(),
offset=0, duration=600., codec='wav', bitrate='128k',
filename_format='{filename}/{instrument}.{codec}',
synchronous=True):
""" Performs source separation and export result to file using
given audio adapter.
Filename format should be a Python formattable string that could use
following parameters : {instrument}, {filename} and {codec}.
:param audio_descriptor: Describe song to separate, used by audio
adapter to retrieve and load audio data,
in case of file based audio adapter, such
descriptor would be a file path.
:param destination: Target directory to write output to.
:param audio_adapter: (Optional) Audio adapter to use for I/O.
:param offset: (Optional) Offset of loaded song.
:param duration: (Optional) Duration of loaded song.
:param codec: (Optional) Export codec.
:param bitrate: (Optional) Export bitrate.
:param filename_format: (Optional) Filename format.
:param synchronous: (Optional) True is should by synchronous.
"""
waveform, _ = audio_adapter.load(
audio_descriptor,
offset=offset,
duration=duration,
sample_rate=self._sample_rate)
with self.tf_session.as_default():
with self.tf_session.graph.as_default():
sources = self.separate(waveform)
filename = splitext(basename(audio_descriptor))[0]
generated = []
for instrument, data in sources.items():
if instrument == 'vocals':
path = join(destination, filename_format.format(
filename=filename,
instrument=instrument,
codec=codec))
audio_adapter.save(path, data, self._sample_rate, codec, bitrate)
# clean up things
tf.reset_default_graph()
tf.keras.backend.clear_session()
@loretoparisi Amazing! Cheers for the fast answer. I'll give this a shot.
Hi everyone, I don't know if this is the right thread to explain my issue, but since it is memory related I'd thought I'd use this topic, instead of creating a new issue.
I am currently running a windows 10 machine, I have 4 GB of ram.
I also have a Ubuntu system that I use from time to time, when I use Linux with a 16GB swap (virtual ram), I can split very long songs, 5 mins plus with no problem, however on Windows, I run out of ram at the 4 minute mark.
Is the issue related to my physical memory, but if so, How did I manage to split files on my Linux distro with no problem?
Thanks and regards.
This bug should be fixed now that we've moved on to TF 2