from deepspeech import Model
model = Model(pbmm_local_path, 512)
model.enableDecoderWithLM(
    lm_local_path,
    trie_local_path,
    alpha,
    beta,
)
del model
Description:
Repeated calls to the above code cause a memory leak of about 4 MB per 10 calls.
I have an application where various models are loaded for ASR depending on the context and purpose.
After about 2-3 thousand requests or so, the machine runs out of memory.
Only restarting the process will free the memory.
I saw that there is this thread https://github.com/mozilla/DeepSpeech/issues/2403, which confirmed leakage on C++, python and .Net client, but only .Net is fixed?
I'm wondering if this is still a known issue on python, and whether there are plans to fix it?
If not, any recommendations or workarounds?
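A figure like "4 MB per 10 calls" can be reproduced by sampling the process's peak resident set size after each load/release cycle. The sketch below is only an illustration: `fake_load` is a hypothetical stand-in for the Model creation and deletion shown above, and `resource` is Unix-only (`ru_maxrss` is reported in KiB on Linux, bytes on macOS).

```python
import resource


def peak_rss():
    """Return the peak resident set size of this process (KiB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def rss_growth(load_and_release, iterations=10):
    """Call load_and_release() repeatedly and record peak RSS after each call."""
    samples = []
    for _ in range(iterations):
        load_and_release()
        samples.append(peak_rss())
    return samples


def fake_load():
    # Hypothetical stand-in for "create Model, enable the LM, delete it".
    buf = bytearray(1024 * 1024)
    del buf


samples = rss_growth(fake_load, iterations=10)
print("peak RSS growth over 10 calls:", samples[-1] - samples[0])
```

Peak RSS never decreases, so a steady climb across many iterations suggests memory that is not being returned, while a flat line after the first few calls suggests allocator caching rather than a leak.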
Does the problem go away if you call model.__del__() directly? Maybe we should expose the destructor explicitly instead of only relying on Python GC.
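To illustrate what "exposing the destructor explicitly" could look like, here is a minimal sketch of the usual Python pattern: an explicit close() plus context-manager support, with __del__ kept only as a fallback. `NativeHandle` is a hypothetical class invented for this example; it is not part of the deepspeech package.

```python
class NativeHandle:
    """Hypothetical stand-in for a wrapper around a native (C/C++) object."""

    def __init__(self, name):
        self.name = name
        self._released = False

    def close(self):
        # Release the native resource exactly once; safe to call repeatedly.
        if not self._released:
            self._released = True

    @property
    def released(self):
        return self._released

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions

    def __del__(self):
        # Fallback only: the garbage collector may run this late or never.
        self.close()


with NativeHandle("model") as handle:
    inside = handle.released
print(inside, handle.released)  # prints: False True
```

With this shape the release point is tied to the end of the `with` block rather than to garbage-collection timing.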
Yes I did try calling __del__() directly, the result is the same.
Can you reproduce with 0.7.0? I can't.
I'll spend some time to get 0.7.0 running.
Any reference as to how this issue is fixed between 0.6.0 and 0.7.0?
The thread above indicates that this was an issue at least as of early 0.6.0-alpha
The linked thread was fixed, I don't think it's related to what you're seeing.
Maybe I'm interpreting this message incorrectly.
https://github.com/mozilla/DeepSpeech/issues/2403#issuecomment-544085038
It says fixed for .Net client, but "looks like the C++ client is not releasing completely".
Can you confirm that it is fixed for the python client? As far as that thread was concerned?
What I'm seeing may just be a side-effect of python garbage collector still keeping some allocation.
If the other thread is fixed for python client, then we can close this one. I don't have strong evidence that it is from deepspeech and not just the behavior from python.
I've linked you the valgrind proof. Please, try to reproduce on current 0.7.1 alpha.
@khu834
Please @khu834 ?
Sure, sorry for the delay, been occupied with chasing other leads.
Will do this in the next week.
@khu834 Gentle ping, can you shed more light on that?
My apologies, just getting to this today.
Here's what I have so far running in 0.6.1 (will be doing 0.7.1 next as I have not set that up yet)
python code ("mem_test.py")
from deepspeech import Model
for i in range(5):
    ds = Model('am/export/model.pbmm', 512)
    _ = ds.enableDecoderWithLM(
        'lm/lm.binary',
        'lm/trie',
        0.75,
        1.85
    )
    ds.__del__()
valgrind command
valgrind --tool=memcheck --suppressions=valgrind-python.supp python -E -tt mem_test.py
I downloaded the suppression file from http://svn.python.org/projects/python/trunk/Misc/valgrind-python.supp and uncommented the lines for PyObject_Free and PyObject_Realloc
Here's the summary:
==1838== HEAP SUMMARY:
==1838== in use at exit: 5,676,567 bytes in 71,766 blocks
==1838== total heap usage: 331,228 allocs, 259,462 frees, 44,720,754 bytes allocated
==1838==
==1838== LEAK SUMMARY:
==1838== definitely lost: 14,382 bytes in 10 blocks
==1838== indirectly lost: 1,179,029 bytes in 16,037 blocks
==1838== possibly lost: 599,137 bytes in 4,979 blocks
==1838== still reachable: 3,884,019 bytes in 50,740 blocks
==1838== of which reachable via heuristic:
==1838== stdstring : 657,315 bytes in 16,518 blocks
==1838== newarray : 15,328 bytes in 61 blocks
==1838== suppressed: 0 bytes in 0 blocks
==1838== Rerun with --leak-check=full to see details of leaked memory
==1838==
==1838== For counts of detected and suppressed errors, rerun with: -v
==1838== Use --track-origins=yes to see where uninitialised values come from
==1838== ERROR SUMMARY: 4790 errors from 561 contexts (suppressed: 0 from 0)
Let me know if my methodology is sound. I'll continue with 0.7.1 and report back.
Only took 10 minutes to get 0.7.1 set up! Love it.
python code ("mem_test.py")
from deepspeech import Model
for i in range(5):
    ds = Model('am/deepspeech-0.7.1-models.pbmm')
    ds.enableExternalScorer('lm/deepspeech-0.7.1-models.scorer')
    ds.setScorerAlphaBeta(0.75, 1.85)
    ds.__del__()
valgrind command
valgrind --tool=memcheck --suppressions=valgrind-python.supp python -E -tt mem_test.py
Here's the summary:
==2441== HEAP SUMMARY:
==2441== in use at exit: 3,520,936 bytes in 36,098 blocks
==2441== total heap usage: 285,224 allocs, 249,126 frees, 57,586,141 bytes allocated
==2441==
==2441== LEAK SUMMARY:
==2441== definitely lost: 70 bytes in 1 blocks
==2441== indirectly lost: 0 bytes in 0 blocks
==2441== possibly lost: 347,687 bytes in 1,561 blocks
==2441== still reachable: 3,173,179 bytes in 34,536 blocks
==2441== of which reachable via heuristic:
==2441== stdstring : 394,239 bytes in 11,236 blocks
==2441== newarray : 3,792 bytes in 9 blocks
==2441== suppressed: 0 bytes in 0 blocks
==2441== Rerun with --leak-check=full to see details of leaked memory
==2441==
==2441== For counts of detected and suppressed errors, rerun with: -v
==2441== Use --track-origins=yes to see where uninitialised values come from
==2441== ERROR SUMMARY: 6392 errors from 581 contexts (suppressed: 0 from 0)
Am I interpreting this correctly, in that there is a memory leak in 0.6.1 but it no longer seems to be an issue in 0.7.1?
Or are the "possibly lost" blocks still a concern?
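To make the comparison between the two runs easier to read, the two LEAK SUMMARY blocks above can be parsed and printed side by side. This is only a sketch: the regular expression is an assumption about the summary line format as it appears in this thread, and the numbers are copied from the 0.6.1 and 0.7.1 reports above.

```python
import re

# Matches lines such as "==1838== definitely lost: 14,382 bytes in 10 blocks".
LEAK_LINE = re.compile(
    r"(definitely lost|indirectly lost|possibly lost|still reachable):\s*([\d,]+) bytes"
)


def parse_leak_summary(text):
    """Map each LEAK SUMMARY category to its byte count."""
    return {kind: int(count.replace(",", "")) for kind, count in LEAK_LINE.findall(text)}


summary_061 = """
==1838== definitely lost: 14,382 bytes in 10 blocks
==1838== indirectly lost: 1,179,029 bytes in 16,037 blocks
==1838== possibly lost: 599,137 bytes in 4,979 blocks
==1838== still reachable: 3,884,019 bytes in 50,740 blocks
"""
summary_071 = """
==2441== definitely lost: 70 bytes in 1 blocks
==2441== indirectly lost: 0 bytes in 0 blocks
==2441== possibly lost: 347,687 bytes in 1,561 blocks
==2441== still reachable: 3,173,179 bytes in 34,536 blocks
"""

old = parse_leak_summary(summary_061)
new = parse_leak_summary(summary_071)
for kind in old:
    print("%-16s %10d -> %10d" % (kind, old[kind], new[kind]))
```

Note that these are totals at process exit for the whole five-iteration script, not per-iteration figures.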
Without the details, it's hard to tell for sure, but it should be better to have 0. However, you tested the python bindings, and it could come from a lot of places there, even with the allocs things you tweaked.
@khu834 Do you see leak when running the C++ client ?
Will try that next. I have not run the C++ client before, so I'll need to set that up first.
Here is the info on setting it up
Here's what I got with C++ client 0.7.1
valgrind --tool=memcheck ./deepspeech --model am/deepspeech-0.7.1-models.pbmm --scorer lm/deepspeech-0.7.1-models.scorer --audio ~/mem_tests/000de1bf-621c-4324-8f61-2736990b3467_8.wav
Seems that valgrind crashed?
==22593== Memcheck, a memory error detector
==22593== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==22593== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==22593== Command: ./deepspeech --model am/deepspeech-0.7.1-models.pbmm --scorer lm/deepspeech-0.7.1-models.scorer --audio /home/khuang/mem_tests/000de1bf-621c-4324-8f61-2736990b3467_8.wav
==22593==
TensorFlow: v1.15.0-24-gceb46aa
DeepSpeech: v0.7.1-0-g2e9c281
2020-06-04 22:17:36.176129: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
==22593== Warning: set address range perms: large range [0x395f8000, 0x71794000) (defined)
valgrind: m_translate.c:1767 (vgPlain_translate): Assertion 'tres.status == VexTransOK' failed.
host stacktrace:
==22593== at 0x38083828: show_sched_status_wrk (m_libcassert.c:343)
==22593== by 0x38083944: report_and_quit (m_libcassert.c:419)
==22593== by 0x38083AD1: vgPlain_assert_fail (m_libcassert.c:485)
==22593== by 0x380A2006: vgPlain_translate (m_translate.c:1767)
==22593== by 0x380D547B: handle_chain_me (scheduler.c:1076)
==22593== by 0x380D6FEF: vgPlain_scheduler (scheduler.c:1420)
==22593== by 0x380E6416: thread_wrapper (syswrap-linux.c:103)
==22593== by 0x380E6416: run_a_thread_NORETURN (syswrap-linux.c:156)
==22593== by 0x380E68DA: vgModuleLocal_start_thread_NORETURN (syswrap-linux.c:325)
==22593== by 0x3810F44D: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==22593== by 0xDEADBEEFDEADBEEE: ???
==22593== by 0xDEADBEEFDEADBEEE: ???
==22593== by 0xDEADBEEFDEADBEEE: ???
sched status:
running_tid=4
Thread 1: status = VgTs_WaitSys (lwpid 22593)
==22593== at 0x7ED6469: syscall (syscall.S:38)
==22593== by 0x65A9323: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x65A9120: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x65A76C3: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x65A7C20: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x536373A: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x536378A: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x536F598: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x537901C: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x53756B4: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x535E17A: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5359BBA: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x535A846: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x535B366: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x535B3DB: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x535B6DF: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x535BDBE: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x535BDD8: DS_SpeechToText (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x407D61: LocalDsSTT(ModelState*, short const*, unsigned long, bool, bool) (in /home/khuang/mem_tests/ds7/deepspeech)
==22593== by 0x408311: ProcessFile(ModelState*, char const*, bool) (in /home/khuang/mem_tests/ds7/deepspeech)
==22593== by 0x4085DF: main (in /home/khuang/mem_tests/ds7/deepspeech)
Thread 2: status = VgTs_WaitSys (lwpid 22594)
==22593== at 0x7BE370F: futex_wait (futex-internal.h:61)
==22593== by 0x7BE370F: futex_wait_simple (futex-internal.h:135)
==22593== by 0x7BE370F: __pthread_once_slow (pthread_once.c:105)
==22593== by 0x5ED56D5: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5ED58DF: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5ED61D8: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5EFB17F: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5EFB5DB: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5EFA705: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5EFA82B: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x54B87D8: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5510C86: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58AEF61: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58AF96E: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58B05D9: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58AFFA4: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58AFEAE: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x65940CB: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x6593193: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x71C51FF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==22593== by 0x7BDC4A3: start_thread (pthread_create.c:456)
==22593== by 0x7EDAD0E: clone (clone.S:97)
Thread 3: status = VgTs_WaitSys (lwpid 22595)
==22593== at 0x7BE370F: futex_wait (futex-internal.h:61)
==22593== by 0x7BE370F: futex_wait_simple (futex-internal.h:135)
==22593== by 0x7BE370F: __pthread_once_slow (pthread_once.c:105)
==22593== by 0x5ED56D5: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5ED58DF: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5ED61D8: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5EFB17F: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5EFB5DB: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5EFA705: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5EFA82B: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x54B87D8: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5510C86: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58AEF61: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58AF96E: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58B05D9: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58AFFA4: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58AFEAE: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x65940CB: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x6593193: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x71C51FF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==22593== by 0x7BDC4A3: start_thread (pthread_create.c:456)
==22593== by 0x7EDAD0E: clone (clone.S:97)
Thread 4: status = VgTs_Runnable (lwpid 22596)
==22593== at 0x40B7AD1: ???
==22593== by 0x5F: ???
==22593== by 0xF: ???
Thread 5: status = VgTs_WaitSys (lwpid 22597)
==22593== at 0x7BE217F: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==22593== by 0x71BF50B: std::condition_variable::wait(std::unique_lock<std::mutex>&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==22593== by 0x6593C9A: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x65942CA: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x6593193: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x71C51FF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==22593== by 0x7BDC4A3: start_thread (pthread_create.c:456)
==22593== by 0x7EDAD0E: clone (clone.S:97)
Thread 6: status = VgTs_WaitSys (lwpid 22598)
==22593== at 0x7BE217F: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==22593== by 0x71BF50B: std::condition_variable::wait(std::unique_lock<std::mutex>&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==22593== by 0x6593C9A: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x65942CA: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x6593193: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x71C51FF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==22593== by 0x7BDC4A3: start_thread (pthread_create.c:456)
==22593== by 0x7EDAD0E: clone (clone.S:97)
Thread 7: status = VgTs_WaitSys (lwpid 22599)
==22593== at 0x7BE3780: futex_wake (futex-internal.h:231)
==22593== by 0x7BE3780: __pthread_once_slow (pthread_once.c:127)
==22593== by 0x5ED56D5: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5ED58DF: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5ED61D8: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5EFB17F: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5EFB5DB: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5EFA705: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5EFA82B: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x54B87D8: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5510C86: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58AEF61: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58AF96E: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58B05D9: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58AFFA4: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58AFEAE: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58AFD0A: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58AFF99: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58B01AD: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58BC6AD: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58BDF4A: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58E259D: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x58E2EB2: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x63B0FA4: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x63A1EBF: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x65940CB: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x6593193: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x71C51FF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==22593== by 0x7BDC4A3: start_thread (pthread_create.c:456)
==22593== by 0x7EDAD0E: clone (clone.S:97)
Thread 8: status = VgTs_WaitSys (lwpid 22600)
==22593== at 0x7BE217F: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==22593== by 0x71BF50B: std::condition_variable::wait(std::unique_lock<std::mutex>&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==22593== by 0x54042CA: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5404392: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5B791D0: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x5BB0F86: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x63B0FA4: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x63A1EBF: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x65940CB: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x6593193: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x71C51FF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==22593== by 0x7BDC4A3: start_thread (pthread_create.c:456)
==22593== by 0x7EDAD0E: clone (clone.S:97)
Thread 9: status = VgTs_WaitSys (lwpid 22601)
==22593== at 0x7BE217F: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==22593== by 0x71BF50B: std::condition_variable::wait(std::unique_lock<std::mutex>&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==22593== by 0x6593C9A: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x65942CA: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x6593193: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x71C51FF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==22593== by 0x7BDC4A3: start_thread (pthread_create.c:456)
==22593== by 0x7EDAD0E: clone (clone.S:97)
Thread 10: status = VgTs_WaitSys (lwpid 22602)
==22593== at 0x7BE217F: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==22593== by 0x71BF50B: std::condition_variable::wait(std::unique_lock<std::mutex>&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==22593== by 0x6593C9A: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x659440B: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x6593193: ??? (in /home/khuang/mem_tests/ds7/libdeepspeech.so)
==22593== by 0x71C51FF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==22593== by 0x7BDC4A3: start_thread (pthread_create.c:456)
==22593== by 0x7EDAD0E: clone (clone.S:97)
Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.
If that doesn't help, please report this bug to: www.valgrind.org
In the bug report, send all the above text, the valgrind
version, and what OS and version you are using. Thanks.
same behavior on 0.7.3
Problem on your system, it works well here.
See, full TensorFlow:
[...]
==3876339== LEAK SUMMARY:
==3876339== definitely lost: 24 bytes in 1 blocks
==3876339== indirectly lost: 0 bytes in 0 blocks
==3876339== possibly lost: 330,996 bytes in 1,472 blocks
==3876339== still reachable: 1,631,169 bytes in 33,986 blocks
==3876339== of which reachable via heuristic:
==3876339== stdstring : 396,275 bytes in 11,298 blocks
==3876339== newarray : 4,672 bytes in 11 blocks
==3876339== suppressed: 0 bytes in 0 blocks
And TFLite:
[...]
==3878526== LEAK SUMMARY:
==3878526== definitely lost: 24 bytes in 1 blocks
==3878526== indirectly lost: 0 bytes in 0 blocks
==3878526== possibly lost: 0 bytes in 0 blocks
==3878526== still reachable: 8,143 bytes in 134 blocks
==3878526== of which reachable via heuristic:
==3878526== stdstring : 2,180 bytes in 58 blocks
==3878526== suppressed: 0 bytes in 0 blocks
I fear it might be on tensorflow side, not in our code ...
On a debug build, full TensorFlow, I see the same figures and nothing that seems to be from our code, but all from TensorFlow itself.
@khu834 It's really sad that you decided to cut the full original valgrind output, because I have no way to check whether your original report matches my observations on libdeepspeech.so, so I have to redo everything. That is much more complicated, especially since your other valgrind run crashed: I have no idea whether I'm looking at the same thing as you.
For example https://github.com/tensorflow/tensorflow/issues/29217 mentions leaking thread pools, which I confirmed as of now. Years old issue, still there, unlikely we have time to fix it ourselves ...
I've been getting this warning in the console when training with DeepSpeech 0.7.3
swig/python detected a memory leak of type 'Alphabet *', no destructor found.
It seems the valgrind check ran earlier reported an issue with TF but maybe this is also part of it?
If it's a separate issue I can open another ticket instead.
So, @khu834 can you share your original 0.7+ valgrind report? We still are in the dark on that matter.
I've been getting this warning in the console when training with DeepSpeech 0.7.3
swig/python detected a memory leak of type 'Alphabet *', no destructor found.
It seems the valgrind check ran earlier reported an issue with TF but maybe this is also part of it?
If it's a separate issue I can open another ticket instead.
This was taken care of in #3049
Here's the full report for python client, 0.7.1
https://gist.github.com/khu834/725e5ee108026b90cfdb0d77876dbd67
@khu834 We have had reports in the past of problem with conda / anaconda / miniconda. Can you try and repro with vanilla python? Also in this log I can't find any leak reference (except in the summary), only invalid read size / uninitialized values (which concerns me, hence the python stuff). Can you repro with proper CLI parameters for tracking leaks?
(also on 0.7.3 please)
Got it, will do this week.
@khu834 ping
Can you provide some details about what "proper CLI parameters for tracking leaks" means?
I'll get started on this but want to make sure I run it under the right conditions.
What is missing from what I ran before?
valgrind --tool=memcheck --suppressions=valgrind-python.supp python -E -tt mem_test.py
What is missing from what I ran before?
valgrind --tool=memcheck --suppressions=valgrind-python.supp python -E -tt mem_test.py
--25363-- Valgrind options:
--25363-- -v
--25363-- --log-file=std-unique_ptr.0.log
--25363-- --leak-check=full
--25363-- --leak-resolution=high
--25363-- --show-reachable=yes
Here's the full report, this is what I ran
valgrind --tool=memcheck --suppressions=valgrind-python.supp --leak-check=full --leak-resolution=high --show-reachable=yes python3 -E -tt mem_test.py
using vanilla python3 and DS 0.7.3
https://github.com/khu834/deepspeech_valgrind/blob/master/log
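For repeated runs, the command above can be assembled programmatically so the leak-tracking options are not forgotten. The sketch below only builds and prints the argument list, using the flags that already appear in this thread; the helper name and the log file name `valgrind.log` are hypothetical.

```python
import shlex


def valgrind_leak_command(script, suppressions="valgrind-python.supp", log_file=None):
    """Build the valgrind invocation with full leak tracking enabled."""
    cmd = [
        "valgrind",
        "--tool=memcheck",
        "--suppressions=" + suppressions,
        "--leak-check=full",        # report each leak with its stack trace
        "--leak-resolution=high",   # do not merge similar stack traces
        "--show-reachable=yes",     # also list still-reachable blocks
    ]
    if log_file is not None:
        cmd.append("--log-file=" + log_file)
    cmd += ["python3", "-E", "-tt", script]
    return cmd


cmd = valgrind_leak_command("mem_test.py", log_file="valgrind.log")
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` on a machine where valgrind and the suppression file are available.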
Thanks, I'll have a close look next week. We've reworked some parts of our CI, so hopefully we can easily add valgrind support in the future.
Thanks, it's hard to make a call without debug symbols, but the situation looks much better than initially when looking at the definitely lost and possibly lost sections.
@khu834 Even using the Python suppression list, there is still a ton of noise from Python itself, so it's hard to be sure where we are at fault here. Add to the equation the numerous "harmless" leaks from TensorFlow ...
Full-blown TensorFlow debug build under Python 3.8 (debian/sid), with your code adapted to run 10 iterations:
==91349== LEAK SUMMARY:
==91349== definitely lost: 32 bytes in 1 blocks
==91349== indirectly lost: 72 bytes in 3 blocks
==91349== possibly lost: 4,655,700 bytes in 2,305 blocks
==91349== still reachable: 2,940,649 bytes in 26,755 blocks
==91349== of which reachable via heuristic:
==91349== newarray : 6,096 bytes in 9 blocks
==91349== suppressed: 105,328 bytes in 260 blocks
Regarding the possibly lost bytes: 4210704 out of 4655700 come from just these two harmless TensorFlow thread-pool records:
==91349== 2,105,352 bytes in 1 blocks are possibly lost in loss record 2,783 of 2,784
==91349== at 0x483877F: malloc (vg_replace_malloc.c:307)
==91349== by 0x6769B81: Eigen::internal::handmade_aligned_malloc(unsigned long, unsigned long) (Memory.h:104)
==91349== by 0x93F2385: Eigen::MaxSizeVector<Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::ThreadData>::MaxSizeVector(unsigned long) (MaxSizeVector.h:38)
==91349== by 0x93F1ADF: Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::ThreadPoolTempl(int, bool, tensorflow::thread::EigenEnvironment) (NonBlockingThreadPool.h:37)
==91349== by 0x93EE248: tensorflow::thread::ThreadPool::ThreadPool(tensorflow::Env*, tensorflow::ThreadOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, bool, Eigen::Allocator*) (threadpool.cc:102)
==91349== by 0x7FD6211: tensorflow::LocalDevice::EigenThreadPoolInfo::EigenThreadPoolInfo(tensorflow::SessionOptions const&, int, tensorflow::Allocator*) (local_device.cc:94)
==91349== by 0x7FD5CA0: tensorflow::LocalDevice::LocalDevice(tensorflow::SessionOptions const&, tensorflow::DeviceAttributes const&) (local_device.cc:145)
==91349== by 0x8064BFE: tensorflow::ThreadPoolDevice::ThreadPoolDevice(tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tensorflow::gtl::IntType<tensorflow::Bytes_tag_, long long>, tensorflow::DeviceLocality const&, tensorflow::Allocator*) (threadpool_device.cc:52)
==91349== by 0x8065FA5: std::_MakeUniq<tensorflow::ThreadPoolDevice>::__single_object std::make_unique<tensorflow::ThreadPoolDevice, tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, tensorflow::gtl::IntType<tensorflow::Bytes_tag_, long long>, tensorflow::DeviceLocality, tensorflow::Allocator*>(tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, tensorflow::gtl::IntType<tensorflow::Bytes_tag_, long long>&&, tensorflow::DeviceLocality&&, tensorflow::Allocator*&&) (unique_ptr.h:857)
==91349== by 0x8065CA8: tensorflow::ThreadPoolDeviceFactory::CreateDevices(tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::unique_ptr<tensorflow::Device, std::default_delete<tensorflow::Device> >, std::allocator<std::unique_ptr<tensorflow::Device, std::default_delete<tensorflow::Device> > > >*) (threadpool_device_factory.cc:63)
==91349== by 0x7F64DC2: tensorflow::DeviceFactory::AddDevices(tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::unique_ptr<tensorflow::Device, std::default_delete<tensorflow::Device> >, std::allocator<std::unique_ptr<tensorflow::Device, std::default_delete<tensorflow::Device> > > >*) (device_factory.cc:129)
==91349== by 0x6672486: tensorflow::DirectSessionFactory::NewSession(tensorflow::SessionOptions const&, tensorflow::Session**) (direct_session.cc:188)
==91349==
==91349== 2,105,352 bytes in 1 blocks are possibly lost in loss record 2,784 of 2,784
==91349== at 0x483877F: malloc (vg_replace_malloc.c:307)
==91349== by 0x6769B81: Eigen::internal::handmade_aligned_malloc(unsigned long, unsigned long) (Memory.h:104)
==91349== by 0x93F2385: Eigen::MaxSizeVector<Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::ThreadData>::MaxSizeVector(unsigned long) (MaxSizeVector.h:38)
==91349== by 0x93F1ADF: Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::ThreadPoolTempl(int, bool, tensorflow::thread::EigenEnvironment) (NonBlockingThreadPool.h:37)
==91349== by 0x93EE248: tensorflow::thread::ThreadPool::ThreadPool(tensorflow::Env*, tensorflow::ThreadOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, bool, Eigen::Allocator*) (threadpool.cc:102)
==91349== by 0x8029738: tensorflow::NewThreadPoolFromSessionOptions(tensorflow::SessionOptions const&) (process_util.cc:164)
==91349== by 0x665D2DC: tensorflow::(anonymous namespace)::GlobalThreadPool(tensorflow::SessionOptions const&) (direct_session.cc:138)
==91349== by 0x665DB3D: tensorflow::DirectSession::DirectSession(tensorflow::SessionOptions const&, tensorflow::DeviceMgr const*, tensorflow::DirectSessionFactory*) (direct_session.cc:330)
==91349== by 0x667255D: tensorflow::DirectSessionFactory::NewSession(tensorflow::SessionOptions const&, tensorflow::Session**) (direct_session.cc:192)
==91349== by 0x80461C7: tensorflow::NewSession(tensorflow::SessionOptions const&, tensorflow::Session**) (session.cc:94)
==91349== by 0x6655BBD: TFModelState::init(char const*) (tfmodelstate.cc:55)
==91349== by 0x664CF0B: DS_CreateModel (deepspeech.cc:298)
That leaves us 444996 bytes to explain.
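As a quick check of that bookkeeping, using only the figures from the LEAK SUMMARY and the two loss records above (the variable names are just for this sketch):

```python
possibly_lost_total = 4_655_700              # "possibly lost" in the LEAK SUMMARY
threadpool_records = [2_105_352, 2_105_352]  # loss records 2,783 and 2,784

threadpool_bytes = sum(threadpool_records)
remaining = possibly_lost_total - threadpool_bytes
print(threadpool_bytes, remaining)  # prints: 4210704 444996
```

So the two thread-pool records together account for 4210704 bytes, roughly 90% of the possibly lost total.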
Another 135438 bytes seem to come from Python itself:
==91349== 135,438 bytes in 68 blocks are possibly lost in loss record 2,780 of 2,784
==91349== at 0x483877F: malloc (vg_replace_malloc.c:307)
==91349== by 0x58BF63: PyUnicode_New (in /usr/bin/python3.8)
==91349== by 0x57E689: PyUnicode_Substring (in /usr/bin/python3.8)
==91349== by 0x500952: ??? (in /usr/bin/python3.8)
==91349== by 0x567005: _PyEval_EvalFrameDefault (in /usr/bin/python3.8)
==91349== by 0x565161: _PyEval_EvalCodeWithName (in /usr/bin/python3.8)
==91349== by 0x5F05D2: _PyFunction_Vectorcall (in /usr/bin/python3.8)
==91349== by 0x566E35: _PyEval_EvalFrameDefault (in /usr/bin/python3.8)
==91349== by 0x565161: _PyEval_EvalCodeWithName (in /usr/bin/python3.8)
==91349== by 0x683C82: PyEval_EvalCode (in /usr/bin/python3.8)
==91349== by 0x5FAAAF: ??? (in /usr/bin/python3.8)
==91349== by 0x5BFE5B: ??? (in /usr/bin/python3.8)
Then a few more from Python, taking us down to 261222 bytes:
47683 ==91349== 48,336 bytes in 23 blocks are possibly lost in loss record 2,769 of 2,784
47684 ==91349== at 0x483877F: malloc (vg_replace_malloc.c:307)
47685 ==91349== by 0x5865B2: ??? (in /usr/bin/python3.8)
47686 ==91349== by 0x54BC94: ??? (in /usr/bin/python3.8)
47687 ==91349== by 0x54B2BE: ??? (in /usr/bin/python3.8)
47688 ==91349== by 0x54B8C2: ??? (in /usr/bin/python3.8)
47689 ==91349== by 0x54B411: ??? (in /usr/bin/python3.8)
47690 ==91349== by 0x54C340: ??? (in /usr/bin/python3.8)
47691 ==91349== by 0x678C3A: ??? (in /usr/bin/python3.8)
47692 ==91349== by 0x678F80: ??? (in /usr/bin/python3.8)
47693 ==91349== by 0x5BFBFB: ??? (in /usr/bin/python3.8)
47694 ==91349== by 0x56BCD6: _PyEval_EvalFrameDefault (in /usr/bin/python3.8)
47695 ==91349== by 0x565161: _PyEval_EvalCodeWithName (in /usr/bin/python3.8)
Then back to TensorFlow threadpool, so we reach 218214 bytes:
47652 ==91349== 43,008 bytes in 128 blocks are possibly lost in loss record 2,767 of 2,784
47653 ==91349== at 0x483AB65: calloc (vg_replace_malloc.c:760)
47654 ==91349== by 0x4012CE6: allocate_dtv (dl-tls.c:343)
47655 ==91349== by 0x4012CE6: _dl_allocate_tls (dl-tls.c:589)
47656 ==91349== by 0x487DB81: allocate_stack (allocatestack.c:622)
47657 ==91349== by 0x487DB81: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
47658 ==91349== by 0xBF56ED4: std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
47659 ==91349== by 0x93DCA6E: std::thread::thread<std::function<void ()>&, , void>(std::function<void ()>&) (thread:130)
47660 ==91349== by 0x93DB45F: tensorflow::(anonymous namespace)::StdThread::StdThread(tensorflow::ThreadOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>) (env.cc:60)
47661 ==91349== by 0x93DB837: tensorflow::(anonymous namespace)::PosixEnv::StartThread(tensorflow::ThreadOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>) (env.cc:109)
47662 ==91349== by 0x8F37A3F: tensorflow::EnvWrapper::StartThread(tensorflow::ThreadOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>) (env.h:400)
47663 ==91349== by 0x93F14FA: tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>) (threadpool.cc:54)
47664 ==91349== by 0x93F1D27: Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::ThreadPoolTempl(int, bool, tensorflow::thread::EigenEnvironment) (NonBlockingThreadPool.h:57)
47665 ==91349== by 0x93EE248: tensorflow::thread::ThreadPool::ThreadPool(tensorflow::Env*, tensorflow::ThreadOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, bool, Eigen::Allocator*) (threadpool.cc:102)
47666 ==91349== by 0x8029738: tensorflow::NewThreadPoolFromSessionOptions(tensorflow::SessionOptions const&) (process_util.cc:164)
Then two more threadpool records, down to 150118 bytes:
47541 ==91349== 34,048 bytes in 128 blocks are possibly lost in loss record 2,760 of 2,784
47542 ==91349== at 0x483877F: malloc (vg_replace_malloc.c:307)
47543 ==91349== by 0x6769B81: Eigen::internal::handmade_aligned_malloc(unsigned long, unsigned long) (Memory.h:104)
47544 ==91349== by 0x93F3637: Eigen::MaxSizeVector<unsigned int>::MaxSizeVector(unsigned long) (MaxSizeVector.h:38)
47545 ==91349== by 0x93F276F: void Eigen::MaxSizeVector<Eigen::MaxSizeVector<unsigned int> >::emplace_back<int>(int const&) (MaxSizeVector.h:92)
47546 ==91349== by 0x93F1C27: Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::ThreadPoolTempl(int, bool, tensorflow::thread::EigenEnvironment) (NonBlockingThreadPool.h:48)
47547 ==91349== by 0x93EE248: tensorflow::thread::ThreadPool::ThreadPool(tensorflow::Env*, tensorflow::ThreadOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, bool, Eigen::Allocator*) (threadpool.cc:102)
47548 ==91349== by 0x7FD6211: tensorflow::LocalDevice::EigenThreadPoolInfo::EigenThreadPoolInfo(tensorflow::SessionOptions const&, int, tensorflow::Allocator*) (local_device.cc:94)
47549 ==91349== by 0x7FD5CA0: tensorflow::LocalDevice::LocalDevice(tensorflow::SessionOptions const&, tensorflow::DeviceAttributes const&) (local_device.cc:145)
47550 ==91349== by 0x8064BFE: tensorflow::ThreadPoolDevice::ThreadPoolDevice(tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tensorflow::gtl::IntType<tensorflow::Bytes_tag_, long long>, tensorflow::DeviceLocality const&, tensorflow::Allocator*) (threadpool_device.cc:52)
47551 ==91349== by 0x8065FA5: std::_MakeUniq<tensorflow::ThreadPoolDevice>::__single_object std::make_unique<tensorflow::ThreadPoolDevice, tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, tensorflow::gtl::IntType<tensorflow::Bytes_tag_, long long>, tensorflow::DeviceLocality, tensorflow::Allocator*>(tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, tensorflow::gtl::IntType<tensorflow::Bytes_tag_, long long>&&, tensorflow::DeviceLocality&&, tensorflow::Allocator*&&) (unique_ptr.h:857)
47552 ==91349== by 0x8065CA8: tensorflow::ThreadPoolDeviceFactory::CreateDevices(tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::unique_ptr<tensorflow::Device, std::default_delete<tensorflow::Device> >, std::allocator<std::unique_ptr<tensorflow::Device, std::default_delete<tensorflow::Device> > > >*) (threadpool_device_factory.cc:63)
47553 ==91349== by 0x7F64DC2: tensorflow::DeviceFactory::AddDevices(tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::unique_ptr<tensorflow::Device, std::default_delete<tensorflow::Device> >, std::allocator<std::unique_ptr<tensorflow::Device, std::default_delete<tensorflow::Device> > > >*) (device_factory.cc:129)
47554 ==91349==
47555 ==91349== 34,048 bytes in 128 blocks are possibly lost in loss record 2,761 of 2,784
47556 ==91349== at 0x483877F: malloc (vg_replace_malloc.c:307)
47557 ==91349== by 0x6769B81: Eigen::internal::handmade_aligned_malloc(unsigned long, unsigned long) (Memory.h:104)
47558 ==91349== by 0x93F3637: Eigen::MaxSizeVector<unsigned int>::MaxSizeVector(unsigned long) (MaxSizeVector.h:38)
47559 ==91349== by 0x93F276F: void Eigen::MaxSizeVector<Eigen::MaxSizeVector<unsigned int> >::emplace_back<int>(int const&) (MaxSizeVector.h:92)
47560 ==91349== by 0x93F1C27: Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::ThreadPoolTempl(int, bool, tensorflow::thread::EigenEnvironment) (NonBlockingThreadPool.h:48)
47561 ==91349== by 0x93EE248: tensorflow::thread::ThreadPool::ThreadPool(tensorflow::Env*, tensorflow::ThreadOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, bool, Eigen::Allocator*) (threadpool.cc:102)
47562 ==91349== by 0x8029738: tensorflow::NewThreadPoolFromSessionOptions(tensorflow::SessionOptions const&) (process_util.cc:164)
47563 ==91349== by 0x665D2DC: tensorflow::(anonymous namespace)::GlobalThreadPool(tensorflow::SessionOptions const&) (direct_session.cc:138)
47564 ==91349== by 0x665DB3D: tensorflow::DirectSession::DirectSession(tensorflow::SessionOptions const&, tensorflow::DeviceMgr const*, tensorflow::DirectSessionFactory*) (direct_session.cc:330)
47565 ==91349== by 0x667255D: tensorflow::DirectSessionFactory::NewSession(tensorflow::SessionOptions const&, tensorflow::Session**) (direct_session.cc:192)
47566 ==91349== by 0x80461C7: tensorflow::NewSession(tensorflow::SessionOptions const&, tensorflow::Session**) (session.cc:94)
47567 ==91349== by 0x6655BBD: TFModelState::init(char const*) (tfmodelstate.cc:55)
Then TensorFlow/protobuf, so we reach 127534 bytes:
47361 ==91349== 22,584 bytes in 941 blocks are possibly lost in loss record 2,749 of 2,784
47362 ==91349== at 0x4838DEF: operator new(unsigned long) (vg_replace_malloc.c:342)
47363 ==91349== by 0x9717459: google::protobuf::RepeatedField<int>::Reserve(int) (repeated_field.h:1404)
47364 ==91349== by 0x97171F9: google::protobuf::RepeatedField<int>::MergeFrom(google::protobuf::RepeatedField<int> const&) (repeated_field.h:1273)
47365 ==91349== by 0x937E37F: tensorflow::AttrValue_ListValue::MergeFrom(tensorflow::AttrValue_ListValue const&) (attr_value.pb.cc:900)
47366 ==91349== by 0x9380CA2: tensorflow::AttrValue::MergeFrom(tensorflow::AttrValue const&) (attr_value.pb.cc:1814)
47367 ==91349== by 0x928EA46: tensorflow::KernelDef_AttrConstraint::MergeFrom(tensorflow::KernelDef_AttrConstraint const&) (kernel_def.pb.cc:492)
47368 ==91349== by 0x92940F0: google::protobuf::internal::GenericTypeHandler<tensorflow::KernelDef_AttrConstraint>::Merge(tensorflow::KernelDef_AttrConstraint const&, tensorflow::KernelDef_AttrConstraint*) (repeated_field.h:703)
47369 ==91349== by 0x9293E2B: void google::protobuf::internal::RepeatedPtrFieldBase::MergeFromInnerLoop<google::protobuf::RepeatedPtrField<tensorflow::KernelDef_AttrConstraint>::TypeHandler>(void**, void**, int, int) (repeated_field.h:1672)
47370 ==91349== by 0x7B89F4D: google::protobuf::internal::RepeatedPtrFieldBase::MergeFromInternal(google::protobuf::internal::RepeatedPtrFieldBase const&, void (google::protobuf::internal::RepeatedPtrFieldBase::*)(void**, void**, int, int)) (repeated_field.h:1642)
47371 ==91349== by 0x9293AE1: void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom<google::protobuf::RepeatedPtrField<tensorflow::KernelDef_AttrConstraint>::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&) (repeated_field.h:1630)
47372 ==91349== by 0x92933AE: google::protobuf::RepeatedPtrField<tensorflow::KernelDef_AttrConstraint>::MergeFrom(google::protobuf::RepeatedPtrField<tensorflow::KernelDef_AttrConstraint> const&) (repeated_field.h:2128)
47373 ==91349== by 0x9293267: google::protobuf::RepeatedPtrField<tensorflow::KernelDef_AttrConstraint>::RepeatedPtrField(google::protobuf::RepeatedPtrField<tensorflow::KernelDef_AttrConstraint> const&) (repeated_field.h:1932)
Then TensorFlow threadpool again (three records), so we are at 78054 bytes:
47172 ==91349== 16,456 bytes in 1 blocks are possibly lost in loss record 2,736 of 2,784
47173 ==91349== at 0x483877F: malloc (vg_replace_malloc.c:307)
47174 ==91349== by 0x6769B81: Eigen::internal::handmade_aligned_malloc(unsigned long, unsigned long) (Memory.h:104)
47175 ==91349== by 0x93F2385: Eigen::MaxSizeVector<Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::ThreadData>::MaxSizeVector(unsigned long) (MaxSizeVector.h:38)
47176 ==91349== by 0x93F1ADF: Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::ThreadPoolTempl(int, bool, tensorflow::thread::EigenEnvironment) (NonBlockingThreadPool.h:37)
47177 ==91349== by 0x93EE248: tensorflow::thread::ThreadPool::ThreadPool(tensorflow::Env*, tensorflow::ThreadOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, bool, Eigen::Allocator*) (threadpool.cc:102)
47178 ==91349== by 0x93EE063: tensorflow::thread::ThreadPool::ThreadPool(tensorflow::Env*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) (threadpool.cc:90)
47179 ==91349== by 0x804D1DE: tensorflow::(anonymous namespace)::GraphRunnerThreadPool() (single_threaded_cpu_device.cc:35)
47180 ==91349== by 0x804D39F: tensorflow::(anonymous namespace)::SingleThreadedCpuDevice::SingleThreadedCpuDevice(tensorflow::Env*) (single_threaded_cpu_device.cc:49)
47181 ==91349== by 0x804D77A: tensorflow::NewSingleThreadedCpuDevice(tensorflow::Env*) (single_threaded_cpu_device.cc:97)
47182 ==91349== by 0x7FC8E00: tensorflow::GraphRunner::GraphRunner(tensorflow::Env*) (graph_runner.cc:96)
47183 ==91349== by 0x80B241E: tensorflow::ShapeRefiner::ShapeRefiner(int, tensorflow::OpRegistryInterface const*) (shape_refiner.cc:46)
47184 ==91349== by 0x80C6E83: tensorflow::ConvertGraphDefToGraph(tensorflow::GraphConstructorOptions const&, tensorflow::GraphDef const&, tensorflow::Graph*) (graph_constructor.cc:1455)
47185 ==91349==
47186 ==91349== 16,512 bytes in 1 blocks are possibly lost in loss record 2,737 of 2,784
47187 ==91349== at 0x483877F: malloc (vg_replace_malloc.c:307)
47188 ==91349== by 0x6769B81: Eigen::internal::handmade_aligned_malloc(unsigned long, unsigned long) (Memory.h:104)
47189 ==91349== by 0x93F250F: Eigen::MaxSizeVector<Eigen::EventCount::Waiter>::MaxSizeVector(unsigned long) (MaxSizeVector.h:38)
47190 ==91349== by 0x93F1B12: Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::ThreadPoolTempl(int, bool, tensorflow::thread::EigenEnvironment) (NonBlockingThreadPool.h:37)
47191 ==91349== by 0x93EE248: tensorflow::thread::ThreadPool::ThreadPool(tensorflow::Env*, tensorflow::ThreadOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, bool, Eigen::Allocator*) (threadpool.cc:102)
47192 ==91349== by 0x7FD6211: tensorflow::LocalDevice::EigenThreadPoolInfo::EigenThreadPoolInfo(tensorflow::SessionOptions const&, int, tensorflow::Allocator*) (local_device.cc:94)
47193 ==91349== by 0x7FD5CA0: tensorflow::LocalDevice::LocalDevice(tensorflow::SessionOptions const&, tensorflow::DeviceAttributes const&) (local_device.cc:145)
47194 ==91349== by 0x8064BFE: tensorflow::ThreadPoolDevice::ThreadPoolDevice(tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tensorflow::gtl::IntType<tensorflow::Bytes_tag_, long long>, tensorflow::DeviceLocality const&, tensorflow::Allocator*) (threadpool_device.cc:52)
47195 ==91349== by 0x8065FA5: std::_MakeUniq<tensorflow::ThreadPoolDevice>::__single_object std::make_unique<tensorflow::ThreadPoolDevice, tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, tensorflow::gtl::IntType<tensorflow::Bytes_tag_, long long>, tensorflow::DeviceLocality, tensorflow::Allocator*>(tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, tensorflow::gtl::IntType<tensorflow::Bytes_tag_, long long>&&, tensorflow::DeviceLocality&&, tensorflow::Allocator*&&) (unique_ptr.h:857)
47196 ==91349== by 0x8065CA8: tensorflow::ThreadPoolDeviceFactory::CreateDevices(tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::unique_ptr<tensorflow::Device, std::default_delete<tensorflow::Device> >, std::allocator<std::unique_ptr<tensorflow::Device, std::default_delete<tensorflow::Device> > > >*) (threadpool_device_factory.cc:63)
47197 ==91349== by 0x7F64DC2: tensorflow::DeviceFactory::AddDevices(tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::unique_ptr<tensorflow::Device, std::default_delete<tensorflow::Device> >, std::allocator<std::unique_ptr<tensorflow::Device, std::default_delete<tensorflow::Device> > > >*) (device_factory.cc:129)
47198 ==91349== by 0x6672486: tensorflow::DirectSessionFactory::NewSession(tensorflow::SessionOptions const&, tensorflow::Session**) (direct_session.cc:188)
47199 ==91349==
47200 ==91349== 16,512 bytes in 1 blocks are possibly lost in loss record 2,738 of 2,784
47201 ==91349== at 0x483877F: malloc (vg_replace_malloc.c:307)
47202 ==91349== by 0x6769B81: Eigen::internal::handmade_aligned_malloc(unsigned long, unsigned long) (Memory.h:104)
47203 ==91349== by 0x93F250F: Eigen::MaxSizeVector<Eigen::EventCount::Waiter>::MaxSizeVector(unsigned long) (MaxSizeVector.h:38)
47204 ==91349== by 0x93F1B12: Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::ThreadPoolTempl(int, bool, tensorflow::thread::EigenEnvironment) (NonBlockingThreadPool.h:37)
47205 ==91349== by 0x93EE248: tensorflow::thread::ThreadPool::ThreadPool(tensorflow::Env*, tensorflow::ThreadOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, bool, Eigen::Allocator*) (threadpool.cc:102)
47206 ==91349== by 0x8029738: tensorflow::NewThreadPoolFromSessionOptions(tensorflow::SessionOptions const&) (process_util.cc:164)
47207 ==91349== by 0x665D2DC: tensorflow::(anonymous namespace)::GlobalThreadPool(tensorflow::SessionOptions const&) (direct_session.cc:138)
47208 ==91349== by 0x665DB3D: tensorflow::DirectSession::DirectSession(tensorflow::SessionOptions const&, tensorflow::DeviceMgr const*, tensorflow::DirectSessionFactory*) (direct_session.cc:330)
47209 ==91349== by 0x667255D: tensorflow::DirectSessionFactory::NewSession(tensorflow::SessionOptions const&, tensorflow::Session**) (direct_session.cc:192)
47210 ==91349== by 0x80461C7: tensorflow::NewSession(tensorflow::SessionOptions const&, tensorflow::Session**) (session.cc:94)
47211 ==91349== by 0x6655BBD: TFModelState::init(char const*) (tfmodelstate.cc:55)
47212 ==91349== by 0x664CF0B: DS_CreateModel (deepspeech.cc:298)
Then TensorFlow/protobuf again, down to 69070 bytes:
46977 ==91349== 8,984 bytes in 303 blocks are possibly lost in loss record 2,723 of 2,784
46978 ==91349== at 0x4838DEF: operator new(unsigned long) (vg_replace_malloc.c:342)
46979 ==91349== by 0x9717459: google::protobuf::RepeatedField<int>::Reserve(int) (repeated_field.h:1404)
46980 ==91349== by 0x9716D34: google::protobuf::RepeatedField<int>::Add(int const&) (repeated_field.h:1227)
46981 ==91349== by 0x7C9177D: tensorflow::AttrValue_ListValue::add_type(tensorflow::DataType) (attr_value.pb.h:1029)
46982 ==91349== by 0x8F6B8B9: tensorflow::(anonymous namespace)::FinalizeAttr(absl::string_view, tensorflow::OpDef*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*) (op_def_builder.cc:221)
46983 ==91349== by 0x8F6EDF5: tensorflow::OpDefBuilder::Finalize(tensorflow::OpRegistrationData*) const (op_def_builder.cc:642)
46984 ==91349== by 0x8F625A3: tensorflow::register_op::OpDefBuilderReceiver::OpDefBuilderReceiver(tensorflow::register_op::OpDefBuilderWrapper<true> const&)::{lambda(tensorflow::OpRegistrationData*)#1}::operator()(tensorflow::OpRegistrationData*) const (op.cc:299)
46985 ==91349== by 0x8F627EE: std::_Function_handler<tensorflow::Status (tensorflow::OpRegistrationData*), tensorflow::register_op::OpDefBuilderReceiver::OpDefBuilderReceiver(tensorflow::register_op::OpDefBuilderWrapper<true> const&)::{lambda(tensorflow::OpRegistrationData*)#1}>::_M_invoke(std::_Any_data const&, tensorflow::OpRegistrationData*&&) (std_function.h:286)
46986 ==91349== by 0x8F63AC2: std::function<tensorflow::Status (tensorflow::OpRegistrationData*)>::operator()(tensorflow::OpRegistrationData*) const (std_function.h:688)
46987 ==91349== by 0x8F61F43: tensorflow::OpRegistry::RegisterAlreadyLocked(std::function<tensorflow::Status (tensorflow::OpRegistrationData*)> const&) const (op.cc:236)
46988 ==91349== by 0x8F61CE0: tensorflow::OpRegistry::MustCallDeferred() const (op.cc:214)
46989 ==91349== by 0x8F610CF: tensorflow::OpRegistry::LookUpSlow(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const (op.cc:111)
Another protobuf record, so we are at 65150 bytes:
46449 ==91349== 3,920 bytes in 98 blocks are possibly lost in loss record 2,687 of 2,784
46450 ==91349== at 0x4838DEF: operator new(unsigned long) (vg_replace_malloc.c:342)
46451 ==91349== by 0x9717459: google::protobuf::RepeatedField<int>::Reserve(int) (repeated_field.h:1404)
46452 ==91349== by 0x9716D34: google::protobuf::RepeatedField<int>::Add(int const&) (repeated_field.h:1227)
46453 ==91349== by 0x7C9177D: tensorflow::AttrValue_ListValue::add_type(tensorflow::DataType) (attr_value.pb.h:1029)
46454 ==91349== by 0x8F6A930: tensorflow::(anonymous namespace)::ProcessCompoundType(absl::string_view, tensorflow::AttrValue*) (op_def_builder.cc:135)
46455 ==91349== by 0x8F6AF3D: tensorflow::(anonymous namespace)::FinalizeAttr(absl::string_view, tensorflow::OpDef*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*) (op_def_builder.cc:181)
46456 ==91349== by 0x8F6EDF5: tensorflow::OpDefBuilder::Finalize(tensorflow::OpRegistrationData*) const (op_def_builder.cc:642)
46457 ==91349== by 0x8F625A3: tensorflow::register_op::OpDefBuilderReceiver::OpDefBuilderReceiver(tensorflow::register_op::OpDefBuilderWrapper<true> const&)::{lambda(tensorflow::OpRegistrationData*)#1}::operator()(tensorflow::OpRegistrationData*) const (op.cc:299)
46458 ==91349== by 0x8F627EE: std::_Function_handler<tensorflow::Status (tensorflow::OpRegistrationData*), tensorflow::register_op::OpDefBuilderReceiver::OpDefBuilderReceiver(tensorflow::register_op::OpDefBuilderWrapper<true> const&)::{lambda(tensorflow::OpRegistrationData*)#1}>::_M_invoke(std::_Any_data const&, tensorflow::OpRegistrationData*&&) (std_function.h:286)
46459 ==91349== by 0x8F63AC2: std::function<tensorflow::Status (tensorflow::OpRegistrationData*)>::operator()(tensorflow::OpRegistrationData*) const (std_function.h:688)
46460 ==91349== by 0x8F61F43: tensorflow::OpRegistry::RegisterAlreadyLocked(std::function<tensorflow::Status (tensorflow::OpRegistrationData*)> const&) const (op.cc:236)
46461 ==91349== by 0x8F61CE0: tensorflow::OpRegistry::MustCallDeferred() const (op.cc:214)
And again, down to 62022 bytes:
46318 ==91349== 3,128 bytes in 23 blocks are possibly lost in loss record 2,678 of 2,784
46319 ==91349== at 0x4838DEF: operator new(unsigned long) (vg_replace_malloc.c:342)
46320 ==91349== by 0x9717459: google::protobuf::RepeatedField<int>::Reserve(int) (repeated_field.h:1404)
46321 ==91349== by 0x9716D34: google::protobuf::RepeatedField<int>::Add(int const&) (repeated_field.h:1227)
46322 ==91349== by 0x7C9177D: tensorflow::AttrValue_ListValue::add_type(tensorflow::DataType) (attr_value.pb.h:1029)
46323 ==91349== by 0x8F6A875: tensorflow::(anonymous namespace)::ProcessCompoundType(absl::string_view, tensorflow::AttrValue*) (op_def_builder.cc:131)
46324 ==91349== by 0x8F6AF3D: tensorflow::(anonymous namespace)::FinalizeAttr(absl::string_view, tensorflow::OpDef*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*) (op_def_builder.cc:181)
46325 ==91349== by 0x8F6EDF5: tensorflow::OpDefBuilder::Finalize(tensorflow::OpRegistrationData*) const (op_def_builder.cc:642)
46326 ==91349== by 0x8F625A3: tensorflow::register_op::OpDefBuilderReceiver::OpDefBuilderReceiver(tensorflow::register_op::OpDefBuilderWrapper<true> const&)::{lambda(tensorflow::OpRegistrationData*)#1}::operator()(tensorflow::OpRegistrationData*) const (op.cc:299)
46327 ==91349== by 0x8F627EE: std::_Function_handler<tensorflow::Status (tensorflow::OpRegistrationData*), tensorflow::register_op::OpDefBuilderReceiver::OpDefBuilderReceiver(tensorflow::register_op::OpDefBuilderWrapper<true> const&)::{lambda(tensorflow::OpRegistrationData*)#1}>::_M_invoke(std::_Any_data const&, tensorflow::OpRegistrationData*&&) (std_function.h:286)
46328 ==91349== by 0x8F63AC2: std::function<tensorflow::Status (tensorflow::OpRegistrationData*)>::operator()(tensorflow::OpRegistrationData*) const (std_function.h:688)
46329 ==91349== by 0x8F61F43: tensorflow::OpRegistry::RegisterAlreadyLocked(std::function<tensorflow::Status (tensorflow::OpRegistrationData*)> const&) const (op.cc:236)
46330 ==91349== by 0x8F61CE0: tensorflow::OpRegistry::MustCallDeferred() const (op.cc:214)
Then back to the threadpool (two records), down to 55862 bytes:
46290 ==91349== 3,080 bytes in 1 blocks are possibly lost in loss record 2,676 of 2,784
46291 ==91349== at 0x483877F: malloc (vg_replace_malloc.c:307)
46292 ==91349== by 0x6769B81: Eigen::internal::handmade_aligned_malloc(unsigned long, unsigned long) (Memory.h:104)
46293 ==91349== by 0x93F2466: Eigen::MaxSizeVector<Eigen::MaxSizeVector<unsigned int> >::MaxSizeVector(unsigned long) (MaxSizeVector.h:38)
46294 ==91349== by 0x93F1AF7: Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::ThreadPoolTempl(int, bool, tensorflow::thread::EigenEnvironment) (NonBlockingThreadPool.h:37)
46295 ==91349== by 0x93EE248: tensorflow::thread::ThreadPool::ThreadPool(tensorflow::Env*, tensorflow::ThreadOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, bool, Eigen::Allocator*) (threadpool.cc:102)
46296 ==91349== by 0x7FD6211: tensorflow::LocalDevice::EigenThreadPoolInfo::EigenThreadPoolInfo(tensorflow::SessionOptions const&, int, tensorflow::Allocator*) (local_device.cc:94)
46297 ==91349== by 0x7FD5CA0: tensorflow::LocalDevice::LocalDevice(tensorflow::SessionOptions const&, tensorflow::DeviceAttributes const&) (local_device.cc:145)
46298 ==91349== by 0x8064BFE: tensorflow::ThreadPoolDevice::ThreadPoolDevice(tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tensorflow::gtl::IntType<tensorflow::Bytes_tag_, long long>, tensorflow::DeviceLocality const&, tensorflow::Allocator*) (threadpool_device.cc:52)
46299 ==91349== by 0x8065FA5: std::_MakeUniq<tensorflow::ThreadPoolDevice>::__single_object std::make_unique<tensorflow::ThreadPoolDevice, tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, tensorflow::gtl::IntType<tensorflow::Bytes_tag_, long long>, tensorflow::DeviceLocality, tensorflow::Allocator*>(tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, tensorflow::gtl::IntType<tensorflow::Bytes_tag_, long long>&&, tensorflow::DeviceLocality&&, tensorflow::Allocator*&&) (unique_ptr.h:857)
46300 ==91349== by 0x8065CA8: tensorflow::ThreadPoolDeviceFactory::CreateDevices(tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::unique_ptr<tensorflow::Device, std::default_delete<tensorflow::Device> >, std::allocator<std::unique_ptr<tensorflow::Device, std::default_delete<tensorflow::Device> > > >*) (threadpool_device_factory.cc:63)
46301 ==91349== by 0x7F64DC2: tensorflow::DeviceFactory::AddDevices(tensorflow::SessionOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::unique_ptr<tensorflow::Device, std::default_delete<tensorflow::Device> >, std::allocator<std::unique_ptr<tensorflow::Device, std::default_delete<tensorflow::Device> > > >*) (device_factory.cc:129)
46302 ==91349== by 0x6672486: tensorflow::DirectSessionFactory::NewSession(tensorflow::SessionOptions const&, tensorflow::Session**) (direct_session.cc:188)
46303 ==91349==
46304 ==91349== 3,080 bytes in 1 blocks are possibly lost in loss record 2,677 of 2,784
46305 ==91349== at 0x483877F: malloc (vg_replace_malloc.c:307)
46306 ==91349== by 0x6769B81: Eigen::internal::handmade_aligned_malloc(unsigned long, unsigned long) (Memory.h:104)
46307 ==91349== by 0x93F2466: Eigen::MaxSizeVector<Eigen::MaxSizeVector<unsigned int> >::MaxSizeVector(unsigned long) (MaxSizeVector.h:38)
46308 ==91349== by 0x93F1AF7: Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::ThreadPoolTempl(int, bool, tensorflow::thread::EigenEnvironment) (NonBlockingThreadPool.h:37)
46309 ==91349== by 0x93EE248: tensorflow::thread::ThreadPool::ThreadPool(tensorflow::Env*, tensorflow::ThreadOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, bool, Eigen::Allocator*) (threadpool.cc:102)
46310 ==91349== by 0x8029738: tensorflow::NewThreadPoolFromSessionOptions(tensorflow::SessionOptions const&) (process_util.cc:164)
46311 ==91349== by 0x665D2DC: tensorflow::(anonymous namespace)::GlobalThreadPool(tensorflow::SessionOptions const&) (direct_session.cc:138)
46312 ==91349== by 0x665DB3D: tensorflow::DirectSession::DirectSession(tensorflow::SessionOptions const&, tensorflow::DeviceMgr const*, tensorflow::DirectSessionFactory*) (direct_session.cc:330)
46313 ==91349== by 0x667255D: tensorflow::DirectSessionFactory::NewSession(tensorflow::SessionOptions const&, tensorflow::Session**) (direct_session.cc:192)
46314 ==91349== by 0x80461C7: tensorflow::NewSession(tensorflow::SessionOptions const&, tensorflow::Session**) (session.cc:94)
46315 ==91349== by 0x6655BBD: TFModelState::init(char const*) (tfmodelstate.cc:55)
46316 ==91349== by 0x664CF0B: DS_CreateModel (deepspeech.cc:298)
etc.
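For anyone who wants to re-check the running totals quoted above, here is a minimal sketch of the bookkeeping. The sizes are copied from the "possibly lost" record headers in the excerpts; the labels and the `remaining_after` helper are just illustrative names, not part of any tool.

```python
# Sizes (bytes) of the "possibly lost" records quoted in the excerpts, in order.
# The labels are informal descriptions, not valgrind output.
EXPLAINED = [
    ("python (PyUnicode_New)", 135_438),
    ("python (unresolved frames)", 48_336),
    ("threadpool (pthread TLS)", 43_008),
    ("threadpool (MaxSizeVector)", 34_048),
    ("threadpool (MaxSizeVector)", 34_048),
    ("protobuf (MergeFrom)", 22_584),
    ("threadpool (GraphRunner)", 16_456),
    ("threadpool (Waiter)", 16_512),
    ("threadpool (Waiter)", 16_512),
    ("protobuf (FinalizeAttr)", 8_984),
    ("protobuf (ProcessCompoundType)", 3_920),
    ("protobuf (ProcessCompoundType)", 3_128),
    ("threadpool (MaxSizeVector)", 3_080),
    ("threadpool (MaxSizeVector)", 3_080),
]


def remaining_after(start, records):
    """Return [(label, bytes_left)] after subtracting each record in turn."""
    left = start
    steps = []
    for label, size in records:
        left -= size
        steps.append((label, left))
    return steps


if __name__ == "__main__":
    for label, left in remaining_after(444_996, EXPLAINED):
        print(f"{label:32s} -> {left} bytes left to explain")
```

The last value printed should match the 55862 bytes reached in the final excerpt.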
Running valgrind on full-blown tensorflow runtime from C++, hacking native_client/client.cc to loop 5 times over model creation / deletion:
$ LD_LIBRARY_PATH=$(pwd)/tensorflow/bazel-bin/native_client/ valgrind --tool=memcheck --leak-check=full --leak-resolution=high --show-reachable=yes --track-origins=yes ./native_client/deepspeech
[...]
==97431== 22,584 bytes in 941 blocks are possibly lost in loss record 2,570 of 2,581
==97431== at 0x4838DEF: operator new(unsigned long) (vg_replace_malloc.c:342)
==97431== by 0x8A68459: google::protobuf::RepeatedField<int>::Reserve(int) (repeated_field.h:1404)
==97431== by 0x8A681F9: google::protobuf::RepeatedField<int>::MergeFrom(google::protobuf::RepeatedField<int> const&) (repeated_field.h:1273)
==97431== by 0x86CF37F: tensorflow::AttrValue_ListValue::MergeFrom(tensorflow::AttrValue_ListValue const&) (attr_value.pb.cc:900)
==97431== by 0x86D1CA2: tensorflow::AttrValue::MergeFrom(tensorflow::AttrValue const&) (attr_value.pb.cc:1814)
==97431== by 0x85DFA46: tensorflow::KernelDef_AttrConstraint::MergeFrom(tensorflow::KernelDef_AttrConstraint const&) (kernel_def.pb.cc:492)
==97431== by 0x85E50F0: google::protobuf::internal::GenericTypeHandler<tensorflow::KernelDef_AttrConstraint>::Merge(tensorflow::KernelDef_AttrConstraint const&, tensorflow::KernelDef_AttrConstraint*) (repeated_field.h:703)
==97431== by 0x85E4E2B: void google::protobuf::internal::RepeatedPtrFieldBase::MergeFromInnerLoop<google::protobuf::RepeatedPtrField<tensorflow::KernelDef_AttrConstraint>::TypeHandler>(void**, void**, int, int) (repeated_field.h:1672)
==97431== by 0x6EDAF4D: google::protobuf::internal::RepeatedPtrFieldBase::MergeFromInternal(google::protobuf::internal::RepeatedPtrFieldBase const&, void (google::protobuf::internal::RepeatedPtrFieldBase::*)(void**, void**, int, int)) (repeated_field.h:1642)
==97431== by 0x85E4AE1: void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom<google::protobuf::RepeatedPtrField<tensorflow::KernelDef_AttrConstraint>::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&) (repeated_field.h:1630)
==97431== by 0x85E43AE: google::protobuf::RepeatedPtrField<tensorflow::KernelDef_AttrConstraint>::MergeFrom(google::protobuf::RepeatedPtrField<tensorflow::KernelDef_AttrConstraint> const&) (repeated_field.h:2128)
==97431== by 0x85E4267: google::protobuf::RepeatedPtrField<tensorflow::KernelDef_AttrConstraint>::RepeatedPtrField(google::protobuf::RepeatedPtrField<tensorflow::KernelDef_AttrConstraint> const&) (repeated_field.h:1932)
[...]
==97431== LEAK SUMMARY:
==97431== definitely lost: 0 bytes in 0 blocks
==97431== indirectly lost: 0 bytes in 0 blocks
==97431== possibly lost: 22,584 bytes in 941 blocks
==97431== still reachable: 1,042,295 bytes in 14,319 blocks
==97431== of which reachable via heuristic:
==97431== newarray : 3,416 bytes in 4 blocks
==97431== suppressed: 0 bytes in 0 blocks
Running valgrind on the TFLite runtime from C++, with native_client/client.cc hacked to loop 5 times over model creation and deletion:
$ LD_LIBRARY_PATH=$(pwd)/tensorflow/bazel-bin/native_client/ valgrind --tool=memcheck --leak-check=full --leak-resolution=high --show-reachable=yes --track-origins=yes ./native_client/deepspeech
[...]
==100766== LEAK SUMMARY:
==100766== definitely lost: 0 bytes in 0 blocks
==100766== indirectly lost: 0 bytes in 0 blocks
==100766== possibly lost: 0 bytes in 0 blocks
==100766== still reachable: 5,457 bytes in 97 blocks
==100766== suppressed: 0 bytes in 0 blocks
Python/TFLite:
==103994== LEAK SUMMARY:
==103994== definitely lost: 0 bytes in 0 blocks
==103994== indirectly lost: 0 bytes in 0 blocks
==103994== possibly lost: 187,760 bytes in 97 blocks
==103994== still reachable: 1,362,814 bytes in 983 blocks
==103994== suppressed: 64 bytes in 2 blocks
And none of the possibly lost blocks is related to TensorFlow or DeepSpeech; it is all malloc from Python itself.
@khu834 On current master, using your repro steps, I really cannot find anything that convinces me there is an actionable leak on our side.
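For anyone who wants to repeat the create/delete loop from Python without valgrind, here is a minimal sketch that watches peak RSS instead. It is not the script used for the runs above. `measure_growth` and `make_object` are hypothetical names, the `bytearray` workload is only a stand-in so the snippet runs without DeepSpeech installed, and the `resource` module is Unix-only.

```python
import gc
import resource
import sys


def peak_rss_kib():
    """Return the peak resident set size of this process in KiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return peak // 1024 if sys.platform == "darwin" else peak


def measure_growth(make_object, iterations=100, warmup=10):
    """Create and drop an object repeatedly; return peak-RSS growth in KiB.

    make_object is a hypothetical zero-argument factory. In the scenario
    from this thread it would construct a DeepSpeech model.
    """
    # Warm up first so one-time allocations do not count as growth.
    for _ in range(warmup):
        obj = make_object()
        del obj
    gc.collect()
    baseline = peak_rss_kib()

    for _ in range(iterations):
        obj = make_object()
        del obj
        gc.collect()
    return peak_rss_kib() - baseline


if __name__ == "__main__":
    # Stand-in workload (assumption): replace with the real model factory.
    growth = measure_growth(lambda: bytearray(1024 * 1024))
    print(f"peak RSS growth after 100 iterations: {growth} KiB")
```

Growth that keeps scaling with `iterations` would point to a real leak. A small one-off increase is more likely allocator behaviour, which matches what the Python/TFLite summary above suggests.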
Here are my logs (python.zip), if you want to cross-check on your side.
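To make that cross-check easier, something like the following could pull the LEAK SUMMARY numbers out of a valgrind log so two runs can be compared side by side. This is only a sketch: `parse_leak_summary` is a hypothetical helper, and the regular expression assumes the memcheck summary format shown in this thread.

```python
import re

# Matches lines such as "==103994== possibly lost: 187,760 bytes in 97 blocks".
SUMMARY_LINE = re.compile(
    r"==\d+==\s+(definitely lost|indirectly lost|possibly lost|"
    r"still reachable|suppressed):\s+([\d,]+) bytes in ([\d,]+) blocks"
)


def parse_leak_summary(log_text):
    """Return {category: (bytes, blocks)} from a memcheck LEAK SUMMARY.

    If the log contains several summaries, later ones overwrite earlier ones.
    """
    summary = {}
    for category, nbytes, nblocks in SUMMARY_LINE.findall(log_text):
        summary[category] = (
            int(nbytes.replace(",", "")),
            int(nblocks.replace(",", "")),
        )
    return summary


# The Python/TFLite summary posted earlier in this thread.
log = """\
==103994== LEAK SUMMARY:
==103994== definitely lost: 0 bytes in 0 blocks
==103994== indirectly lost: 0 bytes in 0 blocks
==103994== possibly lost: 187,760 bytes in 97 blocks
==103994== still reachable: 1,362,814 bytes in 983 blocks
==103994== suppressed: 64 bytes in 2 blocks
"""

summary = parse_leak_summary(log)
print(summary["possibly lost"])  # (187760, 97)
```

Per-record lines such as "22,584 bytes in 941 blocks are possibly lost in loss record ..." are deliberately not matched, so only the summary totals are compared.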
Thanks @lissyx for the detailed analysis.
Looks like 0.7.3 is pretty clean, and it is just some allocations in Python that make it seem like there is a leak.
I think we can close this issue.
For my use case, things should improve dramatically when I switch to 0.7.3.
Thanks, we plan to have valgrind coverage, but this still requires some CI work before it is usable (especially generating usable suppressions). In the meantime, feedback is always welcome.