import pandas as pd
import numpy as np
import skimage
from scipy import signal

for orient in [0, 1]:
    th = int(input_img.shape[orient] / 100)
    peaks, info = signal.find_peaks(1 - bw_img.mean(orient), prominence=.35, width=2)
    for pk, w in zip(peaks, info['widths']):
        w *= 2
        if orient == 0:
            sign = bw_img[:, pk]
        else:
            sign = bw_img[pk, :]
        sign = pd.Series(sign).rolling(th).max()
The snippet above is part of a function called from my main script. Running it results in either a segmentation fault or this error: malloc: Incorrect checksum for freed object 0x7fbf626f1f30: probably modified after being freed.
The culprit appears to be the rolling().max() line, since commenting out the line fixes the issue, as does replacing .max() with .mean().
I can't seem to recreate the error running the above snippet alone, and I cannot figure out why. The input (bw_img) is just a 2D array (black and white image).
It might be related to this issue https://github.com/pandas-dev/pandas/issues/25893 except my memory doesn't seem to be leaking. The two variants I keep seeing are a failed checksum after deallocated memory was modified, or an attempted modification of deallocated memory being caught.
python version: 3.6.5 (also tested on 3.7.0)
pandas version 0.25.3 (also tested 0.24 and 0.23)
The stack trace is below:
Process: python3.6 [61410]
Path: /Users/USER/*/python3.6
Identifier: python3.6
Version: ???
Code Type: X86-64 (Native)
Parent Process: zsh [41537]
Responsible: python3.6 [61410]
User ID: 305159407
Date/Time: 2020-01-06 09:43:30.365 +0100
OS Version: Mac OS X 10.14.3 (18D109)
Report Version: 12
Bridge OS Version: 3.0 (14Y674)
Anonymous UUID: 842CB73B-82E5-7A43-1D47-0BCD9BFB56A9
Time Awake Since Boot: 5500 seconds
System Integrity Protection: enabled
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Application Specific Information:
abort() called
python(61410,0x1134fe5c0) malloc: Incorrect checksum for freed object 0x7f8c83801610: probably modified after being freed.
Corrupt value: 0x28
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x00007fff5b5fb23e __pthread_kill + 10
1 libsystem_pthread.dylib 0x00007fff5b6b1c1c pthread_kill + 285
2 libsystem_c.dylib 0x00007fff5b5641c9 abort + 127
3 libsystem_malloc.dylib 0x00007fff5b6736e2 malloc_vreport + 545
4 libsystem_malloc.dylib 0x00007fff5b68786c malloc_zone_error + 184
5 libsystem_malloc.dylib 0x00007fff5b670103 tiny_free_list_remove_ptr + 544
6 libsystem_malloc.dylib 0x00007fff5b66daee tiny_free_no_lock + 933
7 libsystem_malloc.dylib 0x00007fff5b66d631 free_tiny + 483
8 _multiarray_umath.cpython-36m-darwin.so 0x0000000109b3357d _buffer_clear_info + 109
9 _multiarray_umath.cpython-36m-darwin.so 0x0000000109b334ff _dealloc_cached_buffer_info + 79
10 _multiarray_umath.cpython-36m-darwin.so 0x0000000109ae5152 array_dealloc + 18
11 window.cpython-36m-darwin.so 0x000000012c56a8a6 __pyx_fuse_9__pyx_f_6pandas_5_libs_6window__roll_min_max(tagPyArrayObject_fields*, long, long, _object*, _object*, int) + 3206
12 window.cpython-36m-darwin.so 0x000000012c569619 __pyx_fuse_9__pyx_pw_6pandas_5_libs_6window_59roll_max(_object*, _object*, _object*) + 425
13 algos.cpython-36m-darwin.so 0x000000012aa1a42c __pyx_FusedFunction_call + 812
14 python 0x0000000109410ae5 PyObject_Call + 101
15 python 0x00000001094eea1b _PyEval_EvalFrameDefault + 25787
16 python 0x00000001094f2356 _PyEval_EvalCodeWithName + 2902
17 python 0x00000001094f2aeb fast_function + 411
18 python 0x00000001094f1729 call_function + 553
19 python 0x00000001094ee694 _PyEval_EvalFrameDefault + 24884
20 python 0x00000001094f2356 _PyEval_EvalCodeWithName + 2902
21 python 0x00000001094f2aeb fast_function + 411
22 python 0x00000001094f1729 call_function + 553
23 python 0x00000001094ee608 _PyEval_EvalFrameDefault + 24744
24 python 0x00000001094f2356 _PyEval_EvalCodeWithName + 2902
25 python 0x00000001094f2ede _PyFunction_FastCallDict + 606
26 python 0x0000000109410cba _PyObject_FastCallDict + 202
27 python 0x0000000109410e6c _PyObject_Call_Prepend + 156
28 python 0x0000000109410ae5 PyObject_Call + 101
29 python 0x00000001094eea1b _PyEval_EvalFrameDefault + 25787
30 python 0x00000001094f2356 _PyEval_EvalCodeWithName + 2902
31 python 0x00000001094f2ede _PyFunction_FastCallDict + 606
32 python 0x0000000109410cba _PyObject_FastCallDict + 202
33 python 0x0000000109410e6c _PyObject_Call_Prepend + 156
34 python 0x0000000109410ae5 PyObject_Call + 101
35 python 0x00000001094eea1b _PyEval_EvalFrameDefault + 25787
36 python 0x00000001094f2356 _PyEval_EvalCodeWithName + 2902
37 python 0x00000001094f2aeb fast_function + 411
38 python 0x00000001094f1729 call_function + 553
39 python 0x00000001094ee608 _PyEval_EvalFrameDefault + 24744
40 python 0x00000001094f2356 _PyEval_EvalCodeWithName + 2902
41 python 0x00000001094f2aeb fast_function + 411
42 python 0x00000001094f1729 call_function + 553
43 python 0x00000001094ee694 _PyEval_EvalFrameDefault + 24884
44 python 0x00000001094f2356 _PyEval_EvalCodeWithName + 2902
45 python 0x00000001094f2aeb fast_function + 411
46 python 0x00000001094f1729 call_function + 553
47 python 0x00000001094ee608 _PyEval_EvalFrameDefault + 24744
48 python 0x00000001094f2356 _PyEval_EvalCodeWithName + 2902
49 python 0x00000001094e84d0 PyEval_EvalCode + 48
50 python 0x000000010952156e PyRun_FileExFlags + 174
51 python 0x0000000109520b1a PyRun_SimpleFileExFlags + 266
52 python 0x000000010953d8b6 Py_Main + 3542
53 python 0x0000000109405c78 main + 248
54 libdyld.dylib 0x00007fff5b4bbed9 start + 1
Thread 1:
0 libsystem_kernel.dylib 0x00007fff5b5f87de __psynch_cvwait + 10
1 libsystem_pthread.dylib 0x00007fff5b6b2593 _pthread_cond_wait + 724
2 python 0x0000000109539f1f PyThread_acquire_lock_timed + 351
3 python 0x000000010954099f acquire_timed + 111
4 python 0x000000010954071c lock_PyThread_acquire_lock + 44
5 python 0x000000010945fbfb _PyCFunction_FastCallDict + 475
6 python 0x00000001094f175a call_function + 602
7 python 0x00000001094ee608 _PyEval_EvalFrameDefault + 24744
8 python 0x00000001094f2356 _PyEval_EvalCodeWithName + 2902
9 python 0x00000001094f2aeb fast_function + 411
10 python 0x00000001094f1729 call_function + 553
11 python 0x00000001094ee608 _PyEval_EvalFrameDefault + 24744
12 python 0x00000001094f2356 _PyEval_EvalCodeWithName + 2902
13 python 0x00000001094f2aeb fast_function + 411
14 python 0x00000001094f1729 call_function + 553
15 python 0x00000001094ee608 _PyEval_EvalFrameDefault + 24744
16 python 0x00000001094f2b89 fast_function + 569
17 python 0x00000001094f1729 call_function + 553
18 python 0x00000001094ee608 _PyEval_EvalFrameDefault + 24744
19 python 0x00000001094f2b89 fast_function + 569
20 python 0x00000001094f1729 call_function + 553
21 python 0x00000001094ee608 _PyEval_EvalFrameDefault + 24744
22 python 0x00000001094f3069 _PyFunction_FastCallDict + 1001
23 python 0x0000000109410cba _PyObject_FastCallDict + 202
24 python 0x0000000109410e6c _PyObject_Call_Prepend + 156
25 python 0x0000000109410ae5 PyObject_Call + 101
26 python 0x0000000109541216 t_bootstrap + 70
27 libsystem_pthread.dylib 0x00007fff5b6af305 _pthread_body + 126
28 libsystem_pthread.dylib 0x00007fff5b6b226f _pthread_start + 70
29 libsystem_pthread.dylib 0x00007fff5b6ae415 thread_start + 13
Thread 2:
0 libsystem_pthread.dylib 0x00007fff5b6ae3f8 start_wqthread + 0
1 ??? 0x0000000054485244 0 + 1414025796
Thread 3:
0 libsystem_pthread.dylib 0x00007fff5b6ae3f8 start_wqthread + 0
1 ??? 0x0000000054485244 0 + 1414025796
Thread 4:
0 libsystem_pthread.dylib 0x00007fff5b6ae3f8 start_wqthread + 0
1 ??? 0x0000000054485244 0 + 1414025796
Thread 5:
0 libsystem_pthread.dylib 0x00007fff5b6ae3f8 start_wqthread + 0
1 ??? 0x0000000054485244 0 + 1414025796
Thread 6:
0 libsystem_pthread.dylib 0x00007fff5b6ae3f8 start_wqthread + 0
1 ??? 0x0000000054485244 0 + 1414025796
Thread 7:
0 libsystem_pthread.dylib 0x00007fff5b6ae3f8 start_wqthread + 0
1 ??? 0x0000000054485244 0 + 1414025796
Thread 8:
0 libsystem_pthread.dylib 0x00007fff5b6ae3f8 start_wqthread + 0
1 ??? 0x0000000054485244 0 + 1414025796
Thread 0 crashed with X86 Thread State (64-bit):
rax: 0x0000000000000000 rbx: 0x00000001134fe5c0 rcx: 0x00007ffee67f8168 rdx: 0x0000000000000000
rdi: 0x0000000000000307 rsi: 0x0000000000000006 rbp: 0x00007ffee67f81a0 rsp: 0x00007ffee67f8168
r8: 0x0000000000000000 r9: 0x00007ffee67f80c0 r10: 0x0000000000000000 r11: 0x0000000000000206
r12: 0x0000000000000307 r13: 0x0000000111e4d000 r14: 0x0000000000000006 r15: 0x000000000000002d
rip: 0x00007fff5b5fb23e rfl: 0x0000000000000206 cr2: 0x00007fff8e27a188
Logical CPU: 0
Error Code: 0x02000148
Trap Number: 133
VM Region Summary:
ReadOnly portion of Libraries: Total=708.7M resident=0K(0%) swapped_out_or_unallocated=708.7M(100%)
Writable regions: Total=483.8M written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=483.8M(100%)
VIRTUAL REGION
REGION TYPE SIZE COUNT (non-coalesced)
=========== ======= =======
Activity Tracing 256K 2
Dispatch continuations 16.0M 2
Kernel Alloc Once 8K 2
MALLOC 170.5M 33
MALLOC guard page 16K 5
MALLOC_LARGE (reserved) 256K 3 reserved VM address space (unallocated)
STACK GUARD 36K 10
Stack 24.6M 10
VM_ALLOCATE 102.3M 174
VM_ALLOCATE (reserved) 160.0M 4 reserved VM address space (unallocated)
__DATA 42.7M 669
__FONT_DATA 4K 2
__LINKEDIT 253.2M 312
__TEXT 455.5M 557
__UNICODE 564K 2
shared memory 12K 4
=========== ======= =======
TOTAL 1.2G 1775
TOTAL, minus reserved VM space 1.0G 1775
In order to reproduce this, could you try to save the arguments passed to the last line (sign and th) to a file, and then try to run it only with these specific arguments?
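A minimal sketch of what saving and replaying those arguments could look like. The helper names `dump_rolling_args` and `replay_rolling_max` and the file name are made up for illustration, and the small array stands in for the real `sign` and `th` from the failing run. Saving with NumPy keeps the dtype of `sign`, which may matter if the crash depends on it.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def dump_rolling_args(sign, th, directory):
    """Save the inputs of the rolling call so they can be replayed later."""
    path = os.path.join(directory, "rolling_args.npz")
    np.savez(path, sign=np.asarray(sign), th=np.int64(th))
    return path


def replay_rolling_max(path):
    """Reload the saved inputs and run only the suspect call."""
    with np.load(path) as saved:
        sign = saved["sign"]
        th = int(saved["th"])
    return pd.Series(sign).rolling(th).max()


# Stand-in data; in the real script these would be the actual `sign` and `th`.
example_sign = np.array([0, 1, 0, 0, 1, 1, 0, 0], dtype=np.uint8)
example_th = 3

with tempfile.TemporaryDirectory() as tmp:
    saved_path = dump_rolling_args(example_sign, example_th, tmp)
    replayed = replay_rolling_max(saved_path)
```

In the real script the dump call would go right before the `rolling(th).max()` line, so the last file written before a crash holds the inputs of the failing call.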
Hi,
Sorry to get back to this so late.
I did try to run the function with only the arguments that seemed to make it fail. However the error didn't occur then. Also, when running the script on all input files, sometimes it would fail on the first image, sometimes on the third. I couldn't figure out which specific inputs it was that created the error.
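Since the failure moves between inputs, one option is to let the standard-library `faulthandler` module print the Python traceback when the process aborts, and to log each input before it is processed. This is only a sketch: `process_all` and `process_one` are hypothetical names for the loop in the main script and the per-image function.

```python
import faulthandler
import sys


def enable_crash_tracebacks():
    """Ask Python to dump tracebacks on SIGSEGV, SIGABRT and similar signals.

    Returns False if stderr is not a real file (for example under some
    test runners), in which case nothing is enabled.
    """
    try:
        faulthandler.enable(all_threads=True)
    except (OSError, ValueError, RuntimeError, AttributeError):
        return False
    return True


def process_all(paths, process_one):
    """Call process_one on each path, logging the path first."""
    results = []
    for path in paths:
        # Flush so the message is not lost in a buffer if the process aborts.
        print(f"processing {path}", file=sys.stderr, flush=True)
        results.append(process_one(path))
    return results


enable_crash_tracebacks()
```

The same handler can be switched on without code changes by running `python -X faulthandler` or setting the `PYTHONFAULTHANDLER` environment variable. With heap corruption the abort can happen later than the write that caused it, so the reported line is a hint rather than proof.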
I have encountered the same issue with a time series with shape (1963583, 1). I wanted to compute hour-long windows for a month of data with a resolution of 1 second. (This may sound like a bad idea, and it is, but since rolling() does not offer a stride argument (see Issue #15354) the 'official' way is to compute 99.9% useless windows and throw the ones you don't need away).
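For the stride use case, a possible way to avoid computing the discarded windows is to bypass `rolling()` and reduce a strided NumPy view instead. This sketch assumes NumPy 1.20 or newer (for `sliding_window_view`) and a 1-D array; `strided_rolling_min` is a made-up helper name, and its NaN handling may not match what pandas does.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def strided_rolling_min(values, window, stride):
    """Minimum over full windows, keeping only every `stride`-th window."""
    values = np.asarray(values)
    # View of shape (len(values) - window + 1, window); no data is copied.
    windows = sliding_window_view(values, window)
    # Slicing the view first means only the kept windows are reduced.
    return windows[::stride].min(axis=1)


data = np.array([5, 3, 8, 1, 9, 2, 7, 4, 6, 0])
result = strided_rolling_min(data, window=4, stride=3)
```

Here the kept windows start at positions 0, 3 and 6, which corresponds to taking every third row of `rolling(4).min()` starting from the first complete window.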
Running df.rolling(3600).min() 50 times filled up the RAM bit by bit until the IPython kernel crashed or gave a MemoryError. df.rolling(3600).median() was no problem, memory usage stayed the same which suggests it's not a problem with sorting the window data.
I tried adding the arguments min(raw=True, engine='numba') to change the underlying implementation but it still crashed.
I hope this is enough info to reproduce the error, but I guess it is very dependent on the system after all.
pandas: 1.0.3
python: 3.7.7
ipython: 7.13.0
I can reproduce this behaviour with this code snippet:
import numpy as np
import pandas as pd
import psutil

df = pd.DataFrame(np.random.randn(int(1e7), 1))
for i in range(10):
    print(f"{i}: Memory usage: {psutil.virtual_memory()[2]}%")
    df.rolling(3600).<operation>()
When using median or mean as operation, memory usage stayed constant:
0: Memory usage: 39.3%
1: Memory usage: 39.0%
2: Memory usage: 39.0%
3: Memory usage: 39.0%
4: Memory usage: 39.1%
5: Memory usage: 39.1%
6: Memory usage: 39.1%
7: Memory usage: 39.1%
8: Memory usage: 39.1%
9: Memory usage: 39.1%
With min or max, memory usage grows considerably:
0: Memory usage: 39.1%
1: Memory usage: 41.7%
2: Memory usage: 44.4%
3: Memory usage: 47.0%
4: Memory usage: 49.6%
5: Memory usage: 52.2%
6: Memory usage: 54.9%
7: Memory usage: 57.5%
8: Memory usage: 60.1%
9: Memory usage: 62.8%
Output of pd.show_versions():
INSTALLED VERSIONS
------------------
commit : None
python : 3.8.2.final.0
python-bits : 64
OS : Linux
OS-release : 5.6.2-1-default
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.3
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.0.0
Cython : 0.29.15
pytest : 5.4.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.2.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : None
Edit:
I also tried to use a frequency as rolling window argument, like this:
import numpy as np
import pandas as pd
import psutil

index = pd.date_range('2020-01-01 00:00:00', '2020-04-01 00:00:00', freq='S')
data = np.random.randn(len(index), 1)
df = pd.DataFrame(data, index)
for i in range(5):
    print(f"{i}: Memory usage: {psutil.virtual_memory()[2]}%")
    df.rolling('H').min()
In this case the memory does not grow, so using 'H' instead of 3600 would be a quick fix for you, @magratheaner.
0: Memory usage: 38.3%
1: Memory usage: 38.4%
2: Memory usage: 38.4%
3: Memory usage: 38.4%
4: Memory usage: 38.4%
This suggests that the issue is somewhere in _roll_min_max_fixed in pandas/_libs/window/aggregations.pyx.
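One caveat when swapping an integer window for a time offset: the two do not have the same default `min_periods`. Offset windows default to `min_periods=1`, while integer windows default to the window size, so the first rows differ. The sketch below uses a small 1-second index and a 10-second window as stand-ins for the real data, and the lowercase `'s'` alias because newer pandas versions deprecate the uppercase ones.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
index = pd.date_range("2020-01-01", periods=200, freq="s")
df = pd.DataFrame(rng.standard_normal((200, 1)), index=index)

# Offset window: covers (t - 10s, t], i.e. 10 samples at 1-second spacing.
by_offset = df.rolling("10s").min()

# Integer window with min_periods=1 should give the same values.
by_count = df.rolling(10, min_periods=1).min()

# Integer window with defaults leaves the first 9 rows as NaN.
by_count_default = df.rolling(10).min()

same = by_offset.equals(by_count)
leading_nans = int(by_count_default.iloc[:, 0].isna().sum())
```

If the leading NaN rows of the integer-window result matter downstream, passing `min_periods` explicitly to the offset version is one way to keep the output comparable.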
@s-scherrer Perfect, thank you. 'H' seems to be working nicely
Hi there,
I see you have a fix for this, but this issue has been added to the 1.1 milestone, which is due on 1 August. Does that mean it won't be fixed until the 1.1 release in August? If the fix will ship sooner, why was the issue added to the 1.1 milestone?
Do you have a priority or severity label? Why is this issue (and some similar ones; there are also duplicates where other people reported the same thing, see https://github.com/pandas-dev/pandas/issues/32266 for example) not marked as a high-priority or urgent bug? Do you realize that many apps all around the world have been crashing since the release of pandas 1.0 because many libraries use rolling.min/max? I wonder how this is managed...