Pillow is embedded in the Odoo stack. I was trying to load a data file which referenced a PNG when a segfault suddenly terminated my container.
The processing of the PNG itself would complete successfully.
With DEBUG logging enabled, we can see the following output.
odoo | 2016-05-30 08:29:20,832 7 INFO test openerp.modules.loading: loading base/base_data.xml
odoo | 2016-05-30 08:29:21,050 7 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo | 2016-05-30 08:29:21,052 7 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo | 2016-05-30 08:29:21,052 7 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo | 2016-05-30 08:29:21,053 7 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo | 2016-05-30 08:29:21,053 7 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo | 2016-05-30 08:29:21,054 7 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo | 2016-05-30 08:29:21,054 7 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo | 2016-05-30 08:29:21,087 7 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo | 2016-05-30 08:29:21,088 7 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo | 2016-05-30 08:29:21,088 7 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo | 2016-05-30 08:29:21,088 7 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo | 2016-05-30 08:29:21,093 7 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo | 2016-05-30 08:29:21,094 7 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo | 2016-05-30 08:29:21,094 7 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo | 2016-05-30 08:29:21,275 7 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo | 2016-05-30 08:29:21,276 7 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo | 2016-05-30 08:29:21,276 7 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo | 2016-05-30 08:29:21,276 7 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo | 2016-05-30 08:29:21,277 7 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo | 2016-05-30 08:29:21,277 7 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo | 2016-05-30 08:29:21,278 7 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo | 2016-05-30 08:29:21,338 7 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo | 2016-05-30 08:29:21,338 7 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo | 2016-05-30 08:29:21,339 7 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo | 2016-05-30 08:29:21,339 7 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo | 2016-05-30 08:29:21,339 7 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo | 2016-05-30 08:29:21,340 7 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo | 2016-05-30 08:29:21,340 7 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo | 2016-05-30 08:29:21,384 7 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo | 2016-05-30 08:29:21,385 7 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo | 2016-05-30 08:29:21,385 7 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo | 2016-05-30 08:29:21,385 7 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo | 2016-05-30 08:29:21,386 7 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo | 2016-05-30 08:29:21,386 7 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo | 2016-05-30 08:29:21,386 7 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo | 2016-05-30 08:29:21,563 7 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo | 2016-05-30 08:29:21,563 7 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo | 2016-05-30 08:29:21,564 7 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo | 2016-05-30 08:29:21,564 7 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo | 2016-05-30 08:29:21,564 7 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo | 2016-05-30 08:29:21,564 7 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo | 2016-05-30 08:29:21,565 7 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo | Segmentation fault
Pillow 3.2.0 (and probably back through 2.7.0)
Python 2.7.11
Alpine Linux 3.3
After restarting the container, we can see loading resume with the following output:
odoo | 2016-05-30 08:34:52,329 6 INFO test openerp.modules.loading: loading base/base_data.xml
odoo | 2016-05-30 08:34:52,618 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo | 2016-05-30 08:34:52,619 6 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo | 2016-05-30 08:34:52,621 6 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo | 2016-05-30 08:34:52,621 6 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo | 2016-05-30 08:34:52,622 6 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo | 2016-05-30 08:34:52,623 6 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo | 2016-05-30 08:34:52,624 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo | 2016-05-30 08:34:52,659 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo | 2016-05-30 08:34:52,661 6 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo | 2016-05-30 08:34:52,662 6 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo | 2016-05-30 08:34:52,665 6 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo | 2016-05-30 08:34:52,670 6 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo | 2016-05-30 08:34:52,674 6 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo | 2016-05-30 08:34:52,675 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo | 2016-05-30 08:34:52,857 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo | 2016-05-30 08:34:52,857 6 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo | 2016-05-30 08:34:52,857 6 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo | 2016-05-30 08:34:52,858 6 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo | 2016-05-30 08:34:52,858 6 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo | 2016-05-30 08:34:52,866 6 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo | 2016-05-30 08:34:52,867 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo | 2016-05-30 08:34:52,937 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo | 2016-05-30 08:34:52,938 6 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo | 2016-05-30 08:34:52,940 6 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo | 2016-05-30 08:34:52,945 6 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo | 2016-05-30 08:34:52,946 6 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo | 2016-05-30 08:34:52,946 6 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo | 2016-05-30 08:34:52,947 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo | 2016-05-30 08:34:52,984 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo | 2016-05-30 08:34:52,985 6 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo | 2016-05-30 08:34:52,986 6 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo | 2016-05-30 08:34:52,991 6 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo | 2016-05-30 08:34:52,991 6 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo | 2016-05-30 08:34:52,992 6 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo | 2016-05-30 08:34:52,992 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo | 2016-05-30 08:34:53,242 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo | 2016-05-30 08:34:53,242 6 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo | 2016-05-30 08:34:53,243 6 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo | 2016-05-30 08:34:53,245 6 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo | 2016-05-30 08:34:53,246 6 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo | 2016-05-30 08:34:53,247 6 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo | 2016-05-30 08:34:53,247 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo | 2016-05-30 08:34:53,652 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo | 2016-05-30 08:34:53,653 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 41 3136
odoo | 2016-05-30 08:34:53,680 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo | 2016-05-30 08:34:53,681 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 41 3136
odoo | 2016-05-30 08:34:53,836 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo | 2016-05-30 08:34:53,836 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 41 3136
odoo | 2016-05-30 08:34:53,858 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo | 2016-05-30 08:34:53,858 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 41 3136
What were you doing with the Pillow library, and what file were you using?
Hi Eric, I'm still trying to narrow the problem down, as I'm really not even sure whether this is the right repo down the stack... I would very much appreciate any hints you can give me to help hunt this down...


Both seem to fail.
On a normal Ubuntu install it works well; however, Alpine is something of a standard for Docker security thanks to PaX/grsecurity...
Segfaults are actually pretty easy to pin down, more or less.
1) Run it under gdb. e.g.:
gdb python
...
r my_test_.py
[things go boom]
bt
[big long stack trace where things went wrong]
Fedora is the best out of the box for gdb, although there are macros you can install for other Linux distributions. Those can give you a Python-level backtrace from within gdb, as well as the ability to inspect Python objects. The unified view is really useful.
If you can, it helps to run a version of Python with debug symbols, compiled with optimizations off. (On Ubuntu, that means installing python-dbg and compiling and running Pillow against it.)
2) Once you have a backtrace of any form, you'll at least be able to trace down which C extension it's coming from. Hopefully you can narrow it down and figure out which image is causing the problem; then we'll have a very simple test case to dive into and figure out what's going wrong.
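Another way to get a Python-level trace out of a crashing process, not mentioned in this thread but worth knowing, is the faulthandler module: it is in the stdlib from Python 3.3, and for the Python 2.7 used here it exists as a pip backport of the same name. A minimal sketch:

```python
# Sketch: dump the Python traceback of every thread when a fatal signal
# (SIGSEGV, SIGFPE, SIGABRT, ...) kills the process.  On Python 2.7 this
# needs the third-party "faulthandler" backport; on 3.3+ it is stdlib.
import faulthandler
import sys

# Write tracebacks to stderr when a fatal signal arrives
faulthandler.enable(file=sys.stderr)
assert faulthandler.is_enabled()

# A periodic dump can also catch hangs (here: every 60s; cancelled again
# immediately so this sketch has no side effects)
faulthandler.dump_traceback_later(60, repeat=True)
faulthandler.cancel_dump_traceback_later()
```

This gets you a Python-level trace even when gdb has no symbols, though it cannot show the C frames where the corruption actually happened.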
Hm, I haven't yet been able to get a Python with debug symbols; I think running on Alpine Linux is the relevant factor here.
So far, the gdb output is not very useful:
2016-05-30 10:58:16,468 6 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
2016-05-30 10:58:16,468 6 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
2016-05-30 10:58:16,470 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
During startup program terminated with signal SIGSEGV, Segmentation fault.
(gdb)
(gdb) bt
No stack.
(gdb)
That's odd. If you're interpreting a png, then you should be well through the startup.
Try strace? It's going to have a ton of output, but you should be able to tell what tried to execute and failed there.
Neither...
2016-05-30 10:58:16,470 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
During startup program terminated with signal SIGSEGV, Segmentation fault.
(gdb)
(gdb) bt
No stack.
(gdb) strace
No default breakpoint address now.
Can this be related to PaX ?
https://en.wikibooks.org/wiki/Grsecurity/Appendix/PaX_Flags
eg.: https://pax.grsecurity.net/docs/mprotect.txt
I've seen paxctl -cm used to disable mprotect on some Node.js Docker images in connection with segfaults...
I already tried paxctl -cm on some libraries (all of PIL/*.so and /lib/zlib.so) to no avail.
There's nothing in Pillow that would need executable memory maps, and at any rate, png decoding would not trigger memory mapping in our code. That's not to say that somewhere deep inside python the image file isn't getting memory mapped in, but it's not something that we request. (and even at that, it wouldn't require execute privs).
To narrow this down, you could build just enough of a docker image to run pillow, and attempt to open and load the images in question. If they succeed, then you at least know that it's somewhere else. And if they fail, it's going to be a smaller test case.
In fact I had already done so, but I couldn't really draw a conclusion...
FROM alpine
USER root
# Part of my stack, no idea if maybe relevant.
RUN set -x \
&& apk add --no-cache su-exec \
&& chmod 500 $(which su-exec)
RUN apk --no-cache add python \
build-base \
python-dev \
py-pip \
# Pillow image libraries
jpeg-dev \
zlib-dev
RUN LIBRARY_PATH=/lib:/usr/lib /bin/sh -c "pip install Pillow==3.2.0"
# --- User Creation ---
RUN addgroup -S -g 9001 otheruser && \
adduser -S -u 9001 -G otheruser -h /home -s /sbin/nologin otheruser && \
chown -R otheruser:otheruser /home
RUN apk --no-cache add gdb
ADD https://raw.githubusercontent.com/odoo/odoo/d2f0dba64c7afb3a9de49e7c3d46f74542f8bbbd/openerp/addons/base/static/img/main_partner-image.png /testimage.png
ADD https://gist.githubusercontent.com/blaggacao/aee86f3d8687efefb8186c9ffc19ac9b/raw/04b14f2407294c563e4b646162c754f0bc8fb00d/run.py /run.py
RUN echo "#!/bin/sh" >> /running && \
echo "gdb -ex r --args python -v run.py" >> /running
RUN chmod +x /running
CMD ["/running"]
With this for running:
https://gist.githubusercontent.com/blaggacao/aee86f3d8687efefb8186c9ffc19ac9b/raw/04b14f2407294c563e4b646162c754f0bc8fb00d/run.py
I'll sleep on this... If you are interested in this error, maybe you can quickly build/run this Dockerfile to see if you can find something out of it...
Headache: something broke this test case again :smile_cat:
I'm not sure what's up with that test script, especially the round trip through base64.
But, both of these test scripts work, without crashing:
from PIL import Image
im = Image.open('testimage.png')
im.load()
and:
import cStringIO as StringIO
from PIL import Image

with open('/testimage.png', 'rb') as f:
    data = f.read()

image_stream = StringIO.StringIO(data)
image = Image.open(image_stream)
image.load()
The load() at the end is important, since Pillow does lazy loading of the image: open only hits the Python layer and gets the image metadata. If you don't actually access the image pixels, then they never load and you never hit the C layer for decompression.
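That laziness is easy to demonstrate with an in-memory PNG (a sketch; io.BytesIO stands in for a file on disk):

```python
# Demonstrate Pillow's lazy loading: open() only parses headers,
# load() is what actually runs the C-level decoder.
from PIL import Image
import io

buf = io.BytesIO()
Image.new("RGB", (8, 8), (255, 0, 0)).save(buf, format="PNG")
buf.seek(0)

im = Image.open(buf)        # header parse: size/mode known, no pixels decoded
print(im.size, im.mode)     # (8, 8) RGB
im.load()                   # pixel data is decoded here, in the C extension
print(im.getpixel((0, 0)))  # (255, 0, 0)
```

So a crash that only occurs in a larger application can still be in the decode path, as long as that application is the first thing to actually touch the pixels.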
So based on this, I'd say that either you're doing something else with the image that's causing issues, or it's failing in an entirely different place.
The roundtrip was because I tried to mimic the original code. Well, thanks a lot for your help! Indeed, it also works on my Docker for Windows beta. So at least I can rule out that variable as a cause for now...
The base64 should be a noop, and if it's not, then that's going to be a completely separate problem.
In general though, Pillow shouldn't segfault, and I'll try to track them down given some sort of reproducible test case.
True. OK, thanks again. I'll keep on digging...
Finally, with valgrind, I was able to get at least some error message:
https://gist.github.com/blaggacao/1df4f9dffb0e515aaf6decc6e67ff2d4#file-valgrind-log-L2354
Based on the logs, it seems to be something "deep inside python". Can anything be said from a Pillow point of view, such as "definitely not related to Pillow", "related but not responsible", or something alike?
Effectively it seems to be triggered by the PaX flags, and probably not by simple image loading alone, as that passed without problem in the test. I still couldn't figure out how to get a useful stack trace in order to see which code is calling, but on a best-guess basis: the program also implements image manipulation at some point: https://github.com/odoo/odoo/blob/9.0/openerp/tools/image.py#L10
The trace and activity from valgrind make it look like it's a gc pass that's segfaulting. It's possible that it's related to the PaX flags, but the underlying cause of that would be a mismatched incref/decref in an extension, i.e. a use-after-free.
The other thing I noticed is that the error comes at least 3 seconds after the logging about PNGs, so it's unlikely that that is the direct cause. It would fit with a gc issue, but that just means the crash is far removed from the source of the problem.
:dizzy: I think this is closable, thank you for your help! I'm trying to get help from the Alpine Linux maintainers now...
This gets ever stranger: on the test docker we made, the first image passes, and a second one does too.
NOW, in the actual code the second image fails, effectively near the Pillow interaction. I'll try to narrow this down further and eventually reopen...
I don't know exactly what I did; probably something out of (or a combination of):
ulimit -c unlimited
/proc/sys/kernel/core_pattern
although no core dump file seems to be written to the new location.
EDIT: I double checked, and it's definitely the privileged flag which made the interactive gdb session work. It seems gdb needs some kind of kernel capabilities; not that far-fetched a catch... The ulimit -c unlimited in conjunction with the core_pattern would then be responsible for generating the dump file, which is not created when the process is properly caught by gdb.
END EDIT
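For reference, the two knobs mentioned above can be set like this (the core_pattern write is a host-wide kernel setting and needs root, so it is shown commented out here):

```shell
# Allow processes started from this shell to write core files
ulimit -c unlimited
ulimit -c                 # reports "unlimited" if the hard limit allows it

# Where the kernel writes cores (host-wide; needs root; %e = executable
# name, %p = pid):
#   echo '/tmp/core.%e.%p' | sudo tee /proc/sys/kernel/core_pattern
```

Inside a container, the core_pattern of the host kernel applies, which is one reason the dump file can fail to appear where you expect it.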
But, I now got a backtrace for the situation, maybe you can have a look at it?
https://gist.github.com/blaggacao/c081658dea0e7b95cb006a1b1673786f
Without knowing about coredump analysis at all, I would guess, the last item is the culprit (musl), but yeah, things are never that easy :smile:
Do you have an actual core file? Or a docker file that can replicate this?
The backtrace looks like pure python, it's a bunch of frames and PyObject_Call. The bit about musl looks like the entry point for python, so I think that's a red herring.
(Aside: I'm not seeing how to get symbols for the apks, nor are breakpoints working, nor the Python tweaks to gdb that would let me see the Python objects. Alpine seems to be something of a step backwards in terms of debuggability.)
I managed to dump a core file off the Docker VM: https://filebin.net/9svhknfet7ofut0j
I can draft a Dockerfile which should be able to replicate it; however, it would be our full-blown deployment stack. Thanks for your help!
I found a list of differences between musl libc and glibc; it might be of interest, though maybe not directly related: http://wiki.musl-libc.org/wiki/Functional_differences_from_glibc#Name_Resolver_.2F_DNS
I wasn't able to get much more out of the core dump, since I can't find any symbols packages for the python from the .apk. I'm pretty sure that symbols are required for the py-bt command.
I'm not sure what to tell you at this point, other than to try to narrow down what causes it. That may require switching to a python build that has symbols, or running under strace (which is blocked by something) or really verbose logging to narrow down the trigger.
OK, I'm really grateful for your help, as I have never touched these topics before and don't have many real options due to gaps in my knowledge. I've tried to build a Python with symbols, yet objdump -t tells me "no symbols found", so I'm confused again...
I've tried it this way, I'll try to check back with the alpine community.
# # --- Build Debug Liraries for GDB Debugging ---
# ENV CFLAGS='-Wall -O0 -g'
# RUN apk add --update alpine-sdk \
# && adduser -G abuild -g "Alpine Package Builder" -s /bin/ash -D builder \
# && echo "builder ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers \
# && mkdir /packages \
# && chown builder:abuild /packages \
# && git clone --depth 1 git://dev.alpinelinux.org/aports \
# && abuild-keygen -a -i \
# && cd aports/main/python
# USER builder
# RUN cp -r aports/main/python /packages/python \
# && cd /packages/python \
# && abuild-keygen -a -i \
# && abuild checksum \
# && abuild -r \
# && abuild -i python \
# && abuild -i python-dev
# USER root
The symbols might be in a central location like /usr/lib/debug or /usr/share/gdb.
I think I'm lucky: http://pkgs.alpinelinux.org/package/edge/main/x86_64/python-dbg I will make another attempt with this.
Well, the steps are pretty small ones; I'm constantly running into this, and Google doesn't say a useful word about it...
Starting program: /usr/lib/debug/usr/bin/python2.7.debug
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
Warning:
Cannot insert breakpoint -2.
Cannot access memory at address 0x65d
Here we can see how the same program behaves on Ubuntu:
2016-06-06 06:09:45,065 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM pHYs 59 9
2016-06-06 06:09:45,066 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM tIME 80 7
2016-06-06 06:09:45,066 8 DEBUG testubuntu PIL.PngImagePlugin: tIME 80 7 (unknown)
2016-06-06 06:09:45,066 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IDAT 99 7809
2016-06-06 06:09:45,343 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IHDR 16 13
2016-06-06 06:09:45,344 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IDAT 41 3136
2016-06-06 06:09:45,358 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IHDR 16 13
2016-06-06 06:09:45,359 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IDAT 41 3136
2016-06-06 06:09:45,488 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IHDR 16 13
2016-06-06 06:09:45,489 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IDAT 41 3136
2016-06-06 06:09:45,502 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IHDR 16 13
2016-06-06 06:09:45,503 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IDAT 41 3136
2016-06-06 06:09:45,634 8 INFO testubuntu openerp.modules.loading: loading base/res/
It has exactly the same DEBUG messages, so it is happening AFTER the image is loaded... (You probably knew that already :D)
Have you tried with something like https://hub.docker.com/r/frolvlad/alpine-glibc/?
cc @pedrobaeza
I just tested on alpine-glibc & it works fine. As such, I do not believe that this falls within the scope of the Alpine maintainers, and it does belong here. This is most certainly an issue with compiling Pillow against musl.
So what do you suggest?
I'm going to play around with some compile options today & see if I can maybe circumvent this. I do feel that this issue should be reopened though, or we should create a new one here.
As an update, (obviously) no luck on randomly tweaking compile options. The next step is to create an isolated test case outside of Odoo that can replicate this issue.
@blaggacao - Where exactly did you see this issue? I'm trying to find a common denominator between the image load failing in my section of code and in yours, other than Odoo itself & the fact that the failure is in Pillow.
@lasley What code are you seeing fail? FWIW -- our test suite is now (as of this week) running against alpine/musl, and it's succeeding. e.g.: https://travis-ci.org/python-pillow/Pillow/builds/196550625
The docker image is pythonpillow/alpine, and it's defined here: https://github.com/python-pillow/docker-images/tree/master/alpine
@wiredfool - I'm working on pulling that out right now. All of us experiencing this issue are deep in the core of an ERP, so isolating the case is like pulling a needle out of a haystack. So far the haystack is narrowed down to Pillow calls, but we could still be dealing with something lower in the lib stack.
In my case, the segfault is occurring during dynamic creation of a few images that look to be in PNG format. The code is dense though, so I'm trying to find more examples in more dense code to maybe find the common point.
Once I am able to do that, I'll get us a fail case in the existing tests. Am I correctly understanding that the Docker images are just running the Tests folder in Pillow?
Yep, the docker images are running the test suite.
@lasley From https://github.com/python-pillow/Pillow/issues/1935#issuecomment-222529469 on, things got (a bit) clearer. I'd be happy to get back to this topic with some help, as back then I couldn't crack it... My main impediment was that I could not get gdb to work on Alpine, because back then there were no debugging symbols; the next beta version already showed debugging symbols, but we were far away from switching, so I dropped it. It might be worthwhile having a thorough look at the above analysis path...
@blaggacao - can you point me to the code that was causing this issue for you? I'm trying to narrow down variables.
You are correct that debug works in Alpine now, but IMO the first step is an isolated test case that consistently fails.
It was (!) at that time happening for those images https://github.com/python-pillow/Pillow/issues/1935#issuecomment-222448769 in the base module, and here is the line from the debug logs above that indicates this:
INFO test openerp.modules.loading: loading base/base_data.xml
So your install was segfaulting immediately upon db creation with demo data?
Correct.
In my case, when I debugged step by step, I could not reproduce it!
Yeah, I'm having similar difficulties. It seems that some gears have shifted somewhere low down too, because we were both able to get past the DB creation that @blaggacao could not. We're both working on Alpine 3.5, and @blaggacao was on 3.4.
So looking back at the valgrind trace I'm seeing that there are at least two other shared libraries that are being called - psycopg and lxml. I wonder if they're possibly the culprit, either alone or in combination with pillow loading images?
If you've actually got a reproducible case, it may be time to dump a bunch of logging into the image class, either by wrapping it in a facade object or modifying it directly.
The case that I can reproduce is not very straightforward now that I moved my image to Debian, but I will summarize to make it clear:
docker build --tag tecnativa/odoo-base:10.0 .
docker-compose -f setup-devel.yaml up
chown -R $USER:$USER . (I'll have to work on getting rid of this; for now you'll need it.)
docker-compose -f devel.yaml up
Install the account_accountant addon on it.
Then you'll see on the terminal that Odoo died with exit code 139.
I narrowed it down to know that removing this line makes it not die:
<span t-field="object.country_id.image" t-options='{"widget": "image", "class": "country_flag"}'/>
This is why I suspected the culprit was Pillow (all other database/XML operations worked just fine).
You can debug by adding import wdb;wdb.set_trace() to any Python file or <t t-debug="wdb"/> to any Qweb (XML) file and visiting http://localhost:1984. Keep in mind that if you debug XML files, you'll hit https://github.com/Kozea/wdb/issues/92, which makes it even funnier :rofl:
Sorry, this is such a hard-to-reproduce case, but that's all I got :disappointed_relieved:
Can you explain step 8?
This is kind of where it gets deep into the ERP. Once you go to the page in step 7, there's a "Create Database" button. For step 9, there's an "Apps" section that installs the culprit.
My replication steps are the same as @Yajo's, and I too am having trouble isolating how to bring the fail case out of the monolith.
I also got it working by removing the same line, so I dug a bit further into the code & I'm almost positive that the piece of code actually causing this issue is odoo.tools.image_resize_image.
Naturally I cannot replicate it by simply running the code, which is both intriguing and maddening. There are a lot of conditionals, though, and I definitely haven't tested all the edges yet.
So, this appears to be the (or at least, a) request that's killing it on my machine, saved as a curl request:
curl 'http://localhost:10069/web/dataset/call_kw/web.planner/render' -H 'Accept: application/json, text/javascript, */*; q=0.01' -H 'Referer: http://localhost:10069/web' -H 'Origin: http://localhost:10069' -H 'X-Requested-With: XMLHttpRequest' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.51 Safari/537.36' -H 'Content-Type: application/json' --data-binary '{"jsonrpc":"2.0","method":"call","params":{"model":"web.planner","method":"render","args":[441,"planner_account"],"kwargs":{"context":{"lang":"en_US","tz":false,"uid":1}}},"id":843142426}' --compressed
(I'm doing this by replaying the XHR, but curl works too.) The click on the Accounting menu makes two successful calls before this, version_info and load_needaction, then there's load_views which fails on the initial call, but replaying works, and then this.
After getting the reproduction, I've injected logging into Pillow using this script, which is imported prior to odoo in odoo-bin:
#!/usr/bin/env python
from PIL import Image, ImageFile, ImageEnhance, ImageDraw, JpegImagePlugin
from functools import wraps
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

def logger_wrap(f, fname):
    @wraps(f)
    def wrapper(*args, **kwargs):
        logger.debug("Enter %s" % fname)
        try:
            return f(*args, **kwargs)
        finally:
            logger.debug("Exit %s" % fname)
    return wrapper

function_type = type(logger_wrap)
class_type = type(Image.Image)

def decorate(obj):
    for name in dir(obj):
        if '__' in name:
            continue
        # print obj, name
        elt = getattr(obj, name)
        if type(elt) == class_type:
            decorate(elt)
        if type(elt) == function_type:
            setattr(obj, name, logger_wrap(elt, "%s:%s" % (obj.__name__, name)))

for module in (Image, ImageFile, ImageEnhance, JpegImagePlugin):
    decorate(module)
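As a self-contained miniature of the same wrap-everything pattern (the class and method names here are illustrative, not from the script above):

```python
# Wrap every public method of a class so entry/exit are recorded,
# without touching the class's own code.
from functools import wraps

calls = []

def logger_wrap(f, fname):
    @wraps(f)
    def wrapper(*args, **kwargs):
        calls.append("Enter %s" % fname)
        try:
            return f(*args, **kwargs)
        finally:
            calls.append("Exit %s" % fname)
    return wrapper

class Demo(object):
    def double(self, x):
        return x * 2

# decorate every public attribute, like the decorate() helper above
for name in dir(Demo):
    if '__' in name:
        continue
    setattr(Demo, name, logger_wrap(getattr(Demo, name), "Demo:%s" % name))

d = Demo()
assert d.double(21) == 42
assert calls == ["Enter Demo:double", "Exit Demo:double"]
```

The try/finally in the wrapper guarantees the "Exit" record is written even when the wrapped call raises, which is what makes the injected log useful for bracketing a crash site.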
odoo-bin:
#!/usr/bin/env python
# set server timezone in UTC before time module imported
__import__('os').environ['TZ'] = 'UTC'
__import__('pkg_resources').declare_namespace('odoo.addons')
import inject_logger
import odoo

if __name__ == "__main__":
    odoo.cli.main()
The logging that I'm getting from that replay is this:
$ docker-compose -f devel.yaml up
Starting scaffolding_db_1
Starting scaffolding_wdb_1
Starting scaffolding_inbox_1
Starting scaffolding_smtp_1
Starting scaffolding_odoo_1
Attaching to scaffolding_wdb_1, scaffolding_db_1, scaffolding_smtp_1, scaffolding_inbox_1, scaffolding_odoo_1
db_1 | LOG: database system was shut down at 2017-02-02 18:56:44 UTC
db_1 | LOG: MultiXact member wraparound protections are now enabled
db_1 | LOG: autovacuum launcher started
db_1 | LOG: database system is ready to accept connections
smtp_1 | smtp-sink: name_mask: all
smtp_1 | smtp-sink: trying... [0.0.0.0]:25
inbox_1 | <7>haproxy-systemd-wrapper: executing /usr/local/sbin/haproxy -p /run/haproxy.pid -f /usr/local/etc/haproxy/haproxy.cfg -Ds
inbox_1 | [ALERT] 032/185707 (8) : parsing [/usr/local/etc/haproxy/haproxy.cfg:8] : 'server stupid' : could not resolve address 'inbox.example.com'.
inbox_1 | [ALERT] 032/185707 (8) : Failed to initialize server(s) addr.
inbox_1 | <5>haproxy-systemd-wrapper: exit, haproxy RC=1
odoo_1 | NFO 18:57:1486061828 Executing contents of /opt/odoo/custom/entrypoint.d
odoo_1 | INFO 18:57:1486061828 Executing contents of /opt/odoo/common/entrypoint.d
odoo_1 | INFO 18:57:1486061828 Waiting until postgres is listening at db...
odoo_1 | + exec odoo --dev all
scaffolding_inbox_1 exited with code 1
odoo_1 | 2017-02-02 18:57:09,401 1 INFO ? odoo: Odoo version 10.0
odoo_1 | 2017-02-02 18:57:09,401 1 INFO ? odoo: Using configuration file at /opt/odoo/auto/odoo.conf
odoo_1 | 2017-02-02 18:57:09,401 1 INFO ? odoo: addons paths: ['/var/lib/odoo/addons/10.0', u'/opt/odoo/custom/src/private', u'/opt/odoo/auto/addons', u'/opt/odoo/custom/src/odoo/addons', '/opt/odoo/custom/src/odoo/odoo/addons']
odoo_1 | 2017-02-02 18:57:09,402 1 INFO ? odoo: database: odoo@db:default
odoo_1 | 2017-02-02 18:57:09,435 1 WARNING ? odoo.service.server: 'watchdog' module not installed. Code autoreload feature is disabled
odoo_1 | 2017-02-02 18:57:09,436 1 INFO ? odoo.service.server: HTTP service (werkzeug) running on 0.0.0.0:8069
odoo_1 | 2017-02-02 18:57:09,451 1 INFO devel odoo.modules.loading: loading 1 modules...
odoo_1 | 2017-02-02 18:57:09,718 1 INFO devel odoo.modules.loading: 1 modules loaded in 0.27s, 0 queries
odoo_1 | 2017-02-02 18:57:09,732 1 INFO devel odoo.modules.loading: loading 12 modules...
odoo_1 | 2017-02-02 18:57:09,789 1 INFO devel odoo.modules.loading: 12 modules loaded in 0.06s, 0 queries
odoo_1 | 2017-02-02 18:57:09,929 1 INFO devel odoo.modules.loading: Modules loaded.
odoo_1 | 2017-02-02 18:57:21,110 1 INFO ? odoo.addons.bus.models.bus: Bus.loop listen imbus on db postgres
odoo_1 | 2017-02-02 18:57:21,717 1 DEBUG ? inject_logger: Enter PIL.Image:new
odoo_1 | 2017-02-02 18:57:21,717 1 DEBUG ? inject_logger: Enter PIL.Image:_check_size
odoo_1 | 2017-02-02 18:57:21,717 1 DEBUG ? inject_logger: Exit PIL.Image:_check_size
odoo_1 | 2017-02-02 18:57:21,717 1 DEBUG ? inject_logger: Enter PIL.Image:isStringType
odoo_1 | 2017-02-02 18:57:21,717 1 DEBUG ? inject_logger: Exit PIL.Image:isStringType
odoo_1 | 2017-02-02 18:57:21,718 1 DEBUG ? inject_logger: Exit PIL.Image:new
odoo_1 | 2017-02-02 18:57:21,718 1 DEBUG ? inject_logger: Enter PIL.Image:_getdecoder
odoo_1 | 2017-02-02 18:57:21,718 1 DEBUG ? inject_logger: Exit PIL.Image:_getdecoder
odoo_1 | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Enter PIL.Image:register_open
odoo_1 | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Exit PIL.Image:register_open
odoo_1 | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Enter PIL.Image:register_save
odoo_1 | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Exit PIL.Image:register_save
odoo_1 | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Enter PIL.Image:register_extension
odoo_1 | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Exit PIL.Image:register_extension
odoo_1 | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Enter PIL.Image:register_mime
odoo_1 | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Exit PIL.Image:register_mime
odoo_1 | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Enter PIL.Image:isPath
odoo_1 | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Exit PIL.Image:isPath
odoo_1 | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Enter PIL.Image:preinit
odoo_1 | 2017-02-02 18:57:21,720 1 DEBUG ? inject_logger: Enter PIL.Image:register_open
odoo_1 | 2017-02-02 18:57:21,720 1 DEBUG ? inject_logger: Exit PIL.Image:register_open
odoo_1 | 2017-02-02 18:57:21,720 1 DEBUG ? inject_logger: Enter PIL.Image:register_save
odoo_1 | 2017-02-02 18:57:21,720 1 DEBUG ? inject_logger: Exit PIL.Image:register_save
odoo_1 | 2017-02-02 18:57:21,720 1 DEBUG ? inject_logger: Enter PIL.Image:register_extension
odoo_1 | 2017-02-02 18:57:21,720 1 DEBUG ? inject_logger: Exit PIL.Image:register_extension
odoo_1 | 2017-02-02 18:57:21,720 1 DEBUG ? inject_logger: Enter PIL.Image:register_mime
odoo_1 | 2017-02-02 18:57:21,720 1 DEBUG ? inject_logger: Exit PIL.Image:register_mime
odoo_1 | 2017-02-02 18:57:21,721 1 DEBUG ? inject_logger: Enter PIL.Image:register_open
odoo_1 | 2017-02-02 18:57:21,721 1 DEBUG ? inject_logger: Exit PIL.Image:register_open
odoo_1 | 2017-02-02 18:57:21,721 1 DEBUG ? inject_logger: Enter PIL.Image:register_save
odoo_1 | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Exit PIL.Image:register_save
odoo_1 | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Enter PIL.Image:register_save_all
odoo_1 | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Exit PIL.Image:register_save_all
odoo_1 | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Enter PIL.Image:register_extension
odoo_1 | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Exit PIL.Image:register_extension
odoo_1 | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Enter PIL.Image:register_mime
odoo_1 | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Exit PIL.Image:register_mime
odoo_1 | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Enter PIL.Image:register_open
odoo_1 | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Exit PIL.Image:register_open
odoo_1 | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Enter PIL.Image:register_save
odoo_1 | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Exit PIL.Image:register_save
odoo_1 | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Enter PIL.Image:register_extension
odoo_1 | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Exit PIL.Image:register_extension
odoo_1 | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Enter PIL.Image:register_extension
odoo_1 | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Exit PIL.Image:register_extension
odoo_1 | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Enter PIL.Image:register_extension
odoo_1 | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Exit PIL.Image:register_extension
odoo_1 | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Exit PIL.Image:preinit
odoo_1 | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Enter PIL.ImageFile:_save
odoo_1 | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Enter PIL.ImageFile:_tilesort
odoo_1 | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Exit PIL.ImageFile:_tilesort
odoo_1 | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Enter PIL.Image:_getencoder
odoo_1 | 2017-02-02 18:57:21,724 1 DEBUG ? inject_logger: Exit PIL.Image:_getencoder
odoo_1 | 2017-02-02 18:57:21,725 1 DEBUG ? inject_logger: Exit PIL.ImageFile:_save
odoo_1 | 2017-02-02 18:57:21,728 1 INFO ? odoo.addons.report.models.report: Will use the Wkhtmltopdf binary at /usr/local/bin/wkhtmltopdf
odoo_1 | 2017-02-02 18:57:22,137 1 INFO ? odoo.http: HTTP Configuring static files
odoo_1 | 2017-02-02 18:57:22,161 1 INFO test odoo.modules.loading: loading 1 modules...
odoo_1 | 2017-02-02 18:57:22,186 1 INFO test odoo.modules.loading: 1 modules loaded in 0.03s, 0 queries
odoo_1 | 2017-02-02 18:57:22,211 1 INFO test odoo.modules.loading: loading 28 modules...
odoo_1 | 2017-02-02 18:57:22,290 1 INFO test odoo.modules.loading: 28 modules loaded in 0.08s, 0 queries
odoo_1 | 2017-02-02 18:57:22,598 1 INFO test odoo.modules.loading: Modules loaded.
odoo_1 | 2017-02-02 18:57:22,606 1 INFO test odoo.addons.base.ir.ir_http: Generating routing map
db_1 | LOG: unexpected EOF on client connection with an open transaction
So it looks like it's somewhere else, in odoo.addons.base.ir.ir_http or thereabouts. It did appear to be deterministic, failing there three times in a row across docker-compose runs. On the fourth run the call succeeded and returned data, then the process died without any further output when I replayed the XHR. So: mostly reproducible, but not completely.
Actually, now I'm not so sure. It looks like almost all of that logging belongs to the first request, which may be the version_info poll, and the render request is just dying without leaving any trace in the logs.
Yeah, there's an initial boot-up sequence you're seeing there. I've been isolating the logs with an initial request to /, which triggers that, then a subsequent request to the failing page.
I haven't been getting any logging traces from my failures either, it's as if the app just vanishes.
Bravo on getting the specific curl isolated, btw! I've just been hitting the whole page.
This new curl is perfect - I think @wiredfool is correct with the hypothesis that this is likely caused by lxml.
I traced the code backwards from web.planner/render, and it's basically just a template render. I traced that all the way back to the template compilation, which is a bunch of heavy XML handling. No Pillow in sight anywhere along that path!
This would also explain the original error @blaggacao was seeing with the database not being created, which is also pretty heavy on the lxml side of things.
This sort of debugging would be far easier if Python were compiled --with-dtrace for USDT support. Combined with BPF in recent Linux kernels, that would let us dump the enter and exit of every Python function from outside the container. Then again, simply having the debug symbols would have helped as well.
I totally agree - I guess we all knew there were sacrifices when moving to Alpine though. How else would we have such a small OS?
After playing around a bit more, I am confident this isn't your problem @wiredfool. Thank you immensely for your help with this; I am going to take the issue to a more appropriate venue (once I confirm it is truly lxml).
You can go ahead and close this ticket, again.
I've just hit a similar issue with a Django/Pillow-based app on Alpine with Python 3.5.2. I quickly suspected a C extension (and Pillow) of causing the segfault, but in my case it was simply a stack overflow (819+ stack frames of ~100 bytes each). With musl libc the default thread stack size is 80 KB (81920 B), while it is 8 MB with glibc. 80 KB is a "reasonable" default, but it might not be a suitable choice for an existing application that assumes an 8 MB stack. In my case, increasing the stack size from the main thread "solved" the problem, e.g. threading.stack_size(4*80*1024) for a 4x80 KB stack.
I've done some research to see whether Pillow or its dependencies allocate large buffers on the stack, but it doesn't look like it. I would appreciate it if someone could confirm that point. Sorry if I'm adding noise here.
@iNaKoll That would make some sense, as the stack trace linked (way) above is something like 414 frames deep in CPython. XML processing would tend to create that sort of stack. At any rate, it's an easy-to-test hypothesis.
Any idea how to add that to the image? I'm not much on the C side of the street...
The threading line is pure Python, so I threw it into the odoo-bin launcher as a test:
#!/usr/bin/env python
# set server timezone in UTC before time module imported
__import__('os').environ['TZ'] = 'UTC'
__import__('pkg_resources').declare_namespace('odoo.addons')
import inject_logger
import threading
# enlarge the stack for threads created from here on (musl defaults to ~80 KB)
threading.stack_size(4*80*1024)
import odoo

if __name__ == "__main__":
    odoo.cli.main()
And it appears to be successful, at least in the case found above. There may be other call paths with the same issue.
It seems successful in mine too, albeit with limited testing. I'll submit a PR to your image @Yajo so we can perform some more extensive testing.
Hrmmm ok maybe I don't quite know your image well enough to submit that change @Yajo, but here's what you need:
# Fix stack size overflow
# https://github.com/python-pillow/Pillow/issues/1935
RUN awk '/import odoo/ { print; print "import threading; threading.stack_size(4*80*1024)"; next }1' /usr/local/bin/odoo > /tmp-bin \
&& mv /tmp-bin /usr/local/bin/odoo
(/usr/local/bin/odoo being the path of the installed Odoo launcher)
An excellent whodunnit!
I might point to http://wiki.musl-libc.org/wiki/Functional_differences_from_glibc for the way to go.
I think the best option would be to open a PR against Odoo directly. I guess this could affect any other deployment with a different stack size. Adding a configuration parameter, an env variable, or simply a sane default that works everywhere seems a good improvement, even for stable releases. What do you think @lasley?
IMO we would need to identify the actual library causing this issue and submit the fix there. This isn't really an Odoo issue, as @iNaKoll confirmed by reporting the same failure in a Django app. Setting the thread stack size is just duct tape on the situation.
The current theory is that it's a stack size issue. Overflowing the stack can cause a segfault and can happen in pure Python (hence Stackless Python, which was tuned for recursive loads). It doesn't have to be a CPython extension. One likely cause would be recursive parsing of deep XML documents, or of an AST generated from them. (This is at least plausible from the backtrace listed above.)
You may be able to cause the same error by parsing any XML in the accounting package, or it might require building things back up with the render.
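The deep-XML hypothesis is easy to demo in isolation. A minimal sketch, safe on glibc, where CPython's recursion limit fires before the C stack runs out; on a musl-sized thread stack the same traversal can segfault before the limit is reached:

```python
import xml.etree.ElementTree as ET

# Build a pathologically nested document; a naive recursive walk over it
# costs one Python (and underlying C) frame per nesting level. CPython's
# recursion limit (default 1000) normally turns the overflow into a clean
# RecursionError, but on musl's ~80 KB thread stacks the C stack can run
# out first, yielding a segfault instead of an exception.
depth = 5000
root = ET.fromstring("<a>" * depth + "</a>" * depth)

def nesting(node):
    # naive recursive traversal, one frame per level of nesting
    children = list(node)
    return 1 if not children else 1 + nesting(children[0])

try:
    outcome = "walked %d levels" % nesting(root)
except RecursionError:
    outcome = "recursion limit hit before the C stack overflowed"
print(outcome)
```

(ElementTree's parser itself keeps an explicit stack, so building the tree is fine; it's the recursive consumer that burns stack frames.)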
Edit: I just realised that python-pillow is obviously not the right place for this issue, but I think you guys would be interested anyway in more evidence to support where the issue stems from.
Hi all,
I can also report what I think is the same issue occurring with another Docker container: ressu/sickgear. This also depends on lxml, which hardly seems like a coincidence. What is interesting, however, is that an old release of SickGear itself (0.11.x) did not suffer from the fault.
To reproduce, build the container and run it with docker run -it --rm --entrypoint=/bin/sh -p 8081:8081 ressu/sickgear, then navigate to localhost:8081 and under the cog wheel choose Notifications (or just navigate to http://localhost:8081/config/notifications) and the application will SIGSEGV.
I too have tried to debug this, first with strace, which indicated it was happening around the time of template processing, but that wasn't helpful enough. I then attempted gdb, and although the application starts up fine, when I trigger the segfault gdb says:
During startup program terminated with signal SIGSEGV
Which doesn't enable any debugging in terms of following stack traces. I'm not sure why it treats the obviously fully loaded software as still starting. I do get an error about being unable to disable address randomisation, but I don't think that's relevant, unless it is indirectly indicating another problem.
The plot thickens!
An interesting note: we were able to solve this problem in our Alpine container by switching to the multi-threaded worker model. That revelation was honestly what caused me to just call it a lost cause and accept the switch to Debian.
Edit: just confirming that this issue can also be fixed in SickGear by increasing the thread stack size: threading.stack_size(4*80*1024).
Does that resolve the issue because each thread gets its 'own stack' and therefore there is less chance for running out of stack space (different from stack overflow, I assume)? In which case I wonder if I can patch SickGear with the larger thread stack size. How does one come up with an optimal size? Some calculation based on the likely size/depth of the XML documents being parsed?
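On sizing: each thread does get its own stack, fixed at creation time, so there's no exact formula, only a bound you can estimate. A rough sketch of the heuristic (FRAME_COST reuses the ~100 bytes/frame measurement quoted earlier in this thread; MAX_DEPTH and the safety factor are assumptions, not measured values):

```python
import threading

# Heuristic: worst-case recursion depth x per-frame cost, with headroom.
FRAME_COST = 100   # bytes/frame, from the measurement earlier in this thread
MAX_DEPTH = 2000   # assumed worst-case nesting / recursion depth
HEADROOM = 2       # safety factor

needed = FRAME_COST * MAX_DEPTH * HEADROOM   # 400000 bytes

# threading.stack_size() requires at least 32 KiB and, on some platforms,
# a multiple of 4 KiB, so round up to a 4 KiB boundary.
rounded = (needed + 4095) // 4096 * 4096
threading.stack_size(rounded)                # applies to threads created later
print(rounded)                               # prints 401408
```

For XML workloads, MAX_DEPTH would track the deepest document (or generated AST) you expect to process; when in doubt, err well on the high side, since the cost of a too-large stack is just reserved address space.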
@lasley I get that. I have managed to keep with Alpine by manually increasing the stack size as mentioned above. It seems like there's a bit of pressure being applied to the Alpine maintainers atm to increase the stack size, but we shall have to wait and see what happens. It's pretty sucky to have to patch upstream so many apps because this core setting is so conservative.