Environment: Google Cloud Function / 2 GB in-ram file system / unknown (probably debian) 64 bit linux / crystal binaries called via Node.js.
I'm running into the fabled boehm-gc error:
GC Warning: Out of memory - trying to allocate less
Insufficient memory for GC_all_nils
when running crystal language binaries (or just crystal --version
or shards install
etc) from within a Google Cloud Function, and when there is tons of free ram (generally 1.7 gigs). GCF uses a 2 GB in-ram file system, meaning if I create a 20 MB file, I am really using up 20 MB of ram, etc. Running crystal --version
should not require 2 GB of ram, so something fishy is going on. I've checked the output of free -h
right before I run crystal --version
and indeed there are like 1.8 gigs of free ram.
For some reason no matter what I do, boehm-gc is throwing this error via crystal. I am wondering if maybe this is a bug?
I can run binaries or code you give me to try to debug this in GCF.
I also posted this on the boehm-gc repo: https://github.com/ivmai/bdwgc/issues/225
For anyone wondering why I care about this: I am working on a shard that will automatically package and deploy a crystal app into a google cloud function. Right now I have this working if I do the compilation locally on my linux machine, however I would like to compile binaries in Google Cloud Function so they are completely optimized for that runtime environment (and for OSX users).
If your're compiling, it might be very likely that the compiler consumes this 1.8 GB. Before jumping to conclusions that this is must be a bug, you should verify the actual memory usage while the program runs, not just before you start it. You could either poll free
or add --stats
to the compiler command.
@straight-shoota please read the issue description. Running any crystal binary such as crystal --version
or shards --version
triggers the issue! It sounds a lot like BDW is somehow incapable to detect the actual available memory.
This was in response to the last comment which talked explicitly about compiling.
Anyways it would be good to know the actual memory consumption, be it for compiling or crystal --version
.
unfortunately the best I can get is the memory consumption of all processes and the total available for the system immediately before I run crystal --version
and immediately after crystal --version
crashes with the GC error. (the whole system is typically using something like 200 mb of ram at that point). I would second that this sounds like BDW is incorrectly estimating the actual available memory.
@sam0x17 could you run the build command with --stats
option? That will show some progress and memory consumption in each stage.
I don't fully understand what code you are trying to compile and then run in GCF. How huge it is? Is it shareable?
@bcardiff I will do that and get back to you. edit: I actually can't because crystal crashes regardless of whether I am compiling or not, so --stats is irrelevant
At the moment I'm not even trying to compile anything -- just running the pre-compiled crystal binary and doing crystal --version
, or shards install
, etc.. The next step is compiling a tiny hello world program (e.g. puts "hello world"
). This is definitely something to do with BDW and not the size of what I'm trying to do.
The folks over at BDW have replied as follows, so I will do some research on nmap etc and see if I can debug the problem that way:
This means GC cannot grab memory from OS (GC uses mmap or sbrk to get more memory depending on the config).
My advice is to write a program in C that links to bdw-gc and run that. If that fails, it's a problem with bdw-gc and not Crystal.
bdwgc doesn't "estimate the available memory", it asks for what it needs with mmap
and clearly recieves an error.
I'd be wary of the output of free
here: can you post the full output of free
? Is free
reporting the memory used by the in-ram filesystem properly? Perhaps the process is running in a memory cgroup and the amount of available memory isn't visible to the process.
all right guys here is the output from a full run of the cloud function so you can better see what's going on. I put in some free -h
calls before and after every operation that _could_ have some effect on memory usage, along with a command that prints all running processes and their memory usage (the code for this function is as follows):
function procMemUsage() {
cmd('free -h');
cmd('ps aux | awk \'{print $6/1024 " MB\t\t" $11}\' | sort -n');
}
Here is the full output (scroll to the right):
D compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:38.434 Function execution started
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:38.492 $ free -h
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:38.523 total used free shared buffers cached
Mem: 2.0G 79M 1.9G 0B 0B 57M
-/+ buffers/cache: 22M 2.0G
Swap: 0B 0B 0B
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:38.523 $ ps aux | awk '{print $6/1024 " MB " $11}' | sort -n
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:38.567 0 MB COMMAND
0.769531 MB /bin/sh
1.51172 MB awk
2.40625 MB /bin/sh
3.21484 MB ps
4.29297 MB /bin/bash
57.1875 MB /nodejs/bin/node
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:38.567 preparing to compile...
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:38.578
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:38.578 injecting pcre into environment...
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:38.578 $ tar -xzf ./pcre.tar.gz -C /tmp
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:38.761 added /tmp/pcre/bin to PATH
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:38.773 added /tmp/pcre/lib to LD_LIBRARY_PATH
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:38.782 added /tmp/pcre/lib to LIBRARY_PATH
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:38.792 added /tmp/pcre/include to C_INCLUDE_PATH
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:38.801 added /tmp/pcre/include to CPP_INCLUDE_PATH
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:38.801
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:38.801 injecting libevent into environment...
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:38.801 $ tar -xzf ./libevent.tar.gz -C /tmp
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:39.076 added /tmp/libevent/bin to PATH
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:39.090 added /tmp/libevent/lib to LD_LIBRARY_PATH
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:39.115 added /tmp/libevent/lib to LIBRARY_PATH
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:39.129 added /tmp/libevent/include to C_INCLUDE_PATH
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:39.143 added /tmp/libevent/include to CPP_INCLUDE_PATH
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:39.143 $ free -h
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:39.210 total used free shared buffers cached
Mem: 2.0G 91M 1.9G 12M 0B 70M
-/+ buffers/cache: 21M 2.0G
Swap: 0B 0B 0B
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:39.210 $ ps aux | awk '{print $6/1024 " MB " $11}' | sort -n
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:39.268 0 MB COMMAND
0.4375 MB awk
1.74609 MB /bin/sh
3.14453 MB ps
3.38281 MB /bin/sh
4.29297 MB /bin/bash
58.1875 MB /nodejs/bin/node
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:39.268
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:39.272 crystal version not specified, finding latest version...
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:39.813 latest version: 0.24.2
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:39.813
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:39.813 downloading linux x64 binary for Crystal 0.24.2
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:40.566 $ curl -O -L --silent https://github.com/crystal-lang/crystal/releases/download/0.24.2/crystal-0.24.2-1-linux-x86_64.tar.gz
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:46.372
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:46.390 extracting "crystal-0.24.2-1-linux-x86_64.tar.gz"...
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:46.390 $ tar -xzf *.tar.gz
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:48.840
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:48.840 $ free -h
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:48.867 total used free shared buffers cached
Mem: 2.0G 273M 1.7G 194M 0B 252M
-/+ buffers/cache: 21M 2.0G
Swap: 0B 0B 0B
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:48.867 $ ps aux | awk '{print $6/1024 " MB " $11}' | sort -n
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:48.935 0 MB COMMAND
1.82422 MB awk
1.87891 MB /bin/sh
3.33984 MB ps
3.51562 MB /bin/sh
4.29297 MB /bin/bash
58.1875 MB /nodejs/bin/node
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:48.947 added /tmp/tmp-2UghNIpd6dGe8/crystal-0.24.2/bin to PATH
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:48.947
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:48.947 testing that crystal binary is installed and runnable...
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:48.947 $ free -h
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:48.962 total used free shared buffers cached
Mem: 2.0G 274M 1.7G 194M 0B 252M
-/+ buffers/cache: 21M 2.0G
Swap: 0B 0B 0B
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:48.962 $ ps aux | awk '{print $6/1024 " MB " $11}' | sort -n
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:49.043 0 MB COMMAND
2.05859 MB awk
2.05859 MB sort
2.98438 MB /bin/sh
3.02734 MB ps
4.29297 MB /bin/bash
58.1875 MB /nodejs/bin/node
I compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:49.043 $ crystal --version
E compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:49.155 Error: Command failed: crystal --version
GC Warning: Out of memory - trying to allocate less
Insufficient memory for GC_all_nils
at checkExecSyncError (child_process.js:473:13)
at Object.execSync (child_process.js:513:13)
at silentCmd (/user_code/compile.js:15:24)
at cmd (/user_code/compile.js:20:13)
at compile (/user_code/compile.js:95:14)
at exports.init (/user_code/compile.js:128:3)
at /var/tmp/worker/worker.js:684:7
at /var/tmp/worker/worker.js:668:9
at _combinedTickCallback (internal/process/next_tick.js:73:7)
at process._tickDomainCallback (internal/process/next_tick.js:128:9)
D compile-crystal-zd08sy l7rsrnwtot6j 2018-06-14 00:01:49.167 Function execution took 10735 ms, finished with status: 'crash'
Reading the bdwgc, allocating GC_all_nils
is pretty much the first thing the GC does, so it really does point to it being an interaction between bdwgc and the environment. Getting a strace
of crystal would be useful.
@RX14 ok I'll try to get strace to compile statically so I can run it in the cloud function and get back to you
@sam0x17 looks like one for the bdwgc folks, you should post your strace on that issue.
@sam0x17 But please upload the strace somewhere (gist or whatever) and post a link in the issue. This is way to long for a comment.
@straight-shoota @RX14 changed to a pastebin: https://pastebin.com/hqyuZgS8
Interestingly, I did an experiment filling an array with tons of random ints to see if I could get a statically compiled crystal binary to crash due to dynamic memory allocation, and it actually does not crash. So either this is a problem that affects crystal
and shards
but not crystal programs themselves, or something about how I set up my environment is different that is making things work. For my purposes this is enough to accomplish what I was trying to accomplish, but it may be of interest to continue to investigate why these memory errors were appearing before.
Reproducible example with directions is here: https://github.com/sam0x17/dynamic_memory_test
TLDR: clone the repo and run the build_deploy
script, assuming that you already have gcloud installed and logged in via gcloud init
and you already have docker set up to run as a non root user.
https://github.com/sam0x17/dynamic_memory_test
@RX14 @straight-shoota
I'm going to call this closed at this point. There is really no reason to be compiling in a cloud function now that we have confirmed that static binaries work, and that is the only reason anyone would ever encounter this error based on my experiment.
This error might as well affect any Crystal binary, so I'd keep it open. It's still not clear how this bug works.
@straight-shoota It's a problem with bdw-gc and that particular cloud. Crystal has nothing to do here...
Until we either:
As a matter of fact, can you please run df -h
before you run crystal
? Your tmpfs
might be full (and tmpfs counts out of ram - and is fairly invisible, I hit this myself recently).
@RX14 agree that we should close this unless it becomes a problem for someone else.
That said, df -h
shows that the tmpfs had plenty of free space, oddly.
Based on the strace, GCF does not support mmap(/dev/zero). So, I will change to use mmap(MAP_ANONYMOUS) instead for this target. Could you please let me know how the BDWGC was built (I mean configure/make options)? Is the target host (GCF) is Linux/amd64?
Could you please let me know how the BDWGC was built (I mean configure/make options)? Is the target host (GCF) is Linux/amd64?
@RX14 @straight-shoota maybe you guys might be able to shed some light on this? I do not know where the bdwgc compilation flags exist in this project
@sam0x17 @ivmai It's all here: https://github.com/crystal-lang/distribution-scripts
Though for Mac using homebrew it's here: https://github.com/Homebrew/homebrew-core/blob/master/Formula/crystal.rb
This commit should solve the issue: https://github.com/ivmai/bdwgc/commit/eed796d6d6fda97354ccbe6a39dce5506d9c6762
FWIW I'm also seeing this when running the Docker image via rootless podman:
✘ ryan@fedora ~/code/kgk master ● podman run --net host --rm crystallang/crystal crystal
GC Warning: Out of memory - trying to allocate less
Insufficient memory for GC_all_nils
✘ ryan@fedora ~/code/kgk master ●
Might be a podman bug but still something worth noting.
I'm also seeing this when running the Docker image via rootless podman:
GC Warning: Out of memory - trying to allocate less
Insufficient memory for GC_all_nils
Might be a podman bug but still something worth noting.
@kirbyfan64, If possible, please recheck it with the above fix (https://github.com/ivmai/bdwgc/commit/eed796d6d6fda97354ccbe6a39dce5506d9c6762)
Most helpful comment
This commit should solve the issue: https://github.com/ivmai/bdwgc/commit/eed796d6d6fda97354ccbe6a39dce5506d9c6762