When a large number of command-line arguments is passed to a Crystal binary, it runs out of memory at startup, which results in a segfault. This can happen, for example, when you write a script that receives files as input.
$ ./do_something some_dir/*.gz
Segmentation fault (core dumped)
If there are many files, it will segfault and you have to explicitly set GC_INITIAL_HEAP_SIZE=4M to make it work. I would consider it a bug for two reasons:
To reproduce, you can take the example from OptionParser:
require "option_parser"

upcase = false
destination = "World"

OptionParser.parse do |parser|
  parser.banner = "Usage: salute [arguments]"
  parser.on("-u", "--upcase", "Upcases the salute") { upcase = true }
  parser.on("-t NAME", "--to=NAME", "Specifies the name to salute") { |name| destination = name }
  parser.on("-h", "--help", "Show this help") { puts parser }
  parser.invalid_option do |flag|
    STDERR.puts "ERROR: #{flag} is not a valid option."
    STDERR.puts parser
    exit(1)
  end
end

destination = destination.upcase if upcase
puts "Hello #{destination}!"
And run it as follows:
$ crystal run example.cr -- $(seq 1 100000)
Program exited because of a segmentation fault (11)
However, on my system, that will work:
$ GC_INITIAL_HEAP_SIZE=4M crystal run example.cr -- $(seq 1 100000)
Hello World!
Building a binary (either debug or release mode) does not change the behavior.
When it crashes, the critical operation happens at startup:
kernel.cr:41:
...
40 # An array of arguments passed to the program.
41 ARGV = Array.new(ARGC_UNSAFE - 1) { |i| String.new(ARGV_UNSAFE[1 + i]) }
42
I attached a picture with the backtrace. It immediately triggers the garbage collector, which eventually crashes:
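To make the allocation pattern of that startup line easier to see, here is a minimal sketch that imitates it outside of kernel.cr. It assumes that a list of raw pointers to NUL-terminated strings is a fair stand-in for ARGV_UNSAFE; the names `source`, `raw_args` and `copied` are hypothetical and exist only for this illustration. It is not the actual startup code and I have not checked whether it reproduces the crash.

```crystal
# Stand-in for ARGV_UNSAFE: raw pointers to NUL-terminated byte strings.
# (`source` and `raw_args` are hypothetical names used only for this sketch.)
source = (1..100_000).map(&.to_s)
raw_args = source.map(&.to_unsafe)

before = GC.stats.total_bytes

# Same shape as line 41 in kernel.cr: every argument is copied into a new
# String object, and each of those copies is allocated on the GC heap.
copied = Array.new(raw_args.size) { |i| String.new(raw_args[i]) }

after = GC.stats.total_bytes

puts "copied #{copied.size} arguments"
puts "bytes allocated while copying: #{after - before}"
```

The point is only that building ARGV performs one GC allocation per argument, so a long argument list can plausibly trigger a collection very early in the program's life.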

$ crystal -v
Crystal 0.33.0 [612825a53] (2020-02-14)
LLVM: 8.0.0
Default target: x86_64-unknown-linux-gnu
OS: Ubuntu 18.04
What's the error you get in Ruby?
I want to reproduce it but I already get the error in zsh with a long argument, so I can't even reach Crystal or Ruby.
In my opinion this should be fixed but it has a very low priority, because in any case the program will not work.
Should have added that I am using Bash as a shell.
It is true that there exists a limit on the number of parameters. In my real example, there are fewer arguments, but the file paths are longer. In other words, each argument uses more space than the numbers for seq.
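To get a feel for how much argument length matters, here is a rough sketch that estimates the space an argument list takes. It assumes each argument costs its byte size plus a terminating NUL plus one pointer, which is only an approximation of how the system accounts for it. The helper `argv_footprint`, the file names and the count of 50,000 paths are all made up for the illustration; they are not my real input.

```crystal
# Rough estimate of the space an argument list occupies: each argument is
# counted as its bytes, one NUL terminator, and one pointer in argv.
# (`argv_footprint` is a hypothetical helper, not part of Crystal.)
def argv_footprint(args : Array(String)) : Int64
  args.sum(0_i64) { |arg| arg.bytesize.to_i64 + 1 + sizeof(Pointer(UInt8)) }
end

# The arguments produced by `seq 1 100000`.
short_args = (1..100_000).map(&.to_s)

# Fewer arguments, but each one is a longer (made-up) file path.
long_args = Array.new(50_000) { |i| "some_dir/sample_#{i}/reads.fastq.gz" }

puts "#{short_args.size} short arguments: #{argv_footprint(short_args)} bytes"
puts "#{long_args.size} long arguments:  #{argv_footprint(long_args)} bytes"
```

With these made-up inputs, half as many arguments end up taking more space than the seq example, which matches what I see with real file paths.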
With Ruby, I cannot reproduce the behavior (running out of memory by receiving a huge ARGV list).
For the record, I tried running the Ruby (v2.7.0) version and it worked:
require 'optparse'

params = {}
OptionParser.new do |opts|
  opts.on('-a')
  opts.on('-b NUM', Integer)
  opts.on('-v', '--verbose')
end.parse!(into: params)
p params
I ran it in zsh (v5.8) like this:
./file.rb -- {1..100000}
and it correctly output {}.
Fixing this seems hard, because different programs expect different sizes of input. If the default is too small, programs with large input will fail. If it is too large, programs with small input will consume more memory than they need.
First, we should figure out what the current default is. That's not officially documented anywhere. Judging from bdwgc's source, it depends on several configuration values, but seems to be 384 kB in most cases (with LARGE_CONFIG).
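One way to check the effective value empirically, rather than reading it out of the bdwgc source, might be to print what the collector reports right at program start. This is only a sketch: it assumes `GC.stats` reflects the initial heap closely enough when called as the first thing in the program, and by that point the runtime has already made some allocations of its own (including ARGV).

```crystal
# Print what the collector reports as early as possible in the program.
stats = GC.stats

puts "heap size:      #{stats.heap_size} bytes"
puts "free bytes:     #{stats.free_bytes} bytes"
puts "total bytes:    #{stats.total_bytes} bytes"
puts "ARGV arguments: #{ARGV.size}"
```

Running the compiled binary once without and once with GC_INITIAL_HEAP_SIZE=4M set, and with a short and a long argument list, should show how the starting heap compares to what ARGV needs.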
Why does it depend on the GC_INITIAL_HEAP_SIZE at all?
The argv is passed in from libc, which does not allocate it on the GC heap. Anything after that is done in Crystal code. Why would the initial heap size affect anything at all?
I suspect GC_INITIAL_HEAP_SIZE is incidentally masking unsafe behaviour.
"I want to reproduce it but I already get the error in zsh with a long argument"
By the way, this is not about the shell but about the system. Both Linux and Mac have
getconf ARG_MAX
and for me it's 2097152.
Could be less on Mac.
Perhaps the issue is just that "GC collecting while in early init is broken" and we need to work out why?