Crystal: Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS when allocating many sets

Created on 19 Jan 2018 · 15 comments · Source: crystal-lang/crystal

Hello, I'm hitting an internal limit described in the error message below:

Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS
Aborted (core dumped)

Here's a simple example that crashes with this error message well before it runs out of memory. I ran it on an EC2 instance with 122GB of RAM; the loop got to 17690000 before crashing and used only 8.6GB of RAM at its peak.

groups = {} of Set(String) => Array(Float64)
i = 0

loop do
  group = Set(String).new(["a", "b", i.to_s])
  groups[group] = [1.0, 2.0, i.to_f64]
  if i % 10000 == 0
    puts i
  end
  i += 1
end

My use case is more complex than this (basically doing a GROUP BY and SUM on large 100MB CSVs), but this error is blocking this task. I'm not sure how to interpret Increase MAXHINCR or MAX_HEAP_SECTS -- is there a suggested way to increase this limit?

Labels: bug, infrastructure

Most helpful comment

@asterite some people genuinely need to use 120GB of RAM for their datasets. We should support that use case.

All 15 comments

I also hit this issue when doing lots of allocations. Without getting too much into it, you could try the immix GC spinoff.

No, don't try the immix GC. It's pre-alpha and it will be broken. That's terrible advice.

The problem is in how libgc is configured: it can only handle heaps of 8GB or less. It needs to be recompiled for large heaps using specific configure options. I'll work out a process for this, include it in the omnibus, and hopefully provide a download later today.
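For anyone who wants to try rebuilding it before that lands, here is a rough sketch using upstream bdwgc's large-heap configure option; the repository URL and steps below are the upstream defaults, not the omnibus process, so treat it as a starting point rather than the official recipe:

git clone https://github.com/ivmai/bdwgc.git
cd bdwgc
./autogen.sh                        # needs autoconf, automake, libtool
./configure --enable-large-config   # LARGE_CONFIG raises limits like MAXHINCR / MAX_HEAP_SECTS
make
sudo make install                   # then relink the Crystal program against this libgc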

It seems you can set GC_MAXIMUM_HEAP_SIZE=... to the size you want.
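For example, something like this when launching the program (the size and binary name here are just placeholders):

GC_MAXIMUM_HEAP_SIZE=16G ./my_program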

I would recommend first trying to find out why you are allocating so much memory, and optimizing that. If you could post the real code you are working on we could help you optimize it.

@asterite some people genuinely need to use 120GB of RAM for their datasets. We should support that use case.

I mean, the CSV is 100MB, you shouldn't require 120GB to process it.

My simplified code shows the issue more concretely. Abstracting away some of the business-specific logic, my code to do a group by/sum is below.

This should keep only one copy of each group in memory at a time, building the sum for each group online. I'm replacing a Python script that does the same thing with a less efficient algorithm that keeps much more data in memory. Python doesn't choke on using 120GB+ of memory, but I'm rewriting it so we can run this on smaller EC2 instances.

require "csv"

OUTPUT_FILES = FILENAMES.map_with_index { |filename, idx| File.join(OUTPUT_FOLDER, "summarize_#{idx}.csv") }
log.info "ALL #{OUTPUT_FILES.size} files loaded, summarizing CSVs"
sums_by_group = {} of Set(String) => Array(Float64)

GROUPING_KEY = [
  "product",
  "timestamp",
  "id",
  "other_field",
]
SUM_KEY = [
  "size",
  "foo",
  "bar",
]

OUTPUT_FILES.each do |output_file|
  puts "loading #{output_file}"
  File.open(output_file, "r") do |file|
    CSV.each_row(file) do |row|
      row_array = row.to_a
      group = Set(String).new(row_array[0...GROUPING_KEY.size]) # exclusive range: grouping columns only
      sum = row_array[GROUPING_KEY.size..-1].map { |value| value.to_f64 }
      if sums_by_group[group]?
        sum.each_with_index do |value, i|
          sums_by_group[group][i] += value
        end
      else
        sums_by_group[group] = sum
      end
    end
  end
end

# Convert to CSV
output_file = File.join(OUTPUT_FOLDER, "summarize_total.csv")

File.open(output_file, "w") do |file|
  CSV.build(file) do |csv|
    csv.row GROUPING_KEY + SUM_KEY

    sums_by_group.each do |k, v|
      csv.row k.to_a + v
    end
  end
end

The dataset is like:

114M summarize_0.csv
113M summarize_10.csv
113M summarize_11.csv
114M summarize_12.csv
114M summarize_13.csv
114M summarize_14.csv
114M summarize_15.csv
113M summarize_16.csv
114M summarize_17.csv
112M summarize_18.csv
114M summarize_19.csv
114M summarize_1.csv
113M summarize_20.csv
114M summarize_21.csv
113M summarize_22.csv
114M summarize_23.csv
114M summarize_24.csv
114M summarize_25.csv
114M summarize_26.csv
114M summarize_27.csv
113M summarize_28.csv
116M summarize_29.csv
114M summarize_2.csv
113M summarize_3.csv
114M summarize_4.csv
113M summarize_5.csv
113M summarize_6.csv
114M summarize_7.csv
114M summarize_8.csv
113M summarize_9.csv

These should be summarized into one output CSV. So really this algorithm shouldn't need 120GB, but the internal representation could be greater than 8GB.
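As an aside, one way to shrink that internal representation is to key the hash on a single joined String instead of a Set(String). The following is only an illustrative sketch of the inner loop, reusing GROUPING_KEY and the file handle from the code above and assuming the grouping columns never contain the chosen delimiter:

# Sketch: one String key per group instead of a Set(String) plus one String per element.
sums_by_group = {} of String => Array(Float64)

CSV.each_row(file) do |row|
  row_array = row.to_a
  group_key = row_array[0...GROUPING_KEY.size].join("\u0000") # NUL as an unlikely delimiter
  sums = row_array[GROUPING_KEY.size..-1].map(&.to_f64)
  if existing = sums_by_group[group_key]?
    sums.each_with_index { |value, i| existing[i] += value }
  else
    sums_by_group[group_key] = sums
  end
end

Each stored group then costs one String on the heap instead of a Set and its hash table, which can reduce both heap size and GC pressure when there are many distinct groups.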

Yeah, there doesn't seem to be a way around that. I guess the number of distinct groups is probably huge, so you basically have to keep a lot of data in memory.

Tweaking the GC is probably good here. If there's an option to disable that warning/crash, 👍 from me.

I have the same problem of allocating more than 8 GB.

Is Crystal going to support bigger heap allocation by default?

I could run the program by setting GC_INITIAL_HEAP_SIZE=.

For example:

export GC_INITIAL_HEAP_SIZE=15G
crystal program.cr

GC_INITIAL_HEAP_SIZE probably also adjusts the maximum size accordingly, but what's really needed here is GC_MAXIMUM_HEAP_SIZE. Of course in your case it might make sense to set the initial heap size to a large value.

I'm not sure if crystal should have a larger max or initial size by default. What would that even be? 8GB seems reasonable for most use cases. When you need more, it's just a simple environment setting.
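For reference, as far as I understand these GC variables are read by the collector at program startup, so they need to be present in the environment of the running binary; setting them only for the compile step has no effect. A sketch with placeholder names:

crystal build app.cr --release
GC_MAXIMUM_HEAP_SIZE=64G ./app   # the GC picks this up when the binary starts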

Is there a performance advantage to keeping GC_MAXIMUM_HEAP_SIZE low? I'm wondering why it can't be made arbitrarily large. Also, is this environment variable meant to be used at compile time or at runtime? And what are the format/units of the variable? I've been testing with export GC_INITIAL_HEAP_SIZE=30G this morning and am still running into the Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS error after ~8GB.

I've tried and errored with:

GC_MAXIMUM_HEAP_SIZE=30G crystal build ./src/trie-parser.cr --release
zstdcat data.zst | ./trie-parser
export GC_MAXIMUM_HEAP_SIZE=30G
zstdcat data.zst | crystal ./src/trie-parser.cr

I may be the exception; a lot of my use cases involve paging big datasets from memory.

@joeyrobert you're not the exception; this is a known and much-discussed issue. We also do lots of Big Data stuff, and we managed to work around it using the exported max heap option.

Not necessarily a performance issue. But most typical use cases won't need an unlimited (or very large) heap size. You're certainly not an exception, but you are in a small minority.

I guess one of the reasons to limit the size by default is that, if an application has a memory leak but huge amounts of memory available, the issue will probably not be noticed for a while.

So, IMO it is fine to have a default limit that fits 90% of applications, plus an easy way to change it if you need more.

The README.environment says both variables take a value in bytes, so you would have to write the total number of bytes without a unit suffix. (Corrected per the next comment.)

@straight-shoota Actually, README.environment states that you may use an optional suffix:

GC_INITIAL_HEAP_SIZE=

Initial heap size in bytes. May speed up process start-up. Optionally, may be specified with a multiplier ('k', 'M' or 'G') suffix.

GC_MAXIMUM_HEAP_SIZE=

Maximum collected heap size. Allows a multiplier suffix.

@joeyrobert, @straight-shoota
I couldn't make GC_MAXIMUM_HEAP_SIZE= work; that's why I used GC_INITIAL_HEAP_SIZE=.

