Crystal: OverflowError shows incorrect line number in backtrace

Created on 13 Oct 2019  路  8Comments  路  Source: crystal-lang/crystal

This code

p! int8 = 1_i8
p! int64 = 164_i64
p! int8 + int64

Produces

$ crystal test.cr
int8 = 1_i8 # => 1
int64 = 164_i64 # => 164
int8 + int64 # => Unhandled exception: Arithmetic overflow (OverflowError)
  from test.cr:0:9 in '__crystal_main'
  from /usr/local/Cellar/crystal/0.31.1/src/crystal/main.cr:97:5 in 'main_user_code'
  from /usr/local/Cellar/crystal/0.31.1/src/crystal/main.cr:86:7 in 'main'
  from /usr/local/Cellar/crystal/0.31.1/src/crystal/main.cr:106:3 in 'main'

It should show

-  from test.cr:0:9 in '__crystal_main'
+  from test.cr:3:9 in '__crystal_main'

Env

$ crystal --version
Crystal 0.31.1 (2019-10-02)

LLVM: 8.0.1
Default target: x86_64-apple-macosx

Most helpful comment

OK after asking on llvm-dev they said

The DWARF spec says line 0 means there is no relevant source location.
Sometimes it means an optimization combined operations from multiple
source locations, and it would be misleading to pick one arbitrarily;
sometimes it means the instruction came from an "artificial" bit of
source, for example a default constructor in C++, that really has no
worthwhile source location.

So I think we may have done it well after all :)

All 8 comments

Unrelated to overflow but DWARF, either when compiling or when decoding the tables.

Duplicate of another issue.

Related to #5567, #6931 and #7471

I think it might be related specifically to overflow because some overflow checks have location code around. I'll check. In the meantime let's keep this open.

gdb seems to have some line numbers anyway:

...
__crystal_raise_overflow () at /usr/share/crystal/src/raise.cr:238
new () at /usr/share/crystal/src/exception.cr:142
142   def initialize(message = "Arithmetic overflow")
(gdb) bt
#0  new () at /usr/share/crystal/src/exception.cr:142
#1  0x0807dfab in __crystal_raise_overflow ()
    at /usr/share/crystal/src/raise.cr:239
#2  0x0807cfc9 in __crystal_main () at /home/rdp/test.cr:3
#3  0x08110306 in main_user_code ()
    at /usr/share/crystal/src/crystal/main.cr:97
#4  0x0811021d in main () at /usr/share/crystal/src/crystal/main.cr:86
#5  0x0808bd66 in main () at /usr/share/crystal/src/crystal/main.cr:106

FWIW :)

Duplicate of other issues where line number is 0.

The DWARF info has a _lot_ of lines at line 0 (verified with readelf --debug-dump) but gdb still finds the correct definition? There is something wrong in Debug::DWARF when searching the file:line:column for a backtrace address in the built matrix.

OK I checked and it sure seems like we're parsing it right, the same way as objdump does.

The thing that is not clear to me is that sometimes there are opcodes right next to each other, like this:

  [0x0000048c]  Special opcode 72: advance Address by 5 to 0x1000015b0 and Line by -3 to 0
  [0x0000048d]  Special opcode 78: advance Address by 5 to 0x1000015b5 and Line by 3 to 3

One like this was what was being read and used to determine a location of "line 0" for the original example.
I also noticed it was "surrounded" by an is_stmt being 0:

$ objdump --dwarf test
...
  [0x00000488]  Set is_stmt to 0
  [0x00000489]  Advance PC by 88 to 0x1000015ab
  [0x0000048b]  Special opcode 5: advance Address by 0 to 0x1000015ab and Line by 0 to 3
  [0x0000048c]  Special opcode 72: advance Address by 5 to 0x1000015b0 and Line by -3 to 0
  [0x0000048d]  Special opcode 78: advance Address by 5 to 0x1000015b5 and Line by 3 to 3
  [0x0000048e]  Set column to 1
  [0x00000490]  Special opcode 33: advance Address by 2 to 0x1000015b7 and Line by 0 to 3
  [0x00000491]  Set is_stmt to 1

which makes me really wonder what the purpose of is_stmt is...

http://dwarfstd.org/doc/DWARF4.pdf section 6.2.4.6
Seems to imply that the purpose of is_stmt is to specify hints to the debugger so that it knows where to set them.

It seems imply that if is_stmt isn't set, it doesn't get added to the matrix[?], but isn't perfectly clear either...other parts of that doc seem to imply that each line should have at least one "is_stmt" on it.

http://dwarfstd.org/doc/dwarf-2.0.0.pdf seems to imply that is_stmt is set "one to represent a statement boundary"

http://stanshebs.github.io/gdb-doxy-test/gdb-xref/dwarf2read_8c_source.html line 16787 seems to show that gdb doesn't add ops to the matrix.

So my hunch is that if is_stmt isn't set that they're not "reliable for a backtrace" for whatever reason (maybe they represent code that's not actually on any line of the file so is_stmt is off).

I tried it with only adopting sequences that are is_stmt, like gdb seems to do:

diff --git a/src/debug/dwarf/line_numbers.cr b/src/debug/dwarf/line_numbers.cr
index b4a11139b..150e4e37c 100644
--- a/src/debug/dwarf/line_numbers.cr
+++ b/src/debug/dwarf/line_numbers.cr
@@ -284,7 +284,9 @@ module Debug
             operation_advance = adjusted_opcode // sequence.line_range
             increment_address_and_op_index(operation_advance)
             registers.line &+= sequence.line_base + (adjusted_opcode % sequence.line_range)
-            register_to_matrix(sequence, registers)
+            if (registers.is_stmt)
+              register_to_matrix(sequence, registers)
+            end
             registers.reset
           elsif opcode == 0
             # extended opcode
@@ -319,7 +321,9 @@ module Debug

             case standard_opcode
             when LNS::Copy
-              register_to_matrix(sequence, registers)
+              if (registers.is_stmt)
+                register_to_matrix(sequence, registers)
+              end
               registers.reset
             when LNS::AdvancePc
               operation_advance = DWARF.read_unsigned_leb128(@io)

And it started getting the line numbers right. But for whatever reason with macros it wasn't getting the columns right now (always column "1" now).

So I broadened the scope a bit, following seems to work:

diff --git a/src/debug/dwarf/line_numbers.cr b/src/debug/dwarf/line_numbers.cr
index b4a11139b..790748d88 100644
--- a/src/debug/dwarf/line_numbers.cr
+++ b/src/debug/dwarf/line_numbers.cr
@@ -284,7 +284,9 @@ module Debug
             operation_advance = adjusted_opcode // sequence.line_range
             increment_address_and_op_index(operation_advance)
             registers.line &+= sequence.line_base + (adjusted_opcode % sequence.line_range)
-            register_to_matrix(sequence, registers)
+            if (registers.is_stmt || (registers.line.to_i > 0 && registers.column.to_i > 0))
+              register_to_matrix(sequence, registers)
+            end
             registers.reset
           elsif opcode == 0
             # extended opcode
@@ -318,8 +320,10 @@ module Debug
             standard_opcode = LNS.new(opcode)

             case standard_opcode
-            when LNS::Copy
-              register_to_matrix(sequence, registers)
+            when LNS::Copy
+              if (registers.is_stmt || (registers.line.to_i > 0 && registers.column.to_i > 0))
+                register_to_matrix(sequence, registers)
+              end
               registers.reset
             when LNS::AdvancePc
               operation_advance = DWARF.read_unsigned_leb128(@io)

A little odd but seems to work for the few related issues I could find here (OS X and Linux)...

Please open a PR!

OK after asking on llvm-dev they said

The DWARF spec says line 0 means there is no relevant source location.
Sometimes it means an optimization combined operations from multiple
source locations, and it would be misleading to pick one arbitrarily;
sometimes it means the instruction came from an "artificial" bit of
source, for example a default constructor in C++, that really has no
worthwhile source location.

So I think we may have done it well after all :)

Was this page helpful?
0 / 5 - 0 ratings