Elixir: Code.string_to_quoted raises exception instead of returning error

Created on 27 Jan 2018 · 16 Comments · Source: elixir-lang/elixir

Environment

Elixir & Erlang versions (elixir --version):

Erlang/OTP 20 [erts-9.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:10] [hipe] [kernel-poll:false]

Elixir 1.6.0 (compiled with OTP 20)

Operating system:
Arch Linux

Current behavior

iex(10)> "\"\\x\"" |> Code.string_to_quoted
** (ArgumentError) missing hex sequence after \x, expected \xHH
    (elixir) src/elixir_interpolation.erl:155: :elixir_interpolation.unescape_hex/3
    (elixir) src/elixir_interpolation.erl:81: :elixir_interpolation."-unescape_tokens/2-lc$^0/1-0-"/2
    (elixir) src/elixir_tokenizer.erl:580: :elixir_tokenizer.handle_strings/6
    (elixir) lib/code.ex:568: Code.string_to_quoted/2

Expected behavior

Other syntax errors give a result of {:error, _}

Elixir Bug Intermediate

Source

schnittchen

Most helpful comment

All of those cases have been tackled! Thank you @schnittchen and @drincruz!

josevalim on 9 Jul 2018


All 16 comments

Very similarly:

** (ArgumentError) invalid Unicode sequence after \u, expected \uHHHH or \u{H*}
    (elixir) src/elixir_interpolation.erl:182: :elixir_interpolation.unescape_unicode/3
    (elixir) src/elixir_interpolation.erl:81: :elixir_interpolation."-unescape_tokens/2-lc$^0/1-0-"/2
    (elixir) src/elixir_tokenizer.erl:580: :elixir_tokenizer.handle_strings/6
    (elixir) lib/code.ex:568: Code.string_to_quoted/2

schnittchen on 27 Jan 2018

I'm fuzzing code with StreamData and rescued those two cases. It looks like there is nothing more coming.

schnittchen on 27 Jan 2018
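
A rough sketch of the kind of check described above, written without StreamData so that it runs as a plain script. The module ParserFuzz and its functions are hypothetical names, and the random ASCII generator is only a stand-in for StreamData's string(:ascii): it does not shrink failing inputs, so any hit still has to be reduced by hand.

```elixir
defmodule ParserFuzz do
  # Hypothetical helper module; not part of Elixir or StreamData.

  # Builds a random string of printable ASCII characters (codes 32..126).
  def random_ascii(max_len) do
    len = :rand.uniform(max_len)
    for _ <- 1..len, into: "", do: <<Enum.random(32..126)>>
  end

  # Parses the source and reports whether the parser returned a tuple
  # or raised an exception instead.
  def classify(source) do
    try do
      case Code.string_to_quoted(source) do
        {:ok, _ast} -> :ok
        {:error, _reason} -> :error
      end
    rescue
      exception -> {:raised, exception.__struct__}
    end
  end

  # Returns {exception_module, source} for every input that raised.
  # Keep `runs` modest: parsing random input can create new atoms.
  def run(runs, max_len \\ 40) do
    for _ <- 1..runs,
        source = random_ascii(max_len),
        {:raised, kind} <- [classify(source)],
        do: {kind, source}
  end
end

IO.inspect(ParserFuzz.run(200), label: "inputs that raised")
```

An empty list means every generated input came back as {:ok, _} or {:error, _}, which is the behavior this issue asks for.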

Here is another one:

iex(23)> ".\"\#{}\"" |> Code.string_to_quoted 
** (CaseClauseError) no case clause matching: {1, 7, [{{1, 3, 1}, []}], []}
    (elixir) src/elixir_tokenizer.erl:636: :elixir_tokenizer.handle_dot/6
    (elixir) lib/code.ex:568: Code.string_to_quoted/2

schnittchen on 27 Jan 2018

I have seen at least two more now (it's a bit tricky because I have to reduce by hand). When I find one, I filter it out by the top stacktrace function (to protect my own code against failures when fuzzing).

Shall I continue posting examples?

schnittchen on 27 Jan 2018

@schnittchen yes, please post here, we will organize it somehow later. :)

josevalim on 28 Jan 2018

I seriously bumped up the max_runs and the test timeout and only once found a problem.

Here's the stacktrace:

         ** (SystemLimitError) a system limit has been reached
     code: check all snippet <- string(:ascii), max_runs: 80000 do
     stacktrace:
       :erlang.binary_to_atom("@GR{+z]`_XrNla!d<GTZ]iw[s'l2N<5hGD0(.xh&}>0ptDp(amr.oS&<q(FA)5T3=},^{=JnwIOE*DPOslKV KF-kb7NF&Y#Lp3D7l/!s],^hnz1iB |E8~Y'-Rp&*E(O}|zoB#xsE.S/~~'=%H'2HOZu0PCfz6j=eHq5:yk{7&|}zeRONM+KWBCAUKWFw(tv9vkHTu#Ek$&]Q:~>,UbT}v$L|rHHXGV{;W!>avHbD[T-G5xrzR6m?rQPot-37B@", :utf8)
       (elixir) src/elixir_parser.yrl:876: :elixir_parser.build_quoted_atom/3
       (elixir) src/elixir_parser.yrl:271: :elixir_parser.yeccpars2_50/7
       (elixir) /usr/lib/erlang/lib/parsetools-2.1.6/include/yeccpre.hrl:57: :elixir_parser.yeccpars0/5
       (elixir) src/elixir.erl:284: :elixir.tokens_to_quoted/3

However,

iex(57)> ":" <> String.duplicate("foo", 100) |> Code.string_to_quoted
{:error,
 {1, "atom length must be less than system limit: ",
  ":foofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoo"}}

The other case that I thought I had seen must have come either from a copy-paste error or from the fact that

string |> inspect |> Code.string_to_quoted! == string

is not always true.

schnittchen on 28 Jan 2018

@josevalim I tried looking into this, and it looks like at least one error comes from the fact that we raise directly in elixir_interpolation:unescape_tokens and unescape_chars. The tokenizer returns errors as tuples, but it calls these functions, which raise. We also call them in some other places (Kernel and Macro), and I am not sure that we should stop raising there. Suggestions on the approach?

whatyouhide on 13 Feb 2018

Yeah, we will have to make the Erlang code not raise and move the error logic up.


josevalim on 13 Feb 2018
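
One way to picture the approach discussed above, as a minimal sketch and not the actual elixir_interpolation code: the low-level function returns a tagged tuple, and only the callers that still want an exception raise. The module EscapeSketch and the functions hex/1 and hex!/1 are hypothetical names used for illustration only.

```elixir
defmodule EscapeSketch do
  # Hypothetical module; it only illustrates "return tuples, raise at the edge".
  @message "missing hex sequence after \\x, expected \\xHH"

  # Non-raising version: a tokenizer-like caller can turn the error
  # tuple into its own {:error, _} result.
  def hex(<<h1, h2, rest::binary>>) do
    with {:ok, d1} <- digit(h1),
         {:ok, d2} <- digit(h2) do
      {:ok, d1 * 16 + d2, rest}
    else
      :error -> {:error, @message}
    end
  end

  def hex(_too_short), do: {:error, @message}

  # Raising wrapper for callers that should keep the old behavior.
  def hex!(binary) do
    case hex(binary) do
      {:ok, value, rest} -> {value, rest}
      {:error, message} -> raise ArgumentError, message
    end
  end

  # Converts one hexadecimal digit character to its numeric value.
  defp digit(c) when c in ?0..?9, do: {:ok, c - ?0}
  defp digit(c) when c in ?a..?f, do: {:ok, c - ?a + 10}
  defp digit(c) when c in ?A..?F, do: {:ok, c - ?A + 10}
  defp digit(_), do: :error
end

IO.inspect(EscapeSketch.hex("41bc"))
IO.inspect(EscapeSketch.hex("zz"))
```

With this split, the question raised about Kernel and Macro becomes a choice of which function each call site uses, rather than a change in what the shared code does.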

Is anyone doing it? I can do it if not.

kelvinst on 13 Jun 2018

@kelvinst please go ahead!

josevalim on 13 Jun 2018

@kelvinst ping :)

josevalim on 21 Jun 2018

Hello @kelvinst @schnittchen @whatyouhide
I am just following this issue, and I am not sure how much I can fix or help.

This is what I understand.

The problem appears when I run "\x" |> Code.string_to_quoted in iex:

iex(19)> "\x" |> Code.string_to_quoted
** (ArgumentError) missing hex sequence after \x, expected \xHH

By contrast, "\d" |> Code.string_to_quoted returns a proper error tuple:

iex(19)> "\d" |> Code.string_to_quoted
{:error, {1, "unexpected token: ", "\"\d\" (column 1, codepoint U+007F)"}}

So I think a change is needed somewhere in the file
lib/elixir/src/elixir_interpolation.erl
in a case statement like this one:

unescape_chars(<<$\\, $x, Rest/binary>>, Map, Acc) ->
  case Map(hex) of
    true  -> unescape_hex(Rest, Map, Acc);
    false -> unescape_chars(Rest, Map, <<Acc/binary, $\\, $x>>)
  end;

Any thoughts or feedback?

sandeshsoni on 25 Jun 2018

The "\x" |> Code.string_to_quoted error is actually not happening inside the string_to_quoted call. The literal "\x" itself fails first, since it is not valid string syntax, so the parser is never called.

michalmuskala on 25 Jun 2018
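
To get the backslash through to the parser, the source has to be escaped as in the original report ("\"\\x\"") or written with the ~S sigil, which turns off escape processing. A small sketch of that, which makes no claim about which result a given Elixir version produces:

```elixir
# ~S disables escape processing, so this binary holds the four
# characters "\x" (quote, backslash, x, quote), the same as "\"\\x\"".
source = ~S("\x")

result =
  try do
    Code.string_to_quoted(source)
  rescue
    # The behavior reported in this issue: the parser raises.
    exception in ArgumentError -> {:raised, Exception.message(exception)}
  end

IO.inspect(source, label: "source passed to the parser")
IO.inspect(result, label: "result")
```

A {:raised, _} result reproduces the report, while an {:error, _} result is the behavior the issue asks for.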

Sorry guys! I didn’t get time to work on that yet; I have only just started to check the code and how it could be done. If anyone has an idea and wants to try it out, just go ahead! :)


kelvinst on 28 Jun 2018

Greetings everyone! I've been taking a look at this issue and have mainly been focusing on the hex issues so far. I believe I've gotten that part squared away (#7809). If that goes in, I can do more follow-ups on the other issues with Unicode and dot (hopefully)! 👍