Crystal: String.each_line includes "\n" character

Created on 25 May 2016 · 10 Comments · Source: crystal-lang/crystal

I'm not sure if this is intended behaviour, but I believe the strings in the array returned by String.lines (which is built using String.each_line) shouldn't include "\n" characters.

"hello\nworld\nlast".lines # => ["hello\n", "world\n", "last"]
implemented stdlib

All 10 comments

Ruby does the same. Keeping the line ending lets you distinguish between \r\n and \n, if you ever need to.

[70] pry(main)> "foo\nbar\nbaz\n".lines
=> ["foo\n", "bar\n", "baz\n"]
[71] pry(main)> "foo\nbar\nbaz\n".each_line.to_a
=> ["foo\n", "bar\n", "baz\n"]
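To illustrate the point about telling \r\n and \n apart, here is a minimal Ruby sketch (Ruby rather than Crystal, since the session above is Ruby). The helper name line_ending is made up for this example and is not part of any library.

```ruby
# Hypothetical helper (not a stdlib method): classify how a line was terminated.
def line_ending(line)
  if line.end_with?("\r\n")
    :crlf
  elsif line.end_with?("\n")
    :lf
  else
    :none
  end
end

# String#lines keeps each terminator by default, so mixed endings stay visible.
text = "unix\nwindows\r\nlast"
endings = text.lines.map { |line| line_ending(line) }
p endings
```

If the terminators were stripped while splitting, this information would no longer be recoverable from the resulting array.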

It also makes more sense for the variant taking a parameter, in case we're ever going to support that.

[74] pry(main)> "foo\nbar\nbaz\n".lines("o")
=> ["fo", "o", "\nbar\nbaz\n"]
[75] pry(main)> "foo\nbar\nbaz\n".each_line("o").to_a
=> ["fo", "o", "\nbar\nbaz\n"]

I thought about it. "String.split" certainly does what I want, but "String.lines" is a better name.

This is something that I thought about many times: most of the time you need to use chomp afterwards and it's a bit more inefficient. We could maybe add an option to remove the newline, but just for efficiency (using chomp would have the same effect, only a bit slower).

One case I found where preserving the newline is useful is when parsing an HTTP::Request, the headers end when a "\r\n" line is found. Although... if gets discarded it, the condition could probably be "if the line is empty" (the current code also checks for "\n").
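As a rough illustration of the "if the line is empty" idea, here is a Ruby sketch that reads header lines with the terminator already removed. It assumes Ruby 2.4 or later for the chomp: keyword; parse_headers is a hypothetical helper written for this example and is not Crystal's HTTP::Request parser.

```ruby
# Hypothetical helper: collect "Name: value" pairs until the first empty line.
def parse_headers(raw)
  headers = {}
  # With chomp: true, both "\n" and "\r\n" are stripped from each line,
  # so the end of the header block shows up as an empty string.
  raw.each_line(chomp: true) do |line|
    break if line.empty?
    name, value = line.split(":", 2)
    next if value.nil? # skip lines without a colon
    headers[name.strip.downcase] = value.strip
  end
  headers
end

raw = "Host: example.com\r\nAccept: */*\r\n\r\nbody: ignored"
headers = parse_headers(raw)
p headers
```

With chomped lines the end-of-headers check no longer has to compare against both "\r\n" and "\n".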

I don't know if there are other cases where preserving the newline is needed. Maybe we could make gets discard it, and using gets(char) would preserve it, so you can do gets('\n') if you need it.

Pondering this thread, I've come to the following conclusion: I'd much rather have semantics where #lines and #each_line without arguments will drop \r\n or \n characters as in:

"hello\nworld\nlast".lines # => ["hello", "world", "last"]
"hello\r\nworld\r\nlast".lines # => ["hello", "world", "last"]

This will make handling DOS and Unix text files much easier and eliminate a lot of special-case handling for \r\n in every application that needs it.

#lines and #each_line with an argument would preserve that character, as in:

"hello\nworld\nlast".lines('\n') # => ["hello\n", "world\n", "last"]
"hello\r\nworld\r\nlast".lines('\n') # => ["hello\r\n", "world\r\n", "last"]

Ruby just added a chomp optional argument to many methods, for example IO#gets. We could do the same. I always wanted to read lines and automatically have them chomped. Invoking chomp is OK, but will create a new string, so the chomp option will improve performance a bit.
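For reference, a small Ruby sketch of the two approaches being compared, assuming Ruby 2.4 or later (where String#lines accepts chomp:). The variable names are only for illustration.

```ruby
text = "hello\nworld\nlast"

# Two steps: split into lines, then build a second, chomped string per line.
two_step = text.lines.map(&:chomp)

# One step: ask for the terminators to be dropped while splitting.
one_step = text.lines(chomp: true)

p two_step
p one_step
```

Both produce the same array; the second form simply skips the intermediate un-chomped strings.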

@asterite Is it that common to need an un-chomped string? Maybe chomp: true should be the default.

@RX14 Yes, I wouldn't mind it being true by default

On the other hand, chomp in Ruby also applies to the given character if passed. For example:

io = StringIO.new "hello\nfoo\nbar"
io.gets(chomp: true) # => "hello"
io.gets('o', chomp: true) # => "f"
io.gets("ar", chomp: true) # => "o\nb"

So I'm not sure chomp should be true by default. When passing a character it's usually to say "read up until this char, and keep it". Maybe with \n it's different, but I don't know if it's good to have the chomp flag being true by default for some cases and false for others.

On the other hand, passing a delimiter to gets isn't very common, so it could be true by default when not passing any delimiter.

Hey! This is already fixed in 0.20.5 (and I think a couple of versions before that too):

"hello\nworld\nlast".lines               # => ["hello", "world", "last"]
"hello\nworld\nlast".lines(chomp: false) # => ["hello\n", "world\n", "last"]