For example, running linguist on this file throws an invalid byte sequence error:
$ wget https://raw.github.com/leoniedu/CongressoAberto/a4785785cb37e8095893dc411f0a030a57fd30f8/CongressoAbertoWP/wp-includes/js/swfupload/swfupload.js
$ linguist swfupload.js
/Users/orii/.rvm/gems/ruby-1.9.3-p286/gems/github-linguist-2.4.0/lib/linguist/blob_helper.rb:209:in `split': invalid byte sequence in UTF-8 (ArgumentError)
from /Users/orii/.rvm/gems/ruby-1.9.3-p286/gems/github-linguist-2.4.0/lib/linguist/blob_helper.rb:209:in `lines'
from /Users/orii/.rvm/gems/ruby-1.9.3-p286/gems/github-linguist-2.4.0/lib/linguist/blob_helper.rb:240:in `loc'
from /Users/orii/.rvm/gems/ruby-1.9.3-p286/gems/github-linguist-2.4.0/bin/linguist:24:in `<top (required)>'
from /Users/orii/.rvm/gems/ruby-1.9.3-p286/bin/linguist:23:in `load'
from /Users/orii/.rvm/gems/ruby-1.9.3-p286/bin/linguist:23:in `<main>'
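For anyone who wants to reproduce this without downloading the file, here is a minimal sketch. It assumes the failure comes from running a regex `split` on a string that is labelled UTF-8 but contains bytes that are not valid UTF-8. The sample bytes and the split pattern are my own illustration, not copied from linguist.

```ruby
# Bytes that are not valid UTF-8: 0xE9 is "é" in ISO-8859-1.
# `.b` returns a binary copy, which is then relabelled as UTF-8.
data = "caf\xE9\nsecond line\n".b.force_encoding(Encoding::UTF_8)

puts data.valid_encoding? # prints "false"

# Assumption: a line split similar to the one in the stack trace above.
error = nil
begin
  data.split(/\r\n|\r|\n/, -1)
rescue ArgumentError => e
  error = e
end
puts error.message if error
```

On the Ruby versions reported in this thread, the `split` call should raise the same `ArgumentError` as in the trace.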
I can confirm this error.
Is this still a bug?
Same thing for me...
/Users/axw2/.rvm/gems/ruby-2.0.0-p247/gems/github-linguist-2.9.5/lib/linguist/generated.rb:41:in `split': invalid byte sequence in UTF-8 (ArgumentError)
from /Users/axw2/.rvm/gems/ruby-2.0.0-p247/gems/github-linguist-2.9.5/lib/linguist/generated.rb:41:in `lines'
from /Users/axw2/.rvm/gems/ruby-2.0.0-p247/gems/github-linguist-2.9.5/lib/linguist/generated.rb:100:in `compiled_coffeescript?'
from /Users/axw2/.rvm/gems/ruby-2.0.0-p247/gems/github-linguist-2.9.5/lib/linguist/generated.rb:56:in `generated?'
from /Users/axw2/.rvm/gems/ruby-2.0.0-p247/gems/github-linguist-2.9.5/lib/linguist/generated.rb:12:in `generated?'
from /Users/axw2/.rvm/gems/ruby-2.0.0-p247/gems/github-linguist-2.9.5/lib/linguist/blob_helper.rb:277:in `generated?'
from /Users/axw2/.rvm/gems/ruby-2.0.0-p247/gems/github-linguist-2.9.5/lib/linguist/repository.rb:74:in `block in compute_stats'
from /Users/axw2/.rvm/gems/ruby-2.0.0-p247/gems/github-linguist-2.9.5/lib/linguist/repository.rb:69:in `each'
from /Users/axw2/.rvm/gems/ruby-2.0.0-p247/gems/github-linguist-2.9.5/lib/linguist/repository.rb:69:in `compute_stats'
from /Users/axw2/.rvm/gems/ruby-2.0.0-p247/gems/github-linguist-2.9.5/lib/linguist/repository.rb:43:in `languages'
from /Users/axw2/.rvm/gems/ruby-2.0.0-p247/gems/github-linguist-2.9.5/bin/linguist:14:in `<top (required)>'
from /Users/axw2/.rvm/gems/ruby-2.0.0-p247/bin/linguist:23:in `load'
from /Users/axw2/.rvm/gems/ruby-2.0.0-p247/bin/linguist:23:in `<main>'
I can confirm this also...
I received this error as well after manually running linguist on my repo.
I can still reproduce this with ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux] on Debian wheezy.
This seems to be the same as https://github.com/github/linguist/issues/241
I had the same error several times (last time on pdt-git/public).
I tried using force_encoding() and encode() on line 58 without effect.
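A possible reason `force_encoding()` has no effect: it only changes the encoding label and leaves the bytes untouched, so invalid bytes stay invalid. One workaround, as a sketch only, is to replace the invalid bytes with `String#scrub` (Ruby 2.1 or newer) before splitting. `safe_lines` is a hypothetical helper I made up for illustration. It is not part of linguist, and I am not claiming this is how the bug should be fixed there.

```ruby
# Hypothetical helper, not part of linguist: split data into lines without
# raising on invalid UTF-8. String#scrub requires Ruby 2.1 or newer.
def safe_lines(data)
  # Work on a copy so the caller's string is not modified.
  text = data.dup.force_encoding(Encoding::UTF_8)
  # Replace each invalid byte sequence with U+FFFD (the replacement character).
  text = text.scrub("\uFFFD") unless text.valid_encoding?
  text.split(/\r\n|\r|\n/, -1)
end

raw = "caf\xE9\nsecond line".b # 0xE9 is not valid UTF-8 on its own
lines = safe_lines(raw)
puts lines.length
```

The trade-off is that scrubbing loses the original bytes, which is acceptable for counting lines but not for anything that has to round-trip the content.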
Confirmed this is still an issue on 1.9.3-p484
I'm going to close this. Ruby 2.0 is nearly two years old now and I just don't see us investigating this any time soon, sorry.
If anyone else wants to take a stab at this then please be my guest :smile:
@arfon I get this error on the original file of this post with Ruby 2.2:
$ ruby --version
ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-linux]
Well then.
I have also encountered this many times using ruby 2.2. Here is a recent stack trace:
$ linguist swfupload.js
/var/lib/gems/2.2.0/gems/github-linguist-4.2.7/lib/linguist/blob_helper.rb:266:in `split': invalid byte sequence in UTF-8 (ArgumentError)
from /var/lib/gems/2.2.0/gems/github-linguist-4.2.7/lib/linguist/blob_helper.rb:266:in `lines'
from /var/lib/gems/2.2.0/gems/github-linguist-4.2.7/lib/linguist/blob_helper.rb:283:in `loc'
from /var/lib/gems/2.2.0/gems/github-linguist-4.2.7/bin/linguist:51:in `<top (required)>'
from /usr/local/bin/linguist:23:in `load'
from /usr/local/bin/linguist:23:in `<main>'
I can still confirm this error.
When I run the following command, the program crashes:
github-linguist Inderxer.asp.txt
BTW, the encoding of Inderxer.asp.txt is GB2312.
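To illustrate why a GB2312 file can trigger this, here is a small sketch with made-up sample bytes (not taken from Inderxer.asp.txt). The same bytes are invalid when read as UTF-8, but convert cleanly once the real source encoding is declared. It assumes your Ruby build ships the GB2312 transcoder, which the standard builds I know of do.

```ruby
# Two Chinese characters encoded as GB2312; these four bytes are not valid UTF-8.
raw = "\xD6\xD0\xCE\xC4".b

# Relabelling as UTF-8 does not change the bytes, so the string is invalid.
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding? # prints "false"

# Declare the real source encoding, then transcode to UTF-8.
converted = raw.dup.force_encoding("GB2312").encode(Encoding::UTF_8)
puts converted.valid_encoding?
```

This only helps when the source encoding is known in advance. Detecting it automatically is a separate problem that this sketch does not address.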
This has been resolved by https://github.com/github/linguist/pull/4730 which is now live on GitHub.com. Closing.