Rubocop: Remove (disable) Perfomace/Casecmp (at least for Ruby 2.4)

Created on 14 Apr 2017 · 13Comments · Source: rubocop-hq/rubocop

Expected behavior

No warnings for:

'АБВГ'.downcase == 'абвг'

Actual behavior

Warning: Use casecmp instead of downcase ==.

Explanation

The String#casecmp method does not work with Unicode (even in Ruby 2.4.1), this is written in the documentation.

There is a method String#casecmp?, which works with Unicode, but it is slower:

Warming up --------------------------------------
String#downcase + ==   233.440k i/100ms
      String#casecmp   274.247k i/100ms
     String#casecmp?   219.906k i/100ms
Calculating -------------------------------------
String#downcase + ==      5.746M (± 1.7%) i/s -     28.947M in   5.039252s
      String#casecmp      6.942M (± 1.8%) i/s -     34.829M in   5.019073s
     String#casecmp?      4.517M (± 2.6%) i/s -     22.650M in   5.017864s

Comparison:
      String#casecmp:  6941676.9 i/s
String#downcase + ==:  5745893.8 i/s - 1.21x  slower
     String#casecmp?:  4517314.3 i/s - 1.54x  slower

RuboCop version

$ rubocop -V
0.48.1 (using Parser 2.4.0.0, running on ruby 2.4.1 x86_64-linux)

Source

AlexWayfer

Most helpful comment

Style checks are not about speed, but about writing elegant code. From time to time I think the whole performance cops initiative was misguided, as the performance of certain things changes drastically between MRI revisions. Not to mention it might be completely different on JRuby and Rbx. One we finalize some extensions API, probably they would be moved out of RuboCop's core. Together with the Rails cops.

bbatsov on 15 Apr 2017

👍2 ❤1

All 13 comments

Here is the issue for fast-ruby, referenced by cop.

AlexWayfer on 14 Apr 2017

If you are using Ruby 2.4 then the new casecmp? predicate method works: 'АБВГ'.casecmp?('абвг') # => true

backus on 14 Apr 2017

@backus Yes, but it's only in Ruby 2.4, and it's slower than String#downcase + ==.

So, String#downcase + == is good compromise between Unicode support (but only in Ruby 2.4, again) and perfomance.

In Ruby < 2.4… We need to use unicode gem in any way :( And casecmp + zero? will be faster, yes.

AlexWayfer on 14 Apr 2017

I'd add a style check suggesting this use of casecmp? over downcase and == on 2.4. As for this particular issue - I see your point, but I think some people are finding it useful. Not sure what's the best course of action here.

bbatsov on 15 Apr 2017

I'd add a style check suggesting this use of casecmp? over downcase and == on 2.4.

casecmp? is slower than downcase + ==!

Not sure what's the best course of action here.

Ruby >= 2.4: casecmp doesn't work with Unicode, casecmp? is slower than downcase + ==, so downcase + == is good compromise.

Ruby < 2.4: No casecamp? method, and we're needing to use unicode gem (or any other external solution), which make code like this:

# frozen_string_literal: true

require 'benchmark/ips'

require 'unicode'

SLUG = 'АБВГ'

def downcase
  Unicode.downcase(SLUG) == 'абвг' # => true
end

def casecmp
  Unicode.downcase(SLUG).casecmp('абвг').zero? # => true
end

Benchmark.ips do |x|
  x.report('String#downcase + ==') { downcase }
  x.report('String#casecmp')       { casecmp }
  x.compare!
end

And that's results:

Warming up --------------------------------------
String#downcase + ==   121.382k i/100ms
      String#casecmp   110.585k i/100ms
Calculating -------------------------------------
String#downcase + ==      1.569M (± 1.6%) i/s -      7.890M in   5.031095s
      String#casecmp      1.396M (± 1.7%) i/s -      7.077M in   5.070289s

Comparison:
String#downcase + ==:  1568611.7 i/s
      String#casecmp:  1396273.2 i/s - 1.12x  slower

AlexWayfer on 15 Apr 2017

bbatsov on 15 Apr 2017

👍2 ❤1

Ruby >= 2.4: casecmp doesn't work with Unicode, casecmp? is slower than downcase + ==, so downcase + == is good compromise.

Many people don't really deal with non-ASCII strings, so this is not as big of an issue as it might seem. If everything you deal with is in English (quite common situation) than you're golden. If not - you should probably just disable this cop. So far you're the first person to complain about this, which leads to believe it's not a big problem in practice.

bbatsov on 15 Apr 2017

Style checks are not about speed, but about writing elegant code.

Yes, but I'm not talking about "Style checks", but about Performance/ cop :) And this cop doesn't reference to Ruby Style Guide, but to Fast Ruby.

Maybe this cop must be modified and moved to another scope (Style/?), and some description must be added to Style Guide.

From time to time I think the whole performance cops initiative was misguided, as the performance of certain things changes drastically between MRI revisions. Not to mention it might be completely different on JRuby and Rbx.

Good thoughts.

Many people don't really deal with non-ASCII strings, so this is not as big of an issue as it might seem. If everything you deal with is in English (quite common situation) than you're golden. If not - you should probably just disable this cop.

Okay, you're probably right.

AlexWayfer on 15 Apr 2017

From time to time I think the whole performance cops initiative was misguided, as the performance of certain things changes drastically between MRI revisions.

Yes. At least performance cops can not be "add and forget." The problem with language level optimisations in dynamic languages is that there are so many factors.

One we finalize some extensions API, probably they would be moved out of RuboCop's core.

Agreed. It should probably be handled by someone who has the interest to set up some performance test suite and run it for each new Ruby version.

Drenmi on 5 May 2017

$ rubocop -V
0.48.1 (using Parser 2.4.0.0, running on ruby 2.4.1 x86_64-linux)
@backus Yes, but it's only in Ruby 2.4, and it's slower than String#downcase + ==.

So, String#downcase + == is good compromise between Unicode support (but only in Ruby 2.4, again) and perfomance.

In Ruby < 2.4… We need to use unicode gem in any way :( And casecmp + zero? will be faster, yes.
I'd add a style check suggesting this use of casecmp? over downcase and == on 2.4.
casecmp? is slower than downcase + ==!

Not sure what's the best course of action here.
Ruby >= 2.4: casecmp doesn't work with Unicode, casecmp? is slower than downcase + ==, so downcase + == is good compromise.

Ruby < 2.4: No casecamp? method, and we're needing to use unicode gem (or any other external solution), which make code like this:

frozen_string_literal: true

require 'benchmark/ips'

require 'unicode'

SLUG = 'АБВГ'

def downcase
Unicode.downcase(SLUG) == 'абвг' # => true
end

def casecmp
Unicode.downcase(SLUG).casecmp('абвг').zero? # => true
end
Ruby >= 2.4: casecmp doesn't work with Unicode, casecmp? is slower than downcase + ==, so downcase + == is good compromise..ips do |x|
x.report('String#downcase + ==') { downcase }
x.report('String#casecmp') { casecmp }
x.compare!
end

2karuitach on 12 Jun 2017

Please…

AlexWayfer on 12 Jan 2018

There hasn't been much activity on this ticket and our Core Team is spread too thin on so many tasks, so I'll just close it for the sake of having a cleaner lists of tasks to focus on. We'd gladly review a PR, but it's unlikely that'd we're going to tackle this ourselves in the foreseeable future.

In general Performance cops are inherently linked to some MRI version, which is part of the reason we'll remove all of them from the core of RuboCop and move them to https://github.com/rubocop-hq/rubocop-performance.

bbatsov on 18 Sep 2018

There hasn't been much activity on this ticket and our Core Team is spread too thin on so many tasks, so I'll just close it for the sake of having a cleaner lists of tasks to focus on.

Issue still not resolved, and I think it should be opened. For cleaner lists of tasks you can use labels or GitHub Projects.

Closed undone issues will be forgotten or duplicated.