Rubocop: AbcSize increase/investigation

Created on 8 Aug 2020 · 4Comments · Source: rubocop-hq/rubocop

After updating to 0.88 I noticed a lot of increases in CyclomaticComplexity in our codebase (= lots of new offenses); then, after 0.89, AbcSize and PerceivedComplexity grew drastically, too.
(I believe that it is not a good thing either way: after updating of large-ish codebase to Rubocop you suddenly find dozens of new offenses, and urged either to raise thresholds which felt previously "reasonable", or refactor a lot of code.)

In order to understand what exactly the numbers mean, and how to tackle complexity the best way, I wrote a draft of "expalnatory analyzer" script (ugly!): https://gist.github.com/zverok/5e858fac96a2e9f5282bbeab8fec2d39 -- which is just copy-pasted AbcSizeCalculator which adds meta-info about which node caused increase in one of A,B,C metrics. Now, the result for one method:

  def setup_timecodes(interval)
    words = paragraphs.flat_map(&:words)

    # Clear previous, so exporter could be run with different parameters several time.
    words.each { |w| w.timecodes = [] }
    return unless interval

    current = job.timecode || 0
    words.first.timecodes << current
    words.each do |w|
      next if w.punct?
      # if the pause is large, there could be several interval marks before the current word
      while current + interval <= w.time.to_i
        current += interval
        w.timecodes << current
      end
    end
  end

"Intuitively", it is "not good, not terrible" (YMMV), and Rubocop on our previous setting (AbcSize Max 17) was happy with it. After 0.89, Rubocop says 19.6. Here is the script's output with the investigation (note that line is repeated if several nodes of this line caused metric increase):

Outcome: sqrt(8 assignments² + 16 branches² + 8 conditions²) = 19.6

  def setup_timecodes(interval)
#                     ^^^^^^^^  +assignment
    words = paragraphs.flat_map(&:words)
#   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  +assignment
    words = paragraphs.flat_map(&:words)
#           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^  +branch
    words = paragraphs.flat_map(&:words)
#           ^^^^^^^^^^  +branch
    words = paragraphs.flat_map(&:words)
#                               ^^^^^^^  +condition
    words.each { |w| w.timecodes = [] }
#   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  +condition
    words.each { |w| w.timecodes = [] }
#   ^^^^^^^^^^  +branch, +branch
    words.each { |w| w.timecodes = [] }
#                 ^  +assignment, +assignment
    words.each { |w| w.timecodes = [] }
#                    ^^^^^^^^^^^^^^^^  +assignment, +branch
    return unless interval
#   ^^^^^^^^^^^^^^^^^^^^^^  +condition
    current = job.timecode || 0
#   ^^^^^^^^^^^^^^^^^^^^^^^^^^^  +assignment
    current = job.timecode || 0
#             ^^^^^^^^^^^^^^^^^  +condition
    current = job.timecode || 0
#             ^^^^^^^^^^^^  +branch
    current = job.timecode || 0
#             ^^^  +branch
    words.first.timecodes << current
#   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  +branch
    words.first.timecodes << current
#   ^^^^^^^^^^^^^^^^^^^^^  +branch
    words.first.timecodes << current
#   ^^^^^^^^^^^  +branch
    words.each do |w|
#   ^^^^^^^^^^^^^^^^^  +condition
      next if w.punct?
#     ^^^^^^^^^^^^^^^^  +condition
      next if w.punct?
#             ^^^^^^^^  +branch
      while current + interval <= w.time.to_i
#     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  +condition
      while current + interval <= w.time.to_i
#           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  +condition (comparison)
      while current + interval <= w.time.to_i
#           ^^^^^^^^^^^^^^^^^^  +branch
      while current + interval <= w.time.to_i
#                                 ^^^^^^^^^^^  +branch
      while current + interval <= w.time.to_i
#                                 ^^^^^^  +branch
        current += interval
#       ^^^^^^^^^^^^^^^^^^^  +assignment
        current += interval
#       ^^^^^^^  +assignment
        w.timecodes << current
#       ^^^^^^^^^^^^^^^^^^^^^^  +branch
        w.timecodes << current
#       ^^^^^^^^^^^  +branch

I am not 100% sure script output always points to "right" nodes, but I really hope so... Here, I can see several suspicious things (repeating the same output with additional comments):

Outcome: sqrt(8 assignments² + 16 branches² + 8 conditions²) = 19.6

  def setup_timecodes(interval)
#                     ^^^^^^^^  +assignment
    words = paragraphs.flat_map(&:words)
#   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  +assignment
    words = paragraphs.flat_map(&:words)
#           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^  +branch
    words = paragraphs.flat_map(&:words)
#           ^^^^^^^^^^  +branch
    words = paragraphs.flat_map(&:words)
#                               ^^^^^^^  +condition -- is it?
    words.each { |w| w.timecodes = [] }
#   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  +condition -- is it?
    words.each { |w| w.timecodes = [] }
#   ^^^^^^^^^^  +branch, +branch
    words.each { |w| w.timecodes = [] }
#                 ^  +assignment, +assignment -- two assignments?..
    words.each { |w| w.timecodes = [] }
#                    ^^^^^^^^^^^^^^^^  +assignment, +branch
    return unless interval
#   ^^^^^^^^^^^^^^^^^^^^^^  +condition
    current = job.timecode || 0
#   ^^^^^^^^^^^^^^^^^^^^^^^^^^^  +assignment
    current = job.timecode || 0
#             ^^^^^^^^^^^^^^^^^  +condition
    current = job.timecode || 0
#             ^^^^^^^^^^^^  +branch
    current = job.timecode || 0
#             ^^^  +branch
    words.first.timecodes << current
#   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  +branch
    words.first.timecodes << current
#   ^^^^^^^^^^^^^^^^^^^^^  +branch
    words.first.timecodes << current
#   ^^^^^^^^^^^  +branch
    words.each do |w|
#   ^^^^^^^^^^^^^^^^^  +condition -- is it?
      next if w.punct?
#     ^^^^^^^^^^^^^^^^  +condition
      next if w.punct?
#             ^^^^^^^^  +branch
      while current + interval <= w.time.to_i
#     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  +condition
      while current + interval <= w.time.to_i
#           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  +condition (comparison)
      while current + interval <= w.time.to_i
#           ^^^^^^^^^^^^^^^^^^  +branch
      while current + interval <= w.time.to_i
#                                 ^^^^^^^^^^^  +branch
      while current + interval <= w.time.to_i
#                                 ^^^^^^  +branch
        current += interval
#       ^^^^^^^^^^^^^^^^^^^  +assignment
        current += interval
#       ^^^^^^^  +assignment -- umm, two assignments in one expression?..
        w.timecodes << current
#       ^^^^^^^^^^^^^^^^^^^^^^  +branch
        w.timecodes << current
#       ^^^^^^^^^^^  +branch

I am not 100% sure ANY of the suspicious things shown above are some bugs, but they are... well, _suspicious_ :) And the increase throughout the codebase bothers me, making me _even more suspicious_.

(Anyway, I believe that my "explainer" is a useful excercise that might one day grow into a gem. But I'll be happy if somebody could explain the weird increases)

Source

zverok

Most helpful comment

I am still a bit surprised that each is counted as "condition",

It's definitely debatable (and it was debated too 😆). There is no perfect measure, and Ruby in particular is troublesome as simple attribute readers need to be counted as "branches", while if Ruby could be compiled that could be inlined and not count. The threads give the reasoning for each, but for AbcSize the idea is that each is basically a for and should count as much. The CyclomaticComplexity, it's a disjoint component so increases by +1.

The main issue Goodhart's law.

I know these revised metrics are more accurate and hopefully will fit your coding style even better than the previous one, albeit with increased defaults. It's just Metrics inflation 😅

marcandre on 8 Aug 2020

👍2 😄1

All 4 comments

Maybe somebody will find this more readable: update script, which groups all line increments: https://gist.github.com/zverok/2f2a55c99f6093928103d58c1c149146 -- and its result:

Outcome: sqrt(8 assignments² + 16 branches² + 8 conditions²) = 19.6

  def setup_timecodes(interval)
#                     ^^^^^^^^            +assignment
    words = paragraphs.flat_map(&:words)
#                               ^^^^^^^   +condition
#           ^^^^^^^^^^                    +branch
#           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^  +branch
#   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  +assignment
    words.each { |w| w.timecodes = [] }
#                    ^^^^^^^^^^^^^^^^     +branch
#                    ^^^^^^^^^^^^^^^^     +assignment
#                 ^                       +assignment
#   ^^^^^^^^^^                            +branch
#   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^   +condition
    return unless interval
#   ^^^^^^^^^^^^^^^^^^^^^^                +condition
    current = job.timecode || 0
#             ^^^                         +branch
#             ^^^^^^^^^^^^                +branch
#             ^^^^^^^^^^^^^^^^^           +condition
#   ^^^^^^^^^^^^^^^^^^^^^^^^^^^           +assignment
    words.first.timecodes << current
#   ^^^^^^^^^^^                           +branch
#   ^^^^^^^^^^^^^^^^^^^^^                 +branch
#   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^      +branch
    words.each do |w|
#                  ^                      +assignment
#   ^^^^^^^^^^                            +branch
#   ^^^^^^^^^^^^^^^^^                     +condition
      next if w.punct?
#             ^^^^^^^^                    +branch
#     ^^^^^^^^^^^^^^^^                    +condition
      while current + interval <= w.time.to_i
#                                 ^^^^^^  +branch
#                                 ^^^^^^^^^^^ +branch
#           ^^^^^^^^^^^^^^^^^^            +branch
#           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +condition (comparison)
#     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +condition
        current += interval
#       ^^^^^^^                           +assignment
#       ^^^^^^^^^^^^^^^^^^^               +assignment
        w.timecodes << current
#       ^^^^^^^^^^^                       +branch
#       ^^^^^^^^^^^^^^^^^^^^^^            +branch

zverok on 8 Aug 2020

I'm sorry for the trouble. These Metrics cops had many issues. Hopefully they were all fixes at once, so as to create transition issues only once...

Relevant changelog entries:

0.89.0

#8037: (Breaking) Cop Metrics/AbcSize now counts ||=, &&=, multiple assignments, for, yield, iterating blocks. &. now count as conditions too (unless repeated on the same variable). Default bumped from 15 to 17. Consider using rubocop -a --disable-uncorrectable to ease transition. ([@marcandre][])
#8276: Cop Metrics/CyclomaticComplexity not longer counts &. when repeated on the same variable. ([@marcandre][])
#8204: (Breaking) Cop Metrics/PerceivedComplexity now counts else in case statements, &., ||=, &&= and blocks known to iterate. Default bumped from 7 to 8. Consider using rubocop -a --disable-uncorrectable to ease transition. ([@marcandre][])

0.86.0

#8149: (Breaking) Cop Metrics/CyclomaticComplexity now counts &., ||=, &&= and blocks known to iterate. Default bumped from 6 to 7. Consider using rubocop -a --disable-uncorrectable to ease transition. ([@marcandre][])

The PRs have additional discussions. Let me know if you have particular issues with some of the calculations; #8276 is an example of further tweak I made after some good feedback.

marcandre on 8 Aug 2020

@marcandre Oh, I didn't do my homework, it seems :) (I typically follow Rubocop development closely, but lately was on vacation and out of internetz... Should've checked changelogs more thoroughly.) Anyways, it was not an angry "you broke my stuff" ticket, rather curious "I wonder how it works" ticket.

Your links make sense to me, I'll just raise thresholds a bit for now in our codebase (and probably will work more on my "explainer" script, I kinda love the idea).

I am still a bit surprised that each is counted as "condition", but probably it makes sense too (and doesn't make much difference, whether it would be counted as "branch", say, it would still have the same effect)

zverok on 8 Aug 2020

👍1

I am still a bit surprised that each is counted as "condition",

The main issue Goodhart's law.

I know these revised metrics are more accurate and hopefully will fit your coding style even better than the previous one, albeit with increased defaults. It's just Metrics inflation 😅

marcandre on 8 Aug 2020

👍2 😄1

Was this page helpful?

0 / 5 - 0 ratings