A few of our nightly string performance tests regressed after #12899:
https://chapel-lang.org/perf/chapcs/?startdate=2019/03/31&enddate=2019/05/30&graphs=arrayofstringelementaccess,stringtemporarycopies,searchwithinastring
This issue exists to track the progress of identifying the causes and improving performance.
@chapel-lang/perf-team, @dmk42: Here's an issue to track the recent string performance regressions in nightly testing.
@dmk42: The question that came up at the perf meeting today was whether these are things that must now inherently be slower due to UTF-8-by-default, or whether they were cases that simply had a slow initial implementation but could be improved, like iteration was for the first day or two.
Thanks. I will continue to be on the lookout for such things, and I will follow this issue.
To be clear, we're hoping you'll take the first look at these cases that did regress to help classify them as candidates for improvement or not (e.g., in the next sprint if there are not spare cycles for it in this one).
BTW, I am about to merge the ascii() deprecation. It includes some updates to some of the benchmarks for correctness in using string.byte() instead of ascii(), but there might be some minor performance improvements as a result. Unfortunately, it probably won't be much because the changes are mostly in initialization.
I'll assign this to myself to keep it visible to me. That doesn't mean others can't contribute.
The biggest performance loss is in test/types/string/psahabu/perf/substring.chpl where the loop labeled "substring" was O(n) and is now O(n**2). There doesn't seem to be much we can do about that, because the substring operation itself is now O(n) instead of O(1) due to codepoint indexing being the default.
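To see why codepoint indexing makes each substring operation O(n), here is an illustrative Python sketch (not the Chapel implementation): in a variable-width encoding like UTF-8 there is no way to jump directly to the i-th codepoint, so finding its byte offset requires scanning from the start of the buffer.

```python
def byte_offset_of_codepoint(buf: bytes, cp_index: int) -> int:
    """Find the byte offset of the cp_index-th codepoint in UTF-8 bytes.

    UTF-8 continuation bytes always match the bit pattern 0b10xxxxxx,
    so every byte that does NOT match that pattern starts a codepoint.
    The scan is O(n) in the byte length: there is no constant-time way
    to locate codepoint i in a variable-width encoding.
    """
    seen = 0
    for off, b in enumerate(buf):
        if (b & 0xC0) != 0x80:      # start of a new codepoint
            if seen == cp_index:
                return off
            seen += 1
    raise IndexError(cp_index)

s = "héllo".encode("utf-8")          # 'é' occupies two bytes
print(byte_offset_of_codepoint(s, 2))  # -> 3: codepoint 2 ('l') is at byte 3
```

Because every substring call pays this linear scan, a loop that takes a substring at each index does O(n**2) total work.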
Revisiting this string performance issue: although I cannot change the O(n) vs. O(n**2) aspect of the substring test, I was able to reduce the constant factor substantially, improving substring performance by more than 3x.
I originally updated the String module in a character-set-agnostic manner. For example, to count multibyte characters, I decoded each character. Given that we now know the character set is guaranteed to be UTF-8, there are several situations where the character does not have to be decoded. I wrote a test version of the String module that took advantage of the known UTF-8 encoding, and was able to speed up all the tests slightly, and the substring test by a lot.
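One such shortcut, sketched here in Python for illustration (the actual change is in Chapel's String module), is counting codepoints without decoding them: valid UTF-8 guarantees that continuation bytes are exactly those matching 0b10xxxxxx, so the codepoint count is simply the count of non-continuation bytes.

```python
def count_codepoints(buf: bytes) -> int:
    """Count UTF-8 codepoints without decoding any of them.

    In valid UTF-8, every codepoint starts with a byte whose top two
    bits are not 0b10; continuation bytes are always 0b10xxxxxx.
    So the codepoint count is just the count of non-continuation bytes.
    """
    return sum(1 for b in buf if (b & 0xC0) != 0x80)

print(count_codepoints("héllo".encode("utf-8")))   # -> 5 (6 bytes, 1 continuation)
print(count_codepoints("日本語".encode("utf-8")))  # -> 3 (9 bytes, 6 continuation)
```

A character-set-agnostic version would have to decode each character to find where the next one begins; exploiting the guaranteed encoding turns that into a cheap byte-level test.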
I will submit a PR to demonstrate how this is done. That PR could be merged, or it could just serve as an example.
The PR is #13580. Because of the performance benefit, I'll suggest that it be merged.
A small correction. When the default indexing was by byte, the same loop was also O(n**2) because the characters had to be copied to the new string representing the substring. It's just that the constant was much smaller, and the overhead of copying multiple characters instead of one was often hidden in cache effects.
So, the real issue is that now we actually have to do something with those characters besides copy them, which is expensive.
Performance graph follow-up:
The PR also improved the other tests, to the point where they are almost as fast as with bytes.
https://www.chapel-lang.org/perf/chapcs/?startdate=2019/05/01&enddate=2019/08/05&graphs=arrayofstringelementaccess,stringtemporarycopies,searchwithinastring
Given the performance results above, I think it's OK to close this issue now.
@benharsh - Since you created this issue, is it addressed to your satisfaction?
I'm happy with the improvements, but I am curious if we can optimize ascii only strings -- https://github.com/chapel-lang/chapel/pull/13580#issuecomment-518751887
If we can, it doesn't matter to me if we leave this open, or close it and open a new issue about optimizing ascii-only strings.
@e-kayrakli This predates when you joined, but I think you've brought up optimizing ascii-only strings, so I wanted to see if you had any thoughts on this. https://chapel-lang.org/perf/chapcs/?startdate=2019/03/31&enddate=2020/07/14&graphs=arrayofstringelementaccess,searchwithinastring shows the remaining regressions, of which the substring case is the largest.
@mppf had an interesting thought that since the number of codepoints is now cached with https://github.com/chapel-lang/chapel/pull/15758 that checking for ascii-only might be as simple as cachedNumCodepoints == numBytes
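The proposed check can be sketched in Python (using `len(s)` and the UTF-8 byte length in place of the hypothetical Chapel-side `cachedNumCodepoints` and `numBytes`): a string is ASCII-only exactly when its codepoint count equals its byte count, because ASCII codepoints encode to one byte each and all other codepoints encode to two or more.

```python
def is_ascii_only(s: str) -> bool:
    """A string is ASCII-only iff its codepoint count equals its
    UTF-8 byte count: every ASCII codepoint encodes to exactly one
    byte, and every non-ASCII codepoint encodes to two or more."""
    return len(s) == len(s.encode("utf-8"))

print(is_ascii_only("hello"))  # -> True  (5 codepoints, 5 bytes)
print(is_ascii_only("héllo"))  # -> False (5 codepoints, 6 bytes)
```

With the codepoint count already cached, this comparison costs O(1), so the ascii-only fast path could be selected without any scan of the string.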
This is what I have in mind, too, for ascii-only strings, but I haven't gotten around to implementing it yet. I'll try to take a look soon.