A few of our nightly string performance tests regressed after #12899:
https://chapel-lang.org/perf/chapcs/?startdate=2019/03/31&enddate=2019/05/30&graphs=arrayofstringelementaccess,stringtemporarycopies,searchwithinastring
This issue exists to track the progress of identifying the causes and improving performance.
@chapel-lang/perf-team, @dmk42: Here's an issue to track the recent string performance regressions in nightly testing.
@dmk42: The question that came up at the perf meeting today was whether these are things that must now inherently be slower due to UTF-8-by-default, or whether they were cases that simply had a slow initial implementation but could be improved, like iteration was for the first day or two.
Thanks. I will continue to be on the lookout for such things, and I will follow this issue.
To be clear, we're hoping you'll take the first look at these cases that did regress to help classify them as candidates for improvement or not (e.g., in the next sprint if there are not spare cycles for it in this one).
BTW, I am about to merge the ascii() deprecation. It includes some updates to some of the benchmarks for correctness in using string.byte() instead of ascii(), but there might be some minor performance improvements as a result. Unfortunately, it probably won't be much because the changes are mostly in initialization.
I'll assign this to myself to keep it visible to me. That doesn't mean others can't contribute.
The biggest performance loss is in test/types/string/psahabu/perf/substring.chpl where the loop labeled "substring" was O(n) and is now O(n**2). There doesn't seem to be much we can do about that, because the substring operation itself is now O(n) instead of O(1) due to codepoint indexing being the default.
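To see why codepoint indexing makes each substring operation O(n), here is an illustrative Python sketch (not the Chapel implementation): in a variable-width encoding like UTF-8 there is no way to jump directly to the i-th codepoint, so finding its byte offset requires scanning from the start of the buffer.

```python
def byte_offset_of_codepoint(buf: bytes, cp_index: int) -> int:
    """Find the byte offset of the cp_index-th codepoint in UTF-8 bytes.

    UTF-8 continuation bytes always match the bit pattern 0b10xxxxxx,
    so every byte that does NOT match that pattern starts a codepoint.
    The scan is O(n) in the byte length: there is no constant-time way
    to locate codepoint i in a variable-width encoding.
    """
    seen = 0
    for off, b in enumerate(buf):
        if (b & 0xC0) != 0x80:      # start of a new codepoint
            if seen == cp_index:
                return off
            seen += 1
    raise IndexError(cp_index)

s = "héllo".encode("utf-8")          # 'é' occupies two bytes
print(byte_offset_of_codepoint(s, 2))  # -> 3: codepoint 2 ('l') is at byte 3
```

Because every substring call pays this linear scan, a loop that takes a substring at each index does O(n**2) total work.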
Revisiting this string performance issue: although I cannot change the O(n) vs. O(n**2) aspect of the substring test, I was able to reduce the constant factor substantially, improving substring performance by more than 3x.
I originally updated the String module in a character-set-agnostic manner. For example, to count multibyte characters, I decoded each character. Given that we now know the character set is guaranteed to be UTF-8, there are several situations where the character does not have to be decoded. I wrote a test version of the String module that took advantage of the known UTF-8 encoding, and was able to speed up all the tests slightly, and the substring test by a lot.
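One such shortcut, sketched here in Python for illustration (the actual change is in Chapel's String module), is counting codepoints without decoding them: valid UTF-8 guarantees that continuation bytes are exactly those matching 0b10xxxxxx, so the codepoint count is simply the count of non-continuation bytes.

```python
def count_codepoints(buf: bytes) -> int:
    """Count UTF-8 codepoints without decoding any of them.

    In valid UTF-8, every codepoint starts with a byte whose top two
    bits are not 0b10; continuation bytes are always 0b10xxxxxx.
    So the codepoint count is just the count of non-continuation bytes.
    """
    return sum(1 for b in buf if (b & 0xC0) != 0x80)

print(count_codepoints("héllo".encode("utf-8")))   # -> 5 (6 bytes, 1 continuation)
print(count_codepoints("日本語".encode("utf-8")))  # -> 3 (9 bytes, 6 continuation)
```

A character-set-agnostic version would have to decode each character to find where the next one begins; exploiting the guaranteed encoding turns that into a cheap byte-level test.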
I will submit a PR to demonstrate how this is done. That PR could be merged, or it could just serve as an example.
The PR is #13580. Because of the performance benefit, I'll suggest that it be merged.
A small correction. When the default indexing was by byte, the same loop was also O(n**2) because the characters had to be copied to the new string representing the substring. It's just that the constant was much smaller, and the overhead of copying multiple characters instead of one was often hidden in cache effects.
So, the real issue is that now we actually have to do something with those characters besides copy them, which is expensive.
Performance graph follow-up:
The PR also improved the other tests, to the point where they are almost as fast as with bytes.
https://www.chapel-lang.org/perf/chapcs/?startdate=2019/05/01&enddate=2019/08/05&graphs=arrayofstringelementaccess,stringtemporarycopies,searchwithinastring
Given the performance results above, I think it's OK to close this issue now.
@benharsh - Since you created this issue, is it addressed to your satisfaction?
I'm happy with the improvements, but I am curious if we can optimize ascii only strings -- https://github.com/chapel-lang/chapel/pull/13580#issuecomment-518751887
If we can, it doesn't matter to me if we leave this open, or close it and open a new issue about optimizing ascii-only strings.
@e-kayrakli This predates when you joined, but I think you've brought up optimizing ascii-only strings, so I wanted to see if you had any thoughts on this. https://chapel-lang.org/perf/chapcs/?startdate=2019/03/31&enddate=2020/07/14&graphs=arrayofstringelementaccess,searchwithinastring shows the remaining regressions, of which the substring case is the largest.
@mppf had an interesting thought that since the number of codepoints is now cached with https://github.com/chapel-lang/chapel/pull/15758 that checking for ascii-only might be as simple as cachedNumCodepoints == numBytes
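The proposed check can be sketched in Python (using `len(s)` and the UTF-8 byte length in place of the hypothetical Chapel-side `cachedNumCodepoints` and `numBytes`): a string is ASCII-only exactly when its codepoint count equals its byte count, because ASCII codepoints encode to one byte each and all other codepoints encode to two or more.

```python
def is_ascii_only(s: str) -> bool:
    """A string is ASCII-only iff its codepoint count equals its
    UTF-8 byte count: every ASCII codepoint encodes to exactly one
    byte, and every non-ASCII codepoint encodes to two or more."""
    return len(s) == len(s.encode("utf-8"))

print(is_ascii_only("hello"))  # -> True  (5 codepoints, 5 bytes)
print(is_ascii_only("héllo"))  # -> False (5 codepoints, 6 bytes)
```

With the codepoint count already cached, this comparison costs O(1), so the ascii-only fast path could be selected without any scan of the string.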
This is what I have in mind, too, for ascii-only strings, but I haven't gotten around to implementing it yet. I'll try to take a look soon.