Encoding.ASCII has room for optimization, such as vectorizing the GetBytes / GetChars methods, simplifying if checks, and so on.
(This is a nice-to-have. This isn't blocking anything as I know.)
Would be happy to do more just need to clarify what exactly to do.
An example fast encode string to byte (till first non-ascii, at which point encoder would need to do fallback) is here https://github.com/benaadams/FrameworkBenchmarks/blob/9d940f533c14130ab5b627b61ab142c90abfbac3/frameworks/CSharp/aspnetcore-mono/PlatformBenchmarks/Utilities/BufferExtensionsText.cs#L245-L330
And a fast decode bytes to string is here (however this returns if invalid rather than moving to fallback, which the encoder needs to do)
https://github.com/aspnet/KestrelHttpServer/blob/0aff4a0440c2f393c0b98e9046a8e66e30a56cb0/src/Kestrel.Core/Internal/Infrastructure/StringUtilities.cs#L12-L110
If that's of any help?
Yes absolutely, the logic you linked to could be integrated and rather than returning one would supply the fallback characters required and continue.
The only thing still left to address is the fact that InteralFallback needs a byte[] which causes an allocation in most of the methods.
I will get to this today if I have more time or I will hopefully get some time tomorrow during the Hackathon!
Thank you for your help!
@juliusfriedman I made you a contributor once you accept the invite let me know and we can assign you this issue. Note this subscribes you to the repo, you can disable notificaitons if it's too much.
@juliusfriedman, @ViktorHofer will likely be available to give pointers to how to run perf tests.
@juliusfriedman here's more about the invite: https://github.com/dotnet/corefx/wiki/Hackathon#how-can-i-grab-an-issue
Assigning to myself temporarily.
The only thing still left to address is the fact that InteralFallback needs a byte[] which causes an allocation in most of the methods.
Consider using ArrayPool to avoid allocation.
I would recommend to look at optimizing the fallback path separately. This issue should be just about optimizing the mainline path, without fallback.
Hello guys, I was able to integrate @benaadams code fairly easily. @ViktorHofer if you let me know where the tests are I will be more than happy to run them I also wanted to make sure I didn't transpose any parameters. @jkotas I think even with fallback I was able to accommodate the accelerated scanning by modifying the logic just to recalculate the counts when an invalid characters was found. https://github.com/juliusfriedman/coreclr/commit/1bd30200edb5f0a8eb13e9474ec1a890e4b528aa
From https://github.com/juliusfriedman/coreclr/commit/1bd30200edb5f0a8eb13e9474ec1a890e4b528aa#diff-bc6135e702d012d7f4e28333836edf8cR769
// Alternate implementations in C++ @ https://git.merproject.org/mer-core/qtbase/commit/34821e226a94858480e57bb25ac7655bfd19f1e6 with support for UTF-8 Etc for reference.
This is link to GPL licensed code. It you have looked at this implementation while working on this, it would disqualify you from contributing this change. We take the licensing issues related to the origin of the code very seriously.
Yes I looked at it but only to find something to base comparisons on if needed. Sorry for the inconvenience.
It looks like a copy of my code (and licenced to the .NET Foundation)? I haven't looked at any GPL...
Can you clarify @juliusfriedman ?
@jkotas is it just the comment with link invalid; or the contribution a problem - as the source of the code is not related to the comment?
I was just being honest as I did look at the code briefly, I don't believe it's related to your code at all. That being said I think both of the added functions I copied from @benadams should also be passing the pointers by reference so that the positions are correctly changed if there is an invalid character found.
GPL licenced code is problematic as it forces anything using it or a derivative of it to also adopt the GPL licence; rather than use their own preferred licence
I am aware, sorry for not checking that before including it for reference.
I updated the comments to reference a difference source as well as one from corefxlabs, I also updated the functions to pass by reference. https://github.com/juliusfriedman/coreclr/commit/1fc647dfa66a49d67e4b3534f8dc2f76fc46d1e1
If I am disqualified it's okay because @benaadams did most of the work for this anyway.
Thanks again!
ALSO, just rembered MimeKit has some routines that could also be adopted (MIT License) https://jeffreystedfast.blogspot.com/2013/09/optimization-tips-tricks-used-by.html?m=1
@juliusfriedman OK I removed your assignment. Apologies for that, just want to be super cautious about license issues. Hopefully someone else can pick this up, and you'd be welcome to grab anything else!
Assigning to self since this work is already mostly done in the UTF-8 branch, just waiting for a PR and merge into master.
Most helpful comment
GPL licenced code is problematic as it forces anything using it or a derivative of it to also adopt the GPL licence; rather than use their own preferred licence