Runtime: Optimize Encoding.ASCII

Created on 31 May 2018 · 19Comments · Source: dotnet/runtime

Encoding.ASCII has room for optimization, such as vectorizing the GetBytes / GetChars methods, simplifying if checks, and so on.

(This is a nice-to-have. This isn't blocking anything as I know.)

area-System.Runtime optimization

Source

GrabYourPitchforks

👍1

Most helpful comment

GPL licenced code is problematic as it forces anything using it or a derivative of it to also adopt the GPL licence; rather than use their own preferred licence

benaadams on 3 Jun 2018

👍2

All 19 comments

Would be happy to do more just need to clarify what exactly to do.

juliusfriedman on 2 Jun 2018

An example fast encode string to byte (till first non-ascii, at which point encoder would need to do fallback) is here https://github.com/benaadams/FrameworkBenchmarks/blob/9d940f533c14130ab5b627b61ab142c90abfbac3/frameworks/CSharp/aspnetcore-mono/PlatformBenchmarks/Utilities/BufferExtensionsText.cs#L245-L330

And a fast decode bytes to string is here (however this returns if invalid rather than moving to fallback, which the encoder needs to do)
https://github.com/aspnet/KestrelHttpServer/blob/0aff4a0440c2f393c0b98e9046a8e66e30a56cb0/src/Kestrel.Core/Internal/Infrastructure/StringUtilities.cs#L12-L110

If that's of any help?

benaadams on 2 Jun 2018

🎉1

Yes absolutely, the logic you linked to could be integrated and rather than returning one would supply the fallback characters required and continue.

The only thing still left to address is the fact that InteralFallback needs a byte[] which causes an allocation in most of the methods.

I will get to this today if I have more time or I will hopefully get some time tomorrow during the Hackathon!

Thank you for your help!

juliusfriedman on 2 Jun 2018

@juliusfriedman I made you a contributor once you accept the invite let me know and we can assign you this issue. Note this subscribes you to the repo, you can disable notificaitons if it's too much.

danmosemsft on 2 Jun 2018

👍2

@juliusfriedman, @ViktorHofer will likely be available to give pointers to how to run perf tests.

danmosemsft on 2 Jun 2018

👍1

@juliusfriedman here's more about the invite: https://github.com/dotnet/corefx/wiki/Hackathon#how-can-i-grab-an-issue
Assigning to myself temporarily.

karelz on 2 Jun 2018

👍1

The only thing still left to address is the fact that InteralFallback needs a byte[] which causes an allocation in most of the methods.

Consider using ArrayPool to avoid allocation.

ViktorHofer on 2 Jun 2018

👍1

I would recommend to look at optimizing the fallback path separately. This issue should be just about optimizing the mainline path, without fallback.

jkotas on 2 Jun 2018

👍1

Hello guys, I was able to integrate @benaadams code fairly easily. @ViktorHofer if you let me know where the tests are I will be more than happy to run them I also wanted to make sure I didn't transpose any parameters. @jkotas I think even with fallback I was able to accommodate the accelerated scanning by modifying the logic just to recalculate the counts when an invalid characters was found. https://github.com/juliusfriedman/coreclr/commit/1bd30200edb5f0a8eb13e9474ec1a890e4b528aa

juliusfriedman on 2 Jun 2018

From https://github.com/juliusfriedman/coreclr/commit/1bd30200edb5f0a8eb13e9474ec1a890e4b528aa#diff-bc6135e702d012d7f4e28333836edf8cR769

// Alternate implementations in C++ @ https://git.merproject.org/mer-core/qtbase/commit/34821e226a94858480e57bb25ac7655bfd19f1e6 with support for UTF-8 Etc for reference.

This is link to GPL licensed code. It you have looked at this implementation while working on this, it would disqualify you from contributing this change. We take the licensing issues related to the origin of the code very seriously.

jkotas on 2 Jun 2018

Yes I looked at it but only to find something to base comparisons on if needed. Sorry for the inconvenience.

juliusfriedman on 2 Jun 2018

It looks like a copy of my code (and licenced to the .NET Foundation)? I haven't looked at any GPL...

Can you clarify @juliusfriedman ?

benaadams on 2 Jun 2018

@jkotas is it just the comment with link invalid; or the contribution a problem - as the source of the code is not related to the comment?

benaadams on 2 Jun 2018

I was just being honest as I did look at the code briefly, I don't believe it's related to your code at all. That being said I think both of the added functions I copied from @benadams should also be passing the pointers by reference so that the positions are correctly changed if there is an invalid character found.

juliusfriedman on 2 Jun 2018

GPL licenced code is problematic as it forces anything using it or a derivative of it to also adopt the GPL licence; rather than use their own preferred licence

benaadams on 3 Jun 2018

👍2

I am aware, sorry for not checking that before including it for reference.

I updated the comments to reference a difference source as well as one from corefxlabs, I also updated the functions to pass by reference. https://github.com/juliusfriedman/coreclr/commit/1fc647dfa66a49d67e4b3534f8dc2f76fc46d1e1

If I am disqualified it's okay because @benaadams did most of the work for this anyway.

Thanks again!

ALSO, just rembered MimeKit has some routines that could also be adopted (MIT License) https://jeffreystedfast.blogspot.com/2013/09/optimization-tips-tricks-used-by.html?m=1

juliusfriedman on 3 Jun 2018

@juliusfriedman OK I removed your assignment. Apologies for that, just want to be super cautious about license issues. Hopefully someone else can pick this up, and you'd be welcome to grab anything else!

danmosemsft on 4 Jun 2018

Assigning to self since this work is already mostly done in the UTF-8 branch, just waiting for a PR and merge into master.

GrabYourPitchforks on 2 Feb 2019

Addressed by https://github.com/dotnet/coreclr/pull/23098, https://github.com/dotnet/coreclr/pull/22516.

GrabYourPitchforks on 12 Apr 2019

Was this page helpful?

0 / 5 - 0 ratings