It is quite common to need to convert bytes to hex strings and back. The .NET Framework does not have an API to do that. Look at what lengths people have gone to to solve this problem for them: How do you convert Byte Array to Hexadecimal String, and vice versa? Many of those solutions are somehow bad (e.g. slow or broken or not validating).
BitConverter.ToString(byte[])
generates strings of a practically useless format: BitConverter.ToString(new byte[] { 0x00, 0x01, 0xAA, 0xFF })
=> 00-01-AA-FF
. I don't understand how this decision was made. This seems to be a special solution for some specific purpose (debug output?). Using this API requires Replace(str, "-", "")
which is ugly and slow. Also, this API is not discoverable because it does not have "hex" in it's name.
I propose adding an API that does the following:
The use case for char[] in- and outputs is low-allocation scenarios.
The API could look like this:
```c#
static class HexConvert {
byte[] ToBytes(string input, int start, int count, HexFormat hexFormat);
void ToBytes(string input, int start, int count, ArraySegment
//Repeat the same for ArraySegment
string ToString(ArraySegment
char[] ToChars(ArraySegment
void ToChars(ArraySegment
//Plus obvious convenience overloads
int HexDigitToInteger(char val); //0-9A-F => 0-15.
char IntegerToHexDigit(int value, HexFormat hexFormat); //0-15 => 0-9A-F
}
[Flags]
enum HexFormat { UseDefaults = 0, Lowercase = 1 } //extensible
```
A possible extension would be to support Stream
and IEnumerable<byte>
as sources and destinations. I would consider this out of scope.
I feel that converting hexadecimal strings is an important and frequent enough thing that this should be supported well. Of course, a "light" version of this proposal is also feasible (just support for string and byte[] with no user-provided output buffer). Not sure if anyone wants/needs lowercase hex, I don't but others might.
This seems cheap enough to implement. Testing should be easily automatable. I don't see any particular usability problems. This is suitable for community contribution.
Replacement is not the ONLY way, you can just skip the char if you need to.
///Could easily allow upper or lower case here...
[System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.AggressiveInlining)]
public static byte HexCharToByte(char c, bool upperCase = false) { c = char.ToUpperInvariant(c); return (byte)(upperCase ? char.ToUpperInvariant((char)(c > '9' ? c - 'A' + 10 : c - '0')) : (c > '9' ? c - 'A' + 10 : c - '0')); }
/// <summary>
/// Converts a String in the form 0011AABB to a Byte[] using the chars in the string as bytes to caulcate the decimal value.
/// </summary>
/// <notes>
/// Reduced string allocations from managed version substring
/// About 10 milliseconds faster then Managed when doing it 100,000 times. otherwise no change
/// </notes>
/// <param name="str"></param>
/// <returns></returns>
public static byte[] HexStringToBytes(string str, int start = 0, int length = -1)
{
if (length.Equals(Common.Binary.Zero)) return null;
if (length <= -1) length = str.Length;
if (start > length - start) throw new System.ArgumentOutOfRangeException("start");
if (length > length - start) throw new System.ArgumentOutOfRangeException("length");
System.Collections.Generic.List<byte> result = new System.Collections.Generic.List<byte>(length >> 1); // / 2
//Dont check the results for overflow
unchecked
{
//Iterate the pointer using the managed length ....
//Todo, optomize with reverse or i - 1
for (int i = start, e = length; i < e; i += 2)
{
//to reduce string manipulations pre call
if(str[i] == '-') ++i;
//Todo, Native and Unsafe
//Convert 2 Chars to a byte
result.Add((byte)(HexCharToByte(str[i]) << 4 | HexCharToByte(str[i + 1])));
}
}
//Dont use a List..
//Return the bytes
return result.ToArray();
}
I agree that the format is useless in MOST cases...
However if you think about it a little more and try to be as outside the box as possible you will clearly see that there is an extra index for every octet, finally then you can achieve some interesting things by realigning the data or using the extra indexes....
@GSPP how about augmenting the BitConverter type with these new methods?
I feel like that would be a better place for these since that is already a type people use for converting between various formats.
There are at least a couple of other issues tracking something like this:
https://github.com/dotnet/corefx/issues/1647
https://github.com/dotnet/corefx/issues/4441
Let's track it with a single issue (this one).
It does look like this is something for which there is interest in implementing.
@GSPP, @GrabYourPitchforks @KrzysztofCwalina is anyone of you interested in writing up a formal API proposal for this?
If the slices API lands before this, I'd go with something like:
```c#
// all instance methods are thread-safe
public abstract class BinaryEncoding
{
protected abstract void ToBytesOverride(ReadOnlySpan
protected abstract int GetByteLengthOverride(int stringLength);
protected abstract void ToCharsOverride(ReadOnlySpan
protected abstract int GetStringLengthOverride(int byteLength);
public byte[] ToBytes(ReadOnlySpan<char> str);
public void ToBytes(ReadOnlySpan<char> sourceChars, Span<byte> destinationBytes);
public int GetByteLength(int stringLength);
public string ToString(ReadOnlySpan<byte> bytes);
public char[] ToChars(ReadOnlySpan<byte> bytes);
public void ToChars(ReadOnlySpan<byte> sourceBytes, Span<char> destinationChars);
public int GetStringLength(int byteLength);
}
public class HexEncoding
{
public static HexEncoding UpperCase { get; }
public static HexEncoding LowerCase { get; }
}
```
perhaps add some additional overloads with byte[]
, char[]
, string
inputs.
We don't take any new API additions if they would be obsolete when Span
lands.
Marking as blocked on Span
dotnet/corefx#13892 - @AlexGhiondea feel free to change it if you think I am wrong.
Span has landed so this can be unblocked.
I propose a slightly different API, instead putting this on the Convert
class near the existing base64 APIs. That would help discoverability, and it gives us an existing template for the recommended API shape.
namespace System
{
// NEW methods on EXISTING type
public static class Convert
{
public static byte[] FromHexString(ReadOnlySpan<char> chars); // doesn't match an existing overload of FromBase64String, but could be useful
public static byte[] FromHexString(string s);
public static string ToHexString(byte[] inArray, HexFormattingOptions options = default);
public static string ToHexString(byte[] inArray, int offset, int length, HexFormattingOptions options = default);
public static string ToHexString(ReadOnlySpan<byte> bytes, HexFormattingOptions options = default);
}
[Flags]
public enum HexFormattingOptions
{
None = 0,
InsertLineBreaks = 1,
Lowercase = 2,
}
}
It also makes sense IMO to have OperationStatus
-based APIs. Those APIs could be done in the corefxlab repo to live alongside the existing Base64Encoder
and Base64Decoder
types in that repo. Then all of the OperationStatus
-based base64 and hex APIs can be reviewed at once if we decide to move them into corefx.
To give some parity with the *CharArray
from base64, I would consider a few other methods:
public static class Convert
{
public bool TryFromHexString(ReadOnlySpan<char> chars, Span<byte> buffer, out int bytesWritten);
public bool TryFromHexString(string s, Span<byte> buffer, out int bytesWritten);
public bool TryToHexString(ReadOnlySpan<byte> bytes, Span<char> buffer, out int charsWritten, HexFormattingOptions options = default);
}
The rationale for this is in my typical usage I expect a hex string to be short (think a SHA digest) which tops out at 64 bytes right now.
I'm not strongly tied to these, but worth considering during review.
Other considerations, does there need to be an options for the FromHexString
that provide similar functionality like specifying if the input is expected to be upper or lower, or would the input be treated case insensitive?
What about continuing from line breaks?
@vcsjones I envision HexFormattingOptions.InsertLineBreaks
as inserting every 72 chars, just like base64 does today. And parsing would allow all of uppercase / lowercase / whitespace, hence no options parameter.
I'd not put it on Convert
. My argument for that is this: Imagine there were 10 different such formats (Hex, base64, base85, ...). Then, we clearly would like each of them to be their own static class. Anything else would make for a very crowded API. Now we don't have 10 formats but from this thought experiment we can see that creating a separate class is the better option. Especially, since more formats might be added in the future. For example, Bitcoin uses a base85 format which is more efficient. Who knows if additional formats might be desirable in the future.
Should this API accept any other input besides spans? It could accept arrays, strings, ArraySegment
and even pointers. In my opinion it should only take spans. That would make for the cleanest and least crowded API. Anything else is easily converted to span. In particular I don't want to see overloads with index and count parameters. Thank god we no longer need those!
We certainly need methods that do not throw in the error case. My original proposal missed those.
InsertLineBreaks
as inserting every 72 chars
Is this a widely known standard format? Is it only for base64 or also for hex?
New API proposal:
namespace System
{
public static class HexConvert
{
//convert to bytes, throwing
public static byte[] GetBytes(ReadOnlySpan<char> chars, HexFormattingOptions formattingOptions = default);
public static void GetBytes(ReadOnlySpan<char> chars, Span<byte> output, out int bytesWritten, HexFormattingOptions formattingOptions = default);
//convert to bytes, try
public static byte[] TryGetBytes(ReadOnlySpan<char> chars, HexFormattingOptions formattingOptions = default); //can return null
public static bool TryGetBytes(ReadOnlySpan<char> chars, Span<byte> output, out int bytesWritten, HexFormattingOptions formattingOptions = default);
//convert to chars
string GetString(ReadOnlySpan<byte> bytes);
char[] GetChars(ReadOnlySpan<byte> bytes);
//plus non-throwing overloads
//plus overloads with output buffer
//plus overloads without HexFormattingOptions (I was lazy so I made them default arguments)
}
[Flags]
public enum HexFormattingOptions
{
None = 0,
Lowercase = 1,
}
}
I chose the name prefix to be "Get" so that "TryGet" is fitting. This succinct naming would not be possible when placing these methods in Convert
.
There are many variants of API. It can be used in web,cryptography and many other places. Some hex conversion is used in json for writing non-ASCII chars - \u0000 - \uFFFF
I think that .NET Core must provide optimized variants for each purpose
Writing API
All above give us more than 100 function declarations. If it will have byte[] inputs - it take much more
For example just writing API for ReadOnlySpan
public static string ToHexStringLowerCase(this ReadOnlySpan<byte> array);
public static string ToHexStringLowerCase(this ReadOnlySpan<byte> array, char separator);
public static string ToHexStringLowerCase(this ReadOnlySpan<byte> array, ReadOnlySpan<char> separator);
public static string ToHexStringLowerCase(this ReadOnlySpan<byte> array, int offset, int count);
public static string ToHexStringLowerCase(this ReadOnlySpan<byte> array, int offset, int count, char separator);
public static string ToHexStringLowerCase(this ReadOnlySpan<byte> array, int offset, int count, ReadOnlySpan<char> separator);
This is bad idea to place it in existing places, I think, that better to place this API in external netcoreapp2.1+netstandard2.1 package. Something like System.Text.Hex
Also, as @vcsjones mentioned for short strings, we need some different allocation mechanism, to avoid dirty hacks for reducing allocations count: (https://docs.microsoft.com/en-us/dotnet/csharp/how-to/modify-string-contents#unsafe-modifications-to-string)
public static unsafe string ToHexStringLowerCase(this byte[] array)
{
string ret = new string('\0', array.Length * 2);
fixed(char* strPtr=ret)
{
char* ptr = strPtr;
for (int i = 0; i < array.Length; i++)
{
byte bytic = array[i];
*ptr = HexChars.HexDigitsLowerCase[(bytic >> 4) & 0xF];
ptr++;
*ptr = HexChars.HexDigitsLowerCase[bytic & 0xF];
ptr++;
}
}
return ret;
}
For example in this case we can allocate string memory (x2 of byte array length) and write to it, without copy contructor.
I consolidated the different proposals and comments made in this discussion and came up with the following API. I also provide a sample implementation (internally utilizing AVX2, SSE4.1, SSSE3, SSE2 or Scalar/Lookup with the following performance characteristics) of the API to experiment with via a seperate GitHub repository and nuget package.
https://github.com/tkp1n/HexMate
https://www.nuget.org/packages/HexMate
System.Convert
(@GrabYourPitchforks, https://github.com/dotnet/corefx/issues/10013#issuecomment-463419321 and @vcsjones, https://github.com/dotnet/corefx/issues/10013#issuecomment-463452088)OperationStatus
-based APIs as currently in System.Buffers.Text.Base64
(@GrabYourPitchforks, https://github.com/dotnet/corefx/issues/10013#issuecomment-463419321)InsertLineBreaks
add a new-line every 72 characters for hex but every 76 for base64? (@GrabYourPitchforks, https://github.com/dotnet/corefx/issues/10013#issuecomment-463492630)-
-separated input to allow processing strings returned by BitConverter.ToString(byte[])
?GetMaxDecodedFromUtf8Length
and GetMaxEncodedToUtf8Length
make sense for Hex
as the calculation is much simpler as with base64 where those methods are provided?namespace System
{
// NEW type
[Flags]
public enum HexFormattingOptions
{
None = 0,
InsertLineBreaks = 1,
Lowercase = 2
}
// NEW methods on EXISTING type
public static class Convert
{
// Decode from chars
public static byte[] FromHexCharArray(char[] inArray, int offset, int length) => throw null;
public static bool TryFromHexChars(ReadOnlySpan<char> chars, Span<byte> bytes, out int bytesWritten) => throw null;
// Decode from strings
public static byte[] FromHexString(string s) => throw null;
public static byte[] FromHexString(ReadOnlySpan<char> chars) => throw null;
public static bool TryFromHexString(string s, Span<byte> bytes, out int bytesWritten) => throw null;
// Encode to chars
public static int ToHexCharArray(byte[] inArray, int offsetIn, int length, char[] outArray, int offsetOut, HexFormattingOptions options = default) => throw null;
public static bool TryToHexChars(ReadOnlySpan<byte> bytes, Span<char> chars, out int charsWritten, HexFormattingOptions options = default) => throw null;
// Encode to strings
public static string ToHexString(byte[] inArray, HexFormattingOptions options = default) => throw null;
public static string ToHexString(byte[] inArray, int offset, int length, HexFormattingOptions options = default) => throw null;
public static string ToHexString(ReadOnlySpan<byte> bytes, HexFormattingOptions options = default) => throw null;
}
}
namespace System.Buffers.Text
{
// NEW type
public static class Hex
{
// Decode
public static OperationStatus DecodeFromUtf8(ReadOnlySpan<byte> utf8, Span<byte> bytes, out int bytesConsumed, out int bytesWritten, bool isFinalBlock = true) => throw null;
public static OperationStatus DecodeFromUtf8InPlace(Span<byte> buffer, out int bytesWritten) => throw null;
// Encode
public static OperationStatus EncodeToUtf8(ReadOnlySpan<byte> bytes, Span<byte> utf8, out int bytesConsumed, out int bytesWritten, bool isFinalBlock = true) => throw null;
public static OperationStatus EncodeToUtf8InPlace(Span<byte> buffer, int dataLength, out int bytesWritten) => throw null;
}
}
I think we should demonstrate a need for inserting line breaks first. I am not aware of any common scenario where this would be used. If this feature is added then the line length must be variable. I cannot imagine any fixed line length to be appropriate.
What are the intended use cases?
Should the parsing routines accept --separated input
By default, the parsing routines should be maximally strict. This means no support for whitespace or any separator. When we parse data we generally want to be as strict as possible in order to find bugs in the originating systems and in order to not accidentally accept invalid data. The motto be strict in what you produce and liberal in what you accept
is false. It has lead to the mess that is web development. The correct approach is to be as strict as possible and relax only if required.
Parsing should accept any case of the letters A-F
, though. I see no good alternative to that.
I see no need for supporting a separator. This is a rare need. An easy workaround exists at a performance cost: str.Replace("-", "")
.
@GrabYourPitchforks, it seems we should survey what internal implementations we have and what all the places are where we expose something similar. It seems new proposals were added after you marked it as read for review.
Really looking forward to seeing this. I benchmarked @tkp1n implementation and it's faster than everything else in .NET Core. Compared to these methods: https://stackoverflow.com/a/624379/2828480
I have some follow-up thoughts on this.
My instinct is that @tkp1n's comment at https://github.com/dotnet/runtime/issues/17837#issuecomment-511112311 mostly nails it, but I'd recommend some further changes. First, let's not bother with anything in _System.Buffers.Text_ just yet, reserving this specific issue for new APIs in System.Convert
. Second, upon further reflection I don't think we need a HexFormattingOption
enum. It could instead be an optional parameter _bool lowercase = false_ at the end of the method signature. There's no need to introduce a new enum type into the _System_ namespace just yet.
I spent some time building up a list of places in the framework where we convert bytes to hex strings. See the end of this comment for a partial list. Every single one of these matches can be replaced with a call to one of the Convert
APIs as proposed in this issue.
Now, for a rude Q and A.
__What's the primary scenario for this?__
The 99% use case for converting to hex is to produce a case-insensitive identifier from a binary buffer (say, a GUID or similar). It's also frequently used in cryptography to represent symmetric key material or digests.
In particular, the majority of hex use cases are for _small-ish_ inputs and outputs (say, 256 bits max in the common case). That's also about the bounds of what we would expect where identifiers might appear.
This means that we don't need to worry about hyper-optimizing any such implementation. Until we get evidence that this particular API is bottlenecking real-world scenarios, a simple for loop should suffice as an implementation.
__What about customizing joiners or line breaks?__
For the primary scenario of identifiers and other small-ish payloads, callers typically don't want any type of joiner.
Where there are exceptions they tend to be things we can't easily account for in a generalized API. For example, some callers want the format "00-11-22-..."
(using joiners). Some callers want the format "0x00, 0x11, ..."
(using a prefix and joiners). Some callers want "%00%11..."
(using just a prefix) as the format. Hex editors might want simple spaces as a joiner, except where after every 60 characters a newline might occur, and that should be followed by a horizontal tab instead of a space.
I anticipate that such callers will continue to use their own custom logic. Since they represent the minority use case I'm not too worried about them for this scenario.
__What about Convert.FromHexString
?__
The behavior would be that it allows only [0-9A-Fa-f]
. The incoming string length must be evenly divisible by 2. (Empty strings are allowed.)
__Framework matches__
libraries\Common\tests\System\Net\Http\LoopbackServer.AuthenticationHelpers.cs:
299: sb.Append(b.ToString("x2"));
libraries\Common\tests\System\Security\Cryptography\ByteUtils.cs:
59: builder.Append(bytes[i].ToString("X2"));
libraries\System.CodeDom\src\Microsoft\CSharp\CSharpCodeGenerator.cs:
2512: Output.Write(b.ToString("X2", CultureInfo.InvariantCulture));
libraries\System.CodeDom\src\Microsoft\VisualBasic\VBCodeGenerator.cs:
2273: Output.Write(b.ToString("X2", CultureInfo.InvariantCulture));
libraries\System.DirectoryServices.AccountManagement\src\System\DirectoryServices\AccountManagement\Utils.cs:
62: stringizedArray.Append(b.ToString("x2", CultureInfo.InvariantCulture));
82: stringizedBinarySid.Append(b.ToString("x2", CultureInfo.InvariantCulture));
libraries\System.DirectoryServices.AccountManagement\src\System\DirectoryServices\AccountManagement\AD\ADStoreCtx_Query.cs:
496: stringguid.Append(b.ToString("x2", CultureInfo.InvariantCulture));
630: stringizedBinarySid.Append(b.ToString("x2", CultureInfo.InvariantCulture));
(below implementation is incorrect)
libraries\System.Drawing.Common\tests\mono\System.Drawing\BitmapTests.cs:
397: sOutput.Append(arrInput[i].ToString("X2"));
libraries\System.Net.Http\src\System\Net\Http\SocketsHttpHandler\AuthenticationHelper.Digest.cs:
245: bool formatted = result[i].TryFormat(byteX2, out int charsWritten, "x2");
libraries\System.Private.CoreLib\src\System\Diagnostics\Tracing\TraceLogging\XplatEventLogger.cs:
156: byteArray[i].TryFormat(hex, out int charsWritten, hexFormat);
libraries\System.Private.CoreLib\src\System\Reflection\AssemblyNameFormatter.cs:
75: sb.Append(b.ToString("x2", CultureInfo.InvariantCulture));
libraries\System.Reflection.MetadataLoadContext\tests\src\TestUtils\TestUtils.cs:
127: builder.Append(bytes[i].ToString("X2"));
libraries\System.Security.Cryptography.Xml\src\System\Security\Cryptography\Xml\SignedXmlDebugLog.cs:
203: builder.Append(b.ToString("x2", CultureInfo.InvariantCulture));
libraries\System.Security.Cryptography.Xml\tests\XmlDsigXsltTransformTest.cs:
117: sb.Append(b.ToString("X2"));
libraries\System.Text.Encodings.Web\tests\UrlEncoderTests.cs:
296: return string.Concat(_utf8EncodingThrowOnInvalidBytes.GetBytes(char.ConvertFromUtf32(codePoint)).Select(b => string.Format(CultureInfo.InvariantCulture, "%{0:X2}", b)));
libraries\System.Security.Cryptography.X509Certificates\src\Internal\Cryptography\Helpers.cs:
17: public static char[] ToHexArrayUpper(this byte[] bytes)
24: private static void ToHexArrayUpper(ReadOnlySpan<byte> bytes, Span<char> chars)
36: public static string ToHexStringUpper(this byte[] bytes) =>
@GrabYourPitchforks thank you for picking this up. I generally agree with the points you've brought up.
What's the primary scenario for this?
Let me add to your list two additional use cases I could think of: The tracing of binary data in motion (e.g., low-level network protocol) and hex dumps of binary files. Both typically encode a quantity of data well above the otherwise typical "max 256 bits"-cases. Ingesting those traces/dumps would consequently also use the decode methods with larger data sizes.
Investigating whether a vectorized implementation was worth it for those cases showed that AVX2 could be up to 10x faster in decoding 2KB of hex-encoded data (see table below / full comparison here).
There are already vectorized implementations in the broader ecosystem (e.g., this one in ASP.NET Core) that could be migrated to the API proposed here if we allow for some "hyper-optimization" to ensure the migration causes no perf regressions.
I would suggest a for-loop approach for small input data sizes and using an SSSE3 or AVX2 approach above a given threshold. All of this is an implementation detail, however, and we should first settle on the API, I guess. More than happy to provide a PR by the way, once the API is approved.
After integrating your feedback to my comment above, the API proposal would look like this:
namespace System
{
// NEW methods on EXISTING type
public static class Convert
{
// Decode from chars
public static byte[] FromHexCharArray(char[] inArray, int offset, int length) => throw null;
public static bool TryFromHexChars(ReadOnlySpan<char> chars, Span<byte> bytes, out int bytesWritten) => throw null;
// Decode from strings
public static byte[] FromHexString(string s) => throw null;
public static byte[] FromHexString(ReadOnlySpan<char> chars) => throw null;
public static bool TryFromHexString(string s, Span<byte> bytes, out int bytesWritten) => throw null;
// Encode to chars
public static int ToHexCharArray(byte[] inArray, int offsetIn, int length, char[] outArray, int offsetOut, bool lowercase = false) => throw null;
public static bool TryToHexChars(ReadOnlySpan<byte> bytes, Span<char> chars, out int charsWritten, bool lowercase = false) => throw null;
// Encode to strings
public static string ToHexString(byte[] inArray, bool lowercase = false) => throw null;
public static string ToHexString(byte[] inArray, int offset, int length, bool lowercase = false) => throw null;
public static string ToHexString(ReadOnlySpan<byte> bytes, bool lowercase = false) => throw null;
}
}
Maybe get in a simple implementation soon and rely on a community contribution to provide a super fast vectorized implementation. Sounds like a fun project for someone.
@tkp1n For your scenarios, would those tracers want to do a straight hex dump "00112233..." or would they want things to be formatted "00-11-22-..."? I've never seen any amount of data larger than a few dozen bytes dumped to hex as-is without some kind of formatting to make it more human-readable.
In the case of the aspnet core code you linked to above, it can be written in terms of the 100% fully type-safe HexConverter
internal class already in CoreLib, and its performance is within 1 nanosecond of the unsafe SSE3 version.
| Method | value | Mean | Error | StdDev |
|--------------------- |----------- |---------:|----------:|----------:|
| UseSse3_Unsafe | 3735928559 | 1.324 ns | 0.0298 ns | 0.0249 ns |
| UseBitTwiddle_Safe_B | 3735928559 | 2.237 ns | 0.0259 ns | 0.0216 ns |
@GSPP provide a super fast vectorized implementation
I did that already: https://github.com/tkp1n/HexMate/
@GrabYourPitchforks a straight hex dump or would they want things to be formatted
I'd say that depends on whether the primary reason for tracing is the consumption of the trace by a human or by a machine (with maybe a human looking at it from time to time). I've seen both, but over a certain amount of data there is almost always some formatting applied.
Another not yet mentioned source of "large" hex strings is the xs:hexBinary
primitive datatype in XML schemas. Such values are also encoded by the internal HexConverter class in CoreLib (via BinHexEncoder) that is undoubtedly fast for small inputs but falls behind vectorized implementations quickly for more substantial buffers. Decoding of hexBinary is done in BinHexDecoder using (yet) another implementation tolerating a mix of upper- and lowercase characters as well as whitespace.
```C#
namespace System
{
public static class Convert
{
public static byte[] FromHexString(string s);
public static byte[] FromHexString(ReadOnlySpan
public static string ToHexString(byte[] inArray);
public static string ToHexString(byte[] inArray, int offset, int length);
public static string ToHexString(ReadOnlySpan<byte> bytes);
}
}
```
@terrajobst or @GrabYourPitchforks feel free to assign this to me. I鈥榙 be happy to provide a PR for this.
@tkp1n done! Thanks :)
Most helpful comment
I consolidated the different proposals and comments made in this discussion and came up with the following API. I also provide a sample implementation (internally utilizing AVX2, SSE4.1, SSSE3, SSE2 or Scalar/Lookup with the following performance characteristics) of the API to experiment with via a seperate GitHub repository and nuget package.
Sample implementation to play arround with
https://github.com/tkp1n/HexMate
https://www.nuget.org/packages/HexMate
Adressed proposals and comments:
System.Convert
(@GrabYourPitchforks, https://github.com/dotnet/corefx/issues/10013#issuecomment-463419321 and @vcsjones, https://github.com/dotnet/corefx/issues/10013#issuecomment-463452088)OperationStatus
-based APIs as currently inSystem.Buffers.Text.Base64
(@GrabYourPitchforks, https://github.com/dotnet/corefx/issues/10013#issuecomment-463419321)Open questions:
InsertLineBreaks
add a new-line every 72 characters for hex but every 76 for base64? (@GrabYourPitchforks, https://github.com/dotnet/corefx/issues/10013#issuecomment-463492630)-
-separated input to allow processing strings returned byBitConverter.ToString(byte[])
?GetMaxDecodedFromUtf8Length
andGetMaxEncodedToUtf8Length
make sense forHex
as the calculation is much simpler as with base64 where those methods are provided?API proposal