Could this (https://github.com/dotnet/roslyn/pull/24621) work with byte strings? Writing them out is fairly impenetrable, e.g.
static ReadOnlySpan<byte> ContinueBytes =>
new byte[] { (byte)'H', (byte)'T', (byte)'T', (byte)'P', (byte)'/', (byte)'1', (byte)'.', (byte)'1', (byte)' ', (byte)'1', (byte)'0', (byte)'0', (byte)' ', (byte)'C', (byte)'o', (byte)'n', (byte)'t', (byte)'i', (byte)'n', (byte)'u', (byte)'e', (byte)'\r', (byte)'\n', (byte)'\r', (byte)'\n' };
So it would be nice if it worked with UTF-8 and/or ASCII encoding (both, preferably), so that this worked instead:
static ReadOnlySpan<byte> ContinueBytes =>
Encoding.UTF8.GetBytes("HTTP/1.1 100 Continue\r\n\r\n");
Example: https://github.com/aspnet/AspNetCore/pull/7422
/cc @VSadov @stephentoub @jkotas @KrzysztofCwalina @jaredpar @jcouv
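For reference, a pattern that compiles today (a sketch with illustrative names, not from the linked PR) is to encode the string once into a cached array. It avoids the per-character casts, though unlike the byte-array form above it allocates once at type initialization instead of being baked in as constant data:
using System;
using System.Text;

static class HttpLiterals
{
    // Encoded once at type initialization; the property hands out a span over the cached array.
    private static readonly byte[] s_continueBytes =
        Encoding.ASCII.GetBytes("HTTP/1.1 100 Continue\r\n\r\n");

    public static ReadOnlySpan<byte> ContinueBytes => s_continueBytes;
}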
My personal preference would be for some string literal modifier to exist, like b"abc" (a la Rust), that is of type ROS<byte> and is limited to ASCII characters. Other prefixes could exist for other encodings, though I wonder how endianness would work.
This would mesh generally with the proposals we've been shooting around internally, re:
Utf8String theStr = utf8"Hello world!"; // or similar
And since the current proposal is to have a free conversion from Utf8String to ROS<byte>, that would cover the span case as well.
I hear talk of something like this, which would be more terse and a bit better:
static ReadOnlySpan<byte> ContinueBytes => u8"HTTP/1.1 100 Continue\r\n\r\n";
Working via an implicit Utf8String => ReadOnlySpan<byte> conversion (and hopefully the Utf8String using the same load-from-data as ReadOnlySpan<byte>?)
/cc @GrabYourPitchforks
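As a minimal sketch of that shape, assuming a hypothetical Utf8String type (not a shipped API) whose implicit conversion exposes its backing bytes as a span:
using System;
using System.Text;

// Hypothetical Utf8String, only to illustrate the implicit-conversion shape described above.
public readonly struct Utf8String
{
    private readonly byte[] _bytes;

    public Utf8String(string text) => _bytes = Encoding.UTF8.GetBytes(text);

    // The "free" conversion: a value typed as Utf8String can be returned wherever a
    // ReadOnlySpan<byte> is expected.
    public static implicit operator ReadOnlySpan<byte>(Utf8String value) => value._bytes;
}
Usage would then look like:
static ReadOnlySpan<byte> ContinueBytes => new Utf8String("HTTP/1.1 100 Continue\r\n\r\n");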
Isn't string internally char, and so shouldn't it be ReadOnlySpan<char>?
Not for 8-bit string data (i.e. ASCII and UTF-8); that is a list of bytes, so ReadOnlySpan<byte>.
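To make the distinction concrete (a self-contained sketch, not from the thread): the same text viewed as the UTF-16 code units that string and ReadOnlySpan<char> hold, versus its 8-bit encoded bytes.
using System;
using System.Text;

class CharVersusByteDemo
{
    static void Main()
    {
        const string text = "HTTP/1.1 100 Continue\r\n\r\n";

        // string is a sequence of UTF-16 code units: 25 chars, each 2 bytes in memory.
        ReadOnlySpan<char> asChars = text.AsSpan();

        // The 8-bit wire representation: 25 bytes, since this text is ASCII-only.
        byte[] asUtf8 = Encoding.UTF8.GetBytes(text);

        Console.WriteLine($"{asChars.Length} UTF-16 chars, {asUtf8.Length} UTF-8 bytes");
    }
}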
UTF-16 strings sort of work (https://github.com/dotnet/coreclr/issues/22511) with:
static ReadOnlySpan<char> Hello => "Hello".AsSpan();
But if you want to do 8-bit you have to do:
// "HTTP/1.1 100 Continue\r\n\r\n"
static ReadOnlySpan<byte> ContinueBytes => new byte[] { (byte)'H', (byte)'T', (byte)'T', (byte)'P', (byte)'/', (byte)'1', (byte)'.', (byte)'1', (byte)' ', (byte)'1', (byte)'0', (byte)'0', (byte)' ', (byte)'C', (byte)'o', (byte)'n', (byte)'t', (byte)'i', (byte)'n', (byte)'u', (byte)'e', (byte)'\r', (byte)'\n', (byte)'\r', (byte)'\n' };
It feels like having to write new byte[] { (byte)'H', (byte)'T', (byte)'T', (byte)'P', ... should be fixed first. Looks gross.
Once there is a utf8 literal (or a well-known/intrinsic conversion from string), hooking up the optimization should be fairly easy.
I think string literals are good special cases to have.
Long term I would love something like C++'s constexpr to generalize things like this.
According to LDM Sept. 16, 2019, UTF-8 string literals are emitted as UTF-16, at least in the initial implementation.
So I decided to create a source generator for this purpose.
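As a minimal sketch of what such a generator's output might look like, assuming a hypothetical [Utf8Literal] attribute in the user's source (names are illustrative): the readable string stays in the attribute, and the generated code uses the span-over-constant-bytes pattern that the compiler already optimizes into static data.
// Utf8Literals.g.cs (generated; hypothetical)
using System;

internal static class Utf8Literals
{
    // "HTTP/1.1 100 Continue\r\n\r\n" encoded as UTF-8.
    public static ReadOnlySpan<byte> ContinueBytes => new byte[]
    {
        0x48, 0x54, 0x54, 0x50, 0x2F, 0x31, 0x2E, 0x31, 0x20, 0x31, 0x30, 0x30, 0x20,
        0x43, 0x6F, 0x6E, 0x74, 0x69, 0x6E, 0x75, 0x65, 0x0D, 0x0A, 0x0D, 0x0A
    };
}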