Roslyn: Extend ReadOnlySpan<byte> optimization for static data to work with ASCII/UTF8 strings

Created on 11 Feb 2019 · 8 Comments · Source: dotnet/roslyn

Could this (https://github.com/dotnet/roslyn/pull/24621) work with byte strings? Writing them out by hand is fairly impenetrable, e.g.:

static ReadOnlySpan<byte> ContinueBytes =>
    new byte[] { (byte)'H', (byte)'T', (byte)'T', (byte)'P', (byte)'/', (byte)'1', (byte)'.', (byte)'1', (byte)' ', (byte)'1', (byte)'0', (byte)'0', (byte)' ', (byte)'C', (byte)'o', (byte)'n', (byte)'t', (byte)'i', (byte)'n', (byte)'u', (byte)'e', (byte)'\r', (byte)'\n', (byte)'\r', (byte)'\n' };

It would be nice if the optimization also worked with UTF-8 and/or ASCII encoding (preferably both), so that this could be written instead:

static ReadOnlySpan<byte> ContinueBytes =>
    Encoding.UTF8.GetBytes("HTTP/1.1 100 Continue\r\n\r\n");
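To make the trade-off concrete, here is a minimal sketch contrasting the two forms above on a shorter payload. The names `HelloLiteral` and `HelloEncoded` are hypothetical, chosen for illustration. The first form benefits from the optimization in dotnet/roslyn#24621, where a constant `byte[]` initializer converted directly to `ReadOnlySpan<byte>` is emitted as static data instead of a fresh array. The second form compiles today, but `Encoding.UTF8.GetBytes` returns a newly allocated array on every call, which is why the request here is for the compiler to treat it specially.

```csharp
using System;
using System.Text;

// Covered by the Roslyn optimization (dotnet/roslyn#24621): a constant byte[]
// initializer converted straight to ReadOnlySpan<byte> is emitted as static
// data in the assembly, so creating the span does not allocate an array.
static ReadOnlySpan<byte> HelloLiteral() =>
    new byte[] { (byte)'H', (byte)'e', (byte)'l', (byte)'l', (byte)'o', (byte)'\r', (byte)'\n' };

// More readable, but Encoding.UTF8.GetBytes allocates and fills a new byte[]
// on every call; it is not turned into static data.
static ReadOnlySpan<byte> HelloEncoded() =>
    Encoding.UTF8.GetBytes("Hello\r\n");

// For ASCII-only text both forms produce the same bytes.
Console.WriteLine(HelloLiteral().SequenceEqual(HelloEncoded())); // True
```

The bytes are identical; only the cost of obtaining them differs.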

Example: https://github.com/aspnet/AspNetCore/pull/7422

/cc @VSadov @stephentoub @jkotas @KrzysztofCwalina @jaredpar @jcouv

Area-Compilers Code Gen Quality


All 8 comments

My personal preference would be for some string literal prefix to exist, like b"abc" (à la Rust), that is of type ROS<byte> and limited to ASCII characters. Other prefixes could exist for other encodings, though I wonder how endianness would work.
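The ASCII restriction and the endianness question both show up when encoding with the existing `System.Text.Encoding` APIs. This is a small sketch of that behavior only, not of any proposed literal syntax. ASCII text encodes to the same single bytes in ASCII and UTF-8. A non-ASCII character such as U+00E9 ('é') becomes two bytes in UTF-8, and the ASCII encoder replaces it with '?'. UTF-16 yields different byte sequences depending on byte order.

```csharp
using System;
using System.Text;

// ASCII-only text: one byte per character, identical in ASCII and UTF-8.
byte[] ascii = Encoding.ASCII.GetBytes("abc");  // 61 62 63
byte[] utf8  = Encoding.UTF8.GetBytes("abc");   // 61 62 63

// U+00E9 ('é'): UTF-8 needs two bytes, while the ASCII encoder's default
// replacement fallback substitutes '?' (0x3F).
byte[] utf8Accent  = Encoding.UTF8.GetBytes("\u00E9");   // C3 A9
byte[] asciiAccent = Encoding.ASCII.GetBytes("\u00E9");  // 3F

// UTF-16 comes in two byte orders, so a UTF-16 byte-literal prefix would
// have to pick one (GetBytes never emits a byte order mark).
byte[] utf16Le = Encoding.Unicode.GetBytes("A");           // 41 00
byte[] utf16Be = Encoding.BigEndianUnicode.GetBytes("A");  // 00 41

Console.WriteLine(BitConverter.ToString(utf8Accent));  // C3-A9
```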

This would mesh generally with the proposals we've been shooting around internally re:

Utf8String theStr = utf8"Hello world!"; // or similar

And since the current proposal is to have a free conversion from Utf8String to ROS<byte>

I hear talk of something like this, which would be more terse and a bit better:

static ReadOnlySpan<byte> ContinueBytes => u8"HTTP/1.1 100 Continue\r\n\r\n";

This would work via an implicit Utf8String => ReadOnlySpan<byte> conversion (and hopefully the Utf8String literal would use the same load-from-data optimization as ReadOnlySpan<byte>?)

/cc @GrabYourPitchforks

Isn't a string internally made of chars, so shouldn't this be ReadOnlySpan<char>?

Not for 8-bit string data (i.e. ASCII and UTF-8); those are sequences of bytes, so ReadOnlySpan<byte>.

UTF-16 strings sort of work (https://github.com/dotnet/coreclr/issues/22511) with:

static ReadOnlySpan<char> Hello => "Hello".AsSpan();

But if you want 8-bit data, you have to write:

// "HTTP/1.1 100 Continue\r\n\r\n"
static ReadOnlySpan<byte> ContinueBytes => new byte[] { (byte)'H', (byte)'T', (byte)'T', (byte)'P', (byte)'/', (byte)'1', (byte)'.', (byte)'1', (byte)' ', (byte)'1', (byte)'0', (byte)'0', (byte)' ', (byte)'C', (byte)'o', (byte)'n', (byte)'t', (byte)'i', (byte)'n', (byte)'u', (byte)'e', (byte)'\r', (byte)'\n', (byte)'\r', (byte)'\n' };

It feels like having to write new byte[] { (byte)'H', (byte)'T', (byte)'T', (byte)'P', ... should be fixed first. Looks gross.

Once there is a utf8 literal (or a wellknown/intrinsic conversion from string), hooking up the optimization should be fairly easy.

I think string literals are good special cases to have.

Long term I would love something like C++'s constexpr to generalize things like this.

According to LDM Sept. 16, 2019, UTF-8 string literals are emitted as UTF-16, at least in the initial implementation.
So I decided to create a source generator for this purpose.
