Runtime: Add string.Create() with function pointer

Created on 3 Dec 2020 · 6 Comments · Source: dotnet/runtime

Background and Motivation

Function pointers were introduced in C# 9 and can be used as allocation-free alternatives to delegates. SpanAction<T, TState> and ReadOnlySpanAction<T, TState> can effectively be replaced with such pointers.

Proposed API

public sealed class String
{
  public static string Create<TState>(int length, TState state, delegate*<Span<char>, TState, void> action);
}

Usage Examples

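private static void FromSequence(Span<char> output, ReadOnlySequence<char> input) => input.CopyTo(output);

ReadOnlySequence<char> seq = ...;
// ReadOnlySequence<T>.Length is a long, so it is narrowed to int here (checked, to fail loudly on overflow).
// Taking the address of FromSequence requires an unsafe context.
string result = string.Create(checked((int)seq.Length), seq, &FromSequence);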

Alternative Designs

None.

Risks

The method is available within unsafe blocks only.
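
To make the call-site requirement concrete, here is a minimal sketch. `StringFactory.Create` is a hypothetical stand-in for the proposed overload (it fills a temporary `char[]` instead of writing into the string directly), `Caller`, `Fill`, and `Repeat` are illustrative names, and the project is assumed to be compiled with `AllowUnsafeBlocks` enabled.

```C#
using System;

public static class StringFactory
{
    // Hypothetical stand-in for the proposed overload; not the real String.Create.
    public static unsafe string Create<TState>(int length, TState state, delegate*<Span<char>, TState, void> action)
    {
        if (action == null)
            throw new ArgumentNullException(nameof(action));
        if (length < 0)
            throw new ArgumentOutOfRangeException(nameof(length));
        if (length == 0)
            return string.Empty;

        // Assumption for this sketch: go through a temporary buffer rather than mutating the string in place.
        char[] buffer = new char[length];
        action(buffer.AsSpan(), state);
        return new string(buffer);
    }
}

public static class Caller
{
    private static void Fill(Span<char> span, char value) => span.Fill(value);

    // Both the function pointer type and the address-of expression (&Fill) need an unsafe context,
    // so every caller of such an overload has to opt in to unsafe code.
    public static unsafe string Repeat(char value, int count) => StringFactory.Create(count, value, &Fill);
}
```

Code that calls `Caller.Repeat` stays safe, but `Repeat` itself cannot drop the `unsafe` modifier.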

api-suggestion area-System.Runtime untriaged

All 6 comments

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

I don't see that much use for this overload, considering that its only purpose is being an unsafe equivalent of what already exists.

> Create method must check the pointer value and throw ArgumentNullException on null.

That's not a risk at all, that's basic code hygiene. The real risk is that any call-site requires unsafe.

The purpose would be to avoid yet another allocation. While I would expect that this is not necessarily used in a hot path, the string.Create API already exists as a performance optimization, so an overload that lets you avoid an additional allocation seems to be in line with the original goal of the API (a low-cost, "official" way to create a string of a known length and set its contents).

It might be worth prototyping and seeing if it shows any benefits for the locations already using String.Create such as ASP.NET Core, ReadOnlySequence, HttpListener, and a few of the Encoding APIs: https://source.dot.net/#System.Private.CoreLib/String.cs,7dd9a20e8a84bf21,references

@tannergooding, maybe it's reasonable to expand the proposal: look at all public APIs that expect these delegates as input parameters and add overloads with function pointers.

I don't think that's as worthwhile. Most APIs taking a delegate aren't as performance oriented/low-level and so a function pointer might seem out of place.

I believe it has to be decided case by case where function pointers might be beneficial, and there are likely a few places where generic specialized interfaces would be a better fit.
There are also likely a number of places where function pointers are the wrong approach and where generic specialization over an interface would win out, particularly where predicates and simple comparisons are involved: https://source.dot.net/#System.Private.CoreLib/SpanHelpers.BinarySearch.cs,7ffa10c3faafe048
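
As a rough illustration of generic specialization over an interface, the sketch below passes the callback as a struct type argument. `ISpanFiller<TState>`, `RepeatFiller`, and `GenericStringFactory` are hypothetical names invented for this example, not existing APIs, and the temporary `char[]` is a simplification.

```C#
using System;

// Hypothetical callback contract, implemented by structs.
public interface ISpanFiller<TState>
{
    void Fill(Span<char> destination, TState state);
}

// Example filler: writes the same character into every slot.
public readonly struct RepeatFiller : ISpanFiller<char>
{
    public void Fill(Span<char> destination, char state) => destination.Fill(state);
}

public static class GenericStringFactory
{
    // Because TFiller is constrained to struct, the JIT compiles a separate copy of this
    // method per filler type, so filler.Fill is a direct call that can be inlined.
    public static string Create<TState, TFiller>(int length, TState state, TFiller filler)
        where TFiller : struct, ISpanFiller<TState>
    {
        if (length < 0)
            throw new ArgumentOutOfRangeException(nameof(length));
        if (length == 0)
            return string.Empty;

        // Assumption for this sketch: fill a temporary buffer, then copy it into the string.
        char[] buffer = new char[length];
        filler.Fill(buffer, state);
        return new string(buffer);
    }
}
```

A call such as `GenericStringFactory.Create(3, 'a', new RepeatFiller())` needs no unsafe context and no delegate; the trade-off is that each callback has to be written as a named struct.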

> The purpose would be to avoid yet another allocation

Just to make sure everyone is on the same page, though, we're talking about one allocation total, not per invocation. The scenarios where a function pointer could be used (no capture) are the same scenarios for which the C# compiler will automatically cache the delegate, so there will only be an allocation the first time it's used and then never again (other than if there's a race to initialize it, but that's a one-time hit as well).
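
The caching can be observed with a small sketch like the following. `DelegateCachingDemo` and its members are hypothetical names, and the caching is a behavior of current C# compilers rather than something the language guarantees.

```C#
using System;
using System.Buffers;

public static class DelegateCachingDemo
{
    // Non-capturing lambda: current compilers create the delegate once and reuse it.
    public static SpanAction<char, char> GetNonCapturing() => (span, value) => span[0] = value;

    // Capturing lambda: a new closure object and a new delegate are allocated on every call.
    public static SpanAction<char, char> GetCapturing(char captured) => (span, value) => span[0] = captured;

    // Compares the delegate instances returned by two consecutive calls.
    public static bool NonCapturingIsCached() => ReferenceEquals(GetNonCapturing(), GetNonCapturing());

    public static bool CapturingIsCached() => ReferenceEquals(GetCapturing('x'), GetCapturing('x'));
}
```

With the caching described above, `NonCapturingIsCached()` is expected to return `true`, while `CapturingIsCached()` returns `false`; only the non-capturing case could be replaced by a function pointer in the first place.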

On this microbenchmark, which is arguably the absolute best case (one-character string being initialized with a known single character value):
```C#
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Diagnosers;
using BenchmarkDotNet.Running;
using System;
using System.Buffers;

[MemoryDiagnoser]
public class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssemblies(new[] { typeof(Program).Assembly }).Run(args);

    // Delegate-based variant, mirroring the shape of the existing string.Create.
    public static unsafe string Create<TState>(int length, TState state, SpanAction<char, TState> action)
    {
        if (action == null)
            throw new ArgumentNullException(nameof(action));

        if (length <= 0)
        {
            if (length == 0)
                return string.Empty;
            throw new ArgumentOutOfRangeException(nameof(length));
        }

        // Allocate the string, then let the callback fill its characters in place.
        string result = new string('\0', length);
        fixed (char* ptr = result)
        {
            action(new Span<char>(ptr, result.Length), state);
        }
        return result;
    }

    // Function-pointer variant: identical except for the callback type.
    public static unsafe string Create<TState>(int length, TState state, delegate*<Span<char>, TState, void> action)
    {
        if (action == null)
            throw new ArgumentNullException(nameof(action));

        if (length <= 0)
        {
            if (length == 0)
                return string.Empty;
            throw new ArgumentOutOfRangeException(nameof(length));
        }

        string result = new string('\0', length);
        fixed (char* ptr = result)
        {
            action(new Span<char>(ptr, result.Length), state);
        }
        return result;
    }

    [Benchmark]
    public string WithDelegate() => Create(1, 'c', (span, value) => span[0] = value);

    [Benchmark]
    public unsafe string WithPointer() => Create(1, 'c', &CreateOneCharString);

    private static void CreateOneCharString(Span<char> span, char value) => span[0] = value;
}
```

on my machine I get:

|       Method |     Mean |     Error |    StdDev |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------- |---------:|----------:|----------:|-------:|------:|------:|----------:|
| WithDelegate | 9.484 ns | 0.1126 ns | 0.1053 ns | 0.0038 |     - |     - |      24 B |
|  WithPointer | 8.120 ns | 0.2221 ns | 0.2644 ns | 0.0038 |     - |     - |      24 B |

So, best case, that's roughly a 14% reduction in mean time (8.120 ns vs. 9.484 ns). Most uses involve a lot more work than this, and even ones that are just a bit larger see that already small difference shrink significantly. For example, here's one use in ASP.NET that just fills the target buffer with some bit-shifting of a long and lookups in a character table:
```C#
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Diagnosers;
using BenchmarkDotNet.Running;
using System;
using System.Buffers;
using System.Threading;

[MemoryDiagnoser]
public class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssemblies(new[] { typeof(Program).Assembly }).Run(args);

    public static unsafe string Create<TState>(int length, TState state, SpanAction<char, TState> action)
    {
        if (action == null)
            throw new ArgumentNullException(nameof(action));

        if (length <= 0)
        {
            if (length == 0)
                return string.Empty;
            throw new ArgumentOutOfRangeException(nameof(length));
        }

        string result = new string('\0', length);
        fixed (char* ptr = result)
        {
            action(new Span<char>(ptr, result.Length), state);
        }
        return result;
    }

    public static unsafe string Create<TState>(int length, TState state, delegate*<Span<char>, TState, void> action)
    {
        if (action == null)
            throw new ArgumentNullException(nameof(action));

        if (length <= 0)
        {
            if (length == 0)
                return string.Empty;
            throw new ArgumentOutOfRangeException(nameof(length));
        }

        string result = new string('\0', length);
        fixed (char* ptr = result)
        {
            action(new Span<char>(ptr, result.Length), state);
        }
        return result;
    }

    internal static class CorrelationIdGenerator
    {
        private static readonly char[] s_encode32Chars = "0123456789ABCDEFGHIJKLMNOPQRSTUV".ToCharArray();

        private static long _lastId = DateTime.UtcNow.Ticks;

        public static string GetNextIdWithDelegate()
        {
            long id = Interlocked.Increment(ref _lastId);
            return Create(13, id, (buffer, value) =>
            {
                char[] encode32Chars = s_encode32Chars;

                buffer[12] = encode32Chars[value & 31];
                buffer[11] = encode32Chars[(value >> 5) & 31];
                buffer[10] = encode32Chars[(value >> 10) & 31];
                buffer[9] = encode32Chars[(value >> 15) & 31];
                buffer[8] = encode32Chars[(value >> 20) & 31];
                buffer[7] = encode32Chars[(value >> 25) & 31];
                buffer[6] = encode32Chars[(value >> 30) & 31];
                buffer[5] = encode32Chars[(value >> 35) & 31];
                buffer[4] = encode32Chars[(value >> 40) & 31];
                buffer[3] = encode32Chars[(value >> 45) & 31];
                buffer[2] = encode32Chars[(value >> 50) & 31];
                buffer[1] = encode32Chars[(value >> 55) & 31];
                buffer[0] = encode32Chars[(value >> 60) & 31];
            });
        }

        public static unsafe string GetNextIdWithFunctionPointer()
        {
            long id = Interlocked.Increment(ref _lastId);
            return Create(13, id, &ForPointer);
        }

        private static void ForPointer(Span<char> buffer, long value)
        {
            char[] encode32Chars = s_encode32Chars;

            buffer[12] = encode32Chars[value & 31];
            buffer[11] = encode32Chars[(value >> 5) & 31];
            buffer[10] = encode32Chars[(value >> 10) & 31];
            buffer[9] = encode32Chars[(value >> 15) & 31];
            buffer[8] = encode32Chars[(value >> 20) & 31];
            buffer[7] = encode32Chars[(value >> 25) & 31];
            buffer[6] = encode32Chars[(value >> 30) & 31];
            buffer[5] = encode32Chars[(value >> 35) & 31];
            buffer[4] = encode32Chars[(value >> 40) & 31];
            buffer[3] = encode32Chars[(value >> 45) & 31];
            buffer[2] = encode32Chars[(value >> 50) & 31];
            buffer[1] = encode32Chars[(value >> 55) & 31];
            buffer[0] = encode32Chars[(value >> 60) & 31];
        }
    }

    [Benchmark]
    public string WithDelegate() => CorrelationIdGenerator.GetNextIdWithDelegate();

    [Benchmark]
    public unsafe string WithPointer() => CorrelationIdGenerator.GetNextIdWithFunctionPointer();
}
```

and which yields on my machine:

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------- |---------:|---------:|---------:|-------:|------:|------:|----------:|
| WithDelegate | 19.14 ns | 0.412 ns | 0.385 ns | 0.0076 | - | - | 48 B |
| WithPointer | 18.01 ns | 0.410 ns | 0.342 ns | 0.0076 | - | - | 48 B |

for only a 6% gain.

And on top of that, all such APIs allocate new strings, so these benchmark results don't fully reflect the further cost that both variants incur through their impact on the GC, or how that affects the whole app.

From my perspective, this isn't worth starting a trend of creating unsafe pointer-based overloads for such APIs.

My $.02.
