Runtime: StringBuilder.AppendJoin (appending lists to StringBuilder)

Created on 24 Sep 2015  路  55Comments  路  Source: dotnet/runtime

Background

Let's say you want to build a string that contains a separated list (e.g. list of filenames, messages, or anything else).

If it is the only thing you want, you can use string.Join. However what if you want to append something else? If you Concat with the result of string.Join, that means that you unnecessarily allocated an intermediate joined string, which can be rather large (depending on the original collection).

On the other hand, StringBuilder would require something like:

var builder = new StringBuilder("Names: ");
for (var i = 0; i < names.Length; i++) {
    if (i > 0) { // this becomes even more involved if `names` is `IEnumerable`
        builder.Append(", ");
    }
    builder.Append(names[i]);
}

That can be abstracted into an extension method, however I believe that extension would be general and useful enough to benefit from being in NetFX.

API Design

(updated based on API review, second pass -- see https://github.com/dotnet/corefx/issues/3419#issuecomment-273254581)

public class StringBuilder {
    public StringBuilder AppendJoin(string separator, params object[] values);
    public StringBuilder AppendJoin<T>(string separator, IEnumerable<T> values);
    public StringBuilder AppendJoin(string separator, params string[] value);

    public StringBuilder AppendJoin(chat separator, params object[] values);
    public StringBuilder AppendJoin<T>(char separator, IEnumerable<T> values);
    public StringBuilder AppendJoin(char separator, params string[] value);

    // string.Join overloads NOT taken:
    //public StringBuilder AppendJoin(string separator, IEnumerable<string> values);
    //public StringBuilder AppendJoin(string separator, string[] values, int startIndex, int count);
    //public StringBuilder AppendJoin(char separator, IEnumerable<string> values);
    //public StringBuilder AppendJoin(char separator, string[] values, int startIndex, int count);
    // REJECTED in the first API review round:
    //public StringBuilder AppendJoin<T>(string separator, T[] values, int startIndex, int count);
    //public StringBuilder AppendJoin<T>(string separator, params T[] values);
    //public StringBuilder AppendJoin<T>(char separator, params T[] values);
}

Note: Motivation for the API approval update is described in https://github.com/dotnet/corefx/issues/3419#issuecomment-273254581 -- adjusting to follow proven working form of 'string.Join', correctly working with most-used overload calls - see usage example in https://github.com/dotnet/corefx/issues/5552#issuecomment-268395367.

Compatibility

Potential source-level issue if someone is already using an extension method with the same name.

api-approved area-System.Runtime up-for-grabs wishlist

All 55 comments

@terrajobst could you take a look please

The proposed APIs seem quite sensible to me and are following the spirit of StringBuilder: avoiding unnecessary allocations of strings.

@weshaggard @KrzysztofCwalina @vancem Thoughts?

It is a nice convenience function. My only comment is whether the API should be generic or not (currently StringBuilder has no generic methods) You could certainly imagine it take IEnumerable or even IEnumerable.

Also didn't C# recently add a feature that you could do a params IEnumerable that works like params T[].

But these are all minor points. As long as we have thought about the tradeoffs this is OK by me.

I would not add the last overload. It does not seems to be critical and I would prefer that in the future we add the ability to join slices/spans.

@vancem That's dotnet/roslyn#36, and as far as I can tell, it hasn't been added yet, it's currently "ready to be implemented".

What if the string from each value is dynamically generated, could we have some form of overload which can Append directly to the StringBuilder? I'm thinking something like the following:

```c#
public StringBuilder AppendJoin(string separator, IEnumerable appendActions);

public delegate void AppendAction(IStringBuilderAppender stringBuilderAppender);
```

where IStringBuilderAppender only exposes the Append methods from StringBuilder.

What if the string from each value is dynamically generated

So you could implement your own IEnumerable<T> and IEnumerator<T>.

@Lukazoid your proposal would require additional allocations: each delegate is a heap allocation. Probably worse in practice because the append action doesn't take context so it would probably require a closure.

Also, I'm not clear what benefit the deferred string allocation would bring. I think @Maxwe11 is right: when you don't have the strings already, simply produce them before calling or create a custom enumerator, potentially using yield.

Ah, I expected writing directly to the StringBuilder rather than creating an intermediate string which then gets written to the StringBuilder to work out better.

@terrajobst If I understand it correctly, that overload is meant to avoid intermediate string allocations. For example, instead of:

```c#
stringBuilder.AppendJoin(", ", dictionary.Select(kvp => kvp.Key + ": " + kvp.Value));

You would write something like:

```c#
stringBuilder.AppendJoin(", ", dictionary.Select<KeyValuePair<string, int>, AppendAction>(
        kvp => sb => { sb.Append(kvp.Key); sb.Append(": "); sb.Append(kvp.Value); }));

I do see the benefit: it does avoid unnecessary string allocations. But you're also right that it would require delegate and closure allocations.

And it also looks terrible, so I would likely choose explicit loop over this.

@Lukazoid Do I understand what you meant correctly?

An alternative approach (which might be what you hinted at when you mentioned context) would be:

```c#
public StringBuilder AppendJoin(
string separator, IEnumerable values, Action appendActionFactory)

This would be more efficient (no unnecessary string allocations, likely no per-invocation delegate or closure allocations) and also results in better looking code:

```c#
stringBuilder.AppendJoin(", ", dictionary,
    (kvp, sb) => { sb.Append(kvp.Key); sb.Append(": "); sb.Append(kvp.Value); });

That is exactly what I meant @svick, thanks for coming up with a much cleaner implementation :)

The following demonstrates the kind of thing I would like to be able to do:

```c#
var ages = new Dictionary
{
{ "Tom", 22 },
{ "Jane", 35 },
{ "Robert", 46 },
{ "Mary", 27 },
{ "Harry", 29 },
{ "Marge", 32 }
};

const int iterations = 100000;
const string formatString = "{0} is {1} years old";

var sw = Stopwatch.StartNew();

// Test with appending directly to the StringBuilder - proposed delegate signature
for (var i = 0; i < iterations; ++i)
{
var stringBuilder = new StringBuilder();
stringBuilder.AppendJoin(", ", ages, (sb, kvp) => sb.AppendFormat(CultureInfo.InvariantCulture, formatString, kvp.Key, kvp.Value));

var res = stringBuilder.ToString();

}
var appendJoinDelegateElapsed = sw.Elapsed;

This is what we could do with the other new overloads:

```c#
sw.Restart();
// Test with appending pre-built strings to the StringBuilder - "existing" IEnumerable<T> signature
for (var i = 0; i < iterations; ++i)
{
    var stringBuilder = new StringBuilder();
    var values = ages.Select(kvp => String.Format(CultureInfo.InvariantCulture, formatString, kvp.Key, kvp.Value));

    stringBuilder.AppendJoin(", ", values);

    var res = stringBuilder.ToString();
}
var appendJoinStringsElapsed = sw.Elapsed;

On my machine I get 0.55 seconds for appendJoinDelegateElapsed and 0.75 seconds for appendJoinStringsElapsed there also the benefit that the delegate example undoubtedly involves fewer intermediate string allocations.

So what's the proposal then?

@terrajobst I stand for the original proposal, but without the last overload as per @KrzysztofCwalina:

```c#
public class StringBuilder {
// ...

// Similar to string.Join. Alternative name can be "AppendAll" or "AppendRange".
public StringBuilder AppendJoin<T>(string separator, IEnumerable<T> values);
public StringBuilder AppendJoin<T>(string separator, params T[] values);

}
```

I think @Lukazoid's idea can be split into a separate ticket which can be decided on separately. Both approaches make sense, but for simple scenarios I definitely need a non-delegate approach, so delegate decision can be made after this one.

What @ashmind suggests sounds good to me.

@terrajobst this one is ready for review as well.

My 2 cents: I would prefer if we would call the API AppendAll instead of AppendJoin.

@AlexGhiondea, I don't think AppendAll(list, ", ") clearly clarifies that the list is being joined. Meanwhile AppendJoin is gramatically incorrect because Join is not a noun (while Format in AppendFormat is a noun). I think it sounds better as AppendJoined.

@jamesqo AppendFormat is syntactically correct English, but interpreted that way, it's not correct semantically, since you're not appending any "format", you're appending a formatted string. The way I interpret it, AppendFormat means "format and append", so AppendJoin (understood as "join and append") is the right name.

AppendJoin is also consistent: string.Format is related to StringBuilder.AppendFormat the same way string.Join is related to the proposed method, so it should be named StringBuilder.AppendJoin.

For the same reason, I'm against the AppendAll name, since it's not consistent.

The API looks good. We've also approved an overload of String.Join that uses char for the separator (#5552). We should probably do the same here, so:

C# public class StringBuilder { public StringBuilder AppendJoin<T>(string separator, IEnumerable<T> values); public StringBuilder AppendJoin<T>(string separator, params T[] values); public StringBuilder AppendJoin<T>(char separator, IEnumerable<T> values); public StringBuilder AppendJoin<T>(char separator, params T[] values); }

@AlexRadch please ping us when you grab issues, so that we can assign them to you ...

@karelz I made this API implementation in PR https://github.com/dotnet/coreclr/pull/8303 but nobody answer on it.

Sometimes it requires patience - Thu & Fri was Thanksgiving in US, so there were plenty of PRs from lots of folks in last 4 days and the core team was OOF ... there's gotta be some backlog :).

cc: @joperezr @AlexGhiondea to help with the code review ...

What is missing here? Do we need to add tests in CoreFX?

@karelz yes we need add tests.

The only counterintuitive part is that calling AppendJoin with a List<string> results in the params List<string>[] overload being chosen and you get a .ToString() on the list instead of list items.

@jnm2 you're right, I double checked. That sucks.

IEnumerable<T> and params T[] overloads don't work well together. @terrajobst we should either pull params T[] overload, or change it to:
c# public StringBuilder AppendJoin<T>(string separator, T value1, params T[] values);
to disambiguate the calls, if we really care about the usage syntax sb.AppendJoin("separator", "a", "b", "c");.
Any other guidance from API design point of view?

We just talked about this. Ideally, we'd like this shape:

```C#
public StringBuilder AppendJoin(string separator, params IEnumerable values);

Which the compiler doesn't support today. However we could elide `params` for now, requiring the consumer to write code like this this:

```C#
sb.AppendJoin(";", new[] { "one", "two", "three" });

And if we ever get params IEnumerable we can simply update this method to be params because it's not a breaking change.

Thus, we propose to remove the overload that takes T[] and only leave the IEnumerable as the latter is more powerful.

I have updated the spec in top most post. params overloads are now 'REJECETED'.

@AlexRadch, the (now rejected) APIs are already added to CoreCLR and CoreRT. We need to remove them. Do you want to grab that work?

Thanks @jnm2 for catching the problem!

We should adjust the API to string.Join (see example, incl. usage in https://github.com/dotnet/corefx/issues/5552#issuecomment-268395367).

This is the shape we should adopt:

public class StringBuilder {
    public StringBuilder AppendJoin(string separator, params object[] value);
    public StringBuilder AppendJoin<T>(string separator, IEnumerable<T> values);
    public StringBuilder AppendJoin(string separator, params string[] values);

    public StringBuilder AppendJoin(chat separator, params object[] value);
    public StringBuilder AppendJoin<T>(char separator, IEnumerable<T> values);
    public StringBuilder AppendJoin(char separator, params string[] values);

    // string.Join overloads NOT taken:
    //public StringBuilder AppendJoin(string separator, IEnumerable<string> values);
    //public StringBuilder AppendJoin(string separator, string[] values, int startIndex, int count);
    //public StringBuilder AppendJoin(char separator, IEnumerable<string> values);
    //public StringBuilder AppendJoin(char separator, string[] values, int startIndex, int count);
}

@karelz I do not understand what API should be?

public class StringBuilder {
    public StringBuilder AppendJoin(string separator, params object[] value);
    public StringBuilder AppendJoin<T>(string separator, IEnumerable<T> values);
    public StringBuilder AppendJoin(string separator, IEnumerable<string> values);
    public StringBuilder AppendJoin(string separator, params string[] values);
}

or as above plus

public class StringBuilder {
    // string.Join overloads NOT taken:
    public StringBuilder AppendJoin(string separator, string[] values, int startIndex, int count);
}

Where is char separator?
Now it is next

public StringBuilder AppendJoin<T>(char separator, params T[] values)
public StringBuilder AppendJoin<T>(string separator, params T[] values)
public StringBuilder AppendJoin<T>(char separator, IEnumerable<T> values)
public StringBuilder AppendJoin<T>(string separator, IEnumerable<T> values)

@AlexRadch thanks for catching it, I added the char overloads: https://github.com/dotnet/corefx/issues/3419#issuecomment-268397129

Spec in https://github.com/dotnet/corefx/issues/3419#issuecomment-268397129 updated with removal of IEnumerator<string> overloads. They are not necessary (see https://github.com/dotnet/corefx/issues/5552#issuecomment-268673073).

@AlexRadch is this something you are still interested in working on? If not we can unassign you and flag this as up-for-grabs.

For context: This is the the form approved:
We should adjust the API to string.Join (see example, incl. usage in https://github.com/dotnet/corefx/issues/5552#issuecomment-268395367).

This is the shape we should adopt:

public class StringBuilder {
    public StringBuilder AppendJoin(string separator, params object[] values);
    public StringBuilder AppendJoin<T>(string separator, IEnumerable<T> values);
    public StringBuilder AppendJoin(string separator, params string[] value);

    public StringBuilder AppendJoin(chat separator, params object[] values);
    public StringBuilder AppendJoin<T>(char separator, IEnumerable<T> values);
    public StringBuilder AppendJoin(char separator, params string[] value);

    // string.Join overloads NOT taken:
    //public StringBuilder AppendJoin(string separator, IEnumerable<string> values);
    //public StringBuilder AppendJoin(string separator, string[] values, int startIndex, int count);
    //public StringBuilder AppendJoin(char separator, IEnumerable<string> values);
    //public StringBuilder AppendJoin(char separator, string[] values, int startIndex, int count);
}

Updated the top most post with latest approved API shape.

Unassigning the issue - it is "up for grabs" again, available for anyone to pick it up.
Next steps: We need to check what is in CoreCLR and how it reflects the latest API approval. Then expose the API in CoreFX and add tests in CoreFX.

Moving to Future as this is not necessray for 2.0
cc @joperezr @AlexGhiondea

@hughbe this is not a 2.0 issue, but at some point it may interest you. Exposing API on stringbuilder that is already implmented.

I really want to see this in 2.0. Can I help?

Sure, I'll sign it to you (once GitHub let's me - from unknown reason, you don't show up in the list of users there :()

@jnm2 Contacted GitHub support - your name does not show up either in query Author/Assignee drop downs. Similar to another user problems earlier today (he was 'flagged').

That doesn't bother me. So:

Next steps: We need to check what is in CoreCLR and how it reflects the latest API approval. Then expose the API in CoreFX and add tests in CoreFX.

CoreCLR needs to be updated. It currently has:

c# public class StringBuilder { public StringBuilder AppendJoin<T>(string separator, params T[] values); public StringBuilder AppendJoin<T>(string separator, IEnumerable<T> values); public StringBuilder AppendJoin<T>(char separator, params T[] values); public StringBuilder AppendJoin<T>(char separator, IEnumerable<T> values); }

@karelz Are these the steps I should take?

I couldn't find an API addition checklist documented via CONTRIBUTING.md. Anything I missed?

It's weird to see tests in both places, I would expect them only in CoreFX -- @AlexGhiondea @joperezr any idea?

Your plan sounds reasonable (once we clarify where the tests should go - I think we will have to skip adding them in CoreCLR).

Contributions docs under construction have some rough links: https://github.com/dotnet/corefx/wiki/Contributions (feel free to help us improve them)

@jnm2 your plan looks good. Re the tests you found in coreclr - I do not think we add new tests for corelib in the coreclr repo. I think those are legacy although I don't know the history. Although they are nice because they can be modified in the same PR, tests in corefx are preferred because they can be reused over several runtimes (CoreCLR, CoreRT, Mono, UWP,.. .)

I pushed https://github.com/dotnet/coreclr/pull/11059. Next I'd like to write some corefx tests.
Is there a guide on how to run corefx tests against a local build of coreclr?

If we want to match string.join then the string[] is value and object[] is values no doubt by accident. I updated above since this matches the stated goal. However values in both seems better ot me. @terrajobst ?

https://apisof.net/catalog/System.String.Join(Char,String())

Please say we can be values...

@hughbe thanks for the link, added to the wiki (hopefully will grow into set of short straight-to-the-point reference docs)

@jnm2 ok, Contributor != Collaborator ... sigh, should have caught it myself. I added you to the repo Collaborators list, you should get an invitation from me. Let me know when you accept ...

@karelz Done.

I have CoreCLROverridePath set as an environment variable. When I run build.cmd and then try to build System.Runtime.Tests in VS2017, I get:

Metadata file corefxbinAnyOS.AnyCPU.DebugRemoteExecutorConsoleAppnetstandardRemoteExecutorConsoleApp.exe' could not be found

Any idea why?

Using the VS2017 dev terminal, msbuild /t:buildandtest works fine on System.Runtime.Tests.csproj. I'd just like to be able to run tests in the IDE.

@jnm2 does once doing build-tests.cmd at the root fix it?

@danmosemsft Yes, that fixed it.

It's not a problem for me, but build-tests fails four times with this message:

ZipArchivezip_netcoreappTests.cs(24,57): error CS1061: 'ZipArchiveEntry' does not contain a definition for 'ExternalAttributes' and no extension method 'ExternalAttributes' accepting a first argument of type 'ZipArchiveEntry' could be found (are you missing a using directive or a
n assembly reference?) [C:UsersjmusserSourceReposcorefxsrcSystem.IO.CompressiontestsSystem.IO.Compression.Tests.csproj]

Was this page helpful?
0 / 5 - 0 ratings