Runtime: Add some new StringBuilder.Replace overloads

Created on 4 May 2018  路  11Comments  路  Source: dotnet/runtime

Suggestions:

I suggest to add some new overloads to the StringBuilder.Replace method to:

1- return the no. of replacements via an out param:

This code:

SB.Replace(x, y, out int c)
if (c == 0)
   MessageBox.Show("Not Found");

is more efficient that this one, which will search the sb twice in case the search string exists (suppose IndexOf added):

if (SB.IndexOf(x) == -1)
   MessageBox.Show("Not Found");
else
   SB.Replace(x, y)

2- to replace a range of chars with a string:

public StringBuilder Replace(int start, int length, string newValue);
not that newValue.Length doesn't have to be equal to the length param.
ex:

sb.Replace(1 , 5, "Test string"); 

The above code is the alternative to the next code:

Sb.Remove(1, 5);
Sb.Insert(1, "Test string");

3- To allow match regex expression:

public StringBuilder Replace(string Regex, string newValue);
public StringBuilder Replace(string RegexPattern, string newValue);

I suggested another alternative here: dotnet/runtime#26070

4- Give the stringBuilder the ability to load a file directly, say with a static method:

public static StringBuilder StringBuilder.FromFile(string FileName);

Benefits:

Suppose you have a 10MB file containing some patterns that you need to replace, some of them has the \n symbol, so they can match strings spitted on more than one line. In this case you can't read each line in a String and then use the Regex to match and replace the patterns. Instead, you need to read the whole text into a String and apply the Regex.Replace on it several times. Since String is immutable, recreating a new 10MB String with each Replacement can be slow, so I suggest to do the Regex.Replace directly on the StringBuilder. This will be possible when StringBuilder implements a perfect way to iteration , which @vancem is already working on.
I also suggest you to give the stringBuilder the ability to load a file directly, say with a static method:
public static StringBuilder StringBuilder.FromFile(string FileName);
It is more efficient to copy the file into the String Builder chunks directly than reading blocks of the file as strings then appending them to the StringBuilder.

api-needs-work area-System.Runtime

Most helpful comment

I have a private PR open that entirely removes the StringBuilder use in Regex.Replace and operates on Span.
I prefer adding a Span overload instead of a StringBuilder one to get the most efficiency and the least heap allocation.

All 11 comments

We need a formal APIProposal along with some use cases explaining why we want this on StringBuilder and don't just use string instead. We want to be very careful and not try to make StringBuilder have the same functionality as String since they are meant to serve two different purposes.

@joperezr
Suppose you have a 10MB file containing some patterns that you need to replace, some of them has the \n symbol, so they can match strings spitted on more than one line. In this case you can't read each line in a String and then use the Regex to match and replace the patterns. Instead, you need to read the whole text into a String and apply the Regex.Replace on it several times. Since String is immutable, recreating a new 10MB String with each Replacement can be slow, so I suggest to do the Regex.Replace directly on the StringBuilder. This will be possible when StringBuilder implements a perfect way to iteration , which @vancem is already working on.
I also suggest you to give the stringBuilder the ability to load a file directly, say with a static method:
public static StringBuilder StringBuilder.FromFile(string FileName);
It is more efficient to copy the file into the String Builder chunks directly than reading blocks of the file as strings then appending them to the StringBuilder.

@viktorhofer

To me this sound like a misuse for StringBuilder. As it's name implies it is for __building__ strings, not for manipulation or loading from file.

For the given example one could read the file to a regular string. Seek per regex for the matches that should be replaced in that string, then perform the replace in a second step and build with a StringBuilder the result. So there are no unnecessary allocations and copies for the string, and all of this can be done with existing APIs.

Even better would be to spanify the things, and push Regex et.al to understand Spans. See https://github.com/dotnet/corefx/issues/27267 and https://github.com/dotnet/corefx/issues/24145 for this.

@gfoidl
There is no shame in making stringbuilder powerful. Letting most of its functions to be done by the string is not designed intentionally but resulted from the fact that SB has inefficient indexing and itteration performance bracuse it is implemented as a linked list. Vancem implemented a solution so adding more prohibited methods became possible.

@gfoidl
I looked at the source code of Regex.Replace. It uses a list and a stringBuilder. This means that if I had a text in a SB, I need to covert it to a string to call the Regex.Replace which will use a string builder to do a replacement and then convert it to a string!
Why don't write a version of Regex.Replace that accepts a SB directly?
Besides, a Regex.Replcae overload that accepts an array of patterns and and array of replacements, to do multiple replacements at once. It will be more efficient than rebuilding the resulted string repeatedly in a loop. Such a method will need to find all the matches, arrange them by start index, do the replacements.
It can be done by filling an array with all Matched info using this structure

struct MatchInfo
{
public Match match;
public int replacementIndex;
}

Then the array can be arranged by MatchInfo.m.StartIndex.
This may be an overhead, but we can choose to use it if we want to do a lot of succive replacement on a large text.

I have a private PR open that entirely removes the StringBuilder use in Regex.Replace and operates on Span.
I prefer adding a Span overload instead of a StringBuilder one to get the most efficiency and the least heap allocation.

I also suggest you to give the stringBuilder the ability to load a file directly, say with a static method:
public static StringBuilder StringBuilder.FromFile(string FileName);
It is more efficient to copy the file into the String Builder chunks directly than reading blocks of the file as strings then appending them to the StringBuilder.

That's what Span/Memory is for. You can avoid allocating strings by reading the file's content into a buffer with StreamReader.Read(Span<char>) and pass the buffer's slice to StringBuilder.Append(ReadOnlySpan<char>).

@ViktorHofer

and pass the buffer's slice to StringBuilder.Append(ReadOnlySpan)

That's nice. I'm not familiar with Span yet, but I expect that the append method will read the chars from the ReadOnlySpan and copies them to the chunks (at least bbecause it is a "ReadOnly" span) .
What I want is to lood 8000 char from the file into a char buffer, and assign this buffer directlt (by a pointer) to the internal char array that represents one of the Stringbuilder chunks, and repeat this action till the end of the file. I think this is the fastest way ever and the less memory use ever.

What I want is to lood 8000 char from the file into a char buffer, and assign this buffer directlt (by a pointer)

Yes, this is exactly what Span & Friends are for.

I'm not familiar with Span yet

See (the excellent) C# - All About Span: Exploring a New .NET Mainstay

Was this page helpful?
0 / 5 - 0 ratings