How often do we have written something like:
string stripped = "cd cd ab cd ab cd cd".Replace("ab", string.Empty);
only to remove all string occurrences from a string. We did not used the existing string.Remove method becausestring.Remove has no overload that takes a string. In order to achieve the same we would have to create a loop calling string.IndexOf first to find all occurrences and call string.Remove on each of them. This is a lot of code.
And even when removing a single character, we probably have written it as a string, because there is no such as an empty char.
string stripped = "c d c d a c d a cd cd".Replace("a", string.Empty);
Instead of replacing a string with string.Empty (which defacto removes it so we abused the API), there should be a method which removes the string or char directly.
namespace System
{
public sealed class String {
+ public string RemoveCharacters(this string, string value)
+ public string RemoveCharacters(this string, char value)
+ public string RemoveCharacters(this string, params string[] value)
}
}
``` C#
string stripped = "cd cd ab cd ab cd cd".RemoveCharacters("ab");
string stripped = "c d c d a c d a cd cd".RemoveCharacters('a'); //can slightly be more faster when providing a char instead of string
string stripped = string.RemoveCharacters("rn", " ", "-")` //instead of string.Replace("rn", string.Empty).Replace(" ", string.Empty).Replace("-", string.Empty)
## Alternative Designs
Alternatively we take the existing [string.Remove](https://docs.microsoft.com/en-us/dotnet/api/system.string.remove?view=netcore-3.1) method and only add overloads to it instead of introducing a new method name.
```diff
namespace System
{
public sealed class String {
+ public string Remove(this string, string value)
+ public string Remove(this string, char value)
+ public string Remove(this string, params string[] value)
}
}
None
//FYI: Edited because I've noticed there is a string.Remove but accepts only a range / integers instead of a string or char to look for and remove all occurrences. So I've renamed my method to RemoveCharacters. But string.Remove feels better and avoids adding a new method, we could just add more overloads to it. I leave the final name up to the .NET team.
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
FWIW most of the real-world empty replacement examples I see are things like string.Replace("\n", "") and string.Replace("-", "").
The name ReplaceCharacters would be misleading.
Such name as "ReplaceCharacters" suggests that the argument is more likely a set of characters rather than a sequence to replace. From a method with such a name i would expect it to replace each individual character, if it is found in the argument string.
@Symbai
Instead of replacing a string with string.Empty (which defacto removes it so we abused the API), there should be a method which removes the string or char directly.
Why? Do you think such code would be easier to understand? Or that it would be faster? Or some other reason?
If it's (at least partially) about performance, can you provide benchmarks showing how much faster the new methods would be?
The name ReplaceCharacters would be misleading.
Yes, I fully agree. Actually it should still be "Remove" but just with overloads for providing a string and a char. But I still leave this to the .NET team, as the name of the method is a minor thing to me. Just want to avoid of someone says "Ha, this already exist" while the existing is not helpful on this.
Why? Do you think such code would be easier to understand? Or that it would be faster? Or some other reason?
Both. If a developer writes something like this, he never means to replace something but to remove something. Otherwise he wouldn't use string.Empty or "". Regarding a benchmark, this will be difficult. You expect me to write a full API method. To be honest this is out of scope for me. I expect a method to be faster since it doesn't have to replace but only to remove characters from a string. If it turns out that there is a little to nothing performance improvements, I wouldn't mind, although I still believe there could be some benefit (perhaps using some Span.Slice?!). But let me ask you something instead. Why do you expect developers to use a replace method, when they usually want to remove a string? The only answer to this can be that "it just works" or "it has been like that for years". I am right? So what's your point here being against this API that just make things more clear in worst scenario?
So what's your point here being against this API that just make things more clear in worst scenario?
I'm not necessarily against this API. But every new API has its costs: apart from the obvious ones (it has to be implemented, reviewed and documented), there are also others (it has to be maintained for a very long time, it makes figuring out which method you should use harder, the opportunity cost, etc.). And that means every new API has to properly justify its existence.
So, if the new API was only marginally better, "it just works" sounds like a good justification to keep only the old API to me. And it's up to you, and others who agree with you, to convince the decision makers (and BTW, I am not one of those) that the costs are worth it.
As @svick says, API's have costs, which he listed very well. Consider someone new to .NET, who has a string and hits dot. They have to hunt through the list of API's. Maybe they see Replace(char, char) and RemoveCharacters(char). They are looking at code that does one, should they use the other? Is it faster? Maybe the code now has a mixture.
This seems to have no performance, efficiency, or discoverability benefits over an API that folks have used successfully for a long time.
I have to disagree with @danmosemsft that there is negative value for new users with this API, especially if we name it simply Remove(string, string). When I was new to .NET and had to remove a substring from string for the first time, the method I thought would be the answer was string.Remove, but that didn't have the right signature, so I went and googled the solution. Granted, it was easy, but if we had string.Remove(string, string) back then, I would have just used that, no SO or docs required.
It always seemed to me that string.Replace("whatever", string.Empty) is a bit of a hack. Yes, hacks that are commonly used become idioms, but I personally do not think this is a good one.
Maybe they see
Replace(char, char)andRemoveCharacters(char). They are looking at code that does one, should they use the other? Is it faster? Maybe the code now has a mixture.
_Honestly I think people will be more confused about things like Directory.GetFiles and DirectoryInfo.EnumerateFiles (which one is better? Which one is faster?) and all the different ways of retrieving the application executable path where even the answers on stackoverflows are differently because there are so many ways and APIs for it_.
In our case you always choose the API that makes sense. A replace method replaces, a remove method removes. If you want to remove, you choose the remove method. We do HAVE a string.Remove already. All it needs and what this proposal is about is an overload that accepts a string or a char, because its extremely often used. The reason why developers dont use the existing string.Remove is because they first would call string.IndexOf and also because they often want to replace all occurrences and not just one, which then requires to write a loop. This is why most take the replace method.
So, if the new API was only marginally better
IF, and what if not!? What if the implementation also allows passing multiple strings to remove and we can write something like this: string.Remove("\r\n", " ", "-") instead of string.Replace("\r\n").Replace(" ").Replace("-") ? It not only will be faster but also allocates less. Interestingly I've seen the last one also quite often myself and I would also write it myself this way. But only because the alternative and correct version as of now requires much more code and a loop.
What if the implementation also allows passing multiple strings to remove and we can write something like this
That was only added to the proposal 6 hours ago. It wasn't part of the suggested API surface when we looked at it.
Anything else that might be missing? Do we need overloads that take params string[], etc.?
That was only added to the proposal 6 hours ago.
The normal proposal should be still faster. It takes one parameter less. Svick for some reason doubt that and brought up performance as a reason against this proposed overload. Since I don't have written the implementation and cannot give him benchmark numbers, I gave him an extreme example where its obvious. Please note that I've put this in an "extension" note. That means it can be added but doesn't has to. Its meant to be as an "things we can do to improve it further, once the original proposed API is accepted".
Do we need overloads that take
params string[]?
Yes it might be better considering these APIs also take an array as parameter:
string.IndexOfAnystring.Joinstring.Concatstring.TrimSo it would line up with other existing APIs. Should I edit the first post again?
So it would line up with other existing APIs. Should I edit the first post again?
Sure - feel free to edit the post as many times as you need to. :)
I can see use cases for removing all occurrences of certain characters like '-' or '\r' from a string instance. Do you think it's common to want to remove multiple characters: '-' _and_ '\n' together? What kind of code do you see benefitting from this?
What kind of code do you see benefitting from this?
It heavily depends on the scenario but an extremely common real world scenario is when the line separator is unknown. Different OS, different line separator and .NET Core is cross platform compatible. Thus something like string.Replace("\r\n", string.Empty).Replace("\n", string.Empty) is expected. More specific scenarios are:
These are common scenarios. But I also see something like string.Replace(string1, string.Empty).Replace(string2, string.Empty) quite often, where string1 and string2 are project depending strings. So its hard to tell you a common scenario for this, but yeah it happens.
- Stripping invalid file path characters
- Stripping invalid URL characters
I wouldn't recommend this API for those scenarios. For scenarios such as automatic slug or filename generation, the best thing to do is to keep characters which are allowed, throwing away everything else.
- Stripping wildcard characters
- Stripping alphabetic or numbers
Are these all that common? For alphanumeric characters in particular, I don't think we'd want to encourage people to call the API string.RemoveChars(<here's an array of 62 chars>). Once we start getting into those scenarios we're basically talking string.RemoveWhere(Predicate<char> filter), at which point your code would look like:
string myString = GetSomeString();
myString = myString.RemoveWhere(char.IsLetter); // strawman API
I don't think we'd want to encourage people to call the API
Yes, this was a bad idea, sorry. When we talk about smaller arrays someone would like to remove from a string AND the content of the array are common strings/characters I can currently only think about everything which has the same meaning can be written differently. Line Endings: \n \r\n, quotes: " ', thousand separators: . , etc.
I've downloaded this, WPF and Winforms repo and did a regex seach for .Replace().Replace() pattern and like I said previously, most of the time it wants to remove project depending strings, some examples:
https://github.com/dotnet/runtime/blob/def5b48f55646d59faee7a92d26e648e67a104c4/src/libraries/System.ComponentModel.Annotations/src/System/ComponentModel/DataAnnotations/FileExtensionsAttribute.cs#L34-L35
https://github.com/dotnet/runtime/blob/def5b48f55646d59faee7a92d26e648e67a104c4/src/libraries/System.Numerics.Tensors/tests/TensorTests.cs#L2166
The proposed overloads which removes a single string/character are used more often.
https://github.com/dotnet/runtime/blob/def5b48f55646d59faee7a92d26e648e67a104c4/src/libraries/Common/tests/System/Diagnostics/DebuggerAttributes.cs#L154
https://github.com/dotnet/runtime/blob/def5b48f55646d59faee7a92d26e648e67a104c4/tools-local/tasks/installer.tasks/BuildTools.Publish/CloudTestTasks/AzureHelper.cs#L137
Add
public string Remove(this string, params char[] value)
Most helpful comment
FWIW most of the real-world empty replacement examples I see are things like
string.Replace("\n", "")andstring.Replace("-", "").