Runtime: API Implementation CR: string.IsNullOrEmpty & string.IsNullOrWhiteSpace functions

Created on 18 Nov 2019 · 38 Comments · Source: dotnet/runtime

Both functions have massive overhead from string reallocations. And both of these functions are possibly the most used of all C# constructs.

Proposal:
Two new functions: IsNullOrEmptyByRef and IsNullOrWhitespaceByRef (new names to avoid breaking other code).

Both the new functions would accept a ref parameter to the string being checked.

area-System.Runtime

Most helpful comment

Please share a runnable micro benchmark that we can run to see the issue you're concerned about as well as your results from said benchmark.

All 38 comments

Not sure I understand; these methods shouldn't cause any allocations?

Correct. It sure seems silly if the comparisons are few and far between. But start doing something like loading a GB-sized CSV file at Web App start up, and things quickly become a nightmare. Had the nightmarish experience and found this solution after a lot of debugging and experimentation.

@sujayvsarma What is your specific concern with these functions? As Ben said, both of these functions are highly optimized already. Neither of these two functions should cause any allocations as you had suggested.

My concern is not with the code _inside_ those functions. My concern is with the way the parameters are passed into the functions.

You can see this when

if (string.IsNullOrEmpty(str)) { ... }

is slower than

if ((str == null) || (str.Length == 0)) { ... }

over thousands of iterations, with big strings.

@sujayvsarma Assembler code is almost identical in release mode: sharplab.io. In particular, no parameter passing involved at all since the IsNullOrEmpty method is inlined. If there are any allocations, they happen in your code and not within this API.

....and just in case it might be the next avenue of a question, strings in .NET are reference types, not value types, so no copy/allocation happens for the parameter to the method call.

I didn't raise this issue because I was bored.

This was a real-life performance issue we encountered. The cause was that one function. When it was rewritten (== null and .Length == 0), the problem went away. Because of wording like the above, and SO threads seeking to prove the same with perf comparisons, tracking it down literally took many days (because everyone said "surely that is not the problem!"). But it _was_ the problem.

Please share a runnable micro benchmark that we can run to see the issue you're concerned about as well as your results from said benchmark.

The implementation of string.IsNullOrEmpty lives here: (And I know this is a mirrored copy from coreclr; previews won't work across different repos...)
https://github.com/dotnet/corefx/blob/e99ec129cfd594d53f4390bf97d1d736cff6f860/src/Common/src/CoreLib/System/String.cs#L431-L441
It is an O(1) operation since it does not involve scanning the string (the length is stored in the instance itself), so the size of the string is irrelevant for this particular method call.
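In effect, the method boils down to the following (a simplified paraphrase of the linked source, not the exact CoreLib code, which uses a uint comparison to help the JIT's codegen):

```csharp
using System;

// Simplified paraphrase of the linked String.cs method (not the exact
// CoreLib code, which compares against (uint)value.Length instead).
static bool IsNullOrEmptySketch(string value)
{
    // Length is a stored field on the string instance, so this check is
    // O(1) no matter how large the string is.
    return value == null || value.Length == 0;
}
```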

As far as string.IsNullOrWhiteSpace goes, its cost does depend on the length of the string and on where the first non-whitespace character sits, but it does not scan the whole string: it returns as soon as it encounters a non-whitespace character (since from that point the string is known not to be whitespace-only):
https://github.com/dotnet/corefx/blob/e99ec129cfd594d53f4390bf97d1d736cff6f860/src/Common/src/CoreLib/System/String.cs#L443-L453
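That early-out behavior looks roughly like this (again a simplified paraphrase of the linked source, not the exact CoreLib code):

```csharp
using System;

// Simplified paraphrase of the linked IsNullOrWhiteSpace source.
static bool IsNullOrWhiteSpaceSketch(string value)
{
    if (value == null) return true;

    for (int i = 0; i < value.Length; i++)
    {
        // Return as soon as a non-whitespace character appears; only
        // whitespace-only strings get scanned all the way to the end.
        if (!char.IsWhiteSpace(value[i])) return false;
    }

    return true;
}
```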

Given that strings are already reference types, passing a string instance to another method is as cheap as passing a pointer-sized number: 4 bytes in a 32-bit process, 8 bytes in a 64-bit process. I don't see the benefit of passing something that is already a reference by reference to the method, if it isn't actively worse.
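A quick way to see this (a trivial sketch, assuming nothing beyond the BCL):

```csharp
using System;

// A string variable is just a reference; passing it to a method copies
// only this many bytes, regardless of how long the string's contents are.
Console.WriteLine(IntPtr.Size); // 4 in a 32-bit process, 8 in a 64-bit one
Console.WriteLine(Environment.Is64BitProcess);
```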

Also, I expect the overhead of calling string.IsNullOrEmpty to be almost none, since as many others have mentioned above, the JIT compiler is usually able to eliminate the call & "inline" the method body into the caller method, making the codegen virtually identical as when you write the checks out explicitly.

This was a real-life performance issue we encountered. The cause was that one function. When it was rewritten (== null and .Length == 0), the problem went away.

How did this affect your performance? Bytes allocated? Speed? Anything else I haven't mentioned? How did you notice the problem existed? Why do you believe the root cause is actually that specific method, and what made you believe that changing it to something else solved the problem? It would be great if you could provide some quantitative data on how performance was affected.

For reasons described above, I do not think the method itself is causing any problems, nor would introducing the by-ref variant help. Rather, I believe the problem resides elsewhere, such as a buggy JIT (I'm not blaming the JIT team, but it's still a possibility 😅) or the environment you're running the code in, etc.

Could you perhaps provide some minimal reproduction code, and some numbers to back the perf implications?

I would recommend trying something like BenchmarkDotNet and PerfView (link to the repo here and an article about it) to investigate.

But start doing something like loading a GB-sized CSV file at Web App start up

👀

(new names to avoid breaking other code)

Nit: It should not be a breaking change to introduce an overload that differs only by by-val/by-ref, since you need the ref keyword to call the by-ref one:

void Ayy(int a) => Console.WriteLine("a");
void Ayy(ref int a) => Console.WriteLine("aa");
Ayy(1); // Prints "a"; a ref argument would be needed to hit the other overload

@sujayvsarma

You can see this when

if (string.IsNullOrEmpty(str)) { ... }

is slower than

if ((str == null) || (str.Length == 0)) { ... }

over thousands of iterations, with big strings.

I think there are two possibilities why that would be the case:

  1. The call to IsNullOrEmpty is not inlined. Normally, that should not happen, but there could be some situations where it does happen.
  2. Per the disassembly from @AntonLapounov, the two approaches generate different machine code. In particular, the version with IsNullOrEmpty has a conditional jump (jbe), while the second version does not (it uses setz instead). This could have a significant effect on performance.

Both of those are things that could be improved, but neither of them is going to be improved by passing the string by ref.

Out of curiosity, I went ahead and benchmarked this. I admit there is a great chance the way the benchmarked methods are written is sub-optimal. FWIW, YMMV, and all that. 😛 I won't pretend to know anything about how the resultant IL would get executed on the processor, what effect caching might have, etc.

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
AMD Ryzen 7 2700X, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=3.0.100
  [Host]     : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
  DefaultJob : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
public class Benchmarks
{
    [ParamsSource(nameof(ValuesForS))]
    public string S { get; set; }

    public IEnumerable<string> ValuesForS => new[] { null, "", "Foo" };

    [Benchmark]
    public bool ActualMethod() => string.IsNullOrEmpty(S);

    [Benchmark]
    public bool ActualAlgorithm() => (S == null || 0u >= (uint)S.Length) ? true : false;

    [Benchmark]
    public bool ActualAlgorithmAlternateComparison() => (S == null || 0u == (uint)S.Length) ? true : false;

    [Benchmark]
    public bool ActualAlgorithmNoTernary() => (S == null || 0u >= (uint)S.Length);

    [Benchmark]
    public bool ActualAlgorithmNoTernaryAlternateComparison() => (S == null || 0u == (uint)S.Length);

    [Benchmark]
    public bool CustomAlgorithm1() => S == null || S.Length == 0;

    [Benchmark]
    public bool CustomAlgorithm2() => !(S is string s && s.Length > 0);
}






|                                      Method |   S |      Mean |     Error |    StdDev |
|-------------------------------------------- |---- |----------:|----------:|----------:|
|                                ActualMethod |   ? | 0.2843 ns | 0.0041 ns | 0.0038 ns |
|                             ActualAlgorithm |   ? | 0.2875 ns | 0.0145 ns | 0.0128 ns |
|          ActualAlgorithmAlternateComparison |   ? | 0.2741 ns | 0.0120 ns | 0.0101 ns |
|                    ActualAlgorithmNoTernary |   ? | 0.2719 ns | 0.0096 ns | 0.0090 ns |
| ActualAlgorithmNoTernaryAlternateComparison |   ? | 0.2731 ns | 0.0041 ns | 0.0036 ns |
|                            CustomAlgorithm1 |   ? | 0.2849 ns | 0.0168 ns | 0.0149 ns |
|                            CustomAlgorithm2 |   ? | 0.2853 ns | 0.0161 ns | 0.0151 ns |






|                                      Method |   S |      Mean |     Error |    StdDev |
|-------------------------------------------- |---- |----------:|----------:|----------:|
|                                ActualMethod |     | 0.2159 ns | 0.0044 ns | 0.0041 ns |
|                             ActualAlgorithm |     | 0.2779 ns | 0.0063 ns | 0.0056 ns |
|          ActualAlgorithmAlternateComparison |     | 0.2733 ns | 0.0031 ns | 0.0029 ns |
|                    ActualAlgorithmNoTernary |     | 0.2772 ns | 0.0052 ns | 0.0049 ns |
| ActualAlgorithmNoTernaryAlternateComparison |     | 0.2783 ns | 0.0068 ns | 0.0063 ns |
|                            CustomAlgorithm1 |     | 0.2783 ns | 0.0061 ns | 0.0057 ns |
|                            CustomAlgorithm2 |     | 0.0340 ns | 0.0078 ns | 0.0073 ns |






|                                      Method |   S |      Mean |     Error |    StdDev |
|-------------------------------------------- |---- |----------:|----------:|----------:|
|                                ActualMethod | Foo | 0.2157 ns | 0.0024 ns | 0.0021 ns |
|                             ActualAlgorithm | Foo | 0.2716 ns | 0.0045 ns | 0.0042 ns |
|          ActualAlgorithmAlternateComparison | Foo | 0.2664 ns | 0.0041 ns | 0.0038 ns |
|                    ActualAlgorithmNoTernary | Foo | 0.2939 ns | 0.0107 ns | 0.0100 ns |
| ActualAlgorithmNoTernaryAlternateComparison | Foo | 0.2749 ns | 0.0148 ns | 0.0138 ns |
|                            CustomAlgorithm1 | Foo | 0.2771 ns | 0.0079 ns | 0.0074 ns |
|                            CustomAlgorithm2 | Foo | 0.0232 ns | 0.0112 ns | 0.0099 ns |

real life performance issue encountered

Check that you aren't running a DEBUG build of the benchmark.

Since no one seems interested in understanding the problem, and the bulk of the responses revolve around benchmark numbers from contrived tests that have nothing to do with real-life scenarios, plus blind insistence, I am closing this issue.

I am not trying to dismiss your claim entirely; I just think that the problem is not something to do with the aforementioned method (or parameter passing), and so I would like to look at the problem with more context.

If you could post the source code (Not the CSV file, because that likely would have sensitive information in it) before and after the change of how you're using this method to process your large CSV file, with how the performance was improved with the change (e.g. bytes allocated, time elapsed), that would be a great start to diagnose the problem. If you don't feel like posting your source code here, perhaps other MS employees can help?

We would need to understand what is causing the problem and how much it affects performance before we fix it, because almost every solution to a problem has drawbacks; especially API additions, since they are additive only (i.e. they can never be removed once added). Justification is needed, and this isn't enough of it.

benchmark numbers with contrived tests that have nothing to do with real-life scenarios

I agree. Those micro-benchmarks are unlikely to highlight the problem. Which is why I'm suggesting that you should post the source code of your method that processes the CSV file.

Minimal reproduction code would be nice; but I'm worried that the problem might get hidden with the change.

The CSV file is not scary or sensitive. But the basic API spec was: it's a CSV file, so use the RFC spec to read it [with all its ifs and buts]. So the code that parsed the file had to handle all of those ifs and buts. And it needed to do this every time the app started up.

Since it was a Web API, it had to finish before the web server gave up waiting. The web server [with out-of-the-box config] was enforcing a 15-second time limit on the whole startup.

WITH the string class's built-in methods, the null/empty check [not interested in whitespace at the moment] was pushing it ABOVE the 15-second limit, which meant the app would time out at startup. When we removed all of the string.IsNullOrEmpty checks and went with ((str == null) || (str.Length == 0)) instead, the app started spinning up in under 7 seconds. Half the time. With just that one change. What does that tell you? string.IsNullOrEmpty is SLOW, anti-performance.

@sujayvsarma

What does that tell you? string.IsNullOrEmpty is SLOW, anti-performance.

It tells me that in your specific case, replacing string.IsNullOrEmpty seemed to improve performance significantly. That could have a number of causes, but all we can do is to guess about them unless we're able to reproduce your results.

What it doesn't tell me is that string.IsNullOrEmpty is slow in general. And it certainly doesn't tell me that adding an overload where the string is passed by ref would be better.

Statement A = 18.5 seconds, causes timeout, app fails to be loaded, customers angry
Statement B = 7.32 seconds, everything works perfectly

That could have a number of causes

Root cause: Statement A

seemed to improve performance

A 2x perf improvement is not a "seemed". A tangible "app failed to start" vs "app running smoothly" is not a "seemed".

And, it needed to do this every time the app started up.

@sujayvsarma ah, if it's during startup it may be tiered compilation not inlining (at tier 0); the JIT compilation is lighter at startup (for faster compilation); it then revisits those methods and improves them at runtime (for faster code performance).

You could try adding this to your .csproj and see if it makes a difference:

<TieredCompilation>false</TieredCompilation>

Then publish your app in Release mode in the usual way:

dotnet publish -c Release
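If you'd rather not edit the project file, the same switch can also be flipped per-run with an environment variable (the app path below is hypothetical):

```shell
# Alternative to the .csproj property: disable tiered compilation for a
# single run via an environment variable (0 = off).
export COMPlus_TieredCompilation=0
dotnet ./bin/Release/netcoreapp3.0/publish/MyApp.dll
```

This only affects the process it's set for, which makes it handy for A/B-ing startup time without rebuilding.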

@kouvel would R2R inline string.IsNullOrEmpty in user code, or do the version bubbles not work that way atm?

R2R and tier 0 JIT currently would not inline string.IsNullOrEmpty. For R2R, the version boundary is the module currently, and tier 0 JIT currently does not do any inlining. I'd be interested in knowing if and how much turning off tiered compilation as indicated above helps. It may make sense to inline small methods like these during tier 0 JIT.

It may make sense to inline small methods like these during tier 0 JIT.

R2R could mark them as good candidates in some way without the JIT having to inspect the full method at tier 0?

Maybe, though the IL size might be available without scanning the method

It may make sense to inline small methods like these during tier 0 JIT.

I thought this too, at one point (see for instance dotnet/coreclr#14474), but am now somewhat ambivalent. There is a fair amount of jit-time overhead involved in inlining, and small methods are frequent enough that even modest thresholds (say, just big enough for most property getters and setters) lead to a lot of inlining. This overhead has to be won back by less jit time downstream of inlining and less time spent in tier-0 jitted code.

@sujayvsarma the results you report are surprising. I'm sure a number of folks (myself included) would be happy to take a closer look, if you can share an example that reproduces the results you see.

@AndyAyersMS - I don't know how you would reproduce it. The app loads CSV files at startup. It really struggled with one of them, which had well over 500K rows with 18 columns. The loader has to check each column against the RFC spec of how a CSV field should work and then load it. Of course, this involves multiple checks.

As part of our process to find the issue and fix it, we started by replacing .NET constructs with basic language equivalents. Like, we got rid of .NET Collections and Lists and used simple arrays (e.g. List<string> becomes string[]). We removed LINQ and used loops, and so on. There were two big areas where we found a massive jump in performance:

  1. When we replaced the string.IsNullOrEmpty with ((str == null) || (str.Length == 0))
  2. When we removed LINQ and went with basic loops.

I understand why LINQ would cause a perf jump, but I have no idea why the string functions would impact it that much.

The app is a [heavily pared-down] ASP.NET Core MVC application, and it does the file loading in the ConfigureServices() method of its Startup class once everything has been initialized, because it needs all that data loaded well before the first request hits it.

have no idea why the string functions would impact it that much

This is why you're hearing skepticism and folks asking for repros: we similarly are struggling to understand how this particular function could have the cited impact, and need something we can examine to investigate (or in the process of creating the repro determine the issue was actually elsewhere).

Thanks.

I know. Having used .NET for the past 19 years, it's odd.

One more thing -- The code that was having the issue was built in Debug mode. Not Release. That was still the Build+QA phase of the project.

I am not an ASM/IL guy and don't understand it at all. I created a SharpLab gist that roughly approximates the way I am using it -- of course, with dummy functions to simulate the calls it makes to other functions. I noticed that the IL is quite different when I change how the checks are written, and it changes heavily when I toggle the mode to Debug.

Maybe you can understand it better?

Here is the gist

The code that was having the issue was built in Debug mode

Oh!

Optimizations go out the window in debug. If you care about perf at all, do not run in debug.

I know that. Post-QA builds are Release. But we need a way to find out where bugs happen :)

Sure. But you can't expect debug builds to run as efficiently as release builds. Debug builds not only disable optimizations like inlining, they often add costs in the form of extra checks, exactly to help with finding such bugs.

I'm considering this addressed.

Thanks.

Eh? That's like saying

"we will only support Release mode builds. You run on debug-mode you're on your own."

That's like saying "we will only support Release mode builds. You run on debug-mode you're on your own."

It is in no way saying that.

Do read your reaction. The moment I said debug mode, you're like "considering this addressed".

Look, in my particular scenario, we found the problem, fixed it [by using an alternative that worked for us] and moved on. I filed this in the hope that someone would get to the bottom of it (given it could very well be the most used function in all of .NET) and fix it for everyone else.

I still believe there is something seriously funky going on. But if this is all your definition of "customer focus" is, then...

Best wishes.

@sujayvsarma Having to increase timeout when you're processing a lot of data in Debug mode sounds acceptable to me. Minor changes resulting in 2x performance difference in Debug mode also sounds acceptable to me.

Debug mode is not there to be efficient; it's there to help you find bugs, and you pay for that with worse performance. Release mode doesn't mean you can't find bugs, it just makes it harder.

Trying to optimize Debug code is almost always pointless, since it's not representative of the performance of the same code in Release mode, which is what actually matters.

In other words, I don't see any evidence there is anything "seriously funky" happening here.

The moment I said debug mode, you're like "considering this addressed".

That's because you're reporting that debug code is too slow for production use. This is expected.

@danmosemsft - I really hope you read official stuff better than you read my statements. WHERE in any of my statements above have I said ANYTHING about the situation being "in production"? Is this another example of "customer focus"?

You should know that people use Debug builds [not Release!] right up until the UAT (user acceptance testing) phase of projects. It is only for pure prod perf testing and actual prod & DR that Release builds are used. This is not an exception to any "rule". That's how we do stuff.

Therefore, making statements like yours and @svick's is just grandstanding and trying to sweep issues under the carpet.

Fair enough, I didn't mean to misrepresent what you're doing. Nevertheless, the point is that we don't expect much from the performance of debug builds.

@sujayvsarma

You should know that people use Debug builds [not Release!] right until the UAT (user acceptance testing) phase of projects.

What is forcing you to use Debug builds in those early phases? It seems that performance is already important to you, so why can't you switch to Release builds?

Come to that, I'm a little surprised that QA is running against a Debug build anyway, because while (most) differences in behavior between Debug and Release would be considered bugs... it means they're not testing what's actually being distributed to users.
If any diagnostic information included in a Debug build is that important for QA, I'd probably just re-run failing scripts against a Debug build.

As a side note, there's other potential ways to speed up application startup:

  • At build time, turn the CSV into a binary file and load that at run time (persisted map or some specific datatype)
  • Loading a file at start time implies the data is constant. If you run multiple copies of the app, create a permanent shared database that's loaded once.
  • Load the data asynchronously in the background, and return 404s to clients if the particular request they make can't be fulfilled yet, possibly with a timeout in the server code if the request might be fulfilled "soon". Sort the CSV so commonly-used data is available first.
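The third option above could be sketched like this (a hypothetical illustration; `CsvCache` and its members are made-up names, and the parsing is a placeholder for the real RFC 4180 handling):

```csharp
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical sketch of loading the CSV in the background so startup
// returns immediately instead of blocking until the whole file is parsed.
public sealed class CsvCache
{
    private volatile string[][] _rows;

    // Request handlers can answer 404/503 while this is still false.
    public bool IsReady => _rows != null;

    public string[][] Rows => _rows;

    public Task LoadAsync(string path) => Task.Run(() =>
    {
        // Placeholder parse: real code would apply the RFC 4180
        // quoting and escaping rules the thread mentions.
        var rows = File.ReadAllLines(path)
                       .Select(line => line.Split(','))
                       .ToArray();

        _rows = rows; // publish only once fully loaded
    });
}
```

The idea is to kick off `LoadAsync` from startup without awaiting it, register `CsvCache` as a singleton, and have controllers consult `IsReady` before serving.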

At build time, turn the CSV into a binary file and load that at run time (persisted map or some specific datatype)

With their GC support, Frozen Objects look very interesting for this purpose: https://github.com/microsoft/FrozenObjects

(Video with @mjsabby and @Maoni0 explaining more about them: https://www.youtube.com/watch?v=pGUuSWWNx2Y)
