Runtime: Improve SelectMany behavior when the selector returns null

Created on 20 Aug 2016  路  19Comments  路  Source: dotnet/runtime

In the current version Enumerable.SelectMany experiences a NullReferenceException when selector returns null. This is undesirable because it's a confusing crash with a "forbidden" exception type in BCL code. This should be changed.

Example: foreach(var item in new int[][] { new int[1], null }.SelectMany(x => x)) { }

The safest approach would be to throw an exception that explains that the user has provided an invalid selector that does not obey the contract. Additionally, the documentation should state the requirement to not return null.

An alternative approach would be to treat null as an empty sequence. I'm not sure which solution is better. Sometimes, an empty sequence would be very convenient. Sometimes this would hide a bug. Also it's unclear that this can be done due to compatibility concerns. I think throwing a different exception type would be safe enough to take the change.

area-System.Linq

Most helpful comment

I really wonder what code would depend on a NRE being thrown from a SelectMany. I would make the case to simply fix this. I think the perf loss from this null check would be very small and pale in comparison to other perf losses that are routinely accepted by LINQ to improve usability.

All 19 comments

cc: @madstorgersen

We do not believe the value is worth the extra check (has small perf impact). We do not believe the scenario of passing null to SelectMany is that common in running applications.
We agree that it might be slightly better in development time, but that's about as much value as we can think of.
If you disagree, feel free to provide additional data.

Sometimes, an empty sequence would be very convenient. Sometimes this would hide a bug.

It's the hiding a bug part that matters most to me. If I'm _expecting_ null (which I think should be rare because null and empty list are not semantically similar from my perspective), I'd rather make that expectation explicit via .SelectMany(x => x ?? Enumerable.Empty<int>()).

@jnm2 It would not convenient when the type is long

.SelectMany(x => x ?? Enumerable.Empty<SomeLongTypeName>())

.SelectMany(x => x ?? Enumerable.Empty<KeyValuePair<string,SomeLongTypeName>>())

I would want to have this feature supported. Or at least having extension method EmptyIfNull for convenient

C# public IEnumerable<T> EmptyIfNull<T>(this IEnumerable<T> collection) => collection ?? Enumerable.Empty<T>();

@Thaina go for it. It only makes sense to add to the BCL if most people have this issue. Is there evidence of that?

It is very, very rare for the BCL to take the stance that throwing a bad exception type is OK. I cannot think of a single other such case (they might exist). I personally value this quality of the framework very much.

I understand the argument of cost, but is that really worth the inconsistency? It took me a while to track the bug down that lead to this ticket. I was searching the lambda for possible causes.

I know what you're saying, but I still think NRE is consistent and tolerating null is inconsistent with the rest of C#. Null and empty are not automatically semantically the same.

I understand that we should not make breaking change so instead we should have new function for this behavior for convenient

maybe 'SelectFlatten' ?

I think this behavior is much more preferable than current selectmany

I really wonder what code would depend on a NRE being thrown from a SelectMany. I would make the case to simply fix this. I think the perf loss from this null check would be very small and pale in comparison to other perf losses that are routinely accepted by LINQ to improve usability.

@vsadov to see above comments.

With a boneheaded exception like NRE, if it comes from user-code then it isn't good code that depends on it but bad code; I prefer getting an NRE to incorrect but non-throwing behaviour. Are we sure this flattening is always the best thing to do?

I would not advise flattening but rather throwing a non-boneheaded exception. The exception comes from BCL code, not from user code.

I just encountered this "bug" in my application that SelectMany thows an exception when there are no items. This is inconsistent with the rest of LINQ and my expectation was that this will return empty collection. Especially when this is happening in context of EntityFramework and SQL database. I expected that LINQ/EF will produce SQL query that will be executed on database and when there are no items simply return empty collection.

I really dont understand the decision, trying to justify this with "performance" sounds really strange.

Do you really expect developers to put this in every SelectMany ???

.SelectMany(x => x ?? Enumerable.Empty<MyType>())

That's probably the worst decision of C#/.NET I encountered so far.

I urge you to change the behaviour of SelectMany to make it consistent and return empty collection and not force developers to manually null check in every SelectMany.

Do you really expect developers to put this in every SelectMany ???

.SelectMany(x => x ?? Enumerable.Empty<MyType>())

No.

Because in many (most?) cases, the source collection will not be expected to contain a null element (ie, the collection is defined as IEnumerable<T>, not IEnumerable<T?>). You're far more likely to have a null property in a non-null element.

Note too, that although null input to the selector has been discussed, the issue is really about null _output_ - SelectMany(x => null). Null input is legitimate (although rare). Null output is against the documented contract.

Null output is against the documented contract.

Which is codified now with nullable reference types in .NET 5, so if you've enabled it, you'll likely get a compilation warning trying to return null from the SelectMany delegate.

So if I understand this issue correctly... let me show my specific example:

I had this LINQ line, that caused the exception. All classes (Site, Period, Lesson) are EF core entities, EF is connected to SQL database:

site.Periods.SelectMany(x => x.Lessons).Select(x => x.Name).ToList();

Because there were no Lessons on given Period in DB, this line throws an exception. Instead I would expect it would return empty collection. When I added the manual check it worked as expected:

site.Periods.SelectMany(x => x.Lessons ?? Enumerable.Empty<Lesson>()).Select(c => c.Name).ToList();

Now if I understand it correctly I should not have to put the null check in SelectMany, instead I need to make sure that the Period entity is created with empty collection of Lessons in the constructor of Period. This is something I didn't know is possible because I expected that if I initialise the collection (Lessons) to empty collection it will hide the actual data from database if there are any. But apparently it doesn't and EF can then decide that there are some data so it will fill the collection.

So to sum, the solution is to initialise all collection in EF entities to empty lists to avoid this issue. Is this correct?

I wonder why EF doesn't do this automatically?

So to sum, the solution is to initialise all collection in EF entities to empty lists to avoid this issue. Is this correct?

Yes. As a point of style, collections in any object should (almost) always be initialized to empty.

I wonder why EF doesn't do this automatically?

I don't know

Maybe revisit this for .NET 6? Taking the change on a major version would make the breaking change easier to tolerate.

The performance impact is a single, easy to predict branch per element. This compares to, at least, two interface calls for the enumeration per element. Update: I think it's 4 interface calls(?). Two to obtain the sequence ( MoveNext, Current), then GetEnumerator, then MoveNext.

I cannot think of a single other place where the BCL allows a NRE to be generated by framework code.

I cannot think of a single other place where the BCL allows a NRE to be generated by framework code.

There are plenty of places.

Some of them are by design because we don't want to pay any additional costs on very hot paths, e.g.
```C#
default(TaskAwaiter).GetResult();

will throw a NullReferenceException internally as it dereferences a null Task field.  This is considered invalid use, and it's not worth the extra code bloating all of the places this gets inlined into as part of awaits.

Some are accidental and we've just never gone back and addressed them because nulls are considered invalid inputs to them and we'd simply be changing the exception type from one indication of a null failure to another indication of a null failure, e.g.
```C#
new XPathDocument(default(Stream));

currently throws a NullReferenceException, as it doesn't validate that the expected non-nullable Stream is actually non-null. For these, we generally accept PRs to "fix" them, primarily to a) make failures easier to debug (callers get an exception that provides more detail about what was the actual problem) and b) to highlight that the failure wasn't actually because of a bug in the library but rather with how the library was being used; that, after all, is the primary purpose of ArgumentNullException.

This SelectMany case falls into the latter bucket. null is not allowed to be returned from the delegate, and as I noted, that is now explicitly codified with nullable reference types, with the delegate defined as Func<TSource, IEnumerable<TResult>> rather than the erroneous Func<TSource, IEnumerable<TResult>?>. I don't see a good reason to change the signature to allow nulls nor to change the implementation to pretend that null means an empty collection. But I think it'd be fine to add a check for a null return that throws an InvalidOperationException with a message explaining that the delegate returned null when it was required to return a valid collection.

I'm not concerned about breaking changes here; we generally consider it acceptable to replace NullReferenceExceptions with something else.

I'm also not concerned about the overhead: the cost of an extra null check won't be measurable when compared to the cost of the proceeding delegate invocation plus at a minimum an interface invocation (or if special-casing was desired, a type check), and most likely many more plus allocation as the returned enumerable is enumerated, on top of whatever is done inside of the callback and whatever is done by the consumer of the SelectMany.

Was this page helpful?
0 / 5 - 0 ratings