The full .NET Framework has an internal method for running PLINQ with a custom scheduler. .NET Core currently has no such method in ParallelEnumerable.
Could we have WithTaskScheduler made public in ParallelEnumerable, or any other viable alternative?
I see that the underlying classes do have task scheduler configuration built in.
Why is this needed?
We have a product that did not set up a default task scheduler for the whole process/appdomain, so we do that manually. I am afraid to set the scheduler process/appdomain-wide because of possible regressions. At the same time, I want to use PLINQ instead of Tasks/Parallel where PLINQ is the best fit. But I cannot use PLINQ, because a custom scheduler is a must for us.
Another possible reason for such an API is virtual isolation. I have seen applications do more and more isolation not at the process/appdomain level but in-process, via type loading and DI/IoC containers. Another level of isolation could be achieved with parallelism, but for this all APIs should accept custom schedulers.
For those who, like me, are on the full .NET Framework: please like this code if you have used it.
using System.Linq;
using System.Reflection;
using System.Threading.Tasks;

/// <summary>
/// Backport of <see cref="ParallelEnumerable"/>.
/// </summary>
public static class ParallelEnumerableExtensions
{
    /// <summary>
    /// Sets the task scheduler to execute the query.
    /// </summary>
    /// <typeparam name="TSource">The type of elements of <paramref name="source"/>.</typeparam>
    /// <param name="source">A ParallelQuery on which to set the task scheduler option.</param>
    /// <param name="taskScheduler">Task scheduler to execute the query.</param>
    /// <returns>ParallelQuery representing the same query as source, but with the task scheduler option set.</returns>
    /// <exception cref="T:System.ArgumentNullException">
    /// <paramref name="source"/> or <paramref name="taskScheduler"/> is a null reference (Nothing in Visual Basic).
    /// </exception>
    /// <exception cref="T:System.InvalidOperationException">
    /// WithTaskScheduler is used multiple times in the query.
    /// </exception>
    public static ParallelQuery<TSource> WithTaskScheduler<TSource>(this ParallelQuery<TSource> source, TaskScheduler taskScheduler)
    {
        // Calls the internal ParallelEnumerable.WithTaskScheduler via reflection.
        // Because of that, the exceptions above arrive wrapped in a TargetInvocationException.
        return
            (ParallelQuery<TSource>)
            typeof(ParallelEnumerable)
                .GetMethod("WithTaskScheduler", BindingFlags.Static | BindingFlags.InvokeMethod | BindingFlags.NonPublic)
                .MakeGenericMethod(typeof(TSource))
                .Invoke(null, new object[] { source, taskScheduler });
    }
}
We have a product that did not set up a default task scheduler for the whole process/appdomain.
I don't understand, how is that possible? Are you running on some custom version of .Net?
On .Net Framework and .Net Core, TaskScheduler.Default is set up automatically by the static constructor of TaskScheduler. I don't see any way to avoid that, so it should always be set up.
I am afraid to set the scheduler process/appdomain-wide because of possible regressions.
What kind of regression are you afraid of? Can't you try it and see if it works?
I don't think it makes sense to add APIs to solve potential problems; they should solve real problems.
@svick, I fully agree with you that APIs should solve real problems.
How can I solve the following?
using System;
using System.Linq;
using NUnit.Framework;

[TestFixture]
public class ParallelEnumerableExtensionsTests
{
    [Test]
    public void Test()
    {
        // assume web calls which need [ThreadStatic] variables to be set and which are converted to logging correlations and HTTP headers
        var a = new[] { 1, 2, 3 };
        var s = new MyThreadStaticLoggingCorrelationsBillingHeadersTaskScheduler();
        Console.WriteLine(a.AsParallel().WithTaskScheduler(s).Sum());
    }
}
My context is multi-tenant SaaS cloud reporting and billing.
Thanks for the link. I searched only corefx, not coreclr, for the issue.
Mostly, my TaskScheduler does the following:
var operationContext = GrabOperationContextFromCurrentThreadWhichWasSetInHttpAspNetMvcController();
a.AsParallel().WithTaskScheduler(s).Select(x =>
{
    // Set the captured context on the worker thread before doing the work.
    SetOperationContext(operationContext);
    try
    {
        return Do(x);
    }
    finally
    {
        // Always clear the context, even if Do throws.
        UnSetOperationContext(operationContext);
    }
});
So each time a new thread starts I want to set the context, and unset it when done. I could write all of this manually, but then I lose the readability of PLINQ, and things become more complicated when I use async/await.
That is, LINQ works in the context of the original thread, while PLINQ works in the context of other threads. What way do I have to pass the context?
I do not use the capabilities of TaskScheduler to do anything with threads, but these may be useful in the future.
I am in the process of reading 2184.
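For readers who want to keep the query readable without a custom scheduler, the set/unset pattern described above can be factored into a small wrapper around the selector. This is only a sketch under assumptions: `currentContext`, `WithContext` and `Do` are hypothetical names standing in for the real thread-static storage and the real work, and the context is a plain string.

```csharp
using System;
using System.Linq;

public static class ContextPropagationSketch
{
    // Hypothetical thread-static slot, standing in for the storage that other assemblies read.
    [ThreadStatic] private static string currentContext;

    // Hypothetical helper: wraps a selector so that each call sets the captured
    // context on whichever thread runs it, and restores the previous value afterwards.
    public static Func<T, TResult> WithContext<T, TResult>(string context, Func<T, TResult> selector)
    {
        return x =>
        {
            string previous = currentContext;
            currentContext = context;
            try
            {
                return selector(x);
            }
            finally
            {
                currentContext = previous;
            }
        };
    }

    // Hypothetical work item that depends on the thread-static context.
    private static string Do(int x)
    {
        return currentContext + ":" + x;
    }

    public static string[] Run()
    {
        currentContext = "tenant-42";     // set on the calling thread
        string captured = currentContext; // captured once, outside the query
        return new[] { 1, 2, 3 }
            .AsParallel()
            .AsOrdered()
            .Select(WithContext<int, string>(captured, Do))
            .ToArray();
    }

    public static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
    }
}
```

The wrapper only covers synchronous selectors. Code that awaits inside the selector may resume on a different thread, where the thread-static value is not set.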
So each time a new thread starts I want to set the context, and unset it when done.
What impact does setting the operation context have on the "Do" method? Is the Do method just looking in a threadstatic or something to find the context? If so, how about using AsyncLocal<T> instead of [ThreadStatic]? You can just store the context into your async local before doing the PLINQ operation, and all of the workers involved in PLINQ will see that value.
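A minimal sketch of the AsyncLocal<T> suggestion, assuming the context can be represented as a string. `OperationContext` and `Do` are hypothetical names used only for illustration, not part of any existing API.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class AsyncLocalPlinqSketch
{
    // Hypothetical ambient context. AsyncLocal<T> values flow with the ExecutionContext,
    // which the tasks started by PLINQ capture from the thread that runs the query.
    private static readonly AsyncLocal<string> OperationContext = new AsyncLocal<string>();

    // Hypothetical work item that reads the ambient context on a worker thread.
    private static string Do(int x)
    {
        return OperationContext.Value + ":" + x;
    }

    public static string[] Run()
    {
        // Store the context before starting the PLINQ operation.
        OperationContext.Value = "tenant-42";

        return Enumerable.Range(1, 3)
            .AsParallel()
            .AsOrdered()
            .Select(x => Do(x))
            .ToArray();
    }

    public static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
    }
}
```

This only helps when the code that reads the context can be changed to read from the AsyncLocal<T>. Code that reads a [ThreadStatic] field directly will not see the value.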
Thanks for AsyncLocal, I will look into it.
Do calls a stack of code in other assemblies, which sends HTTP calls and other data requests. We have a system where some headers come in with the request via HTTP; these are set in thread-local/static storage. We have authentication/authorization values that we get from a security service, and these are also set in thread-local/static storage. All these sets and gets are done via other teams' assemblies.
We make the appropriate calls so that all context metadata propagates from our system to the other systems we call, for proper billing/royalties/authorization. We cannot change code in other assemblies to use AsyncLocal or anything else. Also, we cannot set things globally, as the same IIS application serves many different customers. So I need each new/pool thread to inherit the parent context.
UPDATE:
AsyncLocal seems to suit some cases where I control the code, but it needs 4.6. We compile against .NET 4.5.1, but the runtime seems to be .NET 4.6, I hope.
@asd-and-Rizzo I think PLINQ is generally not the right solution for code that uses the network. Have you considered rewriting your code to use async-await?
@svick, @jnm2
Let's pretend I am in the legal e-discovery sector. Let's pretend I have a SaaS product, which implies many customers, each with many matters. Tons of data.
Here is a simple report/billing query:
(await this.database.FindCustomers())
.Where(x => customersFilter.Include(x))
.AsParallel().WithTaskScheduler(this.taskScheduler)
.Select(x => this.GetMatters(x, dateRange))
.Select(async x => await x)
.SelectMany(x => x.Result);
Inside GetMatters I have another AsParallel with more async/awaits.
I say that PLINQ is orchestration (parent/child/parallelism), while async/await (which is about asynchrony, not parallelism) is a low-level detail. You are comparing apples and oranges. Do you see my point?
@asd-and-Rizzo
I say that PLINQ is orchestration (parent/child/parallelism), while async/await (which is about asynchrony, not parallelism) is a low-level detail. You are comparing apples and oranges. Do you see my point?
Yes. But if you use async-await, you can't use PLINQ for orchestration, at least not without defeating the purpose of async-await. Which is what your example seems to be doing: if you call Result on a Task, you might as well not be using async-await in the first place, since you're blocking a thread.
And you can use other approaches for orchestration. One option is to do it manually, which could make sense if your PLINQ query is as simple as what you're showing. Another option is TPL Dataflow.
(Also, I don't understand why your query is that complicated: the last three lines could be simplified to just .SelectMany(x => this.GetMatters(x).Result);.)
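The manual orchestration mentioned above could look roughly like the following sketch. It is only an illustration under assumptions: `GetMattersAsync` is a hypothetical stand-in for GetMatters, customers and matters are plain strings, and Task.Delay simulates the network or database call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ManualOrchestrationSketch
{
    // Hypothetical stand-in for GetMatters: an asynchronous call that returns
    // the matters of one customer.
    private static async Task<IReadOnlyList<string>> GetMattersAsync(string customer)
    {
        await Task.Delay(10); // simulates a network or database call
        return new[] { customer + "/matter-1", customer + "/matter-2" };
    }

    public static async Task<List<string>> GetAllMattersAsync(IEnumerable<string> customers)
    {
        // Start one call per customer without blocking a thread on Result.
        Task<IReadOnlyList<string>>[] pending =
            customers.Select(c => GetMattersAsync(c)).ToArray();

        // Await all calls together; results come back in the order of the input tasks.
        IReadOnlyList<string>[] results = await Task.WhenAll(pending);

        return results.SelectMany(matters => matters).ToList();
    }

    public static async Task Main()
    {
        List<string> all = await GetAllMattersAsync(new[] { "a", "b" });
        foreach (string matter in all)
        {
            Console.WriteLine(matter);
        }
    }
}
```

This version starts every call at once and does not limit concurrency. A real system with many customers would probably need throttling, which is one of the things TPL Dataflow provides.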
@svick, I could argue with most of your points, but this thread is a request for TaskScheduler support in PLINQ, not a discussion of async/await.
I'm going to close this. We've considered such a WithTaskScheduler method multiple times in the past, and decided against it for reasons like those outlined in https://github.com/dotnet/coreclr/issues/2184#issuecomment-160673747. And for the purposes of this thread, it sounds like AsyncLocal<T> does fit the bill, and the only concern was around what version of .NET was needed for it, but any new methods introduced in PLINQ wouldn't be added to those older .NET versions anyway. Thank you for the discussion.