In non-trivial number of cases, the RequestStop event has empty activity id.
9 945 out of 35 452 requests sent during HTTP FunctionalTests were missing activity id.
Other Start/Stop event pairs introduced with the new telemetry are not properly linked with ActivityId as well.
Also easily reproducible with a simple HttpClient scenario:
```C#
class Program
{
static async Task Main(string[] args)
{
using var listener = new HttpEventListener();
using var client = new HttpClient();
await client.GetAsync(@"https://github.com/dotnet/runtime/blob/master/src/libraries/System.Net.Http/src/HttpDiagnosticsGuide.md");
}
}
internal sealed class HttpEventListener : EventListener
{
public const EventKeywords TasksFlowActivityIds = (EventKeywords)0x80;
//public const EventKeywords Debug = (EventKeywords)0x20000;
protected override void OnEventSourceCreated(EventSource eventSource)
{
if (eventSource.Name == "System.Net.Http" ||
eventSource.Name == "System.Net.Sockets" ||
eventSource.Name == "System.Net.Security" ||
eventSource.Name == "System.Net.NameResolution")
{
EnableEvents(eventSource, EventLevel.LogAlways);
}
else if (eventSource.Name == "System.Threading.Tasks.TplEventSource")
{
EnableEvents(eventSource, EventLevel.LogAlways, TasksFlowActivityIds /*| Debug*/);
}
}
protected override void OnEventWritten(EventWrittenEventArgs eventData)
{
var sb = new StringBuilder().Append($"{eventData.TimeStamp:HH:mm:ss.fffffff}[{eventData.EventName}]{eventData.ActivityId}.{eventData.RelatedActivityId} ");
for (int i = 0; i < eventData.Payload?.Count; i++)
{
if (i > 0)
sb.Append(", ");
sb.Append(eventData.PayloadNames?[i]).Append(": ").Append(eventData.Payload[i]);
}
Console.WriteLine(sb.ToString());
}
}
Result (with the change in https://github.com/dotnet/runtime/pull/41022)
09:34:45.3129327[RequestStart]00000011-0000-0000-0000-0000093f9d59.00000000-0000-0000-0000-000000000000 scheme: https, host: github.com, port: 443, pathAndQuery: /dotnet/runtime/blob/master/src/libraries/System.Net.Http/src/HttpDiagnosticsGuide.md, versionMajor: 1, versionMinor: 1
09:34:45.4344957[ResolutionStart]00001011-0000-0000-0000-0000090f9d59.00000011-0000-0000-0000-0000093f9d59 hostNameOrAddress: github.com
09:34:45.4752982[ResolutionStop]00001011-0000-0000-0000-0000090f9d59.00000000-0000-0000-0000-000000000000
09:34:45.4848747[ConnectStart]00002011-0000-0000-0000-0000091f9d59.00000011-0000-0000-0000-0000093f9d59 address: InterNetworkV6:28:{1,187,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,140,82,118,3,0,0,0,0}
09:34:45.5208470[ConnectStop]00002011-0000-0000-0000-0000091f9d59.00000000-0000-0000-0000-000000000000
09:34:45.5347626[HandshakeStart]00003011-0000-0000-0000-0000096f9d59.00000011-0000-0000-0000-0000093f9d59 isServer: False, targetHost: github.com
09:34:45.8030641[HandshakeStop]00003011-0000-0000-0000-0000096f9d59.00000000-0000-0000-0000-000000000000 protocol: Tls13
09:34:45.8070942[Http11ConnectionEstablished]00000011-0000-0000-0000-0000093f9d59.00000000-0000-0000-0000-000000000000
09:34:46.2040073[ResponseHeadersBegin]00000011-0000-0000-0000-0000093f9d59.00000000-0000-0000-0000-000000000000
09:34:46.3278444[RequestStop]00000011-0000-0000-0000-0000093f9d59.00000000-0000-0000-0000-000000000000
```
~Un-paired Stop events:~
~- ResolutionStop: ran in continuation: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Net.NameResolution/src/System/Net/Dns.cs#L485~
~- ConnectStop: ran in SocketsEventArgs: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEventArgs.cs#L744~
~- HandshakeStop: ran in continuation: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Net.Security/src/System/Net/Security/SslStream.Implementation.cs#L240~
They're paired, I misinterpreted the logs. The Stop event doesn't carry parent activity id, only the start one.
ResponseHeadersRead for the whole method #41022ResponseHeadersRead that want to properly time response body reading (based on our solution for convenience methods)We can also add or leave existing unpaired telemetry (though it shouldn't be named Start/Stop then) for events that cannot be bound by a single async context. We can link them to the Start/Stop events manually either via remembering ActivityId or using some other custom ID.
cc: @samsp-msft @MihaZupan @karelz
Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti
See info in area-owners.md if you want to be subscribed.
Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.
I might have found the root cause of this problem.
When I fetch an HTTP response without content, the ActivityId is present in the Stop event.
The RequestStop is called only after the response has been read to the end, which happens outside of the async scope started when RequestStart was called.
RequestStart is called in HttpConnectionPoolManager, several levels down from HttpClient.SendAsync.
RequestStop is called in HttpRequestMessage called when the response content stream has been read: e.g.: ContentLengthReadStream. Which is either called by the user, or in case of response buffering, in HttpClient.FinishSendAsync.
@ManickaP nice investigating 👍
Indeed as you observed, ActivityId is async-local so it flows from parent to child when creating a Task but changes do not flow back from child to parent when an await completes.
I'm not knowledgeable about the networking code, but is there an option to call Start() sooner so that it runs in the same Task where Stop() will eventually be called? Other more customized solutions may also be possible, but I've never attempted to author one and my suspicion is it would get messy.
is there an option to call Start() sooner so that it runs in the same Task where Stop() will eventually be called?
We could, theoretically, run it from HttpClient.SendAsync, but that would only help us with buffered responses. In the other case, the request gets finished after the user either disposes it or reads the response stream till the end and we have no control over when and where this happens.
Is there a way to manually retrieve and later attach the ActivityId to the stop event?
As @ManickaP said, we could only do that for buffered responses. For unbuffered ones, we hand out the response that the user can interact with that will eventaully trigger a Stop event.
If I'm understanding this correctly, we can't flow the ActivityID up to the parent?
In that case, is the HttpClient design (handing out the response to the caller to process) incompatible with a Start/Stop activity tracking the entire processing lifetime?
If we give up on tracking the response portion, we would cut into the value of the activity. At that point we would be telling users to log their own Start/Stop activity enclosing the processing of a request.
Alternatively, would introducing a "SendAndProcessWithTelemetryAsync" method that takes a callback make any sense? In that can we could ensure that everything happens in the same scope.
Something along the lines of (just a rough idea):
c#
// On HttpClient
Task<HttpResponseMessage> SendAndProcessWithTelemetryAsync(HttpRequestMessage request, HttpCompletionOption completionOption, Func<HttpResponseMessage, Task> responseCallback, CancellationToken cancellationToken)
{
TelemetryStart();
try
{
HttpResponseMessage response = await this.SendAsync(request, completionOption, cancellationToken);
await responseCallback(response);
TelemetryStop();
return response;
}
catch
{
TelemetryAbort(); // Which also logs Stop
throw;
}
}
Alternatively, would introducing a "SendAndProcessWithTelemetryAsync" method that takes a callback make any sense? In that can we could ensure that everything happens in the same scope.
That would measure/trace different things depending of HttpCompletionOptions, for buffered it'd include reading the response, for read-headers it wouldn't.
It's similar the first solution I suggested in here: https://github.com/dotnet/runtime/issues/40776#issuecomment-673954728.
for read-headers it wouldn't.
The idea is that it would be captured in the user-provided callback.
Essentially if a user had code like
```c#
Task DoSomethingWithResponseAsync(HttpResponseMessage response) {}
```c#
var response = await client.SendAsync(request, completionOption);
await DoSomethingWithResponseAsync(response);
it would turn into
c#
await client.SendAndProcessWithTelemetryAsync(request, completionOption, DoSomethingWithResponseAsync);
That way the Telemetry Start/Stop would be in the enclosing scope of the response processing.
Now I see.
Is there a way to manually retrieve and later attach the ActivityId to the stop event?
One could save EventSource.CurrentThreadActivityId after the start event, use EventSource.SetCurrentThreadActivityId to temporarily change the activity ID before and after the stop event, and set EventActivityOptions.Disable on the stop event to prevent ActivityTracker from overriding the activity ID. However, if the request causes any other activities, then those would not get the correct parent in their hierarchical GUIDs, so the async solution seems safer.
Alternatively, would introducing a "SendAndProcessWithTelemetryAsync" method that takes a callback make any sense?
Good question, but no :-) We do not want to encourage that style of programming, and it won't fix the 99% case (on top of which it's more expensive).
One could save EventSource.CurrentThreadActivityId after the start event ...
What @KalleOlaviNiemitalo suggests would work to set the activity id of the stop event, however there is also the consideration of what activity id is attached to all events that happen to occur in between the start and stop. Perhaps we are in position to know that we can control all of those events because we own 100% of the code that executes between those two points? I didn't get a chance to read all the updates above and it is almost 5:30am so I need to sleep before I'll be able to do a good job thinking about this. When I'm awake again I'll dig into it further : )
We can move the RequestStart somewhere within the start of HttpClient.SendAsync. This should put all of our other activities in the same async scope. After that, the control is in user's hands, we don't need to track those activities, we just need to link the RequestStop once it happens. For that, we can use the SetCurrentThreadActivityId. The only things that would be excluded are content reading methods (LoadIntoBuffer, SerializeToStream, CreateContentReadStream, etc.), which, AFAIK, don't have any activities, so we should be fine.
All of this is based on premise, that tracing the request until the whole response in consumed is what we want. Alternatively, we could narrow the scope only to the point of creating the response and exclude the response content reading.
If the response is unbuffered, then we should make the stop be when we hand back control to the user code, as its not ultimately us in control.
If we want to make start/stop events for this that can be correlated, we could create a new event with its own activity ID, that is not parented (so as not to break the async generated IDs) and then reference either it or the other ID in one of other of the payloads, so somebody can track it if they need to. I would put this uber-event at a lower priority for being completed. It may not make sense to be an activity ID, just some other ID that can be used to correlate.
In that case, is the HttpClient design (handing out the response to the caller to process) incompatible with a Start/Stop activity tracking the entire processing lifetime?
Yes (at least in the cases where the caller will still need to download the remainder of the response after SendAsync task completes). Apologies for not recognizing the potential for this issue earlier.
The main premises of Activities are:
Even if we are capable creating events that don't adhere to these rules, the tools doing analysis are likely to struggle because we expect Activities form a tree. So a generic pattern that is hard for the library to represent with a single Activity looks like this:
```c#
class Worker
{
Task DoWorkPart1(...); // in our case this is HttpClient.SendAsync()
Task DoWorkPart2(...); // and Stream.ReadAsync() on the returned stream
}
A user can consume that API and do very not-treeish things with it:
```c#
List<string> workList = new List<string>();
foreach(string work in workList) { await Worker.DoWorkPart1(work); }
foreach(string work in workList) { await Worker.DoWorkPart2(work); }
The individual parts of the work can be sequenced/nested, but if part1 and part2 are combined into a single activity they don't nest:
time ---->
| work 1.1 | work 2.1 | work 3.1 | work 1.2 | work 2.2 | work 3.2 |
< -------------- work 1 --------------------->
<------------------ work 2 ----------------->
< ------------------ work 3 ---------------->
However, there are still options we could look at to improve the situation, now and/or in the future:
I also want to loop in @brianrob who has probably had more experience with this type of issue than I have. He may know of some pre-existing approaches that handle this better without needing to do anything novel. Hope this helps!
For 5, I'd recommend we scope down and send the stop at the end of SendAsync, and make it clear either with the event name, or parameters in the payload as to what type of request it is, and so what the Stop will actually correspond to.
FYI: For your test app, make sure you aren’t running under the debugger. VS TPL analysis can conflict with the activity tracking – its something Noah mentioned to me when I was looking at some other event stuff.
Sam
Closing as resolved by #41022
Fixed in 5.0 in PR #41610