_For additional context see https://github.com/microsoft/botbuilder-dotnet/issues/3998 and its fix: https://github.com/microsoft/botbuilder-dotnet/pull/4000_
Unhandled key DialogEvents aren't currently tracked in telemetry. Two potential key DialogEvents for tracking are DialogEvents.Error and DialogEvents.VersionChanged (i.e. "error" and "versionChanged").
Per @stevenic we shouldn't log all unhandled DialogEvents which would add noise to the telemetry data. DialogEvents are optional by design.
Call dc.Dialogs.TelemetryClient.TrackTrace() for key unhandled DialogEvents in Dialog.OnDialogEventAsync()
_Potential_ C# code:
public virtual async Task<bool> OnDialogEventAsync(DialogContext dc, DialogEvent e, CancellationToken cancellationToken)
{
// Before bubble
var handled = await this.OnPreBubbleEventAsync(dc, e, cancellationToken).ConfigureAwait(false);
// Bubble as needed
if (!handled && e.Bubble && dc.Parent != null)
{
handled = await dc.Parent.EmitEventAsync(e.Name, e.Value, true, false, cancellationToken).ConfigureAwait(false);
}
// Post bubble
if (!handled)
{
handled = await this.OnPostBubbleEventAsync(dc, e, cancellationToken).ConfigureAwait(false);
}
// Log key unhandled DialogEvents
// Changes to the Dialog or its children could result in bad dialog state for the user.
// If the unhandled event's name is "versionChanged", emit a trace through the TelemetryClient.
if (!handled && e.Name == DialogEvents.VersionChanged)
{
await dc.Dialogs.TelemetryClient.TrackTrace(dc, e, cancellationToken).ConfigureAwait(false);
}
return handled;
}
"versionChanged" events are only emitted from DialogContainer and its subclasses. As of 4.9.x, these subclasses are currently ComponentDialog and AdaptiveDialog.
An override of Dialog.OnDialogEventAsync in DialogContainer may be a cleaner option. It seems unlikely that a DialogContainer subclass isn't the parent dialog or the leaf of another DialogContainer...
public abstract class DialogContainer : Dialog
{
// ...
public override async Task<bool> OnDialogEventAsync(DialogContext dc, DialogEvent e, CancellationToken cancellationToken)
{
var handled = await base.OnDialogEventAsync(dc, e, cancellationToken).ConfigureAwait(false);
// Trace unhandled "versionChanged" events.
if (!handled && e.Name == DialogEvents.VersionChanged)
{
await dc.Dialogs.TelemetryClient.TrackTrace(dc, e, cancellationToken).ConfigureAwait(false);
}
return handled;
}
}
"error" DialogEvents are only emitted from DialogManager. If this event isn't handled, it's rethrown from DialogManager.OnTurnAsync() which triggers the adapter's error handler.
With the current Dialog, DialogContainer and DialogManager classes in mind, it doesn't _seem_ necessary to track unhandled errors at a lower level.
Dialog or DialogContainer. Perhaps DialogManager.
Additional Traces sent to their telemetry for events that could forewarn Dialog errors.
Dotnet SDK
Javascript SDK
Java SDK
Dialog as of the latest preview doesn't have the eventing mechanism present in C# and JS.Python SDK
Dialog as of 4.9.x doesn't have the eventing mechanism present in C# and JS.[dcr]
@tomlm, @stevenic, @darrenj, @vishwacsena, @garypretty
Makes sense - do we want to also consider logging as an Trace Activity so it will show up in Emulator meaning it's even easier to see this when developing?
@darrenj that sounds like a good idea to me. However, this suggestion reminded me that remote "dialogs" (aka Skills) exist...
DialogEvents.VersionChangedI believe we have a new scenario to consider, Skill Hosts and their _Skills_ having a versionChange event.
public abstract class DialogContainer : Dialog
{
// ...
public override async Task<bool> OnDialogEventAsync(DialogContext dc, DialogEvent e, CancellationToken cancellationToken)
{
var handled = await base.OnDialogEventAsync(dc, e, cancellationToken).ConfigureAwait(false);
// Trace unhandled "versionChanged" events.
if (!handled && e.Name == DialogEvents.VersionChanged)
{
await dc.Dialogs.TelemetryClient.TrackTrace(dc, e, cancellationToken).ConfigureAwait(false);
// Send a TraceActivity logic...
// This code determines if the activity is from another bot.
var isSkillCallerId = dc.Context.Activity.CallerId != null;
isSkillCallerId &= dc.Context.Activity.CallerId.StartsWith(CallerIdConstants.BotToBotPrefix, StringComparison.InvariantCultureIgnoreCase);
if (isSkillCallerId)
{
// Do what exactly? Nothing?
}
}
return handled;
}
}
During a versionChange with an activity sent from a skill host, we could do nothing in the SDK. In other words, the Skill doesn't inform the skill host that it had an unhandled "versionChanged" event.
When CheckForVersionChangeAsync stops throwing exceptions, via dotnet (master@36d0e41) AdaptiveDialogs and ComponentDialogs will continue running with an unhandled "versionChanged" event.
If the Skill's dialogs ends up throwing an error, the parent will know there was a problem, but not about the version change (which may have caused the problem).
However, if you consider the Skill called via a SkillDialog to be another Dialog then the above behavior doesn't match other Dialog behavior around DialogEvents.
The Skill's root DialogContainer essentially did not "bubble-up" the event.
Dialog.OnDialogEventAsync()@36d0e41
public virtual async Task<bool> OnDialogEventAsync(DialogContext dc, DialogEvent e, CancellationToken cancellationToken)
{
// Before bubble
var handled = await this.OnPreBubbleEventAsync(dc, e, cancellationToken).ConfigureAwait(false);
// <!-- NOTE: The below if-block with dc.Parent.EmitEventAsync() never occurred -->
// Bubble as needed
if (!handled && e.Bubble && dc.Parent != null)
{
handled = await dc.Parent.EmitEventAsync(e.Name, e.Value, true, false, cancellationToken).ConfigureAwait(false);
}
// Post bubble
if (!handled)
{
handled = await this.OnPostBubbleEventAsync(dc, e, cancellationToken).ConfigureAwait(false);
}
return handled;
}
If we treat the Skill as a regular Dialog and inform the parent that a "versionChanged" event occurred, this likely results in changes to Dialog.OnDialogEventAsync, SkillDialog, and any Skill hosts _not_ using Dialogs. This last item is probably _highly_ undesired as customers have to change their code.
The "bubble up" stage from the skill is likely to be an activity with expectReplies or an invoke, as the skill needs an immediate response for its own processing.
DialogEvents.VersionChanged to a skill host...The parent should be able to ignore the skill's "versionChanged" event and send a 100(?) HTTP Status code back to the in-proc skill which then continues running its dialogs.
If the skill host sends a 200 (400?, 405?), the host must inform the skill to stop processing and relinquish control send an appropriate status code back to the host that called the skill. In this case, is the skill sending an EoC?
For 4xx and 5xx status codes from the parent, the skill possibly continues processing? The skill is unaware if there is one or _n_ instances of the skill host, all it knows was that its attempted event bubble up failed.
However, by bubbling the event, we have back-to-back HTTP requests which doesn't guarantee affinity. So we/customers might end up with blanket statements, e.g. all skill host instances don't accept _any_ "versionChanged" events from its skills; any request of this type gets a 100 status code response.
Alternatively, skill hosts have code that can account for two different instances having the same DialogState for a given user. When a skill host tells a skill to stop processing, the instance that called the skill may gracefully exit the turn. The second instance will create a turn in its processing of the versionChanged event.
tl;dr Not bubbling up the "versionChanged" event from the skill to the parent differs from normal Dialog behavior if skills are just remote Dialogs.
Bubbling up the event causes back-to-back HTTP requests.
Actually, ending up with back-to-back HTTP requests is not acceptable due to channels/adapters that require affinity.
With this in mind, the skill must discard its state changes and notify the skill host of the versionChanged event via the initial Host-to-Skill HTTP Request (activity from host to skill). Below are two simple scenarios:
The skill host should then re-invoke the skill with the original activity. The activity should probably have an added flag that signals to the skill to not re-bubble the event to the skill host and instead continue processing.
The skill host doesn’t re-invoke the skill. If the skill host cancels the SkillDialog, this will send an EoC to the skill for state management.
~The new problem is that the skill now doesn’t know if it should fire its “post Bubble” event handler for unhandled versionChanged events. The skill doesn’t get an in-turn response when it bubbles the event to its skill host, because it ended the current turn when it informed the parent of the versionChanged event.~
Edit #1 is fine/unavoidable... When the parent re-invokes the skill, the flag that indicates to the child to not re-bubble the event is equivalent to handled = true; right before the bubble phase.
The event should still be unhandled after the pre-bubble, but won’t trigger the bubble or post-bubble handlers.
The post-bubble handlers are never called.
Or the SkillDialog.OnDialogEventAsync override calls post-bubble because it can inspect the activity to see if it has been resent due to a versionChanged bubble.
Either the parent never knows about its skills’ versionChanged events, or the skill’s dialogs _possibly_ don’t trigger a post-bubble handler for its versionChanged.
@garypretty can you help?
Sure, I can help implement this.
Synced with @stevengum and agreed we will send a trace activity and TelemetryClient.TrackTrace - this will ensure that when using skills in production, developers can still see unhandled version changed events if they are using telemetry.
Looks like all PRs have been merged. Closing.