Please review the problem and suggested solution and provide your comments to help drive this feature implementation into a future platform release.
We have an Exceptions table which computes a hash of each exception. The idea behind this table is we could display which exceptions are occurring frequently compared to the exceptions which only happen a small handful of times.
When expanding an exception, we currently list the exception hash. We should also display and offer the option to sort by the number of occurrences of that exception hash. Since this is only valid for exceptions, we need to make sure it doesn't cause any issues with the other event log types.
We remove this feature/table if people do not find it valuable.
Hi @ohine
Not strictly related to this issue, but when I do
SELECT [ExceptionHash], COUNT(1) [Count]
FROM [dbo].[Exceptions]
GROUP BY [ExceptionHash]
HAVING COUNT(1) > 1
ORDER BY COUNT(1) DESC
I get empty results:
ExceptionHash Count
------------- -----------
(0 rows affected)
Yet, there are indeed duplicate records in the Exceptions table, just the hash is different. I'm now grouping by all columns excluding the hash:
SELECT [Count], [Message]
FROM (
SELECT [Message], [StackTrace], [InnerMessage], [InnerStackTrace], [Source], COUNT(1) [Count]
FROM [dbo].[Exceptions]
GROUP BY [Message], [StackTrace], [InnerMessage], [InnerStackTrace], [Source]
HAVING COUNT(1) > 1
) query
ORDER BY [Count] DESC
and getting:
Count Message
----- -----------------------------------------------------------------------------------------------------
169 The Controls collection cannot be modified because the control contains code blocks (i.e. <% ... %>).
168 Object reference not set to an instance of an object.
48 Exception has been thrown by the target of an invocation.
14 Error occurred while obtaining a token for portalId 0 and application Analytics
4 Object reference not set to an instance of an object.
2 Error occurred while obtaining a token for portalId 0 and application Structured Content
2 Thread was being aborted.
(7 rows affected)
Maybe this hash is including more metadata than it should?
@daguiler I think it's actually an issue that the function being used to calculate the hash doesn't use a stable key/salt, so the same content will generate different hashes. I discovered this recently, but haven't had a chance to verify and submit an issue.
Assuming that's true, it would certainly seem like no one has found this data useful since it started being collected (other than, perhaps, granting some false assurance against consistent exceptions). On the other hand, if we fix this issue and expose some more exception details in the UI, perhaps it would end up being good information to have.
Yes, makes sense.
Thanks @daguiler and @bdukes for looking into this RFC and providing feedback! I'm hoping @donker will chime in with more details behind that table and how he intended it to function.
I do recall we've had quite a few regression issues with the feature over the last few years and hopefully we can nail down what it should be doing and make it happen. I think it would be valuable to have that information, right now it appears it's just causing more issues then it's solving.
please be aware that by default, all Log types for exceptions (SCHEDULER_EXCEPTION, GENERAL_EXCEPTION, MODULE_LOAD_EXCEPTION, PAGE_LOAD_EXCEPTION) are configured to keep only the last 100 entries. When the eventlog entry is deleted, the exceptionevent is deleted as well and only the exeption with the stack trace is left, i.e. with the current structure, you won't get any data for statistics, unless you configure all these types to keep ALL entries. (In my scripts, I recently limited those entries to max age of - by default - two years, als either the DNN version is too old or the item would no longer match the code.
I don't see much benefit of having two 1:1 tables EventLog and ExceptionEvents, merging both into one would result in a slightly larger single table with a lot of empty columns (for non-exceptions), but the duplicate columns UserID and PortalID would be eliminated and a single table would be easier to handle.

If there is no use for Exceptions being stored separately any more, we should consider moving the messages and stacktrace back to EventLog table as well (they don't take significant space, if the columns are Null) and we would avoid the costly join when retrieving the data.
Referrer and UserAgent might be interesting on Login or first visit as well.
We also need to be aware that this tables may contain personal data, affected by GDPR.
@daguiler I think it's actually an issue that the function being used to calculate the hash doesn't use a stable key/salt, so the same content will generate different hashes. I discovered this recently, but haven't had a chance to verify and submit an issue.
Assuming that's true, it would certainly seem like no one has found this data useful since it started being collected (other than, perhaps, granting some false assurance against consistent exceptions). On the other hand, if we fix this issue and expose some more exception details in the UI, perhaps it would end up being good information to have.
I wrote the original change, so I'd better jump on this discussion. Before my intervention there was no Exceptions table. And to find an answer to "Which bit of malfunctioning code is wrecking my DNN's performance?" was near impossible to answer. So the thought was as Oliver mentioned to be able to get to grips with bugs in your installation. The UI, however, was never done because ... well, lots of reasons.
In all their wisdom at some point someone at Corp injected the LogGUID back into the hash so that defeated the purpose of the intervention. I believe (I'd have to trace this) that I took that out again.
What I do see, however, is this bit of code in the Hash function:
// DNN-8845: using a FIPS compliant HashAlgorithm
using (var hasher = new HMACSHA1())
{
var byteArray = hasher.ComputeHash(Encoding.Unicode.GetBytes(sb.ToString().ToLowerInvariant()));
return Convert.ToBase64String(byteArray);
}
I'm not familiar with HMACSHA. I implemented MD5 first. But it looks like they changed this to comply with FIPS: https://dnntracker.atlassian.net/browse/DNN-8845. But from what I can see this now defeats our purpose because HMACSHA1 is a keyed algorithm so when the key changes the hash changes (I think). They probably should have chosen SHA1CryptoServiceProvider instead which is not keyed. Can anyone second that observation?
Also I was hoping to offload the EventLog table from some data as well. The exceptions are stored in there as well. That should be (IMHO) deprecated.
Finally the UI: we should have a screen that shows the top 10 exceptions. Then you should be able to drill down by clicking on one to show the individual events. I.e. the list we now see but filtered by exception hash.
Yeah, it looks like that FIPS-compliant issue was the cause of the issue I was seeing. (Some government computers have a FIPS-compliant setting which causes programs to crash when they try to use the MD5 function or other non-compliant algorithms, so even though it's not being used in a security context, it's important not to use it).
I found this out when I used this code as the basis for a function I was using and realized it didn't work. Here's what I ended up with:
using (var hasher = SHA1.Create())
{
var byteArray = hasher.ComputeHash(Encoding.Unicode.GetBytes(data));
var hashedToken = Convert.ToBase64String(byteArray);
return hashedToken;
}
Personally, I don't know that I've ever used the data in the table to look at exceptions in the aggregate, but if there were a view of aggregate exceptions, I would think it was useful.
So for the worklist for this RFC we can add changing this algorithm at the very least.
I fear, we are mixing logging and analysis.
The key requirements for a log table:
Yeah, I'm not a proponent of the single table. The idea was to offload the largest block of text from the main table, namely the stacktrace. If you have a bit of malfunctioning code that runs on every page hit (and this is not an uncommon scenario) your EventLog grows like mad.
So for the worklist for this RFC we can add changing this algorithm at the very least.
Yup, I'll get it added into the description and suggest to use the code @bdukes provided.
I assumed it still was working after the last time I fixed it. I didn't verify the hash was still valid before creating the RFC. It's been something I've wanted to fix for awhile now but just haven't had the chance to create the UI for the table.
@donker, offloading rows would only make sense, if these columns take much space. If we consider most of the rows having empty fields for stack trace, inner stack trace, etc. each Null value just takes one bit (not byte). If there is a value, duplicate values occupies space for the text size of all messages, but by default limiting each exception type to 100 items kept, the size is neglectable. And by deleting old items, occurrence info will be deleted permanently as well.
However, any analysis (e.g. checking for Hash already existing) takes a lot of extra effort when recording a log entry.
@donker, offloading rows would only make sense, if these columns take much space. If we consider most of the rows having empty fields for stack trace, inner stack trace, etc. each Null value just takes one bit (not byte). If there is a value, duplicate values occupies space for the text size of all messages, but by default limiting each exception type to 100 items kept, the size is neglectable. And by deleting old items, occurrence info will be deleted permanently as well.
However, any analysis (e.g. checking for Hash already existing) takes a lot of extra effort when recording a log entry.
I see your point. Would you add columns then for the error message, stacktrace and/or inner exception? They are currently folded in an XML blurb which makes SQL manipulation next to impossible. The XML blurb could just as well be offloaded to disk as far as I'm concerned. But the exception is crucial to determine big from small errors in the system.
@donker, I would use individual columns, which ate taking less space and are easier to retrieve.
I wouldn't offload the XML blurb to disk, that would make analysis unnecessarily more difficult.
I wouldn't offload the XML blurb to disk, that would make analysis unnecessarily more difficult.
There is a lot of "junk" in there as well which looks more like something for a regular (disk based) log. I'd keep the best bits in SQL.
I definitely think we need to keep the exceptions abstracted into a new table to allow us to use the data for the original intent. This way, folks like us can do better analysis of what is affecting a DNN instance. We just need to fix the data and have a UI for it.
Yup, I'm all for keeping it, fixing the hash, and creating the UI.
If that's the consensus of the group, I can get this created as a feature and close out this RFC.
Any objections? @donker @sleupold @bdukes @daguiler @hismightiness
I'm all for it.
We need to get back to the requirements for logging:
Following this pattern, any event should be logged using a single database call, which just adds a single record to the log table. We should add a frequently running scheduler job, which analyses the log, separates exceptions and exception events, maintains a statistics table for it (e.g. events per day) and delete all analyzed items from the log keeping it small and fast.
@sleupold I think completely rewriting the logging engine is out of scope for this RFC. We are trying to put things back together and make small improvements to the event logging UI.
I think we can start another RFC for gathering requirements for a completely rewrite of how we're logging in DNN today. After we get the async flag added, we should be able to at least make logging a non-blocking call.
Any objections?
Sounds good to me.
No, I don't request a complete rewrite, but a restructure of processing the data. the current structure might be generated in the background process, but we need to free up logging from unnecessary burden to regain original performance.
PS: just adding a UI for no data is not useful imo.
I'd like to elaborate on the performance effect, as we were recently discussing the problem of incrementing CRM version, which may cause a multi hour downtime on high perf sites. Now consider a situation, when such a site gets into a loop of constantly logging errors (e.g. when the database size is limited and this limit is reached). Either the number of errors force the site to be shut down or the error logging continuously slows down the site, as each logging is analyzing the event generating hash, searching for the hash in the database and if not found, adding it. Writing EventLog and ExceptionEvent entries, which includes statements to delete outdated records per log type.
The original AddEventlog method just took the parameters and added a single log entry to the log table (and deleted outdated entries from this table, which often timed out, ending up in log tables increasing significantly and affecting performance). The split into multiple tables might reduce database size, but doesn't fix the general problem.
The optimal solution would be a log table, where just a single record gets added in process while a background job is executed frequently to analyze the log entries, taking appropriate actions, which includes preparing data for UI, and deleting analyzed data from the log.
The more I think and investigate about logging events and exceptions as well as auditing data changes in DNN, I feel a need to review the current practice, to make sure, it suits the needs of our users and to eliminate redundancies.
Issues and requirements:
Are there other needs from other users?
@sleupold I think you have a great start here. I would say that if we are taking a step back I'd want to add a few more goals to the process.
@mitchelsellers,
I agree - and like to add:
I don't run huge sites that have a severe performance hit due to logging, but a UI for viewing log4net logs, changing the log4net settings and that they do cleanup would be super useful!
So I think to resume this thread:
To come to the performance issue: exceptions should be exceptional. Ideally you have none. But if they occur frequently then there is an issue with your site and the logging performance hit will be the least of your worries. It's like improving insurance payout speed after accidents. Sure it's nice to have but ultimately you just want to not have accidents.
@donker
with 1. you mean fixing the hash algorithm and the cleanup issues?
regarding performance, I am having a different POV. If we want to improve DNN in this area (preventing it from being called the "old webforms turtle amongst CMS") we need to change the developers' mindset, to consider performance implications and possible improvements on every change of the platform. There are situations, where thousands of errors are logged (sometimes one error causing the other) and we should consider strategies to handle it. Ideally, those checks and logs and cleanups should not significantly affect DNN performance.
with 1. you mean fixing the hash algorithm and the cleanup issues?
Yes.
Given the other changes that we have on the plate for the platform and the hashing fix that has already been committed I am going to close this.
Post version 9.4.0 we might revisit the longer-term strategy regarding exceptions when we can make breaking changes if needed.
Most helpful comment
I definitely think we need to keep the exceptions abstracted into a new table to allow us to use the data for the original intent. This way, folks like us can do better analysis of what is affecting a DNN instance. We just need to fix the data and have a UI for it.