Umbraco-cms: Database indexes missing on foreign keys

Created on 8 Oct 2019  路  12Comments  路  Source: umbraco/Umbraco-CMS

Description

_UPDATE: I've added notes about specific indexes that need to be created below_

SQL server does not enforce that indexes are created on FK columns but in many (most) cases they should have indexes since quite often is the case that these columns are queried against.

One important one that is missing is:

CREATE NONCLUSTERED INDEX TEST_IX_umbracoContentVersion_nodeId ON dbo.umbracoContentVersion (nodeId)

We query against this column a lot!

So we definitely need that Index created but there's probably lots of others...

I found an SQL script that lists all FK columns that are missing indexes from this site: https://www.mssqltips.com/sqlservertip/5004/script-to-identify-all-nonindexed-foreign-keys-in-a-sql-server-database/

For reference, here's an article describing the benefits of indexing FK columns: https://sqlperformance.com/2012/11/t-sql-queries/benefits-indexing-foreign-keys

Here are the results of that script:

| Table_Name|Column_Name|
| ------------- | -----|
|cmsContentType2ContentType |childContentTypeId|
|cmsContentTypeAllowedContentType |AllowedId|
|cmsDictionary |parent|
|cmsDocumentType |templateNodeId|
|cmsLanguageText |languageId|
|cmsLanguageText |UniqueId|
|cmsMember2MemberGroup |MemberGroup|
|cmsMemberType | NodeId|
|cmsPropertyType| contentTypeId|
|cmsPropertyType| dataTypeId|
|cmsPropertyType| propertyTypeGroupId|
|cmsPropertyTypeGroup |contenttypeNodeId|
|cmsTagRelationship |propertyTypeId|
|cmsTagRelationship |tagId|
|cmsTask |nodeId|
|cmsTask |parentUserId|
|cmsTask |taskTypeId|
|cmsTask |userId|
|umbracoAccess |loginNodeId|
|umbracoAccess |noAccessNodeId|
|umbracoAccessRule| accessId|
|umbracoContent |contentTypeId|
|umbracoContentSchedule |languageId|
|umbracoContentSchedule |nodeId|
|umbracoContentVersion |nodeId|
|umbracoContentVersion |userId|
|umbracoContentVersionCultureVariation |availableUserId|
|umbracoDocumentVersion |templateId|
|umbracoDomain |domainRootStructureID|
|umbracoLog |userId|
|umbracoNode| nodeUser|
|umbracoRedirectUrl |contentKey|
|umbracoRelation |childId|
|umbracoRelation |relType|
|umbracoUser2NodeNotify |nodeId|
|umbracoUser2UserGroup |userGroupId|
|umbracoUserGroup |startContentId|
|umbracoUserGroup |startMediaId|
|umbracoUserLogin |userId|
|umbracoUserStartNode |startNode|
|umbracoUserStartNode |userId|

What to fix

We definitely need the umbracoContentVersion.nodeId indexed but we should review all of the above and determine if adding indexes to any of these will be beneficial. We only really want to add indexes if they are used in queries in our codebase. Adding too many indexes will slow down data insertion which we don't want.

In most cases you can find the "DTO" object for a table such as: Umbraco.Core.Persistence.Dtos.ContentVersionDto and then you can righ click on it's column that you want to search for to see if it's used in queries. For example, for the ContentVersionDto.NodeId, i right click this member and say find references in VS, then i can see how many queries it's used in

How to fix

For indexes that should exist, the DTO classes will need to be updated to declare the index and then a migration will need to be written to add all of the new indexes.


_This item has been added to our backlog AB#4131_

communitup-for-grabs

Most helpful comment

Hey, I took a look through this last night and found the following uses on those properties. Anything that I could see definitely did a JOIN, SELECT, WHERE, ORDERBY I've listed as needs index. There's a few things I couldn't find (CMSTask for example), and then I've listed anything that's just used for inserting as "no" (I assume they don't get indexed). There's also a handful that I didn't know enough about the codebase to work out, where the property is handed off to another method which may or may not do querying somewhere else.

Hopefully it's a useful start though. I found all of the dtos using the class here

If these make sense, I could maybe look at updating the DTOs to include the relevant attributes. Can you give an example of a DTO that already has an index attribute, and then I can follow that style.

| Table_Name | Column_Name | Num Uses | Needs index? | Notes |
|---------------------------------------|-----------------------|------------------------|--------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| cmsContentType2ContentType | childContentTypeId | 2 | maybe? | ContentTypeCommonRepository.cs orders a query by this param |
| | | | | ContentTypeRepositoryBase.cs uses this in 2 inserts |
| cmsContentTypeAllowedContentType | AllowedId | 5 | no | only used in inserts |
| cmsDictionary | parent | 4 | yes | DictionaryRepository.cs uses this in Where SqlSyntax.GetQuotedColumnName("Parent") |
| cmsDocumentType | templateNodeId | 2 | no | only used in inserts |
| cmsLanguageText | languageId | 5 | yes | DictionaryRepository.cs uses this in ConvertFromDto, Where LanguageID > 0 |
| cmsLanguageText | UniqueId | 4 | yes | DictionaryRepository.cs uses this on the right hand side of a join |
| cmsMember2MemberGroup | MemberGroup | 7 | yes | MemberGroupRepository.cs uses this as the On part of a join. MemberRepository.cs uses this in two Where statements to find all members in a role. MemberRepository.cs uses this in two Where statements to GetByMemberGroup |
| cmsMemberType | NodeId | 1 |no | insert only |
| cmsPropertyType | contentTypeId | 25 | yes | as part of joins all over the place |
| cmsPropertyType | dataTypeId | 37 | yes | as part of joins all over the place |
| cmsPropertyType | propertyTypeGroupId | 23 | yes | as part of joins, orderbys, wheres, all over the place |
| cmsPropertyTypeGroup | contenttypeNodeId | 8 | yes | ContentTypeCommonRepository.cs and MemberTypeRepository.cs use this for joins |
| cmsTagRelationship | propertyTypeId | 10 | yes | TagRepository.cs and ContentTypeRepository.cs use this for joining |
| cmsTagRelationship | tagId | 9 | yes | TagRepository.cs and ContentTypeRepositoryBase.cs use this for joining |
| cmsTask | nodeId | can't find in codebase | | |
| cmsTask | parentUserId | can't find in codebase | | |
| cmsTask | taskTypeId | can't find in codebase | | |
| cmsTask | userId | can't find in codebase | | |
| umbracoAccess | loginNodeId | 3 | unsure | insert only |
| umbracoAccess | noAccessNodeId | 3 | unsure | insert only |
| umbracoAccessRule | accessId | | yes | DocumentRepository.cs uses for joins |
| umbracoContent | contentTypeId | 34 | yes | as part of joins, orderbys, wheres, all over the place |
| umbracoContentSchedule | languageId | 2 | unsure | used in a methd "GetIsoCodeById" |
| umbracoContentSchedule | nodeId | 7 | yes | Used in Wheres in DocumentRepository |
| umbracoContentVersion | nodeId | 27 | yes | Used in lots of wheres all over the place |
| umbracoContentVersion | userId | 7 | yes | used in a join in DocumentRepository.cs |
| umbracoContentVersionCultureVariation | availableUserId | 1 | unsure | inserts only (column name also has a "rename this" comment on it) |
| umbracoDocumentVersion | templateId | 8 | unsure | |
| umbracoDomain | domainRootStructureID | 3 | no | just used for creating new objects |
| umbracoLog | userId | 5 | yes | used in a join for AuditRepository.GetBaseQuery |
| umbracoNode | nodeUser | 67 | yes | used in a join on ContentRepositoryBase.ApplySystemOrdering, selects in EntityRepository |
| umbracoRedirectUrl | contentKey | 5 | yes | used in a where RedirectUrlRepository.Get |
| umbracoRelation | childId | 3 | unsure | appears to only be inserts |
| umbracoRelation | relType | 6 | maybe? | used in by RelationRepository.PerformGetAll in an orderby and PerformGetByQuery in an orderby |
| umbracoUser2NodeNotify | nodeId | 6 | yes | used in joins and wheres in NotificationsRepository |
| umbracoUser2UserGroup | userGroupId | 8 | yes | used in joins in UserGroupRepository and UserRepository |
| umbracoUserGroup | startContentId | 10 | yes | UserGroupRepository uses it in a GroupBy statemetn |
| umbracoUserGroup | startMediaId | 11 | yes | UserGroupRepository uses it in a GroupBy statemetn |
| umbracoUserLogin | userId | 2 | yes | used in a where on UserRepository.ClearLoginSessions |
| umbracoUserStartNode | startNode | 6 | yes | used in selects in UserFactory, UserRepository |
| umbracoUserStartNode | userId | 6 | yes | used in WhereIns in UserRepository |

All 12 comments

Hey, I took a look through this last night and found the following uses on those properties. Anything that I could see definitely did a JOIN, SELECT, WHERE, ORDERBY I've listed as needs index. There's a few things I couldn't find (CMSTask for example), and then I've listed anything that's just used for inserting as "no" (I assume they don't get indexed). There's also a handful that I didn't know enough about the codebase to work out, where the property is handed off to another method which may or may not do querying somewhere else.

Hopefully it's a useful start though. I found all of the dtos using the class here

If these make sense, I could maybe look at updating the DTOs to include the relevant attributes. Can you give an example of a DTO that already has an index attribute, and then I can follow that style.

| Table_Name | Column_Name | Num Uses | Needs index? | Notes |
|---------------------------------------|-----------------------|------------------------|--------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| cmsContentType2ContentType | childContentTypeId | 2 | maybe? | ContentTypeCommonRepository.cs orders a query by this param |
| | | | | ContentTypeRepositoryBase.cs uses this in 2 inserts |
| cmsContentTypeAllowedContentType | AllowedId | 5 | no | only used in inserts |
| cmsDictionary | parent | 4 | yes | DictionaryRepository.cs uses this in Where SqlSyntax.GetQuotedColumnName("Parent") |
| cmsDocumentType | templateNodeId | 2 | no | only used in inserts |
| cmsLanguageText | languageId | 5 | yes | DictionaryRepository.cs uses this in ConvertFromDto, Where LanguageID > 0 |
| cmsLanguageText | UniqueId | 4 | yes | DictionaryRepository.cs uses this on the right hand side of a join |
| cmsMember2MemberGroup | MemberGroup | 7 | yes | MemberGroupRepository.cs uses this as the On part of a join. MemberRepository.cs uses this in two Where statements to find all members in a role. MemberRepository.cs uses this in two Where statements to GetByMemberGroup |
| cmsMemberType | NodeId | 1 |no | insert only |
| cmsPropertyType | contentTypeId | 25 | yes | as part of joins all over the place |
| cmsPropertyType | dataTypeId | 37 | yes | as part of joins all over the place |
| cmsPropertyType | propertyTypeGroupId | 23 | yes | as part of joins, orderbys, wheres, all over the place |
| cmsPropertyTypeGroup | contenttypeNodeId | 8 | yes | ContentTypeCommonRepository.cs and MemberTypeRepository.cs use this for joins |
| cmsTagRelationship | propertyTypeId | 10 | yes | TagRepository.cs and ContentTypeRepository.cs use this for joining |
| cmsTagRelationship | tagId | 9 | yes | TagRepository.cs and ContentTypeRepositoryBase.cs use this for joining |
| cmsTask | nodeId | can't find in codebase | | |
| cmsTask | parentUserId | can't find in codebase | | |
| cmsTask | taskTypeId | can't find in codebase | | |
| cmsTask | userId | can't find in codebase | | |
| umbracoAccess | loginNodeId | 3 | unsure | insert only |
| umbracoAccess | noAccessNodeId | 3 | unsure | insert only |
| umbracoAccessRule | accessId | | yes | DocumentRepository.cs uses for joins |
| umbracoContent | contentTypeId | 34 | yes | as part of joins, orderbys, wheres, all over the place |
| umbracoContentSchedule | languageId | 2 | unsure | used in a methd "GetIsoCodeById" |
| umbracoContentSchedule | nodeId | 7 | yes | Used in Wheres in DocumentRepository |
| umbracoContentVersion | nodeId | 27 | yes | Used in lots of wheres all over the place |
| umbracoContentVersion | userId | 7 | yes | used in a join in DocumentRepository.cs |
| umbracoContentVersionCultureVariation | availableUserId | 1 | unsure | inserts only (column name also has a "rename this" comment on it) |
| umbracoDocumentVersion | templateId | 8 | unsure | |
| umbracoDomain | domainRootStructureID | 3 | no | just used for creating new objects |
| umbracoLog | userId | 5 | yes | used in a join for AuditRepository.GetBaseQuery |
| umbracoNode | nodeUser | 67 | yes | used in a join on ContentRepositoryBase.ApplySystemOrdering, selects in EntityRepository |
| umbracoRedirectUrl | contentKey | 5 | yes | used in a where RedirectUrlRepository.Get |
| umbracoRelation | childId | 3 | unsure | appears to only be inserts |
| umbracoRelation | relType | 6 | maybe? | used in by RelationRepository.PerformGetAll in an orderby and PerformGetByQuery in an orderby |
| umbracoUser2NodeNotify | nodeId | 6 | yes | used in joins and wheres in NotificationsRepository |
| umbracoUser2UserGroup | userGroupId | 8 | yes | used in joins in UserGroupRepository and UserRepository |
| umbracoUserGroup | startContentId | 10 | yes | UserGroupRepository uses it in a GroupBy statemetn |
| umbracoUserGroup | startMediaId | 11 | yes | UserGroupRepository uses it in a GroupBy statemetn |
| umbracoUserLogin | userId | 2 | yes | used in a where on UserRepository.ClearLoginSessions |
| umbracoUserStartNode | startNode | 6 | yes | used in selects in UserFactory, UserRepository |
| umbracoUserStartNode | userId | 6 | yes | used in WhereIns in UserRepository |

@liamlaverty awesome work!

For adding indexes, you'll need to do 2 things:

The challenge now is to actually figure out how we query the data. It's actually not going to solve a huge amount of issues by just putting an index on a single FK column if that column is always being looked up as part of another comparison on the same table. In those cases it makes more sense to create an index across multiple columns.

For example, this PR fixes an issue with list view performance https://github.com/umbraco/Umbraco-CMS/pull/7316 which adds an index to both umbracoContentVersion.nodeId and umbracoContentVersion.current. In testing, if adding an individual index to both columns performance was not improved because it would still require a full table scan since SQL server doesn't use multiple indexes when trying to seek through a table, just the first or best one that it considers. I'm not an SQL server DB guru but from many things that I've read throughout the years thsi is what I've come to understand and the recent tests with this PR show that's true.

In the case of the umbracoContentVersion table based on @liamlaverty 's amazing chart, it just might make sense that this single new index covers 3x columns: nodeId, current, userId since userId is used in the joins for the same listview query. But this all comes down to testing, using Sql Server mgmt tools and looking at the Live Query Statistics, Execution Plan and Client Statistics. I wish it were easier to determine what needs an index or not but there should be some obvious ones i think. I've updated the desc to cross out umbracoContentVersion since i think for now we can consider that one done with the above listed PR.

Obvious ones

Here's a list of ones we defo need based on the above tables

  • [x] cmsDictionary.parent - we do query on this individually, will certainly help with large dictionaryies and editing in the back office
  • [ ] cmsLanguageText.UniqueId - this needs a Unique constraint index which is missing, it's unique! it's worth noting that just because it's type is uniqueidentifier doesn't mean it enforces constraints
  • [ ] cmsMember2MemberGroup.MemberGroup - this seems like it would be instantly beneficial especially when querying members by groups. Since there is no other column in this table there can't be a multi column index
  • [ ] cmsPropertyTypeGroup.contenttypeNodeId - this seems obvious to me, we have a PK and a unique constraint index on this table but nothing else so i think it would make sense to have a non clustered index on this FK column, we use it in lots of joins
  • [ ] umbracoAccessRule.accessId - yep, this one is obvious, no other cols that are not indexes are queried/joined
  • [ ] umbracoContent.contentTypeId - YES! How on earth is this missing? wow, we definitely need this
  • [ ] umbracoUserLogin.userId - yep makes sense, we only query this table currently by 2 differnet values at different times

Not needed

  • cmsLanguageText. languageId - we actually don't query on this, it's used in DictionaryRepository.ConvertFromDto but that is querying on in memory objects, not on Sql
  • cmsTask - doesn't exist in v8
  • umbracoRedirectUrl.contentKey - We already have the unique non-clustered index on this table which is across multiple columns which includes this column
  • umbracoRelation - we already have a unique non-clustered index across multiple columns including the ones listed above.
  • umbracoUser2NodeNotify - it's PK is a clustered index across all 3 columns
  • umbracoUser2UserGroup - it's existing index is across both userid and user group id
  • umbracoUserStartNode it's existing index is across all cols listed above

More Research

umbracoNode

I think it would be worth exploring the current indexes on umbracoNode which includes several indexes of a single column: objectType, parentId, path, trashed, uniqueId are all seperate indexes (uniqueId must be it's own since it's unique). But I'd guess that we'd get more perf out of having less indexes over more columns. As an example, when you sort on "Created By" in a list view and analyze the query produced, the Execution Plan it says this index should be created:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[umbracoNode] ([parentId],[nodeUser],[nodeObjectType])
INCLUDE ([uniqueId],[level],[path],[sortOrder],[trashed],[text],[createDate])

Of course we don't support the INCLUDE syntax since we also need to support SqlCe, but this is interesting since it would definitely be better for this query to have a single index over [parentId],[nodeUser],[nodeObjectType]. It's worth noting that just because an index has multiple columns in it doesn't mean it won't be used when only one of those columns is included in a where clause.

cmsPropertyType

This is one that is a bit critical and we already have 2x indexes on it. So like umbracoNode we need to explore in more detail how this is queried against and if would be more beneficial to have multi column indexes instead of what we currently have. Clearly we are missing some though since we don't have indexes on it's FKs :/

If we can get some stats on this one and figure out how best to index it, it could yield some great perf results.

cmsTagRelationship

It would make a lot of sense to have an index on propertyTypeId but i a few cases we also join on tagId. I 'think' in this case it would prob make sense to have a multi column index across both of these but we would want to verify that.

umbracoContentSchedule

Need to investigate if we should have a single index over both languageId, nodeId

umbracoLog

we do join on the FK user id but we also query and sort on a bunch of other columns depending on the methods used: logHeader, Datestamp, (possibly other custom ones)

Need to investigate the best approach for this table.

umbracoUserGroup

There are already 2x clustered indexes on this table. The existing ones prob don't make sense to join to one because we don't query on both an alias and a name at the same time, but we might include some of the other cols with these ones.

Next steps?

I think it would make sense that separate PRs are created for each of the obvious ones above. Any questions about how to do it just ask :)

For the more research ones, if anyone wants to give it a try to see what you can determine, that would be fanastic !

FWIW: "Of course we don't support the INCLUDE syntax since we also need to support SqlCe"

Given that SqlCe-4.0 exits support July-2021, perhaps this limitation might be worth reviewing?

@c9mbundy yep but for now in v8 we will continue to support SqlCe. In a future netcore version we might not but we would still need to support an embedded DB of some sort (in theory) and in which case we'd be limited again. It would be possible instead to avoid this limitation entirely and have different index definitions per DB type, that would of course be a lot of changes to the plumbing but should be possible. Allowing INCLUDE syntax for indexes adds even more 'know-how' to what should/shouldn't be done so that also complicates things a bit more since I'm no expert on best practices for that.

@Shazwazza am I right in thinking that if I wanted to give a couple of these ago, I do the following in discrete PRs:

  • update the relevant file in Umbraco.Core.Persistence.Dtos
  • create a file in Umbraco.Core.Migrations.Upgrade.V_8_X_X
  • add a reference to that file in Umbraco.Core.Migrations.Upgrade -> MigrationPlan.cs

@liamlaverty Yep! If you wanna start with a simple one, we can review and you'll be an expert in migrations in no time :)

@liamlaverty You can also check my (unfinished) PR as reference: https://github.com/umbraco/Umbraco-CMS/pull/7141

Took a bash at this on cmsDictionary.parent at lunch time, took a fair bit of the code from #7141. If I'm on the right tracks I'll give a few more a go and make some PRs.

There was a comment in #7141 that suggested the Name part of [Index(IndexTypes.NonClustered, Name = "IX_" + TableName + "_Parent")] was unnecessary because it's generated elsewhere, but I wasn't certain if that was the case, so I've left it in just now.

https://github.com/liamlaverty/Umbraco-CMS/commit/6b570cd5e3ea71cd5f55864cd6815e7506973255

Hey @liamlaverty, i think it would be easiest for now to just make the migration simple, don't worry for now about trying to be fancy and making a single migration to do all of the indexes. Might be easiest to just start with the basics, make one migration for this one index. Don't worry about reflection, etc...

Maybe later we can combine this into a more fancy approach to multiple indexes at once. In which case we don't need to use reflection like is being done there, you can get a tables metadata by doing this:

var tableInfo = Context.Database.PocoDataFactory.ForType(typeof(ContentNuDto));

which will give you all of the table data you need.

Note to take the index as described in https://github.com/umbraco/Umbraco-CMS/issues/8823 into acocunt.

Was this page helpful?
0 / 5 - 0 ratings