Mattermost-server: Investigate removing unnecessary indices

Created on 22 Feb 2019  ·  22Comments  ·  Source: mattermost/mattermost-server

If you're interested please comment here and come join our "Contributors" community channel on our daily build server, where you can discuss questions with community members and the Mattermost core team. For technical advice or questions, please join our "Developers" community channel.

New contributors please see our Developer's Guide.


Notes: Jira ticket

When adding the materialized public channels table (https://github.com/mattermost/mattermost-server/pull/9349), I intentionally duplicated the indices (for present columns) on the Channels table so as to reduce the complexity of the changes and rely on similar query planner behaviour.

I suspect some of these indices could be removed altogether, reducing database load overall (mostly on write). This ticket is an open-ended task to review our use of indices in the sql store.

If you have any questions or want to discuss the ticket, feel free to ask @jesse.hallam or @christopher in the ~Developers: Performance channel on community.mattermost.com

ArePerformance Hard Help Wanted TecGo

All 22 comments

@jwilander @esethna I'm interested in taking a look at this and have mentioned how I would go about taking care of the task on the "Developers : Performance" channel. The discovery of duplicate keys can be automated fairly simply using a tool such as https://www.percona.com/doc/percona-toolkit/2.1/pt-duplicate-key-checker.html

Sounds good @BenKnigge 👍

Hey @BenKnigge,

Did all your questions got resolved?

Are you still working on this @BenKnigge ?

@hanzei If @BenKnigge is not able to contribute I would like to.

Let's give @BenKnigge a few days to respond and see then.

I would also be interested in taking this issue.

@RajatVaryani Would you be open on passing this ticket to @Pomyk or do you want to work on this yourself?

I've started a new role and haven't done anything with this since my
initial research into it.

Finding a duplicate index is fairly simple.

Finding unnecessary indexes is more complicated.

Ideally whomever works on this would have access to usage stats for one or
more live databases that are under active participation.

On Mon, Nov 4, 2019 at 2:31 PM Ben Schumacher notifications@github.com
wrote:

@RajatVaryani https://github.com/RajatVaryani Would you be open on
passing this ticket to @Pomyk https://github.com/Pomyk or do you want
to work on this yourself?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/mattermost/mattermost-server/issues/10347?email_source=notifications&email_token=ACJHMF2UB3EYU45C2FRWXKTQSCIENA5CNFSM4GZGTW42YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDAZBGI#issuecomment-549556377,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/ACJHMF7C5GDUIXQ3ZYOOP5DQSCIENANCNFSM4GZGTW4Q
.

--
Benjamin Knigge
[email protected]
tel: +1-310-570-2880

@hanzei I don't mind passing this ticket.

Then it's all yours @Pomyk

Thanks @RajatVaryani

I have analyzed code in channel_store.go, ran some explains and checked used indices stats in percona server after some basic app usage.
I have found that the following indices are unnecessary on PublicChannels:

  • idx_publicchannels_name - it's covered by the unique index on (Name, TeamId)
  • idx_publicchannels_name_lower, idx_publicchannels_displayname_lower - not used by postgres when searching with '%' (a trigram index could be used by LIKE/ILIKE with %)
  • idx_publicchannels_delete_at - unless it's normal to have a lot more deleted channels than not deleted

Index on TeamId could be extended to another column, for example:

  • (TeamId, DisplayName) - to get rid of sorting in queries with order by DisplayName
  • (TeamId, DeleteAt) - to filter deleted channels
    I'm not sure which one would be better.

These index changes could be done for Channels table too.
Also found some queries that look over complicated and will try to make a PR to simplify them.

Duplicate indices found by pt-duplicate-key-checker: https://gist.github.com/Pomyk/b374964f059f694234981d7d49227ec6

cc @lieut-data

@Pomyk, wow this is a terrific analysis -- thank you!

I have analyzed code in channel_store.go, ran some explains and checked used indices stats in percona server after some basic app usage.

Would you be open to sharing your setup and testing strategy? We'd likely want to replicate your setup both with a loadtest (https://github.com/mattermost/mattermost-load-test) and possibly even on our community Mattermost instance, given the potential presence of enterprise-only queries and such.

@reflog and @streamer45, would it make sense for you to work with @Pomyk on this issue, given your existing loadtest and overall performance investigations?

Most of the work was reading code and extracting example queries. Usually you can guess which indexes would be used just by looking at the query, but I have also used explain on the more complicated ones in mysql and postgres.
I have also used https://www.percona.com/doc/percona-server/LATEST/diagnostics/user_stats.html from Percona as it's easier to find used indexes than with performance_schema. Looks like PostgreSQL has index statistics too, but I haven't used them yet.

I will gladly work with you, I have even applied for the Back End Engineer role at Mattermost ;)

Reactions table has inefficient primary key (UserId,PostId,EmojiName). Using (PostId,UserId,EmojiName) would be better as it would use index scan when fetching reactions for a post. Now it scans the whole table.

@Pomyk Thank you for your excellent analysis :+1:
Would love to see some benchmarks of the queries involved before and after your proposed changes. Does that sound good?
Also, if you need any help, we can move the discussion to the ~Developers: Performance channel.

OK, I will do some benchmarks.

benchmarks: https://github.com/Pomyk/mattermost-server/commit/1b4f4d3c703dbbfe9967efb50be22e82f73cfeb2

some helper scripts I used: https://github.com/Pomyk/mattermost-benchmark-scripts

sample results:

    benchmark                                                    old ns/op     new ns/op     delta
    BenchmarkChannels/mysql/GetPublicChannelsForTeam/1-16        514718        511093        -0.70%
    BenchmarkChannels/mysql/GetMoreChannels/1-16                 610817        600723        -1.65%
    BenchmarkChannels/mysql/GetDeleted/1-16                      819924        832318        +1.51%
    BenchmarkChannels/mysql/SearchInTeam/1-16                    628560        628895        +0.05%
    BenchmarkChannels/mysql/SearchArchivedInTeam/1-16            1328315       1443086       +8.64%
    BenchmarkChannels/mysql/SearchMore/1-16                      748883        732619        -2.17%
    BenchmarkChannels/mysql/SearchGroupChannels/1-16             687369        683080        -0.62%
    BenchmarkChannels/postgres/GetPublicChannelsForTeam/1-16     704532        676958        -3.91%
    BenchmarkChannels/postgres/GetMoreChannels/1-16              1042366       1011997       -2.91%
    BenchmarkChannels/postgres/GetDeleted/1-16                   1077233       1060135       -1.59%
    BenchmarkChannels/postgres/SearchInTeam/1-16                 911217        820803        -9.92%
    BenchmarkChannels/postgres/SearchArchivedInTeam/1-16         2281084       2154007       -5.57%
    BenchmarkChannels/postgres/SearchMore/1-16                   1257868       1199514       -4.64%
    BenchmarkChannels/postgres/SearchGroupChannels/1-16          1487470       1500727       +0.89%
    BenchmarkReactions/mysql/GetForPost-16                       17370649      280530        -98.39%
    BenchmarkReactions/postgres/GetForPost-16                    3060279       269196        -91.20%

Modified tables:

CREATE TABLE `Channels` (
  `Id` varchar(26) NOT NULL,
  `CreateAt` bigint(20) DEFAULT NULL,
  `UpdateAt` bigint(20) DEFAULT NULL,
  `DeleteAt` bigint(20) DEFAULT NULL,
  `TeamId` varchar(26) DEFAULT NULL,
  `Type` varchar(1) DEFAULT NULL,
  `DisplayName` varchar(64) DEFAULT NULL,
  `Name` varchar(64) DEFAULT NULL,
  `Header` text,
  `Purpose` varchar(250) DEFAULT NULL,
  `LastPostAt` bigint(20) DEFAULT NULL,
  `TotalMsgCount` bigint(20) DEFAULT NULL,
  `ExtraUpdateAt` bigint(20) DEFAULT NULL,
  `CreatorId` varchar(26) DEFAULT NULL,
  `SchemeId` varchar(26) DEFAULT NULL,
  `GroupConstrained` tinyint(1) DEFAULT NULL,
  PRIMARY KEY (`Id`),
  UNIQUE KEY `Name` (`Name`,`TeamId`),
  KEY `idx_channels_create_at` (`CreateAt`),
  KEY `idx_channels_team_id` (`TeamId`,`DeleteAt`,`DisplayName`),
  FULLTEXT KEY `idx_channel_search_txt` (`Name`,`DisplayName`,`Purpose`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE `PublicChannels` (
  `Id` varchar(26) NOT NULL,
  `DeleteAt` bigint(20) DEFAULT NULL,
  `TeamId` varchar(26) DEFAULT NULL,
  `DisplayName` varchar(64) DEFAULT NULL,
  `Name` varchar(64) DEFAULT NULL,
  `Header` text,
  `Purpose` varchar(250) DEFAULT NULL,
  PRIMARY KEY (`Id`),
  UNIQUE KEY `Name` (`Name`,`TeamId`),
  KEY `idx_publicchannels_team_id` (`TeamId`,`DeleteAt`,`DisplayName`),
  FULLTEXT KEY `idx_publicchannels_search_txt` (`Name`,`DisplayName`,`Purpose`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE `Reactions` (
  `UserId` varchar(26) NOT NULL,
  `PostId` varchar(26) NOT NULL,
  `EmojiName` varchar(64) NOT NULL,
  `CreateAt` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`PostId`,`UserId`,`EmojiName`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Was this page helpful?
0 / 5 - 0 ratings