_After previously filing #3559 and using the back ported fix, I've come across some compelling reasons for using Indexes on InnoDB tables._
In his comments on the above issue, @mwjames writes
...adding [a PRIMARY KEY] index just for the sake of the PRIMARY seems questionable unless of course there is evidence that the query planner would prefer the PRIMARY over any existing index.
...adding PRIMARY KEY to all tables arbitrarily is going to be difficult so I'm currently undecided on how to proceed. One possible solution could be for those users that require a more strict key validation to add an extra index definition via a hook (see #3675 (comment)) and gets loaded as soon as SMW related tables are altered (during setup or update).
After reading this, I would agree that if the problem is just my finicky DB, then there is no reason to make a core change in SMW. I appreciate the method that @mwjames developed to allow me to add primary keys--it works!--but I thought there must be something else going on, so I did quick research and came across Tuning InnoDB Primary Keys that states:
InnoDB is called an index-organized storage engine. An index-organized storage engine uses the B-Tree of the primary key to stores the data, the table rows. That means a primary key is mandatory with InnoDB. If there is no primary key for a table, InnoDB adds a hidden auto-incremented 6 bytes counter to the table and use that hidden counter as the primary key. There are some issues with the InnoDB hidden primary key. You should always define explicit primary keys on your tables. In summary, you access all InnoDB rows by the primary key values.
(Emphasis throughout this issue is mine, not the original.)
Now, that is written by the principal architect at Percona (the makers of the DB that is forcing primary keys on me), so I want a neutral source.
I found something that looks compelling. The MySQL ref manual, in the section titled "Clustered and Secondary Indexes" states:
Every InnoDB table has a special index called the clustered index where the data for the rows is stored....If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values.
...All indexes other than the clustered index are known as secondary indexes. In InnoDB, each record in a secondary index contains the primary key columns for the row, as well as the columns specified for the secondary index. InnoDB uses this primary key value to search for the row in the clustered index.
So, why not just (continue) to rely on InnoDB's hidden clustered index? I think the answer is found in the conclusion of Scalability issues due to tables without primary keys (albeit, written in 2013 against MySQL 5.5, whose documentation describes InnoDB indexes the same as the 8.0 documentation):
Tables without primary keys cause a wide range of contention because they rely on acquiring dict_sys mutex to generate row-id values. This mutex is used at critical places within InnoDB. Hence the affect of large amount of INSERTs to tables without primary keys is not only isolated to that table alone but can be seen very widely. There are a number of times I have seen tables without primary keys being used in many different scenarios that include simple INSERTs to these tables as well as multi-row INSERTs as a result of, for example, INSERT … SELECT into a table that is being temporarily created. The advice is always to have primary keys present in your tables. Hopefully I have been able to highlight the true impact non-presence of primary keys can have.
Note that even with the fix for #3675 et al, I still got key collisions on smw_di_wikipage. I added an autoinc primary key to that table and fixed the problem. It now looks like:
CREATE TABLE `wiki1smw_di_wikipage` (
`s_id` int(8) unsigned NOT NULL,
`p_id` int(8) unsigned NOT NULL,
`o_id` int(8) unsigned NOT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
KEY `s_id` (`s_id`,`p_id`),
KEY `p_id` (`p_id`,`o_id`),
KEY `o_id` (`o_id`)
) ENGINE=InnoDB AUTO_INCREMENT=325805 DEFAULT CHARSET=binary |
I don't disagree with those cited sources (and those expert opinions) but the project has to strike a balance between affordable support while moving the software towards a better encapsulation and still provide room for improvements both in terms of maintainability and functionality. That being said:
After previously filing #3559 and using the back ported fix, I've come across some compelling reasons for using Indexes on InnoDB tables.
My previous comment still stands "... possible solution could be for those users that require a more strict key validation to add an extra index definition via a hook (see #3675 (comment)) and gets loaded as soon as SMW related tables are altered (during setup or update). ..."
created. The advice is always to have primary keys present in your tables. Hopefully I have been able to highlight the true impact non-presence of primary keys can have.
To make this happen, you would to have to force each SMW user to delete his/her data before updating the DB schema otherwise you run into a possible duplicate key error during the upgrade which is probably a no-go for bigger installations otherwise the entire installation is halted.
[0] (#3895) hook was introduced for a reason, namely to allow users to choose a stricter schema hereby opens that functionality to those that needed it while using the "default" (it has been this since its inception) schema avoids to deal with the "NULL issue" as outlined in #3559.
If you wanted to automated this sort of setup then I suggest someone creates an extension to provide this as a service but I don't want to make the schema stricter by default otherwise some users will complain, get errors, and then I have to (there isn't anyone else who answers technical questions!) explain why.
If you wanted to automated this sort of setup then I suggest someone creates an extension to provide this as a service...
Then that looks like something I'll have to do for PerconaDB. I need to understand the schema better, though, so is there a good place to start?
I _think_ I have a pretty good start on it, but I'm still running into problems like the one I mentioned above where I was attempting to use a primary key with s_id, p_id, and o_id combined.
Would it be better to create a visible autoinc key for most smw_* tables or only in the places where I'm getting collisions? If I have to start using an additional autoinc field, though, that looks like I'll have to submit a pull request so that the field could be added. (Note that the collision only occurred during a rebuildData run and, after the run completed, there were no collisions, which leads me to wonder if the collision was a sign of a bug.)
CREATE TABLE
wiki1smw_di_wikipage(
s_idint(8) unsigned NOT NULL,
p_idint(8) unsigned NOT NULL,
o_idint(8) unsigned NOT NULL,
idint(11) NOT NULL AUTO_INCREMENT,
Would it be better to create a visible autoinc key for most smw_* tables or only in the places where I'm getting collisions?
To be honest, I'm bit lost here as to why you want to create a new id field here. The smw_di* and smw_fpt* table only hold references in their *_id fields (p_id,o_id,s_id) to the only ID authority in SMW, namely the smw_object_ids table. Adding a field that is being auto incremented with no connection to the overall graph seems questionable and you will have a hard time to convenience me of its usefulness.
I would have imagined you use s_id + p_id_ + o_id a UNIQUE index constraint [0] to avoid any insert collisions but adding id as an auto increment field seems not right here.
(Note that the collision only occurred during a rebuildData run and, after the run completed, there were no collisions, which leads me to wonder if the collision was a sign of a bug.)
I wouldn't know because we don't use any primary key and most likely don't encounter such issue which is one reason I don't want to make this a default schema.
[0] https://dev.mysql.com/doc/refman/8.0/en/constraint-primary-key.html
CREATE TABLE
wiki1smw_di_wikipage(
s_idint(8) unsigned NOT NULL,
p_idint(8) unsigned NOT NULL,
o_idint(8) unsigned NOT NULL,
idint(11) NOT NULL AUTO_INCREMENT,I would have imagined you use
s_id+p_id_+o_ida UNIQUE index constraint [0] to avoid any insert collisions but addingidas an auto increment field seems not right here.
I agree, which is why I did that initially (well, I used PRIMARY not UNIQUE, but none of the columns were null during the collision), but, as I said, this resulted in collisions during rebuildData.php. I'll stick with id for now, but, if I have time, I'll try to figure out where the collisions come from.
@hexmode While we cannot just push primary keys (reasons given above) into every table, yet SMW 3.2 will provide #4732 and users can load that profile if they deemed it necessary.