Auto-increment IDs are a feature of most databases. They're an easy way to create an id which is unique for each row in a relational database, so easy, it's not often thought about.
However, auto-increment IDs have several undesirable features:
When Ghost was originally built, I already identified the need to be able to identify resources globally, and so all models in Ghost have both an auto-increment id and a uuid.
Far from solving the problem, this feels like the worst of both worlds:
The proposal for Ghost 1.0.0 is to move towards using a single ID for each resource, which is guaranteed to be globally unique. After some research into node modules which provide unique id's (there are many and this one was particularly promising), and also looking at what other platforms use, we settled on the idea of using ObjectID - the ID format that is used by MongoDB, via this npm module: https://github.com/williamkapke/bson-objectid.
Note, there are some small downsides as this could have a performance impact. One interesting potential performance issue is that innoDB uses auto-increment ids to optimise inserts. However, as ObjectID is still serial I don't believe this will be an issue.
Using ObjectId can also increase field and index sizes, I believe the impact for us will be minimal. At the moment, we do a poor job of adding indexes to Ghost, so even the fact that we intend to add an optimise them is going to be an improvement over the existing system. Further, we're reducing from having two potential IDs to need indexes on, to having one. There is plenty of content showing that the performance impact is minimal or a worthwhile tradeoff.
Finally, the one place where objectID might not be nice, is for post previews, we may need to revisit the implementation of these, ref #5097.
Already started with the implementation, but waiting for https://github.com/tgriesser/bookshelf/issues/1383 right now 👍
@ErisDS Shouldn't UUIDv4 be unique enough for this use case?
From reading the UUIDv4 Wikipedia page section about collisions:
https://en.wikipedia.org/wiki/Universally_unique_identifier#Random%5FUUID%5Fprobability%5Fof%5Fduplicates
Values are:
n = 68,719,476,736 (2^36), chance = 0.0000000000000004 (4×10^−16)
n = 2,199,023,255,552 (2^41), chance = 0.0000000000005 (5×10^−13)
n = 70,368,744,177,664 (2^46), chance = 0.0000000005 (5×10^−10)
As a global identifier this should be a pretty safe bet.
To limit collisions even more you can set the uuid field as the primary key or set a unique index and thus limit same-table collisions even though global collisions would still be possible in theory.
It's been a while since I've used MongoDB but by re-reading the documentation it would seem like ObjectIds are not much more random or unique than UUIDv4s.
objectid (12 bytes) = time (4 bytes) + machine identifier (3 bytes) + pid (2 bytes) + inc (3 bytes)time = 4 bytesmachine identifier = 3 bytespid = 2 bytesinc = 3 bytesI haven't done the collision chance calculation but my gut feeling tells me it can't guarantee a lower collision rate then UUIDv4.
The only real advantage I remember from working with ObjectIds is that since the time is the prefix you have a natural ordering by insert date.
I haven't done the collision chance calculation but my gut feeling tells me it can't guarantee a lower collision rate then UUIDv4.
The idea is to at least guarantee no collision within a single system, like Ghost(Pro) where we can ensure that pids are unique etc.
If we really need to improve this further, we could always move to our own version of ObjectID that uses a proper machine fingerprint, but it seems to me if the current implementation is good enough for MongoDB, it's probably good enough for us.
The only real advantage I remember from working with ObjectIds is that since the time is the prefix you have a natural ordering by insert date.
This is a pretty big deal, especially as we're going to be focusing our efforts on optimising for MySQL (and InnoDB). This is the main reason for not using uuid.
closed via #7495
Most helpful comment
The idea is to at least guarantee no collision within a single system, like Ghost(Pro) where we can ensure that pids are unique etc.
If we really need to improve this further, we could always move to our own version of ObjectID that uses a proper machine fingerprint, but it seems to me if the current implementation is good enough for MongoDB, it's probably good enough for us.
This is a pretty big deal, especially as we're going to be focusing our efforts on optimising for MySQL (and InnoDB). This is the main reason for not using uuid.