V8-archive: Add UUID support for Primary Keys

Created on 26 Sep 2018  Β·  25Comments  Β·  Source: directus/v8-archive

The uuid type support was removed was when we change all system types, because this type requires a fixed data: type=string + length=36.

In other word it needs to be a special type that doesn't require any user data but its name to work.

enhancement

Most helpful comment

From my point of view the best option is to use standard UUID. And don't try to resolve 'parallel' needs by modifyng it.

For standard queries in a relational database like the one we work with directus, we know beforehand in wich table the related register must be searched.

For "no tied" fk ids we can:

  • look up at every table
  • maintain a cache lookup hash table with UUID -> collection
  • in M2X like fields: have a pairing field (name it directus_related_table) alongside the fk UUID field, identifiying the referenced table/collection. In fact this is the way I achieved M2X relationships in directus 6.

At most times a "free table" fk field will allways be tied at a limited (handful) of possible related tables really (in the "normal" case we need to perfrom a join inside the database). An other "out of scope" case is when we can relate to foreing external data. And in "normal" M2X cases it's good to known the possible target tables wich can be addressed from the related_table field metadata: not just when querying but when binding to available UUIDs in the UI).

In conclusion, I think it's not necessary to alter UUID format to economize query costs and it can create unwanted incompaibilities.

All 25 comments

would ramsey UUID work?

This spec looks good to me. @WellingGuzman how about you?

@rakesh-verma-16 we are using the UUID library by Ramsey to generate the UUIDs.

What we need here is to do two things:

  1. When a field is created as uuid type, it should enforce the length to be 36 characters (including 4 hyphens).
  2. Generate an UUID when not value passed on insert or update, or throw an error if it's configured to not allow user defined UUID.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

How's the situation with this one? I'd need M2X for a project of mine, and I think implementing this (with an interface) is required before the M2X could be worked on? I could try digging into the codebase and this issue if it's not under work already, should be a decent introductory task

@Jonesus I think we should have "native support" for UUID primary keys in the api: meaning that it can be a replacement for auto-increment integers. When that's done, we indeed have to create an interface that will tie multiple collections that use this UUID primary key together.

UUID support would be great! Any news since v2.0.18 is out? Can I help you on this?

No updates yet, but we're eager to get this added in as it paves the way for M2X support.

I think the first step is to decide on the UUID format. We should be looking at standard formats, and custom ones... and could have an interface option to change this, or multiple interfaces (one for each).

Standard

123e4567-e89b-12d3-a456-426655440000
This is the standard, and has the added benefit of the built in vendor functions like UUID().

Custom

xxx-123e4567-e89b-12d3-a456-426655440000
xxxe4567-e89b-12d3-a456-426655440000
This would be a Directus specific implementation that allows faster collection lookups by prefixing an immutable collection key. We could either add this to the beginning or use some of the first few UUID characters for this. The collection key would be stored in directus_collections.key so we can change the collection name in the future without breaking the PKs.

The second implementation should be more performant in retrieving items in a relational setup. By being able to know what collection a record is from, we don't have to go searching through every single collection in the whole database.

We could generate the first 8 random chars of the UUID and use that for the directus_collections.key that way it's still a standard UUID format... though we'd be losing a little of the uniqueness by re-using the first X chars. Or would you prefer prepending the key and keeping the full proper UUID?

https://en.wikipedia.org/wiki/Universally_unique_identifier

I don't think it has to be a full GUID, as long as it's unique within the Directus installation.

though we'd be losing a little of the uniqueness by re-using the first X chars

ΒΏWhat? All of these are still unique:

rijk-2
rijk-3

They're not less or more unique than any other unique string πŸ™‚ They look less random, sure, but that doesn't hurt anything. If anything, having the ID prefixed means that we have to do less randomization as the ID now only has to be unique within the table, instead of globally, as they're prefixed.

Right β€” I meant if we "set" the first X characters of our version of UUID then it's less _globally_ unique. Obviously within a single collection/project this doesn't matter. But, if we're looking to have this PK type be a bit more "future proof" then we could keep this as a UUID... which would leave the door open for inter-project relationships. Imagine being able to create data relationships between APIs!

Although then we'd also want a prefix for the project/API to more easily reference that. Tricky.

So, if we'd rather simplify all of this we should have a few different interfaces/types:

  • UUID β€” Standard, no frills
  • Directus Key β€” Concat collection key and primary key, eg: e89b-1
  • Directus UUID β€” TBD, allows faster lookup for Project, Collection, PK (globally unique)

So, if we'd rather simplify all of this we should have a few different interfaces/types:

Let's work this out into 1 ID type that works instead of having 3 slightly different implementations πŸ™‚

That's fine too β€” but I still think we'll end up with two. I think it's important to offer an unadulterated UUID type since that's a standard. But since that isn't optimized for our Directus use-cases we can have a Directus Key that includes project/collection data.

Collection Key is easy to prepend... do you like the idea of a globally unique Directus Key that allows for inter-project relationships and has some sort of Project/API key? If so, any idea how that we might implement that?

Imagine being able to create data relationships between APIs!

Please take a look in UUIDv5, which is namespaced. https://en.wikipedia.org/wiki/Universally_unique_identifier#Versions_3_and_5_(namespace_name-based)

As long as you declare a namespace with a project's path, you're all fine.

Hey @mvhirsch β€” that's correct! But...

However, neither the namespace nor name can be determined from the UUID

What we're trying to achieve here is a way that we can "lookup" a project (namespace) and Collection (table) from the UUID. Since these are hashed we can't actually pull that info for the lookup. However, there's no easy way to add the project in without a full URL in the key (which won't work)... so we'll likely have a lookup KVP in the M2XP (? any collection/project) interface options.

From my point of view the best option is to use standard UUID. And don't try to resolve 'parallel' needs by modifyng it.

For standard queries in a relational database like the one we work with directus, we know beforehand in wich table the related register must be searched.

For "no tied" fk ids we can:

  • look up at every table
  • maintain a cache lookup hash table with UUID -> collection
  • in M2X like fields: have a pairing field (name it directus_related_table) alongside the fk UUID field, identifiying the referenced table/collection. In fact this is the way I achieved M2X relationships in directus 6.

At most times a "free table" fk field will allways be tied at a limited (handful) of possible related tables really (in the "normal" case we need to perfrom a join inside the database). An other "out of scope" case is when we can relate to foreing external data. And in "normal" M2X cases it's good to known the possible target tables wich can be addressed from the related_table field metadata: not just when querying but when binding to available UUIDs in the UI).

In conclusion, I think it's not necessary to alter UUID format to economize query costs and it can create unwanted incompaibilities.

Without a doubt we will include a vanilla UUID PK option. But we may also include a Directus specific PK that helps support our M2X relationship out-of-the-box.

In conclusion, I think it's not necessary to alter UUID format to economize query costs and it can create unwanted incompaibilities.

We should test this before making any conclusions..

I'm absolutely on board with the idea of _keep it simple, stupid_, but querying _every_ table has to be way slower than querying just the right one right? If you have 5 tables with 200k items each, the difference should be far from negligible.

To achieve better clarity/visibility, we are now tracking feature requests within the Feature Request project board.

This issue being closed does not mean it's not being considered.

So after discussing internally, this is what we've come up with (we'd love feedback!):

We love the idea of a new directus_uuid lookup table. But instead of storing _every_ record that uses a UUID in it (which could be a _lot_ if we move all system tables to use this), we can be more selective. As of now, we already know the collection for any given PK... the _only_ time we need to use the lookup table is for relationships that can save items from multiple tables (which we've been calling M2X or M2MM for many-to-any and many-to-many-many respectively).

Things are a bit easier since we want to support this "multiple collection" for M2O and O2M... and since Directus doesn't actually have a M2M relationship (it actually uses a O2M+M2O). So we're essentially adding two new relationship types:

One-to-Any (O2X)
Any-to-One (X2O)
_These will also give us Many-to-Any (M2X), etc_

The important takeaway here is that we only need to store records in directus_uuid when using a relationship with the new "X" (multiple collections). We also want to make sure that the API handles saving all lookups automatically (not the App). Sooo...

  1. API receives data (directly or from App)
  2. Looks for any relationships with an "X" (eg: O2X)
  3. Saves new O2X items in the related collection (like normal)
  4. Saves the lookup for that related item in directus_uuid

I know this is a bit of a brain bender, but does all this make sense @hemratna ?

I have a question - will this vanilla UUID be unique in scope of the specific project? If so, will it be unique between different environments (but for the same project)? For example, I want to export some data from CI environment to Staging, but it's possible that Staging was already updated with some new data. And it's possible that UUIDs of this new data items will be the same as UUIDs of those items that I'm trying to import. In this case accidentally the Staging data can be overridden by CI data. It especially related to those fields that keep keys from other tables. I think one of the ways to prevent that will be having a special initial seed that can be configured manually as one of the project settings and will guarantee that all UUIDs will be unique and not conflicting with UUIDs that were generated by using other initial seeds...
Is it possible to implement?

Hey @yarick2000 β€” I don't think this is an issue at all. Our UUIDs follow the official spec, so the probability of a collision (duplicate) is so small that it can normally be ignored. This is also true across project instances... thus the UUID (Universally Unique).

Ok, thank you

Was this page helpful?
0 / 5 - 0 ratings

Related issues

vuhrmeister picture vuhrmeister  Β·  3Comments

24js picture 24js  Β·  3Comments

benhaynes picture benhaynes  Β·  4Comments

rijkvanzanten picture rijkvanzanten  Β·  3Comments

chintohere picture chintohere  Β·  3Comments