Please see latest spec in: https://github.com/prisma/prisma/issues/3405
In order to be able to optimise performance for the actual workload, it should be possible for the development team to specify the exact indexes they need.
All models are backed by a single database table with an id column containing the node id. This column is indexed as the primary key.
Uniqueness is guaranteed by creating a unique index on the column.
```graphql
type User @model {
  id: ID! @isUnique
  name: String @isUnique
  city: String
}
```
This model generates a table with two indexes:
All relations are implemented with an intermediate relation table:
```graphql
type User @model {
  id: ID! @isUnique
  name: String
  posts: [Post!]! @relation(name: "Posts")
}

type Post @model {
  id: ID! @isUnique
  title: String
  creator: User! @relation(name: "Posts")
}
```

This generates three database tables:
#### User
| id | name |
|---|-------|
|string|string|
- primaryKey(id)
#### Post
| id | title |
|---|-------|
|string|string|
- primaryKey(id)
#### Posts
| id | A | B |
|---|---|---|
|string|string|string|
- primaryKey(id)
- unique(id)
- index(A)
- index(B)
- unique(A,B)
### Simplifying indexes on relation tables
#### Remove the id column
There is no need for a dedicated id column on relation tables.
Instead, use the compound index (A, B) as the primary key to guarantee that a node pair can only be inserted into the relation once.
#### Two indexes are enough to provide fast traversal
To provide fast traversal in both directions we need two indexes: one on A and one on B.
As we already have a compound index with A as its prefix, we only need an additional index on B.
We end up with this structure:
| A | B |
|---|---|
|string|string|
- primaryKey(A,B)
- index(B)
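The simplified structure can be sketched directly in SQL. The following is an illustrative sketch only (SQLite via Python stands in for the real backing database; the `Posts`, `A`, `B` names follow the example above), showing that the compound primary key prevents duplicate pairs and that the two indexes serve traversal in both directions:

```python
import sqlite3

# Simplified relation table: no id column, compound primary key (A, B),
# plus a secondary index on B for reverse traversal.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (
    A TEXT NOT NULL,  -- User id
    B TEXT NOT NULL,  -- Post id
    PRIMARY KEY (A, B)
);
CREATE INDEX idx_posts_b ON Posts (B);
""")

# The primary key guarantees a pair can only be inserted once.
conn.execute("INSERT INTO Posts VALUES ('user1', 'post1')")
try:
    conn.execute("INSERT INTO Posts VALUES ('user1', 'post1')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

# Forward traversal (posts of a user) uses the primary-key prefix A;
# reverse traversal (creator of a post) uses the index on B.
forward = conn.execute(
    "EXPLAIN QUERY PLAN SELECT B FROM Posts WHERE A = 'user1'").fetchall()
reverse = conn.execute(
    "EXPLAIN QUERY PLAN SELECT A FROM Posts WHERE B = 'post1'").fetchall()
print(duplicate_allowed)  # False
print(forward[0][-1])     # index search, not a table scan
print(reverse[0][-1])     # search via idx_posts_b
```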
## Complexity Analysis
### Single Model
#### non unique field
{allUsers(filter:{age: 25})}
Currently: Tablescan
Best Case: index lookup | index(age)
{allUsers(filter:{age: 25}, orderBy: MONEY_ASC)}
Currently: Tablescan
Best Case: index lookup | index(age, money)
More Realistic: index lookup + filesort | index(age)
#### two fields
{allUsers(filter:{age: 25, name: "Karl"})}
Currently: Tablescan
Best Case: compound index | index(age, name)
More Realistic: Index intersection or single key index | index(age), index(name)
{allUsers(filter:{age: 25, name: "Karl"}, orderBy: MONEY_ASC)}
Currently: Tablescan
Best Case: compound index | index(age, name, money)
More Realistic: Index intersection or single key index + filesort | index(age), index(name)
#### range query
{allUsers(filter:{age_gt: 25})}
Currently: Tablescan
Best Case: index seek | index(age)
{allUsers(filter:{age_gt: 25}, orderBy: AGE_ASC)}
Currently: Tablescan
Best Case: index seek | index(age)
{allUsers(filter:{age_lt: 25}, orderBy: AGE_ASC)}
Currently: Tablescan
Best Case: index seek + filesort | index(age) (range filter and orderby not compatible)
{allUsers(filter:{age_gt: 25}, orderBy: MONEY_ASC)}
Currently: Tablescan
Best Case: index seek + filesort | index(age)
#### range query and equality
{allUsers(filter:{age_gt: 25, name: "Karl"})}
Currently: Tablescan
Best Case: compound index seek | index(name, age)
More Realistic: index seek + table lookup | index(name)
[Sorting same as range query]
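The equality-before-range ordering in the compound index can be checked empirically. A minimal sketch (SQLite via Python purely for illustration; the `users` table and data are hypothetical) showing that index(name, age) answers a name-equality plus age-range filter with a single compound index seek:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, money INTEGER);
-- Equality column first, range column second.
CREATE INDEX idx_name_age ON users (name, age);
""")
conn.executemany("INSERT INTO users (name, age, money) VALUES (?, ?, ?)",
                 [("Karl", 20, 100), ("Karl", 30, 200), ("Anna", 40, 300)])

# Equality on the leading column plus a range on the second column
# can be answered with a single compound index seek.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE name = 'Karl' AND age > 25"
).fetchall()
rows = conn.execute(
    "SELECT age FROM users WHERE name = 'Karl' AND age > 25").fetchall()
print(plan[0][-1])  # search via idx_name_age
print(rows)         # [(30,)]
```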
#### range query on two fields
{allUsers(filter:{age_gt: 25, money_gt: 10000})}
Currently: Tablescan
Best Case: compound index seek | index(age, money)
More Realistic: index seek + table lookup | index(money)
{allUsers(filter:{age_gt: 25, money_gt: 10000}, orderBy: MONEY_ASC)}
Currently: Tablescan
Best Case: compound index seek | index(age, money)
More Realistic: index seek + table lookup + filesort | index(money)
{allUsers(filter:{age_gt: 25, money_gt: 10000}, orderBy: FAMILYMEMBERS_ASC)}
Currently: Tablescan
Best Case: compound index seek | index(age, money, familyMembers)
More Realistic: index seek + table lookup + filesort | index(money)
### Multiple Models
#### simple relation
{User(id:""){posts{title}}}
Currently: 2 x index lookup | index(A), index(id)
Best Case: 2 x index lookup | index(A), index(id)
{User(id:""){posts(orderBy: LIKES_ASC){title}}}
Currently: 2 x index lookup + filesort | index(A), index(id)
Best Case: 2 x index lookup | mirror(posts.likes), index(A, likes), index(id)
#### relation with single equality
{User(id:""){posts(filter:{title: "nice title"}){title}}}
Currently: index lookup + overfetching index lookup | index(A), index(id)
Best Case: 2 x index lookup | mirror(posts.title), index(A, title), index(id)
{User(id:""){posts(filter:{title: "nice title"}, orderBy: LIKES_ASC){title}}}
Currently: index lookup + overfetching index lookup + filesort | index(A), index(id)
Best Case: 2 x index lookup | mirror(posts.likes), mirror(posts.title), index(A, title, likes), index(id)
More Realistic: index lookup + overfetching index lookup | mirror(posts.likes), index(A, likes), index(id)
#### relation with range query
{User(id:""){posts(filter:{likes_gt: 3}){title}}}
Currently: index lookup + overfetching index lookup | index(A), index(id)
Best Case: index seek + index lookup | mirror(posts.likes), index(A, likes), index(id)
{User(id:""){posts(filter:{likes_gt: 3}, orderBy: AGE_ASC){title}}}
Currently: index lookup + overfetching index lookup + filesort | index(A), index(id)
Best Case: index lookup + overfetching index lookup | mirror(posts.age), index(A, age), index(id)
{User(id:""){posts(filter:{likes_gt: 3}, orderBy: LIKES_ASC){title}}}
Currently: index lookup + overfetching index lookup + filesort | index(A), index(id)
Best Case: index lookup + index seek | mirror(posts.likes), index(A, likes), index(id)
## Mirror fields to relation table
Many of the Best Case and Realistic Case scenarios above depend on one or more columns from a model table being mirrored in the relation table. Mirroring a column into the relation table enables us to create a compound index across relations. In the relation example above, we might introduce a mirrored field like this:
```graphql
type User @model {
  id: ID! @isUnique
  name: String
  posts: [Post!]! @relation(name: "Posts")
}

type Post @model {
  id: ID! @isUnique
  title: String # mirror this field into the Posts relation table
  creator: User! @relation(name: "Posts")
}
```

This generates three database tables:
#### User
| id | name |
|---|---|
|string|string|
#### Post
| id | title |
|---|---|
|string|string|
#### Posts
| id | A | B | B_title |
|---|---|---|---|
|string|string|string|string|
- index(A, B_title)
The new compound index allows us to speed up queries like this:
{allUsers(filter:{posts_any: {title: "A Title"}}) {...} }
Mirrored fields must be kept in sync by the Graphcool backend.
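One way a backend could keep such a mirrored column in sync (a sketch only; this is not how Graphcool is actually specified to implement it) is a database-level trigger. Illustrated with SQLite via Python, reusing the hypothetical `Posts`/`B_title` names from the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Post  (id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE Posts (A TEXT, B TEXT, B_title TEXT, PRIMARY KEY (A, B));
CREATE INDEX idx_posts_a_title ON Posts (A, B_title);

-- Keep the mirrored B_title column in sync when a post is renamed.
CREATE TRIGGER sync_b_title AFTER UPDATE OF title ON Post
BEGIN
    UPDATE Posts SET B_title = NEW.title WHERE B = NEW.id;
END;
""")
conn.execute("INSERT INTO Post VALUES ('post1', 'Old Title')")
conn.execute("INSERT INTO Posts VALUES ('user1', 'post1', 'Old Title')")

# Renaming the post automatically updates the mirrored copy.
conn.execute("UPDATE Post SET title = 'A Title' WHERE id = 'post1'")
mirrored = conn.execute(
    "SELECT B_title FROM Posts WHERE A = 'user1' AND B = 'post1'").fetchone()[0]
print(mirrored)  # A Title
```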
`types.graphql`:
```graphql
type User @model {
  id: ID! @isUnique
  name: String @index
}

# or specify sort order
type User @model {
  id: ID! @isUnique
  name: String @index(sort: ASC)
}

type User @model {
  id: ID! @isUnique
  name: String @index(groups: [{name: "combined_name_and_age", position: 0}])
  age: String @index(groups: [{name: "combined_name_and_age", position: 1}])
}
```
```graphql
type User @model {
  id: ID! @isUnique
  name: String @index(groups: [{relation: "Posts"}])
  posts: [Post!]! @relation(name: "Posts")
}

type Post @model {
  id: ID! @isUnique
  title: String
  creator: User! @relation(name: "Posts")
}
```

Improves performance of queries like:

```graphql
{allPosts(filter:{creator:{name: "John"}})}
```
```graphql
type User @model {
  id: ID! @isUnique
  name: String @index(groups: [{name: "combined_name_and_title", position: 0, relation: "Posts"}])
  posts: [Post!]! @relation(name: "Posts")
}

type Post @model {
  id: ID! @isUnique
  title: String @index(groups: [{name: "combined_name_and_title", position: 1, relation: "Posts"}])
  creator: User! @relation(name: "Posts")
}
```

Improves performance of queries like:

```graphql
{allPosts(filter:{title: "Great Title", creator:{name: "John"}})}
```
To specify a unique constraint covering multiple fields, simply use the `@unique` directive instead of `@index`:

```graphql
type User @model {
  id: ID! @isUnique
  name: String @unique(groups: [{name: "combined_name_and_age", position: 0}])
  age: String @unique(groups: [{name: "combined_name_and_age", position: 1}])
}
```
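At the database level, a multi-field `@unique` group like the one above corresponds to a plain compound unique index. A minimal sketch (SQLite via Python purely for illustration; the table and index names mirror the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (id TEXT PRIMARY KEY, name TEXT, age TEXT);
-- The multi-field @unique group translates to a compound unique index:
CREATE UNIQUE INDEX combined_name_and_age ON User (name, age);
""")

conn.execute("INSERT INTO User VALUES ('u1', 'Karl', '25')")
conn.execute("INSERT INTO User VALUES ('u2', 'Karl', '30')")  # same name, different age: allowed
try:
    conn.execute("INSERT INTO User VALUES ('u3', 'Karl', '25')")  # duplicate (name, age) pair
    violated = False
except sqlite3.IntegrityError:
    violated = True
print(violated)  # True -- the compound constraint rejected the duplicate
```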
@sorenbs An excellent and thorough analysis!
Before I go any further, you might have a typo in the definition of the Post table:

```graphql
type Post @model {
  id: ID! @isUnique
  title: String
  creator: Post! @relation(name: "Posts") # <-- should this read 'creator: User! @relation(name: "Posts")'?
}
```
I think the syntax also needs to cover simple non-join tables, as follows:

```graphql
type Country @model {
  id: ID!
  countryName: String! @isUnique
  # Regions are 'States' in the US, but 'Provinces' in other countries
  regions: [Region!] @relation(name: "CountryRegions")
}

type Region @model {
  id: ID!
  country: Country! @isUnique(groups: [{name: "UK_CountryRegions", position: 0}])
  stateName: String! @isUnique(groups: [{name: "UK_CountryRegions", position: 1}])
}
# UK_ here means 'Unique Key'
```
Also, I do not think that generated M2M join tables are a good idea. One thing I learned the hard way over the years, even after having been given this piece of advice, is that if an M2M join table is created solely for the purpose of joining, without any additional info in it, then the database analysis is probably missing some important feature and needs to be looked at again. Some M2M join tables also join three or four tables.
In principle, an M2M join table needs to include some other qualifying columns, above and beyond the standard 'CreatedAt', 'CreatedByUser', etc. For example:
Real-estate agents (who often work in pairs, and can work with agents from another agency under a deal arrangement) are mapped to a specific property, sometimes as the 'Lead' agent, sometimes as the 'Assistant'. So the M2M join table might look like:
```graphql
type MapAgent2Property @model {
  id: ID! @isUnique
  agentID: ID!
  propertyID: ID!
  agentRole: RoleEnum! # 'Primary', 'Assistant', 'CoAgent' etc.
  dealAgreementID: ID  # The details of who gets what % of the sale
}
```
So what I am saying is that generated M2M tables are not really feasible. The DB analyst should be able to define the table properly, and define the necessary (multi-column) Unique keys to ensure consistency.
So, your proposed syntax is good, but does not need to be as complex as you are making it. I really like the 'sort_order' option as well.
Good job on the thorough analysis!
BTW, what MarkDown tag did you use to get the cool Markup on your types?
` ` ` graphql
awesome, thanks! @kbrandwijk
Thanks @Trellian - some good thoughts!
First - I have corrected the typo.
About the M2M join table: A core design goal of Graphcool is to enable pain-free schema migrations preserving existing data.
One of the design decisions this has led to is that we always use an M2M-style join table when creating relationships. This way, transforming a 1-M relationship into an M2M is literally a matter of changing the type of a single field to a list. As we don't have to move data around, we can perform this operation quickly, without downtime and without losing data. This also allows us to introduce Kim's proposal https://github.com/graphcool/framework/issues/746 in a way that does not require changing the underlying data structure.
We should carefully consider if this level of flexibility is really required and if the tradeoff in performance is worth it, so thank you for bringing it up.
An alternative could be that relations by default never use a relation table, and M2M relations are not supported. If a relation table is required (either to add extra fields on the relation or to create an M2M relation), then the @edge type proposed by Kim is required.
Do you think that would be a better approach? My main concern is that this introduces extra complexity that makes it more difficult to get started with Graphcool.
@sorenbs I like the @edge syntax. It's effectively defining a join table.
But I don't want to get off topic here. We're talking about having multi-column unique indexes/keys. Once you have those in place, a whole new world opens up. Without them, one can't produce a real-world database, except by using the BeforeHook approach, which is effectively throwing away support for something the backend RDBMS is exceptionally good at. Field constraints, table constraints, and relation constraints all belong in the RDBMS. That's what the RDBMS is there for, after all.
Can I suggest that we open a new topic for discussing the impact of Join tables? Please also note, that a Join table is just like any other, no need to abstract it out, really. If there is extra information on the 'edge' with a join table, then that info is useful. Even for the simplest case of just order-by.
@sorenbs I don't see https://github.com/graphcool/framework/issues/746 marked as 1.0
I agree with @Trellian that multi-column uniqueness constraints are far more important than having a join table so that a one-to-many (1-M) relationship can easily be converted into a many-to-many (M2M) relationship. I occasionally need many-to-many relationships, so I don't want to lose that ability, and I realize they would still need a join table. But IMO it will be pretty rare to want to transition between those join situations, and I'd be fine with that being manual and cumbersome (create new tables, migrate data, rename) if it meant I got the multi-column constraints (and indexes) that I need all the time.
The proposal mentions
To specify a unique constraint covering multiple fields, simply use the @unique directive instead of @index:
```graphql
type User @model {
  id: ID! @isUnique
  name: String @unique(groups: [{name: "combined_name_and_age", position: 0}])
  age: String @unique(groups: [{name: "combined_name_and_age", position: 1}])
}
```
1. Would something like this work to make a post's title unique per user?

```graphql
type User @model {
  id: ID! @unique(groups: [{name: "post_title_per_user", position: 0}])
  name: String
  age: String
}

type Post @model {
  id: ID! @isUnique
  title: String! @unique(groups: [{name: "post_title_per_user", position: 1}])
}
```

2. What is the difference between `@isUnique` and `@unique`?

@marktani, in answer to #1: the necessary syntax is simpler than in #1 above. I think it should look like this:
```graphql
type User @model {
  id: ID! @Unique
  name: String
  age: String
  Posts: [Post!] @relation(name: "user_posts")
}

type Post @model {
  id: ID! @Unique
  author: User! @relation(name: "user_posts") @unique(groups: [{name: "post_title_per_user", position: 0}])
  title: String! @unique(groups: [{name: "post_title_per_user", position: 1}])
}
```
Or maybe this would be simpler and easier to implement:

```graphql
type Post @model {
  id: ID! @Unique
  author: User! @relation(name: "user_posts")
  title: String!
  @unique(name: "UK_user_posts", columns: ["author", "title"])
  # or even simpler, no need for a unique key name; most RDBMSs don't need them, or they autogenerate one:
  # @unique(columns: ["author", "title"])
}
```
Unique keys are Table constraints, and are never allowed to span tables directly in RDBMS theory.
For #2, there is no difference between `@isUnique` and `@Unique`.

How will multi-field unique constraints be reflected in `where` input objects?
Example schema:
type Post {
id: ID! @unique
title: String @unique(groups: [{name: "combined_title_and_slug", position: 0}])
slug: String @unique(group: [{name: "combined_title_and_slug", position: 1}])
}
Query by `title`:

```graphql
query {
  post(where: {
    title: "My biggest adventure"
  }) {
    id
    title
    published
  }
}
```
Query by `slug`:

```graphql
query {
  post(where: {
    slug: "my-biggest-adventure"
  }) {
    id
    title
    published
  }
}
```
Is there a way to fetch a post by a unique `title` + `slug` combination? I thought of this:

```graphql
query {
  post(where: {
    combined_title_and_slug: {
      slug: "my-biggest-adventure"
      title: "My biggest adventure"
    }
  }) {
    id
    title
    published
  }
}
```
I'd just like to mention that I'm really hurting for a compound primary key at the moment, to provide an upsert on these fields.
Yes, echoing the need for a multi-column index to enforce uniqueness.
Right now we are manually enforcing the combined uniqueness by performing a "where" at all places a creation can take place.
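The check-then-insert workaround described above has a race window that only a database-level constraint can close. A sketch of the failure mode (a single SQLite connection stands in for two concurrent requests; table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (name TEXT, city TEXT)")

def exists(name, city):
    # The application-level "where" check performed before every create.
    return conn.execute(
        "SELECT 1 FROM User WHERE name = ? AND city = ?", (name, city)
    ).fetchone() is not None

# Two concurrent requests both run the check before either inserts:
assert not exists("Karl", "Berlin")  # request A checks: not found
assert not exists("Karl", "Berlin")  # request B checks: not found
conn.execute("INSERT INTO User VALUES ('Karl', 'Berlin')")  # A inserts
conn.execute("INSERT INTO User VALUES ('Karl', 'Berlin')")  # B inserts too
count = conn.execute("SELECT COUNT(*) FROM User").fetchone()[0]  # duplicates!

# A database-level unique index closes the gap:
conn.execute("DELETE FROM User")
conn.execute("CREATE UNIQUE INDEX uk_name_city ON User (name, city)")
conn.execute("INSERT INTO User VALUES ('Karl', 'Berlin')")
try:
    conn.execute("INSERT INTO User VALUES ('Karl', 'Berlin')")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True
print(count, blocked)  # 2 True
```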
What about a @fulltext() index?
+1 for multi-column index and _unique_ indexes!
Suggestion looks good!
Is anything happening here? How can we help?
About multi-column unique indexes, which I'm in dire need of at the moment: why have two decorators, @isUnique and @unique? Could that possibly be done with a single one?
Generally, I agree that foreign keys, multi-column indexes, and constraints are super important for a robust, production-ready RDBMS. I don't understand the design decision to default to join tables; it seems to optimise for the wrong thing (80-20 rule). Relational DBs are all about relationships, native on those entities, and FKs are important. If you need an entire new table every time you need a simple FK, that is huge overhead: a big performance impact from the extra joins in SQL, and generally hard to work with engineering-wise, especially when coming into a project with a big graph/schema.
Adding to @bjm88: I really don't understand how this feature could not be in the list of top priorities. Running a production DB without keys and constraints, under any kind of load more than nothing, is absurd. And without compound indexes, in a lot of cases you just can't guarantee data consistency.
@terion-name That's the argument I have been trying to make for months, and have been shot down repeatedly. It is such a fundamental feature that I seriously wonder if the guys behind graphcool have any real-world experience at all. I'm not trying to be nasty here. It's just an observation. I wouldn't even consider releasing a product like this without those basic features. It will be dead in the water on the first even slightly non-trivial database. You wouldn't even be able to get the basic old-favourite 'video-store' DB to work properly. @bjm88 I agree totally with you too. The solution as it stands is totally unworkable in the real world.
I started a medium-sized project with Prisma. After just 3 days of inputting data into it, I started to notice that it choked badly. Why? No indexes. And there was a weird Apollo client bug which rendered everything inconsistent. I thus gave up and simply moved to another stack instead.
Not that I'm too impatient, but I have noticed in the past that features you ask for don't usually get implemented as eagerly as you'd want them to be (in the days of Graphcool and now), even if they seem important. Sure, this is open source, but I don't have time to learn Scala and I don't see the need to. And there's the internal latency with Prisma, which seems to add an extra 100ms.
Well, all those things adding-up together: I had to explore my options.
@marktani please, this feature is absolutely crucial for production, a first-tier priority. There is no sense in adding connectors or other features if Prisma just can't utilise DB performance.
Switching to other solutions because of the lack of this feature.
Hi Prisma team, I think you are seeing that building an ORM it is crucial to get relationships and performance right working with databases. This project has such potential, I really hope you consider doing organic foreign keys on tables themselves and taking index setup and management seriously. Have you considered using GitCoin to have people be able to pay for features to help support them. I suspect this is actually not hard to implement, many other ORMs do it, but I simply cannot use this library without these basic design principles for working with a db, would be happy to support with $ if you enable it.
Adding in my vote for this feature requirement. Composite foreign keys, indexes and the implementation of proper RDBMS architecture are a must for this project.
For those that run into performance problems using Postgres, check out Hasura as an option.
All things considered, i like prisma better - but when performance is important, this unfortunately is on the slower side atm.
Hey all,
thanks for sharing your concerns with us! :pray:
First of all, I would like to say that performance is an important topic for us, and internally we are discussing it all the time. We are committed to improving performance as a continuous effort.
However, indexes are just one facet of performance. During the last weeks we created an extensive benchmarking suite, which we used to drive our decisions on which areas must be improved first performance-wise. In our tests, just adding indexes did not help performance as much as we would have liked. Instead, we identified our SQL queries and application code as the biggest problems right now.
Therefore we spent a lot of time during the last weeks refactoring our SQL queries to be more performant. In addition, we profiled the Prisma application code a lot and were able to implement significant improvements in CPU and memory consumption. The first big batch of those improvements is landing in our 1.14 release, which went into the beta stage this week.
If you are eager to give it a spin, just use the tag/version 1.14-beta. We have some more improvements in the pipeline for the subsequent releases as well. When we are through with all of the changes we have planned, we will turn to indexes again, and I am confident that they will then bring additional improvements.
And there is one last thing around indexes: if you think that indexes are absolutely crucial, you always have the possibility to connect to your database and create them manually. I agree that this is not the best experience yet, but we really want to get indexes right before we ship anything half-baked to our users.
PS: In case you hit any performance problems with 1.14 or later, we would really like to hear about them. Just contact me on our public Slack (@marcus) so we can learn about your concrete problems and come up with solutions in future releases. And I can probably give you a hint or two on how your GraphQL query could be optimised.
@mavilein thank you for the detailed update. Can you speak to the compound primary key aspect of the discussion, please?
I.e. the ability to apply @unique to a combination of fields for a relation.
This aspect is the more limiting factor in terms of usability.
@mavilein performant queries are a good thing, but saying that indexes are not effective is not correct. Right now we simply can't create compound indexes (like unique(slug, lang)) or fulltext indexes. We just can't optimise any selects beyond selections by ID (price-range selects, date-period selects, and many others that are used in real-life applications). What is the problem? Why have indexes been an open problem for almost a year? They are a far easier thing than optimising the query builder, and they are absolutely crucial and should be one of the first features. And manual indexes just don't fit the current migration model (which also needs to be changed, but that is a separate topic); in fact, they don't fit any migration model. It's just not an option for a development cycle.
@mavilein, if I create the unique constraints manually as you suggest, what are the chances that I will have problems later with e.g. data migrations? Thanks
@sorenbs
We should carefully consider if this level of flexibility is really required and if the tradeoff in performance is worth it, so thank you for bringing it up.
An alternative could be that relations by default never use a relation table, and M2M relations are not supported. If a relation table is required (either to add extra fields on the relation or to create a M2M relation), then the @edge type proposed by Kim is required.
Do you think that would be a better approach? My main concern is that this introduces extra complexity that makes it more difficult to get started with Graphcool.
Did you guys ever have a meeting(s) about this and pick a direction? These are very good questions that will dictate a ton of code you guys write, as well as how successful Prisma is.
fwiw, the "M2M everything!" approach you're currently drifting along with was the reason we jumped ship.
My main concern is that this introduces extra complexity that makes it more difficult to get started with Graphcool.
The complexity would be worth it. (I'd trade the complexity of developing/supporting 50 connectors to just supporting Postgres.)
@corysimmons
Did you guys ever have a meeting(s) about this and pick a direction? These are very good questions that will dictate a ton of code you guys write, as well as how successful Prisma is.
fwiw, the "M2M everything!" approach you're currently drifting along with was the reason we jumped ship.
I 100% agree with you. Guys, IMHO, use @edge if you have to, but whatever you do, if you only allow M2M relations, then every single experienced relational DB user out there is going to pass Prisma by.
Certainly I would never consider using Prisma, even in a demo project, without it.
JM2C.
Chiming in about the topic of supporting only M2M relations - once we support introspection for MySQL and unify "passive" and "active" connectors, more relation modelling approaches will be supported.
For Postgres, we already support different relation modelling approaches, here is an overview. @Trellian, did you check that out already? It would be great to hear your perspective on this.
Agree. The original topic of this issue is indexes, and since Prisma and the DB run on our own server, we can always manage indexes ourselves (Graphcool could not do that). But the DB structure and table structure created by `prisma deploy` cannot be manually handled, and the current structure generated by Prisma is far from optimised.
For a one-to-many relation it uses 3 tables (A - AB - B) instead of 2; for an M2M relation with extra fields, 5 tables (A - A(AB) - AB - B(AB) - B) instead of 3. Also see #3060. That harms query performance badly.
@marktani Does introspection for PostgreSQL only work one way, from DB to Prisma schema, and not from Prisma schema to DB?
I tried to use the introspected schema (which has lots of @pgRelation directives) to deploy a new DB, but the deployed DB structure is not the same as the original one.
That's an interesting question, I am not entirely sure! Let's create a separate discussion about this in a new issue. Can you share your initial schema and the introspected datamodel there? This will give me a better understanding of what's happening.
I have just published a spec that touches on this topic. It contains a new syntax proposal for (multi field) indexes. We would love to hear your feedback on this one.
Is anyone able to share an update on this topic? The lack of composite key/index support forced me to have to migrate over to hasura, but I'd love to have the option to come back. I really think prisma has a great deal of potential if this issue in particular is executed correctly, in a way that provides a high degree of flexibility for advanced schemas. In my opinion, this issue should be a top priority or many potential users will simply not take the library into serious consideration for use in production environments.
@willm78 In Russia, many people have already abandoned Prisma over this issue.
@willm78 can we just add the indexes we wanted to our database directly bypassing prisma?
@beeplin : Yes you can do that.
One of the most valuable and important features for production-ready apps has been on hold for a year. I have no words.
@willm78 - would the proposal in this spec satisfy your requirements? https://github.com/prisma/prisma/issues/3405
@sorenbs I think the proposal looks pretty good. Sorry, I completely missed that the last time I looked through here. Is an `id` field still going to be required on every type, or are there plans to change that? #3575
+1
Waiting for this feature to be implemented ...
Prisma2 looks promising
https://github.com/prisma/prisma2/blob/master/docs/data-modeling.md#indexes
https://isprisma2ready.com/
Guess we will have to wait a few more months for a more stable release?
@melounek Are you suggesting that this feature is confirmed for rollout in Prisma 2?
I'm also waiting for this to be added for Prisma 1. We cannot upgrade to Prisma 2 yet because it doesn't support subscriptions, and we use that heavily.