Efcore: Cosmos: Optimize `Find` by querying by 'id'

Created on 21 Aug 2019  路  14Comments  路  Source: dotnet/efcore

Use ReadItem
Include the partition key if present

area-cosmos area-perf area-query closed-fixed type-enhancement

Most helpful comment

It's up to the user to provide the correct value generator, we'd just use whatever is configured on id. And if it has no value generator, then we'll fall back to the slow query

Ahh yes, using PropertyBuilder.HasValueGenerator Method.

All 14 comments

Depends on https://github.com/aspnet/EntityFrameworkCore/issues/7391 to make the general case work, but before that is done that we can at least handle the case when the PK is id and the partition key if present.

@1iveowl The other option I mentioned in the comment above is to just handle the simple case where the PK consists of id and the partition key, so the user would be calling Find(partitionKey, id) or Find(id, partitionKey) depending on how they鈥檝e configured it.

The next step up from this would be handling PKs that include the partition key, but don鈥檛 include the id directly. In those cases we usually can calculated the id value using the value generator configured on it, which by default is IdValueGenerator

The above can be accomplished by detecting the query pattern generated by EntityFinder and instead of translating the query based on SelectExpression you'd need to create something like ReadItemExpression. Thinking about this it's actually a much more difficult change than I originally assumed.

@smitpatel for thoughts or advice

@1iveowl If the above doesn't fit in the time you can dedicate for this creating the extension you've proposed - DbSet<>. FindWithPartitionKeyAsync(partitionKey, primaryKey) would also be an acceptable temporary solution. For the implementation you can create a class derived from EntityFinder that would handle this.

ShapedQueryExpression does allow having an expression different than SelectExpression. It may require some conditional code in ShapedQueryCompilingExpressionVisitor. Or we can factor it differently to call shaper specific part outside of query pipeline. It would be a difficult change. Shaper compiler on cosmos is more complicated than core.

@AndriySvyryd, @smitpatel Let's lock-in on the API first.

A few questions

...so the user would be calling Find(partitionKey, id) or Find(id, partitionKey) depending on how they鈥檝e configured it

Where and how is the configuration of the DbSet determining the order of the two values specified/implied?

... this would be handling PKs that include the partition key, but don鈥檛 include the id directly. In those cases we usually can calculated the id value using the value generator configured on it, which by default is [IdValueGenerator] ...

How will DbSet<>Find(???) look for this? Would it be like this: DbSet<>.Find(myPartitionKey);? If so, how would that work if there're multiple documents with the same entity type in Cosmos and with the same Partition Key? Should Find return a list of those documents in that case?

For Find to be an effective and generic way to read one specific document - fast and with ReadItem and from a specific partition - I think that it is a prerequisite that the developer provide both the Primary Key (or the cosmos id) and the Partition Key, and in a structured way where it's clear which is what.

Suggestion:

Could the API be something like this: DbSet<>.Find("myPartitionKey:myId")?

In which case, we assume a string and introduce a Cosmos specific syntax for Find using : (or some other character or string) as the delimiter between the partition key and the id.

If no delimiter (:) is present (i.e. DbSet<>.Find("<some id>")) we assume that this <some id> is the myPartitionKey and consequently return a list in the case where there're more than one document with the entity type, in the partition.

The reason we don't assume myId in the single value scenario, is that this wouldn't work if the access key supplied is a resource token, rather than a master key. Alternatively we'd need to also check what type of access key is supplied and thrown a meaningful exception based on that, to guide the developer.

Where and how is the configuration of the DbSet determining the order of the two values specified/implied?

@1iveowl The PK is configured in this way:

```C#
modelBuilder.Entity().HasKey(c => new { c.State, c.LicensePlate, c.PartitionKey });

> How will DbSet<>Find(???) look for this? Would it be like this: DbSet<>.Find(myPartitionKey);? If so, how would that work if there're multiple documents with the same entity type in Cosmos and with the same Partition Key? Should Find return a list of those documents in that case?

User always needs to specify all PK values in `Find`. But the PK doesn't need to be the Cosmos `id`. Like in the configuration above the user would call

```C#
    context.Set<Car>().Find("WA", "999ZZZ", "1234");

Currently this would translate into something like

SELECT * from c
WHERE c.State = "WA" AND c.Plate = "999ZZZ" AND c.PartitionKey = "1234"

But in most cases we know how to generate the id value, which in this case would be "WA|999ZZZ".
So ideally we would want to call container.ReadItemAsync<>("WA|999ZZZ", "1234") instead

For Find to be an effective and generic way to read one specific document - fast and with ReadItem and from a specific partition - I think that it is a prerequisite that the developer provide both the Primary Key (or the cosmos id) and the Partition Key, and in a structured way where it's clear which is what.

Yes, so the simple case is where the entity type is configured like this:
```C#
modelBuilder.Entity().HasKey(c => new { c.id, c.PartitionKey });

So the `id` and partition key values are specified explicitly:

```C#
    context.Set<Car>().Find("WA|999ZZZ", "1234");

Could the API be something like this: DbSet<>.Find("myPartitionKey:myId")?

This doesn't follow current API conventions and would be confusing for existing EF users.

Thank you @AndriySvyryd. Good comments.

One more thing regarding:

modelBuilder.Entity<Car>().HasKey(c => new { c.State, c.LicensePlate, c.PartitionKey });

This helps us build the query, but how do we identify the partition key explicitly? The example tricks us, because it's so obvious that the third value is the partition key. However it might as well have looked like this:. modelBuilder.Entity<Car>().HasKey(c => new { c.ValueA, c.ValueB, c.ValueC }); and now we don't know which one is the partition key any longer.

I guess for the generic Find to work, we would need to require that the developer also specifies: modelBuilder.Entity<Person>().HasPartitionKey(o => o.ValueC), because then we would know for sure that ValueC is the partition key.

I guess for the generic Find to work, we would need to require that the developer also specifies: modelBuilder.Entity().HasPartitionKey(o => o.ValueC), because then we would know for sure that ValueC is the partition key.

Yes

Regarding:

    context.Set<Car>().Find("WA", "999ZZZ", "1234");

Currently this would translate into something like

SELECT * from c
WHERE c.State = "WA" AND c.Plate = "999ZZZ" AND c.PartitionKey = "1234"

But in most cases we know how to generate the id value, which in this case would be "WA|999ZZZ".
So ideally we would want to call container.ReadItemAsync<>("WA|999ZZZ", "1234") instead

In this example, the assumption is made, that the id equals WA|999ZZZ, because the EF.Core/Cosmos IdValueProvider has been used to generate the id.

However, in a scenario were the Cosmos documents were not created by EF.Core/Cosmosthe id might look like: Car-WA-999ZZZ or something entirely different. This raises a need for an additional feature for overriding the IdValueGenerator to adapt to the id generation scheme used.

However, in a scenario were the Cosmos documents were not created by EF.Core/Cosmosthe id might look like: Car-WA-999ZZZ or something entirely different. This raises a need for an additional feature for overriding the IdValueGenerator to adapt to the id generation scheme used.

It's up to the user to provide the correct value generator, we'd just use whatever is configured on id. And if it has no value generator, then we'll fall back to the slow query

It's up to the user to provide the correct value generator, we'd just use whatever is configured on id. And if it has no value generator, then we'll fall back to the slow query

Ahh yes, using PropertyBuilder.HasValueGenerator Method.

@bricelam Should this be in rc1? /cc @AndriySvyryd

It shipped in preview 4

(I was cleaning up some closed issues on the Backlog)

Was this page helpful?
0 / 5 - 0 ratings