Azure-docs: TTL deletes

Created on 5 Nov 2019 · 19 Comments · Source: MicrosoftDocs/azure-docs

How would this work when a large number of TTL expiries occur? Would it use all RUs up to the maximum to do the background delete?
We'd really want to limit the RU usage for the TTL expiry I think.

Document Details

⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.

ID: c5b2be67-25e0-a558-dae2-5fa149df2d96
Version Independent ID: e95519f3-80f7-37bf-e3fc-9ebbf5081f17
Content: Create Azure Cosmos containers and databases in autopilot mode.
Content Source: articles/cosmos-db/provision-throughput-autopilot.md
Service: cosmos-db
GitHub Login: @kirillg
Microsoft Alias: kirillg

Pri1 assigned-to-author cosmos-dsvc product-question triaged

Source

simonvane

All 19 comments

@simonvane Thank you for reaching out.

Deleting items based on TTL is free.
There is no additional cost (that is, no additional RUs are consumed) when an item is deleted as a result of TTL expiration.

KalyanChanumolu-MSFT on 5 Nov 2019

As I understand it the deletes will use any unused RUs to do the deletes. What I'm not clear about is what "unused RUs" would mean in terms of autopilot where the throughput is set dynamically up to a maximum value.

Are you saying that deletes are carried out using unused RUs from what the database has been dynamically scaled to?

So, for example, if we set the maximum RU/s to 10,000 and the throughput was dynamically scaled to 5,000 RU/s, then when a large number of TTLs expired it would not use any more RUs than were required by other operations?

I just want to make sure that if we use autopilot we are definitely not going to incur cost for the TTL expiry. I know that is what you have already said but I just want to be clear we are talking about exactly the same thing.

Thanks @KalyanChanumolu-MSFT

simonvane on 5 Nov 2019

@simonvane Yes you are right.
The delete operation using TTL consumes left-over Request Units that haven't been consumed by user requests.

Detailed explanation is here if you want to read more about the process.

KalyanChanumolu-MSFT on 5 Nov 2019

@simonvane We will proceed to close this issue now.
If there are further questions regarding this matter, please comment and we will gladly continue the discussion.

KalyanChanumolu-MSFT on 6 Nov 2019

Thank you @KalyanChanumolu-MSFT. I have read that document in detail and even commented on it to make it less ambiguous. The term "left-over" becomes ambiguous when talking about the new autopilot throughput provisioning. That document doesn't mention anything about autopilot.

It is not clear whether "left-over" means left over from the maximum RU/s set, from the minimum RU/s (10% of the maximum), from what Cosmos has dynamically scaled to but not used, or something else.

We absolutely must have clarity on this to understand how it will affect our use of it / costs.

simonvane on 6 Nov 2019

@kirillg @SnehaGunda Could you please provide more insight here?

KalyanChanumolu-MSFT on 6 Nov 2019

@simonvane I think leftover refers to the remaining RUs out of the provisioned RUs; in the autopilot case it would be: (leftover RUs = max RUs set – consumed RUs). I am not completely sure, though, so I sent an email to our SMEs for confirmation.

SnehaGunda on 6 Nov 2019

@simonvane I reached out to our dev team to find more details. Here is the information I have:

The leftover RUs refer to the unused replica budget. The concept of replicas and replica budget is internal to the Cosmos DB implementation and not something end users need to worry about. Data is deleted once there are enough RUs available to perform the delete operation. Though the data deletion is delayed, data is not returned by any queries (by any API) after the TTL has expired. So you would always get consistent results.

I don't see any doc updates specific to this issue. We will close it if there aren't further questions.

please-close

SnehaGunda on 15 Nov 2019

@SnehaGunda - Please could you re-open this. This still doesn't answer my question with respect to the new autopilot feature.

When using autopilot, do TTL deletes use up spare RUs up to the maximum RUs specified?
If the system is not under load, will TTL deletes in any way increase the RUs used?

The document is still ambiguous by not mentioning autopilot and whether that changes how this works.

simonvane on 15 Nov 2019

@simonvane I don't have permissions to reopen the issue. Either you or @KalyanChanumolu-MSFT @Mike-Ubezzi-MSFT can reopen it.

@simonvane I don't know the answer to your question. I have emailed our product group. Will let you know when I hear back. Alternatively, you can drop an email to [email protected] alias and a broader group can take a look.

SnehaGunda on 22 Nov 2019

@simonvane did you ever get a satisfactory answer to your question? I think I understand what you mean.

For example, suppose we are currently autoscaled up to 5000 RU/s due to reads and writes from normal (non-TTL) usage, and there is a moment where there are unused RUs, let's say 300 RUs, which get consumed by a newly activated batch of TTL operations. Would the down-scaling logic bring the 5000 RU/s back down to its original allocation even if there are TTL documents that still need to be deleted? Or would the autoscale remain pegged at 5000 RU/s because it acknowledges that there are many TTL deletions to process, and thus block scaling down?

The difference between the two possible eventualities is :

1) TTLs don't interfere with the autoscaling logic at all. In which case TTL deletions are eventual and will never cost me anything.

2) TTLs do interfere with the autoscaling logic, and sometimes my cosmos db may be auto-scaled longer than necessary, in order to process the TTLs, in which case it is costing me more to process TTLs in a timely manner.

olitomlinson on 16 Apr 2020

Hi @olitomlinson ,

No, I'm afraid I didn't (see https://docs.microsoft.com/en-us/azure/cosmos-db/time-to-live). I gave up. @SnehaGunda asked @KalyanChanumolu-MSFT or @Mike-Ubezzi-MSFT to reopen the issue but it is still closed.

I think they are saying that it is your case 1, i.e. we will never be charged for TTL deletes even with autopilot, but I've not seen a clear explanation of what is going on.

I understand a little bit more about autopilot now than I did when I asked this question so my guess as to what happens is as follows:

If you're using less than the minimum RU/s then TTL will use any unused RUs from the minimum.
If autopilot has scaled up the RU/s, it does so in hourly "chunks", i.e. if it scales up you are charged at that rate for the hour. I assume that any unused RUs from the hour of scale-up would be used for TTL deletes.

I don't think autopilot will scale-up to do TTL deletes.

I hope I'm right :-)

simonvane on 17 Apr 2020

@simonvane We would be happy to reopen the issue for you.
Since Sneha provided email alias of the product group, we were hoping you had gotten in touch directly.

KalyanChanumolu-MSFT on 17 Apr 2020

👍1

Great discussion, I guess what @simonvane said is true. However, I would let @deborahc confirm as she is our SME on the autopilot feature.

I will clarify the docs when I have more details.

SnehaGunda on 21 Apr 2020

Hi all - clarifying that deletes for TTL are free in autopilot, and will not cause a scale-up of RU/s. Here's an example:

You have an autopilot container with 400 – 4000 RU/s.
Hour 1: T=0: container has no usage (no TTL or workload requests). The billable RU/s is 400.
Hour 1: T=1: TTL is enabled.
Hour 1: T=2: container starts getting requests that consume 1000 RU in 1 second. There are also 200 RUs' worth of TTL deletes that need to happen. The billable RU/s is still 1000 RU/s, not 1200. Regardless of when the TTLs occur, they will not affect the autopilot scaling logic.

deborahc on 25 Apr 2020
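
The arithmetic in the example above can be sketched roughly as follows. This is only an illustrative model, not Cosmos DB's billing code: it assumes the billable RU/s for an hour is the highest RU/s the user workload scaled to in that hour, clamped between the autopilot minimum and maximum. The names `Second`, `ttl_ru`, and `billable_ru_per_sec` are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Second:
    user_ru: int  # RUs consumed by user requests during this second
    ttl_ru: int   # RUs' worth of pending TTL deletes (hypothetical field)


def billable_ru_per_sec(min_ru: int, max_ru: int, seconds: List[Second]) -> int:
    """Billable RU/s for one hour under the assumptions stated above.

    Only user_ru drives scaling; ttl_ru is deliberately ignored, which
    mirrors the statement that TTL deletes do not cause a scale-up.
    """
    peak_user = max((s.user_ru for s in seconds), default=0)
    return max(min_ru, min(max_ru, peak_user))


# Autopilot container with 400 to 4000 RU/s, as in the example:
# T=0 idle, T=1 TTL enabled, T=2 1000 RU of requests plus 200 RU of TTL work.
hour = [Second(0, 0), Second(0, 0), Second(1000, 200)]
print(billable_ru_per_sec(400, 4000, hour))  # 1000
```

With an idle hour the same function returns the 400 RU/s minimum, and the pending 200 RUs of TTL work never changes the result.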

👍2

@deborahc Thanks very much for clarifying/confirming.

Please could the documentation be updated with this clarification?

simonvane on 2 May 2020

I took a closer look at this issue and don't think an update in our docs is necessary in this case.

The behavior described above for autoscale is the same as it is for manually provisioned throughput, which is documented. This is from the second paragraph in our article, Time to Live (TTL) in Azure Cosmos DB:

Deletion of expired items is a background task that consumes left-over Request Units, that is Request Units that haven't been consumed by user requests. Even after the TTL has expired, if the container is overloaded with requests and if there aren't enough RU's available, the data deletion is delayed. Data is deleted once there are enough RUs available to perform the delete operation. Though the data deletion is delayed, data is not returned by any queries (by any API) after the TTL has expired.

Thanks for raising this issue. Will close this.

please-close

markjbrown on 4 May 2020
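
The quoted behavior (expired items are hidden from queries immediately, while physical deletion waits for left-over RUs) can be modeled with a small toy sketch. Everything here is an assumption made for illustration: the real budget is internal to Cosmos DB, and the `Container` class, the single `budget_ru` number, and the fixed `delete_cost_ru` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    id: str
    written_at: int  # second at which the item was written
    ttl: int         # time to live, in seconds

    def expired(self, now: int) -> bool:
        return now >= self.written_at + self.ttl


class Container:
    """Toy model: one RU budget per second and a fixed cost per delete."""

    def __init__(self, budget_ru: int, delete_cost_ru: int) -> None:
        self.budget_ru = budget_ru
        self.delete_cost_ru = delete_cost_ru
        self.items: List[Item] = []

    def query(self, now: int) -> List[str]:
        # Expired items are filtered out even if not yet physically deleted.
        return [i.id for i in self.items if not i.expired(now)]

    def background_delete(self, now: int, user_ru: int) -> int:
        # Only RUs not consumed by user requests are available for deletes.
        leftover = max(0, self.budget_ru - user_ru)
        can_delete = leftover // self.delete_cost_ru
        doomed = [i for i in self.items if i.expired(now)][:can_delete]
        doomed_ids = {i.id for i in doomed}
        self.items = [i for i in self.items if i.id not in doomed_ids]
        return len(doomed)


c = Container(budget_ru=10, delete_cost_ru=5)
c.items = [Item("a", 0, 1), Item("b", 0, 1), Item("c", 0, 1)]
print(c.query(now=2))                          # [] (all expired, none returned)
print(c.background_delete(now=2, user_ru=10))  # 0 (no left-over RUs, deletes delayed)
print(c.background_delete(now=2, user_ru=0))   # 2 (left-over RUs pay for two deletes)
```

In this model the deletes never add to the RUs consumed by user requests; they only use what is left over, which is the property the thread is asking about.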

@markjbrown I respectfully disagree.

I raised this issue in November and have only recently received confirmation. If you look at the history of this issue it shows that various people from Microsoft were unable to answer the question conclusively and that other developers had the same question.

Unless the documentation explicitly mentions autopilot then, because it works differently from static provisioning, there will always be ambiguity. The term "left-over" is open to interpretation in the context of autopilot.

simonvane on 5 May 2020

Some detail has been added in the Autoscale FAQs here - https://docs.microsoft.com/en-us/azure/cosmos-db/autoscale-faq#how-does-ttl-work-with-autoscale but it would still be good to have this clarity in https://docs.microsoft.com/en-us/azure/cosmos-db/time-to-live.

simonvane on 20 May 2020
