How would this work when a large number of TTL expiries occur? Would it use all RUs up to the maximum to do the background delete?
We'd really want to limit the RU usage for the TTL expiry I think.
@simonvane Thank you for reaching out.
Deleting items based on TTL is free.
There is no additional cost (that is, no additional RUs are consumed) when an item is deleted as a result of TTL expiration.
As I understand it the deletes will use any unused RUs to do the deletes. What I'm not clear about is what "unused RUs" would mean in terms of autopilot where the throughput is set dynamically up to a maximum value.
Are you saying that deletes are carried out using unused RUs from what the database has been dynamically scaled to?
So, for example, if we set the maximum RU/s to 10,000 and the throughput was dynamically scaled to 5,000 RU/s when a large number of TTLs expired, it would not use any more RUs than were required by other operations?
I just want to make sure that if we use autopilot we are definitely not going to incur cost for the TTL expiry. I know that is what you have already said but I just want to be clear we are talking about exactly the same thing.
Thanks @KalyanChanumolu-MSFT
@simonvane Yes you are right.
The delete operation using TTL consumes left-over Request Units that haven't been consumed by user requests.
Detailed explanation is here if you want to read more about the process.
@simonvane We will proceed to close this issue now.
If there are further questions regarding this matter, please comment and we will gladly continue the discussion.
Thank you @KalyanChanumolu-MSFT. I have read that document in detail and even commented on it to make it less ambiguous. The term "left-over" becomes ambiguous when talking about the new autopilot throughput provisioning. That document doesn't mention anything about autopilot.
It is not clear whether "left-over" means left over from the maximum RU/s set, from the minimum RU/s (10% of the maximum), from what Cosmos has been dynamically scaled to but not used, or something else.
We absolutely must have clarity on this to understand how it will affect our use of it / costs.
@kirillg @SnehaGunda Could you please provide more insights here
@simonvane I think leftover refers to the RUs remaining out of the provisioned RUs; in the autopilot case it would be: (leftover RUs = max RUs set – consumed RUs). I am not completely sure, though, so I have sent an email to our SMEs for confirmation.
@simonvane I reached out to our dev team to find more details. Here is the information I have:
The leftover RUs refer to the unused replica budget. The concept of replicas and replica budget is internal to the Cosmos DB implementation and not something end users need to worry about. Data is deleted once there are enough RUs available to perform the delete operation. Though the data deletion is delayed, data is not returned by any queries (by any API) after the TTL has expired. So you would always get consistent results.
I don't see any doc updates specific to this issue. We will close it if there aren't further questions.
@SnehaGunda - Please could you re-open this. This still doesn't answer my question with respect to the new autopilot feature.
When using autopilot, do TTL deletes use up spare RUs up to the maximum RUs specified?
If the system is not under load, will TTL deletes in any way, increase the RUs used?
The document is still ambiguous by not mentioning autopilot and whether that changes how this works.
@simonvane I don't have permissions to reopen the issue. Either you or @KalyanChanumolu-MSFT @Mike-Ubezzi-MSFT can reopen it.
@simonvane I don't know the answer to your question. I have emailed our product group. Will let you know when I hear back. Alternatively, you can drop an email to [email protected] alias and a broader group can take a look.
@simonvane did you ever get a satisfactory answer to your question? I think I understand what you mean.
For example, suppose we are currently auto-scaled up to 5000 RU/s due to reads and writes from normal usage (non-TTL deletes), and there is a moment where there are unused RUs, let's say 300 RUs, which get consumed by a newly activated batch of TTL operations. Would the down-scaling logic bring the 5000 RU/s back down to its original allocation even if there are TTL documents that still need to be deleted? Or would the auto-scale remain pegged at 5000 RU/s because it acknowledges that there are many TTL deletions that need processing, and thus block scaling down?
The difference between the two possible eventualities is :
1) TTLs don't interfere with the autoscaling logic at all. In which case TTL deletions are eventual and will never cost me anything.
2) TTLs do interfere with the autoscaling logic, and sometimes my cosmos db may be auto-scaled longer than necessary, in order to process the TTLs, in which case it is costing me more to process TTLs in a timely manner.
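To make the difference between the two cases concrete, here is a purely hypothetical sketch. It is not how Cosmos DB is implemented. It assumes, as a simplification, that each hour is billed at the RU/s the container was scaled to in that hour, and that the autoscale floor is 10% of the configured maximum. The function name `billed_scale` and its parameters are invented for illustration.

```python
def billed_scale(user_peaks, max_ru, ttl_backlog_hours=0, ttl_blocks_scale_down=False):
    """Return the RU/s billed for each hour under a toy autoscale model.

    user_peaks: peak RU/s needed by user requests in each hour.
    ttl_backlog_hours: number of hours during which expired items are still pending.
    ttl_blocks_scale_down: False models case 1, True models case 2.
    """
    min_ru = max_ru // 10  # assumed floor: 10% of the maximum
    billed = []
    previous = min_ru
    for hour, peak in enumerate(user_peaks):
        # Scale needed by user traffic alone, clamped to [min_ru, max_ru].
        scale = min(max(peak, min_ru), max_ru)
        # Case 2 only: a pending TTL backlog prevents scaling down.
        if ttl_blocks_scale_down and hour < ttl_backlog_hours:
            scale = max(scale, previous)
        billed.append(scale)
        previous = scale
    return billed


if __name__ == "__main__":
    peaks = [5000, 1000, 1000]
    print(billed_scale(peaks, max_ru=10_000))
    print(billed_scale(peaks, max_ru=10_000, ttl_backlog_hours=3,
                       ttl_blocks_scale_down=True))
```

Under these assumptions, case 1 is billed for user traffic only, while case 2 stays at 5000 RU/s for every hour in which the TTL backlog is still pending.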
Hi @olitomlinson ,
No, I'm afraid I didn't (see https://docs.microsoft.com/en-us/azure/cosmos-db/time-to-live). I gave up. @SnehaGunda asked @KalyanChanumolu-MSFT or @Mike-Ubezzi-MSFT to reopen the issue but it is still closed.
I think they are saying that it is your case 1, i.e. we will never be charged for TTL deletes even with autopilot, but I've not seen a clear explanation of what is going on.
I understand a little bit more about autopilot now than I did when I asked this question so my guess as to what happens is as follows:
I don't think autopilot will scale-up to do TTL deletes.
I hope I'm right :-)
@simonvane We would be happy to reopen the issue for you.
Since Sneha provided email alias of the product group, we were hoping you had gotten in touch directly.
Great discussion, I guess what @simonvane said is true. However, I would let @deborahc confirm as she is our SME on the autopilot feature.
I will clarify the docs when I have more details.
Hi all - clarifying that deletes for TTL are free in autopilot, and will not cause a scale-up of RU/s. Here's an example:
@deborahc Thanks very much for clarifying/confirming.
Please could the documentation be updated with this clarification?
I took a closer look at this issue and don't think an update in our docs is necessary in this case.
The behavior described above for autoscale is the same as it is for manually provisioned throughput, which is documented. This is from the second paragraph in our article, Time to Live (TTL) in Azure Cosmos DB:
Deletion of expired items is a background task that consumes left-over Request Units, that is Request Units that haven't been consumed by user requests. Even after the TTL has expired, if the container is overloaded with requests and if there aren't enough RU's available, the data deletion is delayed. Data is deleted once there are enough RUs available to perform the delete operation. Though the data deletion is delayed, data is not returned by any queries (by any API) after the TTL has expired.
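The quoted behavior can be sketched as a toy model. This is an illustration only, not the service's actual algorithm: it assumes "left-over" means the RUs that user requests did not consume in a given second out of what the container is currently scaled to, and that the scaled value is decided by user load alone. The function name `drain_ttl_backlog` and its inputs are hypothetical.

```python
def drain_ttl_backlog(seconds, backlog_ru):
    """Spend only left-over RUs on pending TTL deletes.

    seconds: list of (scaled_ru, user_ru) pairs, one per second, where
        scaled_ru is what the container is scaled to for user load and
        user_ru is what user requests actually consumed.
    backlog_ru: RUs of TTL delete work pending at the start.
    Returns (RUs spent on TTL deletes per second, remaining backlog).
    """
    spent = []
    for scaled_ru, user_ru in seconds:
        # TTL deletes never raise scaled_ru; they only use what is unused.
        leftover = max(scaled_ru - user_ru, 0)
        ttl_ru = min(backlog_ru, leftover)
        backlog_ru -= ttl_ru
        spent.append(ttl_ru)
    return spent, backlog_ru


if __name__ == "__main__":
    timeline = [(5000, 5000), (5000, 4700), (5000, 4900), (1000, 200)]
    print(drain_ttl_backlog(timeline, backlog_ru=1500))
```

In this model, a fully loaded second performs no TTL deletes at all, so deletion is delayed rather than charged for, and the backlog drains whenever user traffic leaves RUs unused.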
Thanks for raising this issue. Will close this.
@markjbrown I respectfully disagree.
I raised this issue in November and have only recently received confirmation. If you look at the history of this issue it shows that various people from Microsoft were unable to answer the question conclusively and that other developers had the same question.
Unless the documentation explicitly mentions autopilot then, because it works differently from static provisioning, there will always be ambiguity. The term "left-over" is open to interpretation in the context of autopilot.
Some detail has been added in the Autoscale FAQs here - https://docs.microsoft.com/en-us/azure/cosmos-db/autoscale-faq#how-does-ttl-work-with-autoscale but it would still be good to have this clarity in https://docs.microsoft.com/en-us/azure/cosmos-db/time-to-live.