The "Evaluating capacity" section should clarify what does (and does not) go into calculating the total capacity required. The index is stored in the service's capacity and its size depends on the input. What about the input itself, though? Is the original input stored in Azure Search, such that the capacity required is input size + index size? The current wording of the documentation leads you to believe you need at least as much capacity as your input documents:
> (max 300 GB documents per service)
This sounds like it's not a function of the index size, but of actual input document size.
> Storage is constrained by disk space or by a hard limit on the maximum number of indexes, documents, or other high-level resources, whichever comes first.
Because it says "documents" here, it sounds like original input documents count against my Azure Search storage capacity, not just the indexes built from them.
> A large data source with high redundancy could result in a smaller index than a smaller dataset that contains highly variable content. So it's rarely possible to infer index size based on the size of the original dataset.
You can't infer index size for each original dataset, but there should be some insights available from all the data that has already been indexed. Do indexes ever exceed input size? What range of ratios do you see in production for different document types? Could you list sizes for some well-known reference datasets at different index configurations? Customers should have at least some _basic_ idea of what index vs. input size looks like, so that when they roughly estimate the cost of Azure Search (before even running an evaluation index build), they don't wildly overestimate the cost based on their input size and rule out the service.
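To make the ask concrete, here is the kind of rough, pre-evaluation estimate I'd like to be able to make. This is only a sketch: the index-to-source ratio is a placeholder I invented, and publishing realistic ranges for it (per document type) is the whole request.

```python
# Back-of-the-envelope sizing sketch. INDEX_TO_SOURCE_RATIO is a made-up
# placeholder, not a documented figure -- published ranges for it per
# document type are exactly what this issue is asking for.
INDEX_TO_SOURCE_RATIO = 0.5  # hypothetical: index ends up half the raw input size


def estimate_index_size_gb(source_size_gb: float,
                           ratio: float = INDEX_TO_SOURCE_RATIO) -> float:
    """Estimate on-disk index size from raw input size using an assumed ratio."""
    return source_size_gb * ratio


# 400 GB of input documents: estimating naively (ratio 1.0) suggests 400 GB
# of index, which is over the documented 300 GB service maximum and might
# rule the service out, while the placeholder ratio suggests 200 GB, which fits.
print(estimate_index_size_gb(400.0))       # 200.0 with the made-up ratio
print(estimate_index_size_gb(400.0, 1.0))  # 400.0 if index size is assumed to equal input size
```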
@wpbrown Thank you for your feedback. We will investigate and update the thread further.
@wpbrown, thanks for bringing this to our attention. Your feedback has been shared with the content owner (@HeidiSteen) for further review.
@wpbrown - sorry for the late reply:
For the pricing page, I forwarded your question. It seems like "25 GB partitions (max 300 GB per service)" is what it should say, but that needs to be confirmed.
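For what it's worth, assuming that corrected wording, the provisioning arithmetic is just ceiling division against the 25 GB partition size, with 300 GB / 25 GB = 12 partitions as the per-service ceiling. An illustrative sketch only, using numbers taken from that wording:

```python
import math

# Illustrative partition math, assuming the corrected pricing wording of
# "25 GB partitions (max 300 GB per service)", i.e. at most 12 partitions.
PARTITION_SIZE_GB = 25
MAX_SERVICE_GB = 300
MAX_PARTITIONS = MAX_SERVICE_GB // PARTITION_SIZE_GB  # 12


def partitions_for(index_size_gb: float) -> int:
    """Smallest number of 25 GB partitions whose combined storage covers the index."""
    needed = max(1, math.ceil(index_size_gb / PARTITION_SIZE_GB))
    if needed > MAX_PARTITIONS:
        raise ValueError(
            f"{index_size_gb} GB exceeds the {MAX_SERVICE_GB} GB service maximum")
    return needed


print(partitions_for(60))   # 3 partitions (75 GB of provisioned storage)
print(partitions_for(290))  # 12 partitions (300 GB, at the service maximum)
```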
For the service limits page, the word 'documents' should be removed.
For estimating capacity, there have been discussions in the past about publishing estimates (inferring an index size based on a dataset). The wording you currently see in the docs is the outcome of those conversations. You can reach out to "azuresearch_contact" if you have a specific concern about provisioning at the wrong tier.
@wpbrown, the service limits page is now updated: https://docs.microsoft.com/en-us/azure/search/search-limits-quotas-capacity#document-limits
The pricing page bug is being worked on, but it can take a while, so I cannot give you an ETA. It's outside of my purview.
Since I've addressed everything I can, I'm going to mark this as closed, but if you have further feedback, feel free to reopen.
Thanks!
please-close