Azure-sdk-for-js: [STORAGE] Ability to filter blobs based on metadata attributes

Created on 30 Jan 2020 · 8 Comments · Source: Azure/azure-sdk-for-js

Is your feature request related to a problem? Please describe.
When using BlobStorage, I'd like to retrieve blobs based on the value of a metadata attribute that I added when creating the blob file.

Describe the solution you'd like
Ideally, the solution should be a method that returns the blobNames of the blobs that match my criteria.

For example, if I need all the blobs in a container whose metadata attribute address is set (address !== null), it should return a list of the blob names that match those criteria.

Describe alternatives you've considered
The current alternative is to loop through every single blob, like this:

 // List every blob together with its metadata and keep the names
 // of those that have an `address` metadata entry.
 const myBlobs = [];
 for await (const blob of containerClient.listBlobsFlat({ includeMetadata: true })) {
     // `metadata` is undefined for blobs that have no metadata at all
     if (blob.metadata && blob.metadata.address) {
         myBlobs.push(blob.name);
     }
 }

Additional context
In containers with a very large number of blobs (e.g. more than 40 million), the application runs very slowly.

Client Storage customer-reported needs-author-feedback no-recent-activity question

All 8 comments

@marcosmcb I don't think the service currently supports filtering by metadata (https://docs.microsoft.com/en-us/rest/api/storageservices/list-blobs). So if we were to add a method into the client library, it would be the same as you described in the alternatives section above.

@jeremymeng thanks for your answer. I found a way to speed up my query by using the metadata object of my container as a lookup table. I'm wondering if there's any limit on the size of the metadata object. For example, could it support 2 million entries?

Could you please provide more details on how you do the indexing on the containers?

So, my problem is - I need to delete blobs that are X days old from my container and don't have a certain field in their metadata.

I set that field after the user makes a purchase: once I have the confirmation, I add the order number to the blob's metadata.

So, for blobs that are X days old and don't have the order NO field in their metadata, I can delete them.
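
The deletion rule above can be written as a small predicate. This is only a sketch under assumptions: the metadata key name `orderno` is hypothetical (the thread never names the field), and the blob age is taken from `properties.createdOn` as it appears on items returned by `listBlobsFlat({ includeMetadata: true })`.

```javascript
// Hypothetical helper: decides whether a listed blob is a stale "temporary" blob.
// `orderNoKey` is an assumed metadata key; replace it with the real field name.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function isEligibleForDeletion(blobItem, maxAgeDays, now = new Date(), orderNoKey = "orderno") {
  const createdOn = blobItem.properties && blobItem.properties.createdOn;
  if (!(createdOn instanceof Date)) {
    return false; // unknown age: keep the blob to be safe
  }
  const ageInDays = (now.getTime() - createdOn.getTime()) / MS_PER_DAY;
  const metadata = blobItem.metadata || {}; // undefined when the blob has no metadata
  const hasOrderNo = Boolean(metadata[orderNoKey]);
  return ageInDays >= maxAgeDays && !hasOrderNo;
}
```

Applied to every item while listing, this still visits each blob; it only isolates the rule so that it can be tested separately from the storage calls.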

My idea is:
To add the blob name to my container's metadata, where my blob name is the key and the date of creation is the value.

Once a customer makes their purchase and I have an order no, I can, therefore, remove the blob name from the container's metadata and add the order no to my blob's metadata.

If I have blobNames in my container's metadata that are X days old and have no Order no in their metadata, they're eligible for deletion.

In short, I'd use my container's metadata as a lookup table for "temporary" blobs: if a blob is X days old and still has no order number, it is eligible for deletion.

That would help me not to have to look at every single blob in my container and make the deletion of bad blob files much quicker.

The only drawback is that insertion would be slower, because I would need to reset my container's metadata every time I create a blob. Updating a blob would also be slower, as I'd have to reset the container's metadata there as well.

My application has only one container, which could hold 40 million blobs, with around 2 million more added every month.

Hi @marcosmcb, Azure Storage is working on a feature to speed up blob queries. It should help in your case. Stay tuned : )

Given that metadata entries are sent over http headers, and http headers are in practice often limited (8kb total in Node, Nginx, Apache), it seems unlikely that Blob Storage would support millions of metadata entries for a container.

@XiaoningLiu is right though - hopefully the need for additional metadata in the container will be removed soon 😉
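
To put rough numbers on the header-size argument, here is a back-of-the-envelope sketch. The 8 KB limit is the figure quoted in the comment above; the entry sizes (a 36-character blob name as key, a 10-character date as value) are assumptions chosen for illustration, and the estimate ignores header prefixes and separators, so the real capacity would be lower.

```javascript
// Rough estimate only: counts the bytes of each key and value and ignores
// per-header overhead such as the metadata header prefix and separators.
const ASSUMED_LIMIT_BYTES = 8 * 1024; // 8 KB, as quoted in the thread

function estimateMetadataBytes(metadata) {
  let total = 0;
  for (const [key, value] of Object.entries(metadata)) {
    total += Buffer.byteLength(key, "utf8") + Buffer.byteLength(String(value), "utf8");
  }
  return total;
}

function fitsInLimit(metadata, limitBytes = ASSUMED_LIMIT_BYTES) {
  return estimateMetadataBytes(metadata) <= limitBytes;
}

// Assumed entry shape: 36-character key (blob name) + 10-character value (date).
const bytesPerEntry = 36 + 10;
const maxEntries = Math.floor(ASSUMED_LIMIT_BYTES / bytesPerEntry);
console.log(maxEntries); // prints 178
```

Under these assumptions the lookup table would hold fewer than two hundred entries, several orders of magnitude short of the 2 million asked about.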

Now that Blob Tags has been available since 12.2.0, could you use it to support this scenario?
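
A minimal sketch of how Blob Tags might cover this scenario, assuming temporary blobs are tagged at creation (for example with `blobClient.setTags({ status: "pending" })`) and the tag is changed once an order number arrives. The tag name `status`, the value `pending`, and both helper functions are hypothetical; the only SDK call relied on is `BlobServiceClient.findBlobsByTags`, which takes a filter expression. Tag queries match tag values, so "blobs without an order number" has to be modelled as a positive tag like this one.

```javascript
// Builds a tag filter expression scoped to a single container.
// Quotes are rejected as a conservative check so the expression stays well formed.
function buildTagFilter(containerName, tagName, tagValue) {
  for (const part of [containerName, tagName, tagValue]) {
    if (/['"]/.test(part)) {
      throw new Error(`Quotes are not supported in filter parts: ${part}`);
    }
  }
  return `@container = '${containerName}' AND "${tagName}" = '${tagValue}'`;
}

// `blobServiceClient` is expected to be a BlobServiceClient from @azure/storage-blob;
// it is passed in so this sketch has no direct dependency on the package.
async function findBlobNamesByTag(blobServiceClient, containerName, tagName, tagValue) {
  const filter = buildTagFilter(containerName, tagName, tagValue);
  const names = [];
  for await (const blob of blobServiceClient.findBlobsByTags(filter)) {
    names.push(blob.name);
  }
  return names;
}
```

A call such as `findBlobNamesByTag(blobServiceClient, "uploads", "status", "pending")` would then return only the tagged blobs instead of listing the whole container; the age check still needs to be applied to the results, or encoded in a second tag.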

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!
