Azure-webjobs-sdk: Allow a "newer than" timestamp to be specified for blob trigger

Created on 6 Sep 2017 · 11 Comments · Source: Azure/azure-webjobs-sdk

Currently our blob scan algorithm will process ALL blobs in the target container that don't have blob receipts. We should investigate whether we can allow the start date for the scan to be specified.

Scenario: assume all blobs in a container have been processed by a blob trigger function in a particular app (WebJob host). Now, if that function is moved to a different app (different host/host ID) all the blobs will be reprocessed, because there are no receipts for those blobs for that host ID.
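To make the scenario concrete, here is a minimal C# sketch of why a new host ID reprocesses everything. It assumes a simplified model in which a receipt is identified only by host ID and blob name. `ReceiptModel` and `BlobsToProcess` are hypothetical names, and the real SDK persists its receipts in storage rather than in an in-memory set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified model of blob receipts: the real SDK stores receipts as blobs
// in the storage account, while here an in-memory set stands in for that store.
public static class ReceiptModel
{
    // Returns the blobs that have no receipt for the given host ID,
    // i.e. the blobs this host would (re)process on its next scan.
    public static List<string> BlobsToProcess(
        string hostId,
        IEnumerable<string> blobsInContainer,
        ISet<(string HostId, string BlobName)> receipts)
    {
        return blobsInContainer
            .Where(blob => !receipts.Contains((hostId, blob)))
            .ToList();
    }
}
```

With the same receipts, "host-a" gets an empty list while "host-b" gets every blob back, which is exactly the reprocessing described in the scenario.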

improvement

All 11 comments

I agree we should expose something to control this. Since we already maintain the blob scan pointer for tracking our last processed blob, we'd need to make sure that the behavior makes sense when these two interact.

For the sake of argument -- let's call this new property newerThan.

For example:

  • newerThan = Jan 1, 2017. Last blob scan was Dec 1, 2016 -- We'd skip over all of December when we start processing.
  • newerThan = Jan 1, 2017. Last blob scan was Feb 1, 2017 -- We wouldn't want to re-process all of January, would we?

In other words -- we'd start our scan from whichever was newest between newerThan and the stored blob scan pointer.

As a side note -- I think writing out informational logs (like we do for Timer) would be very helpful here. Something like "Found blob scan pointer of {date} and NewerThan value of {date}. Starting scan at {date} because it is the most recent. To change this, ....". It would only be written once, at listener start, and could go a long way towards explaining the logic without anyone needing to look up the docs.
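A minimal C# sketch of the proposed rule and the suggested start-up message, assuming either value may be absent. `ScanStart`, `Resolve` and `Describe` are hypothetical names, and `newerThan` is only the placeholder name used in this discussion, not an existing SDK setting. The message text follows the suggestion, minus its unfinished last sentence.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch of the proposed start-point rule; not part of the WebJobs SDK.
public static class ScanStart
{
    // Returns the later of the configured newerThan value and the stored scan pointer.
    // If only one is set, that one wins; if neither is set, null means "scan from the beginning".
    public static DateTimeOffset? Resolve(DateTimeOffset? newerThan, DateTimeOffset? scanPointer)
    {
        if (newerThan is null) return scanPointer;
        if (scanPointer is null) return newerThan;
        return newerThan > scanPointer ? newerThan : scanPointer;
    }

    // Builds the one-time informational message to log when the listener starts.
    public static string Describe(DateTimeOffset? newerThan, DateTimeOffset? scanPointer)
    {
        DateTimeOffset? start = Resolve(newerThan, scanPointer);
        return $"Found blob scan pointer of {Format(scanPointer)} and NewerThan value of {Format(newerThan)}. " +
               $"Starting scan at {Format(start)} because it is the most recent.";
    }

    // Invariant culture keeps the date format stable regardless of the host's locale.
    private static string Format(DateTimeOffset? value) =>
        value?.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) ?? "(none)";
}
```

The two examples from the list map directly: `Resolve(Jan 1 2017, Dec 1 2016)` returns Jan 1, so December is skipped, and `Resolve(Jan 1 2017, Feb 1 2017)` returns Feb 1, so January is not reprocessed.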

This would be very helpful in a few scenarios I've come across. My current case is scanning SQL audit blobs generated by Azure's SQL Blob Auditing feature.
We have a fairly long retention period for those but only need to process the logs going forward, which sounds perfect for an Azure Function with a blob trigger, until you realize you have to let it run in a no-op style over all of them, for each host, before it's usable.

This would really help similar scenarios.

One possibility that could help here is using Event Grid's support for routing storage events to Azure Functions. This approach does not involve any blob scanning, which is the root cause of the issue here.

https://docs.microsoft.com/en-us/azure/event-grid/resize-images-on-storage-blob-upload-event
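With Event Grid, each BlobCreated event names exactly one blob, and events are only published for blobs created after the subscription exists, so there is no backlog of old blobs to work through. The blob is identified by the event subject, which has the form /blobServices/default/containers/{container}/blobs/{blob}. Below is a small, hypothetical C# helper (`BlobEventSubject.TryParse` is not part of any SDK) that an Event Grid-triggered function could use to recover the container and blob name.

```csharp
using System;

// Hypothetical helper: splits a storage event subject of the form
// "/blobServices/default/containers/{container}/blobs/{blob}" into its parts.
public static class BlobEventSubject
{
    private const string ContainersSegment = "/containers/";
    private const string BlobsSegment = "/blobs/";

    // Returns false if the subject does not contain both segments or either part is empty.
    public static bool TryParse(string subject, out string container, out string blobName)
    {
        container = blobName = string.Empty;
        int c = subject.IndexOf(ContainersSegment, StringComparison.Ordinal);
        if (c < 0) return false;
        int start = c + ContainersSegment.Length;
        int b = subject.IndexOf(BlobsSegment, start, StringComparison.Ordinal);
        if (b < 0) return false;
        container = subject.Substring(start, b - start);
        // Everything after "/blobs/" is the blob name, including any virtual folders.
        blobName = subject.Substring(b + BlobsSegment.Length);
        return container.Length > 0 && blobName.Length > 0;
    }
}
```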

Resurfacing this as this is something I would love to be able to do. Any idea on if/when this might be looked at?

Thx!

No idea at this time (that's what the "unknown" milestone means).

Another year has passed - any update?

Any update? This is kind of annoying, as I have to sit there for 10 minutes waiting for the trigger to reprocess every blob. I'm not sure why, but the receipts sometimes get reset, which means it reprocesses everything.

Hello, I have the same problem.
It fires 3 times.
Presumably these are the events I created for testing.
But how do I remove them all, so that I can create a new one and have only that one run?

In the console, there are 3 events triggered at the same time:
2020-09-18T15:28:13.541 [Information] Executing 'D2KEventGrid' (Reason='EventGrid trigger fired at 2020-09-18T15:28:13.2184099+00:00', Id=595fc416-0280-43e1-8dc5-f285640e986c)
2020-09-18T15:28:13.569 [Information] Executing 'D2KEventGrid' (Reason='EventGrid trigger fired at 2020-09-18T15:28:13.0399176+00:00', Id=f1796f9a-f06c-47e8-8e67-be34950629a3)
2020-09-18T15:28:13.570 [Information] Executing 'D2KEventGrid' (Reason='EventGrid trigger fired at 2020-09-18T15:28:12.7600042+00:00', Id=c34bb611-2c99-4c1d-ba4b-5675cc87236c)

@pablosguajardo I think you're talking about something different from what is being discussed here: it looks like you are using Event Grid, while this issue is about the behavior of the built-in blob trigger.

Quick solution for people who are trying to skip older blobs: check properties.Created (BlobProperties).

I was surprised that no one has posted a workaround for this in 3 years 😄.
I hope my example is useful to someone until this functionality is added.

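using System;
using System.Globalization;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Storage.Blob; // BlobProperties (assumes the v11 storage SDK)
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

[FunctionName("ProcessAttachments")] // illustrative function name
public static async Task Run([BlobTrigger("attachments/{name}")] Stream file, string name, BlobProperties properties, ILogger log, ExecutionContext context) {
    // Explicit format + UTC: DateTime.Parse("22/09/2020 ...") throws under cultures such as en-US
    DateTimeOffset processFrom = DateTimeOffset.ParseExact("22/09/2020 16:30", "dd/MM/yyyy HH:mm", CultureInfo.InvariantCulture, DateTimeStyles.AssumeUniversal);
    if (properties.Created < processFrom) { // a null Created compares as false, so such blobs are processed
        log.LogInformation($"Skipping {name}...");
        return;
    }
    // process file
}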

Have a nice day!
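A variation on the workaround above, sketched in C#: read the cutoff from an app setting instead of hard-coding it, so it can be changed without redeploying. App settings are exposed to functions as environment variables; the setting name `ProcessFromUtc` and the `Cutoff` helper are hypothetical. Inside `Run`, the check would become `if (Cutoff.ShouldSkip(properties.Created, Cutoff.FromEnvironment())) return;`.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; "ProcessFromUtc" is an illustrative app setting name.
public static class Cutoff
{
    // Parses an ISO 8601 timestamp such as "2020-09-22T16:30:00Z" (no offset is treated as UTC).
    // A missing or blank value means "no cutoff", i.e. every blob is processed.
    public static DateTimeOffset? Parse(string value) =>
        string.IsNullOrWhiteSpace(value)
            ? (DateTimeOffset?)null
            : DateTimeOffset.Parse(value, CultureInfo.InvariantCulture, DateTimeStyles.AssumeUniversal);

    // App settings surface as environment variables inside the Functions host.
    public static DateTimeOffset? FromEnvironment() =>
        Parse(Environment.GetEnvironmentVariable("ProcessFromUtc"));

    // True only when both timestamps are known and the blob predates the cutoff.
    public static bool ShouldSkip(DateTimeOffset? created, DateTimeOffset? cutoff) =>
        created.HasValue && cutoff.HasValue && created.Value < cutoff.Value;
}
```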

What about this additional parameter 'Start time'? Does this have anything to do with this? I can't find documentation about this parameter.
