Azure-webjobs-sdk: Blob Triggers and Support for Data Lake Storage Gen 2

Created on 16 Dec 2019 · 13 Comments · Source: Azure/azure-webjobs-sdk

Data Lake Storage Gen 2 has enabled support for multiple protocol access which should mean that blob triggers can work fully.

Can we confirm that customers would be supported for these scenarios?

https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-multi-protocol-access

Repro steps

  1. Create a Data Lake Storage account. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-quickstart-create-account

  2. Create a Function App that triggers from here.

Expected behavior

The blobs should be processed.

Actual behavior

The blobs are processed

Known workarounds

No workaround exists.

Related information

https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction#next-steps
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-multi-protocol-access

P2 Supportability

All 13 comments

@mattchenderson - do you know of any updates on this?

I've been doing a lot of googling to see if ADLS Gen2 can be used as a BlobTrigger to run Azure Functions. I couldn't find any clear explanation of this.

According to this doc page, ADLS Gen2 supports reactive events, yet it doesn't seem to be triggering my BlobTrigger Azure Functions.

Can we please have an official statement on this? Is this supported or not? If not, any estimated timeline for it to be supported?

Also interested to know if ADLS Gen2 is supported as a trigger or if it works using BlobTrigger.

@thinkgradient This is what I ended up doing:

I've set up a new Event on my ADLS Gen2 account for the BlobCreated event, sending the messages to an Azure Storage Queue.
Then I've created an Azure Storage Queue triggered function and that's it!
Works great.

Using EventGrid?

Yes, I've just set it up through the Events section of the ADLS Gen2 account in the Azure Portal.

Ah I see - of course. That makes sense. Thanks eparizzi :)
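
The Event Grid to Storage Queue workaround described above could be sketched in C# roughly as follows. This is only a minimal sketch, not the commenter's actual code: it assumes each queue message carries a single event in the Event Grid schema (an `eventType` field and a `data.url` field), and the `BlobCreatedEvent` class and `TryGetBlobUrl` method are hypothetical names introduced for illustration.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: extracts the blob URL from one Event Grid event.
public static class BlobCreatedEvent
{
    // Returns the blob URL when the message is a BlobCreated event,
    // or null for any other event type or unexpected message shape.
    public static string TryGetBlobUrl(string queueMessage)
    {
        using JsonDocument doc = JsonDocument.Parse(queueMessage);
        JsonElement root = doc.RootElement;

        // Assumption: one event object per queue message.
        if (root.ValueKind != JsonValueKind.Object)
            return null;

        if (!root.TryGetProperty("eventType", out JsonElement type) ||
            type.ValueKind != JsonValueKind.String ||
            type.GetString() != "Microsoft.Storage.BlobCreated")
            return null;

        if (!root.TryGetProperty("data", out JsonElement data) ||
            data.ValueKind != JsonValueKind.Object ||
            !data.TryGetProperty("url", out JsonElement url) ||
            url.ValueKind != JsonValueKind.String)
            return null;

        return url.GetString();
    }
}
```

A Storage Queue triggered function would pass its message text to `TryGetBlobUrl` and skip the message when the result is null.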

I have validated that Azure Functions now works with Azure StorageV2 accounts that have the Data Lake Storage Gen2 hierarchical namespace enabled, without any special configuration. Just as with blobs in accounts without hierarchical namespaces, where "/" characters in the name model directories, a trigger on a specific folder works when hierarchical namespaces are enabled (see the example below). I will resolve this issue and create an issue to track updating our documentation to mention this fact.

[FunctionName("Gen2BlobTrigger")]
public static void Run([BlobTrigger("datalakecontainer1/dir2/subdir1/{name}", Connection = "Gen2BlobTrigger_STORAGE")]Stream blobStream, string name, ILogger log) {}

Link to the Azure docs issue tracking the update to our docs to mention support for hierarchical namespaces in Storage V2 accounts:
https://github.com/MicrosoftDocs/azure-docs/issues/55824

Resolving the issue as no changes to the codebase are needed.

@sidkri,
Apologies for revisiting a closed issue, but I cannot find info anywhere on the following:

  1. Is there a way to scope the Function permissions down (from the DataLake container-level) to a namespace-level?

    • Adding the DataLake connection string as a Function trigger App Setting seems to spike the Function CPU whenever a file is added _to the DataLake_ even though the file is skipped for processing.

    • If I'm not mistaken, Function charges are based not only on executions, but also vCPU time, no?

    • Does this mean that DataLake namespace-triggered Functions will accrue higher charges than dedicated container-triggered Functions?

  2. This Azure Feedback item still shows that Gen2 DataLake Trigger is unsupported

    • Would be good to update and get the word out

Are there any updates on this? If the Azure blob trigger function does work with Data Lake Gen2, can anyone share code samples, via a link or a snippet in the comments? That would really help.
What am I trying to do?
I have a Data Lake container. Inside it are HDFS namespaces, e.g. "container/year/month/day/bunch of files". Files are uploaded on a daily basis, and the folder structure is dynamic, based on the current date. I need my Azure Function to trigger when files are uploaded to the day directory. Those files are then processed and the data is dumped to a SQL Server DB (C# code). My only problem is triggering the function on a dynamic directory (namespace level). Please help me or suggest how to approach this.

Thanks a million,
Shamshuddeen

@shamshuddeeen ,

  1. Set the function.json path value to container/year/month/day/{name}.
  2. Set the local.settings.json triggerStorageAccount to either:

    • The ADL Gen2 Connection String OR

    • A KeyVault reference to the Connection String

    • Unfortunately, this gives the Function full permissions on the ADL Storage Account.

    • No Access Control List nor Role is available to scope the Function's permissions down to the single directory (/bunch of files in your case)

Note: If I'm not mistaken, any file uploaded below /bunch of files will trigger the Function.
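
For the date-based directory layout in the question, one possible approach is to trigger on the whole container and split the blob name in code. The sketch below is illustrative only: it assumes the trigger path is `container/{name}` so that `name` carries the rest of the path including the "/" separators, and that the directories are zero-padded as `yyyy/MM/dd`. `DatedBlobPath` and `TryParse` are hypothetical names, not part of the SDK.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: splits "yyyy/MM/dd/<file>" into a date and a file name.
public static class DatedBlobPath
{
    // Returns false when the path does not start with a valid yyyy/MM/dd prefix.
    public static bool TryParse(string name, out DateTime day, out string fileName)
    {
        day = default;
        fileName = null;

        string[] parts = name.Split('/');
        if (parts.Length < 4)
            return false;

        // Assumption: year, month and day directories are zero-padded.
        string datePart = string.Join("/", parts[0], parts[1], parts[2]);
        if (!DateTime.TryParseExact(datePart, "yyyy/MM/dd",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out day))
            return false;

        // Everything after the day directory is treated as the file name.
        fileName = string.Join("/", parts, 3, parts.Length - 3);
        return true;
    }
}
```

Inside the blob-triggered function, the `name` parameter would be passed to `TryParse`, and blobs whose date is not the one of interest could simply be skipped. This does not reduce the permissions or the scanning scope discussed above; it only filters after the trigger fires.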

@ericthomas1, sorry for the late response, I missed the email notification of your initial message.

Regarding #1: for Consumption plan functions, resource consumption is calculated only from how long each function invocation lasts, not from activity such as checking trigger sources. For Dedicated and Premium plans, you pay for how long the workers/nodes are running. Documented here. Scanning of trigger sources is done on the workers/nodes as well as by other services that control scaling of the function app. This is why you can observe trigger source scanning activity when monitoring the worker/node on a Dedicated/Premium plan.

We monitor activity on a storage container by both polling for contents to detect new/updated items as well as scanning the storage analytics logs. I don't believe there is support for doing this at a "namespace" level.

For #2, thanks for surfacing the feedback item! I'll follow up to get that answered.
