How can we write an output table generated by a Databricks notebook to some sink (e.g. ADWH) using DataFactory V2.0? Can this be done using a copy activity in ADF or does this need to be done from within the notebook? How can the output dataset of the Databricks notebook be defined/accessed in ADF?
@JuleKuhn Thank you for your feedback. Could you link the URL of the documentation you were following? That way, we can pass your feedback to the appropriate content author.
I am following this article: https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-databricks-notebook
@JuleKuhn the actual writing of the data needs to be done within the notebook. You could use the SQL DW connector directly:
https://docs.azuredatabricks.net/spark/latest/data-sources/azure/sql-data-warehouse.html
You can also write to Blob storage as an intermediate step for a later ADF copy activity to SQL DW:
https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-storage.html
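For the first option, here is a minimal sketch of what the write inside the notebook might look like in Python. It assumes the connector's `com.databricks.spark.sqldw` format name and its `url`, `tempDir`, `forwardSparkAzureStorageCredentials` and `dbTable` options. The JDBC URL, storage path, and table name are placeholders, the helper names are made up for illustration, and `df` stands for whatever DataFrame your notebook produces.

```python
def sqldw_options(jdbc_url, temp_dir, table):
    """Collect the connector options for a SQL DW write (all values are placeholders)."""
    return {
        "url": jdbc_url,
        "tempDir": temp_dir,
        "forwardSparkAzureStorageCredentials": "true",
        "dbTable": table,
    }


def write_to_sqldw(df, options, mode="overwrite"):
    """Write a Spark DataFrame to SQL DW through the Databricks connector."""
    (df.write
       .format("com.databricks.spark.sqldw")
       .options(**options)
       .mode(mode)
       .save())


options = sqldw_options(
    jdbc_url="jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>",
    temp_dir="wasbs://<container>@<account>.blob.core.windows.net/tmp",
    table="dbo.notebook_output",
)
# In the notebook, once df exists: write_to_sqldw(df, options)
```

The connector stages data in the `tempDir` location, so the cluster also needs access to that storage account. Check the linked connector page for how to configure credentials.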
@JuleKuhn You can now send output from the notebook back to ADF. This feature has been enabled.
Use dbutils.notebook.exit("returnValue"), and you will see the 'runOutput' property in the response body.
@nabhishek any chance of including this feature in the doc?
Hi,
How do I pass that runOutput parameter to another Databricks notebook?
Thanks
Hello @sbjeletich. The way I use the output is with a Set Variable activity between the first and second Databricks notebook activities. I do this just to help with debugging. The expression I use is:
@{activity('Notebook1').output.runOutput}
The {curly braces} convert the value to a string, since my variable is of type string. I can then use the variable (and convert its type) in the parameters section of the next Databricks activity.
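On the receiving side, notebook parameters arrive as strings, so another option is to convert the type inside the second notebook. A small sketch, assuming a hypothetical base parameter named `row_count` and a made-up helper `read_int_param`:

```python
def read_int_param(raw, default=0):
    """Convert a notebook parameter (always a string) to int, using a default for blanks."""
    text = (raw or "").strip()
    return int(text) if text else default


# In the second notebook, with a base parameter named "row_count" (hypothetical):
# row_count = read_int_param(dbutils.widgets.get("row_count"))
```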
Perfect. Thanks!
@MartinJaffer-MSFT
After executing an embedded notebook via dbutils.notebook.run(), is there a way to return an output from the child notebook to the parent notebook? The ephemeral notebook job's output is unreachable by Data Factory.
@Paul92S
According to the Databricks docs, the output values are returned by the run call.
In the child notebook you would use dbutils.notebook.exit("returnValue")
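To make that concrete, here is a rough sketch of the parent side. It assumes the child notebook ends with `dbutils.notebook.exit(...)`. The helper names and the child path are made up for illustration, and `dbutils` is passed in explicitly only so the functions can be tried outside a notebook.

```python
def run_child(dbutils, path, timeout_seconds=600, arguments=None):
    """Run a child notebook and return the string it passed to dbutils.notebook.exit."""
    return dbutils.notebook.run(path, timeout_seconds, arguments or {})


def finish_parent(dbutils, child_result):
    """Exit the parent with the child's result so ADF sees it as runOutput."""
    dbutils.notebook.exit(child_result)


# In the parent notebook (the child path is a placeholder):
# result = run_child(dbutils, "./child_notebook")
# finish_parent(dbutils, result)
```

Because the parent exits with the child's value, Data Factory only ever needs to read the parent activity's runOutput.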
@nabhishek My output is a dataframe. How do I use the output in a Copy Data activity? I'd like to write the output dataframe as CSV to Azure Data Lake storage. I already added the dbutils.notebook.exit("returnValue") line to my notebook. Where do I use the @{activity('Notebook1').output.runOutput} string in the Copy Data activity?
Hi, in my use case I have to set many output values processed in a notebook and then use them in Data Factory. The runOutput is working, but what if I need more values? With this approach I can only retrieve one variable.
Another question: is there a way to set those variables without exiting the notebook?
Thanks!
I guess you have found it by now, but in case it helps someone else: if you want to pass multiple values back to ADF from Databricks, you can return a JSON object as a string and reference it from ADF:
dbutils.notebook.exit('{"an_object": {"name": {"value": "exciting"}}}')
and read in ADF:
@activity('Run Notebook - JSON Response').output.runOutput.an_object.name.value
You can also pass back lists of values. For a fuller list, see:
https://the.agilesql.club/2020/02/passing-status-messages-and-results-back-from-databricks-to-adf/
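Rather than typing the JSON string by hand, one way to build it is with `json.dumps`. This is only a sketch: the helper name, the field names (`status`, `row_count`, `tables`) and the values are made up for illustration.

```python
import json


def build_run_output(status, row_count, tables):
    """Serialize several values into one JSON string for dbutils.notebook.exit."""
    return json.dumps({
        "status": status,
        "row_count": row_count,
        "tables": tables,
    })


payload = build_run_output("succeeded", 1250, ["dbo.sales", "dbo.customers"])
# Last line of the notebook: dbutils.notebook.exit(payload)
```

Following the same pattern as the expression above, a field would then be read in ADF with something like @activity('Notebook1').output.runOutput.row_count, assuming the activity is named Notebook1.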
Hope it helps!
ed
Thank you! I had not seen this.