Azure-docs: Incomplete docs regarding environment of init script and absolute paths

Created on 31 May 2020 · 5 comments · Source: MicrosoftDocs/azure-docs

Greetings,
The `/` seen by the init script is NOT the same as the `/` seen by the DBFS CLI (`databricks fs ls <PATH>`).

Even if one follows the tutorial, neither the relative path `./` nor `dbfs:/` works from within the init script; one _needs_ an absolute path. At the same time, running `databricks fs cp <file> dbfs:/databricks/scripts/` does not make `<file>` accessible at `/databricks/scripts/<file>`; it is accessible at `/dbfs/databricks/scripts/<file>`!
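To make the mapping concrete, here is a minimal sketch (the helper name is ours, not part of the Databricks CLI) that translates a `dbfs:/` URI into the local path an init script sees, assuming the cluster exposes the DBFS root through the standard `/dbfs` FUSE mount:

```shell
# Hypothetical helper: map a dbfs:/ URI to the path visible on the
# cluster's local filesystem via the /dbfs FUSE mount.
dbfs_to_local() {
  # Strip the dbfs:/ scheme prefix, then prepend the mount point.
  printf '/dbfs/%s\n' "${1#dbfs:/}"
}

dbfs_to_local "dbfs:/databricks/scripts/postinstall.sh"
# -> /dbfs/databricks/scripts/postinstall.sh
```

So a file copied to `dbfs:/databricks/scripts/` must be referenced by the init script as `/dbfs/databricks/scripts/<file>`.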

This is utterly non-obvious and is not mentioned anywhere in the docs. Please add it to the docs.

> You can put init scripts in a DBFS directory accessible by a cluster. Init scripts in DBFS must be stored in the DBFS root. Azure Databricks does not support storing init scripts in a DBFS directory created by mounting object storage.

And yet all your examples use `/mnt` without context, and without mentioning that the init script is located under `/dbfs/databricks/scripts`.



Labels: Pri2, assigned-to-author, azure-databricksvc, doc-enhancement, triaged


All 5 comments

@snugghash Thanks for bringing this to our notice. We are investigating the issue and will update you shortly.

@mamccrea Can you please take a look at this?

Hi @snugghash! Thank you for your feedback. Can you please tell us a little more about your scenario? Which part of this article, specifically, is not working for you? The methods outlined in our docs for creating and configuring init scripts have all been tested and should work.

@mamccrea Hallo!
Yes, of course, everything listed in the article works. But if the intention of the article is to serve as an introduction while mentioning the caveats and gotchas of working with init scripts, as it does with the text quoted above, then this article doesn't cover all the bases. There is one non-obvious gotcha when working with the cluster's filesystem, specifically as seen by the init scripts.

The problem: one might assume that after running `databricks fs cp <file> dbfs:/databricks/scripts/` with the Databricks CLI, `<file>` would be accessible to the init script as `/databricks/scripts/<file>`, but it is unexpectedly accessible as `/dbfs/databricks/scripts/<file>`.
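In other words, an init script has to use the FUSE-mounted path. A short illustrative fragment (the file names here are hypothetical, and we assume the standard `/dbfs` mount) might look like this:

```shell
# Illustrative init-script fragment. A file uploaded with
#   databricks fs cp postinstall.conf dbfs:/databricks/scripts/
# is visible on the cluster only under the /dbfs FUSE mount,
# NOT at /databricks/scripts/... and NOT via a dbfs:/ URI.
CONF=/dbfs/databricks/scripts/postinstall.conf

if [ -f "$CONF" ]; then
  # Copy the uploaded config to local scratch space for this node.
  cp "$CONF" /tmp/postinstall.conf
else
  echo "missing $CONF" >&2
fi
```

The same rule applies to any file the init script reads or writes on DBFS: prefix the DBFS path with `/dbfs/`.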

I felt that this should be mentioned somewhere in the docs, but the only way I figured it out was trial and error with `databricks fs ls`, which cost me time. Since many developers use this, I think you'll save a lot of people time by adding a note describing the filesystem as seen by the machine the init script runs on.

Thank you very much for your time, effort, and conversation. I hope the problem is clearer this time around.

Hi @snugghash - thank you for the clarification. I have added to the note in the doc to mention that the path must begin with dbfs. Please allow a few days for the change to go live.

please-close
