Azure-docs: Provide documentation on Spark jobs on Kubernetes reading data from ADLS

Created on 22 Jan 2019 · 5 Comments · Source: MicrosoftDocs/azure-docs

Could you expand this documentation to cover how to run Spark jobs in AKS that read data from ADLS?

https://docs.microsoft.com/en-us/azure/aks/spark-job

Thank you

Document Details

⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.

ID: 5eaa2684-cde6-e260-5725-d0eea894889f
Version Independent ID: ed048e95-a087-50f5-b2c7-0c3b6be2b70f
Content: Run an Apache Spark job with Azure Kubernetes Service (AKS)
Content Source: articles/aks/spark-job.md
Service: container-service
GitHub Login: @lenadroid
Microsoft Alias: alehall

Pri2 assigned-to-author container-service/svc doc-enhancement triaged

ElianoMarques

All 5 comments

@ElianoMarques can you expand a bit on your ask? What exactly are you unsure of as it relates to this doc?

mimckitt on 23 Jan 2019

👍1

In your example, you run a spark-submit job that reads a jar from Azure Blob storage, but the example doesn't read data from anywhere, as it's the Pi example.

What would be very useful is documentation on running Spark jobs in Kubernetes that access data in ADLS. If you take out-of-the-box Spark 2.4.0, follow all the steps for building the Docker image, add the standard azure-data-lake jar and hadoop-azure-datalake jars to Spark, configure core-site.xml, and start Spark via spark-submit, pyspark, or sparkR, none of these options connects to ADLS. If you run Spark locally with the same settings, it works. Maybe some extra steps are needed to configure that connectivity. Also, it would be nice to see an example in the documentation with Jupyter.
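To make the setup described above concrete, here is a minimal sketch of the kind of configuration involved. It is not a confirmed fix for the connectivity problem. It assumes ADLS Gen1 (the `adl://` connector that ships in the hadoop-azure-datalake jar) with service-principal authentication, and it passes the Hadoop properties through Spark's `spark.hadoop.*` prefix instead of core-site.xml. One possible reason to try this, which is an assumption and not something established in this thread, is that in cluster mode the driver and executors run in pods that may not see a core-site.xml from the submitting machine. The account name, IDs, image name, API server address, and application path are all hypothetical placeholders.

```python
import shlex

# Hypothetical placeholder values; none of these come from the thread.
TENANT_ID = "<tenant-id>"
CLIENT_ID = "<service-principal-app-id>"
CLIENT_SECRET = "<service-principal-secret>"


def adls_hadoop_conf(tenant_id, client_id, client_secret):
    """Return Hadoop properties for ADLS Gen1 service-principal (OAuth2) auth."""
    return {
        # File system classes provided by the hadoop-azure-datalake jar.
        "fs.adl.impl": "org.apache.hadoop.fs.adl.AdlFileSystem",
        "fs.AbstractFileSystem.adl.impl": "org.apache.hadoop.fs.adl.Adl",
        # Client-credential (service principal) token provider settings.
        "fs.adl.oauth2.access.token.provider.type": "ClientCredential",
        "fs.adl.oauth2.client.id": client_id,
        "fs.adl.oauth2.credential": client_secret,
        "fs.adl.oauth2.refresh.url":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def spark_submit_args(master, image, app, hadoop_conf):
    """Build a spark-submit argument list for a Kubernetes cluster-mode job.

    Each Hadoop property is passed as --conf spark.hadoop.<key>=<value>,
    which Spark copies into the Hadoop configuration of the job.
    """
    args = [
        "spark-submit",
        "--master", master,
        "--deploy-mode", "cluster",
        "--conf", f"spark.kubernetes.container.image={image}",
    ]
    for key, value in sorted(hadoop_conf.items()):
        args += ["--conf", f"spark.hadoop.{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    conf = adls_hadoop_conf(TENANT_ID, CLIENT_ID, CLIENT_SECRET)
    args = spark_submit_args(
        master="k8s://https://<aks-api-server>:443",   # placeholder address
        image="<registry>/spark:2.4.0",                # placeholder image
        app="local:///opt/spark/app/read_adls.py",     # hypothetical app path
        hadoop_conf=conf,
    )
    # Print the command for inspection instead of executing it.
    print(" ".join(shlex.quote(a) for a in args))
```

The script only prints the command so it can be reviewed before running. With this connector, the job itself would address data with a URI of the form `adl://<account>.azuredatalakestore.net/<path>`. In practice the secret would more likely come from a Kubernetes secret than from the command line.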

Was this helpful?

Eliano

ElianoMarques on 23 Jan 2019

👍1

@ElianoMarques got it. Thank you for the further explanation :)

@lenadroid can you take a look?

CC @iainfoulds

mimckitt on 23 Jan 2019

Thanks for the suggestion, @ElianoMarques. I've created a backlog work item to track this doc suggestion. I don't have an ETA on when this doc may be published.

@MicahMcKittrick-MSFT For now, #please-close