Under the section Use SQL data, there is a code snippet on how to get access to a Microsoft SQL Server. The snippet calls a function to register a secret dprep.MSSQLDataSource()
, and the text following the snippet tells that the secret is used to pass the password to the function.
I can't get this to work, and there are some things I find unclear about this.
First, is password=[DATABASE-PASSWORD]
really correct inside the call to dprep.MSSQLDataSource()
? My guess is that it should really be password=secret
, right?
Second, there is no explanation to the function dprep.register_secret()
. What does it really do; does it register a new secret in some vault, or does it just obtain a previously registered secret? And what should be inserted instead of "[SECRET-USERNAME]"
and "[SECRET-PASSWORD]"
?
Thirdly, is it necessary to connect to a Workspace before doing these steps? It seems likely that the secret should be stored in the key vault created along with the workspace. However, when I navigate to the key vault in Azure Portal, I can see that I am authorized to use the key vault, even though I was the one that created the Workspace that cause the creation of the key vault in the first place. I managed to find a guide on how to add secrets using the CLI, but I only get Access denied
when I try, which seems reasonable due to the fact that I don't seem to be authorized to use the key vault.
I'm a little lost at this point. No matter on how I try to configure, I can only get so far as to get an error message saying ExecutionError: Login failed.
, so I would gladly take any help I can get to be able to connect to my MSSQL server.
Thanks in advance
⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.
@ArvidBaa Thanks for you details about the document, I have assigned the issue to the content author to investigate further and update the document as appropriate.
@cforbe Hi, can you please check this post, there are some unclear statements, can you please check it and also update the document as necessary? Thanks!
Looking into it now!
@cforbe Look forward to your response. Thanks!
Hi @ArvidBaa,
I am an engineer on the DataPrep team. These are good questions, I will try answering them:
password=[DATABASE-PASSWORD]
really correct inside the call to dprep.MSSQLDataSource()
? My guess is that it should really be password=secret
, right?”password="[DATABASE-PASSWORD]"
. We will resolve this.dprep.register_secret()
. What does it really do; does it register a new secret in some vault, or does it just obtain a previously registered secret? And what should be inserted instead of "[SECRET-USERNAME]"
and "[SECRET-PASSWORD]"
?”register_secret(value="[DATABASE-PASSWORD]", id="[DATABASE-USERNAME]")
. SECRET-USERNAME
and SECRET-PASSWORD
corresponds to DATABASE-USERNAME
and DATABASE-PASSWORD
, this is another documentation error on our end and we will fix it. The register_secret
method for now only registers the secret to our engine process and is only alive for the duration of engine process. If the engine process is restarted (e.g. restarting the notebook kernel), then all the registered secrets are gone. There is a reference notebook that explains our current secret story in more details. However, our read_sql
should be calling register_secrets
for you. I've filed a bug and we will fix it.Please let us know if you have any other questions.
Hi @rongduan-zhu,
You say that the document should say password="[DATABASE-PASSWORD]"
, but the data type for the method is not a string, but a azureml.dataprep.api.engineapi.typedefinitions.Secret
object (see the documentation). So I believe the true value should be password=secret
, where secret=register_secret(value="[DATABASE-PASSWORD]", id="[DATABASE-USERNAME]")
.
Thank you, this was just the information I needed to succeed in connecting to my MSSQL server. However, this raises a few questions, mostly curious ones.
a) Why is the database username passed twice to the function dprep.MSSQLDataSource()
? Both as user_name
and then inside the password
object.
b) Why is it called a secret, when by all means it is just a regular object containing two strings? Both the id
and the value
parameters are saved as strings, in plain sight for anyone reading the code. I have saved them as environment variables to prevent this, but it seems better to store (at least) the password in the key vault connected to the Workspace. I will follow any updates on this with great interest.
I will follow the updates with great interest. Thank you.
Again, thank you for the help. As I said, I have now managed to connect to the database, and can continue to work with the data preparations.
Oh yes, you are absolutely right for 1. My mistake.
Happy to hear you are unblocked now! Will update this thread again once all the issues have been resolved on our end.
@rongduan-zhu - please update.
Have there been any updates to my follow up questions above? Especially how to handle the secrets without writing them in plain text? I am at the moment trying to implement this kind of data preparation in a pipeline for my Machine Learning service, and when doing so I don't want my database credentials to be included as plain text in the scripts. I would like to store them somewhere secure, like in a key vault in Azure Portal (for example the one that is being created along with the workspace?!), or in some other secure location.
@ArvidBaa I am currently working on MS SQL Datastore support in Data Prep and we are targeting the next release (end of Jan). However, since datastore belongs to a workspace, you will need a workspace in order to use datastores.
The experience will be like this once the feature is complete:
import azureml.dataprep as dprep
from azureml.core import Workspace, Datastore
workspace = Workspace.from_config()
datastore = Datastore(workspace, 'sql_datastore')
dflow = dprep.read_sql(datastore, query='SELECT * FROM mytable')
@rongduan-zhu Ok, that looks promising. I am looking forward to the release including that functionality.
Thank you for the update!
Best regards
Arvid Bäärnhielm
@trevorbye - can you respond here please and update the document as needed?
@rongduan-zhu is this design pattern implemented in the current SDK version? If so, I will add it into our documentation.
Yes, please refer to this notebook on how to read from SQL. It's at the bottom of the __Read data from Datastore__ section.
Hope this helps. If it does, #please-close.
In the notebook, it is not explained how the database is actually connected to the workspace Datastore in the beginning. However, I assume the instructions given in this notebook is covering that?
Is this implementation enough for me to be able to connect to my MS SQL? I am getting a little lost in all different types of SQL in Azure. The Datastore
documentation tells me that there are methods for initializing two different types of SQL: Azure PostgreSQL and Azure SQL database. Is this enough?
@YutongTie-MSFT #please-close this as the remaining issue is with the notebook. I've moved this issue to that repo as referenced above. Thanks @ArvidBaa
Most helpful comment
Oh yes, you are absolutely right for 1. My mistake.
Happy to hear you are unblocked now! Will update this thread again once all the issues have been resolved on our end.