When trying to register a dataset with the flag create_new_version=True, I get an error instead of creating a new version.
I get the following error:
File "register.py", line 110, in register_dataset
ds.register(workspace, name=name, create_new_version=True)
File "/data/anaconda/envs/cyril/lib/python3.6/site-packages/azureml/data/_loggerfactory.py", line 106, in wrapper
return func(*args, **kwargs)
File "/data/anaconda/envs/cyril/lib/python3.6/site-packages/azureml/data/abstract_dataset.py", line 313, in register
raise result
Exception: An identical dataset had already been registered, which can be retrieved with `Dataset.get_by_name(workspace, name="cases_train_data.csv", version=1)`.
Here is my call to register the dataset:
from azureml.core import Dataset
ds = Dataset.Tabular.from_delimited_files(path)
ds.register(workspace, name=name, create_new_version=True)
SDK Version: azureml-sdk==1.2.0
@stemoor we will review your feedback and get back to you shortly. Thanks.
@stemoor Thanks for reaching out to us. We are investigating the case. Will reach back if we need more info.
@stemoor
Re-registering the same data (datastore + path) in the same workspace under a different name is not supported at the moment. It looks like you have already registered the same data as (name="cases_train_data.csv", version=1). If you want to register this data under a different name, you first need to unregister the dataset "cases_train_data.csv".
We are adding support to let users register the same data under different names. We will announce this in our release notes once the feature ships.
Thanks!
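To make the rule above concrete, here is a toy, in-memory model of the behavior described in this thread. It is only a sketch, not the azureml SDK: `ToyRegistry`, `DuplicateDatasetError`, and the datastore/path values are hypothetical, and it does not try to reproduce every service behavior. It models two rules: the same (datastore, path) cannot be registered under a second name, and an existing name only gets a new version when `create_new_version=True`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class DuplicateDatasetError(Exception):
    """Hypothetical error: the same (datastore, path) is already registered under another name."""


@dataclass
class ToyRegistry:
    # Hypothetical toy model, NOT the azureml SDK.
    # name -> list of (datastore, path) sources; index 0 is version 1.
    versions: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def register(self, name: str, datastore: str, path: str,
                 create_new_version: bool = False) -> int:
        source = (datastore, path)
        # Rule from the thread: identical data may not appear under a second name.
        for other_name, sources in self.versions.items():
            if other_name != name and source in sources:
                version = sources.index(source) + 1
                raise DuplicateDatasetError(
                    f"An identical dataset had already been registered as "
                    f"name={other_name!r}, version={version}")
        if name in self.versions:
            # Same name: only allowed when the caller asks for a new version.
            if not create_new_version:
                raise ValueError(f"{name!r} exists; pass create_new_version=True")
            self.versions[name].append(source)
        else:
            self.versions[name] = [source]
        # Return the version number assigned to this registration.
        return len(self.versions[name])


if __name__ == "__main__":
    reg = ToyRegistry()
    print(reg.register("cases_train_data.csv", "my_datastore", "cases/train.csv"))
    print(reg.register("cases_train_data.csv", "my_datastore", "cases/train_v2.csv",
                       create_new_version=True))
```

In this model, calling `register` with the same source under a misspelled name raises `DuplicateDatasetError`, even when `create_new_version=True`, which mirrors the error reported at the top of the thread.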
Thanks for the updates @MayMSFT .
@stemoor we will now proceed to close this thread. Thanks.
Okay, so if I want to update the data, how does that work? If I update the data on the datastore, is the update reflected in the dataset? Or, every time I update the data on the datastore, do I have to unregister the dataset and then re-register it? My data is always updating, so I need to know the proper way to handle this case.
Please check out dataset versioning which may be applicable to your scenario. You can specify 'create_new_version = True' when registering a new version. Hope this helps. Thanks.
Your suggestion is exactly what I tried, and it produced the error above: I registered under the same name using the flag 'create_new_version = True'.
@stemoor apologies. I will reopen this thread.
@MayMSFT the document states that 'You can register multiple datasets under the same name and retrieve a specific version by name and version number.' Can you help clarify?
Never mind! I'm sorry: I thought I was registering the same dataset under the same name, but my name was off by a single letter. This is all resolved! Thanks for all your time!
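Since the root cause here was a name that differed by one letter, a quick pre-registration check for near-miss names could catch this kind of slip. The sketch below uses the standard library's `difflib.get_close_matches`; the helper name `near_miss_names` and the 0.9 cutoff are illustrative assumptions. In the SDK, the list of existing names would presumably come from the workspace's registered datasets (for example the keys of `workspace.datasets`); here a plain list stands in.

```python
import difflib
from typing import Iterable, List


def near_miss_names(name: str, existing: Iterable[str], cutoff: float = 0.9) -> List[str]:
    """Hypothetical helper: return registered names very similar to, but not equal to, `name`."""
    # Drop the exact name so that only near misses are reported.
    candidates = [n for n in existing if n != name]
    # Up to 3 candidates whose similarity ratio is at least `cutoff`.
    return difflib.get_close_matches(name, candidates, n=3, cutoff=cutoff)


if __name__ == "__main__":
    registered = ["case_train_data.csv", "labels.csv"]
    print(near_miss_names("cases_train_data.csv", registered))
```

A non-empty result is a hint to double-check the name before calling `register`; a high cutoff keeps unrelated names such as `labels.csv` out of the result.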