The documentation says that we can attach remote VMs such as _"an Azure VM, a remote server in your organization, or on-premises. Specifically, given the IP address and credentials (user name and password, or SSH key), you can use any accessible VM for remote runs."_
However, both the studio and the provided code snippet require a resource ID:
```python
from azureml.core import Workspace
from azureml.core.compute import RemoteCompute, ComputeTarget

# Load the workspace (assumes a config.json is available locally)
ws = Workspace.from_config()

# Create the compute config
compute_target_name = "attach-dsvm"
attach_config = RemoteCompute.attach_configuration(resource_id="<resource_id>",
                                                   ssh_port=22,
                                                   username="<username>",
                                                   password="<password>")

# Attach the compute
compute = ComputeTarget.attach(ws, compute_target_name, attach_config)
compute.wait_for_completion(show_output=True)
```
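For context, my understanding is that the `<resource_id>` placeholder expects the Azure Resource Manager ID of an Azure VM, which an on-premises server would not have. Below is a minimal sketch of that ID format, assuming the standard `Microsoft.Compute/virtualMachines` layout. The helper names `build_vm_resource_id` and `looks_like_vm_resource_id` are hypothetical and not part of the SDK.

```python
import re

# Standard ARM resource ID layout for an Azure virtual machine (assumed format).
_VM_ID_TEMPLATE = (
    "/subscriptions/{subscription_id}"
    "/resourceGroups/{resource_group}"
    "/providers/Microsoft.Compute/virtualMachines/{vm_name}"
)

# Loose shape check only; it does not verify that the VM actually exists.
_VM_ID_PATTERN = re.compile(
    r"^/subscriptions/[^/]+/resourceGroups/[^/]+"
    r"/providers/Microsoft\.Compute/virtualMachines/[^/]+$",
    re.IGNORECASE,
)


def build_vm_resource_id(subscription_id, resource_group, vm_name):
    """Hypothetical helper: assemble an ARM resource ID for an Azure VM."""
    return _VM_ID_TEMPLATE.format(
        subscription_id=subscription_id,
        resource_group=resource_group,
        vm_name=vm_name,
    )


def looks_like_vm_resource_id(value):
    """Hypothetical helper: True if value has the shape of a VM resource ID."""
    return bool(_VM_ID_PATTERN.match(value))
```

A bare IP address or hostname such as `10.0.0.4` fails the shape check, which is the crux of the question: a server outside Azure has an address and credentials but no resource ID to pass.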
As such, I was wondering if it's possible to do as the documentation suggests and attach our SSH-accessible servers to AML.
Hi, is there any clarification on this? If it's not supported as described, I would be happy to try to contribute it.
@jacobdanovitch While it was (and maybe still is) supported (though I haven't tested it) as per here, it looks like this is being deprecated. Apparently the new method still works, though it's just undocumented, and all the examples that I've seen focus on a resource_id as per your example. It doesn't look like we can submit a PR to improve this (as I think the docs are auto-generated from the source).
The main question for me is whether remote compute is still in the long term vision for Azure ML, or if it's going to be fully deprecated. I can understand why, but it'd be good to know so I can stop recommending it to customers with on-prem compute they wish to use.
the long-term solution for on-prem compute is through Azure Arc support, which will enable attaching k8s clusters to AML for job submission
non-AML team members unfortunately cannot contribute to our examples, docs, SDK, etc. today, but most of it will be open source in v2
That's a great catch, I knew I'd seen something like that at one point. In fact, while your link to the new method seems to suggest it works, I tried it and got this:
Content: b'{"error":{"code":"NotSupported","message":"Attach by address is no longer supported. Make sure address is removed and resourceId is included as a property (properties.resourceId)."
So it seems as though remote compute is no longer possible, unless there's another way I'm unaware of? That would be a huge shame. @kodonnell Have you run into that error?
In my use case, I have a ton of data in blob storage that I'm processing with HDI/Databricks, which I then have to download to train on a SLURM cluster. I understand the reasoning, but realistically, this would lead to me forgoing Azure entirely and setting up Spark on the cluster as well. I mean, I'd love to use AMLCompute for everything, but that isn't feasible for doing large experiments as one person.
> the long-term solution for on-prem compute is through Azure Arc support, which will enable attaching k8s clusters to AML for job submission
> non-AML team members unfortunately cannot contribute to our examples, docs, SDK, etc. today, but most of it will be open source in v2
I hadn't heard of Arc, neat! I'm mostly clueless on k8s; will this function similarly to the original way, where we can just attach any generic server(s)? Is there a rough timeline?
That's great to hear that it's in the plans and really awesome to hear that it'll be open sourced to whatever extent possible.
no clue on rough timeline or exactly how it'll function - I've posted this issue internally to the AML Compute team, hopefully they can provide guidance
> @kodonnell Have you run into that error?
No, sorry - I've only heard that the new method works from a colleague. Have you tried the old method? It's deprecated, but may still work.
> the long-term solution for on-prem compute is through Azure Arc support ...
> ... most of it will be open source in v2
Exciting!
Bringing on-prem K8s clusters will be supported through Azure Arc. AML will be able to install an operator on AKS and Arc clusters, and data scientists will be able to set these as compute targets and run training jobs (TensorFlow, PyTorch, MPI, scikit-learn, etc.) using the AML SDK. We started a private preview this week, which requires subscription allowlisting, but we are not accepting many customers for the private preview. Public preview is planned for the end of Q1 2021.