I'm using the following Python tuning script:
hpo.py
# Import libraries
import time
from lightgbm import LGBMClassifier
import pandas as pd
import ray
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from tune_sklearn import TuneSearchCV
# Start timer
start = time.time()
# Initialize Ray
ray.init(address='auto')
# Load breast cancer dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target
# Split into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize estimator
model = LGBMClassifier()
# Initialize parameter distributions
param_dists = {
    'boosting_type': ['gbdt'],
    'colsample_bytree': (0.8, 0.9, 'log-uniform'),
    'reg_alpha': (1.1, 1.3),
    'reg_lambda': (1.1, 1.3),
    'min_split_gain': (0.3, 0.4),
    'subsample': (0.7, 0.9),
    'subsample_freq': (20, 21)
}
# Initialize tuner
tuner = TuneSearchCV(
    model,
    param_dists,
    n_iter=20,
    scoring='f1_weighted',
    n_jobs=-1,
    verbose=2,
    max_iters=10,
    search_optimization='bayesian',
    use_gpu=True,
)
# Tune hyperparameters
tuner.fit(X_train, y_train)
print('Best Parameters :', tuner.best_params_)
# Get cross-validated results
df_cv = pd.DataFrame(tuner.cv_results_)
# Predict using best hyperparameters
y_pred = tuner.predict(X_test)
print('F1 Score:', f1_score(y_test, y_pred, average='weighted'))
# Get elapsed time
end = time.time()
print('Elapsed Time :', (end - start))
# Shutdown Ray
ray.shutdown()
and the following configuration file:
tune-default-hpo.yaml
cluster_name: tune-default
provider: {type: aws, region: us-east-2}
auth: {ssh_user: ubuntu}
min_workers: 3
max_workers: 3
head_node:
    InstanceType: c5.xlarge
    ImageId: ami-08bf49c7b3a0c761e
    # Run the head node on spot by default. Comment this out to use on-demand.
    InstanceMarketOptions:
        MarketType: spot
        SpotOptions:
            MaxPrice: "1"  # Max Hourly Price
# Provider-specific config for worker nodes, e.g. instance type.
worker_nodes:
    InstanceType: m5.large
    ImageId: ami-08bf49c7b3a0c761e
    # Run workers on spot by default. Comment this out to use on-demand.
    InstanceMarketOptions:
        MarketType: spot
        SpotOptions:
            MaxPrice: "1"  # Max Hourly Price
setup_commands: # Set up each node.
- pip install lightgbm ray scikit-optimize torch torchvision tabulate tensorboard tune_sklearn
Command: ray submit tune-default-hpo.yaml hpo.py --start --stop
Questions:
Thanks for making this issue @rohan-gt!
Ray is not terminating the spot instances if the connection is lost. Is there a way to configure Ray to terminate them automatically?
What do you mean by "the connection is lost"? Maybe ray submit --stop?
Also is there a python API to deploy the instances instead of running the command on the terminal?
Unfortunately, it's not public-facing.
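Since the deployment API is not public-facing, one workaround is to drive the same CLI invocation from Python via subprocess. A minimal sketch; the build_submit_cmd helper is my own name for illustration, not part of Ray:

```python
import subprocess

def build_submit_cmd(config_file, script, start=True, stop=True):
    """Hypothetical helper: assemble the `ray submit` invocation as a
    list of arguments, mirroring the terminal command used above."""
    cmd = ['ray', 'submit', config_file, script]
    if start:
        cmd.append('--start')  # launch the cluster if it isn't running
    if stop:
        cmd.append('--stop')   # tear it down when the job finishes
    return cmd

cmd = build_submit_cmd('tune-default-hpo.yaml', 'hpo.py')
print(cmd)
# To actually launch from Python, uncomment:
# subprocess.run(cmd, check=True)
```

This just shells out to the ray executable, so it behaves exactly like running the command by hand, including the caveats about lost connections discussed below.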
How do I configure Ray to deploy on a single spot instance and not a cluster?
min_workers: 0
max_workers: 0
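With those two settings, the "cluster" is just the head node. An untested sketch of a full single-spot-instance config along those lines, reusing the instance type and AMI from the config above:

```yaml
cluster_name: tune-single
provider: {type: aws, region: us-east-2}
auth: {ssh_user: ubuntu}
# No workers: everything runs on the head node.
min_workers: 0
max_workers: 0
head_node:
    InstanceType: c5.xlarge
    ImageId: ami-08bf49c7b3a0c761e
    # Request the head node itself as a spot instance.
    InstanceMarketOptions:
        MarketType: spot
        SpotOptions:
            MaxPrice: "1"
```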
How can I set the storage size for the spot instances?
worker_nodes:
    InstanceType: m4.16xlarge
    ImageId: ami-0def3275  # Default Ubuntu 16.04 AMI.
    # Set primary volume to 250 GiB
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 250
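Presumably the same BlockDeviceMappings stanza works under head_node as well, if the head node's disk also needs to grow. A sketch mirroring the worker example:

```yaml
head_node:
    InstanceType: c5.xlarge
    ImageId: ami-08bf49c7b3a0c761e
    # Set the head node's primary volume to 250 GiB
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 250
```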
@richardliaw I tried deploying the cluster from my local PC using the following command:
ray submit tune-default-hpo.yaml hpo.py --start --stop
But if the submitted Python script has an error, if I close the terminal mid-execution, or if my internet connection is lost, the spot instance cluster is not shut down.
I later tried running ray submit --stop, which gives the error Error: Missing argument 'CLUSTER_CONFIG_FILE'., while ray submit tune-default-hpo.yaml hpo.py --stop errors out with Command 'ray' not found, did you mean: because I had closed the terminal before Ray was installed on the head node.
Is there a way to:
Sorry for the slow reply:
ray submit tune-default-hpo.yaml hpo.py --start --tmux --stop
allows you to tear down the cluster automatically after the job is finished AND does not require the internet connection on your laptop to stay up.
Does that help?
Hmm, this still requires the ray package to be present on the head node before it can shut down, right? Because when I closed the terminal mid-way through the pip installation of the ray package, it didn't auto-shutdown. But if that's how it is, that's good enough.
Great point; yeah, to clarify, this requires Ray to have started on the head node (so ray up and ray start need to have finished).