Hi, I am wondering about the future of CNTK. Is CNTK still being used for training and inference? With new Windows ML and ONNX, please let us know the future roadmap of CNTK.
I think if you view the computation graph as C++ source code, you get the following analogy:
CNTK/Caffe2/PyTorch/MXNet/TF --> IDE for writing C++ source code
ONNX --> the C++ standard
Windows ML/TensorRT/etc --> the C++ compiler and executor for the compiled graph on particular platforms
The vision is that model training and inference will become services in the cloud. They should be decoupled from the toolkit users choose to compose their models. We will continue to improve CNTK as a model creation toolkit, which today includes training and inference functions. As an open source product, CNTK will stick to a principle we set on Day One: share with the world exactly the same toolkit Microsoft uses internally. CNTK users can create and train their models with a reasonable amount of training data on their own GPU cluster, and run inference directly in CNTK with reasonable performance.
We also realize that more features are needed to turn DNN models into full-fledged products. There are general requirements in graph optimization, quantization, device adaptation, and scalability for large-scale GPU parallel training and data storage/processing. Though CNTK already has some of these features, the engineering complexity goes beyond what a single team can handle. To better align engineering efforts inside and outside Microsoft, it is more important to set up a standard for model representation and create an ecosystem on top of it.
From the customers' point of view, what matters most is how conveniently and quickly they can try out different models and deploy them as products. Those models may be existing models modified by users, or models built from scratch with toolkits like CNTK. Users would go through an interactive or fully automated process of large-scale training of the model on their own data, tweaking the training hyperparameters along the way. Once the model format is standardized, this process can and should be decoupled from the model creation toolkit. The benefit of ONNX is that users can leverage optimizations for the computation graph in the model regardless of how the model was created. For example, you may have an RNN model created with PyTorch and stored in ONNX format, then load the ONNX model and train it with CNTK using the variable-length sequence packing technique. After the model is trained, users can further optimize it with quantization for inference speed/latency. Once the model is ready, users may deploy it as a service in the cloud with high scalability and availability, or use it on their devices with ONNX support from IHVs.
With a standard for model representation, different vendors can provide their optimizations in either software or hardware, regardless of which toolkit was used. That's why lots of companies are involved in the ONNX effort. Windows ML is going to be a system built on top of ONNX, and with support from different vendors it should offer optimal inference performance on CPU/GPU/FPGA/ASIC devices on Windows platforms, from desktop to IoT devices. In addition, Windows ML extends beyond the ONNX spec to support traditional ML techniques such as SVMs and tree models, as ONNX is currently focused on DNNs.
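To make the decoupling idea above concrete, here is a minimal toy sketch in plain Python. It does not use ONNX, CNTK, or Windows ML: the JSON layout, the `toy-graph-v1` tag, and the functions `export_model`, `run_model`, and `quantize_weights` are all hypothetical names invented for illustration. It only shows the shape of the workflow: one component writes a model in a shared format, a separate optimizer rewrites it (here with simulated symmetric weight quantization), and a runtime that knows only the format can execute either version.

```python
import json


def export_model(weights, bias):
    """'Toolkit' side: serialize a one-layer linear model to the shared format.

    The JSON layout is a made-up stand-in for a standard such as ONNX.
    """
    return json.dumps({
        "format": "toy-graph-v1",
        "nodes": [
            {"op": "MatVec", "weights": weights},
            {"op": "Add", "bias": bias},
        ],
    })


def run_model(model_json, x):
    """'Runtime' side: execute any model in the shared format, node by node."""
    model = json.loads(model_json)
    out = list(x)
    for node in model["nodes"]:
        if node["op"] == "MatVec":
            out = [sum(w * v for w, v in zip(row, out)) for row in node["weights"]]
        elif node["op"] == "Add":
            out = [v + b for v, b in zip(out, node["bias"])]
        else:
            raise ValueError("unsupported op: " + node["op"])
    return out


def quantize_weights(model_json, levels=127):
    """'Optimizer' side: snap weights to a symmetric integer grid.

    This is simulated quantization: weights are rounded to multiples of a
    scale but stored as floats, so the format and the runtime stay unchanged.
    """
    model = json.loads(model_json)
    for node in model["nodes"]:
        if node["op"] == "MatVec":
            max_abs = max(abs(w) for row in node["weights"] for w in row)
            scale = (max_abs / levels) if max_abs > 0 else 1.0
            node["weights"] = [
                [round(w / scale) * scale for w in row] for row in node["weights"]
            ]
    return json.dumps(model)


model = export_model([[1.0, 2.0], [0.5, -1.0]], [0.1, 0.2])
optimized = quantize_weights(model)
print(run_model(model, [1.0, 1.0]))
print(run_model(optimized, [1.0, 1.0]))
```

The point of the sketch is that `quantize_weights` and `run_model` never need to know which toolkit produced the model; they depend only on the shared format, which is the role ONNX plays in the description above.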
@KeDengMS Thank you for the detailed response.