Does MXNet have plans to support model parallelism in a distributed environment? That is, splitting certain layers into sub-layers that run across different machines. This introduces much larger communication cost, but when a model cannot fit on a single machine it is a useful feature.
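To make "splitting a layer into sub-layers" concrete, here is a toy sketch (not MXNet code): a fully connected layer's weight matrix is split by output columns across hypothetical workers, each worker computes its slice of the output, and the slices are gathered. Plain Python lists stand in for tensors, and the function names are illustrative only.

```python
# Toy illustration of intra-layer model parallelism: a fully connected layer's
# weight matrix is split by output columns across hypothetical "workers".
# Plain Python lists stand in for tensors; no MXNet API is used here.

def matvec(x, w):
    """Compute y = x @ w for a vector x (len n) and matrix w (n rows, m cols)."""
    cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(cols)]

def split_columns(w, parts):
    """Split matrix w into `parts` column blocks of (nearly) equal width."""
    cols = len(w[0])
    bounds = [cols * k // parts for k in range(parts + 1)]
    return [[row[bounds[k]:bounds[k + 1]] for row in w] for k in range(parts)]

def model_parallel_fc(x, w, workers):
    """Each worker holds one column block and computes its slice of the output;
    the slices are then concatenated (in a real cluster this gather is the
    extra communication step)."""
    shards = split_columns(w, workers)
    partial_outputs = [matvec(x, shard) for shard in shards]  # one per worker
    return [v for part in partial_outputs for v in part]      # concatenate

x = [1.0, 2.0, 3.0]
w = [[1, 0, 2, 1],
     [0, 1, 1, 3],
     [2, 1, 0, 1]]
print(model_parallel_fc(x, w, workers=2) == matvec(x, w))  # True
```

Each worker only needs to store its own column block of `w`, which is what makes this useful when the full weight matrix does not fit on one machine; the price is that every worker needs the full input `x` and the outputs must be gathered.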
@sxjscience Could this implementation run in a distributed environment?
Another question: this kind of model parallelism assigns different layers to different devices, with a pipeline connecting the devices. If I want both data parallelism and model parallelism in such a pipeline, for example 3 machines for the LSTM encoder and 3 machines for the LSTM decoder, could MXNet support this kind of hybrid parallelism?
Additionally, could this kind of pipeline design be generalized to CNNs/DNNs? For example, the fully connected layers could be assigned to some machines, while the other layers are assigned to other machines that use data parallelism.
Sorry, I misinterpreted the question. The example is not for a distributed environment. But I think we can build such a pipeline using the current KVStore/Executor APIs.
Multi-machine model parallelism is doable with the low-level API (Executor + KVStore), but with modern GPUs it is usually unnecessarily complicated and underperforms, except perhaps in a few very special cases. Recommendation systems and matrix factorization are one such case. I suggest you talk with @xlvector if you are interested.
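A rough illustration of the Executor + KVStore pipeline idea, under loud assumptions: `ToyKVStore` and `Stage` are hypothetical in-process stand-ins (not MXNet's `KVStore` or `Executor` API), each stage plays the role of one machine holding one layer, and activations travel between stages via push/pull on the store. Only the forward pass is shown; a real pipeline would also send gradients back through the store in the reverse direction.

```python
# Toy sketch of pipelined (layer-wise) model parallelism: each stage owns one
# layer and hands activations to the next stage through a shared key-value
# store. The in-process dict below is a hypothetical stand-in for a parameter
# server / KVStore; it does not use MXNet's API.

class ToyKVStore:
    """Minimal push/pull store keyed by strings (hypothetical stand-in)."""
    def __init__(self):
        self._data = {}

    def push(self, key, value):
        self._data[key] = list(value)

    def pull(self, key):
        return list(self._data[key])

def relu(v):
    return [max(0.0, a) for a in v]

def dense(v, w, b):
    """y = v @ w + b, with w given as len(v) rows of length len(b)."""
    return [sum(v[i] * w[i][j] for i in range(len(v))) + b[j] for j in range(len(b))]

class Stage:
    """One pipeline stage: a hypothetical 'machine' holding one dense+ReLU layer."""
    def __init__(self, name, w, b, store):
        self.name, self.w, self.b, self.store = name, w, b, store

    def forward(self, in_key, out_key):
        x = self.store.pull(in_key)          # receive activations from previous stage
        y = relu(dense(x, self.w, self.b))   # compute this stage's layer locally
        self.store.push(out_key, y)          # send activations to the next stage

store = ToyKVStore()
stage1 = Stage("machine-A", w=[[1.0, -1.0], [0.5, 2.0]], b=[0.0, 0.0], store=store)
stage2 = Stage("machine-B", w=[[1.0], [1.0]], b=[-1.0], store=store)

store.push("input", [2.0, 1.0])
stage1.forward("input", "act1")
stage2.forward("act1", "output")
print(store.pull("output"))  # [1.5]
```

In a real multi-machine setup every push/pull between stages is a network transfer, which is why this approach tends to lose to data parallelism unless the model simply does not fit on one machine.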
Even with recommendation systems and matrix factorization, data parallelism is still preferred because it performs better.
@mli But what about situations where the model does not fit on a single machine? In many cases model parallelism is used not for performance but because of model size.
This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!