When running in distributed mode on two servers, the following error appears.
node1:18225:18239 [0] INFO NET/IB : Using interface eth1 for sideband communication
node1:18225:18239 [0] INFO Using internal Network Socket
node1:18225:18239 [0] INFO Using NCCL Low-latency algorithm for sizes below 16384
node1:18226:18240 [1] INFO NET : Using interface eth1:100.102.32.176<0>
node1:18226:18240 [1] INFO NET/IB : Using interface eth1 for sideband communication
node1:18226:18240 [1] INFO Using internal Network Socket
node1:18226:18240 [1] INFO Using NCCL Low-latency algorithm for sizes below 16384
node1:18225:18239 [0] INFO NET : Using interface eth1:100.102.32.176<0>
node1:18225:18239 [0] INFO NET/Socket : 1 interfaces found
node1:22671:22795 [1] INFO NET : Using interface eth1:100.102.32.175<0>
node1:22671:22795 [1] INFO NET/Socket : 1 interfaces found
node1:18226:18240 [1] INFO NET : Using interface eth1:100.102.32.176<0>
node1:18226:18240 [1] INFO NET/Socket : 1 interfaces found
node1:22670:22794 [0] INFO Using 256 threads
node1:22670:22794 [0] INFO Min Comp Cap 5
node1:22670:22794 [0] INFO NCCL_SINGLE_RING_THRESHOLD=131072
node1:22670:22794 [0] INFO [0] Ring 0 : 0 1 2 3
node1:22671:22795 [1] INFO 1 -> 0 via P2P/IPC
node1:18225:18239 [0] INFO 2 -> 1 via P2P/IPC
node1:22671:22795 [1] INFO 1 -> 2 via P2P/IPC
node1:18226:18240 [1] INFO 3 -> 2 via P2P/IPC
node1:18226:18240 [1] INFO 3 -> 0 via P2P/IPC
node1:18225:18239 [0] INFO 2 -> 3 via P2P/IPC
node1:22670:22794 [0] INFO 0 -> 3 via P2P/IPC
node1:22670:22794 [0] INFO 0 -> 1 via P2P/IPC
node1:22671:22795 [1] transport/p2p.cu:431 WARN failed to open CUDA IPC handle : 30 unknown error
node1:22671:22795 [1] INFO transport/p2p.cu:441 -> 1
node1:22671:22795 [1] INFO init.cu:462 -> 1
node1:22671:22795 [1] INFO init.cu:517 -> 1
node1:18226:18240 [1] transport/p2p.cu:431 WARN failed to open CUDA IPC handle : 30 unknown error
node1:18226:18240 [1] INFO transport/p2p.cu:441 -> 1
node1:18226:18240 [1] INFO init.cu:462 -> 1
node1:18226:18240 [1] INFO init.cu:517 -> 1
node1:18225:18239 [0] transport/p2p.cu:431 WARN failed to open CUDA IPC handle : 30 unknown error
node1:18225:18239 [0] INFO transport/p2p.cu:452 -> 1
node1:18225:18239 [0] INFO init.cu:463 -> 1
node1:18225:18239 [0] INFO init.cu:517 -> 1
Seems like same issue as https://github.com/uber/horovod/issues/110#issuecomment-347754834 - do both of your servers have the same host name?
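One way to check this, as a rough sketch: run `hostname` (or the `socket.gethostname()` call below) on each server and compare the results. In the log above, both 100.102.32.175 and 100.102.32.176 report as `node1`, which would fit the duplicate host name explanation. The helper `find_duplicate_hostnames` is a hypothetical name used only for illustration; it is not part of Horovod or NCCL.

```python
import socket
from collections import Counter


def find_duplicate_hostnames(host_by_address):
    """Return host names that are reported by more than one server address.

    host_by_address maps a server address (collected by hand) to the
    host name that server reports.
    """
    counts = Counter(host_by_address.values())
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Run this on each server and note the value it prints.
    print(socket.gethostname())

    # Addresses taken from the log above; the host names are what each
    # server appears to report there.
    reported = {
        "100.102.32.175": "node1",
        "100.102.32.176": "node1",
    }
    print(find_duplicate_hostnames(reported))
```

If the list is non-empty, giving each server a distinct host name is the first thing to try.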
@alsrgv
Thanks, I get it!
Thank you very much, that helped a lot.