It seems to me that if I have a peerinfo with two addresses, only one of which is reachable, a call to dialbypeerinfo succeeds only about 50% of the time: when the 'good' address happens to be attempted first. If the bad address is attempted first, I just get a dial error. I was under the impression that libp2p would try all options until one succeeds.
I have had this issue with addresses that fail completely, e.g. local addresses: the dial fails only after a websocket creation timeout, which is pretty long! I also see it with addresses that partially fail, e.g. where the websocket connection does not upgrade. At least in those cases the failure is prompt, instead of a wait for the timeout.
I believe the issue is in libp2p-swarm/transport.js, where at line 56 the result of multiaddrs.shift() is passed to next(), which attempts to dial. This is done only once, so only the first multiaddr is dialled. A comment there makes it seem like the writer expected all dials to happen in sequence. I think it should instead loop over the multiaddrs, attempting to dial each one and calling back with the first success.
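To illustrate the behaviour I mean, here is a rough, self-contained sketch. It is not the actual libp2p-swarm code: `dialFirstSuccess`, `dialOne` and `demoDial` are hypothetical names made up for this example, and the addresses are plain strings rather than real multiaddrs. It only assumes the usual Node-style `(err, result)` callback convention.

```javascript
// Hypothetical helper, not the real libp2p-swarm implementation.
// dialOne(addr, cb) stands in for whatever dials a single address.
function dialFirstSuccess (multiaddrs, dialOne, callback) {
  const remaining = multiaddrs.slice() // copy, so the caller's list is not mutated
  const errors = []

  function next () {
    if (remaining.length === 0) {
      // Every address was tried and none succeeded.
      const err = new Error('all dial attempts failed')
      err.errors = errors
      return callback(err)
    }

    const addr = remaining.shift()
    dialOne(addr, (err, conn) => {
      if (err) {
        errors.push(err)
        return next() // try the next address instead of giving up
      }
      callback(null, conn) // first success wins
    })
  }

  next()
}

// Stand-in dialer for demonstration: only the address 'good' connects.
function demoDial (addr, cb) {
  if (addr === 'good') return cb(null, { addr: addr })
  cb(new Error('could not dial ' + addr))
}

dialFirstSuccess(['bad', 'good'], demoDial, (err, conn) => {
  if (err) return console.error(err.message)
  console.log('connected via', conn.addr)
})
```

With this shape the order of the addresses no longer decides whether the dial succeeds, only how long it takes. Dialling in sequence still means a slow timeout on a bad address delays the good one; attempting them in parallel would avoid that, at the cost of extra connections to clean up.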
See https://github.com/libp2p/js-libp2p-swarm/pull/193 where I have fixed this.
Thanks for catching this @jackkleeman. It seems there might have been a regression during some swarm refactoring; the intended behaviour is for it to try all addresses.
No problem! Fortunately, it was an easy fix.
This has been fixed. Use js-ipfs master to benefit from these changes :)