The application starts normally, but as soon as the IPFS js node gets initialized, the application throws a critical error with the following:
App starting..
events.js:165
throw er; // Unhandled 'error' event
^
Error: websocket error
at WS.Transport.onError (/path/to/application/node_modules/engine.io-client/lib/transport.js:64:13)
at WebSocket.ws.onerror (/path/to/application/node_modules/engine.io-client/lib/transports/websocket.js:150:10)
at WebSocket.onError (/path/to/application/node_modules/ws/lib/EventTarget.js:109:16)
at WebSocket.emit (events.js:180:13)
at WebSocket.finalize (/path/to/application/node_modules/ws/lib/WebSocket.js:182:41)
at ClientRequest._req.on (/path/to/application/node_modules/ws/lib/WebSocket.js:653:12)
at ClientRequest.emit (events.js:180:13)
at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:539:21)
at HTTPParser.parserOnHeadersComplete (_http_common.js:117:17)
at TLSSocket.socketOnData (_http_client.js:444:20)
Emitted 'error' event at:
at done (/path/to/application/node_modules/ipfs/src/core/boot.js:58:19)
at /path/to/application/node_modules/async/internal/parallel.js:39:9
at /path/to/application/node_modules/async/internal/once.js:12:16
at iterateeCallback (/path/to/application/node_modules/async/internal/eachOfLimit.js:44:17)
at /path/to/application/node_modules/async/internal/onlyOnce.js:12:16
at /path/to/application/node_modules/async/internal/parallel.js:36:13
at done (/path/to/application/node_modules/ipfs/src/core/components/start.js:15:16)
at series (/path/to/application/node_modules/ipfs/src/core/components/start.js:39:25)
at /path/to/application/node_modules/async/internal/parallel.js:39:9
at /path/to/application/node_modules/async/internal/once.js:12:16
For context on the setup: I have a Node.js application that uses IPFS-js. The application is dockerized, as is a full dockerized IPFS node that connects to the IPFS-js app as a peer over websockets. This has worked perfectly fine and I had no problems until last night, when every single one of my environments started throwing the same error out of nowhere. Now no one on my team can spin up the API because it hits the same issue. One odd thing: in the window of time before it crashes, we can hit the API with curl and get back IPFS results for a quick second; otherwise it fails. I have confirmed it is not a websocket issue on my end and it is in fact something with the IPFS-js package in the Node.js application. Guidance is much appreciated. The relevant configuration:
let options = {
config: {
Addresses: {
Swarm: [
// '/dns4/wrtc-star.discovery.libp2p.io/tcp/443/wss/p2p-webrtc-star'
'/dns4/ws-star.discovery.libp2p.io/tcp/443/wss/p2p-websocket-star'
]
}
}
}
const ipfsNode = new IPFS(options)
I just found this morning that the ws-star server was hung. It responded to normal HTTP requests properly, meaning that our health checks thought everything was fine, but the actual websocket endpoints didn't work.
I've restarted the server and confirmed it to be working again, and I'm adding better health checks to it now. Could you please retry and report back whether it's working?
@VictorBjelkholm It is resolved now due to that fix, thank you so much! Much appreciated.
It does seem like we could surface a better error here, though, saying something like could not connect to websocket server <address> or something. It wouldn’t fix the problem of a lot of people all depending on a particular server that went down, but it would at least make it more clear what happened.
@Mr0grog That's a good point; it has been raised before in: https://github.com/ipfs/js-ipfs/issues/804#issuecomment-370750313
Basically, the current implementation crashes when the node can't start listening on a swarm address (because the endpoint isn't responding), which leads to this crash. While I agree that the error message could be better, I would also say that it should be a warning instead of an error, and the daemon should continue booting even if a swarm address fails to open.
I think I’d agree with all those points 😄
@VictorBjelkholm I am getting this same error again across all my applications. You may need to restart your ws-star server again.
Coworker of shessenauer here: we're still getting this error. Are there other servers we can add that would be more stable? Even better, is there an option that would allow automatic addition of swarm nodes connected to the ones listed in options.config.Addresses.Swarm, so that the list of servers is dynamic and thus perhaps more resilient? That would be fantastic. Is that congruent with the idea of a Swarm?
Question: will this happen if any one of the endpoints in options.config.Addresses.Swarm is down? Put another way: do all of the servers in options.config.Addresses.Swarm have to be up to avoid this error?
Thanks guys.
Ok so after looking at it a little more closely, the problem is that IPFS never gets out of the 'starting' state after a websocket connection error is thrown. Is there/can there be a reconnect option for this sort of thing?
Re-opening (and re-titling) this to track things a little more clearly. There are two things to address here:
1. This error message needs to be more clear (e.g. "failed to connect to websocket ${address}" or something).
2. A connection failure (probably for any swarm connection?) should not be fatal for IPFS.
These are tied in with #1325, but are concrete enough we should be able to address them sooner and more directly.
This was resolved by https://github.com/ipfs/js-ipfs/pull/1793