OkHttp: WebSocket API Review

Created on 12 Oct 2016 · 18 comments · Source: square/okhttp

I went through the WebSocket API and wrote down my thoughts. Nothing too earth-shattering, just some ideas.

  1. How should WebSockets interact with the Dispatcher? In particular, should open web sockets count toward maxRequestsPerHost? (I _think_ yes, but we should explicitly call this out.)
  2. When can I cancel a WebSocketCall? Once it's open, is that equivalent to calling WebSocket#close()?
  3. Does isCanceled() return true after a WebSocket is closed?
  4. Do we throw an IllegalStateException if you write to the socket in onOpen()? What about sendPing() and close()? #2905
  5. It looks like pings and the corresponding pongs are very loosely coupled. Any idea how strict the spec is? It seems weird to call out that pongs can come unsolicited. I'm really unsure about what to do with pings and pongs. My first instinct is to hide them from the API and instead offer something that sends pings automatically for you on a schedule.
  6. I wonder if there's a better name for WebSocketListener. With Call we have Callback and the names are cute. Is WebSocketCallback terrible?
  7. An obnoxious idea: would it ever make sense to combine the WebSocketCall and WebSocket types? I'm getting here via the (slight) redundancy between cancel() and close(). Obviously sendMessage() would need to be a bit more clever to do something reasonable if it were invoked before onOpen() is received. Though it's interesting to think about "fast open", where a web socket client can enqueue messages before we have confirmation that it's all going to connect.
  8. Should it be sendMessage(RequestBody), or just send(RequestBody), or just message(RequestMessage)? Maybe instead of sendPing(ByteString) it should be just ping(ByteString), or even ping()? #2904

All 18 comments

  1. Sure
  2. Before onOpen only. We could send a "going away" close if it was already opened, but the interaction with WebSocket gets weird: does canceling count as a reader close, such that subsequent calls to WebSocket should throw IllegalStateException, or do we treat it like a peer close, where subsequent calls just throw IOException?
  3. No. Only on user cancel currently.
  4. Ideally we disallow writes on the reader thread entirely. Currently you can write in onOpen and in any listener callback.
  5. We can hide the API iff we ship with auto-pings. I'm still willing to bet exposing it is the first issue filed after the release, though.
  6. I think we decided on listener because it's long-lived, instead of a callback, which is a single invocation.
  7. Wasn't the initial version like this? I can't remember. I think it would throw IllegalStateException if you tried to do something before connecting. My biggest concern is that no matter what, the application has to wait for the open callback, so why not prevent them from sending anything with the type system instead of with runtime checks?
  8. Probably, since close() sends something as well.

Leaning more and more towards a no on 7.

A WebSocketCall is a representation of a unit of work that you trigger once, and the result of that work is a WebSocket, in the same way you would execute a Call and get a Response.

I’ve got a different approach on 7. A Response is _almost_ a value object. It would be completely so, except that the streaming body is a necessary optimization.

But a WebSocket is a service object. It’s closer to Call because it’s live and interactive. And so I think it’s interesting to contemplate having a single interactive thing rather than one that produces the other.

The type system consequences are super interesting. If you’re building a chat app you need a little state machine to move messages from the user to the websocket. Presumably this is tough work because you need to keep track of everything until it has been acknowledged. But do you want to enqueue messages before the websocket is open? Maybe? And on our side, do we want to optimistically transmit such messages?
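For concreteness, here’s a minimal sketch of that little state machine. The Outbox name and the id-based acknowledgement scheme are hypothetical; the receipt protocol is entirely the application’s to define:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical outbox: holds each message until the server acknowledges it. */
final class Outbox {
  private final Map<String, String> pending = new LinkedHashMap<>(); // id -> payload

  /** Called by the UI layer; the message is retained until an ack arrives. */
  synchronized void enqueue(String id, String payload) {
    pending.put(id, payload);
  }

  /** Called from onMessage() when the server acknowledges an id. */
  synchronized void acknowledge(String id) {
    pending.remove(id);
  }

  /** Called after a reconnect to replay anything still unacknowledged. */
  synchronized Collection<String> unacknowledged() {
    return new ArrayList<>(pending.values());
  }
}
```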

I’m flexible on this but I think it’s probably the most interesting API we’ve seen in a long time.

Answered without looking at other answers at all (to get that natural reaction)...

  1. Regarding maxRequestsPerHost - Sure.
  2. IMO, once onOpen() is called, WebSocketCall shouldn't do anything anymore. Therefore WebSocketCall#cancel() should only work before onOpen().
  3. No. Like above, I think WebSocketCall itself should be discarded once onOpen() is called.
  4. Ignoring.
  5. I would like ping/pong to be automatic and was actually thinking that could be an extension added to a later version. That said, I don't know how complex pings can get with some websockets. In Trello it's just empty messages, but maybe other websockets do send actual data in the pings, which would be harder to do automatically.

Also worth noting that, in the case of Trello's sockets, both sides are pinging each other constantly. So would this also include _responding_ to pings automatically?

  6. I like "listener" in this case because it's an ongoing connection. "Callback" seems more appropriate for a one-off operation.
  7. I've looked through my code and the one place that would be tricky to handle would be reconnecting logic. Suppose we decide to close the socket (e.g., a ping times out). We call WebSocket#close, but we have no expectation that WebSocketListener#onClose will ever get called (since, hey, it was a ping timeout in the first place).

Now, if we want to retry connecting to the sockets, we wait a couple seconds then initiate a new WebSocketCall. But meanwhile, it's still possible that we may (somehow) get WebSocketListener#onClose. So then we're in a circumstance where we're both connecting _and_ we got the message that it's closed.

Maybe the answer here is to have one unique listener per connection? (A sketch of that idea follows at the end of this comment.)

Regardless, I'm not opposed to the idea of combining them. It just means you need to add a lot more WebSocket#is* methods. I would probably also just throw an exception if you try to call message() or ping() before it is open.

  8. Ignoring.
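On the reconnect problem above, a sketch of the "one unique listener per connection" idea: each attempt gets its own generation number, and callbacks from abandoned attempts are ignored. The WebSocketCall.create/enqueue entry point and the listener method signatures are assumptions based on the current okhttp-ws module, so treat the exact shapes loosely:

```java
import java.io.IOException;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import okhttp3.ResponseBody;
import okhttp3.ws.WebSocket;         // okhttp-ws module packages; assumed
import okhttp3.ws.WebSocketCall;
import okhttp3.ws.WebSocketListener;
import okio.Buffer;

final class Reconnector {
  private final OkHttpClient client;
  private final Request request;
  private int generation; // bumped on each attempt; stale callbacks won't match

  Reconnector(OkHttpClient client, Request request) {
    this.client = client;
    this.request = request;
  }

  synchronized void connect() {
    final int mine = ++generation;
    WebSocketCall.create(client, request).enqueue(new WebSocketListener() {
      @Override public void onOpen(WebSocket webSocket, Response response) {}
      @Override public void onMessage(ResponseBody message) throws IOException {
        message.close();
      }
      @Override public void onPong(Buffer payload) {}
      @Override public void onClose(int code, String reason) {
        synchronized (Reconnector.this) {
          if (mine != generation) return; // stale close from an abandoned connection
        }
        // This close belongs to the live attempt: safe to schedule a reconnect here.
      }
      @Override public void onFailure(IOException e, Response response) {}
    });
  }
}
```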

So would this also include _responding_ to pings automatically?

Yep. The spec mandates this so it will always be the behavior.
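For the scheduled-ping half, here’s what an application-level keep-alive could look like today, using the sendPing(ByteString) method named above. The 30-second interval and the shutdown-on-failure policy are arbitrary choices:

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import okhttp3.ws.WebSocket; // okhttp-ws module package; assumed
import okio.ByteString;

final class KeepAlive {
  private final ScheduledExecutorService pinger =
      Executors.newSingleThreadScheduledExecutor();

  /** Sends an empty ping every 30 seconds until a write fails. */
  void start(final WebSocket webSocket) {
    pinger.scheduleAtFixedRate(new Runnable() {
      @Override public void run() {
        try {
          webSocket.sendPing(ByteString.EMPTY);
        } catch (IOException e) {
          pinger.shutdown(); // the connection is dead; stop pinging
        }
      }
    }, 30, 30, TimeUnit.SECONDS);
  }
}
```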

Another thing about the Trello sockets: once we're connected to the socket (general), we then send messages that subscribe us to individual channels (specific). E.g., there's a channel for your current user, or a channel for a board you might have open.

Those channel subscription messages currently have to be queued and then sent once onOpen() is called. So the idea of queueing messages for later sending is appealing in that regard.

But the way the subscription works, we have to track our request and then match it with a later response. So it'd be difficult to wrangle if we _thought_ we sent a subscription request but it never actually got sent.
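To make the bookkeeping concrete, a sketch of queue-until-open plus request/response matching. The subscribe command, the reqId field, and the WebSocket.TEXT media type constant are illustrative stand-ins, not Trello's actual protocol:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import okhttp3.RequestBody;
import okhttp3.ws.WebSocket; // okhttp-ws module package; assumed

final class ChannelSubscriber {
  private final List<String> pendingChannels = new ArrayList<>(); // queued until onOpen
  private final Map<Integer, String> awaitingAck = new HashMap<>(); // reqId -> channel
  private int nextRequestId;
  private WebSocket webSocket; // null until the socket opens

  /** Queues before the socket opens, sends immediately afterwards. */
  synchronized void subscribe(String channel) throws IOException {
    if (webSocket == null) {
      pendingChannels.add(channel);
    } else {
      send(channel);
    }
  }

  /** Called from WebSocketListener#onOpen: flush everything queued so far. */
  synchronized void socketOpened(WebSocket webSocket) throws IOException {
    this.webSocket = webSocket;
    for (String channel : pendingChannels) send(channel);
    pendingChannels.clear();
  }

  private void send(String channel) throws IOException {
    int id = nextRequestId++;
    awaitingAck.put(id, channel); // only a matched response confirms the subscription
    String json = "{\"cmd\":\"subscribe\",\"channel\":\"" + channel + "\",\"reqId\":" + id + "}";
    webSocket.sendMessage(RequestBody.create(WebSocket.TEXT, json));
  }

  /** Called from onMessage() when a response carrying our reqId arrives. */
  synchronized void subscriptionConfirmed(int reqId) {
    awaitingAck.remove(reqId);
  }
}
```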

I wonder what it’d look like to hook up Tape to WebSockets. It’d get pretty involved due to a missing link between messages sent and messages received.

With a Request/Response model, you know your request was accepted when you receive a 2xx response. With WebSockets, you must devise your own application-layer scheme to acknowledge which messages were accepted. Ugly.

WebSockets don't seem appropriate for that use case. If you want a request/response model then you should be using HTTP/2, or at worst a keep-alive connection, and definitely not web sockets.

The API looks pretty good but I think some of the semantics around building a reliable sender and receiver were a little tricky.

It seems like the API handles reading the socket on its own thread and then sends messages to the WebSocketListener. I assume from the comment "Start a new thread or use another thread in your application" that it is my responsibility to handle the send side. I created my own looper for doing sends. It's pretty tricky to test all the scenarios for proper shutdown in this case. If the API provided a simple way for me to pass in my WebSocket so that all the sends are performed in their own thread and it handled all the shutdown code, that would make using the API a lot simpler.

It's pretty tricky to test all the scenarios for proper shutdown in this case.

I'm not convinced this is true. Proper shutdown happens in only two ways:

  • The client desires it and calls close() with no further writes. The listener eventually receives onClose.
  • The listener receives onClose indicating writes should stop. We auto-reply acknowledging the close on your behalf.

Improper shutdown also happens in two ways, except that if you handle the proper shutdown case correctly, these come for free:

  • Writing a message or ping throws an IOException. You're obligated to try to call close() and stop writing.
  • The listener gets called with onFailure, indicating writes should stop.

If the listener receives onClose or onFailure, future writes to the WebSocket will throw IOException, which is handled by the first point.

So this really boils down to one thing: try/catch your writes and call close() if they throw. Everything else comes for free. If you want, you can make a best effort to stop writing when onClose or onFailure happen, but the writes will quickly error out anyway.
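That contract fits in one small helper. A sketch, assuming the sendMessage(RequestBody) and close(int, String) signatures discussed in this thread; 1001 is the standard "going away" close code:

```java
import java.io.IOException;
import okhttp3.RequestBody;
import okhttp3.ws.WebSocket; // okhttp-ws module package; assumed

final class SafeWriter {
  /** Tries the write; on failure, attempts a close and tells the caller to stop. */
  static boolean trySend(WebSocket webSocket, RequestBody message) {
    try {
      webSocket.sendMessage(message);
      return true;
    } catch (IOException writeFailed) {
      try {
        webSocket.close(1001, "going away"); // best effort; this may also fail
      } catch (IOException ignored) {
      }
      return false; // do not write to this socket again
    }
  }
}
```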

That leads me to

If the API provided a simple way for me to pass in my WebSocket so that all the sends are performed in their own thread and it handled all the shutdown code, that would make using the API a lot simpler.

The API would be simpler, but it wouldn't be easier to use.

In this queued write system how do you know which write failed? What happens to queued data that wasn't written? What do we do with queued data when the listener receives onClose or onFailure? How large is the buffer in the queue? Do we let the application continually enqueue messages faster than they can be processed and eventually OOM?

If we adopted a queued system the writing API would not change at all, but we'd now need to add APIs to support all these cases:

  • A callback or mechanism for backpressure when the web socket is slower at sending than the application is at queueing.
  • A callback mechanism to indicate when each message was successfully sent.
  • A querying mechanism so unsent queued messages can be retrieved and potentially retried on a reconnect.

This actually makes the API more complex and makes writing a correct client more difficult. Your application's requirements have to be reconciled with the assumptions we've made about how you want to write to web sockets, and they won't always match.

Web sockets _are_ complicated to implement and we do force you to make decisions at the application layer. How you want to handle queueing, successful message sends, and retrying the connection is hard to implement inside the library in a way that appeases everyone.
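To illustrate why those decisions belong in the application, here's a sketch of an app-owned send queue. The bound, the rejection policy, and the drain-on-failure behavior are all choices this hypothetical app makes for itself, not ones the library could make for everyone:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import okhttp3.RequestBody;
import okhttp3.ws.WebSocket; // okhttp-ws module package; assumed

final class BoundedSender implements Runnable {
  private final WebSocket webSocket;
  private final BlockingQueue<RequestBody> queue =
      new ArrayBlockingQueue<>(64); // app-chosen bound, not a library default

  BoundedSender(WebSocket webSocket) {
    this.webSocket = webSocket;
  }

  /** Backpressure policy: reject instead of buffering without limit. */
  boolean offer(RequestBody message) {
    return queue.offer(message);
  }

  /** Single writer thread: drains the queue until a write fails. */
  @Override public void run() {
    try {
      while (true) {
        webSocket.sendMessage(queue.take());
      }
    } catch (IOException e) {
      List<RequestBody> unsent = new ArrayList<>();
      queue.drainTo(unsent); // app decides: replay on reconnect, or drop
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```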

I think we need WebSocket.isClosed(), though, to differentiate between server-closed sockets that cause IOException and normal network errors.

Thanks for the details. As for cleanup, I think I managed to get most of what you described covered in my client implementation (I just wasn't sure it was all I needed). Having the process you described above saved in some of the more permanent documentation would be nice.

You've made a convincing argument that management of the write side should be left to the implementer.

What struck me as strange about the API is that an established websocket connection is a symmetric pipe, but the API interface is asymmetric (one where the read side appears managed and the write side is not). Wouldn't many of the same arguments about more nuanced management apply to the read side as well?

I like the WebSocketListener.onMessage() approach. I think it's better than something like a blocking WebSocket.receiveMessage() that would only provide framing. But an unmanaged receive does seem better matched to WebSocket.sendMessage().

After writing a little experimental code, some observations:

  • [ ] We don’t have a violent way to break the socket and free its resources. I think we need one for poorly behaving servers.
  • [ ] Closing a WebSocket cleanly is awkward. There’s close vs. cancel depending on lifecycle state, plus synchronization with any other threads that are writing or closing.

I propose we always fully buffer WebSocket messages. The consequences:

  • A listener that receives onMessage() can return immediately without reading the response body. The ResponseBody the caller holds is buffered!
  • A writer never blocks writing a message. It’s merely enqueued to be written later, with some limit on how many bytes or messages can be enqueued concurrently.

I’m proposing this behavior even though I know it’s weird. The reason is that message receipts are the application layer’s concern. If the client sends something that’s intended to be durable, the client already needs to hold that thing until a receipt is received. Blocking until the bytes have been written to the socket doesn’t confirm anything.
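Under that proposal, the caller's side might look something like this. The boolean-returning send() is a hypothetical shape for "enqueue or report that the limit was hit", not a shipped signature:

```java
// Hypothetical API shape for the fully buffered proposal above: send() never
// blocks, it either enqueues the message or reports the queue limit was reached.
boolean accepted = webSocket.send("{\"type\":\"chat\",\"body\":\"hi\"}");
if (!accepted) {
  // Queue limit hit: apply application-level backpressure (drop, delay, or warn).
}
// Either way, durability is still the application's job: hold the message until
// the server acknowledges it, because a successful enqueue proves nothing.
```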

I’m happy with where this is now.

I still think most people are going to end up writing code to keep track of the state of the websocket, in a way that could be provided by the API itself, because of this problem.

Yeah, for sure. I think I’d like to write a sample to exercise that case, which may motivate expanding the API.

For 5: I would like to expose the ping/pong API, because it is very helpful for keeping your websocket connected and for reconnecting when you hit a network issue or something else. The WebSocketListener onClose method isn't reliable in practice.
