Go: proposal: net/http: handle Seek error in range request by sending entire file

Created on 23 Oct 2020  ·  6Comments  ·  Source: golang/go

In the io/fs proposal, I made the comment that if an fs.File did not support Seek then net/http couldn't serve it when presented by a range request. Many people pointed out that strictly speaking that statement is incorrect: it is always allowed for an HTTP server to respond to a request with a Range header by sending the entire content (the response headers clearly indicate whether ranges or the entire content are being sent).

That said, the net/http server behavior on Seek failure has been here for a long time, long before io/fs. While we _could_ handle a Seek failure by responding with full content, it is less clear to me whether we _should_. Hence this issue, to make sure the question is properly evaluated.

There are two cases where a Seek failure can happen:

  1. A single range has been requested. The Seek to that range fails.
    In this case, if we assume the Seek left the file offset at the start of the file, then we can send the whole thing.

  2. Multiple ranges have been requested. The first Seek succeeds but a later Seek fails.
    In this case, we cannot take back the partial chunks that have already been sent.
    It seems clear that we must report an error.
    A complicating factor here is that a File with a “best-effort” Seek might succeed only for “seek to where the file already is,” and if the first range is 0-N, then that first Seek might succeed even though the next one will fail.
    If we want to try to detect Seek failures before sending any ranges, so that full content can be sent instead, we could try Seeking to the second range, then to the first range, before sending the first range. One of those two should be a non-no-op Seek.

Clearly in both these cases (with the complication in the second), we could detect a Seek failure before sending any ranges and respond by sending the whole file.

On the other hand, I have vague memories of a browser-based PDF reader being _exceedingly_ slow against Go servers because it was making range requests and getting the full content back and discarding all but the ranges it had requested. (Maybe this was before we supported Range at all, or maybe our Range implementation was sending full content; I don't remember.) So the browser was downloading the same, entire large file over and over and over. Yes, the bug was on the client side, but even a correct implementation in that case (notice the full response and save it) would still just appear to be very slow (but not exceedingly slow) when it downloads an entire (say) 100 MB PDF to display just the first page. It may be that any reasonable server _should_ support range on files being served, and that sending the entire file as a pedantically correct response, instead of giving a clear error that would cause the server author to add Seek support, causes more problems than it avoids.

That is, if a client asks for 1 kB of a 1 GB file, it may technically comply with the spec to send the whole GB, but at the same time it may be a worse response than an error.

(The spec is written this way because clients have to deal with the possibility of an old server that has no idea what a range request is. But that doesn't mean Go pretending to be an old server is a good idea.)

I honestly don't know what's right here. What we _can_ do is different from what we _should_ do.
What does nginx or Apache do if a seek fails while serving a range request? What do other servers do?

/cc @bradfitz

Proposal

Most helpful comment

I vote error. If the client thinks the resource is large enough to request a Range, we should assume it is, and as you say, returning 4GB for a 4k request is bad. In-practice, IMO error is more often the better solution. And it is RFC-compliant to do so.

If this were common, I'd suggest a config option. But all modern filesystems can seek, so this should be rare. Configs for every rare error are untenable.

  1. Multiple ranges have been requested. The first Seek succeeds but a later Seek fails. In this case, we cannot take back the partial chunks that have already been sent. It seems clear that we must report an error.

Agreed. If the server has already started sending data, there's no correct answer except to terminate the connection. In theory, Go could do a full read until it gets to the point of the seek; in practice, that would rarely be fast enough to be useful. In practice, this error case seems even rarer.

What does nginx or Apache do if a seek fails while serving a range request? What do other servers do?

Apache Traffic Server (_not_ Apache HTTP Server aka httpd) will error if any mid-read error occurs (as required), and if any initial read error occurs, will fall back to the parent/origin. But as a Caching Proxy, file reads mostly don't apply.

Apache Traffic Control's own _Grove_ Caching Proxy will likewise treat read errors as not-in-cache. But it's written in Go, so the decisions here could potentially change that. 😄

Also, the Go authors probably know this, but for anyone reading who doesn't: I verified, net/http.ServeFile serves the ideal 416 Requested Range Not Satisfiable for ranges after the file size, i.e. when Seek returns 0 EOF. So none of the above applies in that case.

All 6 comments

I vote error. If the client thinks the resource is large enough to request a Range, we should assume it is, and as you say, returning 4GB for a 4k request is bad. In-practice, IMO error is more often the better solution. And it is RFC-compliant to do so.

If this were common, I'd suggest a config option. But all modern filesystems can seek, so this should be rare. Configs for every rare error are untenable.

  1. Multiple ranges have been requested. The first Seek succeeds but a later Seek fails. In this case, we cannot take back the partial chunks that have already been sent. It seems clear that we must report an error.

Agreed. If the server has already started sending data, there's no correct answer except to terminate the connection. In theory, Go could do a full read until it gets to the point of the seek; in practice, that would rarely be fast enough to be useful. In practice, this error case seems even rarer.

What does nginx or Apache do if a seek fails while serving a range request? What do other servers do?

Apache Traffic Server (_not_ Apache HTTP Server aka httpd) will error if any mid-read error occurs (as required), and if any initial read error occurs, will fall back to the parent/origin. But as a Caching Proxy, file reads mostly don't apply.

Apache Traffic Control's own _Grove_ Caching Proxy will likewise treat read errors as not-in-cache. But it's written in Go, so the decisions here could potentially change that. 😄

Also, the Go authors probably know this, but for anyone reading who doesn't: I verified, net/http.ServeFile serves the ideal 416 Requested Range Not Satisfiable for ranges after the file size, i.e. when Seek returns 0 EOF. So none of the above applies in that case.

@rob05c, your response mostly mentions what happens on "read error". What about seek errors? How do the servers respond to that?

And can someone check Apache or nginx? Thanks.

I said "read" but I meant "read or seek." ATS and Grove both behave the same on seek errors. Also to be clear, ATS behavior can be a file or a raw block device, and almost all large production deployments use block devices. In which case there's no difference.

I'm not really seeing any arguments in this issue _for_ sending the entire file.
In the absence of those, it seems like we should leave things as is.
Does anyone disagree?

Based on the discussion above (no one making a strong case to change the current behavior), this seems like a likely decline.

No change in consensus, so declined.

Was this page helpful?
0 / 5 - 0 ratings