Requests: redirect loop with manual host header

Created on 7 May 2017 · 9 Comments · Source: psf/requests

I use this for a test server:
requests.get('http://1.2.3.4', headers={'host': 'domain.de'})
But the request loops, because the "host" header is also used for the redirect.

All 9 comments

So, this is a tricky point. Generally, when the user sets a custom Host header we have a problem: to what hosts does that custom host header apply? What about redirects?

In this case, Requests takes the easy route out and just decides to use the custom Host header for everything. That's not necessarily the smartest choice but it is very easy to understand, and lacks subtle bugs caused by edge cases that are hard to follow. The workaround is simple: either (1) set allow_redirects to False and handle them yourself, or (2) add a hosts file entry instead of the Host overload. (Technically there is a (3), which is to override the resolve_redirects method of the SessionRedirectMixin to remove the Host header from the headers dict as appropriate.)
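Workaround (1) can be sketched without a network. This is a minimal sketch, assuming a hypothetical `send(url, headers)` callable that performs one request with redirects disabled and returns `(status, location)`; with Requests that would be `session.get(url, headers=headers, allow_redirects=False)`, reading `status_code` and `headers.get('Location')`. Dropping the custom Host once a `Location` names its own host is one possible rule, not what Requests does, and the hop limit is what stops a redirect loop like the one reported above.

```python
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(send, url, headers, max_hops=10):
    """Follow redirects by hand, dropping a custom Host header when needed.

    `send` is a hypothetical callable: send(url, headers) -> (status, location).
    Returns the list of (url, headers) pairs that were sent.
    """
    headers = dict(headers)
    sent = []
    for _ in range(max_hops + 1):
        sent.append((url, dict(headers)))
        status, location = send(url, headers)
        if status not in REDIRECT_CODES or not location:
            return sent
        # Only discard the custom Host if the redirect names a host of its own;
        # a path-only Location such as "/login" keeps it.
        if urlsplit(location).netloc:
            headers = {k: v for k, v in headers.items() if k.lower() != 'host'}
        # Resolve relative and scheme-relative ("//host/") locations.
        url = urljoin(url, location)
    raise RuntimeError('too many redirects')


def fake_send(url, headers):
    # Hypothetical server: the IP redirects to its canonical name once.
    if url == 'http://1.2.3.4':
        return 302, '//www.mydomain.de/'
    return 200, None


hops = follow_redirects(fake_send, 'http://1.2.3.4', {'host': 'mydomain.de'})
```

Here `hops` holds the first request with `host: mydomain.de` and a second request to `http://www.mydomain.de/` with no custom Host.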

If we want to consider this a bug (and I'm not sure we do), then there are a few possible fixes:

  1. Throw away the Host header on any redirect that supplies a redirect location with a hostname. This is a pretty unsubtle approach, frankly, but would probably handle 80% or more of what users want. It's also in line with our approach for handling the Authorization header.
  2. Allow users to specify a mapping of host to IP that is used before DNS lookups. This is essentially a fake, per-Session hosts file; curl provides this option (--resolve). This is moderately more subtle, and probably useful.
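For plain HTTP, fix (2) can be approximated by rewriting each URL before it is sent. This is a minimal sketch under that assumption; `HOSTS` and `apply_hosts_override` are hypothetical names, not a Requests API. Because the Host header is derived from each hop's own URL, a redirect to another mapped name gets the right Host automatically. Unlike curl's `--resolve`, rewriting the URL this way would break certificate verification and SNI for HTTPS, so a real implementation would need to hook in at the connection layer instead.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical per-session "hosts file": hostname -> IPv4 address.
HOSTS = {'domain.de': '1.2.3.4', 'www.domain.de': '1.2.3.4'}


def apply_hosts_override(url, hosts):
    """Return (url_to_connect, extra_headers) for a single request.

    If the URL's hostname is in `hosts`, connect to the mapped IP and send the
    original name in the Host header; otherwise leave the URL untouched.
    Assumes IPv4 addresses and URLs without user:password@ credentials.
    """
    parts = urlsplit(url)
    ip = hosts.get(parts.hostname)
    if ip is None:
        return url, {}
    # Carry an explicit port, if any, to both the connect address and Host.
    port = f':{parts.port}' if parts.port is not None else ''
    connect_url = urlunsplit(parts._replace(netloc=ip + port))
    return connect_url, {'Host': parts.hostname + port}


# Applied per hop, so a redirect to www.domain.de gets its own Host header.
print(apply_hosts_override('http://domain.de/login', HOSTS))
# ('http://1.2.3.4/login', {'Host': 'domain.de'})
print(apply_hosts_override('http://www.domain.de:8080/', HOSTS))
# ('http://1.2.3.4:8080/', {'Host': 'www.domain.de:8080'})
```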

Thoughts from other contributors?

Tools like curl handle this very smartly, in the way I would expect:

$ curl -vLs --header 'Host: mydomain.de' --header "TEST: yes" http://1.2.3.4
* Rebuilt URL to: http://1.2.3.4/
*   Trying 1.2.3.4...
* TCP_NODELAY set
* Connected to 1.2.3.4 (1.2.3.4) port 80 (#0)
> GET / HTTP/1.1
> Host: mydomain.de
> User-Agent: curl/7.51.0
> Accept: */*
> TEST: yes
>
< HTTP/1.1 302 Found
< Cache-Control: no-cache
< Content-length: 0
< Location: //www.mydomain.de/
<
* Curl_http_done: called premature == 0
* Connection #0 to host 1.2.3.4 left intact
* Issue another request to this URL: 'http://www.mydomain.de/'
*   Trying 1.2.3.4...
* TCP_NODELAY set
* Connected to www.mydomain.de (1.2.3.4) port 80 (#1)
> GET / HTTP/1.1
> Host: www.mydomain.de
> User-Agent: curl/7.51.0
> Accept: */*
> TEST: yes
>
< HTTP/1.1 200 OK
< Server: nginx
< Date: Sun, 07 May 2017 21:09:12 GMT
< Content-Type: text/html; charset=iso-8859-1
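What curl does on that redirect can be reproduced with the standard library: it resolves the scheme-relative `Location: //www.mydomain.de/` against the current URL and rebuilds the Host header from the result, while the custom `TEST: yes` header is still sent on the second hop. A minimal sketch; `next_hop` is a hypothetical helper, not part of curl or Requests.

```python
from urllib.parse import urljoin, urlsplit


def next_hop(current_url, location):
    """Return (next_url, host_header), mirroring what the curl -L trace shows."""
    next_url = urljoin(current_url, location)
    parts = urlsplit(next_url)
    # The Host header is rebuilt from the new URL instead of being copied over;
    # an explicit port in the URL is kept as host:port.
    host = parts.hostname if parts.port is None else f'{parts.hostname}:{parts.port}'
    return next_url, host


print(next_hop('http://1.2.3.4/', '//www.mydomain.de/'))
# ('http://www.mydomain.de/', 'www.mydomain.de')
```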

This has been our behaviour for years and has been surprising to a very noisy select few. I'm not sure it needs changing. Users who want different behaviour have absolutely every ability to do so, for example, by controlling the redirect behaviour with allow_redirects=False. That said, if others believe we should throw away the Host header specified by users on redirects, we need 2 things:

  • To throw away most of the other user-specified headers on redirects
  • To provide a way for users to opt back into the old behaviour.

I believe we've also documented why we encourage users not to specify a Host header, so if users run into this, they should be able to sort themselves out without opening issues for our assistance.

As discussed, I'm open to changing it but I don't think it's high priority. I think I disagree with you a bit @sigmavirus24: the Host header is different to most other user-specified headers, and I'm ok with throwing away some set of user-specified headers when they're likely to be substantially confused if persisted across requests.

I think, basically, this is a low-priority concern for me, but if someone comes along and wants to change it I'd be ok with that happening. The change would have to go in 3.0 though: this behaviour, while potentially confusing, is not a bug and so can only be changed in 3.0.

I would recommend turning off redirects with allow_redirects=False and using the Response.next property to get the next PreparedRequest to send with a requests.Session().

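e.g.:
```python
import requests

session = requests.Session()
# Send the first request with the custom Host header, but don't follow redirects.
response = session.get('http://blah', headers={'host': 'blah'}, allow_redirects=False)

# Response.next is the PreparedRequest for the next hop, or None if there was no redirect.
request2 = response.next
if request2 is not None:
    # Drop the custom Host header so the redirect target's own hostname is used.
    del request2.headers['host']
    r = session.send(request2)
```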

P.S. This will only work with the latest versions of Requests available on PyPI.

There's a slight chance this won't solve your problem, but off the top of my head, it works.

It's difficult to understand why the Host set for the first request affects the following requests. Can you fix this? @Lukasa
This is especially confusing when we use Requests to implement a Single Sign-On feature with client-side DNS: Requests can't redirect to the third-party website, because we rely on the client's DNS to decide which destination is nearest.
