Hi,
I am trying a simple GET request to an https:// URL that goes through an HTTP proxy.
Now the issue is that this proxy blocks connection attempts that have no User-Agent set. For whatever reason, requests does not set one in the CONNECT request, and I cannot seem to force it to do so.
Going through the same proxy using e.g. curl or wget works as expected as they do set one (same for browsers).
I couldn't find this reported before, nor could I find a solution to it.
python: 2.7.9
requests: 2.4.3
python-ndg-httpsclient: 0.3.2
python-openssl: 0.14
python-pyasn1: 0.1.7
Hey, that's weird.
I suppose that makes a degree of sense though. Can you try this?
import requests
from requests.adapters import HTTPAdapter

class ProxyUAAdapter(HTTPAdapter):
    def proxy_headers(self, proxy):
        # Keep whatever the base adapter sets (e.g. Proxy-Authorization),
        # then add a User-Agent for the CONNECT request.
        headers = super(ProxyUAAdapter, self).proxy_headers(proxy)
        headers['User-Agent'] = requests.utils.default_user_agent()
        return headers

s = requests.Session()
s.mount('http://', ProxyUAAdapter())
s.mount('https://', ProxyUAAdapter())
# Make requests through the session, e.g. s.get(url)
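If you want a local sanity check before going through the proxy, you can call proxy_headers directly and confirm the header is present; the proxy URL below is just a placeholder:

```python
import requests
from requests.adapters import HTTPAdapter

class ProxyUAAdapter(HTTPAdapter):
    def proxy_headers(self, proxy):
        # Keep anything the base adapter sets, then add a User-Agent.
        headers = super(ProxyUAAdapter, self).proxy_headers(proxy)
        headers['User-Agent'] = requests.utils.default_user_agent()
        return headers

# 'http://proxy.example:3128' is a placeholder, not a real proxy.
headers = ProxyUAAdapter().proxy_headers('http://proxy.example:3128')
print(headers['User-Agent'])  # something like 'python-requests/<version>'
```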
Can you confirm that this does in fact work?
Hi,
Yes that does indeed work, thank you very much!
I am wondering if this is expected behaviour and I am supposed to implement it like that, or if this is a bug?
My guess is that it should use the same User-Agent for the CONNECT request as it does for the rest of the connection. What do you think?
> I am wondering if this is expected behaviour and I am supposed to implement it like that, or if this is a bug?
Heh, that's difficult. I'd say that the best way to describe this is that it's an oversight. The extension hooks are in place to do this, but I agree that the logic seems to be wrong here. One thing I do want to check is what happens with a plaintext HTTP request. Are you skilled enough with tcpdump or wireshark to snoop your HTTP requests?
Yes I am, just tell me what you want me to check and I can do that.
You are basically interested in a GET via the proxy?
Yeah, I want a GET to an http:// website via the proxy, and then the tcpdump/wireshark capture of that. What I'm specifically worried about is the request containing two different User-Agent headers.
I sent you the pcap file via e-mail.
Alright, based on a really quick look at the pcap file this seems to behave mostly right: that is, we don't appear to send two user-agent headers.
However, it doesn't work well with our user-agent override. If you override the user-agent header from the CLI, the proxy gets the requests default user-agent. That sucks a little bit. It would be nice if we could adjust the code to use the user-agent provided by the user. Unfortunately, the Transport Adapter is potentially a bit low-level for that.
I suppose the proxy_headers function could be passed the headers on the request and could search for a user-agent header. I don't like that much though.
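In the meantime, one workaround that avoids changing the proxy_headers signature is to hand the adapter the desired User-Agent up front. The class and names below are illustrative, not part of requests:

```python
import requests
from requests.adapters import HTTPAdapter

class CustomUAProxyAdapter(HTTPAdapter):
    # Illustrative workaround: the caller supplies the User-Agent
    # explicitly, since proxy_headers() cannot see per-request headers.
    def __init__(self, user_agent, **kwargs):
        self.user_agent = user_agent
        super(CustomUAProxyAdapter, self).__init__(**kwargs)

    def proxy_headers(self, proxy):
        headers = super(CustomUAProxyAdapter, self).proxy_headers(proxy)
        headers['User-Agent'] = self.user_agent
        return headers

s = requests.Session()
s.headers['User-Agent'] = 'my-client/1.0'  # placeholder override
adapter = CustomUAProxyAdapter(s.headers['User-Agent'])
s.mount('http://', adapter)
s.mount('https://', adapter)
```

This keeps the CONNECT User-Agent in sync with the session's override, at the cost of having to set it in two places.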
:+1:
Thanks for the review.
I am wondering why doing that in the proxy_headers function would be a bad idea?
Can you think of a better solution?
Two reasons.
Firstly, it changes the signature of the proxy_headers function, which represents an API change. That hurts people subclassing the adapter if they've overridden that method, which means we'd need to defer the change to 3.0.0.
Secondly, the proxy_headers function really shouldn't need to know that information.
I wonder if we can just pass the scheme instead and use that. That doesn't avoid problem 1, but it does restrict the scope of problem 2.
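To illustrate the idea: passing the scheme would let the adapter vary the CONNECT headers without seeing the whole request. This is a sketch of the proposed shape, not the actual requests API:

```python
import requests
from requests.adapters import HTTPAdapter

class SchemeAwareAdapter(HTTPAdapter):
    # Hypothetical hook shape: 'scheme' is NOT part of the real
    # proxy_headers() signature; this sketches the proposal only.
    def proxy_headers(self, proxy, scheme=None):
        headers = super(SchemeAwareAdapter, self).proxy_headers(proxy)
        if scheme == 'https':
            # CONNECT tunnels are only used for https:// targets,
            # so that is where the proxy sees this User-Agent.
            headers['User-Agent'] = requests.utils.default_user_agent()
        return headers
```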
So I've been looking at this for 2.9.0, and I just can't think of an API change that doesn't wreck the API of the transport adapter. So I think we need to move this to 3.0.0.
As an interim solution, I created PR https://github.com/requests/requests/pull/4794 to allow headers to be passed in without breaking the API.
I have the same issue using pip behind a proxy. The proxy blocks the CONNECT because there is no User-Agent.
CONNECT download.pytorch.org:443 HTTP/1.0
Proxy-Authorization: Basic
HTTP/1.1 403 Forbidden...
I submitted a PR #4794 that fixed this a year ago but the maintainers would rather wait for 3.0 to close this issue, which has been open for over 4 years.
It may just be me, but I'd rather have a "hacky" fix to a real problem instead of waiting half a decade to address the issue. Perfect is the enemy of good.