Httpx: Implement SOCKS v4, v5 Proxy

Created on 13 Aug 2019 · 15 Comments · Source: encode/httpx

Related: #36

help wanted

All 15 comments

So the issue with this is that there's no sans-I/O implementation of the SOCKSv4 and SOCKSv5 protocols that doesn't require us to add 3 dependencies. The protocols are so simple that I'm actually in favor of writing our own library that has no dependencies.
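To give a feel for how small the wire format is, here is a minimal sans-I/O sketch of the SOCKS4 CONNECT exchange using only the standard library. The function names `build_socks4_connect` and `parse_socks4_reply` are made up for illustration and are not part of httpx or any existing library; the sketch also assumes the full 8-byte reply is already in hand.

```python
import ipaddress
import struct

SOCKS4_VERSION = 0x04
CMD_CONNECT = 0x01
REPLY_GRANTED = 0x5A  # decimal 90, "request granted"


def build_socks4_connect(host_ip: str, port: int, user_id: bytes = b"") -> bytes:
    """Return the bytes of a SOCKS4 CONNECT request (hypothetical helper)."""
    # SOCKS4 only carries IPv4 addresses; this raises ValueError for anything else.
    addr = ipaddress.IPv4Address(host_ip)
    # Layout: VN, CD, DSTPORT (big-endian), DSTIP, USERID, NUL terminator.
    header = struct.pack("!BBH", SOCKS4_VERSION, CMD_CONNECT, port)
    return header + addr.packed + user_id + b"\x00"


def parse_socks4_reply(data: bytes) -> bool:
    """Return True if the proxy granted the request (hypothetical helper)."""
    if len(data) < 8:
        raise ValueError("incomplete SOCKS4 reply")
    # The reply's version byte is 0, followed by the result code.
    if data[0] != 0:
        raise ValueError("unexpected reply version byte")
    return data[1] == REPLY_GRANTED
```

Neither function touches a socket, so the caller decides how the bytes are sent and received.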

I'm happy to lend a hand on this @sethmlarson. Does it depend on #259?

Thanks @yeraydiazdiaz! :heart:

It'll definitely intersect with the configuration stage on the client, but the dispatcher implementation is separate. Let's start by getting a sans-I/O implementation of SOCKSv4 and v5 w/o dependencies and go from there.

I actually think that the sans-I/O implementation should be its own library, maybe on the python-http org, but it can start on one of our personal accounts. Would you like to be the originator, or should I create a repo and add you so you can take it from there?

I'll definitely need some help, so it might be easier if it's all set up in a non-personal repo from the start 🙂

I pushed the initial commit: https://github.com/sethmlarson/socks
Feel free to make massive changes as nothing currently works E2E; it's just the result of me programming for a few hours.
I've sent collaborator requests to everyone interested. :)

The repo will live under python-http once we release for the first time!

While waiting for the implementation, here is a temporary workaround for people who want to use SOCKS4/5 with httpx, using PySocks:

# pip install PySocks

import httpx
import socks
import socket

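# Route every new socket through the SOCKS5 proxy at 127.0.0.1:9050.
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
# Note: this replaces the socket class for the whole process, not just for httpx.
socket.socket = socks.socksocket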

URL = 'http://ifconfig.me/ip'

with httpx.Client() as client:
    resp = client.get(URL)
    print(resp.text)

If it's relevant, there is 3rd party SOCKS implementation: httpx-socks

It looks like that one doesn't support trio (my favorite async backend).

It looks like that one doesn't support trio (my favorite async backend).

Trio support added in version 0.2.0

@tomchristie

I wrote a small PoC with httpcore + PySocks that requests example.com through a local SOCKS5 proxy.

While working on the PoC, I added a new method to AsyncioBackend named open_socks_stream (I decided that SOCKS4/5 is a transport for us, like TCP, SSL or UDS).

If this idea (PySocks and a new method in the backends) works for you, I can start adding SOCKS proxy support to httpcore.

@cdeler From what I understand, SOCKS is an application-level protocol that sits on top of TCP (well, there's UDP in SOCKS5, but that's not something we should be thinking about for now), so in theory we shouldn't really need a new type of open_* method on concurrency backends.

Also, since I don't think it's been linked to here yet: @yeraydiazdiaz had started a lovely piece of work on HTTPCore already a few months back, based on the socksio library: https://github.com/encode/httpcore/pull/51. The benefit of socksio is that it's sans-I/O, meaning that we can use it with either sync or async, just like h11 and h2 for HTTP/1.1 and HTTP/2.

So perhaps, if anyone's interested, there'd be room for getting that work up to date. I personally think socksio and the sans-I/O approach is our safest bet there if we want to have an as-simple-and-straightforward-as-possible implementation. :-)
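To illustrate what "sans-I/O" buys here, below is a rough sketch of a SOCKS5 handshake written as a pure state machine. This is not the socksio API; the class `Socks5Handshake` and its methods are hypothetical names chosen for the example. It assumes the "no authentication" method, a domain-name target, and that each proxy reply arrives in a single chunk (a real implementation would buffer partial reads).

```python
import struct


class Socks5Handshake:
    """Hypothetical sans-I/O SOCKS5 client handshake (no-auth, CONNECT only)."""

    def __init__(self, host: str, port: int) -> None:
        self._host = host.encode("ascii")
        if not 0 < len(self._host) <= 255:
            raise ValueError("host name must be 1-255 bytes")
        self._port = port
        self.state = "greeting"
        # Greeting: VER=5, NMETHODS=1, METHODS=[0x00 "no authentication"].
        self._outgoing = bytearray(b"\x05\x01\x00")

    def data_to_send(self) -> bytes:
        """Return (and clear) the bytes the caller should write to the proxy."""
        data = bytes(self._outgoing)
        self._outgoing.clear()
        return data

    def receive_data(self, data: bytes) -> None:
        """Feed bytes read from the proxy and advance the state machine."""
        if self.state == "greeting":
            if data[:2] != b"\x05\x00":
                raise ValueError("proxy did not accept the no-auth method")
            # Request: VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name),
            # then a length-prefixed host name and a big-endian port.
            self._outgoing += b"\x05\x01\x00\x03"
            self._outgoing += bytes([len(self._host)]) + self._host
            self._outgoing += struct.pack("!H", self._port)
            self.state = "connecting"
        elif self.state == "connecting":
            if len(data) < 2 or data[0] != 0x05:
                raise ValueError("malformed SOCKS5 reply")
            if data[1] != 0x00:
                raise ValueError(f"CONNECT failed with reply code {data[1]}")
            self.state = "connected"
        else:
            raise RuntimeError("handshake already finished")
```

Because the object only produces and consumes bytes, the same code could be driven by a blocking socket (`sock.sendall(h.data_to_send())`, `h.receive_data(sock.recv(4096))`) or by an async stream, which is the property the comment above describes.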

@florimondmanca
Whoops, I missed that there is a PR in progress...

On the one hand you are right, the proxy works at L7, but on the other hand it is a transport for us...

As for PySocks, the library provides us with a socket-like interface which wraps a connection.

You can check what I've done here: https://github.com/encode/httpcore/pull/186 (I've created a PR just to show an idea)

I'd like to have a chance to check https://github.com/encode/httpcore/pull/51 :-) Thank you for the advice

Update: it looks like @yeraydiazdiaz has the same problem as me with HTTPS connections.

@florimondmanca you are right, socksio is really better, since it allows us to implement the code at the connection level. I closed https://github.com/encode/httpcore/pull/186 (with PySocks) and opened https://github.com/encode/httpcore/pull/187 (socksio) as a draft.

I wonder what we should do with socks4. It forces us to do the DNS lookup on our side (as the socks4 CONNECT request does not accept domain names).

Let's imagine that someone wants to access "google.com" through a socks4 proxy. What should we do there? Raise a ProxyError with "SOCKS4 protocol do not support domain names as a host address", or make nslookup on our side (does anyone know good nslookup libraries for async?)?
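For the "resolve on our side" option, one possibility on the asyncio backend that needs no extra dependency is the event loop's own `getaddrinfo`. The sketch below is only an illustration of that idea; the helper name `resolve_ipv4` is hypothetical, and it simply takes the first IPv4 result.

```python
import asyncio
import socket


async def resolve_ipv4(host: str, port: int) -> str:
    """Resolve host to a single IPv4 address string (hypothetical helper)."""
    loop = asyncio.get_running_loop()
    # Restrict the lookup to IPv4, since SOCKS4 cannot carry IPv6 addresses.
    infos = await loop.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    if not infos:
        raise OSError(f"no IPv4 address found for {host!r}")
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is an (address, port) pair.
    return infos[0][4][0]
```

The resulting address can then be placed in the SOCKS4 CONNECT request. Note that the default asyncio event loop runs this lookup in a thread pool rather than using a natively asynchronous resolver.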

SOCKS4 protocol do not support domain names as a host address

Some socks5 servers also don't support DNS resolving. On the other hand, some socks4 servers support it.
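One way a SOCKS4 server can accept host names is the SOCKS4a extension, where the client sends a deliberately invalid destination IP of the form 0.0.0.x (x nonzero) and appends the host name after the user ID. Whether that is what the comment above has in mind is an assumption on my part. A minimal sketch, with the hypothetical helper name `build_socks4a_connect`:

```python
import struct


def build_socks4a_connect(hostname: str, port: int, user_id: bytes = b"") -> bytes:
    """Return the bytes of a SOCKS4a CONNECT request (hypothetical helper)."""
    # VN=4, CD=1 (CONNECT), then the destination port in big-endian order.
    header = struct.pack("!BBH", 0x04, 0x01, port)
    # 0.0.0.1 tells a SOCKS4a-aware proxy to resolve the trailing host name.
    placeholder_ip = b"\x00\x00\x00\x01"
    return (
        header
        + placeholder_ip
        + user_id + b"\x00"                    # NUL-terminated user ID
        + hostname.encode("ascii") + b"\x00"   # NUL-terminated host name
    )
```

A proxy that only speaks plain SOCKS4 would be expected to reject such a request, so a client would still need the local-lookup fallback or a clear error.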

does anyone know good nslookup libraries for async

See how it is implemented in python-socks, which httpx-socks is based on. For the asyncio backend you can also use aiodns.
