Hi,
I've noticed that if I do something like:
http_client = https.AsyncClient(base_url="https://my-service/api/v1")
for further requests (e.g. /messages) /api/v1 will be stripped.
Is it expected behavior?
Tested on httpx 0.11.1
Thanks!
The docs are missing a specific reference to that, but you need to make sure any subpaths you're using don't have a leading slash, else they will be interpreted as absolute paths. There's a PR to update the docs for that: https://github.com/encode/httpx/pull/843
Personally I generally use path.lstrip("/") before I do http_client.verb(path, **kwargs) in my calls.
I’d personally like HTTPX to be smart enough to not strip any prefix path too, _and_ allow using a leading / on subsequent requests. Mostly because that’s the UX that test clients in web frameworks have been acustoming folks to. I’d consider the current behavior a bug, somewhat.
@florimondmanca Yeah I think that's a reasonable observation actually. It's not that our .join(...) behaviour is broken, but that we shouldn't quite be using URL joinging semantics for clients with a base URL.
But what of the case where someone _wants_ to clear the prefix? Should they create a new client, or perhaps pass an absolute_path=True arg to the client.verb call?
@StephenBrown2 Note that that sounds like a different use case/feature to me (that we don’t even support in the current state of things).
Just tested a client with a base_url will append without a leading slash, and replace with one, and was surprised that it was not what either of us were talking about:
Python 3.8.1 (default, Jan 10 2020, 09:36:37)
[Clang 11.0.0 (clang-1100.0.33.16)] on darwin
>>> import httpx
>>> http_client = httpx.Client(base_url="https://example.com/api/v1")
>>> append = http_client.get("endpoint")
>>> append.url
URL('https://example.com/api/endpoint')
>>> replace = http_client.get("/api/v2/new")
>>> replace.url
URL('https://example.com/api/v2/new')
>>> httpx.__version__
'0.11.1'
And confirmed in iPython for AsyncClient:
Python 3.8.1 (default, Jan 10 2020, 09:36:37)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import httpx
In [2]: async with httpx.AsyncClient(base_url="https://example.com/api/v1") as http_client:
...: append = await http_client.get("endpoint")
...: print(f"{append.url=}")
...: replace = await http_client.get("/api/v2/new")
...: print(f"{replace.url=}")
...:
append.url=URL('https://example.com/api/endpoint')
replace.url=URL('https://example.com/api/v2/new')
where I expected append.url==URL('https://example.com/api/v1/endpoint'), replace.url was as expected. So, a bug? This is not the case if the base_url has a trailing slash.
In [5]: http_client_ts = httpx.Client(base_url="https://example.com/api/v1/")
...: append = http_client_ts.get("endpoint")
...: append.url
...:
Out[5]: URL('https://example.com/api/v1/endpoint')
In [6]: replace = http_client_ts.get("/api/v2/new")
...: replace.url
...:
Out[6]: URL('https://example.com/api/v2/new')
Here's the result table:
| | base=/api/v1/ | base=/api/v1 |
| --: | :-: | :-: |
| path=/endpoint | /endpoint | /endpoint |
| path=endpoint | /api/v1/endpoint | /api/endpoint |
So either we document this (which I think is the proper solution), or "fix" it in some way.
To be clear, the _only_ usage I'd be expecting from base_url is…
>>> base_url = "https://example.com/api/v1" # Trailing slash is irrelevant
>>> client = httpx.Client(base_url=base_url)
>>> response = client.get("/endpoint") # Leading slash is *always* required
>>> response.url
URL("https://example.com/api/v1/endpoint")
(Right now this is wrongly yielding
URL("https://example.com/endpoint")
as you noted too.)
Formally: request.url = client.base_url.rstrip("/") + url — no need for any other smarts or advanced URL semantics.
From an UX perspective, this is the most sensible thing to do in my POV as it is an extension of the default case of using base_url="https://example.com" and eg .get("/api/v1/endpoint") (which we currently correctly implement).
Personally, now that I see the results, I think that "correct" trumps "surprising" in this case. There are three results currently:
base_url, be sure the base_url ends with a slash and the request path does not begin with one.The only case that is "surprising" to me at first glance is the last one, while the first two make intuitive sense to me and are what I would want. Though the last one makes sense after consideration, I currently work around it in the way I said above, with:
client = httpx.Client(base_url="http://example.com/prefix/") # Use a trailing slash
client.verb(path.lstrip("/")) # Enforce no leading slash
This also works with no base_url prefix:
path = "/endpoint"
client = httpx.Client(base_url="http://example.com") # Use no trailing slash
request = client.get(path.lstrip("/")) # Enforce no leading slash
request.url == URL('http://example.com/endpoint')
I'm just saying we shouldn't even have to _think_ about "what are the possible cases?". Set a base_url, it'll be prepended to all requested URL paths, with automatic handling of trailing slash madness. Wouldn't that be ideal? Also in line with what @dmfigol originally posted.
I'd expect to see just merge, no magic required:
base_url="https://my-service/api/v1" and .get("/messages") will result in https://my-service/api/v1/messagesbase_url="https://my-service/api/v1/" and .get("messages") will result in https://my-service/api/v1/messagesbase_url="https://my-service/api/v1/" and .get("https://my-other-service/api/v1/me") will result in https://my-other-service/api/v1/meSo then to follow:
base_url="https://my-service/api/v1" and .get("v2/messages") will result in https://my-service/api/v1/v2/messagesbase_url="https://my-service/api/v1" and .get("/api/v2/messages") will result in https://my-service/api/v1/api/v2/messagesrequiring a v1_client = httpx.Client(base_url=".../v1") and v2_client = httpx.Client(base_url=".../v2") is probably an acceptable compromise.
@StephenBrown2
/ (as in your first example) should be forbidden. Or naively result in https://my-service/api/v1v2/messages.base_url it's a sign we are talking to multiple services and so we should use multiple separate clients.As long as those restrictions are documented with proper error messages (and the naive case prevented), I'm good with that.
Milestoning to review for 1.0, since there's a potential behavioural change for us to introduce here.
I found this change in behaviour quite surprising; I expected urljoin() semantics and consistency with how a browser treats relative paths. ie: For a given base_url with a path, relative urls beginning with a / should resolve to a path at the server root and those without should be appended to the base_url.
I don't expect that my opinion effect any change, but I don't feel the current behaviour is adequately documented.
I don't feel the current behaviour is adequately documented
True, and if we have an open issue tracking adding further docs for this behavior: https://github.com/encode/httpx/issues/1139#issue-674423090
Most helpful comment
I'm just saying we shouldn't even have to _think_ about "what are the possible cases?". Set a
base_url, it'll be prepended to all requested URL paths, with automatic handling of trailing slash madness. Wouldn't that be ideal? Also in line with what @dmfigol originally posted.