Caddy: Strip path for reverse proxied requests

Created on 15 Apr 2020 · 25 Comments · Source: caddyserver/caddy

Background

The reverse_proxy directive does not strip the path of the request before forwarding the request upstream. For example, in the following configuration a request to example.com/foo/bar will result in an upstream request of upstream:80/foo/bar (rather than upstream:80/bar).

example.com {
  route /foo* {
    reverse_proxy upstream:80
  }
}

At least from my experience, I quite often want the path to not be included in the upstream request, as the upstream server is a microservice which is not aware that it is being reverse-proxied. A good example given in #2813 (where this issue originated) is using phpMyAdmin at the URL /phpmyadmin. The uri directive can be used to achieve this and the configuration can be changed to something like this.

example.com {
  route /foo* {
    uri strip_prefix /foo
    reverse_proxy upstream:80
  }
}
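
To make the effect of uri strip_prefix concrete, here is a minimal Python sketch of the path transformation. It is an illustration only, not Caddy's implementation; the function name strip_prefix and the choice to always return a path with a leading slash are assumptions made for the example.

```python
def strip_prefix(path: str, prefix: str) -> str:
    """Remove `prefix` from the start of `path` if present (illustrative only)."""
    if not path.startswith(prefix):
        # No match: the path would be forwarded upstream unchanged.
        return path
    stripped = path[len(prefix):]
    # Assumption: keep the result rooted so the upstream always sees a leading slash.
    return stripped if stripped.startswith("/") else "/" + stripped


print(strip_prefix("/foo/bar", "/foo"))  # /bar
print(strip_prefix("/foo", "/foo"))      # /
```

With this transformation in place, a request for example.com/foo/bar would reach the upstream as /bar rather than /foo/bar.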

Issue

As this seems to be a relatively common pattern, I imagine configurations like the following will become quite commonplace.

example.com {
  route /api {
    route /v1* {
      uri strip_prefix /api/v1
      reverse_proxy oldapi:80
    }

    route /v2* {
      uri strip_prefix /api/v2
      reverse_proxy newapi:80
    }
  }

  route /docs* {
    uri strip_prefix /docs
    reverse_proxy docs:80
  }

  reverse_proxy website:80
}

Here there is a lot of duplication of paths, e.g. /docs is written twice. I would like to avoid this, both to make the configuration more succinct and because the duplication can very easily lead to errors being accidentally introduced. For instance, if I change /docs to /documentation, I may forget to update uri strip_prefix /docs to uri strip_prefix /documentation. In a more complex config like this one, you can see how there may be more subtle errors with the /api path, as there is nesting too.
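
The failure mode described above can be sketched in a few lines of Python. This is a hypothetical model, not Caddy code: forwarded_path is an invented helper, and it assumes that a prefix which does not match is simply skipped without any error, which is what makes the mistake easy to miss.

```python
def forwarded_path(request_path: str, strip: str) -> str:
    """Path sent upstream after a strip_prefix-style rewrite (illustrative only)."""
    if request_path.startswith(strip):
        return request_path[len(strip):] or "/"
    # The prefix does not match, so nothing is stripped and no error is raised.
    return request_path


# Route and strip prefix agree: the upstream sees /intro.
print(forwarded_path("/docs/intro", "/docs"))           # /intro
# Route renamed to /documentation, strip prefix left as /docs: silently unchanged.
print(forwarded_path("/documentation/intro", "/docs"))  # /documentation/intro
```

In the second call the upstream receives the full path, so the breakage would only show up as unexpected 404s from the upstream service.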

Ideas

I suggested a few ideas in #2813 and @francislavoie made some good points, also mentioning that in a lot of cases a subdomain may be preferred. Ideas that I have come up with are...

  1. Change the default behaviour of reverse_proxy to strip out the path. Then you could simply add it back in by changing reverse_proxy upstream:80 to reverse_proxy upstream:80/foo. However, @francislavoie pointed out that this could be a confusing change from the current behaviour in Caddy (both v1 and v2) and may be unintuitive to users who expect the path to be preserved.

  2. Add an option to the reverse_proxy directive to strip the path, something like the following example. However, this seems to go against the good principle that the reverse_proxy directive should not be involved in URL rewrites.

route /foo* {
  reverse_proxy upstream:80 {
    strip_path
  }
}

  3. Introduce a new placeholder (for instance {route} perhaps) which would return the current path matched by Caddy. Then it could be used like so to remove the duplication. However, I'm not entirely sure that this expresses clearly what we are trying to achieve.

route /foo* {
  uri strip_prefix {route}
  reverse_proxy upstream:80
}

  4. Similar to ideas 2 and 3, create a new option in the uri directive that allows the currently matched route to be stripped from the path like in the following. It could also take an optional argument of a maximum number of path segments to strip. However, this increases the complexity of the uri directive with more options.

route /foo* {
  uri strip_route
  reverse_proxy upstream:80
}

I'd be really interested to find out what use cases exist for this and whether there are any other ideas how to do this 💡

discussion feature request

All 25 comments

I think your suggestions 1 and 2 aren't viable for the reasons you mentioned.

I think 3 and 4 are worth exploring some more.

I think the naming of {route} and strip_route are somewhat misleading because we specifically care about the route prefix, i.e. we'd need to grab only the part before *.

If using a placeholder, with your example with /api/v1 being nested routes, the route placeholder would need to be additive each time a route is matched. I think this could work, but it would need a bit of exploration. FYI the internal placeholder name would probably be something like {http.handlers.subroute.route_prefix} or something to that effect, and we could provide a shorthand for the Caddyfile.

This would also only work if we have simple path matchers, and it wouldn't work with any other kinds of matchers, including path_regexp. That might make it feel a bit brittle.
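
As a rough sketch of what "additive" could mean here, the following Python models a route prefix that grows as nested routes match, and gives up as soon as a matcher is not a simple path pattern. Everything in it is hypothetical: literal_prefix and accumulated_prefix are invented names, and treating any pattern that does not start with / or that has a wildcard in the middle as "not derivable" is an assumption made for the example.

```python
from typing import List, Optional


def literal_prefix(pattern: str) -> Optional[str]:
    """Literal part of a simple path pattern such as "/foo*"; None if not derivable."""
    if not pattern.startswith("/"):
        return None  # e.g. a regular expression or some other matcher type
    head, _star, tail = pattern.partition("*")
    if tail:
        return None  # wildcard is not at the end, e.g. "/a*/b"
    return head


def accumulated_prefix(nested_patterns: List[str]) -> Optional[str]:
    """Concatenate the literal prefixes of nested routes, outermost first."""
    total = ""
    for pattern in nested_patterns:
        prefix = literal_prefix(pattern)
        if prefix is None:
            return None
        total += prefix
    return total


print(accumulated_prefix(["/api", "/v1*"]))  # /api/v1
print(accumulated_prefix(["/api*", r"^/v(\d+)"]))  # None
```

The second call shows the brittleness mentioned above: once a regular expression is involved there is no single literal prefix to strip.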

> I think the naming of {route} and strip_route are somewhat misleading because we specifically care about the route prefix, i.e. we'd need to grab only the part before *.

Yep, I agree that's a good point. Alternatively, perhaps it would be an idea to refer to *s within the route in a similar way as you can refer to regex matching groups with $1, $2, etc. Then you would be able to do something like this (where strip_all perhaps strips the entire path).

route /foo* {
  uri strip_all
  reverse_proxy upstream:80$1
}

This could also potentially work with other matchers. There's probably a much nicer syntax for this but could this be a feasible idea?

Edit: maybe this is a bit clunky on second thoughts... it seems the way NGINX solves this is by the presence (or lack of) a trailing slash in the upstream URL, which I think is even more confusing, so clearly this is not easy to resolve 😞

I haven't had time to catch up on this whole conversation yet -- I will sooner or later -- but does rewrite * / or uri strip_prefix {path} not work? (After routing the request to the proper reverse proxy, of course.)

Edit: Oh, you just want to strip a prefix, right?

So what about something like handle but where it takes only a path prefix (not a matcher, very important and confusing distinction, but, let's roll with this for a moment):

handle_path /v1 {
    reverse_proxy oldapi:80
}
handle_path /v2 {
    reverse_proxy newapi:80
}
handle_path /docs {
    reverse_proxy docs:80
}
reverse_proxy website:80

handle_path would strip the prefix from the path and then evaluate the handlers inside it just like handle does. So the only difference is the matcher and an implicit URI transformation.
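
The proposed behaviour can be modelled in Python as a small dispatch function. This is a sketch of the idea as described in this comment, not an implementation: the function signature, the ROUTES table, and the decision to match only on whole path segments (so /v1 does not match /v10) are all assumptions for illustration.

```python
from typing import List, Tuple


def handle_path(routes: List[Tuple[str, str]], path: str, fallback: str) -> Tuple[str, str]:
    """Return (upstream, path to forward) for the first matching prefix."""
    for prefix, upstream in routes:
        # Assumption: match on whole path segments, so /v1 does not match /v10.
        if path == prefix or path.startswith(prefix + "/"):
            # Matching and stripping use the same prefix, so they cannot drift apart.
            return upstream, path[len(prefix):] or "/"
    return fallback, path


ROUTES = [("/v1", "oldapi:80"), ("/v2", "newapi:80"), ("/docs", "docs:80")]
print(handle_path(ROUTES, "/v1/users", "website:80"))  # ('oldapi:80', '/users')
print(handle_path(ROUTES, "/about", "website:80"))     # ('website:80', '/about')
```

Because each prefix is written once and used for both matching and stripping, the duplication problem from the original post does not arise.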

This is similar to the idea of a switch or map as well, which we're in talks to build for v2.1... but I haven't figured that one out yet.

(I suppose one idea to make that path argument less confusing is to require that it be suffixed with * like a path matcher, but I don't want people thinking they can use any matcher there, that wouldn't make sense... hmmm...)

@mholt I like that idea. I'll try to implement it real quick. Should be easy.

Edit: Bah. Ran into problems with import cycles, not sure how to make a Rewrite struct from builtins.go, so I'll come back to this later when I get help 😄

@mholt @francislavoie Thanks for all the ideas and work on this!

Also, in answer to the previous question: yes, that's right, I just want to strip the prefix (the prefix being the bit before * in the /foo* route).

I have a working implementation in #3281 I think, please take a look! 😄

I'd like to share my own use-case for a feature like this here, as well. I have an old path /docs which I moved to a subdomain. So I'm doing this:

example.com {
  #...
  redir /docs* https://docs.example.com{uri}
}

Although that {uri} obviously would also contain the /docs prefix, which I'd like to avoid.
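
For clarity, the desired redirect target can be written out as a small Python function. This is only a sketch of the intended result, not something Caddy provides; redirect_target is a made-up helper, and keeping the query string intact is an assumption about what is wanted.

```python
from urllib.parse import urlsplit


def redirect_target(uri: str, prefix: str, new_base: str) -> str:
    """Build the redirect URL with `prefix` removed from the path (illustrative only)."""
    parts = urlsplit(uri)
    path = parts.path
    if path.startswith(prefix):
        path = path[len(prefix):]
    if not path.startswith("/"):
        path = "/" + path
    # Assumption: the query string should survive the redirect unchanged.
    query = "?" + parts.query if parts.query else ""
    return new_base + path + query


print(redirect_target("/docs/install?lang=en", "/docs", "https://docs.example.com"))
# https://docs.example.com/install?lang=en
```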

This is definitely something we'll address for 2.1.

Apparently nginx does not automatically strip the path it matches in a location block before proxying -- anyone know what the equivalent desired nginx config is like? Although I think @francislavoie's PR is a pretty good implementation, I just want to make sure we take inspiration if possible.

Thanks @mholt for all your ideas and time (also especially to @francislavoie for the great PR). I'd be really happy to help out with anything, particularly as I think this is an important feature.

I get really confused by the NGINX way of doing this, so I might get this slightly wrong, but here's a shot. The passages quoted below are from the NGINX documentation for the proxy_pass directive.

> If the proxy_pass directive is specified with a URI, then when a request is passed to the server, the part of a normalized request URI matching the location is replaced by a URI specified in the directive:

location /name/ {
    proxy_pass http://127.0.0.1/remote/;
}

Essentially, the path that is matched in the location will be automatically stripped. For instance, for /name/zak, the "part of a normalized request URI matching the location" (i.e. /name/) "is replaced by a URI specified in the directive" (i.e. http://127.0.0.1/remote/), so /name/zak becomes http://127.0.0.1/remote/zak. Note that this is not generally true for location blocks, only in the case of reverse proxying.

> If proxy_pass is specified without a URI, the request URI is passed to the server in the same form as sent by a client when the original request is processed, or the full normalized request URI is passed when processing the changed URI:

location /some/path/ {
    proxy_pass http://127.0.0.1;
}

Here's where it gets tricky... http://127.0.0.1 is technically not a URI because there is no trailing /! So in the request /some/path/zak, NGINX doesn't know where to attach the zak on to so all it can do is proxy to http://127.0.0.1/some/path/zak, keeping the original path intact.

Whilst you can see the reasoning behind this, I think that to someone writing a config the addition of a trailing / changing the behaviour in this way could seem pretty arbitrary.

In summary: location blocks do not strip the matched path in general, as far as I know. As a special case, though, NGINX always tries to strip the path when reverse proxying. In some cases like the one above (there are more in the docs!) it can't work out what to strip, so it leaves the whole path intact.
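
The two proxy_pass cases quoted above can be modelled with a short Python function. This is a simplified sketch, not NGINX's algorithm: nginx_upstream_url is an invented name, and it assumes a plain prefix location (no regular expression), a request path that starts with that location, and no query string.

```python
from urllib.parse import urlsplit


def nginx_upstream_url(location: str, proxy_pass: str, request_path: str) -> str:
    """Approximate the upstream URL for a prefix location (illustrative only)."""
    parts = urlsplit(proxy_pass)
    origin = f"{parts.scheme}://{parts.netloc}"
    if parts.path == "":
        # proxy_pass has no URI part: the request path is passed through unchanged.
        return origin + request_path
    # proxy_pass has a URI part: it replaces the portion matched by the location.
    return origin + parts.path + request_path[len(location):]


print(nginx_upstream_url("/name/", "http://127.0.0.1/remote/", "/name/zak"))
# http://127.0.0.1/remote/zak
print(nginx_upstream_url("/some/path/", "http://127.0.0.1", "/some/path/zak"))
# http://127.0.0.1/some/path/zak
```

Under the same assumptions, changing the second proxy_pass to http://127.0.0.1/ would yield http://127.0.0.1/zak, which is the trailing-slash sensitivity discussed above.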

I think Caddy can definitely take inspiration from making the path really easy to strip, but hopefully in a way that is more intuitive. I assume from NGINX actually making this the default that it is the more common use case, so maybe Caddy should even consider doing this too? I also really like the PR as it tries to solve this in the more general case; I left some ideas there which are more loosely inspired by NGINX.

Thank you again for the response to this issue, I hope this is helpful 😄

P.S. please feel free to correct me if I've got anything wrong!

Ah, emphasis on:

> If the proxy_pass directive is specified with a URI,

So, if a URI is given, then the path is reworked, otherwise it is left alone.

Maybe a reverse_proxy_path or reverse_proxy_rewrite is in order too?

> So, if a URI is given, then the path is reworked, otherwise it is left alone.

Exactly, although there are a few more cases given in the docs under which it is left alone (e.g. when the location path is a regular expression).

> Maybe a reverse_proxy_path or reverse_proxy_rewrite is in order too?

What would be the behaviour of this, I'm assuming it is specific to reverse proxying?

Whilst different from the NGINX approach, I like the idea of addressing this in the general case as it solves cases like the one described by @codecat too. Perhaps stripping is something that could even be added to the path matcher? Alternatively, maybe capture groups could be extended to regular paths so you could write something like

handle (/foo)* {
  uri strip_prefix {http.path.name.capture_group.1}
  reverse_proxy upstream:80
}

although this may not be too intuitive.

> What would be the behaviour of this, I'm assuming it is specific to reverse proxying?

I was thinking those would just always strip the prefix it matches on, then proxy. So it would still be a one-liner, as opposed to handle_path which needs 2 lines.

This request looks like https://github.com/caddyserver/caddy/issues/95 that was fixed in Caddy 1. Why not use the same solution?

> This request looks like #95 that was fixed in Caddy 1. Why not use the same solution?

Because Caddy v2 is a complete rewrite and does not work the same way as Caddy v1. The solution doesn't apply here. We don't want the reverse_proxy directive to handle path rewrites, those are separate concerns.

Is there a current workaround?

Yup, as explained in the OP, use something like this:

  route /foo* {
    uri strip_prefix /foo
    reverse_proxy upstream:80
  }

Thanks

Would that workaround also work for redirects instead of reverse_proxy?

Probably unnecessary. At this point, this thread is getting derailed. If you have usage questions, please ask on https://caddy.community.

> I was thinking those would just always strip the prefix it matches on, then proxy. So it would still be a one-liner, as opposed to handle_path which needs 2 lines.

That would be cool and a lot simpler than what I suggested! Do you think there is a nice way to generalise this to other directives? I think redirects are also a pretty good example but maybe there are more.

> Do you think there is a nice way to generalise this to other directives?

That's what handle_path is.

I think a decent directive name for the proxy shortcut could be subpath_proxy. We can drop the word reverse because we can just document that it's a shortcut for reverse_proxy in a specific use case.

Could this perhaps be something that's added for all directives as a one-liner alternative to handle_path? So something like reverse_proxy subpath /foo http://example.com/ as well as redir subpath /foo http://example.com/.

Not a fan. Adds too much complexity to the parsing.

Fair enough 😄
