Caddyfile:
:80
@api {
path_regexp api ^/api/(.*)$
}
rewrite @api {http.regexp.api.1}
reverse_proxy @api localhost:1234
I'm running Caddy 2.0.0b15 using docker (but I don't think this problem is docker-related):
docker run --rm -p 8080:80 -v `pwd`/Caddyfile:/etc/caddy/Caddyfile caddy/caddy:v2.0.0-beta.15-alpine
Now I run curl http://localhost:8080/api/abcd and I get 502 Bad Gateway. That's OK: nothing is running on localhost:1234, but it shows that Caddy is attempting the reverse proxy.
Now we add encode gzip to the Caddyfile and repeat the previous curl. The result is HTTP status 200 instead of 502 Bad Gateway.
I hope this is enough to reproduce the bug.
Thanks for the report -- I do think there's a bug here, but it's not quite what it seems.
The above config yields this JSON:
{
"apps": {
"http": {
"servers": {
"srv0": {
"listen": [
":80"
],
"routes": [
{
"match": [
{
"path_regexp": {
"name": "api",
"pattern": "^/api/(.*)$"
}
}
],
"handle": [
{
"handler": "rewrite",
"uri": "{http.regexp.api.1}"
},
{
"handler": "reverse_proxy",
"upstreams": [
{
"dial": "localhost:1234"
}
]
}
]
}
]
}
}
}
}
}
Adding encode gzip to the config yields this JSON:
{
"apps": {
"http": {
"servers": {
"srv0": {
"listen": [
":80"
],
"routes": [
{
"match": [
{
"path_regexp": {
"name": "api",
"pattern": "^/api/(.*)$"
}
}
],
"handle": [
{
"handler": "rewrite",
"uri": "{http.regexp.api.1}"
}
]
},
{
"handle": [
{
"encodings": {
"gzip": {}
},
"handler": "encode"
}
]
},
{
"match": [
{
"path_regexp": {
"name": "api",
"pattern": "^/api/(.*)$"
}
}
],
"handle": [
{
"handler": "reverse_proxy",
"upstreams": [
{
"dial": "localhost:1234"
}
]
}
]
}
]
}
}
}
}
}
You can see how in the second config, after the rewrite happens, the reverse_proxy matcher no longer matches the request, because it was rewritten. Hence, no proxying happens. In the first, the rewrite and proxy are grouped together in the same route.
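To make the difference concrete, here is a minimal sketch in Python that walks the two route lists above in order. It is a toy model, not Caddy's actual code: the function names are made up for illustration, and the rewrite is a stand-in that simply replaces the path with the first capture group.

```python
import re

API = re.compile(r"^/api/(.*)$")  # the path_regexp named "api"

def matches_api(path):
    return API.match(path) is not None

def apply_rewrite(path):
    # Stand-in for the rewrite handler with uri "{http.regexp.api.1}".
    m = API.match(path)
    return m.group(1) if m else path

def run(routes, path):
    """Walk routes in order; each matcher sees the path as it is *now*."""
    invoked = []
    for matcher, handlers in routes:
        if matcher is not None and not matcher(path):
            continue  # route skipped: its matcher no longer matches
        for handler in handlers:
            invoked.append(handler)
            if handler == "rewrite":
                path = apply_rewrite(path)
    return invoked

# First JSON config: one route, matcher evaluated once for both handlers.
without_encode = [(matches_api, ["rewrite", "reverse_proxy"])]

# Second JSON config: three routes, matcher evaluated again after the rewrite.
with_encode = [
    (matches_api, ["rewrite"]),
    (None, ["encode"]),
    (matches_api, ["reverse_proxy"]),
]

print(run(without_encode, "/api/abcd"))  # ['rewrite', 'reverse_proxy']
print(run(with_encode, "/api/abcd"))     # ['rewrite', 'encode']
```

In the second case no handler that writes a response is ever invoked, which is consistent with the 200 reported above.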
So, if there is a bug here, it is in how the Caddyfile adapter groups directives by matchers...
In the meantime, another way to write the intended config is this:
:80
encode gzip
@api {
path_regexp api ^/api/(.*)$
}
route @api {
rewrite {http.regexp.api.1}
reverse_proxy localhost:1234
}
In one respect, your initial config above shouldn't work like you think, because once the request is rewritten, it doesn't match the @api matcher on reverse_proxy anymore. So, arguably, the bug (or one bug) is in the config. The fact that the first config works as intended is more due to chance than anything else: because the two directives shared the same matcher, they were grouped into the same route, even though the first handler changes the request so it doesn't match anymore. Once a new directive is added that sorts between rewrite and reverse_proxy, they can't be grouped into the same route anymore, hence it breaks.
But, I agree that adding encode gzip should not so drastically change the behavior of the server.
Does that make sense?
Happy to hear your thoughts on this.
Thank you very much for your explanation!
I'll use the route to fix the problem.
The only thing I don't get is why this Caddyfile:
encode gzip
@api {
path_regexp api ^/api/(.*)$
}
rewrite @api {http.regexp.api.1}
reverse_proxy @api localhost:1234
is not the same as this one:
encode gzip
@api {
path_regexp api ^/api/(.*)$
}
handle @api {
rewrite {http.regexp.api.1}
reverse_proxy localhost:1234
}
From my understanding (probably I'm missing something) the handle is the explicit way of grouping directives in a block, but writing each directive with a matcher token should have the same effect.
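I expect the pipeline of transforming the Caddyfile into JSON to be something like:
- Grouping directives by matcher
- In each group/route, sort directives following the directive order
Looking at the JSON representation of the previous Caddyfiles:
Caddyfile without handle: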
{
"apps": {
"http": {
"servers": {
"srv0": {
"listen": [
":80"
],
"routes": [
{
"match": [
{
"path_regexp": {
"name": "api",
"pattern": "^/api/(.*)$"
}
}
],
"handle": [
{
"handler": "rewrite",
"uri": "{http.regexp.api.1}"
},
{
"handler": "reverse_proxy",
"upstreams": [
{
"dial": "localhost:1234"
}
]
}
]
}
]
}
}
}
}
}
Caddyfile with handle:
{
"apps": {
"http": {
"servers": {
"srv0": {
"listen": [
":80"
],
"routes": [
{
"match": [
{
"path_regexp": {
"name": "api",
"pattern": "^/api/(.*)$"
}
}
],
"handle": [
{
"handler": "subroute",
"routes": [
{
"handle": [
{
"handler": "rewrite",
"uri": "{http.regexp.api.1}"
},
{
"handler": "reverse_proxy",
"upstreams": [
{
"dial": "localhost:1234"
}
]
}
]
}
]
}
]
}
]
}
}
}
}
}
I see that in the second Caddyfile the rewrite and reverse_proxy are grouped inside a subroute, and that does the trick.
IMHO I don't see a case where two directives with the same matcher shouldn't be grouped together (as an implicit route), but I'd like to know if there is any reason :)
@masipcat
The only thing I don't get is why this Caddyfile: ... is not the same as this one: ...
(In the second one, did you mean to use route like my workaround Caddyfile did? Or handle specifically?) The short answer is, because in the first config, you use the @api matcher twice, but the same request that matched it in the first directive (rewrite) did not match it by the time it got to the second directive (reverse_proxy), because rewrite rewrote it.
Indeed, this is not obvious behavior. It's probably the single hardest decision I've had to make regarding the config: whether all matchers are evaluated before any handlers are invoked, or whether they get evaluated procedurally as you invoke handlers down the chain. I ended up choosing the latter because I think it is more powerful and useful. However, that's something that affects the underlying JSON config more than anything. Technically, I guess, we could enforce either with the Caddyfile. It is just easier, and more consistent, to do the same behavior as the JSON.
I expect the pipeline of transforming the Caddyfile into JSON to be something like:
- Grouping directives by matcher
- In each group/route, sort directives following the directive order
That's actually closer to how nginx config works! But here's something cool, the Caddyfile can do this too!
Here's the thing: there are two primary ways to express routing configuration.
1. Routing logic on the "outside" (routing first) / Handler logic on the "inside"
2. Handler logic on the "outside" / Routing logic on the "inside" (routing second)
This is an oversimplification, but: in paradigm 1, handlers are grouped by matchers and then put in order. In paradigm 2, handlers are put in order and then grouped by matchers.
Both approaches have pros/cons depending on what you need to do.
Your first config uses paradigm 2. My proposed workaround uses paradigm 1. (Can you see the difference?)
Caddy v1 only supported paradigm 2. This is great for simple configs, or configs where you want to compose a lot of smaller handlers.
NGINX only supports paradigm 1 (I think) -- in the form of location blocks. I have much less NGINX experience, so I could be wrong. But I think that's generally how it works.
Caddy v2 supports both! Depending on what you need to do.
Your case is a good example of how a minimal config can easily break expectations with a small, unassuming change.
I see that in the second Caddyfile the rewrite and reverse_proxy are grouped inside a subroute, and that does the trick.
Yep! And that's because of the difference in the two paradigms. Either directives are ordered and then grouped, or grouped and then ordered. In the first, it's easy for adding a directive to break the groupings because of how it changes the order.
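The two paradigms can be sketched in a few lines of Python. This is a deliberately simplified model, not the Caddyfile adapter's real logic: the ORDER list only reflects the order the three handlers appear in the JSON earlier in this thread, and the function names are invented for illustration.

```python
from itertools import groupby

# Assumed relative directive order, taken from the JSON output above.
ORDER = ["rewrite", "encode", "reverse_proxy"]

def order_then_group(directives):
    """Paradigm 2: sort by directive order, then fold adjacent equal matchers."""
    ordered = sorted(directives, key=lambda d: ORDER.index(d[0]))
    return [
        (matcher, [name for name, _ in group])
        for matcher, group in groupby(ordered, key=lambda d: d[1])
    ]

def group_then_order(directives):
    """Paradigm 1: group by matcher first, then sort inside each group."""
    groups = {}
    for name, matcher in directives:
        groups.setdefault(matcher, []).append(name)
    return [(m, sorted(names, key=ORDER.index)) for m, names in groups.items()]

# (directive, matcher) pairs; None means "no matcher".
directives = [("encode", None), ("rewrite", "@api"), ("reverse_proxy", "@api")]

print(order_then_group(directives))
# [('@api', ['rewrite']), (None, ['encode']), ('@api', ['reverse_proxy'])]
print(group_then_order(directives))
# [(None, ['encode']), ('@api', ['rewrite', 'reverse_proxy'])]
```

With order-then-group, encode lands between the two @api directives and splits them into separate routes; with group-then-order, they stay together no matter what else is added.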
So, Caddy 2 is actually very powerful and expressive while having a simple syntax. We're still learning about the implications of these design choices while we're in beta, and this is a good case study.
The more I think about this, the less I think it is a _bug_ per se, but the more I think we need to point this out. I'd like to avoid footguns like this in the Caddyfile.
A few options are:
- rewrite directives in groups with other directives (but this assumes that only rewrite changes requests' classifications, or matchings)
Let me know if that made sense. Seems like you're getting it so far, though!
The only thing I don't get is why this Caddyfile: ... is not the same as this one: ...
(In the second one, did you mean to use route like my workaround Caddyfile did? Or handle specifically?)
I meant to use it like the route workaround. I was trying different combinations with and without route/handle in the Caddyfile and checking the transformed JSON, and I ended up posting the handle version...
Here's the thing: there are two primary ways to express routing configuration.
- Routing logic on the "outside" (routing first) / Handler logic on the "inside"
- Handler logic on the "outside" / Routing logic on the "inside" (routing second)
This explains a lot! Now I see this isn't a bug and is the intended behavior.
The more I think about this, the less I think it is a bug per se, but the more I think we need to point this out. I'd like to avoid footguns like this in the Caddyfile.
Agreed! I think it would be great to have some route/handle examples in the Caddyfile tutorial.
Glad that helped!
Now I see this isn't a bug and is the intended behavior.
_Almost_. I mean, it's difficult to say. I think it is a bug since in my attempt to simplify the JSON output, I ended up rolling rewrite directives into the same group as other directives that share the same matcher, even though rewrite should* (see footnote) cause the next handler to not match.
I may see about pulling rewrites out of the groupings when this simplification is performed... but that means I'd have to assume that just rewrite (and maybe a few other directives) perform rewrites. And even worse, some matchers don't even rely on the request at all (in theory, you could have matchers for date/time)! Maybe I should just disable this simplifying/consolidation step entirely.
I also think having a linter that warns when a rewrite shares the same matcher as another directive would be useful, since that would likely be a bug.
* It is up for debate whether rewrite should work this way, but I think it should.
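The linter idea mentioned above could look roughly like the following Python sketch. No such linter exists in this thread; the function name, the (directive, matcher) input shape, and the warning text are all hypothetical, and the check ignores directive order for simplicity.

```python
def lint_shared_rewrite_matchers(directives):
    """Warn about directives that reuse a named matcher also used by a rewrite."""
    rewrite_matchers = {
        matcher
        for name, matcher in directives
        if name == "rewrite" and matcher is not None
    }
    return [
        f"{name} shares matcher {matcher} with a rewrite; "
        "it may no longer match once the request is rewritten"
        for name, matcher in directives
        if name != "rewrite" and matcher in rewrite_matchers
    ]

# (directive, matcher) pairs from the original config plus encode gzip.
config = [("encode", None), ("rewrite", "@api"), ("reverse_proxy", "@api")]

for warning in lint_shared_rewrite_matchers(config):
    print(warning)
```

For the config in this issue it reports a single warning, for reverse_proxy.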
@masipcat I've "fixed" this in 89124aa, although it's a bit hacky: basically, we prevent folding other handlers into the rewrite's route if it happens to be adjacent and share the same matcher. (Those two conditions are required in order to trigger your edge case.) What this patch does is it keeps all "rewrite" handlers in their own route by giving them a "group" name that is unique -- although typically groups are only useful if there is more than one route per group. But, it prevents the route-folding logic from treating the two routes' prerequisites as equivalent and thus prevents them from being consolidated.
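As a rough illustration of the idea (not the code in that commit), here is a Python sketch of route folding where a unique group name keeps rewrites on their own. The dict layout, the fold function, and the "rewrite_N" naming scheme are assumptions made up for this example.

```python
from itertools import count

_unique = count()

def make_route(directive, matcher):
    # Hypothetical naming scheme: each rewrite gets a group name nobody else has.
    group = f"rewrite_{next(_unique)}" if directive == "rewrite" else None
    return {"match": matcher, "group": group, "handle": [directive]}

def fold(routes):
    """Merge adjacent routes only when both matcher and group are equal."""
    folded = []
    for route in routes:
        prev = folded[-1] if folded else None
        same = (
            prev is not None
            and prev["match"] == route["match"]
            and prev["group"] == route["group"]
        )
        if same:
            prev["handle"].extend(route["handle"])
        else:
            folded.append({**route, "handle": list(route["handle"])})
    return folded

with_rewrite = fold([make_route("rewrite", "@api"), make_route("reverse_proxy", "@api")])
without_rewrite = fold([make_route("encode", "@api"), make_route("reverse_proxy", "@api")])

print([r["handle"] for r in with_rewrite])     # [['rewrite'], ['reverse_proxy']]
print([r["handle"] for r in without_rewrite])  # [['encode', 'reverse_proxy']]
```

Because the rewrite's group differs from its neighbor's, the two routes are never treated as equivalent, while ordinary adjacent directives with the same matcher still fold together.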
So now, rewrite should always be in its own route. Be aware though, that sharing a matcher like your original config did (the @api matcher) would match for rewrite, but then after the rewrite, will not match for reverse_proxy. That may not be obvious, but makes sense if we remember that rewrites are treated like internal redirects.
It's probably less confusing than changing the route logic entirely just by adding encode middleware. :)
(This fix applies only to the Caddyfile, which is where the confusing behavior was; the underlying JSON logic is still the same / clean.)
I suspect we'll encounter similar questions again later, so thanks for bringing this up! (For example, I imagine there will be other directives in the future that also change the semantics for matchers, since both handlers and matchers are extensible, who knows what we'll see!)
Thank you for the fix! I'll try it in beta 18