Caddy: Path routing breaks FastCGI SCRIPT_NAME

Created on 29 Aug 2017 · 5 Comments · Source: caddyserver/caddy

Hi. First of all, I apologize for the length of this issue report, as well as the fact that I did not follow the template. This is not quite a bug report as I am not confident that there is a bug in Caddy, but it does regard a behavior of Caddy that is unintuitive. As such, I had trouble fitting it in the template and gave up.

(I'd be happy to work on a PR for the issue if we did gain consensus that this was a bug.)

Basically, I'm trying to transition an existing PHP-based site over to Caddy. In NGINX, we might do something like this:
(Simplified, but mostly valid config.)

worker_processes 1;
events {
    worker_connections 1024;
}
http {
    server {
        listen 8000;
        location / {
            root /www/home;
        }
        location /forum {
            root /www/phpbb;
            location ~ \.php$ {
                fastcgi_pass php:9000;
            }
        }
        location /wiki {
            root /www/mediawiki;
            location ~ \.php$ {
                fastcgi_pass php:9000;
            }
        }
    }
}

The intent is simple: /forum and /wiki are each configured separately, with their own roots. At first glance this maps well onto Caddy's path-based routing, and for some software it does work. (I've noticed it works for MediaWiki.) Note that the NGINX config above isn't totally right, because it would end up mapping /forum to /www/phpbb/forum/index.php or so. It is simplified, but hopefully the intent is clear.

Anyway, trying to do this sort of thing in Caddy yields something like this:

localhost:8000 {
    root /www/home
}
localhost:8000/forum {
    root /www/phpbb
    fastcgi / php:9000 php
}
localhost:8000/wiki {
    root /www/mediawiki
    fastcgi / php:9000 php
}

This looks nice, but unfortunately it has a pretty debilitating issue: $_SERVER["SCRIPT_NAME"] and $_SERVER["PHP_SELF"] end up evaluating to paths relative to the routed subdirectory, not to the root of the domain.
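To make the symptom concrete, here is a minimal Go sketch of the behavior as observed from the outside. It assumes the site's path prefix is removed from the request path before SCRIPT_NAME is derived; the helper name relativeScriptName and its index-file handling are hypothetical and are not taken from Caddy's source.

```go
package main

import (
	"fmt"
	"strings"
)

// relativeScriptName models the reported behavior (an assumption, not Caddy's
// code): the site's path prefix is dropped, so the script name is relative to
// the routed subdirectory instead of the domain root.
func relativeScriptName(requestPath, sitePrefix, indexFile string) string {
	p := strings.TrimPrefix(requestPath, sitePrefix)
	if !strings.HasPrefix(p, "/") {
		p = "/" + p
	}
	// A directory request falls back to the index file.
	if strings.HasSuffix(p, "/") {
		p += indexFile
	}
	return p
}

func main() {
	for _, reqPath := range []string{"/folder/", "/folder/index.php"} {
		fmt.Printf("%s -> SCRIPT_NAME %s\n",
			reqPath, relativeScriptName(reqPath, "/folder", "index.php"))
	}
}
```

Both requests yield /index.php in this model, which is what the reproduction further down reports for SCRIPT_NAME and PHP_SELF.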

This may not be a bug; I am not sure whether the behavior is intentional. I guess there's no actual standard for CGI. Drafts suggest that the consensus is that the "URI" information passed to the script need not match reality. (Screenshot of the draft omitted.)

But it feels like it would be more useful if SCRIPT_NAME always included the full request path (after processing it). I don't think the FastCGI software is interested in an arbitrary path that matches neither the filename on disk nor a valid URL path. The REQUEST_URI is probably useful, but I have a feeling it is the raw path, before rewrites and before the query string is parsed off.

As for software that's impacted, a lot of software that uses Symfony will be affected, because its getRequestUri function and related utilities will all get the wrong paths. Some PHP software does fine, presumably because it sticks to only relative paths, or the path is configurable and can be overridden, or something to that effect. MediaWiki works fine, but phpBB 3 is FUBAR with this, and even the path override does nothing to help here.
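As a rough illustration of why applications break, the sketch below assumes an application that guesses its base path as the directory of SCRIPT_NAME and builds absolute links from it. This is a deliberately simplified stand-in and not Symfony's or phpBB's actual logic; the function names and the page name viewtopic.php are only for illustration.

```go
package main

import (
	"fmt"
	"path"
)

// baseFromScriptName is a hypothetical, simplified base-path guess:
// the directory portion of SCRIPT_NAME.
func baseFromScriptName(scriptName string) string {
	return path.Dir(scriptName)
}

// appLink builds an absolute link to a page that lives next to the script.
func appLink(scriptName, page string) string {
	return path.Join(baseFromScriptName(scriptName), page)
}

func main() {
	// With the expected SCRIPT_NAME the link stays inside the subdirectory.
	fmt.Println(appLink("/folder/index.php", "viewtopic.php")) // /folder/viewtopic.php
	// With the prefix-relative SCRIPT_NAME the link escapes to the domain root.
	fmt.Println(appLink("/index.php", "viewtopic.php")) // /viewtopic.php
}
```

Under this assumption, any generated link, redirect, or asset URL would point outside the routed subdirectory, which would match the kind of breakage described here.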

I would prefer to not merge all of the configurations into one vhost. It seems like trouble with the separate roots, and the orchestration setup I have has a better time with separate configuration files anyways.

Reproducing

I have a more complete docker-compose based setup if desired.

Caddyfile:

localhost:8000/folder {
    root /www/folder
    fastcgi / php:9000 php
}

/www/folder/index.php:

<?php header("content-type: application/json"); ?>
{
    "PHP_SELF": "<?= $_SERVER["PHP_SELF"]; ?>",
    "REQUEST_URI": "<?= $_SERVER["REQUEST_URI"]; ?>",
    "SCRIPT_NAME": "<?= $_SERVER["SCRIPT_NAME"]; ?>"
}

Expected output

curl http://localhost:8000/folder/

{
    "PHP_SELF": "/folder/index.php",
    "REQUEST_URI": "/folder/",
    "SCRIPT_NAME": "/folder/index.php"
}

curl http://localhost:8000/folder/index.php

{
    "PHP_SELF": "/folder/index.php",
    "REQUEST_URI": "/folder/index.php",
    "SCRIPT_NAME": "/folder/index.php"
}

curl 'http://localhost:8000/folder/index.php?querystring'

{
    "PHP_SELF": "/folder/index.php",
    "REQUEST_URI": "/folder/index.php?querystring",
    "SCRIPT_NAME": "/folder/index.php"
}

Actual output

curl http://localhost:8000/folder/

{
    "PHP_SELF": "/index.php",
    "REQUEST_URI": "/folder/",
    "SCRIPT_NAME": "/index.php"
}

curl http://localhost:8000/folder/index.php

{
    "PHP_SELF": "/index.php",
    "REQUEST_URI": "/folder/index.php",
    "SCRIPT_NAME": "/index.php"
}

curl 'http://localhost:8000/folder/index.php?querystring'

{
    "PHP_SELF": "/index.php",
    "REQUEST_URI": "/folder/index.php?querystring",
    "SCRIPT_NAME": "/index.php"
}

Phew. Hope all of that makes sense.

Labels: bug, help wanted

All 5 comments

Thanks for the detailed issue.

As another PHP developer, I think this is a bug. I would also expect the values you expect. However, I'm not sure it's a bug we can fix!

I would need to look at the code and see how easy or difficult a fix would be. These values may be set by PHP itself, based on the root Caddy passes it.

I at least have an idea of what code in Caddy is involved here. It's this:

https://github.com/mholt/caddy/blob/master/caddyhttp/fastcgi/fastcgi.go#L219

As far as I can tell, a logical solution would be to prepend the path prefix from the route in the Caddyfile, but I don't know enough about Caddy to know if that information makes it down this far.

Thanks for tracking down the likely place. I think we could try that solution. We may need to be careful we don't introduce any bugs.

If you felt you could try a PR we would be really happy with the contribution, otherwise I will try to get to it soon.

I just took a look and it looks like it's actually very easy. I was worried I might have to expose more information into the request context, but it seems that no, the path prefix of the current vhost is exposed into a context variable already. That makes the actual change very simple.

The question now is if it's actually a proper solution to just throw the path in front. I think so, but it's hard to be 100% sure.
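For illustration only, the idea of throwing the path in front could be sketched in Go roughly as follows. The context key pathPrefixKey and both function names are hypothetical, since this thread does not show the real key or the real call site in fastcgi.go; only the standard library's context and path packages are used.

```go
package main

import (
	"context"
	"fmt"
	"path"
)

// ctxKey and pathPrefixKey are hypothetical stand-ins for however the
// server exposes the current site's path prefix in the request context.
type ctxKey string

const pathPrefixKey ctxKey = "path_prefix"

// prependPrefix puts the site's path prefix in front of the prefix-relative
// script name. path.Join skips empty elements and cleans duplicate slashes.
func prependPrefix(prefix, scriptName string) string {
	return path.Join("/", prefix, scriptName)
}

// scriptNameFromContext reads the prefix from the context; a missing value
// yields an empty prefix, so sites without a path are left unchanged.
func scriptNameFromContext(ctx context.Context, scriptName string) string {
	prefix, _ := ctx.Value(pathPrefixKey).(string)
	return prependPrefix(prefix, scriptName)
}

func main() {
	ctx := context.WithValue(context.Background(), pathPrefixKey, "/folder")
	fmt.Println(scriptNameFromContext(ctx, "/index.php"))                  // /folder/index.php
	fmt.Println(scriptNameFromContext(context.Background(), "/index.php")) // /index.php
}
```

One thing to keep in mind with this approach is that path.Join also removes a trailing slash and resolves ".." elements. That is harmless for a script name, but it would not be appropriate for values that must be passed through verbatim.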

P.S. I apologize for not mentioning ahead of time that I was going to work on it, but I only started taking a look the past half hour or so. If it was more substantial work I would've definitely mentioned here first to avoid duplicating work.

Big win all around. Thanks everyone!
