Caddy: fastcgi response times extremely long

Created on 31 Aug 2016  路  18Comments  路  Source: caddyserver/caddy

1. What version of Caddy are you running (caddy -version)?

Caddy 0.9.1 (+e8e5595 Tue Aug 30 04:52:42 UTC 2016)

2. What are you trying to do? 5. What did you expect to see? 6. What did you see instead (give full error messages and/or log)?

I am replacing nginx as the head of a fastcgi backend with caddy. Everything works fine, but caddys response time is extremely long. Concretely, nginx needs about 2s to serve the first request, but than order of 10ms for subsequent requests. Caddy takes about 1s for every request that goes to the fastcgi back end. (Times measured with firefox's network monitor). Static files are served quickly, the problem seems to be in the fastcgi TCP communication. (Below I will describe a MWE instead of my actual setup.)

3. What is your entire Caddyfile?

  localhost:3030
fastcgi / localhost:2020 {
  ext    !jpg !jpeg !gif !png !ico !bmp !svg !css !js !po !txt !json !html !woff2
}

4. How did you run Caddy (give the full command and describe the execution environment)? 7. How can someone who is starting from scratch reproduce this behavior as minimally as possible?

For the MWE:

go get github.com/mholt/caddy/caddy
go get github.com/echsecutor/fastcgi-test
nohup caddy &
fastcgi-test.exe --port 2020 --folder C:\\

from cygwin under Windows in the folder with the above Caddyfile. (under linux, use some reasonable folder, rest should be the same). Then point your browser to localhost:3030 and observe response times. (Firefox: strg+shit+q) If there are some static files in this folder, those are served quickly (few ms), but every fastcgi request takes about 1s.

help wanted

All 18 comments

At a superficial glance at caddyhttp/fastcgi/fastcgi.go, line 76, I get the impression that caddy re-dials a new fastcgi TCP connection for each request? This is clearly not the best idea, performance wise. nginx keeps a fastcgi connection openfor some configurable idle time. Maybe this is where the performance difference comes from?

Adding
log / caddy.log "{when} {method} {uri} {proto} {status} {size} {>User-Agent} {latency}"
to the Caddyfile, the latency can also be observed from caddy's own log. E.g.

31/Aug/2016:10:32:02 +0200 GET /Go/ HTTP/1.1 304 0 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0 1.002s
31/Aug/2016:10:32:07 +0200 GET /Go/VERSION HTTP/1.1 304 0 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0 1.001s

Thanks for reporting this. I'll try to investigate more and see if it can be improved.

Ok, it boils down to 3 points:

  1. store the FCGIClient somwhere
  2. do not close the connection (side effect of resp.Body.Close() )
  3. Tell fastcgi to keep the connection open in the first place:
diff --git a/caddyhttp/fastcgi/fcgiclient.go b/caddyhttp/fastcgi/fcgiclient.go
index f443f63..574455c 100644
--- a/caddyhttp/fastcgi/fcgiclient.go
+++ b/caddyhttp/fastcgi/fcgiclient.go
@@ -385,7 +385,7 @@ func (w *streamReader) Read(p []byte) (n int, err error) {
 // Do made the request and returns a io.Reader that translates the data read
 // from fcgi responder out of fcgi packet before returning it.
 func (c *FCGIClient) Do(p map[string]string, req io.Reader) (r io.Reader, err error) {
-   err = c.writeBeginRequest(uint16(Responder), 0)
+   err = c.writeBeginRequest(uint16(Responder), FCGIKeepConn)
    if err != nil {
        return
    }

(i.e. replace 0 by FCGIKeepConn)

This leads to a speedup factor 100 in the MWE.

... but only for sequential queries. As soon as queries start mixing (which is allowed in the fastcgi protocol), the answers get mixed up. There are some more improvements needed in fcgiclient to make persistent fcgi connections possible.

Sadly this makes caddy unusable for me at the moment.

The whole issue might be a windows only problem. Under Linux I can currently not reproduce the slow fastcgi response. In other words, caddy works very well and servers fastcgi with no delay under linux. (why are people running servers on windows anyway... ;)

I think it's safe to close this until we ascertain there is actually an issue.

Thanks for the efforts.

Well, I am sure that there is an issue on Windows machines. And I think the problem is clear from the code. It just happens that opening seperate tcp connections for each fastcgi request is no problem under linux.

Maybe something to do with how Windows reserves closed sockets or something? (Not sure, but I've seen similar issues in Windows before...)

I can't do this myself, but would be willing to accept a contribution that improves fcgiclient.go code to support persistent fcgi connections! That'd be a great improvement.

I can confirme that on Windows 7 and Windows 10 running caddy (latest release as well as git master version) with different fastcgi backends, the latency is always >1s, which is clearly unacceptable. I think a big advantage of caddy as compared to nginx is, that it runs properly on Windows. (nginx e.g. can't cope with utf16 paths).
Hence I took some time to write a patch. This fixes the Problem for me, i.e. response times are now back no normal (order of 10ms for small ajax). I will file a pull request. However, I did not dig into the system of reading options from the Caddy file, maybe someone who is familiar with this could add an option to the fastcgi directive for switching persistent fastcgi connections on/off. (On Linux, there seems to be no problem. I guess the OS handels a connection pool or is just not so incredibly slow in opening connections. ;)

I do seem to remember a test we had (back before the 0.9 rewrite) that would call caddy.Start() and caddy.Stop() a few times in quick succession -- definitely within 1 second, and on Windows that test failed so I had to write a hack in... oh yes... this:

    if runtime.GOOS == "windows" {
            // Windows has been known to keep sockets open even after closing the listeners.
            // Tests reveal this error case easily because they call Start() then Stop()
            // in succession. TODO: Better way to handle this? And why limit this to Windows?
            for i := 0; i < 20; i++ {
                time.Sleep(100 * time.Millisecond)
                ln, err = net.Listen("tcp", s.Server.Addr)
                if err == nil {
                    succeeded = true
                    break
                }
            }
        }

I wish I had a good explanation at this point but never really had the time to dive into why. I wonder if it's related to the fastcgi issue you're experiencing.

Here are some benchmarks.
This is
github.com/echsecutor/fastcgi-test
github.com/echsecutor/caddy/caddy (master, commit 60580400d3d28076bfe05deb35a188db6d3c3863)

var UsePersistentFcgiConnections = true
win7-persistene-caddy.log.txt

var UsePersistentFcgiConnections = false
win7-non-persistene-caddy.log.txt

There is no difference for serving files, although I thought the Caddy file
Caddyfile.txt
would make caddy serve only certain files directly, apparently also the .log files were served without calling the cgi backend?

Anyway, point beeing, response times from fastcgi without persistent connections are always >1s

09/Sep/2016:22:51:18 +0200 GET /test/2/ HTTP/1.1 304 0 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0 1.006s
09/Sep/2016:22:51:28 +0200 GET /test/3/ HTTP/1.1 304 0 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0 1.014s
09/Sep/2016:22:51:31 +0200 GET /test/3/ HTTP/1.1 304 0 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0 1.014s

and with persistent connections these come down to normal

09/Sep/2016:22:54:45 +0200 GET /test/2/ HTTP/1.1 304 0 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0 1ms
09/Sep/2016:22:54:47 +0200 GET /test/2/ HTTP/1.1 304 0 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0 24ms
09/Sep/2016:22:54:49 +0200 GET /test/3/ HTTP/1.1 304 0 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0 2ms
09/Sep/2016:22:54:50 +0200 GET /test/4/ HTTP/1.1 304 0 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0 2ms

This is independent of whether or not I open the Windows firewall ports. (Not that this should matter for local host, but with windows, I rather checked ;)

On Linux, there is no difference between the two versions with/without persistent connections.
linux-non-persistant-caddy.log.txt
linux-persistant-caddy.log.txt

Cool, thanks for all your work on this! (Fixed in #1087)

In the current master, the issue is back to life. fastCGI connections (also with pool > 0 ) are non persistent and response times on Windows > 1s.

With @abiosoft s correction https://github.com/mholt/caddy/pull/1129#discussion_r80370543 as in PR https://github.com/mholt/caddy/pull/1134, i.e. with working pooling, this is nicely fixed. I have tested on Windows that responds times are back to the normal few ms. Thanks @abiosoft !

Was this page helpful?
0 / 5 - 0 ratings

Related issues

lorddaedra picture lorddaedra  路  3Comments

mholt picture mholt  路  3Comments

klaasel picture klaasel  路  3Comments

PhilmacFLy picture PhilmacFLy  路  3Comments

wayneashleyberry picture wayneashleyberry  路  3Comments