Caddy: ip_hash load balancing policy does not add revived upstreams back to the pool

Created on 26 Sep 2016 · 2 comments · Source: caddyserver/caddy

1. What version of Caddy are you running (caddy -version)?

v0.9.2

2. What are you trying to do?

Use the ip_hash load balancing policy with the proxy directive.

3. What is your entire Caddyfile?

Caddyfile_proxy

http:// {
    errors CaddyErrors_proxy.log
    log CaddyLog_proxy.log
    proxy / localhost:580 localhost:581 {
        transparent
        policy ip_hash
        health_check /health.html
        health_check_interval 10s
        health_check_timeout 5s
    }
}

Caddyfile_upstream1

http://:580 {
    errors CaddyErrors_upstream1.log
    log CaddyLog_upstream1.log
    root WebRoot1
}

Caddyfile_upstream2

http://:581 {
    errors CaddyErrors_upstream2.log
    log CaddyLog_upstream2.log
    root WebRoot2
}

WebRoot1 and WebRoot2 directories contain files:

  • index.html with contents PROXY1 and PROXY2 respectively
  • health.html with contents OK

4. How did you run Caddy (give the full command and describe the execution environment)?

caddy.exe -conf caddyfile_upstream1
caddy.exe -conf caddyfile_upstream2
caddy.exe -conf caddyfile_proxy

5. What did you expect to see?

When upstream1 fails, requests from existing and new clients that were previously assigned to upstream1 are redirected to upstream2, since upstream1 is taken out of the pool. Once upstream1 comes back online, the health check detects this and adds the upstream back to the pool for new matching requests.
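
For reference, this is the expected behaviour as a minimal Go sketch (made-up types and hash function, not Caddy's code): a given client IP always hashes to the same upstream, an unavailable upstream is skipped in favour of the next available one, and a revived upstream is selected again as soon as it is marked available.

package main

import (
	"fmt"
	"hash/fnv"
)

// upstream is a stand-in for a proxied backend; in Caddy the availability
// flag would be maintained by the health checker.
type upstream struct {
	name string
	up   bool
}

// selectByIPHash hashes the client IP to a starting index and walks the pool
// from there, returning the first available upstream. The pool is never
// modified, so an upstream that comes back up is simply selected again.
func selectByIPHash(pool []*upstream, clientIP string) *upstream {
	if len(pool) == 0 {
		return nil
	}
	h := fnv.New32a()
	h.Write([]byte(clientIP))
	start := int(h.Sum32() % uint32(len(pool)))
	for i := 0; i < len(pool); i++ {
		if u := pool[(start+i)%len(pool)]; u.up {
			return u
		}
	}
	return nil // no upstream available: the proxy would answer 502
}

func main() {
	pool := []*upstream{{"upstream1", true}, {"upstream2", true}}

	assigned := selectByIPHash(pool, "::1")
	fmt.Println("assigned:", assigned.name) // same answer on every request from ::1

	assigned.up = false // the assigned upstream fails its health check
	fmt.Println("failover:", selectByIPHash(pool, "::1").name)

	assigned.up = true // it comes back online and is marked available again
	fmt.Println("revived:", selectByIPHash(pool, "::1").name) // original upstream again
}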

6. What did you see instead (give full error messages and/or log)?

Once upstream1 came back online, health checks did not resume and the upstream was still considered down. No health check requests reached the upstream afterwards. According to upstream2's request log, once upstream1 failed, health checks started being performed twice against upstream2 (see the attached logs). Later, when upstream2 failed, the proxy started returning 502 even though upstream1 had been back up for a while. It seems the upstream is removed from the proxy configuration permanently; the proxy had to be restarted to add upstream1 back to the pool.

CaddyLog_proxy.log

::1 - [26/Sep/2016:10:35:23 +0200] "GET / HTTP/1.1" 200 8
::1 - [26/Sep/2016:10:35:24 +0200] "GET / HTTP/1.1" 200 8
::1 - [26/Sep/2016:10:35:24 +0200] "GET / HTTP/1.1" 200 8
::1 - [26/Sep/2016:10:35:25 +0200] "GET / HTTP/1.1" 200 8
::1 - [26/Sep/2016:10:35:25 +0200] "GET / HTTP/1.1" 200 8
::1 - [26/Sep/2016:10:35:34 +0200] "GET / HTTP/1.1" 200 8 <<< upstream1 was terminated before this request
::1 - [26/Sep/2016:10:35:35 +0200] "GET / HTTP/1.1" 200 8

CaddyLog_upstream1.log

::1 - [26/Sep/2016:10:35:21 +0200] "GET /health.html HTTP/1.1" 200 4
::1 - [26/Sep/2016:10:35:23 +0200] "GET / HTTP/1.1" 200 8
::1 - [26/Sep/2016:10:35:24 +0200] "GET / HTTP/1.1" 200 8
::1 - [26/Sep/2016:10:35:24 +0200] "GET / HTTP/1.1" 200 8
::1 - [26/Sep/2016:10:35:25 +0200] "GET / HTTP/1.1" 200 8
::1 - [26/Sep/2016:10:35:25 +0200] "GET / HTTP/1.1" 200 8

CaddyLog_upstream2.log

::1 - [26/Sep/2016:10:35:21 +0200] "GET /health.html HTTP/1.1" 200 4
::1 - [26/Sep/2016:10:35:33 +0200] "GET /health.html HTTP/1.1" 200 4
::1 - [26/Sep/2016:10:35:34 +0200] "GET / HTTP/1.1" 200 8 <<< first failover request from upstream1
::1 - [26/Sep/2016:10:35:35 +0200] "GET / HTTP/1.1" 200 8
::1 - [26/Sep/2016:10:35:41 +0200] "GET /health.html HTTP/1.1" 200 4 <<< doubled health check requests start here
::1 - [26/Sep/2016:10:35:41 +0200] "GET /health.html HTTP/1.1" 200 4
::1 - [26/Sep/2016:10:35:51 +0200] "GET /health.html HTTP/1.1" 200 4
::1 - [26/Sep/2016:10:35:51 +0200] "GET /health.html HTTP/1.1" 200 4
::1 - [26/Sep/2016:10:36:01 +0200] "GET /health.html HTTP/1.1" 200 4
::1 - [26/Sep/2016:10:36:01 +0200] "GET /health.html HTTP/1.1" 200 4
...

7. How can someone who is starting from scratch reproduce this behavior as minimally as possible?

Start the upstream1, upstream2 and proxy instances of Caddy. Curl localhost:80 and watch the upstream access logs for health check requests. If the curl requests are assigned to upstream1, terminate that upstream and curl again. Then bring the terminated upstream up again, wait 10s, and terminate the other upstream. The proxy will return 502 instead of forwarding requests to the restarted upstream1. In upstream1's access log there will be no new health check requests; in upstream2's access log the health check requests will be doubled.

This issue does not occur with the round_robin policy: health checks to failed upstreams resume correctly once they are back up, and they are not fired twice against another upstream.
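
For convenience while reproducing the steps above, a purely illustrative Go polling loop like the one below (the URL assumes the proxy setup from this report) makes it easy to watch which upstream answers each request and when the 502s begin.

package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

func main() {
	for {
		// The address matches the proxy instance from the Caddyfile above.
		resp, err := http.Get("http://localhost/")
		if err != nil {
			fmt.Println("request failed:", err)
		} else {
			body, _ := io.ReadAll(resp.Body)
			resp.Body.Close()
			// Prints e.g. "10:35:24 200 PROXY1", or a 502 line once no upstream is available.
			fmt.Println(time.Now().Format("15:04:05"), resp.StatusCode, strings.TrimSpace(string(body)))
		}
		time.Sleep(time.Second)
	}
}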

Labels: bug, good first issue

All 2 comments

Thanks for the detailed report!

Hmm, @krishamoud -- I didn't notice this before, but the IP hash policy is the only one that changes the HostPool. Having seen this issue now, I actually don't think it's necessary (or correct) to remove a down host from the pool entirely; that explains why hosts don't get added back after they go down. Are you in a position to look into this and submit a fix?

Yeah I'll look into this today.
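
To make the diagnosis above concrete, here is a self-contained Go sketch (hypothetical upstreamHost/hostPool types and a plain index standing in for the IP hash; this is not Caddy's actual code). Deleting an element with append shifts elements in the shared backing array, so the down host is gone permanently and the surviving host appears twice to anyone still holding the full-length slice, which would also be consistent with the doubled health checks in the logs above. Skipping unavailable hosts without mutating the pool avoids both problems.

package main

import "fmt"

type upstreamHost struct {
	name      string
	available bool
}

// hostPool stands in for the shared slice of upstreams that both the request
// handler and the health checker hold a reference to.
type hostPool []*upstreamHost

func names(p hostPool) []string {
	out := make([]string, len(p))
	for i, h := range p {
		out[i] = h.name
	}
	return out
}

// buggySelect removes a down host from the pool it was handed. Because that
// slice shares its backing array with the pool everyone else holds, the
// removal shifts elements in place: the down host is dropped for good and the
// surviving host shows up twice for callers that still see the original length.
func buggySelect(pool hostPool, idx int) *upstreamHost {
	for len(pool) > 0 {
		i := idx % len(pool)
		if host := pool[i]; host.available {
			return host
		}
		pool = append(pool[:i], pool[i+1:]...) // mutates the shared backing array
	}
	return nil
}

// fixedSelect skips unavailable hosts without touching the pool, so a host
// that passes its health check again is simply picked up on the next request.
func fixedSelect(pool hostPool, idx int) *upstreamHost {
	for i := 0; i < len(pool); i++ {
		if host := pool[(idx+i)%len(pool)]; host.available {
			return host
		}
	}
	return nil
}

func main() {
	shared := hostPool{{"upstream1", false}, {"upstream2", true}}
	buggySelect(shared, 0)     // a client whose IP hashes to the (down) upstream1
	fmt.Println(names(shared)) // [upstream2 upstream2]: upstream1 gone, upstream2 doubled

	shared = hostPool{{"upstream1", false}, {"upstream2", true}}
	fixedSelect(shared, 0)
	fmt.Println(names(shared)) // [upstream1 upstream2]: pool untouched, upstream1 can come back
}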
