In a single-instance setup, I used the PUT /services/{id} API to update an existing service with the following parameters:
{"name":"6f5b306c-b0ab-3454-b964-4aa177edea69","id":"6f5b306c-b0ab-3454-b964-4aa177edea69","url":"http://ba38e4b5-3964-33d6-a0cd-c889f6842bc0"}
The API call returned 200 OK but with the old data (the host field should have been updated to ba38e4b5-3964-33d6-a0cd-c889f6842bc0):
{"connect_timeout":60000,"created_at":1560324311,"retries":5,"protocol":"http","updated_at":1560324311,"port":80,"write_timeout":60000,"host":"90a46ade-6f3b-31db-9064-1b26868cf372","name":"6f5b306c-b0ab-3454-b964-4aa177edea69","id":"6f5b306c-b0ab-3454-b964-4aa177edea69","read_timeout":60000}
Note: this issue is easily reproduced with multi-threading.
PUT /services/{id} to update the url field to a random value with multithreading.
Kong: 1.1.1
If you want to update, you should use PATCH; if you want to replace, use PUT.
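For illustration, a quick httpie sketch of the difference (the service name is taken from this report; the upstream host is a placeholder):

# partial update: only the fields you send are changed
http PATCH :8001/services/6f5b306c-b0ab-3454-b964-4aa177edea69 url=http://new-upstream-host

# create-or-replace: the entity becomes exactly what you send (unset fields fall back to their defaults)
http PUT :8001/services/6f5b306c-b0ab-3454-b964-4aa177edea69 url=http://new-upstream-host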
@shuoqingding Actually, you are right; it looks like a bug. Thanks for reporting, let me take a look at it.
If you want to update, you should use PATCH; if you want to replace, use PUT.
Yes, I used PUT for creating or updating the service. Thank you for the prompt reply!
@shuoqingding, I tried to reproduce, but could not. Can you send the exact commands you used to reproduce this? E.g. with httpie or curl?
Here is what I tried:
(next) $ http PUT :8001/services/6f5b306c-b0ab-3454-b964-4aa177edea69 id=6f5b306c-b0ab-3454-b964-4aa177edea69 name=6f5b306c-b0ab-3454-b964-4aa177edea69 url=http://ba38e4b5-3964-33d6-a0cd-c889f6842bc0 -v
PUT /services/6f5b306c-b0ab-3454-b964-4aa177edea69 HTTP/1.1
Accept: application/json, */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 148
Content-Type: application/json
Host: localhost:8001
User-Agent: HTTPie/1.0.2
{
"id": "6f5b306c-b0ab-3454-b964-4aa177edea69",
"name": "6f5b306c-b0ab-3454-b964-4aa177edea69",
"url": "http://ba38e4b5-3964-33d6-a0cd-c889f6842bc0"
}
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Length: 316
Content-Type: application/json; charset=utf-8
Date: Thu, 13 Jun 2019 16:23:06 GMT
Server: kong/1.2.0
{
"connect_timeout": 60000,
"created_at": 1560442986,
"host": "ba38e4b5-3964-33d6-a0cd-c889f6842bc0",
"id": "6f5b306c-b0ab-3454-b964-4aa177edea69",
"name": "6f5b306c-b0ab-3454-b964-4aa177edea69",
"path": null,
"port": 80,
"protocol": "http",
"read_timeout": 60000,
"retries": 5,
"tags": null,
"updated_at": 1560442986,
"write_timeout": 60000
}
(next) $ http PUT :8001/services/6f5b306c-b0ab-3454-b964-4aa177edea69 id=6f5b306c-b0ab-3454-b964-4aa177edea69 name=6f5b306c-b0ab-3454-b964-4aa177edea69 url=http://ca38e4b5-3964-33d6-a0cd-c889f6842bc0 -v
PUT /services/6f5b306c-b0ab-3454-b964-4aa177edea69 HTTP/1.1
Accept: application/json, */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 148
Content-Type: application/json
Host: localhost:8001
User-Agent: HTTPie/1.0.2
{
"id": "6f5b306c-b0ab-3454-b964-4aa177edea69",
"name": "6f5b306c-b0ab-3454-b964-4aa177edea69",
"url": "http://ca38e4b5-3964-33d6-a0cd-c889f6842bc0"
}
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Length: 316
Content-Type: application/json; charset=utf-8
Date: Thu, 13 Jun 2019 16:24:07 GMT
Server: kong/1.2.0
{
"connect_timeout": 60000,
"created_at": 1560443047,
"host": "ca38e4b5-3964-33d6-a0cd-c889f6842bc0",
"id": "6f5b306c-b0ab-3454-b964-4aa177edea69",
"name": "6f5b306c-b0ab-3454-b964-4aa177edea69",
"path": null,
"port": 80,
"protocol": "http",
"read_timeout": 60000,
"retries": 5,
"tags": null,
"updated_at": 1560443047,
"write_timeout": 60000
}
(though I used next branch, similar to 1.2.0)
@bungle Actually, the version of Kong I use is 1.1.0. The script I used to reproduce this issue is attached below; hope it helps.
import threading
import uuid
import json
import requests

# Admin API endpoint of the service being updated
url = "http://127.0.0.1:8001/services/d9b258db-3a02-479b-aeff-07ca61b45975"

def test(number):
    # generate a random host and PUT it as the service's url
    uuid4 = uuid.uuid4()
    host = str(uuid4)
    data = {
        "name": "d9b258db-3a02-479b-aeff-07ca61b45975",
        "id": "d9b258db-3a02-479b-aeff-07ca61b45975",
        "url": "http://" + host
    }
    headers = {'Content-Type': 'application/json'}
    response = requests.put(url, headers=headers, data=json.dumps(data))
    result = response.json()
    # the PUT response body should reflect the host we just wrote
    if result['host'] != host:
        resp = {'code': response.status_code, 'req_body': data, 'resp_body': result}
        print('Response not match: ' + str(resp) + '\n')
    else:
        print('Ok\n')

if __name__ == '__main__':
    n = 10
    for i in range(n):
        my_thread = threading.Thread(target=test, args=(i,))
        my_thread.start()
Sample Result:
Ok
Ok
Ok
Response not match: {'code': 200, 'resp_body': {u'retries': 5, u'protocol': u'http', u'name': u'd9b258db-3a02-479b-aeff-07ca61b45975', u'tags': None, u'created_at': 1560490867, u'updated_at': 1560490867, u'connect_timeout': 60000, u'port': 80, u'host': u'55efb840-1f61-4bb2-b404-b10aa8426a7a', u'read_timeout': 60000, u'path': None, u'write_timeout': 60000, u'id': u'd9b258db-3a02-479b-aeff-07ca61b45975'}, 'req_body': {'url': 'http://6f16ad81-223b-409d-bf3e-6c037543e207', 'name': 'd9b258db-3a02-479b-aeff-07ca61b45975', 'id': 'd9b258db-3a02-479b-aeff-07ca61b45975'}}
Ok
Response not match: {'code': 200, 'resp_body': {u'retries': 5, u'protocol': u'http', u'name': u'd9b258db-3a02-479b-aeff-07ca61b45975', u'tags': None, u'created_at': 1560490867, u'updated_at': 1560490867, u'connect_timeout': 60000, u'port': 80, u'host': u'17d4239c-6388-46ed-98cb-8aaeec4c542f', u'read_timeout': 60000, u'path': None, u'write_timeout': 60000, u'id': u'd9b258db-3a02-479b-aeff-07ca61b45975'}, 'req_body': {'url': 'http://98e2389f-815f-45e7-a1a5-cf46263216f1', 'name': 'd9b258db-3a02-479b-aeff-07ca61b45975', 'id': 'd9b258db-3a02-479b-aeff-07ca61b45975'}}
Ok
Response not match: {'code': 200, 'resp_body': {u'retries': 5, u'protocol': u'http', u'name': u'd9b258db-3a02-479b-aeff-07ca61b45975', u'tags': None, u'created_at': 1560490867, u'updated_at': 1560490867, u'connect_timeout': 60000, u'port': 80, u'host': u'3836de9f-1348-4af3-bfb7-cdb8b0dcdae7', u'read_timeout': 60000, u'path': None, u'write_timeout': 60000, u'id': u'd9b258db-3a02-479b-aeff-07ca61b45975'}, 'req_body': {'url': 'http://17d4239c-6388-46ed-98cb-8aaeec4c542f', 'name': 'd9b258db-3a02-479b-aeff-07ca61b45975', 'id': 'd9b258db-3a02-479b-aeff-07ca61b45975'}}
Ok
I guess the response of PUT is read from the DB after the write without any lock, so maybe this issue is expected behavior?
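For what it's worth, a small client-side sketch (my own workaround idea, not something Kong provides) that re-reads the service with GET instead of trusting the PUT response body:

import time
import requests

def wait_for_host(service_url, expected_host, attempts=5, delay=0.2):
    # hypothetical helper: poll GET /services/{id} until the expected host is visible
    for _ in range(attempts):
        if requests.get(service_url).json().get('host') == expected_host:
            return True
        time.sleep(delay)
    return False

This only hides the symptom on the client side, of course; the PUT response itself can still be stale.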
@shuoqingding is this Cassandra or Postgres?
Cassandra seems to be doing a select after write. Postgres does not.
Cassandra.
@thibaultcha is there a setting that we can use to get around this? Or can :connect() help with this (too)?
@shuoqingding what is your setting for: cassandra_lb_policy?
@bungle I didn't set cassandra_lb_policy, so it should be RequestRoundRobin by default.
@shuoqingding can you try DC Aware? Does it make any difference?
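E.g. something like this in kong.conf (dc1 is just a placeholder for your actual datacenter name):

cassandra_lb_policy = DCAwareRoundRobin
cassandra_local_datacenter = dc1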
The appropriate _fix_ in this case is to ensure that Admin API queries are executed with a QUORUM consistency. There should (as discussed many times before) be a way to configure Admin API and Proxy server consistencies independently (PRs welcome!). A workaround we suggest from time to time is to configure control plane nodes (those with proxy_listen = off) with a high consistency, while data plane nodes (those with admin_listen = off) can be configured with a lower consistency (for higher availability) _if desired_. This of course assumes that your deployment pattern separates data and control planes, which may or may not be the case already.
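A rough kong.conf sketch of that workaround (values are illustrative; pick consistency levels that match your replication factor):

# control plane node (Admin API only): stronger consistency for read-after-write
proxy_listen = off
cassandra_consistency = QUORUM

# data plane node (proxy only): lower consistency for higher availability
admin_listen = off
cassandra_consistency = ONE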
Clarification on the above: despite using :connect() in our business logic or having a request-aware LB policy, there is no guarantee that the driver will use the same coordinator for each query in a given request; these are only best-effort policies (a timeout or other failure may still occur and trigger a retry at the driver's level). Besides, if the coordinator is not a replica node for the row being inserted/selected, then even if the best-effort policy succeeded in reusing the node, we may still observe eventual consistency behavior in a read-after-write pattern such as this one.
Well, truly the appropriate solution here (beyond the workaround above and the planned one) is to not do read-before/after-write queries in Cassandra the way the DAO currently does, which has been a hot topic internally for many years and a lost battle already.
@shuoqingding can you try DC Aware? Does it make any difference?
@bungle Sorry for the late reply. I use a single-DC cluster for Cassandra, so I think DC awareness should not be a factor? Please let me know if I am wrong.
@thibaultcha Thank you for the explanation and suggestions. "Having an independent control node" is a great idea for this; however, we have decided to use the DB-less feature to avoid this issue. Thanks anyway!
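For reference, a minimal sketch of that DB-less setup (paths and the upstream host are placeholders, not from this thread): set database = off and declarative_config in kong.conf, then describe the service in the YAML file, e.g.:

# kong.conf
database = off
declarative_config = /path/to/kong.yml

# kong.yml
_format_version: "1.1"
services:
- name: d9b258db-3a02-479b-aeff-07ca61b45975
  url: http://example-upstream-host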