In a single-instance setup, I used the PUT /services/{id} API to update an existing service with the following parameters:
{"name":"6f5b306c-b0ab-3454-b964-4aa177edea69","id":"6f5b306c-b0ab-3454-b964-4aa177edea69","url":"http://ba38e4b5-3964-33d6-a0cd-c889f6842bc0"}
The API call returned 200 OK but with the old data (the host field should have been updated to ba38e4b5-3964-33d6-a0cd-c889f6842bc0):
{"connect_timeout":60000,"created_at":1560324311,"retries":5,"protocol":"http","updated_at":1560324311,"port":80,"write_timeout":60000,"host":"90a46ade-6f3b-31db-9064-1b26868cf372","name":"6f5b306c-b0ab-3454-b964-4aa177edea69","id":"6f5b306c-b0ab-3454-b964-4aa177edea69","read_timeout":60000}
Note: this issue is easily reproduced with multi-threading.
PUT /services/{id} to update the url field to a random value with multithreading.
Kong: 1.1.1
If you want to update, you should use PATCH; if you want to replace, use PUT.
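For illustration, a quick httpie sketch of the difference (the service name is taken from this report; the upstream host is a placeholder):

# partial update: only the fields you send are changed
http PATCH :8001/services/6f5b306c-b0ab-3454-b964-4aa177edea69 url=http://new-upstream-host

# create-or-replace: the entity becomes exactly what you send (unset fields fall back to their defaults)
http PUT :8001/services/6f5b306c-b0ab-3454-b964-4aa177edea69 url=http://new-upstream-host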
@shuoqingding Actually, you are right; it looks like a bug. Thanks for reporting, let me take a look at it.
If you want to update, you should use PATCH; if you want to replace, use PUT.
Yes, I used PUT for creating or updating the service. Thank you for the prompt reply!
@shuoqingding, I tried to reproduce, but could not. Can you send the exact commands you used to reproduce this? E.g. with httpie or curl?
Here is what I tried:
(next) $ http PUT :8001/services/6f5b306c-b0ab-3454-b964-4aa177edea69 id=6f5b306c-b0ab-3454-b964-4aa177edea69 name=6f5b306c-b0ab-3454-b964-4aa177edea69 url=http://ba38e4b5-3964-33d6-a0cd-c889f6842bc0 -v
PUT /services/6f5b306c-b0ab-3454-b964-4aa177edea69 HTTP/1.1
Accept: application/json, */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 148
Content-Type: application/json
Host: localhost:8001
User-Agent: HTTPie/1.0.2
{
"id": "6f5b306c-b0ab-3454-b964-4aa177edea69",
"name": "6f5b306c-b0ab-3454-b964-4aa177edea69",
"url": "http://ba38e4b5-3964-33d6-a0cd-c889f6842bc0"
}
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Length: 316
Content-Type: application/json; charset=utf-8
Date: Thu, 13 Jun 2019 16:23:06 GMT
Server: kong/1.2.0
{
"connect_timeout": 60000,
"created_at": 1560442986,
"host": "ba38e4b5-3964-33d6-a0cd-c889f6842bc0",
"id": "6f5b306c-b0ab-3454-b964-4aa177edea69",
"name": "6f5b306c-b0ab-3454-b964-4aa177edea69",
"path": null,
"port": 80,
"protocol": "http",
"read_timeout": 60000,
"retries": 5,
"tags": null,
"updated_at": 1560442986,
"write_timeout": 60000
}
(next) $ http PUT :8001/services/6f5b306c-b0ab-3454-b964-4aa177edea69 id=6f5b306c-b0ab-3454-b964-4aa177edea69 name=6f5b306c-b0ab-3454-b964-4aa177edea69 url=http://ca38e4b5-3964-33d6-a0cd-c889f6842bc0 -v
PUT /services/6f5b306c-b0ab-3454-b964-4aa177edea69 HTTP/1.1
Accept: application/json, */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 148
Content-Type: application/json
Host: localhost:8001
User-Agent: HTTPie/1.0.2
{
"id": "6f5b306c-b0ab-3454-b964-4aa177edea69",
"name": "6f5b306c-b0ab-3454-b964-4aa177edea69",
"url": "http://ca38e4b5-3964-33d6-a0cd-c889f6842bc0"
}
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Length: 316
Content-Type: application/json; charset=utf-8
Date: Thu, 13 Jun 2019 16:24:07 GMT
Server: kong/1.2.0
{
"connect_timeout": 60000,
"created_at": 1560443047,
"host": "ca38e4b5-3964-33d6-a0cd-c889f6842bc0",
"id": "6f5b306c-b0ab-3454-b964-4aa177edea69",
"name": "6f5b306c-b0ab-3454-b964-4aa177edea69",
"path": null,
"port": 80,
"protocol": "http",
"read_timeout": 60000,
"retries": 5,
"tags": null,
"updated_at": 1560443047,
"write_timeout": 60000
}
(though I used next branch, similar to 1.2.0)
@bungle Actually, the version of Kong I use is 1.1.0. The script I used to reproduce this issue is attached below; hope it helps.
import threading
import uuid
import json
import requests

# Admin API endpoint of the service being updated
url = "http://127.0.0.1:8001/services/d9b258db-3a02-479b-aeff-07ca61b45975"

def test(number):
    # generate a random host and PUT it as the service's url
    uuid4 = uuid.uuid4()
    host = str(uuid4)
    data = {
        "name": "d9b258db-3a02-479b-aeff-07ca61b45975",
        "id": "d9b258db-3a02-479b-aeff-07ca61b45975",
        "url": "http://" + host
    }
    headers = {'Content-Type': 'application/json'}
    response = requests.put(url, headers=headers, data=json.dumps(data))
    result = response.json()
    # the PUT response body should reflect the host we just wrote
    if result['host'] != host:
        resp = {'code': response.status_code, 'req_body': data, 'resp_body': result}
        print('Response not match: ' + str(resp) + '\n')
    else:
        print('Ok\n')

if __name__ == '__main__':
    n = 10
    for i in range(n):
        my_thread = threading.Thread(target=test, args=(i,))
        my_thread.start()
Sample Result:
Ok
Ok
Ok
Response not match: {'code': 200, 'resp_body': {u'retries': 5, u'protocol': u'http', u'name': u'd9b258db-3a02-479b-aeff-07ca61b45975', u'tags': None, u'created_at': 1560490867, u'updated_at': 1560490867, u'connect_timeout': 60000, u'port': 80, u'host': u'55efb840-1f61-4bb2-b404-b10aa8426a7a', u'read_timeout': 60000, u'path': None, u'write_timeout': 60000, u'id': u'd9b258db-3a02-479b-aeff-07ca61b45975'}, 'req_body': {'url': 'http://6f16ad81-223b-409d-bf3e-6c037543e207', 'name': 'd9b258db-3a02-479b-aeff-07ca61b45975', 'id': 'd9b258db-3a02-479b-aeff-07ca61b45975'}}
Ok
Response not match: {'code': 200, 'resp_body': {u'retries': 5, u'protocol': u'http', u'name': u'd9b258db-3a02-479b-aeff-07ca61b45975', u'tags': None, u'created_at': 1560490867, u'updated_at': 1560490867, u'connect_timeout': 60000, u'port': 80, u'host': u'17d4239c-6388-46ed-98cb-8aaeec4c542f', u'read_timeout': 60000, u'path': None, u'write_timeout': 60000, u'id': u'd9b258db-3a02-479b-aeff-07ca61b45975'}, 'req_body': {'url': 'http://98e2389f-815f-45e7-a1a5-cf46263216f1', 'name': 'd9b258db-3a02-479b-aeff-07ca61b45975', 'id': 'd9b258db-3a02-479b-aeff-07ca61b45975'}}
Ok
Response not match: {'code': 200, 'resp_body': {u'retries': 5, u'protocol': u'http', u'name': u'd9b258db-3a02-479b-aeff-07ca61b45975', u'tags': None, u'created_at': 1560490867, u'updated_at': 1560490867, u'connect_timeout': 60000, u'port': 80, u'host': u'3836de9f-1348-4af3-bfb7-cdb8b0dcdae7', u'read_timeout': 60000, u'path': None, u'write_timeout': 60000, u'id': u'd9b258db-3a02-479b-aeff-07ca61b45975'}, 'req_body': {'url': 'http://17d4239c-6388-46ed-98cb-8aaeec4c542f', 'name': 'd9b258db-3a02-479b-aeff-07ca61b45975', 'id': 'd9b258db-3a02-479b-aeff-07ca61b45975'}}
Ok
I guess the response of PUT is read from the DB after the write without any lock, so maybe this issue is expected behavior?
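For what it's worth, a small client-side sketch (my own workaround idea, not something Kong provides) that re-reads the service with GET instead of trusting the PUT response body:

import time
import requests

def wait_for_host(service_url, expected_host, attempts=5, delay=0.2):
    # hypothetical helper: poll GET /services/{id} until the expected host is visible
    for _ in range(attempts):
        if requests.get(service_url).json().get('host') == expected_host:
            return True
        time.sleep(delay)
    return False

This only hides the symptom on the client side, of course; the PUT response itself can still be stale.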
@shuoqingding is this Cassandra or Postgres?
Cassandra seems to be doing a select after write. Postgres does not.
Cassandra.
@thibaultcha is there a setting that we can use to get around this? Or can :connect() help with this (too)?
@shuoqingding what is your setting for: cassandra_lb_policy?
@bungle I didn't set cassandra_lb_policy, so it should be RequestRoundRobin by default.
@shuoqingding can you try DC Aware? Does it make any difference?
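E.g. something like this in kong.conf (dc1 is just a placeholder for your actual datacenter name):

cassandra_lb_policy = DCAwareRoundRobin
cassandra_local_datacenter = dc1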
The appropriate _fix_ in this case is to ensure that Admin API queries are executed with a QUORUM consistency. There should (as discussed many times before) be a way to configure Admin API and Proxy server consistencies independently (PRs welcome!). A workaround we suggest from time to time is to configure control plane nodes (those with proxy_listen = off) with a high consistency, while data plane nodes (those with admin_listen = off) can be configured with a lower consistency (for higher availability) _if desired_. This of course assumes that your deployment pattern separates data and control planes, which may or may not be the case already.
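A rough kong.conf sketch of that workaround (values are illustrative; pick consistency levels that match your replication factor):

# control plane node (Admin API only): stronger consistency for read-after-write
proxy_listen = off
cassandra_consistency = QUORUM

# data plane node (proxy only): lower consistency for higher availability
admin_listen = off
cassandra_consistency = ONE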
Clarification on the above: despite using :connect() in our business logic or having a request-aware LB policy, there is no guarantee that the driver will use the same coordinator for each query in a given request; these are only best-effort policies (a timeout or other failure may still occur and trigger a retry at the driver's level). Besides, if the coordinator is not a replica node for the row being inserted/selected, then even if the best-effort policy succeeded in reusing the node, we may still observe eventual consistency behavior in a read-after-write pattern such as this one.
Well, truly the appropriate solution here (beyond the workaround above and the planned one) is to not do read-before/after-write queries in Cassandra the way the DAO currently does, which has been a hot topic internally for many years and a lost battle already.
@shuoqingding can you try DC Aware? Does it make any difference?
@bungle Sorry for the late reply. I use a single-DC cluster for Cassandra, so I think DC awareness should not be a factor? Please let me know if I am wrong.
@thibaultcha Thank you for the explanation and suggestions. "Having an independent control node" is a great idea for this; however, we have decided to use the DB-less feature to avoid this issue. Thanks anyway!
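For reference, a minimal sketch of that DB-less setup (paths and the upstream host are placeholders, not from this thread): set database = off and declarative_config in kong.conf, then describe the service in the YAML file, e.g.:

# kong.conf
database = off
declarative_config = /path/to/kong.yml

# kong.yml
_format_version: "1.1"
services:
- name: d9b258db-3a02-479b-aeff-07ca61b45975
  url: http://example-upstream-host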