Kong: Router doesn't match URL encoded URIs

Created on 10 Apr 2017  Â·  16Comments  Â·  Source: Kong/kong

Summary

When an API is setup to match a URL encoded URI, Kong returns 404 - [message:no API found with those values]

Steps To Reproduce

  1. setup an API with URI /endel%C3%B8st (Fails)
{
  "uris": [
    "/endel%C3%B8st"
  ],
  "id": "b3ad4077-b9b4-4b7d-8b63-6900491e2fae",
  "upstream_read_timeout": 60000,
  "preserve_host": true,
  "created_at": 1491842755000,
  "upstream_connect_timeout": 60000,
  "upstream_url": "http://192.168.195.119:8080",
  "strip_uri": false,
  "name": "my-api",
  "https_only": false,
  "http_if_terminated": true,
  "retries": 5,
  "upstream_send_timeout": 60000,
  "hosts": [
    "api.lokal"
  ]
}
  1. curl --request GET --url http://api.lokal:8000/endel%C3%B8st returns 404 from Kong
    Kong access.log:
    172.17.0.1 - - [10/Apr/2017:16:46:07 +0000] "GET /endel%C3%B8st HTTP/1.1" 404 56 "-" "-"

  2. setup an API with URI /ende (Passes)

{
  "uris": [
    "/ende"
  ],
  "id": "a7ce01b8-7bd8-4718-b372-5d80f052ef6c",
  "upstream_read_timeout": 60000,
  "preserve_host": true,
  "created_at": 1491842825000,
  "upstream_connect_timeout": 60000,
  "upstream_url": "http://192.168.195.119:8080",
  "strip_uri": false,
  "name": "my-api",
  "https_only": false,
  "http_if_terminated": true,
  "retries": 5,
  "upstream_send_timeout": 60000,
  "hosts": [
    "api.lokal"
  ]
}
  1. curl --request GET --url http://api.lokal:8000/endel%C3%B8st is passed to upstream
    Kong access.log:
    172.17.0.1 - - [10/Apr/2017:16:47:24 +0000] "GET /endel%C3%B8st HTTP/1.1" 401 131 "-" "-"

Additional Details & Logs

  • Kong version (0.10.1)

All 16 comments

@rasmusrask, yes you are correct. Kong router uses ngx.var.uri:
https://github.com/Mashape/kong/blob/master/kong/core/router.lua#L619

which is a normalized version of uri. It is possible to use ngx.var.request_uri instead and strip the possible arguments out of it when doing matching:
http://nginx.org/en/docs/http/ngx_http_core_module.html#var_request_uri

(or url-encode ngx.var.uri).

@rasmusrask,

try to change this line:
https://github.com/Mashape/kong/blob/master/kong/core/router.lua#L619

local uri         = ngx.var.uri

to

local uri         = ngx.var.request_uri
do
  local s = string.find(uri, "?", 2, true)
  if s then
    uri = string.sub(uri, 1, s - 1)
  end
end

And report if this makes a difference. Not sure though if we want this as there are some niceties in this normalization aka it normalizes url /test/../here to /here and it would mean that we need to then reimplement this normalization for some cases where it is actually valuable.

@rasmusrask, you could also try to setup API with /endeløst and then request /endel%C3%B8st

@bungle It is not possible to setup the API with /endeløst. Kong returns

[uris:uri with value '/endeløst' is invalid: must only contain alphanumeric and '., -, _, ~, /, %' characters]

Possibly beacuse https://github.com/Mashape/kong/issues/1205 was solved using %-encoded characters.

@bungle The fix you supplied works, but I'm not sure why %C3%B8 is normalized in ngx.var.uri.
The way I understand https://tools.ietf.org/html/rfc3986#section-2.3 only the unreserved characters should be normalized (as well as the dot-segments).

Kong should be as transparent as possible and thus should avoid normalizing any incoming URI before trying to match it to a registered API. I think we should use ngx.var.request_uri just as @bungle suggested, taking care of stripping away the query string part.

@bungle or @rasmusrask Would you mind proposing a PR for your patch with possible test cases for those behaviors you described? Thank you!

@thibaultcha,

Are you ok with no normalization at all? e.g. /../ normalization doesn't then happen?

Indeed, I think that's what we should be aiming for.

I will prosose a PR.

There are three groups of characters in a valid URI:
A) reserved (may need to be percent-encoded, depend on scheme)
B) unreserved (shall never be percent-encoded)
C) other (shall always be percent encoded)

If this is not satisified, it is not a URI.

Normalization has nothing to do with percent-encoding characters (with one exception, irrelevant to this case that it enforces (B) ). I suggest that normalization is not re-implemented.
This issue is about character group (C); characters that are neither reserved nor unreserved. They (other characters) shall always be percent-encoded whether or not the URI is normalized or not. Otherwise it is not a (valid) URI. So if the value of variables like ngx.var.request_uri and uri should match the variable names it should contain URI's (where other characters are percent-encoded).

I disagree that ngx.var.request_uri (non-normalized URI) is a better choice. I think keep using ngx.var.uri (normalized URI) would be the right choice. There, in the normalized URI universe "other characters" _shall still_ be percent-encoded (and unreserved characters percent-decoded by the way).

@jarl-dk we are mainly talking about nginx $uri / $document_uri normalization here. It seems it is problematic (as noted) — and I'm pretty sure Nginx by default changes the semantics for $uri (compared to original request).

We could change the API side to allow say /endeløst, but it would be the best if we could use same Nginx normalization procedure for them as well, but there is no such API that we can use. We could make a request to some internal uri normalization url to get that though, but it seems ugly.

Another option, as proposed here and in #2373 is that we could use $request_uri, and there we just pass that on (maybe stripping the prefix) and let the receiving party do whatever he wants with that (aka normalize it in their end). @jarl-dk, why do you think that it is important to do normalization on reverse proxy? For a better API matching, maybe (?).

Yet another option is to normalize on our own, which is bad as well.

Nginx $uri normalization is described as:

after decoding the text encoded in the “%XX” form, resolving references to relative path components “.” and “..”, and possible compression of two or more adjacent slashes into a single slash.

(possible compression is by default on, but can be controller by http://nginx.org/en/docs/http/ngx_http_core_module.html#merge_slashes).

@bungle : I agree it is not important to work with normalized uris on a reverse proxy. I just didn't see any reason to change this as I expected that the Nginx implementation of "URI normalization" matched the common understanding of URI normalization.

However your quote from Nginx implementation of "normalization" indicates that the Nginx normalization will result in invalid URIs (that is really sad!), where "other characters" like "ø" will be percent-decoded resulting in an invalid URI when the URI contains such "other characters". That is very sad.

So given that information I have changed my mind and totally agree with you to avoid Nginx normalization by all means.

@bungle : I think ngx.var.request_uri is the original request uri and cannot be modified.
Say, I have modified the uri in a plugin using ngx.set_uri(newUri) , changing ngx.var.uri to ngx.var.request_uri won't make kong route to the updated uri as ngx.var.request_uri still holds the original uri. What do you think about this?

@shiprabehera, correct. It doesn't matter in this case. What matters here is what we use as a base value. My PR changes the Kong router to use request_uri.

I just merged the fix to master with some additional integration tests. Thanks again for the report @rasmusrask, and thanks for the fix @bungle!

Was this page helpful?
0 / 5 - 0 ratings