Example of a broken cookie: inbound_referral_site: Direct Traffic
The cookie contains an unescaped space, so http.cookies.SimpleCookie fails to parse any of the cookies and returns an empty dictionary.
The following code fails (Sanic uses the same approach):
from http.cookies import SimpleCookie
c = SimpleCookie()
s1 = 'inbound_referral_site=Direct Traffic; good_cookie=123'
c.load(s1)
print(c.items())
Output: dict_items([])
It should return at least the good cookies.
https://github.com/django/django/blob/master/django/http/cookie.py#L20
Django's cookie parsing handles such cookies without issues.
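For comparison, here is a minimal sketch of a lenient, split-on-semicolon parser in the spirit of the Django function linked above. It is not Django's code: the name parse_cookie_lenient is made up for illustration, and the quote handling is deliberately simplified (it strips one pair of surrounding double quotes and does not decode escapes).

```python
def parse_cookie_lenient(header):
    """Parse a Cookie header pair by pair, tolerating unquoted spaces."""
    cookies = {}
    for chunk in header.split(";"):
        if "=" in chunk:
            key, value = chunk.split("=", 1)
        else:
            # A bare chunk with no "=" is treated as a value with an empty name.
            key, value = "", chunk
        key, value = key.strip(), value.strip()
        # Strip one pair of surrounding double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        if key or value:
            cookies[key] = value
    return cookies


print(parse_cookie_lenient("inbound_referral_site=Direct Traffic; good_cookie=123"))
# {'inbound_referral_site': 'Direct Traffic', 'good_cookie': '123'}
```

Because each pair is handled on its own, a malformed value only affects that pair. The trade-off is that this accepts input the RFC grammar rejects.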
He is right.
If you have this app:
from sanic import Sanic
from sanic import response

app = Sanic(__name__)

@app.route("/")
async def test(request):
    # request.cookies holds the parsed Cookie header
    print(request.cookies)
    return response.json({"test": True})

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8000)
and send a request like this:
import requests
requests.get('http://0.0.0.0:8000',cookies={'a1': 'b1', 'a2': 'b 2'})
The printed cookies are empty! There are other examples too.
Interesting, one of these conditions is where it's failing: https://github.com/python/cpython/blob/6f0eb93183519024cb360162bdd81b9faec97ba6/Lib/http/cookies.py#L565
That's because SimpleCookie(BaseCookie).load() uses __parse_string.
I'm fine with taking the Django way then, but let's make sure there isn't any other important logic in __parse_string that it's missing.
Also, here is the corresponding issue in Python: https://bugs.python.org/issue2988.
Unfortunately it's already closed. There should at least be a flag, something like BaseCookie.load(cookiestring, ignore_errors=True)
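Since http.cookies has no such flag, one way to approximate the idea is to feed SimpleCookie one pair at a time, so a rejected pair cannot void the others. This is only a sketch: load_ignoring_errors is a hypothetical helper name, and the naive split on ";" would break a quoted value that itself contains a semicolon.

```python
from http.cookies import CookieError, SimpleCookie


def load_ignoring_errors(header):
    """Load each name=value pair separately and keep whatever parses."""
    jar = SimpleCookie()
    for chunk in header.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        try:
            # A chunk that SimpleCookie rejects adds nothing to the jar.
            jar.load(chunk)
        except CookieError:
            # Raised for keys containing characters the module disallows.
            continue
    return {name: morsel.value for name, morsel in jar.items()}


print(load_ignoring_errors("inbound_referral_site=Direct Traffic; good_cookie=123"))
# {'good_cookie': '123'}
```

Note that the broken cookie is still dropped here. Only the well-formed ones survive, which is the "at least good cookies" behaviour asked for above.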
I took a look into this topic as it caught my attention, and I found some interesting points that I'd like to share.
Sanic uses http.cookies from the Python standard library, and the Python docs for http.cookies state the following:
...strictly applied the parsing rules described in the RFC 2109 and RFC 2068 specifications
Moreover we also have from the docs:
The character set, string.ascii_letters, string.digits and !#$%&'*+-.^_`|~: denote the set of valid characters allowed by this module in Cookie name (as key).
And as we can see, the space character is not included (I found this sentence a bit ambiguous... maybe it only applies to the cookie's attribute name and not to the value)
Worth mentioning: Changed in version 3.3: Allowed ‘:’ as a valid Cookie name character.
So now if we take a look at RFC 2109, section 4.1 Syntax, we can see that a cookie is an attribute-value representation. Furthermore, it states:
The following grammar uses the notation, and tokens DIGIT (decimal digits) and token (informally, a sequence of non-special, non-white space characters) from the HTTP/1.1 specification RFC 2068 (HTTP/1.1) to describe their syntax.
The syntax states that a value is a word, where word = token | quoted-string
So then,
Attributes (names) (attr) are case-insensitive. White space is permitted between tokens.
NOTE: The syntax above allows whitespace between the attribute and the = sign.
This means that a cookie with spaces in an unquoted value, such as inbound_referral_site = Direct Traffic, violates the cookie syntax and is therefore not a valid cookie. It also means that a Cookie header is completely invalid if any attribute-value pair in it is broken (a cookie can have multiple attributes, but please do not confuse the cookie's attributes with a Cookie).
A Cookie is the whole thing: Cookie: .....
The correct form, according not only to Python but also to the HTTP specification, is to quote such values:
inbound_referral_site = Direct Traffic => inbound_referral_site = "Direct Traffic"
Python code:
from http.cookies import SimpleCookie
# create a valid cookie
valid_cookie = SimpleCookie()
valid_cookie.load('inbound_referral_site="Direct Traffic"')
print(valid_cookie.output())
# out: 'Set-Cookie: inbound_referral_site="Direct Traffic"'
# create invalid cookie
invalid_cookie = SimpleCookie()
invalid_cookie.load('inbound_referral_site=Direct Traffic')
print(invalid_cookie.output())
# out: ''
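On the producing side, http.cookies applies this quoting for you when a value is set through the mapping interface rather than written by hand. A small round-trip sketch, using the same cookie name as above (the variable names are just for illustration):

```python
from http.cookies import SimpleCookie

# Setting a value through the mapping interface lets SimpleCookie quote it.
outgoing = SimpleCookie()
outgoing["inbound_referral_site"] = "Direct Traffic"
encoded = outgoing["inbound_referral_site"].OutputString()
print(encoded)
# inbound_referral_site="Direct Traffic"

# The quoted form parses back to the original, unquoted value.
incoming = SimpleCookie()
incoming.load(encoded)
print(incoming["inbound_referral_site"].value)
# Direct Traffic
```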
@r0fls @Yaser-Amiri
@arnulfojr Accepted! It's not standard (not compatible with the RFC), BUT what do developers actually do? _requests_ is the most popular HTTP client in Python and it doesn't implement this, and _Django_ is the most popular web framework for Python and it doesn't implement it either. Many others too... :/
I think it should be implemented :/
I see the controversy, but I think we're going to have to side with the RFC. If it gets amended, we'll change the parsing, but I think it's better to stick to the guidelines. Thank you @Yaser-Amiri for your contribution, and @arnulfojr for your research. BTW 418, I'm a teapot 🍵
@Yaser-Amiri if there is an issue with requests and its cookie attribute formatting (of course I've also used it a ton, but I don't remember encountering this), then I think you should open an issue there or search their issues. I hate to say it, but that's their problem if they're not following the instructions.