Boulder recently merged IDN support and plans to enable it in staging and production in the near future (most likely within the next few weeks).
Boulder/ACME requires that any IDN be converted to ASCII using the IDNA2008 ToASCII encoding method before being sent in a CSR, authorization request, etc.
bmw's edit: Since version 0.10.0, Certbot supports requesting IDNs as long as they are given to Certbot as punycode.
Making this ticket the slightly more ambitious one, which will require us to accept and encode IDNs correctly; #3619 is the first step, which lets the user provide their own Punycode'd input.
For reference, Python has some built-in punycode support in the encodings.idna module (in the standard library since Python 2.3!): https://docs.python.org/2/library/codecs.html#module-encodings.idna
But it looks like it operates on "labels", so I think you have to do
".".join(idna.ToASCII(idna.nameprep(x)) for x in d.split("."))
This worked correctly for test names that I tried. Notably, idna.ToASCII(idna.nameprep(d)) does _not_ do the right thing; for example, idna.ToASCII(idna.nameprep(u"Schönheit.com")) yields 'xn--schnheit.com-6ib' instead of the correct 'xn--schnheit-p4a.com'.
(You have to apply the punycode encoding to each DNS label, not to the FQDN as a whole.)
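For concreteness, the per-label approach above can be sketched like this using only the standard library (IDNA2003; the helper name to_ascii_fqdn is illustrative, not anything in Certbot):

```python
from encodings import idna  # stdlib IDNA2003 helpers, in Python since 2.3


def to_ascii_fqdn(domain):
    # Apply nameprep + ToASCII to each DNS label individually, then
    # rejoin with dots. Encoding the whole FQDN at once is wrong: the
    # "." would be treated as part of a single label, producing results
    # like 'xn--schnheit.com-6ib' instead of 'xn--schnheit-p4a.com'.
    return ".".join(
        idna.ToASCII(idna.nameprep(label)).decode("ascii")
        for label in domain.split(".")
    )


print(to_ascii_fqdn(u"Schönheit.com"))  # xn--schnheit-p4a.com
```

(The explicit nameprep call mirrors the snippet above; ToASCII also runs nameprep internally, so it is redundant but harmless.)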
Also, for consistent UI, we should probably then save some kind of data about whether each name was originally supplied as punycode or not. If the user asked for -d xn--schnheit-p4a.com, it might be confusing if we said "Congratulations! You've obtained a certificate for schönheit.com!", while if the user asked for -d schönheit.com, it might be similarly confusing if we said "Congratulations! You've obtained a certificate for xn--schnheit-p4a.com!". So it would be best to have Certbot somehow know which one the user asked for and then use that form in related output or error messages where possible.
I don't think we currently have anywhere that information like this could be represented.
@schoen we could save it as a config / renewal config variable. That would be easy if you wanted all domains to be reported as either punycode or not; but if you wanted it to be remembered on a per-domain basis, we'd have to save a JSON object which maps each domain to whether it was originally unicode or not.
Perhaps another, better solution would be to always display both: you've obtained a cert for xn--schnheit-p4a.com (schönheit.com).
Oh, I wasn't even thinking about renewal! That seems like a harder problem. I was just thinking inside of the current run.
Yes, I think displaying both is going to be the easiest option! It would also help educate the user about the ways that the IDN could be represented (which might come up in other contexts).
I could make a function in util.py that represents a given domain for presentation, so if it's example.com it would say u"example.com", but if it's exámple.com it would say u"xn--exmple-qta.com (exámple.com)". We could use that function whenever giving any output that relates to a subject domain name.
Given a quick search, it looks like the built-in Python package (encodings.idna) only supports IDNA2003-style encoding, which will cause slight incompatibilities with the Boulder implementation, which only supports IDNA2008.
@rolandshoemaker, do you have a reference on the differences between them, or can you explain them? Would there be a simple workaround possible?
I did follow through with my idea of making a function that can represent a domain for presentation starting from either Unicode, ACE, or pure ASCII forms:
>>> print(represent(u"example.com"))
example.com
>>> print(represent(u"exámple.com"))
exámple.com (xn--exmple-qta.com)
>>> print(represent(u"xn--exmple-qta.com"))
exámple.com (xn--exmple-qta.com)
Currently, my function is arguably buggy if the user provides mixed Unicode and ACE labels in a single FQDN (though arguably correct for some purposes).
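A minimal stdlib-only sketch that behaves like the session above (this is a reconstruction for illustration, not the actual function; it uses the IDNA2003 codec and inherits the mixed-label quirk just mentioned):

```python
from encodings import idna  # stdlib IDNA2003 helpers


def represent(domain):
    # Convert every label to its ACE (ASCII/punycode) form; labels that
    # are already ASCII, including existing xn-- labels, pass through
    # ToASCII unchanged.
    ace = ".".join(
        idna.ToASCII(label).decode("ascii") for label in domain.split(".")
    )
    # Decode each label back to Unicode for display.
    uni = ".".join(idna.ToUnicode(label) for label in ace.split("."))
    # Pure-ASCII names display as-is; IDNs show both forms.
    return uni if uni == ace else "%s (%s)" % (uni, ace)
```

This reproduces the three example outputs shown above regardless of whether the input is Unicode, ACE, or plain ASCII.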
There are a large number of deviations between the two encodings in quite common scripts; UTS #46 Section 7 attempts to describe a subset of the differences, but I don't think it would be feasible to shim the differences yourself.
I think the best option here is to find a third party library that explicitly supports IDNA2008.
@rolandshoemaker, thanks for the heads-up! The good news is that there's a "drop-in" replacement for the Python standard library version:
https://pypi.python.org/pypi/idna/
The bad news is that we're going to have yet another PyPI dependency as a result.
Good news! We already indirectly depend on idna. I'm not sure which of our dependencies brings it in, but it's there. See a virtual environment after running pip install certbot or certbot-auto.
I propose closing this. U-labels in IDNA are intended only as a presentation layer, with A-labels used under the hood.
For instance, when configuring a server_name in Nginx, you have to use A-labels. Since Certbot requires you to already have a virtual host for any hostnames you want to issue for, the user will already need to have a server_name in A-label form for their hostname. So there's no risk that the user doesn't know how to express their hostname in A-label form. Additionally, converting between the two will make presentation confusing for people who host IDNs but don't understand them. If Certbot offers to get me a certificate for ウェブ.crud.net, I don't have any way of knowing that that corresponds to the line in my Nginx config that says server_name xn--gckc5l.crud.net;
Also, when copying and pasting URLs from the URL bar in Chrome, you get the A-label, and when configuring your DNS, you have to use the A-label.
@jsha, would you want Certbot to attempt to show the U-label as well when displaying an A-label, even if it doesn't accept the former as input?
Would you want Certbot to give users who request a U-label name information about this, maybe with a reference to some documentation and tools for converting to an A-label? Like
One of the requested names you provided, ウェブ.crud.net, contains a non-ASCII character. Names that contain such characters have to be formatted as IDNA A-labels using punycode, like xn--gckc5l.crud.net, before being passed to Let's Encrypt or Certbot. For more information, please see https://en.wikipedia.org/wiki/Internationalized_domain_name#Internationalizing_Domain_Names_in_Applications
@pde, what do you think of @jsha's suggestion that we not directly allow IDNs to be requested by Certbot (but maybe try to show a useful message when a user attempts to use one, as I proposed in reply)?
I don't think forcing users to encode their Unicode domains themselves is great UX-wise; it would require that they understand the differences between IDNA2003 and IDNA2008, etc., and find their own conversion tools (and the various ones available online aren't that great).
I understand @jsha's reasoning here but I think given the expected user base of Certbot adding this extra complexity would be a net negative.
I think I'm misunderstanding you. Is your argument that Certbot _should_ or _should not_ accept Unicode for domains on the command line?
given the expected user base of Certbot
I think the implication here is that Certbot is aimed at novice administrators, or at least those not experienced with the Web PKI. Which is true. But I think even those administrators mainly deal with domain names in Punycode form, simply because that is what they have to use when specifying domains in the most relevant tools.
IDNA is intended mainly for frontend display purposes.
I like Seth's suggestion, but I think we don't need to provide the example encoding. I think this case will be hit infrequently enough that it's not worth the complexity.
I think I'm misunderstanding you. Is your argument that Certbot should or should not accept Unicode for domains on the command line?
That it should accept Unicode. While it's true it is mainly for display purposes, I doubt this will actually prevent people from trying to use it. The Chrome behavior is also non-standard; Firefox, for instance, will convert xn-- prefixed domains to their Unicode representation (including when you copy the URL).
Given that the discussions on the community forum and in IRC make it seem like users expect to be able to simply pass a Unicode name to --domains, I don't really see a good argument to disallow it.
If the worry here is that people will be confused as to why their certs contain xn-- names instead of the Unicode ones, why not just print a notice/warning type thing that explains it, with links to further resources?
If the worry here is that people will be confused as to why their certs contain xn-- names
No, my main concern is about avoiding unnecessary complexity. The best code is the code you don't have to write.
At a minimum, we can do a release without this, but with #3619 included, and see how many enhancement requests we get.
Would you want Certbot to attempt to show the U-label as well when displaying an A-label, even if it doesn't accept it as input?
Would you want Certbot to give users who request a U-label name information about this, maybe with a reference to some documentation and tools for converting to an A-label?
Would you want Certbot to attempt to show the U-label as well when displaying an A-label, even if it doesn't accept it as input?
Only if it didn't add additional dependencies. But I think it does, and I don't think the tradeoff is worth it.
Would you want Certbot to give users who request a U-label name information about this, maybe with a reference to some documentation and tools for converting to an A-label?
Yes, because this is easy to do with no extra dependencies, and with very little code.
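The "very little code" being described could be as small as this (a sketch; the function name and message wording are illustrative, though the spirit matches the behavior Certbot eventually shipped):

```python
def reject_non_ascii(domain):
    """Refuse non-ASCII domain input instead of converting it,
    pointing the user at the A-label (punycode) form they need."""
    try:
        # A-labels are pure ASCII by construction, so any domain that
        # fails this encode must contain a U-label.
        domain.encode("ascii")
    except UnicodeError:
        raise ValueError(
            "Non-ASCII domain name %r: please convert it to its IDNA "
            "A-label (xn--...) form before passing it to Certbot." % domain
        )
```

A caller would run this on each -d value and print the error (perhaps with a documentation link) instead of submitting the name.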
Given that, as @bmw notes, we are already using this dependency, and that the actual code to translate from Unicode to punycode would be a two-line change as far as I can tell, I don't really see this as 'unnecessary complexity'.
If there were easy-to-use, standardized third-party tools that let a user do this without understanding the differences in the encoding styles, I'd agree that this wasn't necessary, but after a bit of searching I couldn't easily find any (it seems 90% of existing tools only support IDNA2003 but make no actual mention of that fact).
Only if it didn't add additional dependencies. But I think it does, and I don't think the tradeoff is worth it.
It's a real pity that the encodings changed from the 2003 to the 2008 version, because otherwise it would be as simple as .encode("idna") in Python! :frowning:
the actual code to translate from unicode to punycode would be a two line change
A slightly obscure counterargument is that it's still possible to use a non-UTF-8 terminal and have an encoding confusion between how the U-label appears to the user and how the U-label is presented to Certbot. (In Python 2, we would have to choose a character encoding to use when decoding sys.argv, while in Python 3 it's apparently chosen for us -- I don't know on what basis.) But I wouldn't expect a significant number of users to encounter this issue.
Is there either direction in which IDNA2008 always agrees with IDNA2003? That is, is either .encode("idna") or .decode("idna") from Python's IDNA2003 implementation still correct with respect to IDNA2008 U-labels? (If so, we could still use that direction without creating or cementing a library dependency.)
Russian domains are not supported!
certbot certonly --standalone -d eps-nfo.ru -d xn----itbzecmx.xn--p1ai -d xn----jtbwehmq.xn--p1ai -d www.xn----itbzecmx.xn--p1ai -d www.xn----jtbwehmq.xn--p1ai -d www.eps-nfo.ru --no-ui
Obtaining a new certificate
An unexpected error occurred:
The request message was malformed :: Name does not end in a public suffix
Please see the logfiles in /var/log/letsencrypt for more details.
2016-11-02 01:08:24,411:DEBUG:certbot.main:Exiting abnormally:
Traceback (most recent call last):
File "/root/letsencrypt/venv/bin/certbot", line 11, in <module>
load_entry_point('certbot', 'console_scripts', 'certbot')()
File "/root/letsencrypt/certbot/main.py", line 773, in main
return config.func(config, plugins)
File "/root/letsencrypt/certbot/main.py", line 569, in obtain_cert
action, _ = _auth_from_domains(le_client, config, domains, lineage)
File "/root/letsencrypt/certbot/main.py", line 100, in _auth_from_domains
lineage = le_client.obtain_and_enroll_certificate(domains)
File "/root/letsencrypt/certbot/client.py", line 281, in obtain_and_enroll_certificate
certr, chain, key, _ = self.obtain_certificate(domains)
File "/root/letsencrypt/certbot/client.py", line 253, in obtain_certificate
self.config.allow_subset_of_names)
File "/root/letsencrypt/certbot/auth_handler.py", line 68, in get_authorizations
domain, self.account.regr.new_authzr_uri)
File "/root/letsencrypt/acme/acme/client.py", line 212, in request_domain_challenges
typ=messages.IDENTIFIER_FQDN, value=domain), new_authzr_uri)
File "/root/letsencrypt/acme/acme/client.py", line 192, in request_challenges
new_authz)
File "/root/letsencrypt/acme/acme/client.py", line 663, in post
return self._check_response(response, content_type=content_type)
File "/root/letsencrypt/acme/acme/client.py", line 566, in _check_response
raise messages.Error.from_json(jobj)
Error: urn:acme:error:malformed :: The request message was malformed :: Name does not end in a public suffix
Thanks @slavonnet. This is a bug that has been reported by others and will be fixed: https://community.letsencrypt.org/t/chinese-idn-issurance-requests-malformed/21528/5.
The client team decided to kick this issue for now. While many of us think we should probably support this feature at some point, we don't think it's critical for 0.10.0 and would like to see how many Certbot users request this feature so we can prioritize it accordingly.
The support is not critical from my point of view, but a major task since certbot is the recommended client. So the IDN support should be implemented in the near future... I'm waiting a long time for the IDN support (more than one year now) and I want to start with the recommended tools, before using 3rd party tools. Thanks in advance!
To be clear, Certbot will allow you to request certs for IDNs starting in our 0.10.0 release. You'll just have to input the domains as punycode rather than unicode until this ticket is resolved.
Using the current master branch, I still get this error with a punycode domain:
An unexpected error occurred:
The request message was malformed :: Name does not end in a public suffix
The domain in question ends with the suffix .xn--ngbc5azd, the Arabic equivalent of .net, which has been generally available as a TLD since 2014.
@benjamingeer There was a bug in Boulder (the Let's Encrypt server-side component Certbot talks to) that prevented issuance for IDN TLDs. You should follow this issue for more information. As of yesterday this is fixed in Staging. I expect that you will likely be able to issue with certbot master for .xn--ngbc5azd domains on Thursday, after the planned Boulder update to production.
Hope that helps!
@bmw is there a timeline for 0.10 ?
Kind of. There's no hard deadline and we'll delay the release until it's ready. With that said, we hope to have the release out before the holidays.
Any ETA?
Holidays are over ;)
Yeah we missed our deadline :frowning_face:
We should have a release out in the next week or two.
Alright, thanks for the reply! :+1:
@bmw thats no biggie, just keep us in the loop :) thanks for the reply
I'm still having issues; I tried registering a cert for bakkeløbet.dk. I punycode-encoded the domain name with punycoder.com. Using certbot 0.10.2.
cmdline: sudo letsencrypt certonly -a webroot --webroot-path=/var/www/bakkeløbet/wordpress/ -d xn--bakkelbet-q8a.dk -d www.xn--bakkelbet-q8a.dk
The output from letsencrypt is:
Encountered exception during recovery
'ascii' codec can't decode byte 0xc3 in position 15: ordinal not in range(128)
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/certbot/error_handler.py", line 99, in _call_registered
self.funcs[-1]()
File "/usr/lib/python2.7/dist-packages/certbot/auth_handler.py", line 280, in _cleanup_challenges
self.auth.cleanup(achalls)
File "/usr/lib/python2.7/dist-packages/certbot/plugins/webroot.py", line 224, in cleanup
validation_path = self._get_validation_path(root_path, achall)
File "/usr/lib/python2.7/dist-packages/certbot/plugins/webroot.py", line 198, in _get_validation_path
return os.path.join(root_path, achall.chall.encode("token"))
File "/usr/lib/python2.7/posixpath.py", line 80, in join
path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 15: ordinal not in range(128)
An unexpected error occurred:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 15: ordinal not in range(128)
Please see the logfiles in /var/log/letsencrypt for more details.
Please add support for Danish letters: æ, ø and å.
Hi @brunis,
Looks like this is a problem with filename handling, not IDNs: https://github.com/certbot/certbot/issues/4630.
Closing this ticket for now, since it's likely to generate confusion.
I'm reopening this issue as we're getting duplicates (see #4912). I added more information to the original post for people who stumble across this issue.
As a reminder, Nginx and Apache don't actually support configuration using U-labels. Since we require that the domains specified on the command line be present in the configuration files (for Nginx and Apache plugins), an administrator already has to have encoded their domain name to Punycode at least once in order to configure their web server.
Interestingly, Nginx will blindly accept non-ASCII characters in a config, and will respond to those characters when sent in a Host header. But since user agents will never send those non-ASCII characters in a Host header, this is incorrect. See the below config, where putting the non-ASCII characters in an Nginx config results in a server that doesn't respond to the correct Host header.
nginx.conf:
error_log nginx.error.log;
pid nginx.pid;
worker_processes 1;
daemon off;
events {
worker_connections 1024;
}
http {
server {
listen 127.0.0.1:9000 default_server;
access_log nginx.access.log;
location / {
return 200 "default server\n";
}
}
server {
listen 127.0.0.1:9000;
server_name ウェブ.crud.net;
access_log nginx.access.log;
location / {
return 400 "improperly encoded hostname\n";
}
}
}
$ nginx -c nginx.conf -p . &
$ curl localhost:9000 --header "Host: ウェブ.crud.net"
improperly encoded hostname
$ curl localhost:9000 --header "Host: xn--gckc5l.crud.net"
default server
Another failing case:
certbot-auto output:
http-01 challenge for xn--1-sua.money
http-01 challenge for cesium.xn--1-sua.money
http-01 challenge for www.xn--1-sua.money
Cleaning up challenges
An unexpected error occurred:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 13: ordinal not in range(128)
Please see the logfiles in /var/log/letsencrypt for more details.
log:
2018-04-09 18:33:00,002:DEBUG:certbot.error_handler:Calling registered functions
2018-04-09 18:33:00,002:INFO:certbot.auth_handler:Cleaning up challenges
2018-04-09 18:33:06,839:DEBUG:certbot.log:Exiting abnormally:
Traceback (most recent call last):
File "/opt/eff.org/certbot/venv/bin/letsencrypt", line 11, in <module>
sys.exit(main())
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/main.py", line 1266, in main
return config.func(config, plugins)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/main.py", line 1031, in run
certname, lineage)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/main.py", line 113, in _get_and_save_cert
renewal.renew_cert(config, domains, le_client, lineage)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/renewal.py", line 297, in renew_cert
new_cert, new_chain, new_key, _ = le_client.obtain_certificate(domains)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/client.py", line 294, in obtain_certificate
orderr = self._get_order_and_authorizations(csr.data, self.config.allow_subset_of_names)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/client.py", line 330, in _get_order_and_authorizations
authzr = self.auth_handler.handle_authorizations(orderr, best_effort)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/auth_handler.py", line 73, in handle_authorizations
resp = self._solve_challenges(aauthzrs)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/auth_handler.py", line 124, in _solve_challenges
resp = self.auth.perform(all_achalls)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/configurator.py", line 1033, in perform
http_response = http_doer.perform()
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/http_01.py", line 61, in perform
self.configurator.save("HTTP Challenge", True)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/configurator.py", line 963, in save
self.parser.filedump(ext='')
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/parser.py", line 242, in filedump
out = nginxparser.dumps(tree)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/nginxparser.py", line 134, in dumps
return str(RawNginxDumper(blocks.spaced))
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/nginxparser.py", line 98, in __str__
return ''.join(self)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 13: ordinal not in range(128)
2018-04-09 18:33:06,840:ERROR:certbot.log:An unexpected error occurred:
and the nginx conf:
server {
listen 80 default_server;
listen [::]:80 http2 default_server;
server_name _;
index index.html index.nginx-debian.html;
location '/.well-known/acme-challenge' {
root /var/www/demo;
}
root /var/www/html;
location / {
return 301 https://$host$request_uri;
# First attempt to serve request as file, then
# as directory, then fall back to displaying a 404.
try_files $uri $uri/ =404;
}
}
server {
listen 443 ssl http2;
#server_name ǧ1.money xn--1-sua.money;
server_name xn--1-sua.money;
root /home/gammanu/1forma-tic.fr;
ssl_certificate /etc/letsencrypt/live/demo.mycelia.tools-0001/fullchain.pem; # managed by Certbot
ssl_certificate_key /etc/letsencrypt/live/demo.mycelia.tools-0001/privkey.pem; # managed by Certbot
}
server {
listen 443 ssl http2;
#server_name cesium.ǧ1.money cesium.xn--1-sua.money;
server_name cesium.xn--1-sua.money;
root /home/gammanu/g1.1000i100.fr/cesium/;
ssl_certificate /etc/letsencrypt/live/demo.mycelia.tools-0001/fullchain.pem; # managed by Certbot
ssl_certificate_key /etc/letsencrypt/live/demo.mycelia.tools-0001/privkey.pem; # managed by Certbot
}
server {
listen 80;
#server_name www.ǧ1.money www.xn--1-sua.money g1.money june.money www.g1.money www.june.money mlg1.fr www.mlg1.fr;
server_name www.xn--1-sua.money g1.money june.money www.g1.money www.june.money mlg1.fr www.mlg1.fr;
return 301 https://ǧ1.money$request_uri;
}
server {
listen 80;
server_name cesium.g1.money cesium.june.money;
return 301 https://cesium.ǧ1.money$request_uri;
}
Interesting. It looks like Certbot manages to mostly parse the Nginx config, but later, when RawNginxDumper attempts to dump it, it fails, presumably on the ǧ (0xc7 0xa7 in UTF-8) in the return 301 statement. I'll file a separate issue. If you'd like to fix your immediate issue, try replacing the U-label (ǧ) in your redirects with the corresponding A-label (xn--...).
If you'd like to fix your immediate issue, try replacing the U-label (ǧ) in your redirects with the corresponding A-label (xn--...).
Thanks! It works after deleting them everywhere in the file (redirects and comments).
With certbot 0.31.0 I'm getting
Non-ASCII domain names not supported. To issue for an Internationalized Domain Name, use Punycode.
Yes, that's expected. See the conversation above: most components of the web serving infrastructure, including your domain name registration and your web server config, use the Punycode (xn--) representation. That's what you should use at the Certbot command line as well. The U-label form is mainly a display-layer thing in the browser.
I'm going to lock this issue for now so we don't go too far off-topic.