Boulder recently merged IDN support and plans to enable it in staging and production in the near future (most likely within the next few weeks).
Boulder/ACME requires that any IDN be converted to ASCII using the IDNA2008 ToASCII encoding method before being sent in a CSR, authorization request, etc.
bmw's edit: Since version 0.10.0, Certbot supports requesting IDNs as long as they are given to Certbot as punycode.
Making this ticket the slightly more ambitious one, which will require us to accept and encode IDNs correctly; #3619 is the first step, which lets the user provide their own Punycode'd input.
For reference, Python has some built-in punycode support in the encodings.idna module (in the standard library since Python 2.3!): https://docs.python.org/2/library/codecs.html#module-encodings.idna
But it looks like it operates on "labels", so I think you have to do
".".join(idna.ToASCII(idna.nameprep(x)) for x in d.split("."))
This worked correctly for test names that I tried. Notably, idna.ToASCII(idna.nameprep(d)) does _not_ do the right thing; for example, idna.ToASCII(idna.nameprep(u"Schönheit.com")) yields 'xn--schnheit.com-6ib' instead of the correct 'xn--schnheit-p4a.com'.
(You have to apply the punycode encoding to each DNS label, not to the FQDN as a whole.)
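For concreteness, the per-label approach above can be sketched like this using only the standard library (IDNA2003; the helper name to_ascii_fqdn is illustrative, not anything in Certbot):

```python
from encodings import idna  # stdlib IDNA2003 helpers, in Python since 2.3


def to_ascii_fqdn(domain):
    # Apply nameprep + ToASCII to each DNS label individually, then
    # rejoin with dots. Encoding the whole FQDN at once is wrong: the
    # "." would be treated as part of a single label, producing results
    # like 'xn--schnheit.com-6ib' instead of 'xn--schnheit-p4a.com'.
    return ".".join(
        idna.ToASCII(idna.nameprep(label)).decode("ascii")
        for label in domain.split(".")
    )


print(to_ascii_fqdn(u"Schönheit.com"))  # xn--schnheit-p4a.com
```

(The explicit nameprep call mirrors the snippet above; ToASCII also runs nameprep internally, so it is redundant but harmless.)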
Also, for consistent UI, we should probably then save some kind of data about whether each name was originally supplied as punycode or not. If the user asked for -d xn--schnheit-p4a.com, it might be confusing if we said "Congratulations! You've obtained a certificate for schönheit.com!", while if the user asked for -d schönheit.com, it might be similarly confusing if we said "Congratulations! You've obtained a certificate for xn--schnheit-p4a.com!". So it would be best to have Certbot somehow know which one the user asked for and then use that form in related output or error messages where possible.
I don't think we currently have anywhere that information like this could be represented.
@schoen we could save it as a config / renewal config variable. That would be easy if you wanted all domains to be reported as either punycode or not; but if you wanted it to be remembered on a per-domain basis, we'd have to save a JSON object which maps each domain to whether it was originally unicode or not.
Perhaps another, better solution would be to always display both: you've obtained a cert for xn--schnheit-p4a.com (schönheit.com).
Oh, I wasn't even thinking about renewal! That seems like a harder problem. I was just thinking inside of the current run.
Yes, I think displaying both is going to be the easiest option! It would also help educate the user about the ways that the IDN could be represented (which might come up in other contexts).
I could make a function in util.py that represents a given domain for presentation, so if it's example.com it would say u"example.com", but if it's exámple.com it would say u"xn--exmple-qta.com (exámple.com)". We could use that function whenever giving any output that relates to a subject domain name.
Given a quick search, it looks like the built-in Python package (encodings.idna) only supports IDNA2003-style encoding, which will cause slight incompatibilities with the Boulder implementation, which only supports IDNA2008.
@rolandshoemaker, do you have a reference on the differences between them, or can you explain them? Would there be a simple workaround possible?
I did follow through with my idea of making a function that can represent a domain for presentation starting from either Unicode, ACE, or pure ASCII forms:
>>> print(represent(u"example.com"))
example.com
>>> print(represent(u"exámple.com"))
exámple.com (xn--exmple-qta.com)
>>> print(represent(u"xn--exmple-qta.com"))
exámple.com (xn--exmple-qta.com)
Currently, my function is arguably buggy if the user provides mixed Unicode and ACE labels in a single FQDN (though arguably correct for some purposes).
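A minimal stdlib-only sketch that behaves like the session above (this is a reconstruction for illustration, not the actual function; it uses the IDNA2003 codec and inherits the mixed-label quirk just mentioned):

```python
from encodings import idna  # stdlib IDNA2003 helpers


def represent(domain):
    # Convert every label to its ACE (ASCII/punycode) form; labels that
    # are already ASCII, including existing xn-- labels, pass through
    # ToASCII unchanged.
    ace = ".".join(
        idna.ToASCII(label).decode("ascii") for label in domain.split(".")
    )
    # Decode each label back to Unicode for display.
    uni = ".".join(idna.ToUnicode(label) for label in ace.split("."))
    # Pure-ASCII names display as-is; IDNs show both forms.
    return uni if uni == ace else "%s (%s)" % (uni, ace)
```

This reproduces the three example outputs shown above regardless of whether the input is Unicode, ACE, or plain ASCII.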
There are a large number of deviations between the two encodings in quite common scripts; UTS #46 Section 7 attempts to describe a subset of the differences, but I don't think it would be feasible to shim the differences yourself.
I think the best option here is to find a third party library that explicitly supports IDNA2008.
@rolandshoemaker, thanks for the heads-up! The good news is that there's a "drop-in" replacement for the Python standard library version:
https://pypi.python.org/pypi/idna/
The bad news is that we're going to have yet another PyPI dependency as a result.
Good news! We already indirectly depend on idna. I'm not sure which of our dependencies brings it in, but it's there. See a virtual environment after running pip install certbot or certbot-auto.
I propose closing this. U-labels in IDNA are intended only as a presentation layer, with A-labels used under the hood.
For instance, when configuring a server_name in Nginx, you have to use A-labels. Since Certbot requires you to already have a virtual host for any hostnames you want to issue for, the user will already need to have a server_name in A-label form for their hostname. So there's no risk that the user doesn't know how to express their hostname in A-label form. Additionally, converting between the two will make presentation confusing for people who host IDNs but don't understand them. If Certbot offers to get me a certificate for ウェブ.crud.net, I don't have any way of knowing that that corresponds to the line in my Nginx config that says server_name xn--gckc5l.crud.net;
Also, when copying and pasting URLs from the URL bar in Chrome, you get the A-label, and when configuring your DNS, you have to use the A-label.
@jsha, would you want Certbot to attempt to show the U-label as well when displaying an A-label, even if it doesn't accept the former as input?
Would you want Certbot to give users who request a U-label name information about this, maybe with a reference to some documentation and tools for converting to an A-label? Like
One of the requested names you provided, ウェブ.crud.net, contains a non-ASCII character. Names that contain such characters have to be formatted as IDNA A-labels using punycode, like xn--gckc5l.crud.net, before being passed to Let's Encrypt or Certbot. For more information, please see https://en.wikipedia.org/wiki/Internationalized_domain_name#Internationalizing_Domain_Names_in_Applications
@pde, what do you think of @jsha's suggestion that we not directly allow IDNs to be requested by Certbot (but maybe try to show a useful message when a user attempts to use one, as I proposed in reply)?
I don't think forcing users to encode their Unicode domains themselves is great UX-wise; it would require that they understand the differences between IDNA2003 and IDNA2008, etc., and find their own conversion tools (and the various ones available online aren't that great).
I understand @jsha's reasoning here but I think given the expected user base of Certbot adding this extra complexity would be a net negative.
I think I'm misunderstanding you. Is your argument that Certbot _should_ or _should not_ accept Unicode for domains on the command line?
given the expected user base of Certbot
I think the implication here is that Certbot is aimed at novice administrators, or at least those not experienced with the Web PKI. Which is true. But I think even those administrators mainly deal with domain names in Punycode form, simply because that is what they have to use when specifying domains in the most relevant tools.
IDNA is intended mainly for frontend display purposes.
I like Seth's suggestion, but I think we don't need to provide the example encoding. I think this case will be hit infrequently enough that it's not worth the complexity.
I think I'm misunderstanding you. Is your argument that Certbot should or should not accept Unicode for domains on the command line?
That it should accept Unicode. While it's true it is mainly for display purposes, I doubt this will actually prevent people from trying to use it. The Chrome behavior is also non-standard; Firefox, for instance, will convert xn-- prefixed domains to their Unicode representation (including when you copy the URL).
Given that the discussions on the community forum and in IRC make it seem like users expect to be able to simply pass a Unicode name to --domains, I don't really see a good argument to disallow it.
If the worry here is that people will be confused as to why their certs contain xn-- names instead of the Unicode ones, why not just print a notice/warning type thing that explains it, with links to further resources?
If the worry here is that people will be confused as to why their certs contain xn-- names
No, my main concern is about avoiding unnecessary complexity. The best code is the code you don't have to write.
At a minimum, we can do a release without this, but with #3619 included, and see how many enhancement requests we get.
Would you want Certbot to attempt to show the U-label as well when displaying an A-label, even if it doesn't accept it as input?
Would you want Certbot to give users who request a U-label name information about this, maybe with a reference to some documentation and tools for converting to an A-label?
Would you want Certbot to attempt to show the U-label as well when displaying an A-label, even if it doesn't accept it as input?
Only if it didn't add additional dependencies. But I think it does, and I don't think the tradeoff is worth it.
Would you want Certbot to give users who request a U-label name information about this, maybe with a reference to some documentation and tools for converting to an A-label?
Yes, because this is easy to do with no extra dependencies, and with very little code.
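The "very little code" being described could be as small as this (a sketch; the function name and message wording are illustrative, though the spirit matches the behavior Certbot eventually shipped):

```python
def reject_non_ascii(domain):
    """Refuse non-ASCII domain input instead of converting it,
    pointing the user at the A-label (punycode) form they need."""
    try:
        # A-labels are pure ASCII by construction, so any domain that
        # fails this encode must contain a U-label.
        domain.encode("ascii")
    except UnicodeError:
        raise ValueError(
            "Non-ASCII domain name %r: please convert it to its IDNA "
            "A-label (xn--...) form before passing it to Certbot." % domain
        )
```

A caller would run this on each -d value and print the error (perhaps with a documentation link) instead of submitting the name.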
Given that, as @bmw notes, we are already using this dependency, and that the actual code to translate from Unicode to punycode would be a two-line change as far as I can tell, I don't really see this as 'unnecessary complexity'.
If there were easy-to-use, standardized third-party tools that let a user do this without understanding the differences in the encoding styles, I'd agree that this wasn't necessary, but after a bit of searching I couldn't easily find any (it seems 90% of existing tools only support IDNA2003 but make no actual mention of that fact).
Only if it didn't add additional dependencies. But I think it does, and I don't think the tradeoff is worth it.
It's a real pity that the encodings changed from the 2003 to the 2008 version, because otherwise it would be as simple as .encode("idna") in Python! :frowning:
the actual code to translate from unicode to punycode would be a two line change
A slightly obscure counterargument is that it's still possible to use a non-UTF-8 terminal and have an encoding confusion between how the U-label appears to the user and how the U-label is presented to Certbot. (In Python 2, we would have to choose a character encoding to use when decoding sys.argv, while in Python 3 it's apparently chosen for us -- I don't know on what basis.) But I wouldn't expect a significant number of users to encounter this issue.
Is there either direction in which IDNA2008 always agrees with IDNA2003? That is, is either .encode("idna") or .decode("idna") from Python's IDNA2003 implementation still correct with respect to IDNA2008 U-labels? (If so, we could still use that direction without creating or cementing a library dependency.)
Russian domains are not supported!
certbot certonly --standalone -d eps-nfo.ru -d xn----itbzecmx.xn--p1ai -d xn----jtbwehmq.xn--p1ai -d www.xn----itbzecmx.xn--p1ai -d www.xn----jtbwehmq.xn--p1ai -d www.eps-nfo.ru --no-ui
Obtaining a new certificate
An unexpected error occurred:
The request message was malformed :: Name does not end in a public suffix
Please see the logfiles in /var/log/letsencrypt for more details.
2016-11-02 01:08:24,411:DEBUG:certbot.main:Exiting abnormally:
Traceback (most recent call last):
File "/root/letsencrypt/venv/bin/certbot", line 11, in <module>
load_entry_point('certbot', 'console_scripts', 'certbot')()
File "/root/letsencrypt/certbot/main.py", line 773, in main
return config.func(config, plugins)
File "/root/letsencrypt/certbot/main.py", line 569, in obtain_cert
action, _ = _auth_from_domains(le_client, config, domains, lineage)
File "/root/letsencrypt/certbot/main.py", line 100, in _auth_from_domains
lineage = le_client.obtain_and_enroll_certificate(domains)
File "/root/letsencrypt/certbot/client.py", line 281, in obtain_and_enroll_certificate
certr, chain, key, _ = self.obtain_certificate(domains)
File "/root/letsencrypt/certbot/client.py", line 253, in obtain_certificate
self.config.allow_subset_of_names)
File "/root/letsencrypt/certbot/auth_handler.py", line 68, in get_authorizations
domain, self.account.regr.new_authzr_uri)
File "/root/letsencrypt/acme/acme/client.py", line 212, in request_domain_challenges
typ=messages.IDENTIFIER_FQDN, value=domain), new_authzr_uri)
File "/root/letsencrypt/acme/acme/client.py", line 192, in request_challenges
new_authz)
File "/root/letsencrypt/acme/acme/client.py", line 663, in post
return self._check_response(response, content_type=content_type)
File "/root/letsencrypt/acme/acme/client.py", line 566, in _check_response
raise messages.Error.from_json(jobj)
Error: urn:acme:error:malformed :: The request message was malformed :: Name does not end in a public suffix
Thanks @slavonnet. This is a bug that has been reported by others and will be fixed: https://community.letsencrypt.org/t/chinese-idn-issurance-requests-malformed/21528/5.
The client team decided to kick this issue for now. While many of us think we should probably support this feature at some point, we don't think it's critical for 0.10.0 and would like to see how many Certbot users request this feature so we can prioritize it accordingly.
The support is not critical from my point of view, but a major task since certbot is the recommended client. So the IDN support should be implemented in the near future... I'm waiting a long time for the IDN support (more than one year now) and I want to start with the recommended tools, before using 3rd party tools. Thanks in advance!
To be clear, Certbot will allow you to request certs for IDNs starting in our 0.10.0 release. You'll just have to input the domains as punycode rather than unicode until this ticket is resolved.
Using the current master branch, I still get this error with a punycode domain:
An unexpected error occurred:
The request message was malformed :: Name does not end in a public suffix
The domain in question ends with the suffix .xn--ngbc5azd, the Arabic equivalent of .net, which has been generally available as a TLD since 2014.
@benjamingeer There was a bug in Boulder (the Let's Encrypt server-side component Certbot talks to) that prevented issuance for IDN TLDs. You should follow this issue for more information. As of yesterday this is fixed in Staging. I expect that you will likely be able to issue with certbot master for .xn--ngbc5azd domains on Thursday, after the planned Boulder update to production.
Hope that helps!
@bmw is there a timeline for 0.10 ?
Kind of. There's no hard deadline and we'll delay the release until it's ready. With that said, we hope to have the release out before the holidays.
Any ETA?
Holidays are over ;)
Yeah we missed our deadline :frowning_face:
We should have a release out in the next week or two.
Alright, thanks for the reply! :+1:
@bmw thats no biggie, just keep us in the loop :) thanks for the reply
I'm still having issues; I tried registering a cert for bakkeløbet.dk. I punycode-encoded the domain name with punycoder.com. Using certbot 0.10.2.
cmdline: sudo letsencrypt certonly -a webroot --webroot-path=/var/www/bakkeløbet/wordpress/ -d xn--bakkelbet-q8a.dk -d www.xn--bakkelbet-q8a.dk
The output from letsencrypt is:
Encountered exception during recovery
'ascii' codec can't decode byte 0xc3 in position 15: ordinal not in range(128)
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/certbot/error_handler.py", line 99, in _call_registered
self.funcs[-1]()
File "/usr/lib/python2.7/dist-packages/certbot/auth_handler.py", line 280, in _cleanup_challenges
self.auth.cleanup(achalls)
File "/usr/lib/python2.7/dist-packages/certbot/plugins/webroot.py", line 224, in cleanup
validation_path = self._get_validation_path(root_path, achall)
File "/usr/lib/python2.7/dist-packages/certbot/plugins/webroot.py", line 198, in _get_validation_path
return os.path.join(root_path, achall.chall.encode("token"))
File "/usr/lib/python2.7/posixpath.py", line 80, in join
path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 15: ordinal not in range(128)
An unexpected error occurred:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 15: ordinal not in range(128)
Please see the logfiles in /var/log/letsencrypt for more details.
Please add support for Danish letters: æ, ø and å.
Hi @brunis,
Looks like this is a problem with filename handling, not IDNs: https://github.com/certbot/certbot/issues/4630.
Closing this ticket for now, since it's likely to generate confusion.
I'm reopening this issue as we're getting duplicates (see #4912). I added more information to the original post for people who stumble across this issue.
As a reminder, Nginx and Apache don't actually support configuration using U-labels. Since we require that the domains specified on the command line be present in the configuration files (for Nginx and Apache plugins), an administrator already has to have encoded their domain name to Punycode at least once in order to configure their web server.
Interestingly, Nginx will blindly accept non-ASCII characters in a config, and will respond to those characters when sent in a Host header. But since user agents will never send those non-ASCII characters in a Host header, this is incorrect. See the below config, where putting the non-ASCII characters in an Nginx config results in a server that doesn't respond to the correct Host header.
nginx.conf:
error_log nginx.error.log;
pid nginx.pid;
worker_processes 1;
daemon off;
events {
worker_connections 1024;
}
http {
server {
listen 127.0.0.1:9000 default_server;
access_log nginx.access.log;
location / {
return 200 "default server\n";
}
}
server {
listen 127.0.0.1:9000;
server_name ウェブ.crud.net;
access_log nginx.access.log;
location / {
return 400 "improperly encoded hostname\n";
}
}
}
$ nginx -c nginx.conf -p . &
$ curl localhost:9000 --header "Host: ウェブ.crud.net"
improperly encoded hostname
$ curl localhost:9000 --header "Host: xn--gckc5l.crud.net"
default server
Another failing case:
certbot-auto output:
http-01 challenge for xn--1-sua.money
http-01 challenge for cesium.xn--1-sua.money
http-01 challenge for www.xn--1-sua.money
Cleaning up challenges
An unexpected error occurred:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 13: ordinal not in range(128)
Please see the logfiles in /var/log/letsencrypt for more details.
log:
2018-04-09 18:33:00,002:DEBUG:certbot.error_handler:Calling registered functions
2018-04-09 18:33:00,002:INFO:certbot.auth_handler:Cleaning up challenges
2018-04-09 18:33:06,839:DEBUG:certbot.log:Exiting abnormally:
Traceback (most recent call last):
File "/opt/eff.org/certbot/venv/bin/letsencrypt", line 11, in <module>
sys.exit(main())
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/main.py", line 1266, in main
return config.func(config, plugins)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/main.py", line 1031, in run
certname, lineage)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/main.py", line 113, in _get_and_save_cert
renewal.renew_cert(config, domains, le_client, lineage)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/renewal.py", line 297, in renew_cert
new_cert, new_chain, new_key, _ = le_client.obtain_certificate(domains)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/client.py", line 294, in obtain_certificate
orderr = self._get_order_and_authorizations(csr.data, self.config.allow_subset_of_names)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/client.py", line 330, in _get_order_and_authorizations
authzr = self.auth_handler.handle_authorizations(orderr, best_effort)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/auth_handler.py", line 73, in handle_authorizations
resp = self._solve_challenges(aauthzrs)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot/auth_handler.py", line 124, in _solve_challenges
resp = self.auth.perform(all_achalls)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/configurator.py", line 1033, in perform
http_response = http_doer.perform()
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/http_01.py", line 61, in perform
self.configurator.save("HTTP Challenge", True)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/configurator.py", line 963, in save
self.parser.filedump(ext='')
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/parser.py", line 242, in filedump
out = nginxparser.dumps(tree)
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/nginxparser.py", line 134, in dumps
return str(RawNginxDumper(blocks.spaced))
File "/opt/eff.org/certbot/venv/local/lib/python2.7/site-packages/certbot_nginx/nginxparser.py", line 98, in __str__
return ''.join(self)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 13: ordinal not in range(128)
2018-04-09 18:33:06,840:ERROR:certbot.log:An unexpected error occurred:
and the nginx conf:
server {
listen 80 default_server;
listen [::]:80 http2 default_server;
server_name _;
index index.html index.nginx-debian.html;
location '/.well-known/acme-challenge' {
root /var/www/demo;
}
root /var/www/html;
location / {
return 301 https://$host$request_uri;
# First attempt to serve request as file, then
# as directory, then fall back to displaying a 404.
try_files $uri $uri/ =404;
}
}
server {
listen 443 ssl http2;
#server_name ǧ1.money xn--1-sua.money;
server_name xn--1-sua.money;
root /home/gammanu/1forma-tic.fr;
ssl_certificate /etc/letsencrypt/live/demo.mycelia.tools-0001/fullchain.pem; # managed by Certbot
ssl_certificate_key /etc/letsencrypt/live/demo.mycelia.tools-0001/privkey.pem; # managed by Certbot
}
server {
listen 443 ssl http2;
#server_name cesium.ǧ1.money cesium.xn--1-sua.money;
server_name cesium.xn--1-sua.money;
root /home/gammanu/g1.1000i100.fr/cesium/;
ssl_certificate /etc/letsencrypt/live/demo.mycelia.tools-0001/fullchain.pem; # managed by Certbot
ssl_certificate_key /etc/letsencrypt/live/demo.mycelia.tools-0001/privkey.pem; # managed by Certbot
}
server {
listen 80;
#server_name www.ǧ1.money www.xn--1-sua.money g1.money june.money www.g1.money www.june.money mlg1.fr www.mlg1.fr;
server_name www.xn--1-sua.money g1.money june.money www.g1.money www.june.money mlg1.fr www.mlg1.fr;
return 301 https://ǧ1.money$request_uri;
}
server {
listen 80;
server_name cesium.g1.money cesium.june.money;
return 301 https://cesium.ǧ1.money$request_uri;
}
Interesting. It looks like Certbot manages to mostly parse the Nginx config, but later, when RawNginxDumper attempts to dump it, it fails, presumably on the ǧ (0xc7 0xa7 in UTF-8) in the return 301 statement. I'll file a separate issue. If you'd like to fix your immediate issue, try replacing the U-label (ǧ) in your redirects with the corresponding A-label (xn--...).
If you'd like to fix your immediate issue, try replacing the U-label (ǧ) in your redirects with the corresponding A-label (xn--...).
Thanks! It works after deleting them everywhere in the file (redirects and comments).
With certbot 0.31.0 I'm getting
Non-ASCII domain names not supported. To issue for an Internationalized Domain Name, use Punycode.
Yes, that's expected. See the conversation above: most components of the web serving infrastructure, including your domain name registration and your web server config, use the Punycode (xn--) representation. That's what you should use at the Certbot command line as well. The U-label form is mainly a display-layer thing in the browser.
I'm going to lock this issue for now so we don't go too far off-topic.