ESMValTool: Loading provenance tags without specifying the encoding produces weird YAML crashes

Created on 15 Mar 2019 · 14 Comments · Source: ESMValGroup/ESMValTool

@bouweandela - have a look at this: fresh install of version2_development on Jasmin and I get this crap:

(esmvaltool) [valeriu@jasmin-sci2 esmvaltool_var_test]$ esmvaltool -c config-user.yml recipe_preprocessor_test.yml
Traceback (most recent call last):
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/reader.py", line 89, in peek
    return self.buffer[self.pointer+index]
IndexError: string index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/bin/esmvaltool", line 11, in <module>
    load_entry_point('ESMValTool==2.0a1', 'console_scripts', 'esmvaltool')()
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/pkg_resources/__init__.py", line 487, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2728, in load_entry_point
    return ep.load()
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2346, in load
    return self.resolve()
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2352, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/ESMValTool-2.0a1-py3.6.egg/esmvaltool/_main.py", line 41, in <module>
    from ._config import configure_logging, read_config_user_file
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/ESMValTool-2.0a1-py3.6.egg/esmvaltool/_config.py", line 202, in <module>
    TAGS = _load_tags()
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/ESMValTool-2.0a1-py3.6.egg/esmvaltool/_config.py", line 199, in _load_tags
    return yaml.safe_load(file)
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/__init__.py", line 94, in safe_load
    return load(stream, SafeLoader)
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/__init__.py", line 72, in load
    return loader.get_single_data()
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/constructor.py", line 35, in get_single_data
    node = self.get_single_node()
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/composer.py", line 133, in compose_mapping_node
    item_value = self.compose_node(node, item_key)
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/composer.py", line 127, in compose_mapping_node
    while not self.check_event(MappingEndEvent):
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/parser.py", line 428, in parse_block_mapping_key
    if self.check_token(KeyToken):
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/scanner.py", line 116, in check_token
    self.fetch_more_tokens()
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/scanner.py", line 252, in fetch_more_tokens
    return self.fetch_plain()
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/scanner.py", line 676, in fetch_plain
    self.tokens.append(self.scan_plain())
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/scanner.py", line 1289, in scan_plain
    self.peek(length+1) in '\0 \t\r\n\x85\u2028\u2029') \
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/reader.py", line 91, in peek
    self.update(index+1)
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/reader.py", line 153, in update
    self.update_raw()
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/yaml/reader.py", line 178, in update_raw
    data = self.stream.read(size)
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3688: ordinal not in range(128)

If we specify the encoding in _config.py line 198, as in with open(filename, encoding='utf-8') as file:, it works fine. But why does this make a difference when there should be no need for it?
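A minimal sketch of the mechanism, assuming the tags file is UTF-8 and contains at least one accented character. The file content here is a hypothetical stand-in, not the real tags file, and encoding="ascii" is used only to imitate what a plain open() appears to do under the C locale on the Python 3.6 shown in the traceback:

```python
import os
import tempfile

# Hypothetical stand-in for the tags file: one name with an umlaut.
text = "authors:\n  name: Jörg\n"

with tempfile.NamedTemporaryFile("wb", suffix=".yml", delete=False) as tmp:
    tmp.write(text.encode("utf-8"))
    path = tmp.name

try:
    # Forcing ASCII imitates the default decoding under LANG=C / LC_ALL=C.
    try:
        with open(path, encoding="ascii") as file:
            file.read()
        ascii_error = None
    except UnicodeDecodeError as exc:
        ascii_error = exc

    # An explicit encoding makes the read independent of the locale.
    with open(path, encoding="utf-8") as file:
        decoded = file.read()
finally:
    os.remove(path)

print(ascii_error)
print(decoded == text)
```

The byte 0xc3 in the error is the first byte of the UTF-8 encoding of many accented Latin letters, which matches the byte reported in the traceback above.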

Also, we need to tell @mattiarighi and others to stop adding entries with special characters (like accents and umlauts etc)

bug

All 14 comments

Hi V,

I think it's somehow related to the locale setup of your system. Can you try adding

import locale
locale.setlocale(locale.LC_ALL, 'C.UTF-8')

at the top of esmvaltool/_main.py and see if that helps?

Some more information: the default encoding used by open is whatever is returned by locale.getpreferredencoding(), so I think we should be able to fix this for all files by ensuring that it is 'UTF-8' using locale.setlocale(locale.LC_ALL, 'C.UTF-8').

See also:
https://docs.python.org/3/library/functions.html#open
https://docs.python.org/3/library/locale.html#locale.getpreferredencoding
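To see what a given session actually uses, something like the following could be run inside the esmvaltool environment. It is only a diagnostic sketch: it prints the preferred encoding and the encoding that open() picks when none is given, using os.devnull as a harmless file to open.

```python
import codecs
import locale
import os

# Encoding the locale machinery reports for this process.
preferred = locale.getpreferredencoding(False)

# Encoding that open() falls back to when no encoding argument is passed.
with open(os.devnull) as handle:
    used_by_open = handle.encoding

print("preferred encoding:", preferred)
print("default for open():", used_by_open)
print("normalised names:", codecs.lookup(preferred).name,
      codecs.lookup(used_by_open).name)
```

Under a C locale on Python 3.6 this would be expected to report an ASCII codec (often spelled ANSI_X3.4-1968), and a UTF-8 codec under a locale such as en_GB.UTF-8.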

Also, we need to tell @mattiarighi and others to stop adding entries with special characters (like accents and umlauts etc)

I don't think I agree with this. I think that people should be able to spell their names etc. correctly if possible.

@bouweandela so here's the dubious situation I am in: every time I connect from my laptop to Jasmin, all works fine with respect to this issue (i.e. I am not getting it); every time I connect to Jasmin via a local Reading machine (the only way I can access Jasmin from outside the campus domain, e.g. when I am at home or traveling), I get the encoding-related issue. Also, adding the lines you suggested a few comments up to _main.py results in an unequivocal traceback:

(esmvaltool) [valeriu@jasmin-sci2 ~]$ esmvaltool --help
Traceback (most recent call last):
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/bin/esmvaltool", line 11, in <module>
    load_entry_point('ESMValTool==2.0a1', 'console_scripts', 'esmvaltool')()
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/pkg_resources/__init__.py", line 487, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2728, in load_entry_point
    return ep.load()
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2346, in load
    return self.resolve()
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2352, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/ESMValTool-2.0a1-py3.6.egg/esmvaltool/_main.py", line 30, in <module>
    locale.setlocale(locale.LC_ALL, 'C.UTF-8')
  File "/home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/locale.py", line 598, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
(esmvaltool) [valeriu@jasmin-sci2 ~]$ vim /home/users/valeriu/anaconda3Feb19/envs/esmvaltool/lib/python3.6/site-packages/ESMValTool-2.0a1-py3.6.egg/esmvaltool/_main.py

Ok, let's not add that to _main.py then, because apparently we cannot rely on that locale being present. I think you'll just need to set your locale to something that uses 'utf-8'. See e.g. here for some advice on how to do that: https://www.tecmint.com/set-system-locales-in-linux/
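For illustration only, a guarded probe shows why a hard-coded setlocale call cannot be relied upon: each locale name may or may not be installed on a given machine. The helper name and the candidate list below are assumptions made for this sketch, not something ESMValTool does.

```python
import locale


def try_utf8_locale(candidates=("C.UTF-8", "en_GB.UTF-8", "en_US.UTF-8")):
    """Return the first candidate locale that could be set, or None."""
    for name in candidates:
        try:
            locale.setlocale(locale.LC_ALL, name)
        except locale.Error:
            # This is the 'unsupported locale setting' case from the traceback.
            continue
        return name
    return None


chosen = try_utf8_locale()
print("usable UTF-8 locale:", chosen)
```

If this prints None, no candidate is available to the process, and fixing the environment (as suggested above) is the remaining option.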

good pointer dude! So I found out that if I get on to Jasmin from oak.reading.ac.uk my locale is totally messed up:

(esmvaltool) [valeriu@jasmin-sci2 ~]$ locale
LANG=C
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C

whereas if I am on Jasmin straight from my computer:

(esmvaltool) [valeriu@jasmin-sci2 esmvaltool_var_test]$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=

I will investigate with the Jasmin folk as to why the hell this is happening

of relevance: https://stackoverflow.com/questions/29609371/how-do-not-pass-locale-through-ssh

setting locale as in the link you sent will not work on a site where one doesn't have admin rights (like me on Jasmin); also that explanation in the stackoverflow issue does not apply to me since I am not sending any environment via ssh

Read on a bit, at the bottom it says how to change it if you're not admin.

yes I tried man :grin: It don't work:

[valeriu@jasmin-sci2 ~]$ LANG="en_GB.UTF-8"
[valeriu@jasmin-sci2 ~]$ export LANG
[valeriu@jasmin-sci2 ~]$ LC_CTYPE="en_GB.UTF-8"
[valeriu@jasmin-sci2 ~]$ export LC_CTYPE
[valeriu@jasmin-sci2 ~]$ locale
LANG=en_GB.UTF-8
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C
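The output above still shows LC_ALL=C, and under the POSIX lookup order LC_ALL takes precedence over the individual LC_* variables, which in turn take precedence over LANG. That is a likely reason why exporting only LANG and LC_CTYPE had no effect. A small sketch of that lookup order, where effective_ctype is a hypothetical helper written for this illustration:

```python
def effective_ctype(env):
    """Return the locale name that decides LC_CTYPE for an environment dict."""
    # POSIX order: LC_ALL first, then the category variable, then LANG.
    for variable in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(variable)
        if value:  # empty values count as unset
            return value
    return "C"


# The attempt shown above: LC_ALL=C is still set and wins.
failed_attempt = {"LANG": "en_GB.UTF-8", "LC_CTYPE": "en_GB.UTF-8", "LC_ALL": "C"}
# Overriding LC_ALL as well changes the outcome.
with_lc_all = {"LANG": "en_GB.UTF-8", "LC_ALL": "en_GB.UTF-8"}

print(effective_ctype(failed_attempt))
print(effective_ctype(with_lc_all))
```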

solution: add

export LANG=en_GB.UTF-8
export LC_ALL=en_GB.UTF-8

in .profile - note that this is a workaround rather than a solution since the ENV gets passed through via ssh even if I don't explicitly ask for it (that is very odd! my .ssh/config has absolutely no mention of the SendEnv variable).

Anyway, maybe we should put this information somewhere in the documentation so people don't get totally bamboozled

Never mind Bouwe, that IS the solution. I just talked to Andy Heaps, the admin of oak, and the node is actually SendEnv-ing the environment; I'll add this to the documentation and also add the pointer to use gcc7 as per #974

Also, we need to tell @mattiarighi and others to stop adding entries with special characters (like accents and umlauts etc)

Don't look at me, I've never done that :eyes:

the Boss always gets it :grin:
