Hosts: UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 3: character maps to <undefined>

Created on 12 Jun 2020 · 8 comments · Source: StevenBlack/hosts

Hi all 👋

I'm getting this error while running the script and updating the sources. I'm on Windows 10 with Python 3.8.3.

Traceback (most recent call last):
  File "updateHostsFile.py", line 1750, in <module>
    main()
  File "updateHostsFile.py", line 282, in main
    final_file = remove_dups_and_excl(merge_file, exclusion_regexes)
  File "updateHostsFile.py", line 937, in remove_dups_and_excl
    hostname, normalized_rule = normalize_rule(
  File "updateHostsFile.py", line 1025, in normalize_rule
    print("==>%s<==" % rule)
  File "C:\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 3: character maps to <undefined>
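
A note on the position in the message: the failing call prints "==>%s<==" % rule, and the "==>" prefix is three characters long, so position 3 is the first character of rule itself. That suggests the rule text starts with U+FEFF, the character a UTF-8 byte order mark (BOM) decodes to. The sketch below reproduces the same exception without a Windows console by encoding the formatted string to cp1252 directly; the helper first_unencodable and the sample rule are hypothetical.

```python
def first_unencodable(text, encoding="cp1252"):
    """Return the index of the first character `encoding` cannot represent, or None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character the codec rejected.
        return exc.start
    return None

rule = "\ufeff0.0.0.0 example.com"  # hypothetical rule text with a leading BOM
line = "==>%s<==" % rule            # same format string as normalize_rule()
pos = first_unencodable(line)
print(pos, repr(line[pos]))         # prints: 3 '\ufeff'
```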

EgidioCaprino

All 8 comments

Works fine here:

C:\Users\xmr\Desktop\hosts>ver

Microsoft Windows [Version 10.0.19041.329]

C:\Users\xmr\Desktop\hosts>python --version
Python 3.8.3

C:\Users\xmr\Desktop\hosts>python updateHostsFile.py
Do you want to update all data sources? [Y/n] n
OK, we'll stick with what we've got locally.
Do you want to exclude any domains?
For example, hulu.com video streaming must be able to access its tracking and ad servers in order to play video. [Y/n] n
OK, we'll only exclude domains in the whitelist.
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder
It contains 57,460 unique entries.
Do you want to replace your existing hosts file with the newly generated file? [Y/n] n

XhmikosR on 13 Jun 2020

What's your system config and the exact command you are using to run the script? Also, I assume you are on the latest master?

C:\Users\xmr\Desktop>@systeminfo | @findstr /B /C:"OS Name" /B /C:"OS Version" /B /C:"System Locale" /B /C:"Input Locale"
OS Name:                   Microsoft Windows 10 Pro
OS Version:                10.0.19041 N/A Build 19041
System Locale:             en-us;English (United States)
Input Locale:              en-us;English (United States)

XhmikosR on 13 Jun 2020

Closing.

StevenBlack on 15 Jun 2020

@StevenBlack note that this is probably a valid issue, and we should be using encoding="utf-8" in more places. It probably only happens with specific locales.

@funilrys FYI
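
One caveat with encoding="utf-8": Python's "utf-8" codec keeps a leading BOM as U+FEFF in the decoded text, while the "utf-8-sig" codec strips it when present and otherwise decodes the same way. A minimal sketch, assuming a source file that starts with a BOM (the file contents are made up, and this is not the project's actual file-handling code):

```python
import os
import tempfile

# A small hosts-style file that starts with a UTF-8 BOM (bytes EF BB BF).
data = "\ufeff0.0.0.0 ads.example\n0.0.0.0 tracker.example\n".encode("utf-8")
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(data)

with open(path, encoding="utf-8") as f:
    plain = f.read().splitlines()  # BOM survives as U+FEFF
with open(path, encoding="utf-8-sig") as f:
    sig = f.read().splitlines()    # BOM is stripped
os.remove(path)

print(repr(plain[0]))  # '\ufeff0.0.0.0 ads.example'
print(repr(sig[0]))    # '0.0.0.0 ads.example'
```

Only the first line of the file carries the BOM, which would be consistent with a single rule triggering the error.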

XhmikosR on 15 Jun 2020

@XhmikosR I closed this because OP appears unresponsive...

StevenBlack on 15 Jun 2020

@XhmikosR if I don't update the data sources, it works fine, as it did for you:

Do you want to update all data sources? [Y/n] n

Does it still work for you when you update the data sources?
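
If the error only shows up when the data sources are updated, one possibility (an assumption, not something confirmed in this thread) is that a freshly downloaded source starts with a BOM. A hypothetical diagnostic sketch that lists files beginning with the UTF-8 BOM bytes EF BB BF, demonstrated on a throwaway directory:

```python
import codecs
import os
import tempfile

def files_with_utf8_bom(root):
    """Return the sorted paths under `root` whose first bytes are a UTF-8 BOM."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                if f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8:
                    hits.append(path)
    return sorted(hits)

# Self-contained demo: one file with a BOM, one without.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "with_bom.txt"), "wb") as f:
    f.write(codecs.BOM_UTF8 + b"0.0.0.0 ads.example\n")
with open(os.path.join(demo_dir, "plain.txt"), "wb") as f:
    f.write(b"0.0.0.0 tracker.example\n")

found = [os.path.basename(p) for p in files_with_utf8_bom(demo_dir)]
print(found)  # ['with_bom.txt']
```

To check real downloads, point files_with_utf8_bom() at the directory where the sources are stored, for example "data" (assumed here to be where the script keeps its sources).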

EgidioCaprino on 16 Jun 2020

You still fail to give the requested info though. And yeah, it works fine here.

XhmikosR on 16 Jun 2020

Assuming that we are talking about the cp1252 encoding as mentioned in:

File "C:\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]

I literally can't reproduce it.

$ # Change the Python encoding to CP1252 through the `PYTHONIOENCODING` environment variable.
$ export PYTHONIOENCODING="cp1252"
$ # Start the generation.
$ python updateHostsFile.py -a
[truncated]
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder 
It contains 57,286 unique entries.

So I don't know where the problem is. Unless OP can give us more information, I'm not going to look for a problem that may not exist.

Other info

Python version

$ python -VV
Python 3.8.3 (default, May 17 2020, 18:15:42) 
[GCC 10.1.0]

Why use the `PYTHONIOENCODING` environment variable?

Since the error is raised by print(), I should be able to reproduce it by changing the default stdout encoding.

File "updateHostsFile.py", line 1025, in normalize_rule
    print("==>%s<==" % rule)

Here is an example showing that this works.

$ export PYTHONIOENCODING="utf-8"
$ python
Python 3.8.3 (default, May 17 2020, 18:15:42) 
[GCC 10.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'utf-8'
>>> print(u'\xe9')
é

$ export PYTHONIOENCODING="cp1252"
$ python
Python 3.8.3 (default, May 17 2020, 18:15:42) 
[GCC 10.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'cp1252'
>>> print('\xe9')
�
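
The � in the cp1252 session is the expected result rather than a failure: with PYTHONIOENCODING=cp1252, print() encodes é as the single byte 0xE9, and a UTF-8 terminal cannot decode that lone byte, so it displays U+FFFD. The sketch below simulates both sessions in memory by wrapping a BytesIO in io.TextIOWrapper with each encoding instead of changing the real sys.stdout; printed_bytes is a hypothetical helper.

```python
import io

def printed_bytes(text, encoding):
    """Return the bytes print() writes for `text` through a stream using `encoding`.

    Hypothetical helper: an in-memory stand-in for sys.stdout."""
    raw = io.BytesIO()
    # newline="\n" disables newline translation so the result is the same on every OS.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    print(text, file=stream)
    stream.flush()
    return raw.getvalue()

utf8_out = printed_bytes("\xe9", "utf-8")     # b'\xc3\xa9\n'
cp1252_out = printed_bytes("\xe9", "cp1252")  # b'\xe9\n'

# A UTF-8 terminal cannot decode the lone 0xE9 byte and substitutes U+FFFD.
print(cp1252_out.decode("utf-8", errors="replace") == "\ufffd\n")  # True
```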

Now what about `\ufeff`?

I had never played with it, but it is well explained here.

So I tried again, with PYTHONIOENCODING.

With CP1252

>>> print('\ufeff')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/encodings/cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 0: character maps to <undefined>

With UTF-8

>>> print('\ufeff')

>>>
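
The same experiment can be scripted so it runs identically on Windows and Linux. This sketch starts a child interpreter with PYTHONIOENCODING set in its environment and runs print('\ufeff'); the helper name print_bom_with is hypothetical. Under cp1252 the child is expected to exit non-zero with the UnicodeEncodeError shown above, while under UTF-8 it should exit with 0.

```python
import os
import subprocess
import sys

def print_bom_with(encoding):
    """Run print('\\ufeff') in a child interpreter whose stdio encoding is set
    through PYTHONIOENCODING; return (exit code, last line of stderr)."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run(
        [sys.executable, "-c", "print('\\ufeff')"],
        env=env,
        stdout=subprocess.PIPE,  # a pipe, so the encoding comes from PYTHONIOENCODING
        stderr=subprocess.PIPE,  # capture the traceback, if any
    )
    lines = result.stderr.decode("utf-8", errors="replace").strip().splitlines()
    return result.returncode, (lines[-1] if lines else "")

for enc in ("cp1252", "utf-8"):
    code, last_line = print_bom_with(enc)
    print(enc, code, last_line)
```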

As for this project itself, I really don't know where \ufeff comes from, since the line:

    print("==>%s<==" % rule)

only runs at the end... And I really can't find anything about this.

@StevenBlack @XhmikosR I leave the rest for you!
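
Wherever the character comes from, the debug print in normalize_rule() could be made robust in either of two ways, sketched below under the assumption that rules are plain str values: strip a leading U+FEFF before using the rule (strip_bom is a hypothetical helper), or give the output stream the "backslashreplace" error handler so unencodable characters are escaped instead of raising. For the real console, sys.stdout.reconfigure(errors="backslashreplace") (Python 3.7+) does the latter in place; the sketch uses an in-memory cp1252 stream instead.

```python
import io

def strip_bom(rule):
    """Hypothetical helper: drop a leading U+FEFF left over from a byte order mark."""
    return rule[1:] if rule.startswith("\ufeff") else rule

rule = "\ufeff0.0.0.0 example.com"  # hypothetical rule text with a leading BOM

# Option 1: clean the rule before printing or parsing it.
print("==>%s<==" % strip_bom(rule))  # ==>0.0.0.0 example.com<==

# Option 2: escape what the stream cannot encode instead of raising.
# On the real console, sys.stdout.reconfigure(errors="backslashreplace")
# has this effect; an in-memory cp1252 stream stands in here.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp1252", errors="backslashreplace", newline="\n")
print("==>%s<==" % rule, file=stream)
stream.flush()
escaped = raw.getvalue()
print(escaped)  # b'==>\\ufeff0.0.0.0 example.com<==\n'
```

Stripping the BOM fixes the data itself, while the error handler only keeps the diagnostic output from crashing.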

funilrys on 6 Jul 2020
