Hi all 👋
I'm getting this error while running the script and updating the sources. I'm on Windows 10 with Python 3.8.3.
Traceback (most recent call last):
File "updateHostsFile.py", line 1750, in <module>
main()
File "updateHostsFile.py", line 282, in main
final_file = remove_dups_and_excl(merge_file, exclusion_regexes)
File "updateHostsFile.py", line 937, in remove_dups_and_excl
hostname, normalized_rule = normalize_rule(
File "updateHostsFile.py", line 1025, in normalize_rule
print("==>%s<==" % rule)
File "C:\Python38\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 3: character maps to <undefined>
Works fine here:
C:\Users\xmr\Desktop\hosts>ver
Microsoft Windows [Version 10.0.19041.329]
C:\Users\xmr\Desktop\hosts>python --version
Python 3.8.3
C:\Users\xmr\Desktop\hosts>python updateHostsFile.py
Do you want to update all data sources? [Y/n] n
OK, we'll stick with what we've got locally.
Do you want to exclude any domains?
For example, hulu.com video streaming must be able to access its tracking and ad servers in order to play video. [Y/n] n
OK, we'll only exclude domains in the whitelist.
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder
It contains 57,460 unique entries.
Do you want to replace your existing hosts file with the newly generated file? [Y/n] n
What's your system config and the exact command you are using to run the script? Also, I assume you are on the latest master?
C:\Users\xmr\Desktop>@systeminfo | @findstr /B /C:"OS Name" /B /C:"OS Version" /B /C:"System Locale" /B /C:"Input Locale"
OS Name: Microsoft Windows 10 Pro
OS Version: 10.0.19041 N/A Build 19041
System Locale: en-us;English (United States)
Input Locale: en-us;English (United States)
Closing.
@StevenBlack do note that this is probably valid and we should be using encoding="utf-8" in more places. It just happens with specific locales, probably.
@funilrys FYI
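For illustration, here is a minimal sketch of what passing encoding="utf-8" explicitly looks like when reading and writing files, so the result does not depend on the locale's default code page. The helper names (write_lines, read_lines) and the sample file name are hypothetical and not taken from updateHostsFile.py.

```python
import os
import tempfile


def write_lines(path, lines):
    # Explicit encoding: without it, open() uses the locale's preferred
    # encoding (for example cp1252 on many Windows setups).
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.writelines(line + "\n" for line in lines)


def read_lines(path):
    # Read back with the same explicit encoding.
    with open(path, "r", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


# Hypothetical sample rules, one of them containing a non-ASCII character.
sample = ["0.0.0.0 b\u00fccher.example", "0.0.0.0 example.com"]

with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "hosts-sample.txt")
    write_lines(target, sample)
    round_trip = read_lines(target)

print(round_trip == sample)
```

This only covers file I/O. The traceback above comes from print(), which uses the encoding of stdout instead.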
@XhmikosR I closed this because OP appears unresponsive...
@XhmikosR if I don't update the data sources it works fine, as you did:
Do you want to update all data sources? [Y/n] n
Does it still work for you when you update the data sources?
You still fail to give the requested info though. And yeah, it works fine here.
Assuming that we are talking about the cp1252 encoding as mentioned in:
File "C:\Python38\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
I can't (literally) reproduce.
$ # Change the Python encoding to CP1252 through the `PYTHONIOENCODING` environment variable.
$ export PYTHONIOENCODING="cp1252"
$ # Start the generation.
$ python updateHostsFile.py -a
[truncated]
==>fe00::0 ip6-localnet<==
==>ff00::0 ip6-mcastprefix<==
==>ff02::2 ip6-allrouters<==
==>ff02::3 ip6-allhosts<==
Success! The hosts file has been saved in folder
It contains 57,286 unique entries.
Therefore, I don't know where the problem is here. Unless OP can give us more information, I'm not going to look for a problem which may not exist.
Other info
Python version:
$ python -VV
Python 3.8.3 (default, May 17 2020, 18:15:42)
[GCC 10.1.0]
Why use the PYTHONIOENCODING environment variable?
As the problem comes from print(), I should be able to reproduce it by changing the default stdout encoding.
File "updateHostsFile.py", line 1025, in normalize_rule
print("==>%s<==" % rule)
Here is an example showing that the variable takes effect:
$ export PYTHONIOENCODING="utf-8"
$ python
Python 3.8.3 (default, May 17 2020, 18:15:42)
[GCC 10.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'utf-8'
>>> print(u'\xe9')
é
$ export PYTHONIOENCODING="cp1252"
$ python
Python 3.8.3 (default, May 17 2020, 18:15:42)
[GCC 10.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'cp1252'
>>> print('\xe9')
�
Now what about \ufeff?
I have never worked with it, but it is well explained here.
So I tried, with PYTHONIOENCODING (again).
With CP1252
>>> print('\ufeff')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.8/encodings/cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 0: character maps to <undefined>
With UTF-8
>>> print('\ufeff')
>>>
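The same check can be sketched in-process, without changing PYTHONIOENCODING, by printing to a text stream that uses a chosen encoding. The helper name can_print is hypothetical; it only mimics what print() does when stdout uses that encoding.

```python
import io


def can_print(text, encoding):
    # Wrap an in-memory byte buffer in a text stream with the given
    # encoding, similar to how sys.stdout wraps the console.
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError:
        return False
    return True


for encoding in ("cp1252", "utf-8"):
    # U+FEFF has no mapping in cp1252, U+00E9 does.
    print(encoding, can_print("\ufeff", encoding), can_print("\xe9", encoding))
```

With cp1252 the first check fails and the second succeeds, which matches the interactive sessions above.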
Now, talking about this project (itself), I really don't know where \ufeff comes from as the line:
print("==>%s<==" % rule)
is generated at the end... And I really can't find anything about this.
@StevenBlack @XhmikosR I leave the rest for you!
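One possible origin, offered as an assumption since nothing in this thread confirms it: the traceback reports position 3, and "==>" is three characters long, so \ufeff would be the first character of rule. That is what a line looks like when a file saved with a UTF-8 byte order mark is decoded as plain utf-8. A small sketch with made-up data:

```python
import codecs

# Hypothetical downloaded source that starts with a UTF-8 BOM.
raw = codecs.BOM_UTF8 + b"0.0.0.0 example.com\n"

# Plain utf-8 keeps the BOM as the character U+FEFF.
rule_plain = raw.decode("utf-8").strip()

# utf-8-sig drops a leading BOM if one is present.
rule_sig = raw.decode("utf-8-sig").strip()

message = "==>%s<==" % rule_plain

print(rule_plain.startswith("\ufeff"))
print(rule_sig.startswith("\ufeff"))
print(message.index("\ufeff"))
```

If this is what happens, decoding the sources with utf-8-sig (or stripping a leading "\ufeff" from each rule) would avoid the character reaching print() at all.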