Hi,
I am trying to build the CSV files for the exercises.
As far as I can tell, titles.csv was created correctly, but I have some issues with the cast file.
This is the error I get while building the CSV files.
Can you please help me solve this error?
Thanks
(base) PS D:\Python scripts\pycon-pandas-tutorial> python build/BUILD.py
Reading "genres.list.gz" to find interesting movies
Found 226013 titles
Writing "titles.csv"
Finished writing "titles.csv"
Reading release dates from "release-dates.list.gz"
Finished writing "release_dates.csv"
Reading 'actors.list.gz'
Traceback (most recent call last):
File "build/BUILD.py", line 229, in <module>
File "build/BUILD.py", line 176, in main
File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x99' in position 38: character maps to <undefined>
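The last line of the traceback is the main clue: the error comes from `cp1252.py`, which suggests the CSV file was opened without an explicit encoding, so Python fell back to the Windows code page (cp1252). That code page has no byte for U+0099. The sketch below uses a hypothetical `can_encode` helper and a made-up sample string (neither comes from BUILD.py) to show which encoding `open()` would use by default on a given machine, and why U+0099 fails under cp1252 but not under UTF-8:

```python
import locale

# The encoding open() falls back to when no encoding= is given. This is
# platform dependent; on a Western European Windows install it is typically
# 'cp1252'.
print("default text encoding:", locale.getpreferredencoding(False))


def can_encode(text, encoding):
    """Return True if every character of text has a byte sequence in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical sample: U+0099 is a C1 control character. cp1252 has no byte
# for it (its byte 0x99 means the trademark sign instead), while UTF-8 can
# encode every Unicode character.
sample = "abc\x99"
print(can_encode(sample, "cp1252"))  # False
print(can_encode(sample, "utf-8"))   # True
```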
Without access to your machine and your actors.list.gz file, I'm not sure I can reproduce the problem on my own laptop. The line in BUILD.py that the traceback complains about (why doesn't it quote the lines, I wonder?) is:
output.writerow((title, year, name, role_type, character, n))
Maybe you could edit BUILD.py and wrap that line in a try-except so we can see which row is causing the problem?
```python
try:
    output.writerow((title, year, name, role_type, character, n))
except Exception:
    # Show the row that failed, then re-raise the original error.
    print((title, year, name, role_type, character, n))
    raise
```
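When the offending value is not obvious from the printed tuple, a small helper can report exactly which field and which character cannot be encoded. `first_unencodable` below is hypothetical (it is not part of BUILD.py), and it assumes cp1252 as the target encoding to match the traceback:

```python
def first_unencodable(row, encoding="cp1252"):
    """Return (field_index, char_index, char) for the first character in row
    that `encoding` cannot represent, or None if every field encodes."""
    for field_index, field in enumerate(row):
        text = str(field)  # csv.writer writes non-string fields via str()
        try:
            text.encode(encoding)
        except UnicodeEncodeError as exc:
            # exc.start is the index of the first unencodable character.
            return field_index, exc.start, text[exc.start]
    return None


# Hypothetical row with an unencodable character in the fifth field.
row = ("Some Title", 2001, "Some Name", "actor", "Role\x99", 1)
print(first_unencodable(row))  # (4, 4, '\x99')
```

Calling it in the `except` branch, alongside the `print`, would point straight at the field and character that break the write.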
Hello,
I had the same issue. I edited BUILD.py with the try-except snippet, and the output was:
Reading "genres.list.gz" to find interesting movies
Found 226013 titles
Writing "titles.csv"
Finished writing "titles.csv"
Reading release dates from "release-dates.list.gz"
Finished writing "release_dates.csv"
Reading 'actors.list.gz'
('Yuxu', 2001, 'Gyunduz Abbasov', 'actor', 'HidayÉ\x99t', 9)
Traceback (most recent call last):
File "build/BUILD.py", line 233, in <module>
File "build/BUILD.py", line 177, in main
File "C:\Users\yatzima\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x99' in position 38: character maps to <undefined>
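The character field in the printed tuple looks like mojibake. One plausible explanation, which is an assumption and not something confirmed against actors.list.gz, is that the name was meant to be the Azerbaijani "Hidayət" and its UTF-8 bytes were decoded as Latin-1 somewhere upstream. The sketch below checks that this would produce exactly the printed value, and that the unencodable character then sits at position 38 of the CSV line, matching the traceback:

```python
import csv
import io

# Assumption: the intended name is "Hidayət", with U+0259 (schwa).
intended = "Hiday\u0259t"

# UTF-8 encodes U+0259 as the bytes 0xC9 0x99. Decoding those bytes as Latin-1
# yields two characters: 'É' (U+00C9) and the control character U+0099.
mangled = intended.encode("utf-8").decode("latin-1")
print(repr(mangled))  # 'HidayÉ\x99t'

# Rebuild the line csv.writer produces for the row printed above.
row = ("Yuxu", 2001, "Gyunduz Abbasov", "actor", mangled, 9)
buf = io.StringIO()
csv.writer(buf).writerow(row)
line = buf.getvalue()

# 'É' exists in cp1252 (byte 0xC9); U+0099 does not, so encoding stops there.
err_pos = None
try:
    line.encode("cp1252")
except UnicodeEncodeError as exc:
    err_pos = exc.start
print(err_pos)  # 38, the position reported in the traceback
```

If this explanation holds, writing the file as UTF-8 avoids the crash but stores the mangled text as-is; getting "Hidayət" back would mean changing how the source data is decoded.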
I solved the issue by opening the file with UTF-8 encoding instead of the default, which on Windows is the locale's code page (cp1252 here, as the traceback shows). I replaced line 106 in BUILD.py with the following:
output = csv.writer(open('../data/cast.csv', 'w', encoding="utf-8"))
Not sure if it solves the OP's problem, but I thought I should share.
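A slightly fuller version of the same fix, as a sketch: the `write_rows`/`read_rows` helpers, the sample row, and the temporary file path are hypothetical, not taken from BUILD.py. Besides `encoding="utf-8"`, it passes `newline=""`, which the Python `csv` documentation recommends for files handed to `csv.writer`; without it, Windows can show a blank line between rows. The `with` block also makes sure the file is flushed and closed.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write rows to a CSV file as UTF-8, independent of the OS locale."""
    # encoding="utf-8": every str can be encoded, unlike cp1252.
    # newline="": let the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)


def read_rows(path):
    """Read the CSV back, decoding it with the same explicit encoding."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.reader(f))


# Hypothetical row containing a non-ASCII character (U+0259).
rows = [("Yuxu", 2001, "Gyunduz Abbasov", "actor", "Hiday\u0259t", 9)]
path = os.path.join(tempfile.mkdtemp(), "cast.csv")
write_rows(path, rows)
print(read_rows(path))
```

On Python 3.7 and later, an option that needs no code change is UTF-8 Mode (`python -X utf8 build/BUILD.py`, or setting the environment variable `PYTHONUTF8=1`), under which `open()` defaults to UTF-8.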
I will be tempted, over the Christmas break when I'll have time to look back at this tutorial code, to see whether I can upgrade all the code to UTF-8 so that movies and actors with non-ASCII names can be safely loaded and displayed. Back in 2015, when I wrote the tutorial, it didn't seem like a realistic goal, but after half a decade maybe I should see whether the snags I ran into back then can be avoided today. I'll see what happens.
I ran into the same issue as the OP.
yatzima's solution worked for me.
Thanks!