Pycon-pandas-tutorial: creating Cast csv file

Created on 28 Sep 2020 · 4 Comments · Source: brandon-rhodes/pycon-pandas-tutorial

Hi,
I am trying to create the CSV files for the exercises.
As far as I understand, the titles file was created correctly, but I have an issue with the cast file.
This is the error I get while building the CSV files.
Can you please help me solve it?
Thanks

(base) PS D:Python scripts\pycon-pandas-tutorial> python build/BUILD.py
Reading "genres.list.gz" to find interesting movies
Found 226013 titles
Writing "titles.csv"
Finished writing "titles.csv"
Reading release dates from "release-dates.list.gz"
Finished writing "release_dates.csv"
Reading 'actors.list.gz'
Traceback (most recent call last):
  File "build/BUILD.py", line 229, in <module>
  File "build/BUILD.py", line 176, in main
  File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x99' in position 38: character maps to <undefined>

All 4 comments

Without having access to your machine and actors.list.gz file, I'm not sure if I can reproduce the problem here on my own laptop. The line in BUILD.py that the traceback complains about (why doesn't it quote the lines, I wonder?) is:

output.writerow((title, year, name, role_type, character, n))

Maybe you could edit BUILD.py and wrap that line in a try-except so we could see what row is causing the problem?

                try:
                    output.writerow((title, year, name, role_type, character, n))
                except:
                    print((title, year, name, role_type, character, n))
                    raise

Hello,

I had the same issue. I edited the BUILD.py with the try-except snippet and the output was:

Reading "genres.list.gz" to find interesting movies
Found 226013 titles
Writing "titles.csv"
Finished writing "titles.csv"
Reading release dates from "release-dates.list.gz"
Finished writing "release_dates.csv"
Reading 'actors.list.gz'
('Yuxu', 2001, 'Gyunduz Abbasov', 'actor', 'HidayÉ\x99t', 9)
Traceback (most recent call last):
  File "build/BUILD.py", line 233, in <module>
  File "build/BUILD.py", line 177, in main
  File "C:\Users\yatzima\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x99' in position 38: character maps to <undefined>

I solved the issue by using UTF-8 encoding instead of the default, which on Windows appears to be cp1252 (judging by the cp1252.py frame in the traceback). So I replaced line 106 in BUILD.py with the following:
output = csv.writer(open('../data/cast.csv', 'w', encoding="utf-8"))
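
For anyone who wants to try the idea outside BUILD.py, here is a minimal standalone sketch of the same change. The temporary output path and the single sample row are assumptions for illustration only. It also passes `newline=""`, which the `csv` module documentation recommends for files handed to `csv.writer`, and uses a `with` block so the file is closed when writing is done.

```python
import csv
import os
import tempfile

# Hypothetical output location for this sketch; BUILD.py writes elsewhere.
path = os.path.join(tempfile.mkdtemp(), "cast.csv")

# The row printed by the try-except above, written with explicit escapes.
row = ("Yuxu", 2001, "Gyunduz Abbasov", "actor", "Hiday\u00c9\u0099t", 9)

# An explicit encoding keeps the result independent of the platform default.
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerow(row)

# Read the file back with the same encoding to confirm the row round-trips.
with open(path, encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))

print(rows)
```

Anything that reads the file later has to use the same encoding, otherwise the non-ASCII names will come back garbled.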

Not sure if it solves OP's problem, but I thought I should share.

I will be tempted, over Christmas break when I'll have time to look back at this tutorial code, to see if I can get all the code upgraded to UTF-8 so that movies and actors with non-ASCII names can be safely loaded and displayed. Back in 2015 when I wrote the tutorial it didn't seem like a realistic goal, but maybe after a half decade I should see whether the snags I ran into back then could today be avoided. I'll see what happens.

I ran into the same issue as the OP.

yatzima's solution worked for me
Thanks!
