Hi,
when I'm using
embeddings_index = {}
glove_data = 'glove.6B.50d.txt'
f = open(glove_data)
for line in f:
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
f.close()
print('Loaded %s word vectors.' % len(embeddings_index))
I get the following error on the line "for line in f:":
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-72-ad0473c921c9> in <module>()
      2 glove_data = 'glove.6B.50d.txt'
      3 f = open(glove_data)
----> 4 for line in f:
      5     values = line.split()
      6     word = values[0]

C:\Users\Leonard\Anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2273: character maps to <undefined>
Oh, it works now after I use:
f = open(glove_data, encoding="utf8")
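For anyone wondering why this helps, here is a minimal sketch of what is probably going on. The traceback shows Python decoding the file with cp1252 (the Windows default here), and cp1252 has no character for byte 0x9d. In UTF-8, that byte appears for example as the last byte of the right double quote (U+201D). The sample line and its two numbers are made up for illustration; they are not taken from the GloVe file.

```python
# Hypothetical sample line: a token containing U+201D (right double quote)
# followed by two made-up vector components.
raw = "\u201d 0.1 0.2\n".encode("utf8")  # the bytes as a UTF-8 file stores them

# U+201D is three bytes in UTF-8, and the last one is 0x9d.
assert raw[:3] == b"\xe2\x80\x9d"

# Decoding as cp1252 fails because 0x9d is unassigned in that code page.
try:
    raw.decode("cp1252")
    cp1252_error = None
except UnicodeDecodeError as exc:
    cp1252_error = str(exc)

# Decoding as UTF-8 succeeds and the line splits as expected.
decoded = raw.decode("utf8")

print(cp1252_error)
print(decoded.split())
```

Passing encoding="utf8" to open() makes Python use the second decoding path instead of the platform default.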
Hi,
I am experiencing the same problem. When using 'utf8' decoding I get this error:
'charmap' codec can't decode byte 0x9d in position 3692: character maps to <undefined>
It seems "utf8" decoding does not resolve the problem. I am using Windows 10; do you know if there is a specific encoder/decoder for Windows 10? Thanks.
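A possible explanation, offered as a guess: the 'charmap' wording comes from Python's table-based codecs such as cp1252, whereas a failure in the UTF-8 decoder is reported as "'utf-8' codec can't decode ...". So the open() call that raises may still be running without encoding= (for example, an older notebook cell being re-run). The sketch below prints the codec actually in use. The name load_glove, the temporary sample file, and its contents are hypothetical, and plain Python lists stand in for np.asarray to keep the example dependency-free.

```python
import os
import tempfile


def load_glove(path, encoding="utf8"):
    """Read a GloVe-style text file into {word: [floats]} (hypothetical helper)."""
    embeddings_index = {}
    with open(path, encoding=encoding) as f:
        # f.encoding reports the codec this file object decodes with;
        # if it shows cp1252, the encoding argument did not reach open().
        print("decoding with:", f.encoding)
        for line in f:
            values = line.split()
            if not values:
                continue  # skip blank lines
            embeddings_index[values[0]] = [float(v) for v in values[1:]]
    return embeddings_index


# Tiny stand-in for glove.6B.50d.txt, written as UTF-8 with made-up numbers.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample_vectors.txt")
    with open(sample_path, "w", encoding="utf8") as out:
        out.write("the 0.1 0.2\n\u201d 0.3 0.4\n")
    vectors = load_glove(sample_path)

print("Loaded %s word vectors." % len(vectors))
```

If the printed codec is not UTF-8 on your machine, the fix has not been applied to the call that is actually executing.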
'utf8' as suggested by leonardltk worked for me. thanks!
Hi,
Have you resolved the issue? I'm facing the same problem.
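Try this:
import numpy as np

word_embeddings = {}
# Pass the encoding explicitly; the Windows default (cp1252) cannot decode this file.
with open('../input/glove6b50dtxt/glove.6B.50d.txt', encoding='utf8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        word_embeddings[word] = coefs
# No f.close() needed: the with block closes the file.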
help plz
It worked with encoding="utf8"!!