Pydata-book: Ch2 p18 JSON error

Created on 24 Nov 2016 · 30 Comments · Source: wesm/pydata-book

Still having trouble using JSON in Python 3

I have added the 'rb' to open()
records = [json.loads(line) for line in open(path,'rb')]

Now getting error in json.loads

TypeError: the JSON object must be str, not 'bytes'

Any help appreciated
Thanks


TerrySnow1963


I fixed it by decoding each line explicitly:
records = [json.loads(line.decode("utf-8")) for line in open(path,'rb')]

TerrySnow1963 on 25 Nov 2016

It was adding the b (for binary) option to your open statement that made explicit decoding of line necessary. Without the b (or with an explicit t), your line variable will contain a str instead of a sequence of bytes, and json.loads will accept it as-is.
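A minimal sketch of both working variants, assuming the data file is UTF-8 encoded with one JSON object per line. The temporary file and its two records are made up for illustration. On Python 3.5 (the version in this thread) json.loads rejects bytes, which is the TypeError above; from Python 3.6 on it also accepts UTF-8 bytes, but decoding explicitly works on both.

```python
import json
import os
import tempfile

# Write two made-up JSON records, one per line, to a temporary UTF-8 file.
sample = '{"a": 1, "tz": "America/New_York"}\n{"a": 2, "tz": ""}\n'
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w', encoding='utf-8') as f:
    f.write(sample)

# Variant 1: binary mode yields bytes, so decode each line before parsing
# (required on Python 3.5, where json.loads only accepts str).
with open(path, 'rb') as f:
    records_binary = [json.loads(line.decode('utf-8')) for line in f]

# Variant 2: text mode yields str, so no explicit decoding is needed.
with open(path, encoding='utf-8') as f:
    records_text = [json.loads(line) for line in f]

print(records_text[0])
os.remove(path)
```

Text mode with an explicit encoding is the simpler of the two, since the file object does the decoding for you.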

pglezen on 4 Dec 2016

Hey guys, I seem to have a similar problem on page 18.

I tried to follow the instructions from the book:
import. json
path = 'ch/02/usagov_bitly_data2012-03-16-1331923249.txt'
records =[json.loads(line) for line in open(path)]

and I keep hitting the following error:
Traceback (most recent call last):
  File "", line 1, in <module>
    records =[json.loads(line) for line in open(path)]
FileNotFoundError: [Errno 2] No such file or directory: 'usagov_bitly_data2012-03-16-1331923249.txt'

Ohnonononono on 8 Dec 2016

It seems you have mistyped the path: there is no / between ch and 02. Also, there shouldn't be a . (dot) between import and json.

pglezen on 8 Dec 2016

@pglezen

Thanks. I tried it without the dot between import and json, and I'm still getting the error:

Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    records = [json.loads(line) for line in open(path)]
FileNotFoundError: [Errno 2] No such file or directory: 'usagov_bitly_data2012-03-16-1331923249.txt'

As you can see from my post, I'm new to Python.
Can anyone help me with this? It's frustrating, as this is the first exercise in the book and I can't seem to get it done.

Thanks guys

Ohnonononono on 9 Dec 2016

You also had the wrong path. Did you correct that? The error message you report is complaining about the file not being found. Perhaps you could provide the code along with your output.

pglezen on 9 Dec 2016

Yes, I did. I removed the 'ch/02' and it still didn't work:

>>> import json
>>> path ='Desktop/pydata-book-master/ch02/usagov_bitly_data2012-03-16-1331923249.txt'
>>> records=[json.loads(line) for line in open(path)]

And I tried this too:

>>> import json
>>> path ='usagov_bitly_data2012-03-16-1331923249.txt'
>>> records=[json.loads(line) for line in open(path)]

Ohnonononono on 10 Dec 2016

You just need to make sure you're in the right directory. This is not a json or pandas issue. Try changing to your Git repository directory first. In your case, change to your Desktop/pydata-book-master directory. Then start the Python interpreter. Set your path variable to ch02/usagov_bitly_data2012-03-16-1331923249.txt.
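If you are not sure which directory the interpreter is running in, a quick check like this sketch shows where Python will actually look for a relative path. It uses only os.getcwd, os.path.abspath and os.path.exists; the path is the one from this thread.

```python
import os

# Repo-relative path from the book example; it only resolves correctly when
# the current working directory is the root of the pydata-book checkout.
path = 'ch02/usagov_bitly_data2012-03-16-1331923249.txt'

# open() resolves a relative path against os.getcwd(), so print the
# absolute location Python will actually try to open.
print('Current directory:  ', os.getcwd())
print('Python will look in:', os.path.abspath(path))
print('File found:         ', os.path.exists(path))
```

If the directory is wrong, os.chdir() can switch to the repository directory from inside the interpreter, which has the same effect as starting Python there.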

Here is an example of a session. I import the os module and list the current directory to verify I'm in the right spot.

>>> import os
>>> os.listdir('.')
['.DS_Store', '.git', '.gitignore', 'appendix_python.ipynb', 'ch02', 'ch02.ipynb', 'ch03', 'ch04', 'ch04.ipynb', 'ch05', 'ch05.ipynb', 'ch06', 'ch06.ipynb', 'ch07', 'ch07.ipynb', 'ch08', 'ch08.ipynb', 'ch09', 'ch09.ipynb', 'ch10.ipynb', 'ch11', 'ch11.ipynb', 'ch12.ipynb', 'ch13', 'COPYING', 'fec_study.ipynb', 'README.md']

Then I verify the path using os.path.exists.

>>> path = 'ch02/usagov_bitly_data2012-03-16-1331923249.txt'
>>> os.path.exists(path)
True
>>> os.path.exists('ch02/doesNotExist.txt')
False
>>>

Once you verify you can find the file, the json.loads call should work for you.
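One way to fold those checks into the loading step is a small helper like the one below. load_records is a hypothetical name, not something from the book, and the sketch assumes UTF-8 JSON lines. It reports the absolute path it tried, which makes a wrong working directory obvious.

```python
import json
import os
import tempfile


def load_records(path, encoding='utf-8'):
    """Hypothetical helper: check the path first, then parse one JSON object per line."""
    if not os.path.exists(path):
        # Report the resolved path and working directory to make the cause obvious.
        raise FileNotFoundError(
            'No file at {!r} (resolved to {!r}; current directory is {!r})'.format(
                path, os.path.abspath(path), os.getcwd()))
    with open(path, encoding=encoding) as f:
        return [json.loads(line) for line in f]


# Usage with a made-up one-record file:
fd, tmp = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w', encoding='utf-8') as f:
    f.write('{"tz": "America/New_York"}\n')
records = load_records(tmp)
print(records[0])
os.remove(tmp)
```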

If you're new to Python, don't forget there is an Appendix in the back of the book, Python Language Essentials. Cheers!

pglezen on 10 Dec 2016

Thanks @pglezen. I managed to get it to work by changing the directory as you suggested, but I'm running into an issue with the last line.
Is it okay to use IDLE for this exercise?

>>> records=[json.loads(line) for line in open(path)]

This is the error message:

Traceback (most recent call last):
  File "<pyshell#26>", line 1, in <module>
    records=[json.loads(line) for line in open(path)]
  File "<pyshell#26>", line 1, in <listcomp>
    records=[json.loads(line) for line in open(path)]
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 6987: ordinal not in range(128)
>>>

Ohnonononono on 11 Dec 2016

I've never used IDLE. But I suggest that when learning a new programming environment (and I stress environment, not just the syntax of the language), you work with the command-line interpreter/compiler first. There are many benefits; the two biggest are:

You gain a better understanding of the nuts and bolts of the interpreter (through its options), which will help when you deploy to production.
It's easier for other people to help you when you provide the input to the interpreter rather than describing a bunch of settings in an IDE.

Back to the problem at hand: it seems you may have a corrupt file. Perhaps you should download it again. Your error message mentions byte position 6987. If you convert this to hex:

>>> "6987 in hex is {:x}".format(6987)
'6987 in hex is 1b4b'

When I check offset 1b4b in a hex dump of my copy of the file (using od),

od -A x -t xC  usagov_bitly_data2012-03-16-1331923249.txt | grep "001b40 "
0001b40    3b  71  3d  30  2e  34  22  2c  20  22  68  68  22  3a  20  22

there is no 0xe2 at position 0xb on this line; the byte there is 0x68.
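A more direct check, sketched below, is to scan the raw bytes for 0xe2 rather than relying on the reported position. When a file is read in text mode, Python decodes it in chunks, so the position in a UnicodeDecodeError is an index into the chunk being decoded and may not equal the absolute file offset. It is also worth knowing that 0xe2 is the lead byte of many three-byte UTF-8 sequences, for example the curly apostrophe U+2019 (e2 80 99), so finding it may point to UTF-8 text being read as ASCII rather than to corruption. find_byte_offsets is a hypothetical helper and the sample file is made up.

```python
import os
import tempfile


def find_byte_offsets(path, target=0xE2):
    """Hypothetical helper: return every absolute file offset holding the given byte."""
    with open(path, 'rb') as f:  # binary mode: no decoding, offsets are exact
        data = f.read()
    return [i for i, b in enumerate(data) if b == target]


# Made-up sample: a JSON line containing a curly apostrophe (U+2019).
fd, tmp = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write('{"t": "it\u2019s"}\n'.encode('utf-8'))

offsets = find_byte_offsets(tmp)
print(offsets, ['{:x}'.format(i) for i in offsets])
os.remove(tmp)
```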

As a final note, since this issue no longer appears to be pandas-related, I would suggest you post problems like this on StackOverflow. StackOverflow has many more "eyes", and you'll most likely get a response within a few minutes.

pglezen on 11 Dec 2016

Thanks @pglezen

I'm using Anaconda; I guess that comes with the command-line interpreters/compilers you mentioned.

One final note before I post it on StackOverflow: I re-downloaded the file, but I'm still running into issues with it:

UnicodeDecodeError                        Traceback (most recent call last)
in <module>()
----> 1 records=[json.loads(line) for line in open(path)]

in <listcomp>(.0)
----> 1 records=[json.loads(line) for line in open(path)]

/Users/gambit_remy08/anaconda/lib/python3.5/encodings/ascii.py in decode(self, input, final)
     24 class IncrementalDecoder(codecs.IncrementalDecoder):
     25     def decode(self, input, final=False):
---> 26         return codecs.ascii_decode(input, self.errors)[0]
     27 
     28 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 6987: ordinal not in range(128)

Ohnonononono on 12 Dec 2016

That's strange behavior. I don't understand why that happens.

The "Anaconda" distribution is good for working with Pandas and NumPy. That shouldn't be a problem for you.

After you post to StackOverflow, post a link to your StackOverflow question here so people following this thread can follow it there.

pglezen on 12 Dec 2016

Thanks for the advice. I'll do so

Ohnonononono on 12 Dec 2016

You need to open the file with the correct encoding, since it is not encoded in ASCII.

records = [json.loads(line) for line in open(path, encoding='utf8')]
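To see why the plain open(path) call failed, this sketch prints the encoding open() falls back to when none is given (it depends on the platform and locale; in the tracebacks above it was ASCII), then decodes some made-up UTF-8 bytes both ways.

```python
import json
import locale

# Encoding used by open() when no encoding argument is given. It is platform
# and locale dependent; in the tracebacks above it was ASCII.
print('default encoding:', locale.getpreferredencoding(False))

# Made-up UTF-8 bytes containing a curly apostrophe (U+2019 -> e2 80 99).
raw = '{"q": "it\u2019s"}\n'.encode('utf-8')

try:
    raw.decode('ascii')  # roughly what the ASCII default was doing
except UnicodeDecodeError as err:
    print('ascii:', err)

# Decoding as UTF-8, which is what open(path, encoding='utf8') does, succeeds.
record = json.loads(raw.decode('utf-8'))
print('utf-8:', record)
```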

burhan on 18 Dec 2016

@burhan

I tried your method, but nothing happened when I entered the code.
Then I switched back to the original code

records = [json.loads(line) for line in open(path)]

and the error popped right back up.

Ohnonononono on 18 Dec 2016

Nothing happened because it worked. When you typed your original code again, the error came back. So just use the version I wrote, and after that try something like this:

print(records[0])

burhan on 18 Dec 2016


@burhan

Thanks, you're a lifesaver :) !!!!!!!

btw, does the book's code often need changes to get it to work?

Ohnonononono on 18 Dec 2016

Most books have issues with their code examples once they are published, because it takes a long time to get a book published and the programming world moves a lot faster than the publishing world. That is precisely why code samples are now provided via GitHub, so the community can contribute fixes and updates for the benefit of all.

burhan on 18 Dec 2016

No problem... I guess I'm going to visit here more often :)

Ohnonononono on 18 Dec 2016

It's also difficult to take into account the myriad of desktop configurations. I can attest that the code sample works fine as currently published, with no alterations, on my system. But between different OSs (I'm running a Mac), different versions of Python (I'm running 3.5), and different command-line and IDE environments, it's difficult to anticipate the problems every configuration will run into.

As burhan pointed out, that's part of the reason it's being exposed publicly on GitHub. People such as yourself (Ohnono...) taking the time to patiently relate their issues and bearing with us (sometimes over multiple attempts) to determine the problem is what results in a better end product. It also betters the understanding for all of us involved.

pglezen on 18 Dec 2016

Hey guys... I managed to move past page 18 to page 27, the movie data, and I'm running into another set of issues with the regex.

I tried engine='python' and rewrote the whole code from 'import pandas as pd', but I still get the same error:

import pandas as pd

unames = ['user_id', 'gender', 'age', 'occupation', 'zip']

users = pd.read_table('ch02/movielens/users.dat', sep='::', header=None, names=unames)
/Users/gambit_remy08/anaconda/lib/python3.5/site-packages/ipykernel/__main__.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
  if __name__ == '__main__':

Ohnonononono on 22 Dec 2016

@Ohnonononono that actually doesn't appear to be an error, it's only a warning. You can try using users afterward to see.

wesm on 22 Dec 2016

@wesm What do you mean by users?

Sorry, I'm still new to Python

Ohnonononono on 23 Dec 2016


He means the users variable that you created in your script. You can run

users.head()

to verify that your users variable really does contain what is expected.
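A self-contained sketch of the same call with engine='python' passed explicitly, followed by users.head(). The two data rows are made up in the '::'-separated users.dat layout so the snippet runs without the book's files; with the real data you would pass 'ch02/movielens/users.dat' instead of the StringIO.

```python
import io

import pandas as pd

unames = ['user_id', 'gender', 'age', 'occupation', 'zip']

# Two made-up rows in the '::'-separated users.dat layout, so this runs
# without the book's data files.
sample = io.StringIO('1::F::1::10::48067\n2::M::56::16::70072\n')

# A multi-character separator is interpreted as a regex, which only the
# 'python' engine supports; saying so explicitly avoids the ParserWarning.
users = pd.read_table(sample, sep='::', header=None, names=unames,
                      engine='python')

# Inspect the first rows to confirm the columns were split as expected.
print(users.head())
```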

pglezen on 23 Dec 2016


Oh okay, tested and working fine :)

Ohnonononono on 23 Dec 2016

import json
path = '/home/eyeglasses/usagov_bitly_data2012-03-16-1331923249.txt'
records = [json.loads(line) for line in open(path)]

shows this error:

JSONDecodeError                           Traceback (most recent call last)
in <module>()
      1 import json
      2 path = '/home/eyeglasses/usagov_bitly_data2012-03-16-1331923249.txt'
----> 3 records = [json.loads(line) for line in open(path)]

in <listcomp>(.0)
      1 import json
      2 path = '/home/eyeglasses/usagov_bitly_data2012-03-16-1331923249.txt'
----> 3 records = [json.loads(line) for line in open(path)]

/home/eyeglasses/anaconda3/lib/python3.5/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    317             parse_int is None and parse_float is None and
    318             parse_constant is None and object_pairs_hook is None and not kw):
--> 319         return _default_decoder.decode(s)
    320     if cls is None:
    321         cls = JSONDecoder

/home/eyeglasses/anaconda3/lib/python3.5/json/decoder.py in decode(self, s, _w)
    337 
    338         """
--> 339         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    340         end = _w(s, end).end()
    341         if end != len(s):

/home/eyeglasses/anaconda3/lib/python3.5/json/decoder.py in raw_decode(self, s, idx)
    355             obj, end = self.scan_once(s, idx)
    356         except StopIteration as err:
--> 357             raise JSONDecodeError("Expecting value", s, err.value) from None
    358         return obj, end

JSONDecodeError: Expecting value: line 2 column 1 (char 1)

eyeglasses on 31 Mar 2017

Perhaps you need to specify the encoding. Check the post above by Burhan on December 17 where he adds an explicit encoding option to the open statement. If you're using Windows, it won't necessarily default to utf8.
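Another possibility worth ruling out: json.loads raises exactly "Expecting value: line 2 column 1 (char 1)" when handed a string that is empty apart from a newline, so the file may contain a blank line (or may not be JSON lines at all). A sketch under that assumption; load_json_lines is a hypothetical helper that skips blank lines and names the first line that fails to parse.

```python
import json

# A string that is empty apart from its newline reproduces the message above.
try:
    json.loads('\n')
except json.JSONDecodeError as err:
    print(err)  # Expecting value: line 2 column 1 (char 1)


def load_json_lines(lines):
    """Hypothetical helper: parse one JSON object per line, skipping blank
    lines and naming the first line that fails to parse."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip empty or whitespace-only lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            raise ValueError('line {}: {} (text: {!r})'.format(
                lineno, err, line[:60])) from err
    return records


# With a real file this would be:
#   with open(path, encoding='utf-8') as f:
#       records = load_json_lines(f)
print(load_json_lines(['{"a": 1}\n', '\n', '{"a": 2}\n']))
```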

pglezen on 31 Mar 2017

This example is now accurate as of Python 3.6 in the 2nd edition.

wesm on 9 Sep 2017

import json
path = 'D:\dataFiles\yob1880.txt'
records = [json.loads(line) for line in open(path, encoding='utf8')]

Error:

JSONDecodeError                           Traceback (most recent call last)
in <module>
      1 import json
      2 path = 'D:\dataFiles\yob1880.txt'
----> 3 records = [json.loads(line) for line in open(path, encoding='utf8')]

in <listcomp>(.0)
      1 import json
      2 path = 'D:\dataFiles\yob1880.txt'
----> 3 records = [json.loads(line) for line in open(path, encoding='utf8')]

~\Anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    346             parse_int is None and parse_float is None and
    347             parse_constant is None and object_pairs_hook is None and not kw):
--> 348         return _default_decoder.decode(s)
    349     if cls is None:
    350         cls = JSONDecoder

~\Anaconda3\lib\json\decoder.py in decode(self, s, _w)
    335 
    336         """
--> 337         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338         end = _w(s, end).end()
    339         if end != len(s):

~\Anaconda3\lib\json\decoder.py in raw_decode(self, s, idx)
    353             obj, end = self.scan_once(s, idx)
    354         except StopIteration as err:
--> 355             raise JSONDecodeError("Expecting value", s, err.value) from None
    356         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
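Here the error is at line 1 column 1 (char 0), meaning json.loads rejected the very first character. yob1880.txt belongs to the book's baby-names example, which is comma-separated (name, sex, births) rather than one JSON object per line, and in the book it is loaded with pandas.read_csv, not json.loads. The sketch below uses the standard csv module on sample lines in that format to show the difference. Separately, a raw string such as r'D:\dataFiles\yob1880.txt' is a safer way to write a Windows path, since backslash sequences like \n or \t would otherwise be treated as escapes.

```python
import csv
import io
import json

# Sample lines in the name,sex,births format of the baby-names files.
sample = 'Mary,F,7065\nAnna,F,2604\n'

# json.loads rejects the very first character, as in the traceback above.
try:
    json.loads(sample.splitlines()[0])
except json.JSONDecodeError as err:
    print('json:', err)  # Expecting value: line 1 column 1 (char 0)

# Parsed as CSV instead. With pandas, the book's equivalent is
# pd.read_csv(path, names=['name', 'sex', 'births']).
rows = [{'name': name, 'sex': sex, 'births': int(births)}
        for name, sex, births in csv.reader(io.StringIO(sample))]
print(rows[0])
```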