jq: parsing newline-delimited JSON

Created on 3 Sep 2016 · 6 comments · Source: stedolan/jq

Hi! I'm trying to parse multiple-terabyte log files. Each log file is about 20 GB and contains newline-delimited JSON: one JSON object per line. The problem is that the data is unstructured; any given line may or may not have certain keys. I'm trying to build a SQL warehouse, or use Spark, or something similar (I'm open to any new tool if recommended) to parse these huge log files. Any thoughts? Should I use jq to parse them and dump the output into something like Redshift? Is that even possible with jq?
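For what it's worth, jq copes with this shape of data directly: it reads the stream one JSON value at a time, so lines with missing keys can be filtered with `select`/`has`. A minimal sketch (the key name `user` is made up for illustration):

```shell
# Extract a field that may be missing on some lines (hypothetical key "user").
# jq processes the stream value-by-value, so it never loads the whole file.
printf '%s\n' '{"user":"a","n":1}' '{"n":2}' '{"user":"b"}' |
  jq -c 'select(has("user")) | {user}'
# → {"user":"a"}
# → {"user":"b"}
```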

All 6 comments

Here's my current workaround (for bash):

while read -r line; do echo "$line" | jq '.filter.goes.here'; done < inputfile

@renderit - jq handles JSON streams, by design. Thus, even jq 1.4 should easily handle "JSON Lines", without your having to do anything special. (With jq 1.5, you have the additional option of using inputs.)
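A small sketch of the `inputs` approach mentioned above (field names are illustrative): with `-n` (null input), `inputs` yields each JSON value in the stream, so you can aggregate over an entire NDJSON file in one pass.

```shell
# Count the values in an NDJSON stream using jq 1.5's `inputs` (requires -n).
printf '%s\n' '{"a":1}' '{"a":2}' '{"a":3}' | jq -n '[inputs] | length'
# → 3
```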

I'm not quite sure what your concerns are -- as far as I can tell, you certainly don't want to be doing what @paulmelnikow suggested -- but I was thinking that perhaps they might be addressed in a short introduction to jq that I wrote: "A Stream-Oriented Introduction to jq". It's just a first draft, so I'd appreciate your feedback.

Indeed, this works just as well (I'm not dealing with a large file):

jq '.filter.goes.here' < inputfile

Thus, even jq 1.4 should easily handle "JSON Lines", without your having to do anything special.

Would be great to add this info to the docs!

Wouldn't it make more sense to represent them as top-level arrays?

That way you could extract subparts by range:
jq '.[0:3]' input.ndjson
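You can get array-style slicing on NDJSON today with `-s`/`--slurp`, which reads the whole stream into a single array first (a sketch; since it holds everything in memory, it's unsuited to 20 GB files):

```shell
# --slurp collects the stream into one array, making slice syntax work on NDJSON.
printf '%s\n' '{"i":0}' '{"i":1}' '{"i":2}' '{"i":3}' | jq -c -s '.[0:3]'
# → [{"i":0},{"i":1},{"i":2}]
```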

@eadmaster — Here's an alternative if you need to read a specific range of lines of a file.

sed -n '10,15p;16q' inputfile | jq '.filter.goes.here'

In the example above, sed prints only lines 10 through 15. The 16q instructs sed to quit once it reaches line 16, so the remainder of inputfile is never scanned.
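The range trick is easy to demonstrate on a synthetic 100-line input (jq omitted here so the sed behavior stands alone):

```shell
# sed prints lines 10-15 and quits at line 16 instead of reading the rest.
seq 1 100 | sed -n '10,15p;16q'
# → 10 11 12 13 14 15 (one number per line)
```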

Though, I do agree, it'd be nice to have a range option in jq or the option to access as a top-level array.

Has this been added to the docs? I didn't find it there, and I think it would be a great addition. I was trying the first option, the bash while-read loop, and I almost missed this one!

