Jq: Option to silently ignore any lines with invalid JSON

Created on 29 Jan 2015 · 10 Comments · Source: stedolan/jq

I primarily use jq to extract and filter messages from our application logs, which are JSON. However, the system occasionally interleaves two messages, which turns the line into useless junk. When jq encounters a line like this it complains about it, and rightfully so, but I would find it immensely useful if I could tell it to silently ignore that line and carry on.

support

All 10 comments

@kennethjor Please see:

a) http://tools.ietf.org/html/draft-ietf-json-text-sequence-13 (soon to be an RFC), and

b) the --seq option to jq in the master branch in github.

That's likely the closest that jq will come to ignoring failures. This is designed with loggers in mind.

If you follow the discussions from the JSON WG you'll understand the problem. Say that you have an error in a JSON text, so you... discard it, but now you need to know where the next one starts, and that turns out to be difficult to figure out.

@nicowilliams Thanks for that, that's excellent. Yes, I did wonder exactly how a processor like jq would be able to decide where the invalid JSON ended. This seems to fit the bill of what I need exactly!

@nicowilliams I think there's a bug in the sequence parsing. I would submit a pull request for a test at least, but my C is not the strongest. As is my understanding of JSON sequence, I'm surrounding the JSON text with 0x1e and 0x0a characters. jq parses it, so technically it works, but it complains about it at every line. Terminal output below:

kenn@klaatu:/tmp$ xxd test.json 
0000000: 1e7b 2261 223a 317d 0a                   .{"a":1}.
kenn@klaatu:/tmp$ cat test.json | jq --seq .
ignoring parse error: Truncated value at line 1, column 1
{
  "a": 1
}
kenn@klaatu:/tmp$ jq --version
jq-1.5rc1-15-g2e92c3e
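
For readers unfamiliar with the framing discussed above, here is a minimal Python sketch of why the 0x1e byte makes recovery possible: after a damaged text, the parser skips ahead to the next record separator and starts again. This is an illustration only, not jq's implementation. The function name `parse_json_seq` and the sample stream are made up for the example, and the sketch does not implement every rule of the draft (for instance, it does not try to detect truncated values that still happen to parse).

```python
import json

RS = b"\x1e"  # record separator byte that precedes each text in the sequence


def parse_json_seq(data: bytes):
    """Yield every value that parses; skip damaged records (hypothetical helper)."""
    for chunk in data.split(RS):
        text = chunk.strip()  # drop the trailing 0x0a and other whitespace
        if not text:
            continue  # empty segment, e.g. before the first RS
        try:
            yield json.loads(text)
        except ValueError:
            # Damaged record: the next RS already told us where to resume.
            continue


# Made-up stream: the middle record is truncated, the others are intact.
stream = b'\x1e{"a":1}\n\x1e{"b":[1,\n\x1e{"c":3}\n'
print(list(parse_json_seq(stream)))  # prints [{'a': 1}, {'c': 3}]
```

Without the separator, the same truncated record would leave the parser guessing where the next text begins, which is the difficulty described earlier in the thread.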

Hey, thanks for the report, and for trying it out! I'll take a look as soon as possible.

@nicowilliams No worries, jq is my new favourite tool and will forever be a standard part of my arsenal.

@kennethjor So is --seq sufficient? Should we close this issue?

@nicowilliams I haven't had a chance to check out your fix, but as for this issue, --seq is exactly what I needed. Thank you.

I wish this option existed and I don't understand why it can't work... because when parsing JSON, usually (or at least often) it is one record per line in a JSON Lines file. So it could just fail and emit nothing in a certain mode, like --validate-filter. Then I could grep/filter empty lines.

Maybe that behavior isn't what everyone needs, but many of us do! So that as an option would be great. It would make jq much more powerful for validating JSON Lines data. --seq doesn't work, as there is no sequence character in JSON Lines data.
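
Until something like this exists in jq, the behaviour can be approximated with a small pre-filter. The following Python sketch assumes strictly one JSON text per line (the JSON Lines convention mentioned above) and silently drops any line that fails to parse. The function name `valid_records` and the sample log lines are invented for illustration.

```python
import json


def valid_records(lines):
    """Yield the parsed value of each line that is valid JSON (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            yield json.loads(line)
        except ValueError:
            # Invalid line, e.g. two interleaved messages: skip it silently.
            continue


# Made-up log: the second line is two messages interleaved into junk.
log = [
    '{"level":"info","msg":"started"}',
    '{"level":"info","msg":"par{"level":"warn","msg":"interleaved"}',
    '{"level":"error","msg":"failed"}',
]
for record in valid_records(log):
    print(json.dumps(record))  # only the first and third records are printed
```

Passing `sys.stdin` instead of the list would let the script sit in front of jq in a pipeline. This only works because the newline is treated as a reliable record boundary; it does nothing for JSON texts that span several lines.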

@rjurney Some kinds of invalidity in JSON texts can easily be handled (e.g., extra commas, trailing commas, missing commas, noise between texts...), but others can't be handled easily or at all. E.g., {"foo":["bar"}true -- how to parse this?

What I'd like to do at some point is add lots of input/output formats, including _some_ not-quite-JSON formats. But this is a volunteer project, so it's a matter of who has the time to contribute this.

For those who arrive here before the FAQ, see:

1) "Q: Is there a way to have jq keep going after it hits an error in the input file?"

2) the section Processing not-quite-valid JSON
