jq: Comments in JSON

Created on 11 Jun 2014 · 12 Comments · Source: stedolan/jq

Is there a way to have jq ignore comments that have been added to the json file:
{
/* People insist on putting comments in data */
"foo": "test",
"bar": "baz"
}

feature request wontfix

Most helpful comment

Douglas Crockford explains why he removed comments from JSON in https://plus.google.com/+DouglasCrockfordEsq/posts/RK8qyGVaGSr (It's not that they were not put in. They were there. They were removed. They should not be put back.) But, in the last sentence he also gives you the solution.

$ brew install jsmin

$ cat > example.json <<'EOF'
{
/* People insist on putting comments in data */
"foo": "test",
"bar": "baz"
}
EOF

$ cat example.json | jq '.["bar"]'
parse error: Invalid numeric literal at line 2, column 3

$ cat example.json | jsmin | jq '.["bar"]'
"baz"

These posts are becoming a common theme for me. https://github.com/mitchellh/packer/issues/1768#issuecomment-275168113
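
For readers without jsmin at hand, the same idea (remove the comments first, then hand the result to a strict parser) can be sketched in Python. This is not jsmin's algorithm, which also minifies whitespace and handles `//` comments; the function name `strip_block_comments` is made up for illustration, and only `/* ... */` comments outside string literals are handled.

```python
import json


def strip_block_comments(text: str) -> str:
    """Remove /* ... */ comments that appear outside JSON string literals."""
    out = []
    i = 0
    n = len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character too, so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated comment")
            i = end + 2  # skip the whole comment
        else:
            out.append(ch)
            i += 1
    return "".join(out)


example = """{
/* People insist on putting comments in data */
"foo": "test",
"bar": "baz"
}"""

print(json.loads(strip_block_comments(example))["bar"])  # prints: baz
```

Tracking whether the scanner is inside a string matters: a value such as "/* not a comment */" must pass through untouched, which a naive regex replacement would not guarantee.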

All 12 comments

Is there a way to have jq ignore comments that have been added to the json file:
{
/* People insist on putting comments in data */
"foo": "test",
"bar": "baz"
}

No. JSON is specified by RFC7159, and it doesn't have a format for
comments. Even if jq allowed it (it could), it couldn't preserve them
on output, it couldn't insert them into objects, and so on.
Outputting them would break interop with other JSON parsers, and
that's the real deal breaker, otherwise I'd suggest something like:

{
  "foo": "test" /* comment */,
  ..
}

as that way we could at least logically preserve comments. But as it
is, "no". Sorry :(

I have needed this in the past, so what I tend to do is this:

{
  "comment_0": "People insist on putting comments in data",
  "foo": "test",
  "comment_1": "another comment",
  ...
}

or

{
  "foo__comment": "People insist on putting comments in data",
  "foo": "test",
  "bar__comment": "another comment",
  "bar": ...
}

No comments in arrays, nor at the top-level in any case. C'est la vie.
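
A consumer that wants the data without the annotations can filter those keys out again. The following Python sketch assumes the two naming patterns shown above (`comment_N` and `<key>__comment`); the helper name `drop_comment_keys` is hypothetical, and the `"bar": "baz"` value is borrowed from the original question rather than from the elided examples.

```python
import json
import re

# Matches keys such as "comment_0" or "foo__comment" (the conventions above).
COMMENT_KEY = re.compile(r"^comment_\d+$|__comment$")


def drop_comment_keys(value):
    """Recursively remove comment-convention keys from parsed JSON data."""
    if isinstance(value, dict):
        return {
            key: drop_comment_keys(item)
            for key, item in value.items()
            if not COMMENT_KEY.search(key)
        }
    if isinstance(value, list):
        return [drop_comment_keys(item) for item in value]
    return value


doc = json.loads("""{
  "foo__comment": "People insist on putting comments in data",
  "foo": "test",
  "bar__comment": "another comment",
  "bar": "baz"
}""")

print(json.dumps(drop_comment_keys(doc)))  # prints: {"foo": "test", "bar": "baz"}
```

Because the comments are ordinary string members, the file remains valid JSON for every parser, which is the point of the convention.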

More generally, it would be great to have a "--javascript" switch that lets jq accept JavaScript-style JSON. This would, for example, allow {a: "1"} to be "translated" to {"a": "1"}.

@pkoppstein My intention is to allow multiple parser types. Ditto
encoders. You could write such a parser. But the built-in/default JSON
parser and encoder will deal strictly with RFC7159 JSON, full stop.

@nicowilliams wrote:

No. .... jq ... couldn't preserve them on output

Please note that the original request was explicitly that they be ignored.

...the built-in/default JSON parser and encoder will deal strictly with RFC7159 JSON, full stop.

Understood, but don't forget that currently jq is not as strict as the "full stop" would suggest. See e.g. #348

@pkoppstein Yeah, I know, but I think that's just unfortunate. I suppose I could add "ignore comments" to the current parser, but I don't really feel up to it. I'm very busy with other things and any cycles I can spare to jq I'd rather use for more deserving things. Even if you send me a PR for that, I think I might not take it, mostly because the parser is a bit messy and I don't want to make it messier. (I tried extending it to support streaming, but that made it much too messy, and that's when I decided on having multiple parsers.)

OTOH, a PR for multiple parser types... that'd be nice. I'm thinking of how to integrate that properly with the I/O builtins I'm still baking.

Also, quite frankly, there's a difference between accepting unescaped control characters and ignoring comments that jq couldn't produce. The latter makes one wonder what the point of comments is. It's much better to use a commenting convention like I described than comments that no parser will/should preserve! Mind you, there are problems with accepting unescaped control characters. I may very well remove that, possibly for just some control characters, possibly for all the ASCII ones -- do not rely on jq's willingness to accept unescaped control characters!

Regarding the handling of comments in JSON, see https://github.com/stedolan/jq/wiki/FAQ#processing-not-quite-valid-json

@pkoppstein Thanks! Do you know if that will also work for mongodb generated json ... this typically looks something like this:

{
        "_id" : ObjectId("1234567890"),
        "foo" : ObjectId("123456789"),
        "bar" : ObjectId("8897865866758"),
        "timestamp" : ISODate("2015-04-05T16:00:00.174Z")
}

@vito-c - That of course is, as they put it, "mongodb-extended-json", not JSON. (http://docs.mongodb.org/master/reference/mongodb-extended-json)

MongoDB does provide a utility for producing JSON from a MongoDB instance (but apparently not from a text file):

mongoexport is a utility that produces a JSON or CSV export of data stored in a MongoDB instance

Details about mongoexport are at: http://docs.mongodb.org/master/reference/program/mongoexport/#bin.mongoexport

Here is the URL of a script that purports to convert bson to json:
https://gist.github.com/tedsparc/1763326
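
When only a text dump like the one above is available, one rough workaround is to rewrite the shell-style wrappers into plain strings before parsing. The Python sketch below is an assumption-laden illustration, not what mongoexport or the linked script does: the helper name `unwrap_mongo_shell_types` is hypothetical, it handles only `ObjectId("...")` and `ISODate("...")`, it discards the type information, and it would also rewrite matching text that happens to sit inside a string value.

```python
import json
import re

# Matches ObjectId("...") or ISODate("...") and captures the quoted argument.
WRAPPER = re.compile(r'\b(?:ObjectId|ISODate)\(\s*("(?:[^"\\]|\\.)*")\s*\)')


def unwrap_mongo_shell_types(text: str) -> str:
    """Replace ObjectId("x") / ISODate("x") with the bare JSON string "x"."""
    return WRAPPER.sub(r"\1", text)


shell_doc = """{
        "_id" : ObjectId("1234567890"),
        "foo" : ObjectId("123456789"),
        "bar" : ObjectId("8897865866758"),
        "timestamp" : ISODate("2015-04-05T16:00:00.174Z")
}"""

parsed = json.loads(unwrap_mongo_shell_types(shell_doc))
print(parsed["timestamp"])  # prints: 2015-04-05T16:00:00.174Z
```

The cleaned text is ordinary JSON and could then be piped to jq like any other input.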

To quote the great @pkoppstein,

Nothing I've read about jq suggests that it should reject invalid JSON. In fact, it would be great if it had the ability (perhaps governed by a switch) to transform imperfect JSON into JSON.

Please also note that the most recent "Proposed Standard" for JSON (http://tools.ietf.org/html/rfc7159) explicitly says:

A JSON parser MAY accept non-JSON forms or extensions.

For example, jq currently accepts strings containing U+0083, which is not valid JSON according to RFC7159. In fact, this is almost exactly the same as the comments issue; both are valid JavaScript but not valid JavaScript Object Notation.

It would be reasonable to have a "strict" (or "non-strict") flag. I don't expect jq to output non-strict JSON. But it would be good for it to accept non-strict JSON, as it already does today.
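
As a point of comparison from outside jq, Python's standard `json` module already exposes a switch of this kind: `json.loads` takes a `strict` parameter that, when set to false, accepts unescaped control characters inside strings, while `json.dumps` still emits them escaped. The sketch below only illustrates the "lenient in, strict out" idea; it says nothing about how jq would or should implement it.

```python
import json

# The "\t" below is a real tab character inside the JSON string literal,
# which RFC 7159 does not allow unescaped.
raw = '{"note": "line one\tline two"}'

try:
    json.loads(raw)  # strict=True is the default
    accepted_strictly = True
except json.JSONDecodeError:
    accepted_strictly = False

lenient = json.loads(raw, strict=False)  # tolerate the raw control character

print(accepted_strictly)    # prints: False
print(json.dumps(lenient))  # the tab is written back out escaped as \t
```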

It would be reasonable to have a "strict" (or "non-strict") flag. I don't expect jq to output non-strict JSON. But it would be good for it to accept non-strict JSON, as it already does today.

👍
