jq: sort JSON Lines by property value?

Created on 13 Jul 2017 · 3 comments · Source: stedolan/jq

Given a file (or stream) of JSON objects, I want to sort them by the value of one of their properties.

Example input (a file named test.jsonl):

{"bar":10, "foo":4}
{"foo":3, "bar":100}
{"foo":2, "bar":1}

I'd like to sort them by the values of their foo properties, giving:

{"foo":2, "bar":1}
{"foo":3, "bar":100}
{"bar":10, "foo":4}

The problem is that sort_by() only works on arrays, and input formatted as JSON Lines (JSONL) isn't treated as an array: jq applies the filter to each object separately. If the input were one array, a jq command like sort_by(.foo) would do the job. With this input, that command gives only errors:

jq: error (at test.jsonl:1): Cannot index number with string "foo"
jq: error (at test.jsonl:2): Cannot index number with string "foo"
jq: error (at test.jsonl:3): Cannot index number with string "foo"

I've gotten very close with this command:

jq --raw-input --slurp '[split("\n")[:-1]|.[]|fromjson]|sort_by(.foo)' test.jsonl

(I learned that [:-1] keeps the empty string that follows the final newline from being added to the array. There must be a better way, though: [:-1] would also throw away a good line if the last one isn't terminated with a newline.)

Which gives:

[
  {
    "foo": 2,
    "bar": 1
  },
  {
    "foo": 3,
    "bar": 100
  },
  {
    "bar": 10,
    "foo": 4
  }
]

The only problem is that the result is one large array rather than one JSON object per line. I need some way to output the elements of the sorted array one per line.
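Here is a minimal sketch of the same --raw-input approach with both problems addressed, assuming jq 1.5 or later. select(length > 0) drops empty strings wherever they occur, so an unterminated last line is kept. The trailing [] together with --compact-output prints each sorted object on its own line. The sample file is recreated without a final newline to exercise that case.

```shell
# Recreate the sample input; the last line deliberately has no trailing newline.
printf '%s\n%s\n%s' '{"bar":10, "foo":4}' '{"foo":3, "bar":100}' '{"foo":2, "bar":1}' > test.jsonl

# split("\n")        -> one string per line (plus "" if the file ends with a newline)
# select(length > 0) -> drop empty strings instead of blindly removing the last element
# fromjson           -> parse each line into an object
# sort_by(.foo)[]    -> sort by .foo, then emit the elements one at a time
jq --raw-input --slurp --compact-output \
  'split("\n") | map(select(length > 0) | fromjson) | sort_by(.foo)[]' test.jsonl
```

This should print {"foo":2,"bar":1}, {"foo":3,"bar":100} and {"bar":10,"foo":4}, one per line.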

In addition to getting the output in the right format, is there a better (that is, more efficient or simpler) way to do this? I'd like to be able to skip the options, especially --slurp, because I suspect they will slow down the processing.

💡 It would be helpful if jq had an option to treat a file containing one JSON object per line (JSONL) as though it were an array of objects instead.

Comments

> It would be helpful if jq had an option to treat a file containing one JSON object per line (JSONL) as though it were an array of objects instead.

Your wish has been granted! That's exactly what the -s option does.

So one solution to your problem is:

jq -s -c 'sort_by(.foo)[]' json-lines.json
{"foo":2,"bar":1}
{"foo":3,"bar":100}
{"bar":10,"foo":4}

The -c option and the trailing [] combine to give you JSON Lines :-)
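To see what each piece contributes, the sketch below runs the four combinations on the question's sample data (recreated as test.jsonl). Without -c, jq pretty-prints each value over several lines; without the trailing [], the sorted array comes out as a single value.

```shell
printf '%s\n' '{"bar":10, "foo":4}' '{"foo":3, "bar":100}' '{"foo":2, "bar":1}' > test.jsonl

# -s alone: one pretty-printed array
jq -s 'sort_by(.foo)' test.jsonl

# -s -c without []: the whole sorted array on a single line
jq -s -c 'sort_by(.foo)' test.jsonl

# -s with [] but no -c: one value per object, each spread over several lines
jq -s 'sort_by(.foo)[]' test.jsonl

# -s -c with []: one compact object per line, i.e. JSON Lines
jq -s -c 'sort_by(.foo)[]' test.jsonl
```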

p.s. For future reference, please ask usage questions at stackoverflow.com with the "jq" tag.

p.p.s. There is now an entry about JSONL in the FAQ at https://github.com/stedolan/jq/wiki/FAQ#general-questions
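On the efficiency question: a sort cannot emit anything until it has seen every record, so the whole input has to be held in memory whichever option is used. A common alternative to -s, assuming jq 1.5 or later (where the inputs builtin is available), is to start with -n (null input) and collect the records with [inputs]. For this task it should behave the same as -s.

```shell
printf '%s\n' '{"bar":10, "foo":4}' '{"foo":3, "bar":100}' '{"foo":2, "bar":1}' > test.jsonl

# -n: don't read an input value automatically; [inputs] then gathers every
# JSON value from test.jsonl into one array, which sort_by(.foo)[] sorts and unpacks.
jq -n -c '[inputs] | sort_by(.foo)[]' test.jsonl
```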

Thanks, @pkoppstein! That works great. You're such a fast coder. 😉

Kidding aside, I realize now how I went down the wrong path:

  1. I thought jq was designed for processing "valid" JSON input. I expected to get errors if I used it to parse "invalid" JSONL input. I guess it's actually equally well suited to both formats.
  2. I misunderstood the documentation for the --raw-input/-R option. I thought I'd need this option so my "invalid" JSONL input wouldn't be treated as JSON, which may cause errors. (I had seen errors in my earlier attempts with jq and this option seemed to fix them.)
  3. I also misunderstood the documentation for the --slurp/-s option, which the -R documentation suggested would be useful. The -s description and the idea of getting all the input in a single "slurp" made me think the option was for processing efficiency rather than for changing how the input is parsed. I thought the "large array" mentioned was something internal to jq processing, not the actual representation of the data.
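A quick way to see the differences described in the list above is to run the three combinations on a two-line file (demo.jsonl is just an illustrative name). -s parses each line as JSON and wraps the results in one array, -R hands each line to the filter as a string, and -R -s produces a single string containing the entire file.

```shell
printf '%s\n' '{"foo":2}' '{"foo":1}' > demo.jsonl

jq -c -s . demo.jsonl       # [{"foo":2},{"foo":1}]
jq -c -R . demo.jsonl       # "{\"foo\":2}" then "{\"foo\":1}" (one string per line)
jq -c -R -s . demo.jsonl    # "{\"foo\":2}\n{\"foo\":1}\n" (the whole file as one string)
```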

I suppose I should open an issue to request documentation updates. I'd include suggestions for the improved wording, of course. Is there a pull request policy for this repo?

Thanks for mentioning JSONL in the FAQ, too. I'd like to see more about JSONL in the documentation. The scarcity of JSONL mentions (are there any?) reinforced my impression that it isn't well supported.

I'll post to Stack Overflow using the jq tag for future questions. I thought an issue in this repo would be the right place for my question because I saw many other support questions here. The "Issues" link in the header menu at the top of the manual page seems to suggest it, too. Maybe the manual should include a "Support" link to Stack Overflow.

@sloanlance - Thanks for the explanation of the "comedy of errors" leading to a misunderstanding of the use of the various options.

Yes, the documentation could do more to highlight jq's support for streams of JSON entities, especially as that is one of its main features. (I first came across jq because I was looking for a command-line tool that handled such streams.) The best way to have improvements to the documentation incorporated into "master" is to submit a "Pull Request", so please have at it :-)
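As an illustration of that stream support: jq reads any sequence of whitespace-separated JSON values, not only one value per line, which is why JSON Lines needs no special option. In this small sketch, two of the three objects share a line.

```shell
# Three objects: two on the first line separated by a space, one on the second line.
printf '{"foo":3} {"foo":1}\n{"foo":2}\n' | jq -c -s 'sort_by(.foo)[]'
```

This should print {"foo":1}, {"foo":2} and {"foo":3}, one per line.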
