Jq: Could anyone explain the streaming documentation?

Created on 5 Oct 2019  ·  1 Comment  ·  Source: stedolan/jq

https://stedolan.github.io/jq/manual/#Streaming

What does this documentation mean?

Streaming
With the --stream option jq can parse input texts in a streaming fashion, allowing jq programs to start processing large JSON texts immediately rather than after the parse completes. If you have a single JSON text that is 1GB in size, streaming it will allow you to process it much more quickly.

However, streaming isn't easy to deal with as the jq program will have [<path>, <leaf-value>] (and a few other forms) as inputs.

Several builtins are provided to make handling streams easier.

The examples below use the streamed form of [0,[1]], which is [[0],0],[[1,0],1],[[1,0]],[[1]].

Streaming forms include [<path>, <leaf-value>] (to indicate any scalar value, empty array, or empty object), and [<path>] (to indicate the end of an array or object). Future versions of jq run with --stream and --seq may output additional forms such as ["error message"] when an input text fails to parse.

What is this [path, leaf-value], and what is going on with [0,[1]]?

The example from the FAQ is this:

jq -n '[{foo:"bar"},{foo:"baz"}]' | jq -cn --stream 'fromstream(1|truncate_stream(inputs))'

I don't really understand this. What does 1|truncate_stream(inputs) mean? I'm really confused.

Most helpful comment

I agree, this is hard to understand. I was just looking to process a big JSON file in PowerShell and got interested. Here's the streaming version of that JSON. These examples work in PowerShell or bash. --stream turns the JSON into a list of events: one [path, value] pair for each leaf value, plus an end marker containing only a path each time an array or object closes. This post might help: http://subtxt.in/library-data/2016/03/28/json_stream_jq

echo '[{"foo":"bar"},{"foo":"baz"}]' | jq --stream -c

[[0,"foo"],"bar"]
[[0,"foo"]]
[[1,"foo"],"baz"]
[[1,"foo"]]
[[1]]
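To make the event format concrete, here is a rough Python model of what --stream produces. It is only a sketch for illustration, not jq's implementation; the function name to_stream is made up for this example. It reproduces both the output above and the [0,[1]] example from the manual.

```python
import json


def to_stream(value, path=()):
    """Yield events modeled on jq's --stream output for an already-parsed JSON value.

    Hypothetical helper for illustration; jq itself emits these events while parsing.
    """
    if isinstance(value, dict) and value:
        items = list(value.items())
    elif isinstance(value, list) and value:
        items = list(enumerate(value))
    else:
        # Scalars, empty arrays, and empty objects are leaves: [path, leaf-value]
        yield [list(path), value]
        return
    for key, child in items:
        yield from to_stream(child, path + (key,))
    # End marker: a one-element array holding the path of the last child
    yield [list(path) + [items[-1][0]]]


if __name__ == "__main__":
    for event in to_stream([{"foo": "bar"}, {"foo": "baz"}]):
        print(json.dumps(event, separators=(",", ":")))
```

Each path is the list of array indexes and object keys you would follow to reach the value, so [[0,"foo"],"bar"] reads as "element 0, key foo, holds bar".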

Here it is after truncate_stream deletes the "parent folder" from each path. With a depth of 1, the leading "0," and "1," are dropped, and any event whose path has no more than one element is discarded, so the end-of-array marker "[[1]]" disappears along with the outer array. You could put 0 instead of 1 before "|truncate_stream" to see the stream unchanged.

echo '[{"foo":"bar"},{"foo":"baz"}]' | jq -cn --stream '1|truncate_stream(inputs)'

[["foo"],"bar"]
[["foo"]]
[["foo"],"baz"]
[["foo"]]
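The truncation step can be modeled in a few lines of Python. This is a sketch of the behavior as I understand it, not jq's own code, and the Python function name is just borrowed from the jq builtin for readability.

```python
import json


def truncate_stream(depth, events):
    """Strip the first `depth` elements from every path.

    Events whose path is not longer than `depth` are dropped entirely,
    which is why the outer array's end marker disappears at depth 1.
    """
    for event in events:
        path = event[0]
        if len(path) > depth:
            # Keep the leaf value (if any) and shorten the path
            yield [path[depth:]] + event[1:]


# The stream events for [{"foo":"bar"},{"foo":"baz"}]
events = [
    [[0, "foo"], "bar"],
    [[0, "foo"]],
    [[1, "foo"], "baz"],
    [[1, "foo"]],
    [[1]],
]

if __name__ == "__main__":
    for event in truncate_stream(1, events):
        print(json.dumps(event, separators=(",", ":")))
```

With a depth of 0 every path is longer than the depth, so all five events pass through untouched.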

And then fromstream() turns it back into JSON. There are no longer square brackets for an outer array; it's just two separate objects.

echo '[{"foo":"bar"},{"foo":"baz"}]' | jq -cn --stream 'fromstream(1|truncate_stream(inputs))'

{"foo":"bar"}
{"foo":"baz"}
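For completeness, the rebuilding step can also be sketched in Python. Again this is only a model under my own assumptions, not jq's implementation; set_path and from_stream are names invented for this example. The idea is that each [path, value] event writes one leaf, and a value is emitted whenever a top-level end marker (a path of length 1) or a top-level scalar arrives.

```python
def set_path(root, path, value):
    """Write `value` at `path`, creating lists for int keys and dicts for str keys."""
    if not path:
        return value
    key, rest = path[0], path[1:]
    if root is None:
        root = [] if isinstance(key, int) else {}
    if isinstance(key, int):
        while len(root) <= key:
            root.append(None)  # pad the list up to the requested index
        root[key] = set_path(root[key], rest, value)
    else:
        root[key] = set_path(root.get(key), rest, value)
    return root


def from_stream(events):
    """Rebuild complete values from stream events, yielding each one when it closes."""
    value = None
    for event in events:
        path = event[0]
        if len(event) == 2:
            value = set_path(value, path, event[1])
            done = len(path) == 0  # a bare top-level scalar is complete at once
        else:
            done = len(path) == 1  # end marker of a top-level array or object
        if done:
            yield value
            value = None


# The truncated events shown earlier in this comment
truncated = [
    [["foo"], "bar"],
    [["foo"]],
    [["foo"], "baz"],
    [["foo"]],
]

if __name__ == "__main__":
    for obj in from_stream(truncated):
        print(obj)
```

Because the truncated stream contains two top-level end markers, two objects come out one at a time instead of one big array, which is what makes this pattern useful for very large files.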
