Jq: Need to be able to handle JSON data inside JSON

Created on 6 Mar 2019  ·  20 Comments  ·  Source: stedolan/jq

Hi,

I am trying to parse JSON that contains JSON data inside it. The embedded JSON has escaped quotes around its fields and values, and this is causing problems with jq.

Test script:

#!/bin/bash

result='[{"Status":"A", "JSON":"{\"field1\":\"value1\"}"},{"Status":"B", "JSON":"{\"field1\":\"value2\"}"}]'
echo $result
result=$(echo $result | jq '[.[]."Status"]')
echo $result

Expected output:
[{"Status":"A", "JSON":"{\"field1\":\"value1\"}"},{"Status":"B", "JSON":"{\"field1\":\"value2\"}"}]
["A","B"]

Actual output:
[{"Status":"A", "JSON":"{\"field1\":\"value1\"}"},{"Status":"B", "JSON":"{\"field1\":\"value2\"}"}]
]B",

support

All 20 comments

  1. You evidently have quoting issues, in your script and/or in the write-up here on GitHub. In short, though, the first line that sets result should be:

    result='[{"Status":"A", "JSON":"{\"field1\":\"value1\"}"},{"Status":"B", "JSON":"{\"field1\":\"value2\"}"}]'

    Notice there are 8 backslashes.

  2. Please don't forget to "close" this issue.

  3. For future reference, unless you are sure there is a bug in jq itself, please ask usage questions involving jq by posting at stackoverflow.com with the jq tag: https://stackoverflow.com/questions/tagged/jq

Not sure what happened there, but it seems that when I submitted my issue the backslashes (\) were removed?
What you corrected is actually what I have in my script. If you still feel this is not a bug, I will close it and ask the question on stackoverflow.
Here is my script again, properly formatted:

result='[{"Status":"A", "JSON":"{\"field1\":\"value1\"}"},{"Status":"B", "JSON":"{\"field1\":\"value2\"}"}]'
echo $result
result=$(echo $result | jq '[.[]."Status"]')
echo $result

@tarekz - Maybe you will have more success if you quote $result when echoing it.

Still the same. But honestly I am not sure whether it is a bug on the jq side, since the same example works fine on https://jqplay.org/s/aAfdPTxJpi. It also works fine if I dump the curl result into a file before feeding it back to jq.

@tarekz see the fromjson builtin in the manual.

$ jq -cn '"[0,1]"|fromjson'
[0,1]
$ 

There's also tojson.

$ jq -cn '{foo:{bar:"baz"}} | tojson | debug | fromjson'
["DEBUG:","{\"foo\":{\"bar\":\"baz\"}}"]
{"foo":{"bar":"baz"}}
$ 

@tarekz mind you, your JSON text is not valid:

parse error: Invalid literal at line 1, column 33

[image]
This is the actual JSON I am using to test my script. In any case, I will close the issue while I check your previous answer. Thanks for the help!!

You're welcome!

We have a nice community, including an IRC channel (#jq) on freenode, and a stackoverflow tag (jq).

@tarekz oh, somehow the backslashes got lost when I tried your input. Try this:

$ echo '[{"Status":"A", "JSON":"{\"field1\":\"value1\"}"},{"Status":"B", "JSON":"{\"field1\":\"value2\"}"}]'|jq '(..?|.JSON?) |= fromjson'
[
  {
    "Status": "A",
    "JSON": {
      "field1": "value1"
    }
  },
  {
    "Status": "B",
    "JSON": {
      "field1": "value2"
    }
  }
]
$ 
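
When only one value from the embedded document is needed, a narrower filter than the update-assignment shown above may be enough. The sketch below assumes jq is installed and on the PATH; `input` and `embedded_field1` are illustrative names, not part of jq.

```shell
# Sample input copied from this thread: the JSON field holds a JSON-encoded string.
input='[{"Status":"A", "JSON":"{\"field1\":\"value1\"}"},{"Status":"B", "JSON":"{\"field1\":\"value2\"}"}]'

# embedded_field1 (hypothetical helper): decode each .JSON string with fromjson
# and collect its field1 value into one compact array.
embedded_field1() {
  jq -c '[.[] | .JSON | fromjson | .field1]'
}

# Run the filter only when jq is actually available.
command -v jq >/dev/null 2>&1 && printf '%s\n' "$input" | embedded_field1
```

With the sample input this should print `["value1","value2"]`. Using printf with `%s` rather than echo avoids any shell-specific handling of the backslashes in the input.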

@nicowilliams actually it seems the issue is due to me testing my script in Git Bash on Windows. I tried my script on a Unix machine and it passed... But I would love an explanation of what might be causing this.

@tarekz can you post the whole script? Use backticks to quote it, like this:

    ```
    script contents here
    ```

@nicowilliams - Over at stackoverflow, @tarekz revealed:

I am testing on git bash for windows. Bash version 4.4.12

There is no problem with his script on a Mac using either bash 3 or 4.

#!/bin/bash
echo "parsing result in variable"
result='[{"Status":"A", "JSON":"{\"field1\":\"value1\"}"},{"Status":"B", "JSON":"{\"field1\":\"value2\"}"}]'
echo $result > json_in_json.json

result=$(echo "$result" | jq '[.[]."Status"]')
echo $result

echo "parsing result from file"
 jq '[.[]."Status"]' json_in_json.json

Also, I have asked the question on stackoverflow, if you would like to answer it there.
https://stackoverflow.com/questions/55029614/jq-unable-to-parse-json-inside-json

@pkoppstein you mentioned the wrong tarek :)

@tarekz The problem is in your understanding of what your shell is doing. In particular,

result=$(echo "$result" | jq '[.[]."Status"]')
echo $result

reads the output of echo "$result" | jq '[.[]."Status"]', strips any trailing newlines, and stores the rest in the result variable. The unquoted $result is then split into words according to $IFS, and echo is invoked with the resulting words as its arguments.

The output looks strange on Windows because it still contains carriage returns, while the line feeds that should have followed them were consumed by word splitting.

What your echo $result is printing out is basically "[\r \"A\",\r \"B\"\r ]\r\n". Each carriage return moves the cursor back to the start of the line, so every later word overwrites the earlier ones, and the result appears as

 ]B",
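
The effect described above can be reproduced without jq or Windows. The following is a minimal sketch in which printf stands in for a jq build that ends its lines with CR LF; the variable names are illustrative only.

```shell
# Stand-in for the output of a jq that ends lines with CR LF (simulated with printf).
crlf_output=$(printf '[\r\n  "A",\r\n  "B"\r\n]\r\n')

# Unquoted expansion: the shell splits the value on spaces, tabs and newlines,
# and echo rejoins the words with single spaces. The LFs vanish, the CRs stay.
flattened=$(echo $crlf_output)
printf '%s\n' "$flattened" | od -c

# Quoted expansion: the value reaches echo as one argument, line breaks intact.
preserved=$(echo "$crlf_output")
printf '%s\n' "$preserved" | od -c
```

In the od dump of the unquoted form, the `\r` bytes remain between the words but no `\n` does, which is why a terminal draws the words on top of each other. The quoted form keeps every `\r \n` pair.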

Thank you! @muhmuhten
Do you mind answering me on StackOverflow too?
https://stackoverflow.com/questions/55029614/jq-unable-to-parse-json-with-embedded-json

@muhmuhten I do wonder why @tarekz's bash didn't also chomp the CR. Was it perhaps an Ubuntu bash run under Windows Subsystem for Linux? Was it using the Windows echo somehow?

Pasting this here if it can help:
$ bash --version
GNU bash, version 4.4.12(1)-release (x86_64-pc-msys)
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

@tarekz The problem is an interaction between that bash (which pretends Windows isn't Windows and doesn't use \r\n) and the jq executable you're using (from us) which is built as a native Windows application (so it uses \r\n). Witness:

    $ r=$(printf 'foo\r\n')
    $ printf 'foo\r\n'|od -c
    0000000   f   o   o  \r  \n
    0000005
    $ echo "$r"|od -c
    0000000   f   o   o  \n
    0000004
    $ r=$(printf 'foo\r\nbar\r\n')
    $ echo "$r"|od -c
--->0000000   f   o   o  \r  \n   b   a   r  \n
    0000011
    $ r=$(printf 'foo\nbar\n')
    $ echo "$r"|od -c
    0000000   f   o   o  \n   b   a   r  \n
    0000010
    $

See that? Bash did not remove that interior carriage return. It did remove the one at the end though.

I blame bash.

I suppose we could have an option to not output CRs. Or a Cygwin release of jq. I'm not terribly keen on either of those.

Anyone have any views on what, if anything, to do here?
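
As a user-side workaround that needs no change to jq or bash, a script can delete the carriage returns itself before capturing the output. This is only a sketch: `strip_cr` is a hypothetical helper name, and printf again simulates the CR LF output instead of calling a Windows-native jq.

```shell
# strip_cr (hypothetical helper): delete every carriage return from stdin.
strip_cr() {
  tr -d '\r'
}

# Simulated CR LF output, standing in for a Windows-native jq (an assumption).
raw=$(printf '[\r\n  "A",\r\n  "B"\r\n]\r\n')

# Strip the CRs in the pipeline, then quote the variable when printing it.
clean=$(printf '%s\n' "$raw" | strip_cr)
echo "$clean"
```

In the script from this thread the same idea would read `result=$(echo "$result" | jq '[.[]."Status"]' | tr -d '\r')`, followed by `echo "$result"`. Note that this removes every carriage return in the stream, not only those at line ends.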
