Jq: Are wildcard values possible?

Created on 5 Feb 2013 · 6 Comments · Source: stedolan/jq

(Sorry for asking a question here; I couldn't find a forum for jq.)

I have a pair of artificial and convoluted files

company1.json

{
    "A0001": {
        "album": "Album1", 
        "ref": "0001",
        "artist": "Artist1",
        "tracks": {
            "T1": {
                 "track": "1",
                 "title": "Album 1 Song 1",
                 "composer": "Composer Song 1"
            },
            "T2": {
                 "track": "2",
                 "title": "Album 1 Song 2",
                 "composer": "Composer Song 2"
            },
            "T3": {
                 "track": "3",
                 "title": "Album 1 Song 3",
                 "composer": "Composer Song 3"
            }
        }
    }, 
    "A0002": {
        "album": "Album2", 
        "ref": "0002",
        "artist": "Artist2",
        "tracks": {
            "T1": {
                 "track": "1",
                 "title": "Album 2 Song 1",
                 "composer": "Composer Song 1"
            },
            "T2": {
                 "track": "2",
                 "title": "Album 2 Song 2",
                 "composer": "Composer Song 2"
            },
            "T3": {
                 "track": "3",
                 "title": "Album 2 Song 3",
                 "composer": "Composer Song 3"
            }
        }
    }
}

company2.json

{
    "A0003": {
        "album": "Album3", 
        "ref": "0003",
        "artist": "Artist3",
        "tracks": {
            "T1": {
                 "track": "1",
                 "title": "Album 3 Song 1",
                 "composer": "Composer Song 1"
            },
            "T2": {
                 "track": "2",
                 "title": "Album 3 Song 2",
                 "composer": "Composer Song 2"
            },
            "T3": {
                 "track": "3",
                 "title": "Album 3 Song 3",
                 "composer": "Composer Song 3"
            }
        }
    }, 
    "A0004": {
        "album": "Album4", 
        "ref": "0004",
        "artist": "Artist4",
        "tracks": {
            "T1": {
                 "track": "1",
                 "title": "Album 4 Song 1",
                 "composer": "Composer Song 1"
            },
            "T2": {
                 "track": "2",
                 "title": "Album 4 Song 2",
                 "composer": "Composer Song 2"
            },
            "T3": {
                 "track": "3",
                 "title": "Album 4 Song 3",
                 "composer": "Composer Song 3"
            }
        }
    }
}

I can run the following

cat company*.json | jq 'if .A0002.tracks.T2.title == "Album 2 Song 2" then (.A0002) else select(false) end | .album, .artist'

which returns

"Album2"
"Artist2"

Is there a way I can wildcard the values A0002 and T2?
E.g.:

cat company*.json | jq 'if .**WILDCARD**.tracks.**WILDCARD**.title == "Album 2 Song 2" then (.**COMPLETE RECORD**) else select(false) end | .album, .artist'

Also, can I do some form of string match?
E.g.:

cat company*.json | jq 'if .**WILDCARD**.tracks.**WILDCARD**.title ISLIKE "Song 2" then (.**COMPLETE RECORD**) else select(false) end | .album, .artist'


.[] | select( .tracks[].title | contains("Song 2") ) | .album, .artist

My understanding (as a user) is that .[] acts as a wildcard: it produces the values of an object as a stream. It also works for arrays (actually, only its application to arrays is documented). NB: if it follows another selector, you omit the ., so it's just [].
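To see the wildcard behaviour in isolation, here is a small shell sketch. It assumes a jq binary on PATH, and `doc` is a trimmed, hypothetical stand-in for one record of company1.json:

```shell
# Trimmed, hypothetical stand-in for one record of company1.json
doc='{"A0001": {"album": "Album1", "tracks": {"T1": {"title": "Album 1 Song 1"}, "T2": {"title": "Album 1 Song 2"}}}}'

# On an object, .[] streams the values; the second [] wildcards the track keys
echo "$doc" | jq '.[].tracks[].title'
# "Album 1 Song 1"
# "Album 1 Song 2"

# On an array, the same operator streams the elements
echo '["x", "y"]' | jq '.[]'
# "x"
# "y"
```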

Thus you would need .[].tracks[].title for the test. However, for the result you don't actually want the whole object, only the value of the matching field (e.g. not the whole object in company1.json, but just the value of field A0002). One way to do this is to iterate over the fields you want, apply the test to each, and return the matching ones:

.[] | if .tracks[].title == "Album 2 Song 2" then (.) else select(false) end | .album, .artist

This can be simplified a bit. First, select(false) is the same as empty:

.[] | if .tracks[].title == "Album 2 Song 2" then (.) else empty end | .album, .artist

Next, the whole if expression can be written as a select:

.[] | select( .tracks[].title == "Album 2 Song 2" ) | .album, .artist
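The equivalence between the `if ... else empty end` form and `select` can be checked on a toy stream. This is a sketch assuming jq on PATH; jq's own builtin library defines `select(f)` as `if f then . else empty end`, which is why the two outputs agree:

```shell
# The explicit if/else-empty form keeps only the values for which the test is true
jq -n -c '[1, 2, 3 | if . > 1 then . else empty end]'
# [2,3]

# select(f) is shorthand for the same thing
jq -n -c '[1, 2, 3 | select(. > 1)]'
# [2,3]
```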

Finally, to answer your second question: jq doesn't have an ISLIKE, but it does have contains, which, when used with strings, is a substring test. It takes the sought substring as an argument and returns true or false. So the script becomes:

.[] | select( .tracks[].title | contains("Song 2") ) | .album, .artist
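A sketch of `contains` on the trimmed, hypothetical two-album `doc` below (jq assumed on PATH). It also shows a caveat of `select` with a generator inside: the album is emitted once per matching track. The `any(generator; condition)` and `test(regex; flags)` builtins used at the end exist in jq 1.5 and later (`test` also needs a build with regex support), so they are an assumption about your jq version rather than something from this thread:

```shell
# Trimmed, hypothetical two-album input in the same shape as company1.json
doc='{"A0001": {"album": "Album1", "tracks": {"T1": {"title": "Album 1 Song 1"}, "T2": {"title": "Album 1 Song 2"}}},
      "A0002": {"album": "Album2", "tracks": {"T1": {"title": "Album 2 Song 1"}, "T2": {"title": "Album 2 Song 2"}}}}'

# contains on two strings is a substring test
jq -n '"Album 2 Song 2" | contains("Song 2")'
# true

# select emits its input once per true result, so a loose pattern that
# matches two tracks of the same album reports that album twice
echo "$doc" | jq -c '[.[] | select(.tracks[].title | contains("Song")) | .album]'
# ["Album1","Album1","Album2","Album2"]

# any(generator; condition), jq 1.5+, reduces the matches to one boolean per album
echo "$doc" | jq -c '[.[] | select(any(.tracks[]; .title | contains("Song"))) | .album]'
# ["Album1","Album2"]

# test(regex; flags), jq 1.5+, for LIKE-style patterns ("i" = ignore case)
jq -n '"Album 2 Song 2" | test("song 2$"; "i")'
# true
```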


13ren, thank you very much for your clear and detailed response. I'd been tearing out my hair trying to understand this.

I have two further questions related to my example files.

Firstly, given that

.[] | select( .tracks[].title == "Album 2 Song 2" ) 

finds the target record for me, is there a way to return just the whole record's key? In this case, A0002.

(I can see the keys operator, but can't see how to use it to achieve this.)

Secondly, having passed back just the key (e.g. A0002), it would be great to also pass back the name of the file the record was found in, e.g. company1.json. Is there an elegant way to do this?

(My thought was to add a top-level key "FILENAME": "company1.json" to the input file and reference it with .FILENAME, but I get jq: error: Cannot index string with string if I do that.)

. as $in| keys[]| select( $in[.].tracks[].title == "Album 2 Song 2" )

To return the key (fieldname), you have to use that in the lookup. keys gives you the fieldnames as an array; keys[] streams the contents of this array (as with [] before).

Because a jq filter has only one input stream, and we need both the object we are addressing and the key/field name, we need to store one of them in a variable. In the above, the object is stored in the variable $in. Binding a variable doesn't affect the stream: . as $in passes its input through unchanged, exactly like . does.
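Running the `keys`-based filter on a trimmed, hypothetical stand-in for company1.json shows it returning only the key. The `to_entries` variant at the end is an alternative that is not from this thread: it pairs each key with its value, so no variable is needed. jq is assumed to be on PATH:

```shell
# Trimmed, hypothetical stand-in for company1.json
doc='{"A0001": {"tracks": {"T2": {"title": "Album 1 Song 2"}}},
      "A0002": {"tracks": {"T2": {"title": "Album 2 Song 2"}}}}'

# keys[] streams the field names; $in keeps the whole object available for $in[.]
echo "$doc" | jq '. as $in | keys[] | select($in[.].tracks[].title == "Album 2 Song 2")'
# "A0002"

# Alternative (not from this thread): to_entries turns the object into
# {"key": ..., "value": ...} pairs, so the key travels with its value
echo "$doc" | jq 'to_entries[] | select(.value.tracks[].title == "Album 2 Song 2") | .key'
# "A0002"
```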

I think your idea for a top-level key should work. Note that the wildcard code will attempt to look up (or "index") the field tracks of the object. If you try to do this on a string instead of an object, you get that error: Cannot index string with string. So one solution is to add a check for the type of the value, e.g.:

. as $in| keys[]| select( ($in[.]|type=="object") and $in[.].tracks[].title == "Album 2 Song 2" )

You can factor out the $in[.], just within the boolean expression, which doesn't affect the overall result:

. as $in| keys[]| select($in[.]| (type=="object" and .tracks[].title == "Album 2 Song 2") )

Though maybe it's clearer, and more jq-like, to just filter it out in a separate step:

. as $in| keys[]| select($in[.]|type=="object") | select( $in[.].tracks[].title == "Album 2 Song 2")
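A sketch of the failure and the fix, using a hypothetical, trimmed input that has the extra `FILENAME` string field (jq assumed on PATH). The exact wording of the error message differs between jq versions, so the sketch only reports the non-zero exit status:

```shell
# Hypothetical, trimmed input with the extra top-level FILENAME string
doc='{"FILENAME": "company1.json",
      "A0002": {"tracks": {"T2": {"title": "Album 2 Song 2"}}}}'

# Unguarded: keys[] yields "A0002" first (printed), then "FILENAME",
# where $in[.] is a string and .tracks fails
echo "$doc" | jq '. as $in | keys[] | select($in[.].tracks[].title == "Album 2 Song 2")' \
  || echo "jq failed with exit status $?"

# Guarded: the type check drops non-objects before .tracks is looked up
echo "$doc" | jq '. as $in | keys[]
  | select($in[.] | type == "object")
  | select($in[.].tracks[].title == "Album 2 Song 2")'
# "A0002"
```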

BTW: type isn't documented, but noted in an issue.

As an aside, you could even have a separate string outside the object, even though the file is then not valid JSON (this works because jq accepts a stream of JSON values), like:

"myfilename"
{...}

. as $in| select(type=="object")| keys[]| select( $in[.].tracks[].title == "Album 2 Song 2" )
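The stream form can be tried without creating any files: `printf` below writes a hypothetical filename string and a trimmed record as two consecutive JSON values (jq assumed on PATH). The string fails `select(type == "object")` and so produces no output:

```shell
# Two JSON values in one stream: a hypothetical filename string, then a trimmed record
printf '%s\n' '"myfilename"' '{"A0002": {"tracks": {"T2": {"title": "Album 2 Song 2"}}}}' \
  | jq '. as $in | select(type == "object") | keys[] | select($in[.].tracks[].title == "Album 2 Song 2")'
# "A0002"
```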

Finally, there is a way to do _partially_ what you want. Although jq doesn't distinguish between different input _files_, it does distinguish between JSON instances in a stream. The --slurp/-s flag assembles this stream into an array. If you have one JSON document per file, the array index tells you which file it was (though not the filename). Here's some discussion.
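A sketch of the `--slurp` approach, plus the `input_filename` builtin that later jq releases (1.5 and up) provide for exactly this question. The two one-record files are hypothetical and are written to a scratch directory; jq is assumed to be on PATH:

```shell
# Two hypothetical one-record files in a scratch directory
cd "$(mktemp -d)"
echo '{"A0001": {"tracks": {"T2": {"title": "Album 1 Song 2"}}}}' > company1.json
echo '{"A0003": {"tracks": {"T2": {"title": "Album 3 Song 2"}}}}' > company2.json

# --slurp: the input becomes an array with one element per JSON document,
# so the array index says which document (here, which file) matched
cat company1.json company2.json | jq -c -s '
  range(0; length) as $i
  | .[$i] as $doc
  | $doc | keys[]
  | select($doc[.].tracks[].title == "Album 3 Song 2")
  | [$i, .]'
# [1,"A0003"]

# jq 1.5+: input_filename names the file currently being read; pass the
# files as arguments (rather than piping them through cat) so jq knows the names
jq -c '. as $in | keys[] | select($in[.].tracks[].title == "Album 3 Song 2") | [input_filename, .]' company*.json
# ["company2.json","A0003"]
```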

BTW: I think you might be interested in this little tutorial towards the end of the docs, about more complex queries: http://stedolan.github.com/jq/manual/#VariablesandFunctions

I'm beginning to feel like I'm taking advantage of you. If I can crack this, then I'm all set to solve my task.

If I alter the input file to something like

{
    "FILENAME": "company1.json",
    "A0001": {
        "album": "Album1", 
        "ref": "0001",
        "artist": "Artist1",
        "tracks": {
            "T1": {
                 "track": "1",
                 "title": "Album 1 Song 1",
                 "composer": "Composer Song 1"
            },
            "T2": {
                 "track": "2",
                 "title": "Album 1 Song 2",
                 "composer": "Composer Song 2"
            },
            "T3": {
                 "track": "3",
                 "title": "Album 1 Song 3",
                 "composer": "Composer Song 3"
            }
        }
    }, 
    "A0002": {
        "album": "Album2", 
        "ref": "0002",
        "artist": "Artist2",
        "tracks": {
            "T1": {
                 "track": "1",
                 "title": "Album 2 Song 1",
                 "composer": "Composer Song 1"
            },
            "T2": {
                 "track": "2",
                 "title": "Album 2 Song 2",
                 "composer": "Composer Song 2"
            },
            "T3": {
                 "track": "3",
                 "title": "Album 2 Song 3",
                 "composer": "Composer Song 3"
            }
        }
    }
}

and then use

cat company*.json | jq '. as $in | .FILENAME as $fn | keys[] | select($in[.]|type=="object") | select($in[.].tracks[].title == "Album 2 Song 2") as $res| $res, $fn'

I get this

"A0002"
"company1.json"

whoo-hoo!!!

Is this the best way to tackle this, or is there a more jq-like way?
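For comparison, a slightly more compact variant of the query above, as a sketch: it emits the current key directly instead of binding `$res`, and uses `any(generator; condition)` (jq 1.5 and later, an assumption about your version) so an album with several matching tracks is reported once. The input is a trimmed, hypothetical version of the FILENAME-augmented file; jq is assumed to be on PATH:

```shell
# Trimmed, hypothetical version of the FILENAME-augmented company1.json
doc='{"FILENAME": "company1.json",
      "A0001": {"tracks": {"T2": {"title": "Album 1 Song 2"}}},
      "A0002": {"tracks": {"T2": {"title": "Album 2 Song 2"}}}}'

# The current value after keys[] is already the key, so emit it and then the filename
echo "$doc" | jq '. as $in | keys[]
  | select($in[.] | type == "object")
  | select(any($in[.].tracks[]; .title == "Album 2 Song 2"))
  | ., $in.FILENAME'
# "A0002"
# "company1.json"
```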

"I'm beginning to feel like I'm taking advantage of you"

You can fix that by passing on your understanding to the next asker... also a tremendous way to consolidate your understanding. ;-)

I especially appreciated the example of how to use contains().
