Jq: `path` throws errors when you try to do anything interesting with `select`

Created on 13 May 2016 · 12 Comments · Source: stedolan/jq

I'm not sure if this is an issue, but I wanted to bring it to your attention. I've been talking with @davesnx, who's been playing with the idea of creating an Atom plugin to highlight parts of a JSON document based on jq programs. I suggested using path to find where in the JSON document the matched parts are located. However, we quickly ran into some behaviors of path that we couldn't understand.

So, let's say we have a JSON like:

{
  "things": [
    {
      "name": "thing 1",
      "nested": {
        "values": [
          7,
          23
        ]
      }
    },
    {
      "name": "thing 2",
      "nested": {
        "values": [
          12,
          32
        ]
      }
    },
    {
      "name": "thing 3",
      "nested": {
        "values": [
          15,
          33
        ]
      }
    },
    {
      "name": "thing 4",
      "nested": {
        "values": [
          16,
          32
        ]
      }
    },
    {
      "name": "thing 5",
      "nested": {
        "values": [
          15,
          35
        ]
      }
    }
  ]
}

In this JSON, the following operations work as expected:

path(.things[] | select(.name == "thing 1"))
path(.things[] | select(.nested.values == [15, 35]))
path(.things[] | select(.nested.values[] < 30))

But the following don't:

path(.things[] | select(.name | contains("1")))
path(.things[] | select(.nested | .values == [15, 35]))
path(.things[] | select(.nested.values | .[] < 30))

Note, however, that the following _do_ work:

path(.things[] | select(.name | contains("0")))
path(.things[] | select(.nested | .values == [13, 37]))
path(.things[] | select(.nested.values | .[] < 0))

The fact that queries are or aren't valid depending on whether or not there are matches in the corresponding JSON makes this particularly confusing to figure out or debug.

bug fixed in master

All 12 comments

Trying to gain a full understanding of jq's path/1 is a bit like
trying to understand wave function collapse in quantum mechanics.
I won't go into details because there are simple ways to
use path/1 without causing the "collapse" that you have observed.

First, notice that you can almost achieve what you
want by using the form:

path(.things[] | .name | select(CRITERION))

where CRITERION is pipe-free (and thus prevents "collapse").

For example:

path(.things[] | .name | select( contains("1") ))

For your input, this returns ["things",0,"name"]
rather than ["things",0].

Alternatively, you could use the following filter:

def which(expr; criterion):
  path(expr) as $p
  | if getpath($p) | criterion then $p else empty end;

This reveals all the paths to those elements of expr that satisfy the given criterion.

So instead of the rather mysterious:

path(.things[] | select(.name == "thing 1"))

one would write:

which(.things[]; select(.name == "thing 1"))

For the given input, both yield: ["things",0].

And whereas:

path(.things[] | select(.name | contains("1"))) 

fails, the following produces ["things",0]:

which(.things[] ; select(.name | contains("1")))

Similarly,

which(.things[] ; select(.name | contains("1")))

yields a stream of arrays.
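
For anyone who wants to try the `which` helper defined above, here is a minimal sketch run from a shell. The two-element input is a made-up, trimmed-down stand-in for the JSON in the original report, and the `result` variable name is only for illustration.

```sh
# Define which/2 as above and apply it to a small sample input.
# The sample JSON is an assumption, not the data from the report.
result=$(jq -c '
  def which(expr; criterion):
    path(expr) as $p
    | if getpath($p) | criterion then $p else empty end;
  which(.things[]; select(.name | contains("1")))
' <<'EOF'
{"things":[{"name":"thing 1"},{"name":"thing 2"}]}
EOF
)
echo "$result"   # prints ["things",0]
```

Because `path(expr)` only ever sees the simple expression `.things[]`, the pipe inside `select` is evaluated by `getpath($p) | criterion`, outside any path-building context.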

I feel like we could benefit from explaining what's _really_ going on under the hood in path/1 in the documentation, but to be honest, I don't quite understand it enough myself.

I do know that jq 1.5 has better detection for when we provide an invalid expression to path/1, which was added in #864. We produce an error when path detects "bad path expressions".

./jq 'path(.things[] | select(.name | contains("1")))' things.json
jq: error (at things.json:49): Invalid path expression with result {"name":"thing 1","nested"...

Hi guys,

I'm just a user of jq. I don't know how it works internally, and most of the time it seems like magic to me.

> Trying to gain a full understanding of jq's path/1 is a bit like
> trying to understand wave function collapse in quantum mechanics.
> I won't go into details because there are simple ways to
> use path/1 without causing the "collapse" that you have observed.
>
> First, notice that you can almost achieve what you
> want by using the form:
>
> path(.things[] | .name | select(CRITERION))
>
> where CRITERION is pipe-free (and thus prevents "collapse").

So if path() wraps a pipeline that ends in select(), it works when the expression inside select() contains no pipe, and it breaks when there is a pipe inside the select()?

@davesnx - When writing the paragraphs you quoted, I was trying to give a simple and sufficient condition for getting easily predictable results (i.e. results you would reasonably expect without knowing the implementation details of path/1) when using path/1 with select/1. I was not saying that using pipes within the select subexpression will necessarily break that predictability.

I would love to keep this open, because I was facing some problems with path().

Have the same issue, but with del(). In my case it could be solved with an operator like =~ for regex versus piping it to test.

This is fixed in master! Yes, I know, we're overdue for a 1.6 release :( I just need to find the time for it.

Allow me to explain the magic away.

So... path(<program>) outputs the paths constructed from all the .[] and .[<key>] operators in <program>, which is really neat and useful. Did I say "all"? Well, not quite. In .name | select(<stuff>), the .[]/.[<key>] operators in <stuff> cannot contribute to the paths that path() should produce.
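
To make that concrete, here is a small sketch, assuming a jq build that already includes the fix (1.6 or later). The one-key input and the `result` variable are made up for illustration.

```sh
# .b inside select(...) only tests the value and adds nothing to the path;
# .b in the main pipeline is appended to the path.
result=$(jq -c 'path(.a | select(.b)), path(.a | .b)' <<'EOF'
{"a":{"b":true}}
EOF
)
echo "$result"
# prints:
# ["a"]
# ["a","b"]
```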

Now getting it right, as to which expressions contribute to path building, and which don't... has taken some time. First @dtolnay put in a consistency check which is what raises the error that you're seeing. Then we've been adding the missing code to exclude certain sub-expressions from path building context.

For those who are interested in the internals, the EACH, EACH_OPT, INDEX, and INDEX_OPT instruction opcodes can all contribute to path building. The PATH_BEGIN and PATH_END instructions enter and leave, respectively, path-building context. The SUBEXP_BEGIN and SUBEXP_END instructions enter and leave, respectively, "sub-expression context", meaning "the context of expressions in a path-building context that don't contribute to path building". Bugs like this are all about places where the compiler should have been adding SUBEXP_BEGIN/SUBEXP_END brackets around generated code.

Eventually we should be confident that we've missed no sub-exp case and we can then remove @dtolnay's consistency check.

I was just bitten by this. What is the state of 1.6?

To be more specific, I am trying to delete some environment variables from Kubernetes JSON

cat foo.Deployment.json | jq '(.spec.template.spec.containers[] | select(.name == "foo").env) |= del(.[] | select(.name | startswith("FOO_")))'

jq: error (at :0): Invalid path expression with result {"name":"FOO_",...

Also stuck with del( . | any(startswith("x") or startswith("y")) | not) in v5. Seems to work with v6 https://jqplay.org/s/AVpz_IkfJa.

Is the workaround from @pkoppstein above adjustable for del() and, if so, how might that be achieved?

This is fixed in 1.6.
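
For anyone who cannot upgrade, one way the `which` helper from earlier in the thread might be adapted for deletion is to collect the paths it emits and pass them to `delpaths`. This is only a sketch: the `env` array below is made up, and it has not been checked against every 1.5 build.

```sh
# Collect the paths of matching elements with which/2, then delete them
# with delpaths instead of del(... | select(... | ...)).
# The sample input is an assumption for illustration only.
result=$(jq -c '
  def which(expr; criterion):
    path(expr) as $p
    | if getpath($p) | criterion then $p else empty end;
  .env |= delpaths([which(.[]; select(.name | startswith("FOO_")))])
' <<'EOF'
{"env":[{"name":"FOO_A","value":"1"},{"name":"BAR","value":"2"},{"name":"FOO_B","value":"3"}]}
EOF
)
echo "$result"   # prints {"env":[{"name":"BAR","value":"2"}]}
```

When the target is an array, `map(select(.name | startswith("FOO_") | not))` is a simpler alternative that avoids path expressions altogether.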
