Jq: Selectively replace values if they exist (regression in 1.5?)

Created on 9 Sep 2015 · 4 Comments · Source: stedolan/jq

I have been using jq to clean the output from Jupyter notebooks, which are saved in a JSON-based file format, so that they can be versioned with git. I have a very simple jq statement that worked fine with v1.4 but fails in v1.5 with the error jq: error (at test_jq.ipynb:51): Invalid path expression with result {"cell_type":"code","execu..., and I'm not sure why. I hope someone here can help.

The format of the notebook files is something like the following (I've removed some keys in metadata which are not relevant here):

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# First heading"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hallo\n"
     ]
    }
   ],
   "source": [
    "print(\"hallo\")"
   ]
  }
 ],
 "metadata": {
   "version": "3.4.3"
 },
 "nbformat": 4,
 "nbformat_minor": 0
}

So "cells" is an array, and some of the cells have an "outputs" array and an "execution_count" field. When these exist, I want to set them to the empty array [] and to null, respectively, and still output the whole input with just these replacements. The code I have, which works in jq v1.4, is

jq '(.cells[] | select(.outputs) ) |= [] | (.cells[] | select(.execution_count)) |= null' input.ipynb

As I wrote above, with jq 1.5, I get the error Invalid path expression with result {"cell_type":"code","execu.... I have been playing around to find a fix for this, but haven't found a solution yet. So I'm hoping someone here who knows more about jq can help. Note that I do not want to add an "outputs" or "execution_count" field to cells that do not have them (they're only valid for cell_type=="code").

Update: Just in case it's not clear, here's the output I want:

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# First heading"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\"hallo\")"
      ]
    }
  ],
  "metadata": {
    "version": "3.4.3"
  },
  "nbformat": 4,
  "nbformat_minor": 0
}

All 4 comments

@jfeist - I believe the following is what you're looking for. It produces the same results (in either jq 1.4 or jq 1.5) as your query (in jq 1.4) using the data you provided, trimmed of the last two key-value pairs (which are syntactically invalid):

   .cells |= map(if has("outputs") then .outputs = [] else . end
                 | if has("execution_count") then .execution_count = null else . end)
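The map/has filter above can be tried on a cut-down input. The following is a minimal sketch, not part of the original answer: the two-cell `notebook` string is a made-up stand-in for a real .ipynb file, and it assumes a `jq` binary is on the PATH.

```shell
# Hypothetical two-cell notebook: one markdown cell, one code cell.
notebook='{"cells":[{"cell_type":"markdown","source":["# First heading"]},{"cell_type":"code","execution_count":1,"outputs":[{"name":"stdout"}],"source":["print(1)"]}]}'

# Reset the two fields only on cells that already carry them.
cleaned=$(printf '%s' "$notebook" | jq -c '
  .cells |= map(
    if has("outputs") then .outputs = [] else . end
    | if has("execution_count") then .execution_count = null else . end)')

echo "$cleaned"
```

Note that has("outputs") tests for the presence of the key, so a cell whose field is already null or false is still matched, while a cell without the key is passed through unchanged.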

Thanks a lot, works perfectly!
Just out of interest: Do you know why my original solution works in jq 1.4, but not in jq 1.5? Was I mistakenly exploiting a bug/unintended behavior?

PS: I've updated the input above to remove an extraneous } that I missed when trimming the metadata section to make the example smaller (which is why last two keys were invalid). Now the line you gave produces exactly the desired output for the input in the original post.

@jfeist But we should explain why your jq code broke with 1.5. (.foo[] | select(.bar)) |= [] fails because the left-hand side selects whole elements of the array at .foo, so you're trying to assign [] to... what, every selected element? The update should target the field instead. This, for example, works: (.foo[].bar | select(.)) |= [].

If I understood correctly what you were trying to accomplish, this works with 1.4 and 1.5:

$ jq '((.cells[].outputs | select(.)) |= []) | ((.cells[].execution_count | select(.)) |= null)'

and is only a very slight tweak of the original.
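To see what the tweaked left-hand side actually selects, the paths can be listed with jq's path() builtin before applying the update. This is a rough sketch rather than anything from the thread: the `notebook` string is a made-up, cut-down input, and a `jq` binary is assumed to be on the PATH.

```shell
# Hypothetical two-cell notebook: one markdown cell, one code cell.
notebook='{"cells":[{"cell_type":"markdown","source":["# First heading"]},{"cell_type":"code","execution_count":1,"outputs":[{"name":"stdout"}],"source":["print(1)"]}]}'

# Paths matched by the left-hand side: they end at the field, not at the cell.
# The markdown cell yields null for .outputs, so select(.) drops it.
paths=$(printf '%s' "$notebook" | jq -c '[path(.cells[].outputs | select(.))]')
echo "$paths"

# Apply both updates; cells lacking the fields are left untouched.
cleaned=$(printf '%s' "$notebook" | jq -c '
  ((.cells[].outputs | select(.)) |= [])
  | ((.cells[].execution_count | select(.)) |= null)')
echo "$cleaned"
```

Because select(.) filters on truthiness, a field that is present but already null or false is skipped, which makes no difference for this use case.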

Thanks, that explains it perfectly!
