I have been using jq to clean the output from Jupyter notebooks, which are saved in a JSON-based file format, so that they can be versioned with git. I have a very simple jq statement that worked fine with v1.4 but fails in v1.5 with jq: error (at test_jq.ipynb:51): Invalid path expression with result {"cell_type":"code","execu..., and I'm not sure why. I hope someone here can help.
The notebook files look something like this (I've removed some keys in metadata that are not relevant here):
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# First heading"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"hallo\n"
]
}
],
"source": [
"print(\"hallo\")"
]
}
],
"metadata": {
"version": "3.4.3"
},
"nbformat": 4,
"nbformat_minor": 0
}
So "cells" is an array, and some of the cells have an "outputs" array and an "execution_count" field. When these exist, I want to set them to the empty array [] and to null, respectively, and still output the whole input with just these replacements. The code I have, which works in jq v1.4, is:
jq '(.cells[] | select(.outputs) ) |= [] | (.cells[] | select(.execution_count)) |= null' input.ipynb
As I wrote above, with jq 1.5 I get the error Invalid path expression with result {"cell_type":"code","execu.... I have been playing around to find a fix, but haven't found one yet, so I'm hoping someone here who knows more about jq can help. Note that I do not want to add an "outputs" or "execution_count" field to cells that do not have them (they're only valid for cell_type=="code").
Update: Just in case it's not clear, here's the output I want:
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# First heading"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(\"hallo\")"
]
}
],
"metadata": {
"version": "3.4.3"
},
"nbformat": 4,
"nbformat_minor": 0
}
@jfeist - I believe the following is what you're looking for. It produces the same results (in either jq 1.4 or jq 1.5) as your query (in jq 1.4) using the data you provided, trimmed of the last two key-value pairs (which are syntactically invalid):
.cells |= map(if has("outputs") then .outputs = [] else . end
| if has("execution_count") then .execution_count = null else . end)
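For anyone who wants to sanity-check what this filter is meant to do without jq at hand, here is a rough Python sketch of the same logic. It is only an illustration, not part of the jq answer: the function name strip_outputs and the small sample notebook are made up for this example.

```python
import copy
import json


def strip_outputs(notebook):
    """Return a copy of the notebook dict with outputs and execution counts reset.

    Mirrors the has(...) checks in the jq filter: a key is only reset when it
    is already present, so markdown cells do not gain new fields.
    """
    cleaned = copy.deepcopy(notebook)  # leave the caller's data untouched
    for cell in cleaned.get("cells", []):
        if "outputs" in cell:
            cell["outputs"] = []
        if "execution_count" in cell:
            cell["execution_count"] = None  # serialised as JSON null
    return cleaned


# Hypothetical sample, trimmed down from the notebook in the question.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# First heading"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {"collapsed": False},
            "outputs": [
                {"name": "stdout", "output_type": "stream", "text": ["hallo\n"]}
            ],
            "source": ['print("hallo")'],
        },
    ],
    "metadata": {"version": "3.4.3"},
    "nbformat": 4,
    "nbformat_minor": 0,
}

cleaned = strip_outputs(notebook)
print(json.dumps(cleaned, indent=1))
```

Like has("outputs") in the filter, the "in" test checks for the presence of the key rather than the truthiness of its value, so a cell whose execution_count is already null is handled the same way as one with a number.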
Thanks a lot, works perfectly!
Just out of interest: Do you know why my original solution works in jq 1.4, but not in jq 1.5? Was I mistakenly exploiting a bug/unintended behavior?
PS: I've updated the input above to remove an extraneous } that I missed when trimming the metadata section to make the example smaller (which is why the last two keys were invalid). Now the line you gave produces exactly the desired output for the input in the original post.
@jfeist But we should explain why your jq code broke with 1.5. (.foo[] | select(.bar)) |= [] fails because the path on the left-hand side selects whole elements of the array at .foo, not their .bar field, so you're trying to assign to... what, every element of the array at .foo? This, for example, works: (.foo[].bar | select(.)) |= [].
If I understood correctly what you were trying to accomplish, this works with 1.4 and 1.5:
$ jq '((.cells[].outputs | select(.) ) |= []) | (((.cells[].execution_count | select(.))) |= null)'
and is only a very slight tweak of the original.
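One thing to keep in mind with this variant is that select(.) tests truthiness, and in jq only null and false count as false (0, "" and [] are all true). The Python sketch below tries to mimic that rule to show which cells end up being touched. The helper names jq_truthy and reset_if_truthy and the sample cells are invented for this illustration.

```python
def jq_truthy(value):
    """Approximate jq's truthiness: only null (None) and false are falsy."""
    return value is not None and value is not False


def reset_if_truthy(cell, key, new_value):
    """Rough analogue of (.cells[].KEY | select(.)) |= NEW for a single cell.

    cell.get(key) gives None for a missing key, much as .KEY yields null in
    jq, so missing keys are filtered out and no new field is created.
    """
    if jq_truthy(cell.get(key)):
        cell[key] = new_value


# Hypothetical cells: no fields, populated fields, already-cleared fields.
cells = [
    {"cell_type": "markdown"},
    {"cell_type": "code", "execution_count": 1, "outputs": [{"name": "stdout"}]},
    {"cell_type": "code", "execution_count": None, "outputs": []},
]

for cell in cells:
    reset_if_truthy(cell, "outputs", [])
    reset_if_truthy(cell, "execution_count", None)

for cell in cells:
    print(cell)
```

For this use case the difference from the has(...) version should not matter: a field that is skipped because it is null was already null, and a missing field stays missing.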
Thanks, that explains it perfectly!