Jq: Script Files

Created on 27 Nov 2015 · 43 Comments · Source: stedolan/jq

What is the filetype extension for script files? I was thinking of making a syntax highlighting file

contribution

All 43 comments

Most of us use .jq. I'm not aware of any other extensions in use, but I've been fairly absent recently.

Sounds legit. I did a quick Google search and didn't see .jq in use anywhere else. Are there any syntax highlighting files currently available?

Not to my knowledge. I'm excited to see what you come up with, though!

Is there a list of all keywords somewhere? Could people post up some lengthy jq filters so I can test how this is going to look? So I don't end up making unicorn rainbow barf :D

should I include these as functions or keywords?
https://github.com/stedolan/jq/blob/master/src/builtin.jq

My preference would be to give each of these its own color: builtin functions, user-defined functions, keywords, and formatting operators (@sh and friends).

The builtin functions are all the jq-coded ones in builtin.jq, all the c-coded ones in function_list in builtin.c: https://github.com/stedolan/jq/blob/0d177d240dc06adfb676716d5adc849b326c21f5/src/builtin.c#L1249 and the two bytecoded ones in builtin_defs in builtin.c: https://github.com/stedolan/jq/blob/0d177d240dc06adfb676716d5adc849b326c21f5/src/builtin.c#L1325

The keywords are if, then, elif, else, end, and, or, reduce, as, try, catch, label, break, def, foreach, import, include, module, modulemeta.

For formatting operators just use @[a-zA-Z0-9_]+ like the lexer does, no need to enumerate them.
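As a rough illustration of the advice above, a highlighter could match formatting operators with the lexer-style pattern and check identifiers against the keyword list. This is only a Python sketch, not code from jq or from any editor plugin; the names `KEYWORDS`, `FORMAT_RE`, and `classify` are made up for the example.

```python
import re

# Keyword list as given in the comment above.
KEYWORDS = {
    "if", "then", "elif", "else", "end", "and", "or", "reduce", "as",
    "try", "catch", "label", "break", "def", "foreach", "import",
    "include", "module", "modulemeta",
}

# Formatting operators: the pattern the comment attributes to the lexer.
FORMAT_RE = re.compile(r"@[a-zA-Z0-9_]+")


def classify(token):
    """Return a coarse highlight group for one token (hypothetical group names)."""
    if FORMAT_RE.fullmatch(token):
        return "format"
    if token in KEYWORDS:
        return "keyword"
    return "other"


print([classify(t) for t in ["@sh", "reduce", "map", "@base64"]])
```

Matching formats by pattern rather than by enumeration means a newly added `@name` operator would be colored without touching the syntax file.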

Thanks for your work on this! What editor(s) is this going to work in?

@vito-c asked:

> Could people post up some lengthy jq filters ...?

See e.g. https://github.com/joelpurra/jqnpm/wiki

And in particular https://github.com/joelpurra/jq-bigint --main.jq definitely qualifies as lengthy.

By the way, __loc__ is also a keyword (de facto); and of course "true", "false" and "null" have a special status.

Good luck!

jq's src/builtin.jq is a pretty sizeable jq code file.

Thanks a lot for the lengthy scripts; they have been helpful! @dtolnay I use Vim/Neovim, so that is the editor I'm supporting. I'm open to making highlighting for some other editors later as well.

I'm going to break down some tasks for myself here so you guys can give feedback. I also wanted to get your opinions on how a documentation comment block should look.

I can definitely highlight builtin C and builtin jq functions differently. There might be a bit of a challenge with user functions, though; they will probably end up being the default color.

Questions:

  • I was also curious: can jq emit anything from the lexer, like an AST?
  • I have never used the formatting operators. Does anyone have an example of those?
  • Is jq -n -c the fastest way to compile a script and get errors (think linter)?

Syntax Highlighting for Vim/Neovim

  • [x] keywords, conditionals, booleans, numbers
  • [x] strings
  • [x] variables
  • [x] comments, todos, etc
  • [x] numbers optional
  • [x] string optional uncolored quotes (make sure escaping works inside)
  • [x] operators
  • [x] user defined functions (default colors)
  • [x] jq defined functions (builtin.jq & builtin.c)
  • [ ] formatting operators @[a-zA-Z0-9_]+
  • [ ] documentation comments

Almost forgot. Progress!
[Screenshot, 2015-11-29 10:20:01 AM]

Looking good! And I'm a vim user, so, yay!

Now, what I think might be helpful is to color commas and semicolons differently, and to color parentheses and pipes. I'm not sure I care to have all builtins highlighted differently than other identifiers.

The unicorn rainbow barf is definitely beginning. Here is a mockup of what I would like in my color scheme. The main differences are numbers being uncolored and strings being a single color (although make sure to handle string interpolations).

As I commented earlier (though not very explicitly), I don't think you should color jq-coded and c-coded builtins differently from each other.

  • No, jq does not currently build an AST. The lexer produces a sequence of tokens and the parser goes straight from tokens to bytecode.
  • I haven't used it either but here it is in the manual: https://stedolan.github.io/jq/manual/#Formatstringsandescaping
  • That will run the script, not just compile. What is this for? We may need to add a way to compile without running.

[Image: color]

@dtolnay I misread you at first, then! I was hoping builtin C and jq would be the same color :+1: I rather like having string quotes in a different color, but I can easily make this a configuration variable so that it can be toggled. I will edit the task list inline.

> That will run the script, not just compile. What is this for? We may need to add a way to compile without running.

I was going to use this for syntax checking, which is doable, but there might be side effects if the script also runs? I also noticed there is no column for errors (https://github.com/stedolan/jq/issues/1027), which means the best I could do was mark the error at the line.
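To make the "mark the error at the line" idea concrete, here is a small Python sketch that pulls line numbers out of error text. The sample message is an assumption about what a jq compile error looks like; the exact wording may differ between versions, and `error_lines` is a hypothetical helper, not part of jq or of any plugin.

```python
import re

# Assumed shape of a jq compile error; the real wording may vary by version.
SAMPLE_STDERR = (
    "jq: error: syntax error, unexpected INVALID_CHARACTER "
    "at <top-level>, line 3:\n"
)

# Only a line number is extracted, since the thread notes there is no column.
LINE_RE = re.compile(r"\bline (\d+)\b")


def error_lines(stderr_text):
    """Return every line number mentioned in the error text, in order."""
    return [int(m.group(1)) for m in LINE_RE.finditer(stderr_text)]


print(error_lines(SAMPLE_STDERR))  # -> [3]
```

An editor plugin could feed the captured stderr of a jq run into such a helper and place a sign on each returned line.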

@dtolnay can tokens be emitted? I was wondering how it would look

No, the closest is jq --debug-dump-disasm which dumps the bytecode.

@dtolnay I will make highlighting numbers a variable as well. I like being able to highlight numbers because it makes it easier to see the range.

@nicowilliams commas, semicolons, pipes, parentheses: do you want these the same as other operators (same color as == - +)? I was thinking semicolons and pipes should be special.

@pkoppstein did you mean $__loc__?

@vito-c - If you look in lexer.l you'll see __loc__ has the status of a keyword.

To see the effect, compare:

jq -n 'def __loc__: 0;'

with

jq -n 'def _loc__: 0;'

jq does have an internal AST-like representation of jq programs called
"blocks", complete with links to line number and column number
information. It has no serialization for this data structure, but I've
wanted one for a different reason: to help me reason about the compiler.
Armed with two reasons for it, it's safe to say we should add a
serialization of the block representation of jq programs. The
serialization should use JSON, naturally, though it's not a perfect fit
because of the need to represent "binders", but that's easy to solve.
Anyways, I'll think about it.

@nicowilliams sweet! Neovim also supports async operations and remote plugins. I have seen a go plugin for neovim that uses an AST representation to perform syntax highlighting. I was thinking how cool it would be if you could have your jq blocks highlighted differently. It was more of just an idea I was pondering though.

It'd be particularly awesome if we could support selection of sub-expressions as a way to handle cases where operator precedence hurts.

I am going to highlight the builtin.c & builtin.jq functions as functions. path will also fall under function, but empty and not strike me more as keywords.

builtin.c

| 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- |
| _plus | _negate | _minus | _multiply | _divide |
| _mod | tojson | fromjson | tonumber | tostring |
| keys | keys_unsorted | startswith | endswith | ltrimstr |
| rtrimstr | split | explode | implode | _strindices |
| setpath | getpath | delpaths | has | _equal |
| _notequal | _less | _greater | _lesseq | _greatereq |
| contains | length | utf8bytelength | type | isinfinite |
| isnan | isnormal | infinite | nan | sort |
| _sort_by_impl | _group_by_impl | min | max | _min_by_impl |
| _max_by_impl | error | format | env | get_search_list |
| get_prog_origin | get_jq_origin | _match_impl | modulemeta | _input |
| debug | stderr | strptime | strftime | mktime |
| gmtime | now | input_filename | input_line_number | |

builtin.jq

| 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- |
| _assign | _flatten | _modify | _nwise | add |
| all | any | arrays | ascii_downcase | ascii_upcase |
| booleans | bsearch | capture | combinations | del |
| error | finites | first | flatten | from_entries |
| fromdate | fromdateiso8601 | fromstream | group_by | gsub |
| in | index | indices | input | inputs |
| inside | isfinite | iterables | join | last |
| leaf_paths | limit | map | map_values | match |
| max_by | min_by | normals | nth | nulls |
| numbers | objects | paths | range | recurse |
| recurse_down | repeat | reverse | rindex | scalars |
| scalars_or_empty | scan | select | sort_by | split |
| splits | strings | sub | test | to_entries |
| todate | todateiso8601 | tostream | transpose | truncate_stream |
| unique | unique_by | until | values | walk |
| while | with_entries | | | |

I'd say any of the things starting with an _ shouldn't be highlighted.
Those are mostly internal implementations of things that we don't
necessarily want people playing with. Especially the ones in builtin.c
which are primarily the implementations of operators.

Also, things that have _impl on them aren't really for public consumption
and are used behind-the-scenes by other exposed functions.
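The rule suggested here (skip anything starting with `_`) is simple enough to apply mechanically when generating the highlight list. A minimal Python sketch, using a handful of names from the tables above; `public_names` is a hypothetical helper:

```python
# A few names copied from the builtin tables above, internals included.
SAMPLE_BUILTINS = [
    "_plus", "tojson", "keys", "_sort_by_impl", "min", "_match_impl", "now",
]


def public_names(names):
    """Keep only names that do not start with an underscore."""
    return [name for name in names if not name.startswith("_")]


print(public_names(SAMPLE_BUILTINS))  # -> ['tojson', 'keys', 'min', 'now']
```

Because every `_impl` name in the tables also begins with `_`, the single prefix test covers both cases.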

Yes. empty and not are very much like keywords. path() is rather special, but since one can and probably would write useful functions as special as path() using path(), I'm not sure how much benefit one might get from treating path() as special in the syntax highlighter, but I tend to think it's better to treat it as special.

We should be guided here by what people do for highlighting LISPs. As with LISP we have syntax, special forms, and everything else. And because arguments are closures, all jq functions can behave in ways more akin to LISP macros. If a LISP loop macro is highlighted differently than user-defined function/macro calls, then perhaps we should do the same for any jq functions that are "special" in some sense that relates to how likely it is that it will help to do so.

I saw a paper this weekend [http://ppig.org/sites/default/files/2015-PPIG-26th-Sarkar.pdf] about syntax highlighting that showed that it is useful indeed. In particular it helps reduce the need for a reader's eyes to fall on keywords very often, and helps the reader reduce the amount of context they need to parse a program/expression. Formatting/indentation seems to play a similar role (though the authors of that paper didn't look into that, but their code samples were all properly indented).

Unless we do our own studies of different variations of syntax highlighting for jq :), there is going to be a lot of subjectivity in your choice of what builtins to highlight as special and which ones not to.

@wtlangford Builtins with _ prefixes should be made not available for binding outside the builtins themselves. However, until we implement that, we should definitely highlight such builtins as erroneous tokens -- it won't bother jq developers.

@nicowilliams Sounds good, but what do you mean by erroneous tokens?

Though, of course, jq users should be allowed to define their own private functions for their modules... We might want to think more carefully about visibility rules. We might need a package-like way to group symbols from multiple related files.

@wtlangford I mean: angry red :)

Angry red is certainly one way to do it. I was going to comment on how to approach that in a more generic, module-oriented way of doing it, but I felt this thread was perhaps not the right place.

Specials

type path | and ;
What about paths, delpaths, getpath, and setpath?

Keywords

| 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- |
| if | then | elif | else | and |
| or | not | empty | try | catch |
| null | end | reduce | as | label |
| break | foreach | import | include | module |
| modulemeta | | | | |

Booleans

true false

Errors

Things that start with _ in the builtins file. I noticed that everything that has _impl at the end also starts with _, so that is two birds with one stone.

Functions

Functions moving to keywords:

in while error stderr del debug

Repeaters

Should these have keyword or function highlighting?
repeat recurse until recurse_down iterables range

More possible keywords:

These feel like keywords to me
with_entries to_entries from_entries nth has env

What about these

get_jq_origin get_prog_origin

All the functions

| 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- |
| add | all | any | arrays | ascii_downcase |
| ascii_upcase | booleans | bsearch | capture | combinations |
| del | error | finites | first | flatten |
| *from_entries* | fromdate | fromdateiso8601 | fromstream | group_by |
| gsub | in | index | indices | input |
| inputs | inside | isfinite | *iterables* | join |
| last | leaf_paths | limit | map | map_values |
| match | max_by | min_by | normals | *nth* |
| nulls | numbers | objects | *paths* | *range* |
| *recurse* | *recurse_down* | *repeat* | *reverse* | rindex |
| scalars | scalars_or_empty | scan | select | sort_by |
| split | splits | strings | sub | test |
| *to_entries* | todate | todateiso8601 | tostream | transpose |
| truncate_stream | unique | unique_by | *until* | values |
| walk | while | *with_entries* | tojson | fromjson |
| tonumber | tostring | keys | keys_unsorted | startswith |
| endswith | ltrimstr | rtrimstr | split | explode |
| implode | setpath | getpath | delpaths | *has* |
| contains | length | utf8bytelength | type | isinfinite |
| isnan | isnormal | infinite | nan | sort |
| min | max | error | format | *env* |
| get_search_list | *get_prog_origin* | *get_jq_origin* | modulemeta | debug |
| stderr | strptime | strftime | mktime | gmtime |
| now | input_filename | input_line_number | | |

del should be in the same category as paths, delpaths, getpath, and setpath, which aren't very special (certainly not as special as path).

I don't think the repeaters should be special at all. Ditto with*, from*, to*. So perhaps just four colors for idents/keywords: one for keywords and keyword-alikes, one for specials, one for all other builtins, and one for user-defined idents. $ident should probably get its own color, so that's five colors. Maybe ; should get keyword status, maybe a separate color.
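The five-color breakdown proposed here can be sketched as a small lookup. This is a Python illustration only: the group names and the abbreviated word sets are assumptions chosen for the example, not the lists any actual syntax file uses.

```python
# Abbreviated, illustrative sets; a real syntax file would carry full lists.
KEYWORD_LIKE = {
    "if", "then", "elif", "else", "end", "reduce", "foreach", "as", "def",
    "empty", "not",
}
SPECIALS = {"path"}
BUILTINS = {
    "map", "select", "del", "paths", "getpath", "setpath", "delpaths",
    "recurse", "to_entries",
}


def color_group(token):
    """Map a token to one of the five proposed groups (hypothetical names)."""
    if token.startswith("$"):
        return "variable"      # $ident gets its own color
    if token in KEYWORD_LIKE:
        return "keyword"       # keywords and keyword-alikes
    if token in SPECIALS:
        return "special"
    if token in BUILTINS:
        return "builtin"       # all other builtins
    return "user"              # user-defined identifiers


print([color_group(t) for t in ["$x", "if", "path", "del", "my_filter"]])
```

Checking the `$` prefix first keeps a variable such as `$path` from being mistaken for the special `path`.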

Once you ship it we can start discussing improvements to that breakdown.

Can't wait! :)

@vito-c -- It seems to me that a syntax highlighter for jq should clearly distinguish between ordinary filters (anything that can be written as NAME or NAME(...)) on the one hand, and filters having a special syntactic form on the other, notably:

and; or; if/then/elif/else/end; reduce/as; foreach/as

Notice that, from a syntactic point of view, "not", "recurse", "repeat", "while" and "until" are quite ordinary, as indeed are "debug", "empty", etc.

Conversely, lumping "null" into the keyword category is problematic for at least two reasons:

  1. "null" is not a keyword (in the sense, for example, that 'def null: 0;' is syntactically valid);
  2. "null" is syntactically and semantically like "true" and "false" (see lexer.l for syntax; and semantically, they're all named constants).

In summary, it seems to me that whatever else it does, the coloring scheme:

  • should make it very easy to spot non-ordinary filters;
  • should NOT classify "null" as a "keyword".

Thanks!

true, false, and null are rather special, being IDENTs in the parser, but also immediately turned into the corresponding constants in the parser:

$ jq -n 'def null: true; null'
null

How about that.

Whether something should be highlighted one way or the other strikes me as a subjective call that should be based on what minimizes context switching for jq programmers (see linked paper). To me true, false, and null are a lot more special than foo or fromjson, and it's useful to call attention to them as in other languages. But also, too many colors would be distracting, and too many "keywords" would be as well. We'll have to iterate over this to find something that works well for most.

Submitted for feedback :)
https://github.com/vito-c/jq.vim

Beautiful! Thanks!

For src/builtin.jq this is a bit painful because all the functions there are builtins, so it's a sea of yellow and light-blue; for other .jq files it's better. Either way it's better than no highlighting! I'll play with the color scheme at some point. Green for semicolons is a bit weird, as I want either those or commas to stand out. Hmmm, I guess commas should stand out more. Well, we should all play with it and see.

Should we link to your repo from the jq README and site?

> I'll play with the color scheme at some point.

I used molokai when I was creating the theme

> Should we link to your repo from the jq README and site?

sure thing! I just wanted to try and give back to the wonderful tool you all have created :+1:

See #1032.

@nicowilliams https://github.com/vito-c/jq.vim if you want to put it somewhere on the site for people to find feel free :)

Many thanks!
I wasn't aware that this syntax file was available :-(

@nicowilliams can we post it on the wiki somewhere? :D

I can post on the vim mailing list to see how I can get it added to the default install too if that would help

@vito-c - A Q has been added to the FAQ: https://github.com/stedolan/jq/wiki/FAQ#editor-bindings
