Bazel: Please add a Skylark builtin for parsing JSON to a dict (useful for repo rules)

Created on 14 Sep 2017  ·  12 Comments  ·  Source: bazelbuild/bazel

When writing repository rules that inspect the state of the local system, I commonly want to run some local command and inspect its machine-readable output. The rule would then make configuration decisions based on available features, versions, installed plugins, and so on.

Nearly all tools that offer machine-readable version data use JSON, or at least offer it as an option. Rather than try to hack my way through with .split and .index, it would be much nicer if I could take a JSON string and have Bazel parse it directly into a dict within Skylark. A builtin json_to_dict(str) -> dict function would do nicely.

Using cmake as an example, I want to get its version using repository_ctx.execute():

$ cmake -E capabilities
{"version":{"isDirty":false,"major":3,"minor":9,"patch":1,"string":"3.9.1","suffix":""}}

... and then use that (3,9,1) tuple to decide whether it's recent enough, or should be ignored in favor of an external repository.
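
For illustration only, the decision logic described above could look like the following Python sketch, in which `json.loads` stands in for the requested builtin. The names `cmake_version` and `is_recent_enough` are hypothetical, and the minimum version passed in is an arbitrary assumption; in a real repository rule the string would come from repository_ctx.execute().

```python
import json

# Output of `cmake -E capabilities`, copied from the example above.
CAPABILITIES = '{"version":{"isDirty":false,"major":3,"minor":9,"patch":1,"string":"3.9.1","suffix":""}}'


def cmake_version(capabilities_json):
    """Return (major, minor, patch) from the capabilities JSON string."""
    version = json.loads(capabilities_json)["version"]
    return (version["major"], version["minor"], version["patch"])


def is_recent_enough(capabilities_json, minimum):
    """True if the reported version is at least `minimum`.

    Tuples compare element by element, so (3, 9, 1) >= (3, 5, 0).
    """
    return cmake_version(capabilities_json) >= minimum
```

With a parsed dict, the check is a plain tuple comparison; nothing has to be cut out of the string by hand.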

There was a previous issue open about this (https://github.com/bazelbuild/bazel/issues/1813), but it had unclear motivations and was closed without action taken.

P2 area-ExternalDeps team-XProduct feature request

All 12 comments

@laurentlb , @vladmos : WDYT?

FTR, simply for kicks and because I was curious about using JSON content as filters in an aspect rule, I've written this toy program to parse JSON using a pushdown automaton:

https://github.com/erickj/bazel_json/blob/master/lib/json_parser.bzl

some examples of how to use it here:
https://github.com/erickj/bazel_json/blob/master/test/json_parse_tests.bzl

It needs quite a bit more work, but it should be a good start for anyone looking for basic JSON parsing.

Where does the input come from? Are you inserting the string inside a .bzl file and you want to parse it? In that case, why do you use json?

In general, parsing/transforming json should be done during execution (outside Skylark).

> Where does the input come from? Are you inserting the string inside a .bzl file and you want to parse it? In that case, why do you use json?

See the original post -- the JSON is coming from an external binary, which uses it as a machine-readable output format. Even if cmake wrote out Skylark-compatible Python expressions, it'd be hard to parse them from Skylark.

> In general, parsing/transforming json should be done during execution (outside Skylark).

That's fine for normal builds, but for repository_rule implementations it would be nice if we could do at least basic parsing without having to shell out to some very large dependency like Python.

+1 for use during repository rule execution. Using the parser I linked above, I've found it useful to read and write intermediate build info in JSON, then just execute cat -> json_parse to get a structured object.

Also, I've recently found a convenient pattern: using JSON to serialize highly structured data, so that providers can act as a pseudo-interface between macros and repository rule implementations. For example, I recently wrote a macro that creates a list of structs, in which each struct (defined as a provider) is composed of several mixed scalar types, a list of strings, and a dict. This list of structs acts as a declarative DSL for downloading, "installing", and aliasing some bootstrap dependencies in a repository rule.

AFAIU, using attrs to pass this structure into a repository rule would mean deconstructing the data into flattened string lists, string dicts, etc. and then reconstructing it, OR building some weird intermediate rule-based representation so that I could use labels to point to my structures. (Please let me know if I'm wrong here, you'd literally make my week.)

In any case, JSON makes it super easy: struct.to_json() -> attr.string_list -> MyProvider(**parse_json), and voilà, I have my data on both sides of the repo rule.
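
A rough Python sketch of that round trip, under stated assumptions: `Dependency` is a hypothetical stand-in for the provider, `encode` plays the macro side, and `decode` plays the repository rule side. The field names and the example URL are made up for illustration.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for a provider describing one bootstrap dependency:
# mixed scalars, a list of strings, and a dict.
Dependency = namedtuple("Dependency", ["name", "executable", "urls", "aliases"])


def encode(deps):
    """Macro side: one JSON string per record, which fits a string-list attribute."""
    return [json.dumps(dep._asdict(), sort_keys=True) for dep in deps]


def decode(encoded):
    """Repository rule side: rebuild the records from the JSON strings."""
    return [Dependency(**json.loads(item)) for item in encoded]


deps = [
    Dependency(
        name="tool",
        executable=True,
        urls=["https://example.com/tool.tar.gz"],
        aliases={"t": "tool"},
    ),
]
round_tripped = decode(encode(deps))
```

The nested list and dict survive the trip unchanged, which is the part that flattened string attributes cannot express directly.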

> Even if cmake wrote out Skylark-compatible Python expressions, it'd be hard to parse them from Skylark.

But it's possible to generate an object like

config = {
  "foo": {
    ...
  },
  ...
}

and just load this instance. An even cleaner version would be to generate and use struct.
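
One possible reading of this suggestion, sketched in Python: a helper tool writes a file containing a single assignment, which a .bzl file could then load. `render_bzl` is a hypothetical name, and the sketch assumes that Python's repr() of dicts, lists, strings, ints, True/False and None is also valid Skylark, which should hold for simple values but is not guaranteed for every string.

```python
def render_bzl(name, value):
    """Render `value` as a single top-level assignment, e.g. `config = {...}`.

    Assumption: repr() of plain dicts, lists, strings, ints and booleans
    is also a valid Skylark literal.
    """
    return "%s = %r\n" % (name, value)


# Example data; the keys are illustrative only.
data = {"foo": {"major": 3, "minor": 9, "dirty": False}}
generated = render_bzl("config", data)
```

This only helps when the tool producing the data can be changed or wrapped; it does not help with a binary that already emits JSON.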

Running into this: coursier spits out a JSON map of Maven artifact dependencies. Having a built-in Starlark construct (even if only in repository rules) can reduce a lot of the work to write special tools, or use jq, to parse such data.

cc @andyscott

Moving to the ExternalDeps team, because this feature would be useful in the context of workspaces (e.g. with repository_ctx.execute()).

@dkelmer wdyt?

FWIW, rules_jvm_external has been using @erickj's bazel-json parser in Starlark and we haven't run into any issues with it since the beginning of the project.

That's good to hear, @jin - I was looking at it as well. Glad to hear you're having good success with it.

There are now a couple of implementations of a JSON API in non-Bazel tools built on Starlark. To see if we can settle on a reasonable shared JSON API, I sent https://github.com/bazelbuild/starlark/pull/83 as a proposed standard JSON module in the Starlark spec.
