When writing repository rules that inspect the state of the local system, I commonly want to run some local command and inspect its machine-readable output. The rule would then make configuration decisions based on available features, versions, installed plugins, and so on.
Nearly all tools that offer machine-readable version data use JSON, or at least offer it as an option. Rather than try to hack my way through with `.split` and `.index`, it would be much nicer if I could take a JSON string and have Bazel parse it directly into a `dict` within Skylark. A builtin `json_to_dict(str) -> dict` function would do nicely.
Using `cmake` as an example, I want to get its version using `repository_ctx.execute()`:

```
$ cmake -E capabilities
{"version":{"isDirty":false,"major":3,"minor":9,"patch":1,"string":"3.9.1","suffix":""}}
```
... and then use that `(3, 9, 1)` tuple to decide whether it's recent enough, or should be ignored in favor of an external repository.
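To make the request concrete, here is a sketch of the repository rule I'd like to be able to write. `json_to_dict` is the proposed builtin (it does not exist today), and `local_cmake`, the `(3, 9, 0)` threshold, and the `BUILD` contents are all illustrative:

```python
def _local_cmake_impl(ctx):
    result = ctx.execute(["cmake", "-E", "capabilities"])
    caps = json_to_dict(result.stdout)  # proposed builtin, not a real API yet
    v = caps["version"]
    if (v["major"], v["minor"], v["patch"]) >= (3, 9, 0):
        # Host cmake is recent enough: expose it from this repository.
        ctx.symlink(ctx.which("cmake"), "cmake")
        ctx.file("BUILD", 'exports_files(["cmake"])')
    else:
        fail("host cmake is too old; fall back to an external repository")

local_cmake = repository_rule(implementation = _local_cmake_impl)
```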
There was a previous issue open about this (https://github.com/bazelbuild/bazel/issues/1813), but it had unclear motivations and was closed without action taken.
@laurentlb, @vladmos: WDYT?
FTR - simply for kicks, and because I was curious about using JSON content as filters in an aspect rule... I've written this toy program to parse JSON using a pushdown automaton:
https://github.com/erickj/bazel_json/blob/master/lib/json_parser.bzl
Some examples of how to use it are here:
https://github.com/erickj/bazel_json/blob/master/test/json_parse_tests.bzl
It needs quite a bit more work, but it should be a good start for anyone looking for basic JSON parsing.
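For the curious, usage looks roughly like this (assuming the repository is mapped as `@bazel_json`; see the linked tests for the authoritative API):

```python
load("@bazel_json//lib:json_parser.bzl", "json_parse")

caps = json_parse('{"version": {"major": 3, "minor": 9, "patch": 1}}')
major = caps["version"]["major"]  # => 3
```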
Where does the input come from? Are you inserting the string inside a `.bzl` file and you want to parse it? In that case, why do you use JSON?
In general, parsing/transforming JSON should be done during execution (outside Skylark).
> Where does the input come from? Are you inserting the string inside a `.bzl` file and you want to parse it? In that case, why do you use JSON?
See the original post -- the JSON is coming from an external binary, which uses it as a machine-readable output format. Even if `cmake` wrote out Skylark-compatible Python expressions, it'd be hard to parse them from Skylark.
> In general, parsing/transforming JSON should be done during execution (outside Skylark).
That's fine for normal builds, but for `repository_rule` implementations it would be nice if we could do at least basic parsing without having to shell out to some very large dependency like Python.
+1 for use during repository rule execution. Using the parser I linked above, I've found it useful to read and write intermediate build info in JSON, then just execute `cat` -> `json_parse` to get a structured object.
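In a repository rule, that pattern comes out to something like the sketch below (`build_info.json` is an illustrative file name, and `repository_ctx.read` stands in for the `cat`):

```python
load("@bazel_json//lib:json_parser.bzl", "json_parse")

def _read_build_info(ctx):
    # An earlier step wrote intermediate build info as JSON; read it back
    # and parse it into a structured dict.
    content = ctx.read(ctx.path("build_info.json"))
    return json_parse(content)
```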
Also, I've recently found a convenient pattern in using JSON to serialize highly structured data, enabling providers as a pseudo-interface between macros and repository rule impls. E.g., I recently wrote a macro that creates a list of structs, in which each struct (defined as a provider) is composed of several mixed scalar types, a list of strings, and a dict. This list of structs acts as a declarative DSL for downloading, "installing", and aliasing some bootstrap dependencies into a repository rule.
AFAIU, using attrs to pass this structure into a repository rule would require deconstructing and then reconstructing the data into flattened string lists, string dicts, etc., OR constructing some weird intermediate rule-based representation so that I could use labels to point to my structures. (Please let me know if I'm wrong here; you'd literally make my week.)
In any case, JSON makes it super easy: `struct.to_json()` -> `attr.string_list` -> `MyProvider(**json_parse(...))`, and voilà, I have my data on both sides of the repo rule.
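A minimal sketch of that round trip, assuming the `json_parse` linked above (`MyProvider`, `my_repo`, and the field names are illustrative):

```python
load("@bazel_json//lib:json_parser.bzl", "json_parse")

MyProvider = provider(fields = ["name", "urls", "aliases"])

# Macro side: serialize each struct into a string attribute.
def my_repo(name, deps):
    _my_repo(
        name = name,
        dep_specs = [d.to_json() for d in deps],  # deps is a list of structs
    )

# Repository rule side: reconstruct providers from the strings.
def _my_repo_impl(ctx):
    specs = [MyProvider(**json_parse(s)) for s in ctx.attr.dep_specs]
    # ... download, "install", and alias based on specs ...

_my_repo = repository_rule(
    implementation = _my_repo_impl,
    attrs = {"dep_specs": attr.string_list()},
)
```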
> Even if `cmake` wrote out Skylark-compatible Python expressions, it'd be hard to parse them from Skylark.
But it's possible to generate an object like

```python
config = {
    "foo": {
        ...
    },
    ...
}
```

and just `load` this instance. An even cleaner version would be to generate and use `struct`.
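For example, the repository rule can emit a generated `.bzl` that consumers `load` (all names here are illustrative):

```python
def _cmake_info_impl(ctx):
    ctx.file("BUILD", "")  # empty package so the generated file is loadable
    ctx.file("config.bzl", "config = struct(major = 3, minor = 9, patch = 1)\n")

cmake_info = repository_rule(implementation = _cmake_info_impl)

# Elsewhere, after declaring cmake_info(name = "cmake_info") in the WORKSPACE:
#   load("@cmake_info//:config.bzl", "config")
#   config.major  # => 3
```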
Running into this: `coursier` spits out a JSON map of Maven artifact dependencies. Having a built-in Starlark construct (even if only in repository rules) would cut out a lot of the work of writing special tools, or using `jq`, to parse such data.
cc @andyscott
Moving to the ExternalDeps team, because this feature would be useful in the context of workspaces (e.g. with `repository_ctx.execute()`).
@dkelmer wdyt?
FWIW, rules_jvm_external has been using @erickj's bazel-json parser in Starlark and we haven't run into any issues with it since the beginning of the project.
That's good to hear, @jin - I was looking at it as well. Glad you're having good success with it.
There are now a couple of implementations of a JSON API in non-Bazel tools built on Starlark. To see if we can settle on a reasonable shared JSON API, I sent https://github.com/bazelbuild/starlark/pull/83 as a proposed standard JSON module for the Starlark spec.
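The module proposed there exposes `json.encode`, `json.decode`, and `json.indent`; roughly:

```python
v = json.decode('{"version": {"major": 3, "minor": 9}}')
minor = v["version"]["minor"]  # => 9

s = json.encode({"major": 3, "minor": 9})  # => '{"major":3,"minor":9}'
```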