Using JSON/YAML files for configuration is a common practice. Being able to read these files at compile time (into a usable structure) could be quite useful. A personal use case for me would be allowing YAML to define configuration that can later be used to register services/parameters within https://github.com/athena-framework/dependency-injection.
Currently this is only half possible: you can read in the file, but it'll be a StringLiteral, so you can't really use it in any meaningful fashion. https://github.com/crystal-lang/crystal/issues/8835 might be a solution, as you could just define a custom macro method to do it, but that itself still has quite a bit of discussion yet to be had.
I think this would be pretty easy to do. The YAML and JSON modules could be leveraged for the actual parsing; the only macro code needed would transform the YAML::Any/JSON::Any values into macro *Literal nodes. This feature would also be simple parsing, returning a HashLiteral or ArrayLiteral for example. There are no types in macro land, so (de)serialization isn't really required.
The compiler already seems to have a dependency on the JSON module, so that wouldn't be a problem. However, YAML support would be a new, non-ideal dependency, as it would add a libyaml requirement. YAML would be the more ideal format for the configuration file, though. I'm sure we could think of something.
I have a working proof of concept here: https://github.com/crystal-lang/crystal/compare/master...Blacksmoke16:yaml-json-macro-parse
test.yml

```yaml
---
name: foo
firewalls:
  main:
    pattern: ^/admin
ids:
  - 1
  - 2
```
test.json

```json
{
  "name": "foo",
  "ids": [
    1,
    2
  ]
}
```
test.cr

```crystal
{% begin %}
  {% data = read_yaml "./test.yml" %}
  {% pp data %}
  {% pp data["name"] %}
  {% pp data["ids"].map &.+(1) %}
{% end %}

{% begin %}
  {% data = read_json "./test.json" %}
  {% pp data %}
  {% pp data["name"] %}
  {% pp data["ids"].map &.+(1) %}
{% end %}
```

Output:

```
{"name" => "foo", "firewalls" => {"main" => {"pattern" => "^/admin"}}, "ids" => [1_i64, 2_i64]}
"foo"
[2_i64, 3_i64]
{"name" => "foo", "ids" => [1_i64, 2_i64]}
"foo"
[2_i64, 3_i64]
```
I thought about this a bit more. I think I'd be more in favor of having parse_json/parse_yaml methods. Then it would also support parsing from non-file sources, plus it wouldn't have to touch the other read_file methods. I.e. you could do {% data = parse_json read_file "./data.json" %}.
I think this is way too much for macros to do.
What's the concrete use case? Can you show some real example?
@asterite Sure, I can think of a few.
The main one(s) I was thinking of are related to my DI shard. Being able to do this would allow me to expand that shard's features quite a bit (and even more with #8835). The two examples I can think of ATM are:
E.g. you would have a YAML file like:
```yaml
parameters:
  some.key: value
  my.api.key: abc123
```
Then given a service definition like:
```crystal
@[ADI::Register(_api_key: "%my.api.key%")]
class ApiClient
  def initialize(@api_key : String); end
end
```
This would generate:
```crystal
class ServiceContainer
  private getter api_client : ApiClient { ApiClient.new "abc123" }

  def initialize
    # Work around for https://github.com/crystal-lang/crystal/issues/7975.
    Athena::DependencyInjection::ServiceContainer
  end
end
```
I will admit this is possible without this feature; however, IMO it's better to hardcode the resolved params into the container so that the file doesn't need to be shipped with the binary, nor would the file need to be read multiple times (as the container is instantiated frequently).
In addition to parameters, this feature would also expose the rest of the configuration file _while_ the container is being built, i.e. it could be used to determine _how_ the container gets built. An example of this is Athena's configuration file. If the user didn't want CORS support, they wouldn't have a file or would omit the cors key. I could pick up on this key being missing and not even add that listener class to the container, as opposed to now, where it gets added and invoked but just no-ops based on the missing key.
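The CORS case could be sketched roughly like this, as pseudocode against the proposed macro method (read_yaml here is the proposed API, not an existing one, and ADI::Register/CorsListener stand in for the real Athena types):

```crystal
# Pseudocode sketch — read_yaml is the *proposed* macro method.
{% begin %}
  {% config = read_yaml "./athena.yml" %}
  {% if config["cors"] %}
    # Only compile the listener into the container when the key is present.
    @[ADI::Register]
    class CorsListener
      # ...
    end
  {% end %}
{% end %}
```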
I think ideally this logic would live as a compiler pass (like the ChangeNamePass within https://github.com/crystal-lang/crystal/pull/9091#issuecomment-615435665). When paired with #8835, I could see the data read from the configuration being passed to each compiler pass. Each feature/component (shard) could read different keys to determine how to build out that component's services.
This would especially be useful for configuring built in services, like database connection:
```yaml
orm:
  default_connection: master
  connections:
    master:
      driver: mysql
      host: '%my_app.db.host%' # Notice this can also use parameters
      port: 3306
      dbname: '%my_app.db.database%'
      user: '%my_app.db.user%'
      password: '%my_app.db.password%'
      charset: UTF8
      server_version: 5.7
```
From here I could use a compiler pass specific to the ORM component to register these connections so they could be injected for DB operations in other services. For now I could probably read them in and manually register them via code.
Yes, I could probably do this without this feature as well, but it would move the registration of these services from a compile-time operation to a runtime one. As mentioned before, this would have some less-than-ideal implications. Plus, given the file is known at compile time, I don't see a reason to _not_ try to make everything I can a compile-time operation.
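For comparison, the runtime alternative might look something like this minimal sketch (ConnectionConfig and its field set are illustrative, and the YAML is inlined for self-containment rather than read from a file):

```crystal
require "yaml"

# Hypothetical record for a resolved connection; a real one would carry more fields.
record ConnectionConfig, driver : String, host : String, port : Int32

config = YAML.parse <<-YAML
  orm:
    default_connection: master
    connections:
      master:
        driver: mysql
        host: localhost
        port: 3306
  YAML

conn = config["orm"]["connections"]["master"]
cfg = ConnectionConfig.new(
  driver: conn["driver"].as_s,
  host: conn["host"].as_s,
  port: conn["port"].as_i,
)
pp cfg
```

The cost being that the parse now happens on every program start (or every container instantiation) instead of once at compile time.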
Outside of my main area, I'm sure others could think of use cases for this. It essentially would enable the consumption of structured data to generate Crystal code; where that data could be generated as part of a build process before building the binary.
I just don't understand why this needs to be specified in a JSON/YAML file as opposed to defining it in an array/hash and passing that to a macro...
@asterite My thinking was that they're already common configuration formats. It essentially would be a "Code as Configuration" approach. YAML/JSON is easier to write/read/update than actual Crystal code. Plus it also adds an abstraction layer, allowing me/you to change how the configuration data is processed separately from the data itself.
You could use macro run to do it too.
@asterite I thought about that, but it returns a MacroId, so even if you parse the JSON in there and return it, you wouldn't be able to iterate on it or anything.
Or do you mean generate all the container code and return that instead?
Ah, nevermind. I don't know why I'm discussing all of this, sorry.
Making the compiler require yaml, thus libyaml, is not a good idea, at all.
In a way, #9091 could allow to extend the macro syntax to fit this kind of cases.
> Or do you mean generate all the container code and return that instead?
That's quite legit and what I would recommend over making the macro language more and more complex. Actually, it's what I do for crystal-gobject; imagine if I tried to implement all of that in macro code!
Yea run is prob the way to go. I'll have to play around with it.
@jhass How do you handle the generation exactly? Like would you have to use ECR to do it or string concatenation etc?
I started out with essentially string concatenation, just printing the bits and pieces to stdout. Recently I refactored that a little: https://www.github.com/jhass/crystal-gobject/tree/main/src/crout.cr. See also https://www.github.com/jhass/crystal-gobject/tree/main/samples/crout/crout.cr
@jhass Cool thanks! I'll have to play around with porting the container generation. Guess it means I'll actually need to create types for stuff now :P.
The main problem/challenge I see is going to be how to handle annotation-related stuff. I guess there's no reason I couldn't use macros in this run file as well to map annotation values to structs, then just build everything at the end... Guess I got myself a new weekend project.
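A first cut at the run-based generator might start out something like this minimal sketch (the file name, the inlined YAML, and the constant-naming scheme are all made up for illustration; a real generator would File.read the config path):

```crystal
# generate_parameters.cr — invoked at compile time from another file via:
#   {{ run("./generate_parameters.cr").id }}
# It prints Crystal source that hardcodes the resolved parameter values,
# so the YAML file never needs to ship with the binary.
require "yaml"

config = YAML.parse <<-YAML
  parameters:
    some.key: value
    my.api.key: abc123
  YAML

config["parameters"].as_h.each do |key, value|
  # e.g. `my.api.key` becomes a MY_API_KEY constant.
  const_name = key.as_s.upcase.gsub('.', '_')
  puts %(#{const_name} = #{value.as_s.inspect})
end
```

Since run's output is pasted into the call site, printing valid Crystal source is all the generator needs to do.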
> Making the compiler require yaml, thus libyaml, is not a good idea, at all.
Yea for sure. Imma give the run approach a go and can close this for now, given it's prob not the best approach.