Vega-lite: Layers with shared encodings

Created on 15 Feb 2018  ·  27 Comments  ·  Source: vega/vega-lite

From reading the documentation, it seems there's no easy way to have multiple marks share the same encodings without repeating them. For example, in this spec from the docs:

{
  "data": {"url": "data/stocks.csv"},
  "layer": [
    {
      "mark": "line",
      "encoding": {
        "x": {"field": "date", "type": "temporal"},
        "y": {"field": "price", "type": "quantitative"},
        "color": {"field": "symbol", "type": "nominal"}
      }
    },
    {
      "mark": "rule",
      "encoding": {
        "y": {
          "field": "price",
          "type": "quantitative",
          "aggregate": "mean"
        },
        "size": {"value": 2},
        "color": {"field": "symbol", "type": "nominal"}
      }
    }
  ]
}

the color encoding is identical in the two layers, and most of the y encoding is the same too. To avoid this kind of repetition, it would be nice if the layers could "inherit" encodings from the top level, possibly augmenting them along the way:

{
  "data": {"url": "data/stocks.csv"},
  "encoding": {
    "y": {"field": "price", "type": "quantitative"},
    "color": {"field": "symbol", "type": "nominal"}
  },
  "layer": [
    {
      "mark": "line",
      "encoding": {
        "x": {"field": "date", "type": "temporal"}
      }
    },
    {
      "mark": "rule",
      "encoding": {
        "y": { "aggregate": "mean" },
        "size": {"value": 2}
      }
    }
  ]
}
Area - Macro / Composite Mark Help Wanted

All 27 comments

Ironically this is easier in regular Vega, because you define scales at the top level and then reference them within individual marks.

Thanks. We're thinking about this too, but haven't decided whether to do it. (Likely yes, in the future)

That said, I don't know if I agree that this is easier in Vega, since you still have to (1) manually create your scale(s) and (2) refer to them multiple times in multiple marks, so it's about equally laborious.

> That said, I don't know if I agree that this is easier in Vega since you still have to (1) manually create your scale(s)

That part is of course more laborious in Vega; I just meant the ability to share a definition.

> and (2) refer to them multiple times in multiple marks, so it's about equally laborious.

The point is less about brevity and more that you don't want to repeat specification logic in multiple places and allow the copies to potentially get out of sync with each other. Essentially, DRY.

We agree with the sentiment and just haven't gotten round to specifying and implementing this. Altair and other DSLs built on Vega-Lite also make it easier to share components. Closing in favor of https://github.com/vega/vega-lite/issues/1274#issuecomment-365461548.

@domoritz -- https://github.com/vega/vega-lite/issues/1274#issuecomment-365461548 is a bit different though.

That one is useful if you want exactly the same encoding, transform, etc. This one is for when you want layers to share only some parts of the encoding. So I think they are two different proposals that should be considered separately.

The goal is to reduce the size of the spec. I disagree with the suggestion in this issue, as it makes it harder for readers to track what's going on. I think the right approach, following the principles that we have established for Vega-Lite, is to use repeat.

Repeat is beside the point here, because in this use case you want to augment a shared encoding with additional encodings.

Algebraically it's like
layer([{mark:line, encoding:{x,y}}, {mark: point, encoding:{x,y, color}}]) =
layer([{mark:line}, {mark: point, encoding:{color}}], shared_part={encoding:{x,y}})
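
To make the identity concrete, here is a minimal Python sketch. Python is only used for illustration, and `factor_shared` is a hypothetical helper, not part of Vega-Lite. It pulls the channel definitions that are identical in every layer into a shared part, then checks that merging them back reproduces the explicit layers. Channel definitions are placeholder strings, as in the algebra above.

```python
def factor_shared(layers):
    """Split layers into (shared_encoding, residual_layers).

    A channel counts as shared only if every layer defines it identically.
    Hypothetical helper for illustration; not a Vega-Lite API.
    """
    encodings = [layer.get("encoding", {}) for layer in layers]
    if not encodings:
        return {}, []
    first = encodings[0]
    shared = {
        channel: definition
        for channel, definition in first.items()
        if all(enc.get(channel) == definition for enc in encodings[1:])
    }
    residual = []
    for layer in layers:
        rest = {c: d for c, d in layer.get("encoding", {}).items() if c not in shared}
        stripped = {k: v for k, v in layer.items() if k != "encoding"}
        if rest:
            stripped["encoding"] = rest
        residual.append(stripped)
    return shared, residual


explicit = [
    {"mark": "line", "encoding": {"x": "X", "y": "Y"}},
    {"mark": "point", "encoding": {"x": "X", "y": "Y", "color": "C"}},
]
shared, residual = factor_shared(explicit)

# Merging the shared part back into each layer restores the explicit form.
restored = [
    {**layer, "encoding": {**shared, **layer.get("encoding", {})}}
    for layer in residual
]
```

Here `shared` holds x and y, the point layer keeps only color, and `restored` compares equal to `explicit`.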

One could implement this with a special mark type, e.g. shared, common or template, from which marks of the same layer would inherit.

Another more flexible possibility would be a sort of parent / child composition operator.

@g3o2 I'm not sure what a specification would look like with your proposals.


Continuing conversation from above.

Although I don't agree with @domoritz about repeat, I agree with him that encoding in layer makes it harder for readers to track what's going on, especially for specs with multi-level nesting.

We intentionally constrain config to the top level only, to avoid multi-level config cascading, which makes it hard to implement correct behavior and also leads to confusing specs. So this may cause similar problems (but for encoding instead).

One question: if we allow encoding to be extracted and merged, what about mark, selection, and config? For config, I would say we should definitely not allow it to be included as part of a subspec.
For selection, I don't know what that should mean. For mark and selection there may be less need, but it may be unclear to users why certain properties of a single-view spec can be extracted and some cannot.

Another question: if layer supports this, shouldn't vconcat and repeat support it too? (I would say no, because it would definitely make it easier to end up with specs that have multi-level nesting.) If we're to support this at all, I would only allow it for layer.

While I agree with DRY, there is a limit to DRY too. For example, people would normally say don't write a function unless you use it more than twice or your code is really long. The example above is a bit tedious, but not that bad. I'd also like to see a real-world, common use case where repetition is so tedious that we need to allow extraction. If there is a set of convincing examples, we might set aside the other issues above.

That said, without clear answers to these questions, I don't think we want to rush to implement this request. (Plus we have many other issues to fix.) I'm not saying that we definitely should not do this. But just like with any other feature, we need to think carefully about the implications of a new feature for other parts of the language.

I fully agree with @kanitw.

I think this issue raises a wider point about using JSON to represent specs. Personally, I see the request in the OP as a useful one, but as a higher-level language operation (inheritance / composition) rather than one that relates to the semantics of a specification. I see JSON primarily as an interchange format for specifications, and language libraries like Altair (and in my case elm-vega) as environments for higher-level operations.

Trying to debug a large CSS file provides a lesson in the difficulties of forcing inheritance into a system that doesn't support it well. And do many people author directly in JSON for anything beyond the simplest visualizations?

A use-case I have quite frequently is the need to be able to render a vega-lite spec at a range of sizes. In addition to the obvious width/height, it involves scaling stroke widths, font sizes, symbol sizes and sometimes presence/absence of legends, titles etc. I can do that quite straightforwardly by parameterising functions that generate vega-lite specs, but there is no easy way to do this within the JSON spec itself.
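
As a rough illustration of that kind of parameterised generator, here is a minimal Python sketch. The function name `stock_chart`, its parameters and the base sizes (stroke width 2, font sizes 10 and 11) are all assumptions made up for this example; the properties it sets (`width`, `height`, `strokeWidth`, `legend`, and the axis `labelFontSize` / `titleFontSize` config) are ordinary Vega-Lite properties.

```python
import json


def stock_chart(width, height, scale=1.0, show_legend=True):
    """Build a Vega-Lite spec (as a dict) whose visual sizes follow `scale`.

    Name, parameters and base sizes are illustrative assumptions.
    """
    color = {"field": "symbol", "type": "nominal"}
    if not show_legend:
        color["legend"] = None  # serialised as JSON null, which hides the legend
    return {
        "data": {"url": "data/stocks.csv"},
        "width": width,
        "height": height,
        "mark": {"type": "line", "strokeWidth": 2 * scale},
        "encoding": {
            "x": {"field": "date", "type": "temporal"},
            "y": {"field": "price", "type": "quantitative"},
            "color": color,
        },
        "config": {
            "axis": {"labelFontSize": 10 * scale, "titleFontSize": 11 * scale},
        },
    }


# The same chart at two very different sizes.
thumbnail = stock_chart(150, 100, scale=0.5, show_legend=False)
poster = stock_chart(1200, 800, scale=3.0)
thumbnail_json = json.dumps(thumbnail, indent=2)
```

The generated JSON stays a plain, flat spec; all of the size logic lives in the host language.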

I'd rather see the vega-lite schema kept as simple as possible and close to a graphical grammar, even if this can sometimes involve some repetition. Higher-level languages and software can then apply the higher-level optimisations.

I also fully agree with @jwoLondon :-)

Altair just showed off how this can work in https://github.com/altair-viz/altair/pull/450.

Well said, @jwoLondon. That makes a lot of sense.

This gist illustrates the above issue for a slope graph, which is not that uncommon.

I am not sure if its size is tedious enough though. In addition, with a modern code editor, the necessary redundancies can be quite conveniently managed.

Certainly JS offers all the flexibility one could want for preparing specs which efficiently reuse common subparts. My use case involves allowing users to edit their own charts directly in the JSON format, so suggestions about preparing the spec in JS don't really help me 🤷‍♂️

I suppose I could "implement" this feature myself as a preprocessing step.

The example provided by @g3o2 is actually quite tedious.

@pelotom Maybe we will support this (only for layer, but not concat, repeat) in the future. But right now, we don't have the bandwidth to think it through. So yeah, for now please feel free to do your own pre-processing step.

Some more thoughts from the shower (always my extra source of creative energy, lol).

Earlier we argued against encoding in layer for readability reasons. But repeating things five times, as in @g3o2's example, actually makes the spec harder to read than a shared encoding would. (For example, it's hard to know whether x and y are actually the same in all 5 layers.)

Moreover, there can be a problem when users customize different properties of an encoding channel:

For example, the following spec has a customized axis format on the first layer. Since layering uses a shared axis by default, the format is actually applied to the shared axis.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "description": "Google's stock price over time.",
  "data": {"url": "data/stocks.csv"},
  "transform": [{"filter": "datum.symbol==='GOOG'"}],
  "layer": [{
    "mark": "line",
    "encoding": {
      "x": {"field": "date", "type": "temporal", "axis": {"format": "%Y"}},
      "y": {"field": "price", "type": "quantitative"}
    }
  }, {
    "mark": "circle",
    "encoding": {
      "x": {"field": "date", "type": "temporal"},
      "y": {"field": "price", "type": "quantitative"}
    }
  }]
}

Suppose a user wants to use a log scale for y. It can be added to either layer, and the compiler will apply the scale type to the shared scale. Ideally the user would add "scale": {"type": "log"} to the first layer, where the other customization already lives, but not everybody will do that.

Now imagine there are 5 layers and the user (or a collaborator) keeps adding customizations to different parts of the spec. The result is a spec that's arguably harder to read than one with a consolidated encoding at the top, at the layer spec level.
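
To show how quickly customizations scatter, here is a small Python sketch. `customization_sites` and the `CUSTOMIZATION_KEYS` tuple are hypothetical names made up for this example. It lists which layers set `axis`, `scale` or `legend` for each channel, using the two-layer spec above with the log scale added to the second layer.

```python
CUSTOMIZATION_KEYS = ("axis", "scale", "legend")  # assumed set of properties to track


def customization_sites(spec):
    """Map (channel, property) -> indices of the layers that set that property."""
    sites = {}
    for index, layer in enumerate(spec.get("layer", [])):
        for channel, definition in layer.get("encoding", {}).items():
            for key in CUSTOMIZATION_KEYS:
                if key in definition:
                    sites.setdefault((channel, key), []).append(index)
    return sites


spec = {
    "layer": [
        {
            "mark": "line",
            "encoding": {
                "x": {"field": "date", "type": "temporal", "axis": {"format": "%Y"}},
                "y": {"field": "price", "type": "quantitative"},
            },
        },
        {
            "mark": "circle",
            "encoding": {
                "x": {"field": "date", "type": "temporal"},
                # A collaborator adds the log scale here instead of on the first layer.
                "y": {"field": "price", "type": "quantitative", "scale": {"type": "log"}},
            },
        },
    ]
}

sites = customization_sites(spec)
layers_touched = {index for indices in sites.values() for index in indices}
```

With only two layers, the shared axis and the shared scale are already configured in two different places; a reader has to scan every layer to learn how the shared x and y actually behave.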

> I'd rather see the vega-lite schema kept as simple as possible and close to a graphical grammar even if this can sometimes involve some repetition. Higher level languages and software can then apply the higher level optimisations.

I agree with this point. Vega-Lite shouldn't offer as much syntactic sugar as higher-level tools like Altair and elm-vega. But I think we shouldn't offer so little that visualizations exported from Altair and elm-vega become hard to read.

For this issue, I'm starting to think that we are hitting that boundary.


That said, I won't have time to implement this yet.

If we reach an agreement about this, here are roughly the steps to make it work, in case someone wants to work on this:

  • [ ] Add a new class ExtendedLayerSpec, which extends GenericLayerSpec with encoding
    • [ ] Add encoding to LayerSpec, with a description saying that this is the base encoding for all underlying layers
    • [ ] Make TopLevelExtendedSpec use ExtendedLayerSpec instead of GenericLayerSpec
  • [ ] Modify the normalizeLayer method, which is our preprocessing step for layer
    • [ ] Make the method take ExtendedLayerSpec as input instead of GenericLayerSpec, extract the encoding, and use it to extend the encoding of nested specs. Note that nested specs can be either LayerSpec or UnitSpec, so it's a bit tricky there.
  • [ ] Add tests
  • [ ] Add documentation
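
The normalization step in that list could look roughly like the following Python sketch. This is not the Vega-Lite implementation (which is TypeScript); `merge_encoding` and `normalize_layer` are hypothetical stand-ins, and the channel-by-channel merge rule where the inner spec wins is an assumption based on the `"y": {"aggregate": "mean"}` example in the original post. Plain dicts stand in for LayerSpec (has `layer`) and UnitSpec (has `mark`).

```python
import copy


def merge_encoding(parent, child):
    """Merge two encoding dicts channel by channel; the child's properties win."""
    merged = copy.deepcopy(parent)
    for channel, definition in child.items():
        merged[channel] = {**merged.get(channel, {}), **copy.deepcopy(definition)}
    return merged


def normalize_layer(spec, inherited=None):
    """Push `encoding` found on layer specs down into the unit specs below them."""
    encoding = merge_encoding(inherited or {}, spec.get("encoding", {}))
    result = {k: copy.deepcopy(v) for k, v in spec.items() if k not in ("encoding", "layer")}
    if "layer" in spec:
        # Layer spec: drop its own encoding and recurse into every child.
        result["layer"] = [normalize_layer(child, encoding) for child in spec["layer"]]
    elif encoding:
        # Unit spec: it receives the accumulated encoding.
        result["encoding"] = encoding
    return result


# The proposed shorthand from the original post.
shared_spec = {
    "data": {"url": "data/stocks.csv"},
    "encoding": {
        "y": {"field": "price", "type": "quantitative"},
        "color": {"field": "symbol", "type": "nominal"},
    },
    "layer": [
        {"mark": "line", "encoding": {"x": {"field": "date", "type": "temporal"}}},
        {"mark": "rule", "encoding": {"y": {"aggregate": "mean"}, "size": {"value": 2}}},
    ],
}

# The fully spelled-out spec from the docs.
explicit_spec = {
    "data": {"url": "data/stocks.csv"},
    "layer": [
        {
            "mark": "line",
            "encoding": {
                "x": {"field": "date", "type": "temporal"},
                "y": {"field": "price", "type": "quantitative"},
                "color": {"field": "symbol", "type": "nominal"},
            },
        },
        {
            "mark": "rule",
            "encoding": {
                "y": {"field": "price", "type": "quantitative", "aggregate": "mean"},
                "size": {"value": 2},
                "color": {"field": "symbol", "type": "nominal"},
            },
        },
    ],
}

expanded = normalize_layer(shared_spec)
```

Under these assumptions `expanded` equals `explicit_spec`. One caveat of the simple merge rule: if the shared encoding maps a channel to a field and a layer overrides it with a constant `value`, the merged definition contains both, so a real implementation would need a rule for that case.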

I like the original suggestion a lot.

Also worth pointing out that ggplot2 has something like that: you can either specify your aes in the ggplot call and then it is the default for every layer, or you can specify an aes for every layer.

I think for the Julia wrapper (VegaLite.jl) we would almost certainly need a way to specify encodings only once for a plot that has multiple layers. We could roll our own complete story that propagates default encodings, but so far we have managed to keep the Julia wrapper a VERY thin wrapper around the original, and it would be nice to keep it that way.

Unless @domoritz says otherwise, I'd say please feel free to submit a PR for this -- see https://github.com/vega/vega-lite/issues/3384#issuecomment-366491364 for how to fix this.

I don't have the bandwidth to contribute actual PRs here. I hope it is ok to leave feedback on what would help us on the Julia side, even though I can't help out here directly. I certainly have no expectation that these things just happen :)

If we support layer, we should also support other compositions. See https://stackoverflow.com/questions/49094436/vega-lite-accessing-repeat-variable-to-use-as-filter for a use case.

> If we support layer, we should also support other compositions. See https://stackoverflow.com/questions/49094436/vega-lite-accessing-repeat-variable-to-use-as-filter for a use case.

I looked at it, but this example doesn't convince me that we need similar support for other compositions. If we had a more general repeat (https://github.com/vega/vega-lite/issues/2518), the example you posted would only need encoding in layer, not in concat.

I'm just saying that generalized repeat and this issue have similar intentions.

Well, it's not the same. Repeat can't extend encoding mapping.

Exactly. They are different, but both allow for shorter specs; one may work better than the other depending on the case.

Hurray, that is fantastic!
