Vega-lite: Support expression for formatting axis

Created on 27 Jun 2019 · 31 Comments · Source: vega/vega-lite

We already have format types for number and time. We could introduce a flexible expression formatter. This would resolve https://github.com/vega/vega-lite-api/issues/12.

Related issue: https://github.com/vega/vega/issues/608

cc @curran

Area - Visual Encoding RFC / Discussion

Can you describe in more detail how formatType: 'expression' would work?

I'm confused since expression is more general than format (e.g., it can include multiple fields).

Motivating context:

At the end of the day, what I'd really like to see is a fork of https://observablehq.com/@mbostock/working-with-wikipedia-data that formats the X ticks of the bar chart using "$90B" rather than "90G".

Copying over my comment from the other issue, here's an example of resultant Vega that we'd ideally want to generate:

  "axes": [
    {
      "scale": "x",
      "orient": "top",
      ...
      "format": "s",
      "encode": {
        "labels": {
          "update": {
            "text": {"signal": "replace(datum.label, 'G', 'B')"}
          }
        }
      }
    },
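To make the signal above concrete, here is a plain-JavaScript emulation of what it does to each axis label. It assumes, as in Vega, that each label datum carries the raw tick value as `datum.value` and the already-formatted string as `datum.label`; the tick data and the `relabel` name are made up for illustration.

```javascript
// Plain-JavaScript emulation of the Vega label signal
// "replace(datum.label, 'G', 'B')". Each datum holds the raw tick value
// (value) and the string produced by the axis "format" (label).
// The tick data below is made up for illustration.

// With a string pattern, Vega's replace() swaps only the first match,
// just like String.prototype.replace.
function relabel(datum) {
  return datum.label.replace("G", "B");
}

const ticks = [
  { value: 0, label: "0" },
  { value: 30e9, label: "30G" },
  { value: 60e9, label: "60G" },
  { value: 90e9, label: "90G" },
];

console.log(ticks.map(relabel)); // [ '0', '30B', '60B', '90B' ]
```

Because only the first match is replaced, a label containing two "G"s would keep the second one; a regular-expression pattern would be needed for a global replace.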

The idea is that you would write a spec like:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
  "data": {
    "values": [
      {"a": "A","b": 28}, {"a": "B","b": 55}, {"a": "C","b": 43},
      {"a": "D","b": 91}, {"a": "E","b": 81}, {"a": "F","b": 53},
      {"a": "G","b": 19}, {"a": "H","b": 87}, {"a": "I","b": 52}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "a", "type": "ordinal"},
    "y": {
        "field": "b",
        "type": "quantitative",
        "axis": {
            "formatType": "expression",
            "format": "+datum.label * 10"
        }
    }
  }
}

We would then put the provided format expression into the axis labels' encode block.

In the implementation, we will need to distinguish format types that vega supports (number and time) from the expression format type.
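The distinction described above could be sketched as a small compile step: number and time formats pass through to Vega untouched, while an expression is moved into the labels' encode block like the Vega snippet earlier in the thread. `compileAxisFormat` is a hypothetical name, and this is only an illustration of the proposal, not Vega-Lite's actual compiler.

```javascript
// Hypothetical compile step for the proposed formatType: "expression".
// Vega's own format/formatType handle only number and time formats, so
// an expression has to be moved into the axis labels' encode block.
function compileAxisFormat(axis) {
  if (axis.formatType !== "expression") {
    // number/time formats pass through to Vega unchanged
    return { ...axis };
  }
  // Drop format/formatType from the Vega axis; any existing encode
  // block is replaced in this simplified sketch.
  const { format, formatType, ...rest } = axis;
  return {
    ...rest,
    encode: { labels: { update: { text: { signal: format } } } },
  };
}

console.log(JSON.stringify(compileAxisFormat({
  orient: "left",
  formatType: "expression",
  format: "+datum.label * 10",
})));
```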

Question -- in your spec example, @domoritz, how would you invoke the regular formatting options? E.g., in @curran's example spec, he wanted to be able to _post-process_ the output of the .3s formatter.

Thanks for the explanation. I think this makes sense.

One case that the expression (without extension) in Vega wouldn't support is when we want to add a unit to only the topmost label, like this:

$90M
 80M
 70M
 ...

I'll file an issue in Vega to see if we can expose more info in the datum besides label and value.
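A short sketch of why this needs more than `label` and `value`: deciding which label gets the unit requires knowing the largest tick, which is information about the whole tick set rather than the current datum. `labelWithTopUnit` and the toy formatter are hypothetical, not part of Vega or d3-format.

```javascript
// Hypothetical helper: prefix a unit only to the largest tick label and
// pad the others so the digits stay aligned. Note that it needs ALL tick
// values, which a per-label expression does not see.
function labelWithTopUnit(values, unit, format) {
  const top = Math.max(...values);
  const pad = " ".repeat(unit.length);
  return values.map((v) => (v === top ? unit : pad) + format(v));
}

// Toy formatter standing in for the axis format (not d3-format).
const toMillions = (v) => `${v}M`;

console.log(labelWithTopUnit([90, 80, 70], "$", toMillions));
// [ '$90M', ' 80M', ' 70M' ]
```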

Question -- in your spec example, @domoritz, how would you invoke the regular formatting options? E.g., in @curran's example spec, he wanted to be able to post-process the output of the .3s formatter.

Good point. This should just be part of the guide encoding (which is a secret feature right now); then we don't have to introduce this as a conflicting formatType.

how would you invoke the regular formatting options

@arvind You can invoke formatting in expressions, for example replace(format(datum.label, 's'), 'G', 'B') (see https://vega.github.io/vega/docs/expressions/#format-functions).

replace(format(datum.label, 's'), 'G', 'B')

I think you mean replace(format(datum.value, 's'), 'G', 'B') (value, not label).
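The correction matters because `datum.label` is already a formatted string, while `format()` expects the raw number in `datum.value`. The sketch below illustrates this with `siFormat`, a hypothetical toy stand-in for d3-format's "s" type (only k/M/G/T prefixes, no rounding or precision handling).

```javascript
// siFormat is a toy, hypothetical stand-in for d3-format's "s" type:
// it handles only the k/M/G/T prefixes and does no rounding.
const PREFIXES = ["", "k", "M", "G", "T"];

function siFormat(x) {
  if (x === 0 || !Number.isFinite(x)) return String(x);
  const exp = Math.floor(Math.log10(Math.abs(x)) / 3);
  const k = Math.min(Math.max(exp, 0), PREFIXES.length - 1);
  return `${x / 10 ** (3 * k)}${PREFIXES[k]}`;
}

const datum = { value: 90e9, label: "90G" };

// Format the raw number, then post-process the resulting string.
console.log(siFormat(datum.value).replace("G", "B")); // 90B

// datum.label is already a formatted string; turning "90G" back into a
// number yields NaN, so formatting it a second time is meaningless.
console.log(siFormat(Number(datum.label))); // NaN
```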


One problem with formatType: 'expression' is that it will still be either inconsistent or weird for the text channel's "format" property.

If text's formatType only supports 'number' | 'time', then it's inconsistent with formatType in axis/legend.

But if text's formatType supports 'expression', it will be weird because we have to refer to datum.<field_name> instead of datum.value or datum.label.

For example, imagine:

encoding: {
  ...
  text: {field: 'a', type: 'quantitative', format: "replace(format(datum.a, 's'), 'G', 'B')", formatType: 'expression'}
}

Basically, the field: "a" part is becoming redundant at this point.


Also, the transition from

format: 's'

to

format: "replace(format(datum.value, 's'), 'G', 'B')"
formatType: 'expression'

is a bit drastic (not incremental).

Perhaps, if we allow title, labels, ticks, domain, and grid to be encoding objects (as we may want to do in #5056) for setting the underlying encode block (https://github.com/vega/vega-lite/issues/2907), we could do something like:

format: 's'
labels: {
  text: {expr: "replace(datum.label, 'G', 'B')"}
}

which may offer a smoother transition since part of the original spec is still kept.

However, the downside of this approach is that the format would be separate from the label text, and labels.text.expr is still a bit obscure.

Alternatively, we could introduce a separate formatExpr:

format: 's',
formatExpr: "replace(datum.label, 'G', 'B')"

which is simpler, but still splits formatExpr from format.

Given they should be together, I wonder if we should eliminate formatType and group everything in a format object that can combine number: string or time: string with expr like:

format: {
 number: 's',
 expr: "replace(datum.label, 'G', 'B')"
}

This seems nicer but still has an inconsistency issue for the text encoding's format.

Perhaps, we can get around that by always replacing datum.value in text encoding's format expression with datum.<field_name> (and similarly replace datum.label with format(datum.<field_name>, ...)).
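The rewrite suggested above could be sketched as a single-pass string substitution. `rewriteTextFormatExpr` is a hypothetical helper, and the approach assumes the expression refers to the datum only through the literal tokens `datum.value` and `datum.label`.

```javascript
// Hypothetical rewrite for a text-channel format expression:
//   datum.value -> datum["<field>"]
//   datum.label -> format(datum["<field>"], <number format>)
// A single regex pass ensures already-substituted text is never re-scanned.
function rewriteTextFormatExpr(expr, field, numberFormat) {
  const ref = `datum[${JSON.stringify(field)}]`;
  const formatted = `format(${ref}, ${JSON.stringify(numberFormat)})`;
  // \b on both sides avoids matching e.g. "datum.values" or "mydatum.value".
  return expr.replace(/\bdatum\.(label|value)\b/g, (_, name) =>
    name === "label" ? formatted : ref
  );
}

console.log(rewriteTextFormatExpr("replace(datum.label, 'G', 'B')", "a", "s"));
console.log(rewriteTextFormatExpr("replace(format(datum.value, 's'), 'G', 'B')", "a", "s"));
```

The first call yields `replace(format(datum["a"], "s"), 'G', 'B')` and the second `replace(format(datum["a"], 's'), 'G', 'B')`, so both axis-style forms collapse to the explicit field reference used in the text-channel example earlier.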

FWIW, none of our examples currently use formatType, but it's probably useful for formatting time data that was cast to an ordinal type.

Btw, thinking about the need for axis/legend encode blocks, I think the secret encoding block that we have is probably overkill and introduces unnecessary complexity. (See https://github.com/vega/vega-lite/issues/2907#issuecomment-506884844)

Maybe this could be a use case for vega-label?

For data exploration, the SI system in vega-lite really just works fine. For data explanation, repeating the measurement unit in each axis label instead of mentioning it in the axis title is hardly more readable.

What do you think about passing in a JavaScript function as part of the Vega spec? This would allow arbitrary JavaScript (e.g. post-process the result from the format function using any other function). It also feels simpler than anything I've seen proposed here.

What do you think about passing in a JavaScript function as part of the Vega spec?

Hi @curran, that's precisely the purpose of Vega's expression functions: giving us an escape hatch rather than continuously expanding the surface area of the visualization language itself. And, importantly, having our own expression parser allows us to control and sandbox allowable functions, a necessary feature for deploying Vega/Vega-Lite specifications in security-conscious environments like Wikipedia. If the expression functions do not cover a desired feature, our preference would be to introduce new functions (rather than enable wholly arbitrary JavaScript). Hope that makes sense!

Plus, you can use expressions in other languages such as Python with Altair.

Excellent! It's great to know the "escape hatch" is there. It's also great to hear the reasoning behind the desire to introduce new functions to the sandboxed environment, which makes total sense.

However, I still would love to be able to write this, from the Vega-Lite JS API:

const xAxisTickFormat = number =>
  d3.format('.3s')(number)
    .replace('G', 'B');

vl.markBar()
  .data({values: d3.zip(names, totals)})
  .encode(
    vl.y().fieldN("0").sort(null).axis({title: null}),
    vl.x().fieldQ("1").axis({orient: "top", format: xAxisTickFormat, title: "Total revenue (est.)"})
  )
  .width(width)
  .autosize({type: "fit-x", contains: "padding"})
  .render()

Perhaps the JS API could internally rewrite or transform the spec such that the function passed in is invoked via Vega's expression functions. This would make the JS API more usable, as developers could use what they _already know_, rather than learning an entirely different language that they _do not already know_. However, this development would _only_ make the JS API more usable and would not improve the core Vega-Lite spec at all, so the audience for this improvement would be limited (it would exclude Altair users, for example). I can understand it would not be a high priority in the grand scheme of things.

Food for thought! Thanks all for your time here. I really appreciate your efforts.

Another strategy to handle this specific use case could be to file a feature request with d3-format to allow localising the SI letters via the d3.formatLocale function (https://github.com/d3/d3-format/blob/master/README.md#formatLocale). This would then apply downstream to the Vega ecosystem.

this would make the JS API more usable, as developers could use what they already know, rather than learning an entirely different language that they do not already know.

It's worth noting that Vega's expression language is just a subset of JavaScript. Thus, supporting arbitrary JS format functions won't be a high priority for us for now.

The format expression is simply for labels.

I think a better alternative is to make labels: boolean become

labels: boolean | {expr: ...}

Then we don't have to mess with format (and also can reuse results from format via datum.label).

We can then do:

"axis": {
  "labels": {"expr": "replace(datum.label, 'G', 'B')"}
}

to replace G with B as discussed above.

Or we can even make it labels: boolean | string and do:

"axis": {
  "labels": "replace(datum.label, 'G', 'B')"
}

though it's a bit less obvious that we have expression support here.

I see. Yes, I agree that the expression should be a separate property then. However, I don't think we want to reuse the existing labels property. It is currently described as "A boolean flag indicating if labels should be included as part of the axis." If we made it an expression, I would read it as an expression that returns a boolean to either show or hide a particular label.

How about we add a new property value, text, or expr?

From a Slack conversation, @domoritz and I settled on adding a new property named labelExpr, so we can do:

"axis": {
  "labelExpr": "replace(datum.label, 'G', 'B')"
}
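One way to picture what labelExpr compiles to is the sketch below, which assumes it lands in the same labels encode block as the Vega snippet at the top of the thread. `lowerLabelExpr` is a hypothetical name, not Vega-Lite's internal function. The key difference from a formatType-based design is that `format` is kept, so Vega still computes `datum.label` before the expression runs.

```javascript
// Sketch: lower a Vega-Lite axis labelExpr into a Vega axis encode block.
// `format` stays on the axis, so datum.label is still the formatted string
// that the expression post-processes. Existing encode entries are merged.
function lowerLabelExpr(axis) {
  if (axis.labelExpr === undefined) return { ...axis };
  const { labelExpr, ...rest } = axis;
  const encode = rest.encode || {};
  const labels = encode.labels || {};
  return {
    ...rest,
    encode: {
      ...encode,
      labels: {
        ...labels,
        update: { ...labels.update, text: { signal: labelExpr } },
      },
    },
  };
}

console.log(JSON.stringify(lowerLabelExpr({
  orient: "top",
  format: "s",
  labelExpr: "replace(datum.label, 'G', 'B')",
}), null, 2));
```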

Fixed in #5260

Will labelExpr also work with text, for formatting the text mark layer? Also, the documentation link on how to use custom format types (https://vega.github.io/vega-lite/usage/config.html#custom-format-type) is broken.

Here are the docs: https://vega.github.io/vega-lite/docs/config.html#custom-format-type. I'm fixing the links right now.

Does this work in the tooltip as well? If not, possible to add?

"tooltip": [
  {
    "field": "cumulative_downloads",
    "format": ".4s",
    "type": "quantitative",
    "labelExpr": "replace(datum.label, 'M', 'A')"
  },

Tooltip is not a guide, so the property isn't called labelExpr. You can use a formatExpr.

Any examples I could reference? Thanks!

Sorry, I misremembered and spoke too soon. There is no formatExpr, and I don't know why labelExpr shows up in autocomplete since there is no axis.

What you need to do is to either create a custom formatter (https://vega.github.io/vega-lite/docs/config.html#custom-format-type) or derive a new field using the calculate transform and use that field in the tooltip.
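The calculate-transform route could look roughly like the spec below, written as a JavaScript object. The field `cumulative_downloads`, the ".4s" format, and the 'M' to 'A' edit come from the question above; the sample rows, the `day` field, the line mark, and the derived field name `downloads_label` are made up for illustration.

```javascript
// Sketch of the calculate-transform workaround for tooltips.
// Sample rows, "day", the mark, and "downloads_label" are hypothetical.
const spec = {
  data: {
    values: [
      { day: 1, cumulative_downloads: 1250000 },
      { day: 2, cumulative_downloads: 2600000 },
    ],
  },
  transform: [
    {
      // Vega expression: format the number, then post-process the string
      calculate: "replace(format(datum.cumulative_downloads, '.4s'), 'M', 'A')",
      as: "downloads_label",
    },
  ],
  mark: "line",
  encoding: {
    x: { field: "day", type: "quantitative" },
    y: { field: "cumulative_downloads", type: "quantitative" },
    // Show the pre-formatted string instead of the raw number
    tooltip: [{ field: "downloads_label", type: "nominal" }],
  },
};

console.log(JSON.stringify(spec, null, 2));
```

Because the derived field is already a string, the tooltip entry uses type "nominal" and needs no format of its own.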

Thank you for raising the issue @jbleich89. You found a bug that will be fixed in https://github.com/vega/vega-lite/pull/7039.
