Vega-lite: Support expression for formatting axis

Created on 27 Jun 2019 · 31 comments · Source: vega/vega-lite

We already have format types for number and time. We could introduce a flexible expression formatter. This would resolve https://github.com/vega/vega-lite-api/issues/12.

cc @curran

Area - Visual Encoding RFC / Discussion

domoritz

All 31 comments

Can you describe in more detail how formatType: 'expression' would work?

I'm confused since expression is more general than format (e.g., it can include multiple fields).

kanitw on 27 Jun 2019

curran on 28 Jun 2019

Motivating context:

At the end of the day, what I'd really like to see is a fork of https://observablehq.com/@mbostock/working-with-wikipedia-data that formats the X ticks of the bar chart using "$90B" rather than "90G".

curran on 28 Jun 2019

👍1

Copying over my comment from the other issue, here's an example of resultant Vega that we'd ideally want to generate:

  "axes": [
    {
      "scale": "x",
      "orient": "top",
      ...
      "format": "s",
      "encode": {
        "labels": {
           "update": {
             "text": {"signal": "replace(datum.label, 'G', 'B')"}
           }
        }
      }
    },

arvind on 28 Jun 2019

The idea is that you would write a spec like

{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
  "data": {
    "values": [
      {"a": "A","b": 28}, {"a": "B","b": 55}, {"a": "C","b": 43},
      {"a": "D","b": 91}, {"a": "E","b": 81}, {"a": "F","b": 53},
      {"a": "G","b": 19}, {"a": "H","b": 87}, {"a": "I","b": 52}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "a", "type": "ordinal"},
    "y": {
        "field": "b",
        "type": "quantitative",
        "axis": {
            "formatType": "expression",
            "format": "+datum.label * 10"
        }
    }
  }
}

We would then use the provided format in the encode block.

In the implementation, we will need to distinguish the format types that Vega supports natively (number and time) from the expression format type.
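As a rough illustration of that distinction, here is a minimal sketch of how a compile step might branch on formatType. The helper name compileAxisFormat and its return shape are assumptions made for this example; this is not the actual Vega-Lite compiler code.

```javascript
// Illustrative sketch only: `compileAxisFormat` is a hypothetical helper,
// not a function from the Vega-Lite code base.
const NATIVE_FORMAT_TYPES = new Set(["number", "time"]);

function compileAxisFormat(axis) {
  const { format, formatType } = axis;
  if (format === undefined) {
    return {}; // nothing to format
  }
  if (formatType === undefined || NATIVE_FORMAT_TYPES.has(formatType)) {
    // Vega handles these itself, so pass the properties through unchanged.
    return formatType === undefined ? { format } : { format, formatType };
  }
  if (formatType === "expression") {
    // Treat the format string as a Vega expression and place it in the
    // encode block for the axis labels.
    return {
      encode: { labels: { update: { text: { signal: format } } } },
    };
  }
  throw new Error(`Unsupported formatType: ${formatType}`);
}

// The axis definition from the spec above.
const vegaAxis = compileAxisFormat({
  formatType: "expression",
  format: "+datum.label * 10",
});
console.log(JSON.stringify(vegaAxis));
```

With this shape, number and time formats keep flowing to Vega untouched, and only the expression case produces an encode block.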

domoritz on 28 Jun 2019

Question -- in your spec example, @domoritz, how would you invoke the regular formatting options? E.g., in @curran's example spec, he wanted to be able to _post-process_ the output of the .3s formatter.

arvind on 28 Jun 2019

👍1

Thanks for the explanation. I think this makes sense.

One case that the expression (without extension) in Vega wouldn't support is when we want to add a unit to only the topmost label, like this:

$90M
 80M
 70M
 ...

I'll file an issue in Vega to see if we can expose more info in the datum besides label and value.

kanitw on 28 Jun 2019

> Question -- in your spec example, @domoritz, how would you invoke the regular formatting options? E.g., in @curran's example spec, he wanted to be able to post-process the output of the .3s formatter.

Good point. This should just be a part of the guide encoding (which is a secret feature right now); then we don't have to introduce this as a conflicting formatType.

kanitw on 28 Jun 2019

> how would you invoke the regular formatting options

@arvind You can invoke formatting in expressions. For example replace(format(datum.label, 's'), 'G', 'B') (see https://vega.github.io/vega/docs/expressions/#format-functions)

domoritz on 28 Jun 2019

👍1

> replace(format(datum.label, 's'), 'G', 'B')

I think you mean replace(format(datum.value, 's'), 'G', 'B') (value, not label).

One problem with formatType: 'expression' is that it will still be either inconsistent or weird for the text channel's "format" property.

If we say, text's formatType is only 'number' | 'time', then it's inconsistent with formatType in axis/legend.

But if text's formatType supports 'expression', it will be weird as we have to refer to datum.<field_name> instead of datum.value or datum.label.

For example, imagine:

encoding: {
  ...
  text: {field: 'a', type: 'quantitative', format: "replace(format(datum.a, 's'), 'G', 'B')", formatType: 'expression'}
}

Basically, the field: "a" part is becoming redundant at this point.

Also, the transition from

format: 's'

to

format: "replace(format(datum.value, 's'), 'G', 'B')"
formatType: 'expression'

is a bit drastic (not incremental).

Perhaps, if we allow title, labels, ticks, domain, and grid to be encoding objects (as we may want to do in #5056) for setting the underlying encode block (https://github.com/vega/vega-lite/issues/2907), we can do something like:

format: 's'
labels: {
  text: {expr: "replace(datum.label, 'G', 'B')"}
}

which may have a smoother transition as some of the original part is still kept.

However, the con of this approach is that the format would be separate from the label text and labels.text.expr is still a bit obscure.

Alternatively, we could introduce a separate formatExpr:

format: 's',
formatExpr: "replace(datum.label, 'G', 'B')"

which is simpler, but still splits formatExpr from format.

Given they should be together, I wonder if we should eliminate formatType and group everything in a format object that can combine number: string or time: string with expr like:

format: {
 number: 's',
 expr: "replace(datum.label, 'G', 'B')"
}

This seems nicer but still has an inconsistency issue for the text encoding's format.

Perhaps, we can get around that by always replacing datum.value in text encoding's format expression with datum.<field_name> (and similarly replace datum.label with format(datum.<field_name>, ...)).
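To make that rewrite concrete, here is a minimal sketch of the substitution. The function name rewriteTextFormatExpr and the naive regex-based approach are assumptions for illustration only; a real implementation would more likely work on the parsed expression so that string literals are left alone.

```javascript
// Hypothetical rewrite step; the function name and the regex approach are
// illustrative and not taken from the Vega-Lite code base.
function rewriteTextFormatExpr(expr, field, numberFormat) {
  // Reference to the underlying field, e.g. datum["a"].
  const fieldRef = `datum[${JSON.stringify(field)}]`;
  // What datum.label stands for on an axis: the formatted field value.
  const formatted = `format(${fieldRef}, ${JSON.stringify(numberFormat)})`;
  // Replacer functions avoid special handling of "$" in the replacement.
  // Caveat: this naive version would also rewrite matches that happen to
  // appear inside string literals of the expression.
  return expr
    .replace(/\bdatum\.label\b/g, () => formatted)
    .replace(/\bdatum\.value\b/g, () => fieldRef);
}

const rewritten = rewriteTextFormatExpr("replace(datum.label, 'G', 'B')", "a", "s");
console.log(rewritten);
```

Applied to the text encoding example above, the user could keep writing datum.label and datum.value, and the field reference would be filled in from field: 'a'.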

kanitw on 28 Jun 2019

FWIW, none of our examples currently use formatType, but it's probably useful for formatting time data that was cast to an ordinal type.

kanitw on 28 Jun 2019

Btw, as I think about the need for an axis/legend encode block, I think the secret encoding block that we have is probably overkill and introduces unnecessary complexity. (See https://github.com/vega/vega-lite/issues/2907#issuecomment-506884844)

kanitw on 28 Jun 2019

Maybe this could be a use case for vega-label?

For data exploration, the SI system in vega-lite really just works fine. For data explanation, repeating the measurement unit in each axis label instead of mentioning it in the axis title is hardly more readable.

g3o2 on 29 Jun 2019

What do you think about passing in a JavaScript function as part of the Vega spec? This would allow arbitrary JavaScript (e.g. post-process the result from the format function using any other function). It also feels simpler than anything I've seen proposed here.

curran on 29 Jun 2019

> What do you think about passing in a JavaScript function as part of the Vega spec?

Hi @curran, that's precisely the purpose of Vega's expression functions: giving us an escape hatch rather than continuously expanding the surface area of the visualization language itself. And, importantly, having our own expression parser allows us to control and sandbox allowable functions, a necessary feature for deploying Vega/Vega-Lite specifications in security-conscious environments like Wikipedia. If the expression functions do not cover a desired feature, our preference would be to introduce new functions (rather than enable wholly arbitrary JavaScript). Hope that makes sense!

arvind on 29 Jun 2019

👍2

Plus, you can use expressions in other languages such as Python with Altair.

domoritz on 29 Jun 2019

Excellent! It's great to know the "escape hatch" is there. It's also great to hear the reasoning behind the desire to introduce new functions to the sandboxed environment, which makes total sense.

However, I still would love to be able to write this, from the Vega-Lite JS API:

const xAxisTickFormat = number =>
  d3.format('.3s')(number)
    .replace('G', 'B');

vl.markBar()
  .data({values: d3.zip(names, totals)})
  .encode(
    vl.y().fieldN("0").sort(null).axis({title: null}),
    vl.x().fieldQ("1").axis({orient: "top", format: xAxisTickFormat, title: "Total revenue (est.)"})
  )
  .width(width)
  .autosize({type: "fit-x", contains: "padding"})
  .render()

Perhaps the JS API could internally rewrite or transform the spec so that the function passed in is invoked via Vega's expression functions. This would make the JS API more usable, as developers could use what they _already know_ rather than learning an entirely different language that they _do not already know_. That said, this would _only_ make the JS API more usable and would not improve the core Vega-Lite spec at all, so the audience for the improvement would be limited (it would exclude Altair users, for example). I can understand that it would not be a high priority in the grand scheme of things.

Food for thought! Thanks all for your time here. I really appreciate your efforts.

curran on 1 Jul 2019

Another strategy to handle this specific use case could be to file a feature request with d3.format to allow localising the SI letters via the d3.formatLocale function (https://github.com/d3/d3-format/blob/master/README.md#formatLocale). This would then apply downstream to the Vega ecosystem.

g3o2 on 2 Jul 2019

👍2

> This would make the JS API more usable, as developers could use what they already know, rather than learning an entirely different language that they do not already know.

It's worth noting that the Vega expression language is just a subset of JavaScript. Thus, supporting arbitrary JS format functions won't be a high priority for us for now.

kanitw on 2 Jul 2019

The format expression is simply for labels.

I think a better alternative is to make labels: boolean become

labels: boolean | {expr: ...}

Then we don't have to mess with format (and also can reuse results from format via datum.label).

We can then do:

"axis": {
  "labels": {"expr": "replace(datum.label, 'G', 'B')"}
}

to replace G with B as discussed above.

kanitw on 31 Jul 2019

Or we can even make it labels: boolean | string and do:

"axis": {
  "labels": "replace(datum.label, 'G', 'B')"
}

though it's a bit less obvious that we have expression support here.

kanitw on 31 Jul 2019

I see. Yes, I agree that the expression should be a separate property then. However, I don't think we want to use the existing labels property. It currently means "A boolean flag indicating if labels should be included as part of the axis." If we make it an expression, I would read it as though the expression returns a boolean to either show or hide a particular label.

How about we add a new property value, text, or expr?

domoritz on 31 Jul 2019

From a Slack conversation, @domoritz and I settled on adding a new property named labelExpr, so we can do:

"axis": {
  "labelExpr": "replace(datum.label, 'G', 'B')"
}
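For context, here is a sketch of a complete spec that combines a regular number format with labelExpr, written as a JavaScript object so it could be handed to a renderer such as vega-embed. The data values and field names are made up for illustration, and the v5 schema URL is an assumption about which Vega-Lite version is in use.

```javascript
// Made-up example data: estimated revenue in dollars.
const labelExprSpec = {
  $schema: "https://vega.github.io/schema/vega-lite/v5.json",
  data: {
    values: [
      { company: "A", revenue: 90e9 },
      { company: "B", revenue: 55e9 },
      { company: "C", revenue: 43e9 },
    ],
  },
  mark: "bar",
  encoding: {
    y: { field: "company", type: "nominal", sort: null, axis: { title: null } },
    x: {
      field: "revenue",
      type: "quantitative",
      axis: {
        orient: "top",
        title: "Total revenue (est.)",
        // The SI format runs first; its output is available as datum.label.
        format: ".3s",
        // Post-process the formatted label: swap the SI "G" for "B".
        labelExpr: "replace(datum.label, 'G', 'B')",
      },
    },
  },
};

console.log(JSON.stringify(labelExprSpec, null, 2));
```

Because format is still applied first, the expression only has to post-process datum.label rather than call format() itself.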

kanitw on 31 Jul 2019

👍1

Fixed in #5260

kanitw on 4 Aug 2019

Will labelExpr also work with text, for formatting the text mark layer? The documentation link on how to use custom format types, https://vega.github.io/vega-lite/usage/config.html#custom-format-type, is broken.

a10k on 29 May 2020

👍1

Here are the docs: https://vega.github.io/vega-lite/docs/config.html#custom-format-type. I'm fixing the links right now.

domoritz on 29 May 2020

👍1

Does this work in the tooltip as well? If not, possible to add?

                "tooltip": [
                    {
                        "field": "cumulative_downloads",
                        "format": ".4s",
                        "type": "quantitative",
                        "labelExpr": "replace(datum.label, 'M', 'A')"
                    },

jbleich89 on 13 Nov 2020

Tooltip is not a guide so the property isn't called labelExpr. You can use a formatExpr.

domoritz on 13 Nov 2020

Any examples I could reference? Thanks!

jbleich89 on 13 Nov 2020

Sorry, I misremembered and spoke too soon. There is no formatExpr and I don't know why labelExpr shows up in the autocomplete since there is no axis.

What you need to do is to either create a custom formatter (https://vega.github.io/vega-lite/docs/config.html#custom-format-type) or derive a new field using the calculate transform and use that field in the tooltip.
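A minimal sketch of the second option (deriving a field with the calculate transform) might look like the following. The data values, the package field, and the derived field name downloads_label are made up for illustration; only cumulative_downloads and the 'M' to 'A' replacement come from the question above.

```javascript
// Made-up example data; `downloads_label` is a hypothetical derived field.
const tooltipSpec = {
  $schema: "https://vega.github.io/schema/vega-lite/v5.json",
  data: {
    values: [
      { package: "example-a", cumulative_downloads: 1234567 },
      { package: "example-b", cumulative_downloads: 7654321 },
    ],
  },
  transform: [
    {
      // Format the number, then post-process the resulting string.
      calculate: "replace(format(datum.cumulative_downloads, '.4s'), 'M', 'A')",
      as: "downloads_label",
    },
  ],
  mark: "bar",
  encoding: {
    x: { field: "package", type: "nominal" },
    y: { field: "cumulative_downloads", type: "quantitative" },
    // The tooltip shows the pre-formatted string, so no format is needed here.
    tooltip: [{ field: "downloads_label", type: "nominal", title: "Downloads" }],
  },
};

console.log(JSON.stringify(tooltipSpec, null, 2));
```

The trade-off is that the formatting logic lives in the transform rather than next to the tooltip channel, but it works without a custom format type.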

domoritz on 13 Nov 2020

👍1

Thank you for raising the issue @jbleich89. You found a bug that will be fixed in https://github.com/vega/vega-lite/pull/7039.

domoritz on 13 Nov 2020

👍2
