Vega-lite: Failed lookup generates `Cannot read property 'geometry' of null` error

Created on 26 Mar 2020 · 10 Comments · Source: vega/vega-lite

Consider performing a lookup that attaches TopoJSON as a secondary data source to a standard data table as the primary source. The specification below shows an example:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {
    "values": [{"fips": 499999, "state":"UT"},
               {"fips": 4, "state":"AZ"},
               {"fips": 8, "state":"CO"},
               {"fips": 35, "state":"NM"}]
  },
  "transform": [
    {
      "lookup": "fips",
      "from": {
        "data": {
          "url": "https://vega.github.io/vega-lite/data/us-10m.json",
          "format": { "type": "topojson","feature": "states"}
        },
        "key": "id"
      },
      "as": "geo"
    }
  ],
  "mark": "geoshape",
  "encoding": {
    "shape": {"field": "geo","type": "geojson"},
    "color": {"field": "state","type": "nominal"}
  }
}

However, if any of the lookups fail (in the example above, Utah's FIPS code is incorrectly given as 499999 instead of 49), the entire specification fails with the error `Cannot read property 'geometry' of null`.

While this can be remedied by filtering out invalid geoshapes with {"filter" : "isValid(datum.geo)"}, this shouldn't be necessary. In contrast, when the TopoJSON is the primary source and the table is secondary, failed lookups simply don't get encoded, with no need for explicit filtering. It would be good to make this the default behaviour when TopoJSON is secondary too.
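To make the failure mode concrete, here is a minimal JavaScript sketch of what happens to the rows. The `lookup` and `isValid` functions are hypothetical stand-ins written for this illustration, not Vega's implementations, and the feature objects are abbreviated placeholders rather than real boundary data.

```javascript
// Hypothetical stand-in for a lookup transform: attach the matching
// secondary record to each primary row, or null when no key matches.
function lookup(rows, secondary, field, key, as) {
  const index = new Map(secondary.map((d) => [d[key], d]));
  return rows.map((row) => ({ ...row, [as]: index.get(row[field]) ?? null }));
}

// Stand-in for the isValid expression function:
// true unless the value is null, undefined, or NaN.
function isValid(value) {
  return value != null && value === value;
}

const table = [
  { fips: 499999, state: "UT" }, // wrong code, so the lookup fails
  { fips: 4, state: "AZ" },
];

// Abbreviated placeholder features (assumed shape, no real coordinates).
const features = [
  { id: 4, type: "Feature", geometry: { type: "Polygon", coordinates: [] } },
  { id: 49, type: "Feature", geometry: { type: "Polygon", coordinates: [] } },
];

const joined = lookup(table, features, "fips", "id", "geo");

// Reading joined[0].geo.geometry would throw a TypeError, because geo is null.
// Filtering first leaves only the rows that can actually be drawn.
const drawable = joined.filter((d) => isValid(d.geo));
console.log(drawable.map((d) => d.state)); // [ 'AZ' ]
```

The lookup itself succeeds for every row; only the later read of `geometry` on the unmatched row fails, which is why a filter between the two steps avoids the error.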

Bug P2

All 10 comments

I was wondering if there was any progress on this. I find it catches out students quite frequently and with a non-specific error message it is quite hard for them to identify the cause.

I just took a look at this issue. Thanks for your patience.

Do you think this is an issue with Vega (the behavior of the lookup transform should change) or Vega-Lite (Vega-Lite should add a filter to the compiled spec)? Your comments make me think that this should be fixed in Vega.

If you agree, I will move the issue.

I think the effect should be that failed lookups are silent, just as they are if the topojson is primary and tabular data are secondary. I don't know enough about the internal architecture to know where the fix would be, but if there was a Vega-Lite filtering of null values and that was easy, that would seem reasonable to me.

It would be fairly easy for us to add the highlighted filter.

[Screenshot showing the highlighted filter]

However, we still get an error: "[Error] Cannot read property 'context' of undefined".

@jheer do you think we should change Vega or add the filter from Vega-Lite?

It looks like the error here concerns accessing a nested object that doesn't exist. If so, the error originates in Vega's field utility, which is not designed to support lookups on non-existent nested objects:

vega.field('foo.bar')({})
// > Uncaught TypeError: Cannot read property 'bar' of undefined

If we decide to add null checks to the vega utilities, here are the two locations that need to be updated:

But I'm wary in general of changing the semantics, as this would affect all of Vega. Moreover, that would only "fix" the error with the property access. The end result of the access would still be null / undefined which might break later downstream logic if it is expecting something else (such as valid GeoJSON data).
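For illustration only, a null-checking accessor might look like the sketch below. `safeField` is a hypothetical helper, not part of Vega, and it handles only simple dot-separated paths. It shows the point made above: the TypeError disappears, but the caller still receives `undefined` where it may expect GeoJSON.

```javascript
// Hypothetical null-safe variant of a nested field accessor.
// Not part of Vega; it only splits the path on "." and ignores
// any other path syntax.
function safeField(path) {
  const keys = path.split(".");
  return (obj) => {
    let value = obj;
    for (const key of keys) {
      if (value == null) return undefined; // stop instead of throwing
      value = value[key];
    }
    return value;
  };
}

const geometryOf = safeField("geo.geometry");
const matched = { geo: { geometry: { type: "Polygon" } } };
const unmatched = { geo: null }; // what a failed lookup leaves behind

console.log(geometryOf(matched));   // { type: 'Polygon' }
console.log(geometryOf(unmatched)); // undefined
```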

Note that the lookup appears to be operating just fine. The problem is that downstream code is expecting a lookup result that is not there. My solution in Vega has always been to filter any lookup failures. Providing a VL option to automatically include a post-lookup filter seems useful, but it would need to be parametrizable (you don't always want to filter!) and I'm not sure what the default setting should be (on or off) -- having it set to "on" might be considered a breaking change.

Thank you for the insight into what Vega changes would be needed.

If we don't want to change Vega and we also don't want to always filter, I would vote for not making any changes here and expecting users to filter explicitly by adding {"filter" : "isValid(datum.geo)"}.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {
    "values": [{"fips": 499999, "state":"UT"},
               {"fips": 4, "state":"AZ"},
               {"fips": 8, "state":"CO"},
               {"fips": 35, "state":"NM"}]
  },
  "transform": [
    {
      "lookup": "fips",
      "from": {
        "data": {
          "url": "https://vega.github.io/vega-lite/data/us-10m.json",
          "format": { "type": "topojson","feature": "states"}
        },
        "key": "id"
      },
      "as": "geo"
    },
    {"filter" : "isValid(datum.geo)"}
  ],
  "mark": "geoshape",
  "encoding": {
    "shape": {"field": "geo","type": "geojson"},
    "color": {"field": "state","type": "nominal"}
  }
}

I am struggling to see a situation in Vega-Lite where you would not want to filter, though. Remember that a failed lookup breaks the entire visualization with a null error. It's true that this could ultimately give a clue that one or more of the lookups has failed, but at present that would not be obvious. Also bear in mind that when looking up the other way, we get no such error at all.

I would have thought that intercepting the null via filtering and issuing a warning about a failed lookup would be much preferable, and more consistent with the way VL works elsewhere.

I think in the geo case you always want to filter, but I can see cases where one would prefer a fallback value rather than filtering out tuples that have no match. Imagine you have a large relation of countries and want to add population data from another dataset. You would probably not want to remove a country just because the dataset you look up in has no data for it. The semantics of lookup is not a join; it augments a dataset with additional data for each tuple.
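The difference between the two semantics can be sketched in plain JavaScript. Both helpers below are hypothetical and the country and population values are made up for the example; the sketch only contrasts "augment and keep every row" with "drop rows that have no match".

```javascript
// Made-up example data: one country has no population record.
const countries = [{ name: "A" }, { name: "B" }, { name: "C" }];
const population = [
  { name: "A", pop: 10 },
  { name: "C", pop: 30 },
];

// Lookup semantics: every primary row survives; misses get a fallback value.
function lookupWithDefault(rows, secondary, key, field, fallback) {
  const index = new Map(secondary.map((d) => [d[key], d[field]]));
  return rows.map((row) => ({
    ...row,
    [field]: index.has(row[key]) ? index.get(row[key]) : fallback,
  }));
}

// Join-like semantics: rows without a match are dropped.
function innerJoin(rows, secondary, key, field) {
  const index = new Map(secondary.map((d) => [d[key], d[field]]));
  return rows
    .filter((row) => index.has(row[key]))
    .map((row) => ({ ...row, [field]: index.get(row[key]) }));
}

const augmented = lookupWithDefault(countries, population, "name", "pop", null);
const joined = innerJoin(countries, population, "name", "pop");

console.log(augmented.length, joined.length); // 3 2
```

With lookup semantics country "B" is kept with a `null` population, so a later step can still list it or substitute a fallback, whereas the join-like version silently removes it.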

I would prefer not to change the semantics of lookup. We could add a new option to add a filter but it's a minor usability improvement and doesn't alleviate the original problem in this issue.

Maybe what we really want is a join transform?

I have an idea. What if we always generate a filter for geoshapes? We could make that part of the filterinvalid we already have. Essentially, we add the filter because of the geo mark, and not because of the lookup.

That sounds like an ideal solution. I don't think there would ever be a situation where a geoshape containing nulls would be needed, so this should have no problematic downstream effects.
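Until something like that exists in the compiler, the same idea can be approximated on the user side. The sketch below is a hypothetical pre-processing helper (`addGeoshapeFilter` is an invented name, and this is not how Vega-Lite implements anything): it appends an `isValid` filter whenever the mark is a geoshape whose shape channel reads from a field, regardless of whether a lookup is present.

```javascript
// Hypothetical helper, not Vega-Lite's implementation: returns a copy of a
// spec with an isValid filter appended when the mark is a geoshape whose
// shape channel reads from a data field. Other specs are returned unchanged.
function addGeoshapeFilter(spec) {
  const markType = typeof spec.mark === "string" ? spec.mark : spec.mark?.type;
  const shapeField = spec.encoding?.shape?.field;
  if (markType !== "geoshape" || typeof shapeField !== "string") return spec;
  const filter = { filter: `isValid(datum[${JSON.stringify(shapeField)}])` };
  return { ...spec, transform: [...(spec.transform ?? []), filter] };
}

// Abbreviated spec for illustration; the lookup's data source is omitted.
const spec = {
  mark: "geoshape",
  transform: [{ lookup: "fips", from: { key: "id" }, as: "geo" }],
  encoding: { shape: { field: "geo", type: "geojson" } },
};

console.log(addGeoshapeFilter(spec).transform[1]);
// { filter: 'isValid(datum["geo"])' }
```

Keying the filter on the mark and its shape field, rather than on the lookup, matches the reasoning in the comment above: the rows are dropped because a geoshape cannot draw a null shape, not because a lookup happened to fail.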
