Deck.gl: pydeck: Closer integration with Python geospatial ecosystem

Created on 14 Apr 2020 · 11 Comments · Source: visgl/deck.gl

Target Use case

Judging from my own experience, I would expect that most, or at least many, users of pydeck use it to visualize data that they've modified with other tools in the Python geospatial ecosystem: for example _Shapely_, which performs planar geometric operations using the underlying GEOS library, or _GeoPandas_, which extends Pandas to integrate with Shapely.

The usual layout of a GeoDataFrame, the main class in GeoPandas, is one feature per row, with a geometry column that contains each row's geometry.

As far as I can tell, there is currently no integration with such existing libraries. Most of the existing pydeck examples (on the website at least) pass URLs to Deck.gl for loading the data. This is understandable, since these examples are ported from the Deck.gl docs, but I don't think it's a true representation of how most users would pass data to Deck.gl.

Proposed feature

There are a couple of ways in which integration could be improved.

__geo_interface__

The __geo_interface__ attribute is a standard implemented by several Python GIS packages, including GeoPandas, Shapely, and others.

As defined here, it essentially allows you to get a GeoJSON representation of data, regardless of the package/class the data is in.

For example, with a GeoDataFrame, what __geo_interface__ returns depends on the object it is accessed on (the whole table, a merged geometry, or a single geometry):

import geopandas as gpd
features = [{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  },
  "properties": {
    "name": "Dinagat Islands"
  }
},{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [125, 10]
  },
  "properties": {
    "name": "Water"
  }
}]
gdf = gpd.GeoDataFrame.from_features(features)
gdf.__geo_interface__
# {'type': 'FeatureCollection',
#  'features': [{'id': '0',
#    'type': 'Feature',
#    'properties': {'name': 'Dinagat Islands'},
#    'geometry': {'type': 'Point', 'coordinates': (125.6, 10.1)},
#    'bbox': (125.6, 10.1, 125.6, 10.1)},
#   {'id': '1',
#    'type': 'Feature',
#    'properties': {'name': 'Water'},
#    'geometry': {'type': 'Point', 'coordinates': (125.0, 10.0)},
#    'bbox': (125.0, 10.0, 125.0, 10.0)}],
#  'bbox': (125.0, 10.0, 125.6, 10.1)}

gdf.geometry.unary_union.__geo_interface__
# {'type': 'MultiPoint', 'coordinates': ((125.0, 10.0), (125.6, 10.1))}

gdf.geometry[0].__geo_interface__
# {'type': 'Point', 'coordinates': (125.6, 10.1)}

Supporting this would enable interconnection with a large part of the ecosystem. You could just check for the presence of the __geo_interface__ attribute on the data object passed to pydeck.
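A minimal sketch of that check. `to_geojson` is a hypothetical helper name (not part of pydeck), and `FakePoint` is a stand-in class so the example runs without GeoPandas or Shapely installed:

```python
def to_geojson(data):
    """Return a GeoJSON-like mapping if `data` implements __geo_interface__.

    Hypothetical helper for illustration. Objects without the attribute are
    returned unchanged, so other inputs (URLs, DataFrames, lists) pass through.
    """
    geo = getattr(data, "__geo_interface__", None)
    if geo is None:
        return data
    return geo


class FakePoint:
    """Stand-in for a geometry object; it only exposes __geo_interface__."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    @property
    def __geo_interface__(self):
        return {"type": "Point", "coordinates": (self.x, self.y)}


print(to_geojson(FakePoint(125.6, 10.1)))
# {'type': 'Point', 'coordinates': (125.6, 10.1)}
print(to_geojson("https://example.com/data.json"))
# https://example.com/data.json
```

Because the check is duck-typed, it would not require pydeck to import or depend on any particular geospatial package.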

However, there are still some open questions. What if a user asks for a LineLayer but passes in a set of Point geometries instead of LineString geometries? Or what if a user asks for a ScatterplotLayer but passes a LineString or Polygon geometry? Should automatic conversion happen? Should there be validation on the Python side?
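If validation on the Python side were chosen, one possible shape for it is sketched below. Everything here is an assumption for illustration: the helper names `geometry_types` and `check_layer_input` are hypothetical, and the `EXPECTED` table is made up for the example rather than taken from deck.gl.

```python
def geometry_types(geojson):
    """Collect the set of geometry type names in a GeoJSON-like mapping."""
    kind = geojson.get("type")
    if kind == "FeatureCollection":
        found = set()
        for feature in geojson.get("features", []):
            found |= geometry_types(feature)
        return found
    if kind == "Feature":
        geometry = geojson.get("geometry")
        return geometry_types(geometry) if geometry else set()
    if kind == "GeometryCollection":
        found = set()
        for geometry in geojson.get("geometries", []):
            found |= geometry_types(geometry)
        return found
    return {kind}


# Hypothetical expectations, for illustration only; not taken from deck.gl.
EXPECTED = {"ScatterplotLayer": {"Point", "MultiPoint"}}


def check_layer_input(layer_type, geojson):
    """Return the geometry types the layer is not expected to handle."""
    allowed = EXPECTED.get(layer_type)
    if allowed is None:
        return set()  # unknown layer type: no opinion
    return geometry_types(geojson) - allowed


line = {"type": "LineString", "coordinates": [(125.0, 10.0), (125.6, 10.1)]}
print(check_layer_input("ScatterplotLayer", line))
# {'LineString'}
```

A non-empty result could then drive either a warning or an automatic conversion, which is exactly the design question raised above.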

Coordinate Reference System

Several packages include metadata on the CRS the data is in. When a user is working with data in a CRS other than WGS84, it's easy to forget to transform the data into the right CRS. It might be possible to print a warning if it's clear the data is not in WGS84, but there might not be a non-hacky way to do this.

Something like

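import warnings

WGS84 = 'EPSG:4326'  # assumption: data.crs compares equal to this string when it is WGS84

if data.__class__.__name__ == 'GeoDataFrame':
    crs = data.crs
    # might not be a non-hacky way to check equality to WGS84
    if crs and crs != WGS84:
        warnings.warn('Data might not be in WGS84')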

To Do List

  • [ ] Add label and assign to milestone
  • [ ] Coding
  • [ ] Doc update
  • [ ] What's new update
  • [ ] Test

cc @ajduberstein

feature pydeck

All 11 comments

+1 as someone who just spent a lot of time wrestling a GeoDataFrame into an acceptable format for pydeck. Very supportive of this!

Same and same! I have a GeoDataFrame and am trying to corral it into a GeoJSONLayer or a PolygonLayer. Would love for this to be officially supported.

@medriscoll any code you could share re: your current process?

@ajduberstein It looks like adding a simple check for, and call to, __geo_interface__ on the data prop on the Python side would be quite easy?

> However, there are still some open questions. What if a user asks for a LineLayer but passes in a set of Point geometries instead of LineString geometries? Or what if a user asks for a ScatterplotLayer but passes a LineString or Polygon geometry? Should automatic conversion happen? Should there be validation on the Python side?

I would start with making this work for the GeoJSONLayer. It is the only layer that accepts GeoJSON; other layers need to have their data processed by the app.

I agree this would be an important addition.

One other question: how to handle property names. For several attributes/accessors, pydeck allows you to specify an attribute for data-driven styling, i.e. something like getFillColor='.color', where color is a column of the table.

But when you call __geo_interface__, all non-geometry columns get wrapped into a _properties_ object. So the .color column is now accessed at .properties.color instead, and the .color reference presumably wouldn't work. (@ajduberstein correct me if necessary).

So is the onus on the user to prefix with .properties, or on the JS side do we check for .properties.color if .color doesn't exist?
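To make the second option concrete, here is a sketch of the lookup order it implies. It is written in Python purely for illustration (the real accessor evaluation would happen on the deck.gl side), and `resolve` is a hypothetical helper name:

```python
def resolve(record, name):
    """Look up `name` on the record itself, then under its GeoJSON properties.

    Raises KeyError if the name is found in neither place.
    """
    if name in record:
        return record[name]
    properties = record.get("properties") or {}  # GeoJSON allows null properties
    return properties[name]


# A plain table row, as produced from a DataFrame.
row = {"color": [255, 0, 0], "coordinates": [125.6, 10.1]}

# The same data after passing through __geo_interface__.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": (125.6, 10.1)},
    "properties": {"color": [255, 0, 0]},
}

print(resolve(row, "color"))      # [255, 0, 0]
print(resolve(feature, "color"))  # [255, 0, 0]
```

With a fallback like this, the same '.color' accessor would keep working whether the data arrives as a table or as GeoJSON features.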

> on the JS side do we check for .properties.color if .color doesn't exist?

We use a general expression parser for JS-like syntax so this should already be supported.

Go to https://deck.gl/playground/, select the GeoJSON example, and see how the expression is written there: "getElevation": "@@=properties.valuePerSqm"

{
  "description": "The deck.gl website GeoJsonLayer (polygons) example in JSON format",
  "websiteUrl": "https://deck.gl/#/examples/core-layers/geojson-layer-polygons",
  "initialViewState": {
    "latitude": 49.254,
    "longitude": -123.13,
    "zoom": 11,
    "maxZoom": 16,
    "pitch": 45,
    "bearing": 0
  },
  "views": [
    {
      "@@type": "MapView",
      "controller": true,
      "mapStyle": "mapbox://styles/mapbox/light-v9"
    }
  ],
  "layers": [
    {
      "@@type": "GeoJsonLayer",
      "data": "https://raw.githubusercontent.com/visgl/deck.gl-data/master/examples/geojson/vancouver-blocks.json",
      "opacity": 0.8,
      "stroked": false,
      "filled": true,
      "extruded": true,
      "wireframe": true,
      "elevationScale": 0.1,
      "getElevation": "@@=properties.valuePerSqm",
      "getFillColor": [
        199,
        233,
        180
      ],
      "getLineColor": [
        255,
        255,
        255
      ]
    }
  ]
}

@mappingvermont Here's the Python snippet I used to convert a GeoDataFrame into a standard data frame that can be passed to pydeck.Layer with 'PolygonLayer' type. Similar to this example here, properties are specified as additional columns in the data frame, and the geometry is contained in a 'coordinates' column.

I pull the polygon coordinates (a tuple of tuples) out of the GeoDataFrame.geometry.__geo_interface__ properties object and convert it to a list of lists, which seems to be what pydeck wants.

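```python
import geopandas as gpd
import pandas as pd

# `gdf` is an existing GeoDataFrame with a 'geometry' column.
# Copy each geometry's coordinates into a plain 'coordinates' column.
gdf['coordinates'] = gdf.apply(
    lambda row: row['geometry'].__geo_interface__['coordinates'],
    axis=1)
# Convert to a standard DataFrame that can be passed to pydeck.Layer.
df = pd.DataFrame(gdf)
```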

> Go to deck.gl/playground and select the GeoJSON example and see how the expression is written there: "getElevation": "@@=properties.valuePerSqm"

That's exactly my point...

When you know you're working with GeoJSON, you need properties.valuePerSqm. But more generally when you're working with a Pandas DataFrame, you don't need the properties prefix. I.e. in this example, it's:

get_weight="profit / employees > 0 ? profit / employees : 0"

When you call __geo_interface__, the table is transformed into GeoJSON, so the profit column would now need to be referenced as properties.profit, so I assume that string wouldn't work.

Re @medriscoll, that's a fine workaround, but it's different from my proposal for pydeck.

Specifically, your snippet converts a geometry column stored as binary to a geometry column stored as GeoJSON. That means that you're still sending a _table_ to pydeck. That's not really something we could do on the pydeck side without adding a dependency on GeoPandas.

The proposal here is to call __geo_interface__ on the opaque object passed in. If that attribute exists, then we call it, transforming the input to GeoJSON.

That works for a GeoDataFrame because the _entire GeoDataFrame_ has a __geo_interface__ attribute, converting the table to a FeatureCollection.

> When you call __geo_interface__, the table is transformed into GeoJSON, so the profit column would now need to be referenced as properties.profit, so I assume that string wouldn't work.

OK, I suppose it would be trivial for pydeck to post-process the GeoJSON after calling __geo_interface__ to move the properties to the root of each feature, although that is not really proper GeoJSON AFAIK. You could also keep the values both on the root and in properties. Whatever makes most sense to Python users here is fine with me.

Glad to see the interest here. I'll put this feature in pydeck 0.5. I'll get this out within six weeks. I'll update this issue when there's a beta published.

> would be trivial for pydeck to post-process the GeoJSON after calling __geo_interface__ to move the properties

@ibgreen Probably going this route to start and we can change the implementation later

@kylebarron Yes, for anyone reading this, my snippet is proposed as a workaround, not a long-term solution :) - the __geo_interface__ path is a great one.
