Google-cloud-python: BigQuery should have a module that follows the DB-API to allow for a SQLAlchemy dialect

Created on 26 Sep 2016  路  10Comments  路  Source: googleapis/google-cloud-python

There is a module to connect SQLAlchemy to BigQuery, but it only has partial support. I'd love to be able to connect to BigQuery using SQLAlchemy this way and be able to play with Caravel as was mentioned in that partial support issue.

To do this properly, we'd need to provide a module with an implementation of a SQLAlchemy dialect I would imagine we'd want to package this separately so as not to add a dependency on SQLAlchemy. Looking more closely at the dialect interface it looks like the database driver would need to implement the Python DB-API.

I'm considering this a low priority feature request, but do think it would be cool.

feature request bigquery p2

Most helpful comment

All 10 comments

This is a good topic, very useful.

@tswast BigQuery's DML support is still marked as experimental: I would want to see some API stability commitment on the back-end before trying to implement DB-API semantics.

I'd like to use this on my projects. Are there plans to build support?

@jkilpatricksc I have a pull request open, but it's not complete. Working on it!

There is some tentative of plugging sql alchemy:
https://pypi.python.org/pypi/sqlalchemy_bigquery/

Having Big Query connector to SQL Alchemy would give a strong incentive to use Big Query

Maintainer on Superset here (https://github.com/airbnb/superset)

@tswast I'm curious to hear how far you made it. It seems like it should be fairly straightforward to mimic DBAPI and to define a SQLAlchemy dialect. Note that the dialect can be incomplete and limited to work for read, as [of course] you wouldn't use DML or expect to use an ORM on top BigQuery. Also there's a need to implement the metadata-fetching methods to allow us to fetch metadata around tables/views/columns/types.

For reference, the pyhive project (https://github.com/dropbox/PyHive) does exactly this (mimic DBAPI and defines a SQLAlchemy dialect) on top of Presto and Hive, and it works very well for Superset and other interfaces. It's really nice to have a common interface (sqlalchemy) to all databases for people building tools that are expected to work against many databases.

I paused my work on https://github.com/GoogleCloudPlatform/google-cloud-python/pull/2921 but am restarting it now. I'm glad to hear there's interest in this.

3460, now merged, added query.num_dml_affected_rows, needed to implement cursor.rowcount.

Was this page helpful?
0 / 5 - 0 ratings