Presto: Implement HBase connector

Created on 1 Sep 2016 · 15 Comments · Source: prestodb/presto

Opening an issue here as I'm working on this and will push a PR soon.

I plan to have a production-ready plugin by the end of the year.

Main design choices:

  • Tables defined in JSON configuration files (like the Kafka/Redis/MongoDB connectors)
  • Splits aligned to regions, with HBase key pruning
  • Parallelism aligned to regions to scale
  • Manage configuration files for HBase (auto-updater)
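
To make the "splits aligned to regions, with key pruning" idea concrete, here is a minimal sketch in Python. It is not code from the connector: `Region`, `Split`, and `plan_splits` are hypothetical names, and the only HBase behaviour assumed is that row keys compare as unsigned bytes and that an empty key marks an open-ended region boundary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    start: bytes  # inclusive; b"" means "start of table"
    end: bytes    # exclusive; b"" means "end of table"
    host: str     # server hosting the region (hypothetical field)


@dataclass(frozen=True)
class Split:
    start: bytes
    end: bytes
    host: str


def _min_end(a: bytes, b: bytes) -> bytes:
    """Smaller of two exclusive end keys, treating b"" as unbounded."""
    if a == b"":
        return b
    if b == b"":
        return a
    return min(a, b)


def plan_splits(regions: List[Region],
                query_start: bytes = b"",
                query_stop: bytes = b"") -> List[Split]:
    """Emit one split per region that overlaps [query_start, query_stop).

    Regions entirely outside the requested key range are pruned, and the
    remaining splits are clipped to the range so each worker scans only
    the keys it needs.
    """
    splits = []
    for region in regions:
        # b"" sorts before every other key, so max() handles an open start.
        start = max(region.start, query_start)
        end = _min_end(region.end, query_stop)
        if end != b"" and start >= end:
            continue  # no overlap: prune this region
        splits.append(Split(start, end, region.host))
    return splits
```

With three regions split at `g` and `p`, a query restricted to keys `h` through `k` would produce a single split on the middle region, while an unrestricted query would produce one split per region.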

Any advice, questions, or tips are welcome (especially if you have started a plugin like me).

All 15 comments

@damiencarol You may find some of the work for Apache Accumulo helpful, currently sitting in https://github.com/prestodb/presto/pull/5030.

@adamjshook yeah, I'm reading the PR code right now

@damiencarol I'm happy to answer any questions or give you any pointers on some BigTable-esque optimizations I've built to improve query times.

@adamjshook Does your connector run in production? Also, did you implement insert/update/delete?

@damiencarol Yes, it's been in production since March/April or so. INSERT is supported, but we use the Java APIs and some tools I've built for higher throughput. Presto doesn't support UPDATE (as far as I know), but you can issue another INSERT statement that shares the same Accumulo row ID and it effectively acts as an update. I haven't implemented DELETE yet -- haven't had a use case come up to drive the effort of implementing it.
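
As a rough illustration of why a second INSERT with the same row ID behaves like an update in a BigTable-style store, here is a toy Python model. It is not the Accumulo or HBase API: `TinyTable` and its methods are hypothetical, and the only assumption is that each cell keeps timestamped versions and a read returns the newest one.

```python
from typing import Dict, List, Tuple


class TinyTable:
    """Toy versioned key-value table: the newest write to a cell wins."""

    def __init__(self) -> None:
        # (row_id, column) -> list of (timestamp, value) versions
        self._cells: Dict[Tuple[str, str], List[Tuple[int, str]]] = {}
        self._clock = 0  # logical timestamp, incremented per insert

    def insert(self, row_id: str, values: Dict[str, str]) -> None:
        """Append a new version for each column; nothing is overwritten."""
        self._clock += 1
        for column, value in values.items():
            self._cells.setdefault((row_id, column), []).append(
                (self._clock, value)
            )

    def read(self, row_id: str) -> Dict[str, str]:
        """Return the latest version of every column stored for row_id."""
        row = {}
        for (rid, column), versions in self._cells.items():
            if rid == row_id:
                row[column] = max(versions, key=lambda v: v[0])[1]
        return row
```

Inserting `{"name": "a", "age": "1"}` and then `{"age": "2"}` under the same row ID leaves `name` untouched and makes reads return the newer `age`, which is the "effective update" described above.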

First naive version here: #6037.
Please be kind, it is a work in progress.

What's the progress?

What's the progress now, please?

We are exploring ways to use Presto to query an HBase table. Can I get an update on where things stand with the HBase connector for Presto, along with any references?

Any news about this part?

There's a PR for an Apache Phoenix connector here [1] which would allow you
to read an HBase table.

[1] https://github.com/prestodb/presto/pull/10536

@JamesRTaylor Is it fair to say that Presto -> HBase (through the Phoenix connector) is fairly nascent and probably not widely used in large production systems? We are looking at doing something like this for a fairly large, web-scale production system, so I wanted to know how hardened it is and what the current usage looks like.

Sounds like the author of the Phoenix connector is using it in production, @ganeshjothikumar, but you should ask him to confirm. Since neither the HBase connector nor the Phoenix connector is part of Presto yet, I'd imagine that they're both similar. FWIW, the SQL abstraction and query push down that Phoenix provides will make for a better fit as a Presto connector unless you're either 1) OK with many serial, full table scans by HBase, or 2) you try to do what Phoenix is doing within the HBase connector. Neither of these is a good option IMHO.

What's the latest on this PR (i.e., is it moving forward)?

What's the progress now?
