Presto: Support for dynamically detecting new or deleted catalogs

Created on 28 Feb 2015  ·  33 Comments  ·  Source: prestodb/presto

I need Presto to be able to add catalogs dynamically without restarting the Presto server.

In general, if I add a new catalog file to Presto, the server should detect it automatically, without a "launcher restart" command. And once a catalog is removed (assuming it is not in use), the server should detect that too.

Our system works like this: we have a table in MySQL storing all the data servers (IP, port, type, username and password). When a new data source is added to the system, a new catalog is added on the server that runs Presto. We cannot afford to restart the Presto server to refresh it. Instead, we want it to detect the change automatically.
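To make the setup above concrete, here is a minimal sketch of turning one row of such a table into the contents of a catalog file. The `DataServer` class, the `name` column and both function names are hypothetical (the original table only lists IP, port, type, username and password); the property keys are the ones used by Presto's MySQL connector, and only the `mysql` type is handled.

```python
from dataclasses import dataclass


@dataclass
class DataServer:
    """One row of the data-server table (hypothetical shape; `name` is assumed)."""
    name: str
    ip: str
    port: int
    type: str
    username: str
    password: str


def render_catalog_properties(server: DataServer) -> str:
    """Render the text of a catalog file for a MySQL-type data server."""
    if server.type != "mysql":
        # Other connector types need their own property sets; not covered here.
        raise ValueError(f"unsupported data server type: {server.type}")
    lines = [
        "connector.name=mysql",
        f"connection-url=jdbc:mysql://{server.ip}:{server.port}",
        f"connection-user={server.username}",
        f"connection-password={server.password}",
    ]
    return "\n".join(lines) + "\n"


def catalog_file_name(server: DataServer) -> str:
    """The catalog name is taken from the file name, e.g. sales.properties."""
    return f"{server.name}.properties"
```

Writing this text into the catalog directory is the step that currently only takes effect after a restart, which is what this issue asks to avoid.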

All 33 comments

@dain Can you take a look at this?
I added a watcher thread in CatalogManager; when new catalog files are added, it loads them automatically.
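The watcher idea can be sketched roughly as follows. This is not the code from the comment above (which lives inside Presto's Java CatalogManager); it is a simplified polling version in Python, and the class name, callbacks and `.properties` filter are assumptions made for illustration. A Java implementation could use `java.nio.file.WatchService` instead of polling.

```python
import os
from typing import Callable, Set


class CatalogDirectoryWatcher:
    """Polls a directory and reports catalog files that appear or disappear."""

    def __init__(
        self,
        directory: str,
        on_added: Callable[[str], None],
        on_removed: Callable[[str], None],
    ) -> None:
        self.directory = directory
        self.on_added = on_added
        self.on_removed = on_removed
        self.known: Set[str] = set()

    def _scan(self) -> Set[str]:
        # A catalog name is the file name without the ".properties" suffix.
        suffix = ".properties"
        return {
            name[: -len(suffix)]
            for name in os.listdir(self.directory)
            if name.endswith(suffix)
        }

    def poll_once(self) -> None:
        """Compare the directory with the last scan and fire the callbacks."""
        current = self._scan()
        for name in sorted(current - self.known):
            self.on_added(name)
        for name in sorted(self.known - current):
            self.on_removed(name)
        self.known = current
```

A watcher thread would call `poll_once()` in a loop with a sleep between iterations. Detecting the file is only the first half of the job: the callback still has to load the catalog and make the rest of the cluster aware of it.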

I can now add a new catalog without restarting the Presto server, and SHOW CATALOGS works.

The only problem is that I can't query the new catalog's tables; queries fail with the error "No nodes available to run query".

So it seems I missed one last step: updating the datasources in the ServiceAnnouncement, as PrestoServer.java does in updateDatasources().

My question is: how can I update the datasources from CatalogManager.java?

Oh, I finished it, but it is really a huge hack.

I made announcer a local variable in PrestoServer and added a static function updateDatasourcesAnnouncement, like TestingPrestoServer does.

Then I call PrestoServer.updateDatasourcesAnnouncement when a new catalog is added.

So the server itself still cannot look for new catalogs dynamically?

After modifying some code as I described, it can now dynamically detect new catalogs.
But it is a hack, so I didn't make a pull request.

@yuananf I'm looking for a way to add catalogs at runtime as well and your solution with monitoring the file system for changes is great.

Do you think that a REST endpoint for managing catalogs makes sense? In my particular use case I'm interacting with presto through the REST API so having a way to add/remove/update catalogs with HTTP requests would be great.

@svstanev actually your idea is another good approach. I have hacked my Presto code with something similar (to load a local jar at runtime).

Here are possible steps:

  1. Add a LOAD/ADD CATALOG 'name' command to the parser.
  2. Bind AddCatalog to AddCatalogTask (which implements DataDefinitionTask) in CoordinatorModule.
  3. In AddCatalogTask, send a REST request to every endpoint in the cluster, asking each node to add its local catalog file.
  4. Register an AddCatalogResource to receive the REST request from step 3 and load the local catalog files.
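Step 3 of the list above is essentially a fan-out with per-node bookkeeping. A rough sketch, assuming a hypothetical `send_request(node, catalog_name)` callable that performs the actual REST call and returns whether the node loaded its local catalog file (none of these names come from Presto):

```python
from typing import Callable, Dict, List


def broadcast_add_catalog(
    catalog_name: str,
    nodes: List[str],
    send_request: Callable[[str, str], bool],
) -> Dict[str, bool]:
    """Ask every node to load its local catalog file; return per-node success."""
    results: Dict[str, bool] = {}
    for node in nodes:
        try:
            results[node] = send_request(node, catalog_name)
        except Exception:
            # An unreachable node counts as a failure rather than aborting the loop.
            results[node] = False
    return results


def all_nodes_loaded(results: Dict[str, bool]) -> bool:
    """The catalog is only safe to use once every node reports success."""
    return bool(results) and all(results.values())
```

What to do when only some nodes succeed (retry, roll back, or refuse to schedule on the failed nodes) is left open here, and it is where most of the difficulty lies.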

@svstanev @cberner, do you prefer this one or #5155?

@yuananf actually having a statement for creating catalogs would be awesome, and it is definitely better than a REST API, which is a plain workaround for the missing functionality.

It may even be worth considering the complete set of CREATE/ALTER/DROP CATALOG statements, similar to the way we define and manipulate databases (non-standard), tables, views, etc.

The question is whether these statements fit the Presto big picture and the project stakeholders' vision and future plans.

The way we envision something like this working is via special commands. It's still unclear whether we need custom grammar or whether implementing it as callable procedures would be sufficient. But, this is the easiest part to figure out.

The complexity is in how plugins and connectors are managed across the cluster. The changes need to be transactional and can't affect running queries. At a minimum, we'd need a way to safely unload a connector and re-load it. The system would probably need to keep multiple versions running while active queries finish.

Also, there are currently no defined semantics or mechanisms for changing the configs of a connector. For some options, this is simply not possible (e.g., changing thread pool sizes, etc).

@martint you're right that the interface is the easiest part and it can be anything (a command, a proc, REST API, etc).

I'm pretty new to Presto and still lack a deep understanding of its architecture and internals, so thank you for the explanations.

From what I've seen so far the connectors are instantiated on node startup and later on referenced by a connectorId. If I've understood correctly this is what makes it hard to update or delete a catalog as you have to make sure that it's not in use anymore (though adding a new catalog should be safe).

If this is the problem, then the less runtime state the nodes have to maintain, the better. If workers instantiate the connectors on demand (e.g. in order to process a TableScanOperator) and dispose of them when done, then the coordinator will be the only place that maintains the list of available catalogs; during the planning phase the connection information should be included in the corresponding plan nodes (e.g. TableScanNode). This connection information will be a snapshot of the catalogs' state at the moment the query enters the execution pipeline, so any subsequent changes to the catalogs won't affect running queries. The worker nodes can still maintain a cache of connector instances (keyed on the connection info, of course) if needed.
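A toy illustration of the snapshot idea in the paragraph above, not of how Presto actually plans queries: the coordinator copies the catalog's connection information into the plan node at planning time, so later changes to the catalog do not reach a query that is already planned. `CoordinatorCatalogs` and this simplified `TableScanNode` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, Mapping


@dataclass(frozen=True)
class TableScanNode:
    """Plan node carrying its own read-only copy of the connection info."""
    catalog: str
    table: str
    connection_info: Mapping[str, str]


class CoordinatorCatalogs:
    """The only place that holds the live list of catalogs in this sketch."""

    def __init__(self) -> None:
        self._catalogs: Dict[str, Dict[str, str]] = {}

    def put(self, name: str, properties: Dict[str, str]) -> None:
        self._catalogs[name] = dict(properties)

    def remove(self, name: str) -> None:
        self._catalogs.pop(name, None)

    def plan_table_scan(self, catalog: str, table: str) -> TableScanNode:
        # Copy the properties so the node is isolated from later updates.
        snapshot = MappingProxyType(dict(self._catalogs[catalog]))
        return TableScanNode(catalog, table, snapshot)
```

A worker receiving such a node could build (or look up in a cache) a connector from `connection_info` alone, without keeping its own catalog list.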

From what I've seen so far the connectors are pretty fundamental functionality so it's hard for me to decide if the above is a possible solution.

@yuananf, we are hitting the same problem that @ndsorrowchi described. Could you please post a link to the code with your changes?

I have been slowly adding features so we could one day support this. It is a very complex issue as Martin pointed out, and there is very little demand for the feature (it is more of a curiosity for me).

As you point out, adding a new catalog is very simple: you simply add it to the catalog manager and new queries will see the new catalog. On the other hand, removing a catalog is very difficult.
Currently, the association between catalog names and connector instances is transactional, meaning that for each individual transaction there is a list of catalogs visible to the query. My plan is to allow removing catalogs by hiding the catalog from new transactions, and then waiting for existing transactions that reference the catalog to complete.

The bigger issue is that I think we need to restrict the system to only one connector instance per catalog name at a time. This is because many connectors create externally visible static resources that would conflict on name if there were two instances bound to the same catalog_name. For example, many connectors register JMX resources using the catalog_name. We could make these connectors register under random names, but that would make the management data difficult to use (alerting and monitoring systems customized for Presto). Additionally, some connectors, like the memory connector, use so much memory that running two instances at the same time is impossible. Because of these restrictions, my plan is to limit DROP CATALOG to auto-commit mode, and to block CREATE CATALOG for that name until all transactions referencing the old catalog finish, which would make hot-swap impossible.
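The plan in the two paragraphs above can be modelled in a few lines. This is only a single-threaded toy model under stated assumptions, not Presto's transaction manager: `CatalogRegistry` and its methods are hypothetical, and a transaction is treated as referencing every catalog that was visible when it began.

```python
from typing import Dict, FrozenSet, Set


class CatalogRegistry:
    """Toy model: drop hides a catalog; re-create waits for old transactions."""

    def __init__(self) -> None:
        self._visible: Set[str] = set()    # catalogs new transactions can see
        self._active: Dict[str, int] = {}  # open transactions per catalog
        self._draining: Set[str] = set()   # dropped but still referenced

    def create_catalog(self, name: str) -> bool:
        """Return False while the old instance of `name` is still in use."""
        if name in self._visible:
            raise ValueError(f"catalog already exists: {name}")
        if name in self._draining:
            return False
        self._visible.add(name)
        return True

    def drop_catalog(self, name: str) -> None:
        """Hide the catalog from new transactions immediately."""
        self._visible.remove(name)
        if self._active.get(name, 0) > 0:
            self._draining.add(name)

    def begin(self) -> FrozenSet[str]:
        """Start a transaction; its catalog list is fixed at this point."""
        snapshot = frozenset(self._visible)
        for name in snapshot:
            self._active[name] = self._active.get(name, 0) + 1
        return snapshot

    def end(self, snapshot: FrozenSet[str]) -> None:
        """Finish a transaction and release the catalogs it referenced."""
        for name in snapshot:
            self._active[name] -= 1
            if self._active[name] == 0:
                # Last reference gone: the old connector could be shut down now.
                self._draining.discard(name)
```

In this model a dropped catalog stays usable by the transactions that already see it, and `create_catalog` for the same name keeps returning `False` until they finish, so two instances never exist under one name.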

There is a related problem with upgrading plugin code, as some plugins also have large caches and JMX bindings.

To @ramveer1193
I think @yuananf had put his source code here.
https://github.com/yuananf/presto/commits/catalog

We're interested in this.

Our use-case is one catalog per review app:

As review apps are rolled in and out with git branches, restarting the whole prestodb cluster to add/remove a catalog is not very nice for users of other review apps.

Is anyone still working on this feature or is it being dropped?

I am also very interested in this feature for the exact same reasons mentioned by @LouisKottmann.

Or is there any way to restart a prestodb cluster node by node with the new catalogs without affecting user experience?

I also want this feature in Presto, so that I can add and use a catalog at runtime without restarting. Is there a date by which I can expect this feature to be added to Presto?

My understanding is:

  • adding a new catalog is relatively safe and straightforward
  • removing a catalog is a bit more complex, but can be done by making it inaccessible to new queries and removing it after the existing queries using it have finished
  • modifying, or removing & immediately adding an updated version, is a very complex problem.

If most of the use cases for this feature are in the first two categories, I think it's fair to implement it first without supporting the third one.

I'm not aware of anyone working on it right now (at least not at Facebook). So if anyone wants to volunteer to work on this, that would be very much appreciated.

@All, I have been working on this feature and am able to delete and add a catalog via a REST call without restarting. I will raise a pull request for it. Let me know your thoughts.

There is this pull request for this issue; once it is merged, the feature should be available in the code.

Any progress on this? It would be VERY useful to be able to add catalogs without restarting Presto every time

@fpompermaier Could you elaborate your use case? Thanks!

It would be great if I can dynamically add/remove catalogs without restarting Presto

I have raised a PR for this. You can find it here: https://github.com/prestodb/presto/pull/12605 It's not merged yet, though. You can share your views there.

Indeed... why not merge it?

@ramveer93 Do you think it would be a good idea, as mentioned by @aweisberg here, to get the functionality for adding catalogs approved then work on removing catalogs in a separate PR? Personally, I would love to see this PR go through.

Our BI/analytics platform provides a capability to register, update and delete data sources.
Our customers need to manipulate their data sources without downtime.
The lack of support in Presto forces me to stop evaluating it and move to other engines like Drill.
Updating is also an important and not so rare use case: customers may need to modify e.g. the user/password for security reasons.
They can create a new user with a new password and switch without downtime.

Do we have any update on dynamically detecting new/deleted catalogs?
Is it possible to manage catalogs only on the coordinator node instead of on the worker nodes?

For now I have created a Presto Agent App, which is a Spring Boot app providing REST APIs to manage catalogs on the Presto server. For example, I can add, edit and delete catalogs on the Presto server using my agent app. The agent app also restarts the Presto server after adding, editing or deleting catalogs.

So far I am able to add catalogs for the connectors below:

  1. Hive
  2. MySQL
  3. SQLServer
  4. MongoDB
  5. Kafka
  6. MemSQL
  7. Oracle
  8. PostgreSQL
  9. RedShift
  10. Google BigQuery

This approach works fine with a single user, but it becomes a problem when multiple users use the agent app. So I added a locking mechanism to my agent app, and now only one user at a time can add/edit/delete catalogs.
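The agent's code is not shown in this thread, so the following is only a guess at the general shape of such a locking mechanism, written in Python rather than Spring Boot. `CatalogFileManager` and its methods are hypothetical; the restart of the Presto server that the agent performs after each change is deliberately left out.

```python
import os
import threading
from typing import Dict


class CatalogFileManager:
    """Serializes add/edit/delete of catalog files behind a single lock."""

    def __init__(self, catalog_dir: str) -> None:
        self.catalog_dir = catalog_dir
        self._lock = threading.Lock()

    def _path(self, name: str) -> str:
        return os.path.join(self.catalog_dir, f"{name}.properties")

    def add_or_update(self, name: str, properties: Dict[str, str]) -> None:
        body = "".join(f"{key}={value}\n" for key, value in properties.items())
        with self._lock:
            # Write to a temporary file first so readers never see a half-written
            # catalog file, then move it into place in one step.
            tmp_path = self._path(name) + ".tmp"
            with open(tmp_path, "w") as handle:
                handle.write(body)
            os.replace(tmp_path, self._path(name))

    def delete(self, name: str) -> bool:
        """Remove the catalog file; return False if it did not exist."""
        with self._lock:
            try:
                os.remove(self._path(name))
                return True
            except FileNotFoundError:
                return False
```

A lock inside one process only helps if every change goes through the same agent instance; several agents writing to the same directory would need a different coordination mechanism.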

For my use case Presto should load catalogs dynamically, and I feel that this feature is very much needed in Presto.

Very nice, I will probably use this approach. Thanks for sharing. Is it publicly available?

@SonuSingh200190, it's strange, I don't see MemSQL or BigQuery in Presto. Which version are you on?

@bitsondatadev Hi Brian, I am using PrestoSQL 348.
These connectors are also available in Trino 354.

@fmendez89
Hi Fran, sorry, the agent code is not publicly available as of now.
Let me check whether I can share it with you.

I thought that might be the case. To make it clear for folks wanting to use your solution, Trino (formerly known as PrestoSQL) is a fork of Presto created by the original founders and majority code contributors of Presto.

The repository for Trino is located here: https://github.com/trinodb/trino

Related issue: https://github.com/trinodb/trino/issues/2110

If you have any questions, find me on our Slack: https://trino.io/slack.html

@SonuSingh200190 feel free to also post your solution there since you are using Trino.
