Dataverse: Solr Container Scaling

Created on 20 Jun 2018 · 17 Comments · Source: IQSS/dataverse

There has been some initial work to allow scaling for postgres and glassfish.

https://github.com/IQSS/dataverse/pull/4599
https://github.com/IQSS/dataverse/pull/4626

A similar project needs to be undertaken for Solr. I would expect the implementation to use a StatefulSet and perhaps an Operator.

More details on scaling solr:

https://lucene.apache.org/solr/guide/6_6/introduction-to-scaling-and-distribution.html

All 17 comments

After some trial and error, I am now working on making Solr a headless service with 2 nodes (a master and a slave).
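For readers unfamiliar with headless services: Kubernetes gives each StatefulSet pod a stable DNS name of the form `<statefulset>-<ordinal>.<service>.<namespace>.svc.cluster.local`, which is what lets a slave find the master by name. The following is a minimal sketch of that naming scheme only. The StatefulSet name `solr`, the service name `solr`, the `default` namespace, and the choice of ordinal 0 as master are all assumptions for illustration, not details taken from this issue.

```python
def statefulset_pod_hosts(statefulset, service, replicas, namespace="default"):
    """Build the stable DNS names of StatefulSet pods behind a headless service.

    Pattern: <statefulset>-<ordinal>.<service>.<namespace>.svc.cluster.local
    """
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.cluster.local"
        for ordinal in range(replicas)
    ]


# Hypothetical names: a StatefulSet "solr" behind a headless service "solr".
hosts = statefulset_pod_hosts("solr", "solr", replicas=2)

# Assumption for this sketch: the pod with ordinal 0 acts as the master,
# every other pod acts as a slave.
master_host, slave_hosts = hosts[0], hosts[1:]

print(master_host)
print(slave_hosts)
```

Because the names are derived from the ordinal, they stay the same when a pod is rescheduled, so a slave configuration can point at the master by hostname rather than by IP.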

@thaorell and I discussed this issue a bit at http://irclog.iq.harvard.edu/dataverse/2018-07-20 and my quick take is that the Solr docs recommend SolrCloud, but there are concerns that adding ZooKeeper to the mix will complicate things.

Currently, the following work has been done:

  1. Making Solr a StatefulSet
  2. Configuring master-slave replication for better scalability. For more information, see https://lucene.apache.org/solr/guide/6_6/index-replication.html#index-replication
  3. Setting up backup and restoration for the master pod in OpenShift
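As a rough illustration of items 2 and 3, Solr's replication handler is driven over HTTP with a `command` parameter (`details`, `backup`, `restore`, among others). The sketch below only builds the request URLs and does not contact a server. The hostnames, the core name `collection1`, the backup directory `/backup`, and the snapshot name `nightly` are hypothetical placeholders, not values from the pull request.

```python
from urllib.parse import urlencode


def replication_url(host, core, command, port=8983, **params):
    """Build a URL for Solr's /replication handler on the given core."""
    query = urlencode({"command": command, **params})
    return f"http://{host}:{port}/solr/{core}/replication?{query}"


# Hypothetical placeholders for this sketch.
MASTER = "solr-0.solr"
SLAVE = "solr-1.solr"
CORE = "collection1"

# Ask the master to snapshot its index into a directory that would be
# backed by a persistent volume.
backup_url = replication_url(MASTER, CORE, "backup", location="/backup", name="nightly")

# Restore the master's index from that named snapshot.
restore_url = replication_url(MASTER, CORE, "restore", location="/backup", name="nightly")

# Inspect replication state on a slave.
details_url = replication_url(SLAVE, CORE, "details")

print(backup_url)
print(restore_url)
print(details_url)
```

Each URL could then be requested with any HTTP client. Keeping the URL construction separate makes it easy to check the parameters before pointing them at a real pod.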

@thaorell great! Are you close to making a pull request? Are you blocked in any way? Please let us know how we can help.

@pdurbin I think I am ready for a pull request. One question though, I have two new files solrconfig_master.xml and solrconfig_slave.xml, should these be in conf/solr or conf/docker/solr?

As I mentioned to @pdurbin, I also wrote some docs on how to configure persistent volumes on Kubernetes so Solr can back up and restore its index (glassfish and postgres can follow suit later if needed).

I have two new files solrconfig_master.xml and solrconfig_slave.xml, should these be in conf/solr or conf/docker/solr?

Let's have @matthew-a-dunlap comment on this because he's actively working on Solr config files for #4836.

Awesome to hear about the backup and restore! As a developer, I run Solr in a very non-fancy way. I'm quick to reinstall it entirely. I have so little data on my laptop that for me it's quick to delete all the data out of Solr and reindex my installation of Dataverse. Real backup and restore sounds like a great feature for production installations of Dataverse.

I've started making some changes to our solr setup in #4836. solrconfig.xml has changed somewhat (I went back to a clean slate) and is definitely going to change more.

More importantly, I changed our solr installation steps based upon recommendations from folks in the solr IRC. Our code in develop is pointing to the installation folder for its templates, which could lead to unforeseen consequences.

I'm not sure what the best next step is. I can probably update the configs you've created @thaorell as I go, but you may need to test them in the end. Hopefully the solrconfig.xml won't change again much after this story (and if maintaining it becomes a pain we can start using the programmatic API configurations to consolidate what we do).

Thanks @matthew-a-dunlap, I will create a PR soon so you can see the files. Ideally these files (for either the standalone or the distributed deployment case) should be very similar.

Sounds great! @thaorell, I realized I didn't answer your initial question about the config placement. Maybe put them in the docker folder (conf/docker/solr) for now, and if we bring scaling into our normal deployments we can move them then.

@matthew-a-dunlap when you finish with #4836, I would appreciate it if you could let me know so I can fix my solrconfig_master.xml and solrconfig_slave.xml.

@thaorell We decided for #4836 to keep it simple and only fix highlighting in schema.xml. The boosting fix and the solr best-practice changes are being put off for #4938. That should mean we don't have any conflicts, as you don't seem to be touching schema.xml.

@thaorell - sending this back your way after talking with @matthew-a-dunlap. Let us know when @danmcp's feedback is implemented and we'll take a look in code review. Thanks!

@djbrooke I have implemented the changes according to the feedback.

@thaorell I just added a review to pull request #4924 and requested some minor changes (removing comments), provided that I understand what you've implemented. Overall, this looks great! Thanks!

Looking good as of d538fac. Moving to QA. Thanks!
