Documentation: Explore SolrRDF (not MVP)

Created on 28 Aug 2016 · 6 Comments · Source: Islandora/documentation

Not needed right now, but this looks very promising as a way to avoid exposing our main triple store to the world in Islandora CLAW. I have not tested it yet (I'm about to, by loading a huge N-Triples data set!).
https://github.com/agazzarini/SolRDF/wiki

It exposes a SPARQL 1.1 endpoint, so no tweaks or non-standard stuff are needed, and it runs as a Solr extension (4.x and 5.x).
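Since the endpoint is standard SPARQL 1.1, any client that speaks the SPARQL Protocol should be able to query it. A minimal sketch in Python, assuming a local install; the endpoint URL below is a hypothetical placeholder, not taken from the SolRDF docs, so adjust it to wherever your instance actually exposes SPARQL:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: hypothetical endpoint URL for a local SolRDF install.
ENDPOINT = "http://localhost:8080/solr/store/sparql"

def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol 'query via GET' request.

    The query goes in the 'query' URL parameter, and the Accept header
    asks for the standard JSON results format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

request = build_sparql_request(
    ENDPOINT, "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
)
# To actually send it against a running endpoint:
#   import json, urllib.request
#   with urllib.request.urlopen(request) as response:
#       bindings = json.load(response)["results"]["bindings"]
```

The request is only built here, not sent, so the sketch runs without a live server.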

I'm pretty sure @acoburn might like this too.

architecture drupal enhancement

All 6 comments

@DiegoPino ever heard of Stanbol? It's not SPARQL 1.1, but it supports LDPath and uses Solr/Lucene on the backend (I've used it for years, and it's pretty awesome). Fuseki can also be configured to use Lucene for text-based queries.

@acoburn Stanbol keeps popping up when I google things. I've yet to figure out exactly how to use it. What exactly are you using it for?

@dannylamb I'm going to tag @ajs6f b/c he also uses Stanbol, though I think his uses are a little different than mine. And @ruebot saw my brief presentation on Stanbol at LDCX so he might chime in here.

There are a number of ways to use Stanbol; here are just a few examples of things that I've done:

  1. Because the LoC endpoint is slow and arguably unstable, I took a full dump of the name authority files and imported it into Stanbol as an "entityhub". This means I can query all of lcnames / lcsubjects using LDPath locally. Same for GeoNames (which has a limit on the number of daily requests). Those "entityhubs" are indexed in a Solr "yard" that allows for both fast lookups and text-based queries (which are hard in SPARQL).
  2. I needed to do some NLP on free text in a repository (not Fedora), so I used Stanbol's "enhancement engine" to wire together a number of NLP components to produce the output I needed.

All of Stanbol's features run as RESTful HTTP endpoints (which makes integration w/ other software super easy), everything is really modular, and configuration is all dynamic and can be modified at runtime.
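To give a feel for the first use case over HTTP, here is a rough sketch of an LDPath lookup against a locally imported entityhub site. Everything specific here is an assumption for illustration: the base URL, the site name `lcnames`, the example authority URI, and the exact endpoint path and parameter names (`context`, `ldpath`), which should be checked against the Stanbol Entityhub documentation for your version.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions: a local Stanbol instance and a hypothetical site id "lcnames".
STANBOL = "http://localhost:8080"
SITE = "lcnames"

# LDPath program: for the context entity, select its skos:prefLabel as a string.
LDPATH_PROGRAM = (
    "@prefix skos : <http://www.w3.org/2004/02/skos/core#> ;\n"
    "label = skos:prefLabel :: xsd:string ;"
)

def build_ldpath_request(context_uri):
    """Build a POST request that runs LDPATH_PROGRAM against one entity."""
    url = f"{STANBOL}/entityhub/site/{SITE}/ldpath"
    body = urlencode({"context": context_uri, "ldpath": LDPATH_PROGRAM})
    return Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

# Example authority URI, used purely as an illustration.
request = build_ldpath_request("http://id.loc.gov/authorities/names/n79021164")
# urllib.request.urlopen(request) would send it to a running instance.
```

Because the lookup runs against the local Solr-backed yard rather than the remote LoC service, it avoids the slow upstream endpoint described above.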

I'm using it for the first use that @acoburn describes above, except with many different vocabularies and in conjunction with this Open Refine extension. The combination allows metadata people here at UVa to rectify metadata against common vocabularies like LCSH or Geonames. I'd like to examine some of the NER abilities, but haven't had much time.

@DiegoPino another thought on this w/r/t Solr and SPARQL is that Solr is primarily a search engine for text-based queries. That is, Solr/Lucene indexes unstructured text over a set of fields and allows simple to complex queries on those fields. It is also possible to use Solr in many other ways: a document store, a database, etc. These are _possible_, but I am not convinced that they are a good idea: doing so works "against the grain" of Solr. If you want a document store, use a document store (there are many good ones); if you need a (relational) database, use one. A well-indexed/tuned database can perform very well at scale.

In a word, if you're using Solr but not primarily conducting text-based queries, then I'm not sure that Solr is necessarily the best data storage tech. And IMO (or at least, in my experience), SPARQL is not primarily about text-based queries, especially once you get into rules engines and RDFS entailment.

@acoburn I agree with most of that. Solr has evolved into a bit more than just a text-based search engine, but I see where this is pointing. I also know that if you want a document store you should use one, and so on if we keep enumerating technologies, and I'm sure that is the best approach. But Islandora has been used for a long time by institutions that don't have the resources to build top-notch infrastructure where you can just scale and add complexity. We have many community users with one small-to-medium machine that serves everything, and in those cases I tend to feel that reusing software for different needs could be a good way to go. Probably a professional deformation related to my origins 😄
