Documentation: Finalize feature set for MVP

Created on 22 Aug 2016 · 36 Comments · Source: Islandora/documentation

Create a FINITE list of features that is as small as possible that will still give users the functionality they need and the foundation for the addition of new features.

MVP architecture

All 36 comments

Count me in for this, and for writing up the MVP document in markdown when it's ready to move over to GitHub.

This is just for starters. Definitely not authoritative. Please critique.

  • The ability to publish linked data
  • Synchronization with Fedora 4
  • Meaningful REST API
  • Support for Collections, Images, Books, and Pages
  • Can control metadata mappings from Drupal to RDF through a user interface
  • The ability to export/import JSON-LD
  • The ability to restrict access to collections and/or individual resources
  • The ability to index and search resources with Apache Solr

Support for Collections, Images, Books, and Pages

Would it be worthwhile to consider having something analogous to the binary cmodel from 1.x in the MVP? Something that stores a non-RDF resource without making any assumptions about it? Maybe this isn't needed for a v1.0, just a thought.

Looking for refinement on that first bullet point.

Does 'publishing linked data' mean:

  • A public triplestore to query?
  • RDFa output?
  • some 'pure RDF' serialization format (like json-ld) available through content negotiation?

"Can control metadata mappings from Drupal to RDF through a user interface" <- this is our equivalent/replacement for Form Builder?

@manez Yes, this would basically turn drupal into an RDF editor, where admins control the forms.

The ability to publish linked data

This definitely needs to be fleshed out more. It's a little vague. Maybe some use cases?


Synchronization with Fedora 4

We should be more specific about what the synchronization is. Is it both ways, one way?


Meaningful REST API

What is "meaningful"?


Support for Collections, Images, Books, and Pages

:+1:


Can control metadata mappings from Drupal to RDF through a user interface

:+1:


The ability to export/import JSON-LD

:+1:


The ability to restrict access to collections and/or individual resources

:+1:

In the weeds; Is this WebAC or Drupal restriction?


The ability to index and search resources with Apache Solr

:+1:


@ruebot and @dannylamb, yes, we need to refine the language, the desired functionality and, with that, the expectations. I feel some concepts are mixed up. But we have this sprint to solve this, right?

Related to @bryjbrown's comment: https://github.com/Islandora-CLAW/CLAW/issues/334#issuecomment-241519478 and publishing linked data, I think it would be a good idea to use a single URL for these resources. That is, don't produce HTML at one location and have the same resource as JSON-LD somewhere else. Rather, if you can, have your endpoint produce HTML (with RDFa markup) for browsers and JSON-LD for clients that request application/ld+json or application/json. Personally, I wouldn't prioritize a public SPARQL endpoint.
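
To make the single-URL idea concrete, here is a minimal sketch of picking a serialization from the Accept header. It is not Drupal or Fedora code: the `SERIALIZATIONS` mapping and both function names are made up for illustration, and falling back to HTML for unknown types is an assumption (a stricter server might answer 406 instead).

```python
def parse_accept(header):
    """Return the acceptable media types from an Accept header, best first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            ranked.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


# Hypothetical mapping from requested media type to the serialization served.
SERIALIZATIONS = {
    "application/ld+json": "json-ld",
    "application/json": "json-ld",
    "text/html": "html+rdfa",
}


def choose_serialization(accept_header, default="html+rdfa"):
    """Pick one serialization for a resource that lives at a single URL."""
    for media_type in parse_accept(accept_header or ""):
        if media_type in SERIALIZATIONS:
            return SERIALIZATIONS[media_type]
        if media_type == "*/*":
            return default
    # Assumption: unknown types fall back to HTML rather than a 406 response.
    return default
```

A browser-style header such as `text/html,application/xml;q=0.9,*/*;q=0.8` ends up with the HTML+RDFa view, while a client sending `application/ld+json` gets JSON-LD from the same URL.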

@acoburn I agree. Drupal 8's REST API (which works as middleware in Drupal's routing system) uses content negotiation + (sadly) a _format param to expose different serialisations on the same URI. The question is which URI will be the canonical one: the UUID-based one (which does not exist by default in Drupal 8; I published a working implementation for our case that can be extended if needed) or the sequentially numbered one, which is based on a '{entity_type}/{id}' routing pattern, where id is a sequential number unique to each entity_type.
It all works pretty similarly (in terms of workflow and params) to http://symfony.com/doc/current/routing.html#routing-format-param

The ability to publish linked data:

Different presentations, but all with a congruent canonical URI for local linked resources, which means translating Fedora 4 paths to the URIs of the published resources.
RDFa (core Drupal) in HTML and JSON-LD (Accept: application/ld+json), plus whatever other contributed modules want to provide. Referenced resources (like <> ldp:contains <some/resource>, etc.) should point to resources that are also publicly available in Drupal 8, each under a Drupal 8 canonical URL (following the same convention as the referring resource).

Follow-your-nose would be fine for HTML resources, but as I see it, a Drupal block solves this and could even be a contrib module.


Also good to remember: Drupal 8 allows for multiple view modes, so this can be user configured and adapted/expanded.

One question to consider is: when publishing linked data for an aggregate resource (e.g. a Book), will Drupal publish the aggregate graph? Since the HTML display is (I assume) a sort of aggregation of resources (pages, files), I'd expect the JSON-LD repr would also contain the aggregate graph, but that would be good to spell out explicitly.
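
As a rough illustration of what "the aggregate graph" could look like in JSON-LD, here is a small sketch that puts a book and its pages into one `@graph`. The URLs, the `aggregate_graph` helper, and the choice of `dcterms:hasPart` to link book to pages are all assumptions for the example, not a settled model.

```python
import json

CONTEXT = {"dcterms": "http://purl.org/dc/terms/"}


def aggregate_graph(book, pages):
    """Combine a book node and its page nodes into a single JSON-LD document."""
    book_node = dict(book)  # copy so the caller's dict is not modified
    # Assumption: the book points at its pages with dcterms:hasPart.
    book_node["dcterms:hasPart"] = [{"@id": page["@id"]} for page in pages]
    return {
        "@context": CONTEXT,
        "@graph": [book_node] + [dict(page) for page in pages],
    }


book = {"@id": "http://example.org/obj/book1", "dcterms:title": "A Book"}
pages = [
    {"@id": "http://example.org/obj/book1/page1", "dcterms:title": "Page 1"},
    {"@id": "http://example.org/obj/book1/page2", "dcterms:title": "Page 2"},
]
document = aggregate_graph(book, pages)
print(json.dumps(document, indent=2))
```

The alternative would be a JSON-LD document with only the book node, leaving clients to dereference each page themselves.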

The ability to publish linked data

I think @acoburn's content-neg description should cover our "publish linked data" needs (we can always expand if we get a persuasive use case).

The ability to restrict access to collections and/or individual resources

I think this could be Drupal restrictions, so long as they are translated to WebAC for Fedora...no?

Meaningful REST API

This is wide open to interpretation, but... would this be Drupal services to allow creation of resources in Drupal (which would push to Fedora) and/or would this be Silex services to allow creation of resources in Fedora (which would sync back to Drupal).


@DiegoPino I know you got your routing working, but I found this ticket for Drupal 8 core which appears very similar to what you have. Would it cause a conflict in future?

@acoburn, I would expect Drupal to publish the aggregate graph (or at least I'm coding with that aim). If a simple main Drupal node aggregates multiple custom Fedora resource entities, then its serialization is an aggregation graph, which is what I (as of today) would like to model.

@DiegoPino cool, that's what I was hoping.

@whikloj, no problem there with https://www.drupal.org/node/2353611. Since we can't rely on manually applied patches right now, I added my own Resolver just for fedora_resources, which is basically the same idea they apply generally in that ticket. Since I'm pretty sure core is not yet in a state where custom entities will inherit UUID routes, both things can live side by side. Also, my routing still does not solve the linking, which involves messing with the URL class.


By "my own Resolver" I mean code borrowed from here and there. I did not invent the wheel, but I made it spin here.

The ability to restrict access to collections and/or individual resources

To be able to map and enforce WebAC in Drupal 8, we need to investigate these services for entities derived from the fedora_resources type:

  1. See Drupal::accessManager (https://api.drupal.org/api/drupal/core!lib!Drupal.php/function/Drupal%3A%3AaccessManager/8.2.x),
    which returns an object implementing \Drupal\Core\Access\AccessManagerInterface (https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21Access%21AccessManagerInterface.php/interface/AccessManagerInterface/8.2.x).
  2. If we need to use a different authentication system than Drupal's own,
    develop a service that implements \Drupal\Core\Authentication\AuthenticationProviderInterface and bind it to our service container using the authentication_provider service tag.

For 'publishing linked data'....

Question about publishing the entire graph:

Based on how D8 works with the format parameter, if the HTML representation is distinguished from other representations by a query param like ?_format=json, then does that qualify as a distinctly different URI? I know that fragments aren't considered by HTTP, so are query params?
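
For what it's worth, in the generic URI syntax the query string is part of what identifies a resource, while the fragment is resolved on the client and never sent in the request. A small sketch with Python's standard library shows the difference; the example.org URLs and the `request_target` helper are only illustrative.

```python
from urllib.parse import urldefrag, urlsplit


def request_target(url):
    """Return the URL roughly as a server sees it: fragment dropped, query kept."""
    return urldefrag(url).url


html_url = "http://example.org/obj/foo"
json_url = "http://example.org/obj/foo?_format=json"
frag_url = "http://example.org/obj/foo#page-2"

# The fragment disappears before the request is made ...
same_after_defrag = request_target(frag_url) == html_url
# ... but the query survives, so the two URLs stay distinct.
query_kept = request_target(json_url) == json_url
query_string = urlsplit(json_url).query
```

So under this reading, `?_format=json` does give the JSON representation its own URI, which is what makes it different from true content negotiation on one URL.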

Yeah... and sync.

So I've got big ideas, and want to go for the gold on it, but both Fedora and Drupal are going to need help enforcing conditional updates for it to work. Granted, no type of sync'ing whatsoever is going to work well without conditional updates, so it's not like I can do another approach. It's just that if time gets spent making both Fedora and Drupal respect conditional updates, then we may not have as much time to do what I want with sync.
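
To pin down what "conditional updates" means here, this is a minimal in-memory sketch of the usual ETag / If-Match pattern. It is not Fedora's or Drupal's implementation: `ConditionalStore` is a made-up class, and deriving the ETag from a hash of the body is just one possible choice.

```python
import hashlib


class ConditionalStore:
    """Toy resource store that refuses writes based on a stale ETag."""

    def __init__(self):
        self._bodies = {}

    @staticmethod
    def etag_for(body):
        # Assumption: the ETag is derived from the content itself.
        return '"' + hashlib.sha256(body.encode("utf-8")).hexdigest()[:16] + '"'

    def get(self, path):
        body = self._bodies[path]
        return body, self.etag_for(body)

    def put(self, path, body, if_match=None):
        """Return an HTTP-style status code for the attempted write."""
        current = self._bodies.get(path)
        if current is not None:
            if if_match is None:
                return 428  # Precondition Required: updates must be conditional
            if if_match != self.etag_for(current):
                return 412  # Precondition Failed: the caller's copy is stale
        self._bodies[path] = body
        return 204 if current is not None else 201


store = ConditionalStore()
created = store.put("/obj/foo", "v1")
_, tag = store.get("/obj/foo")
updated = store.put("/obj/foo", "v2", if_match=tag)
stale = store.put("/obj/foo", "v3", if_match=tag)  # tag now refers to "v1"
```

The second write with the old tag is rejected, which is the guarantee a sync process needs from both sides before it can safely replay changes.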

That said, I'm shooting for full bidirectional sync. If we bake it into the lowest levels of D8 entities (like RDF and NonRDF Resources), then aside from having a few post-save events, there will be no mention of Fedora whatsoever in Drupal code. It's definitely the best way to decouple the two.

As far as implementations go, I'm looking at sticking Interval Tree Clocks both as a field in Drupal and in the RDF in Fedora. There has to be some middleware to intercept write requests to Fedora and make sure they update the Interval Tree Clock for the resource, and we can write the updates ourselves into the Drupal side of things. Then all replication can be handled async between two listeners, one for Drupal and one for Fedora.
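
Interval Tree Clocks are too involved for a short example, so the sketch below uses plain version vectors instead, a simpler relative of the same idea. It only shows the part that matters for sync: telling whether one side's copy is newer, older, or in conflict. The replica names "drupal" and "fedora" and all function names are assumptions for illustration.

```python
def record_write(vector, replica):
    """Return a copy of the vector with the writing replica's counter bumped."""
    updated = dict(vector)
    updated[replica] = updated.get(replica, 0) + 1
    return updated


def compare(a, b):
    """Classify vector a relative to b: equal, before, after, or concurrent."""
    keys = set(a) | set(b)
    a_ahead = any(a.get(key, 0) > b.get(key, 0) for key in keys)
    b_ahead = any(b.get(key, 0) > a.get(key, 0) for key in keys)
    if a_ahead and b_ahead:
        return "concurrent"  # both sides wrote independently: a conflict
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def merge(a, b):
    """Combine two vectors after a conflict has been resolved."""
    return {key: max(a.get(key, 0), b.get(key, 0)) for key in set(a) | set(b)}


base = {}
drupal_copy = record_write(base, "drupal")
fedora_copy = record_write(base, "fedora")
resolved = merge(drupal_copy, fedora_copy)
```

Here `compare(drupal_copy, fedora_copy)` reports a concurrent write, which is the case the two listeners would have to detect and resolve.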

FYI: Java and C implementation for Interval Tree Clocks (which are a generalization of both vector clocks and version vectors) here.

For PHP I was thinking about making an extension around the C code.

no mention of Fedora whatsoever in Drupal code.

:+1:

For PHP I was thinking about making an extension around the C code.

That seems reasonable.

@dannylamb some ideas (which obviously need more discussion and may also be a no-go) on how to publish the whole graph and also comply with a disambiguated URI for ORE:

  • Use normal, node-entity-derived content types with fields that link to our fedora_resource entities (custom ones, provided by our module). This way we are emulating a ReM (Resource Map in ORE) and we can add new content using all the UI goodies Drupal 8 provides, etc. So these nodes have a different 'canonical' URL than the one assigned to their aggregated fedora_resources (linked as field values). I say canonical because in Drupal 8 you can basically make as many pattern-based aliases as you want.
    Question here, discussing pros and cons: this ReM would not really exist in Fedora 4, or at least right now we haven't defined a structure, place, or whatever for a ReM.

OR

  • Make a permanent route act as a JSON-LD graph serialisation under a different Path

AND/OR (from IRC by @acoburn )

  • use link headers to point to the JSON-LD graph

OR

  • Create a Resource Map custom entity, with its own controllers, serialisation specifics and of course a "route" (I'm starting to like this idea)

Assuming the JSON-LD serialisation of each fedora_resource would be just the resource itself (more a question than an affirmation).

Anyone @Islandora-CLAW/sprinters wants to discuss this idea on IRC?

It might be worth noting that _someone_ might be able to write a little Java code for Fedora in order to generate vector-clock headers. There are currently hooks in the Fedora code for being able to do this. That way, the Drupal code can just work on header values w/r/t the vector clocks.

That said, I'm shooting for full bidirectional sync. If we bake it into the lowest levels of D8 entities (like RDF and NonRDF Resources), then aside from having a few post-save events, there will be no mention of Fedora whatsoever in Drupal code. It's definitely the best way to decouple the two.

That is the way I'm approaching things. It's still one direction (from Drupal to Fedora), but I would like to discuss some approaches.

Meaningful REST API

The terminology is going to get a little weird here, but I'd highly recommend using Hydra for this. And by Hydra, I mean the vocabulary for describing hypermedia-driven web APIs.

@acoburn++
Does using HydraCG imply using a completely different ontology, or can they be mixed? I see the @context is peculiarly HydraCG-centric. I still don't get Hydra-CG completely, so I'm pasting this here in case it's of use.
http://stackoverflow.com/questions/25297719/get-a-collection-of-sub-resources-at-once-with-json-ld-and-hydra

Hydra and Swagger.io? Or just one?

@ruebot: I don't know enough about either to make a good decision. I would be happy to investigate.

@acoburn I believe @dannylamb and @whikloj have done a fair bit of investigation on the swagger.io side of things: https://github.com/Islandora-CLAW/CLAW/issues/205

@ruebot: I'm advocating for _some_ mechanism to describe the API. If there's already momentum behind swagger.io, that's great

@acoburn cool. I'll leave it to @dannylamb and @whikloj for thoughts/decisions there.

@acoburn i will write anything to get at vector clock headers. anything that will get me conditional updates. i don't need byte for byte comparison.

Publish Linked Data

Back to this. I'm thinking we should provide json-ld for every resource/entity in addition to the resource map, which yes, would make sense to have its own entity/node.

And I'm thinking we just generate the resource map RDF from the triple store. It can be dynamic at first, but we'll probably want to consider caching with invalidation based on a transitive SPARQL query. And if we can't make the assertions on other resources in Fedora, then I guess we have no choice but to preserve them as NonRDFResources (the irony is killing me).

Meaningful API

Looks like Drupal is going to thwart us if we want normal looking conneg. No PUTs kinda stinks too. I'm tempted to try and smooth this stuff over with middlewares. Looks like you can even make a silex application act as a filter.

About swagger: Server side stubs don't seem to be worth generating. And the little tester page has a hard time with conneg because it overrides accept headers you set even if they're a parameter you're providing as per the schema. You have to manually list all types of consumed and produced messages in the schema, so something like "any Content-Type you can provide" is awfully hard to describe. I'm saying this because i spent some time trying to swaggerize the Fedora API and ran into that gem.

The client code generation of swagger is still nifty, though.

But anything that describes the API in a machine readable format is a good thing. If people think using RDF to describe the API is better, then we can go for that. No love lost with Swagger.

I had some time this morning to think more about ResourceMaps / Aggregations and the goal of "Publishing Linked Data", all in the context of some recent threads of discussion. Here are some thoughts (please critique):

  • The Drupal representation of the aggregated resource _is_ the ResourceMap.

That is, don't store the ResourceMaps in Fedora but _do_ store the aggregations in Fedora with descriptive metadata attached to these Aggregations. That resource map would have an HTML serialization and a JSON-LD serialization (i.e. each at different URLs, which, as I understand, is how Drupal does it). E.g. you might have http://example.org/obj/foo for the HTML serialization and http://example.org/obj/foo?_format=json for the JSON-LD version. Both serializations would include the _complete_ aggregated graph. Each would also use a link header Link: <http://example.org/linkeddata/foo>; rel="describes" to point to the particular Aggregation, which can be dereferenced by any linked data client. The metadata attached directly to the ResourceMap would be very minimal: Islandora-CLAW would be the dcterms:creator, plus any additional _necessary_ metadata -- as mentioned above, the primary descriptive metadata would be attached at the Aggregation level.

  • The Aggregations would be available separately (as per the ORE spec): e.g. http://example.org/linkeddata/foo, available in HTML and JSON-LD formats (or others, if necessary)

This endpoint could live entirely separately from Drupal and/or be based on a simple template service. Personally, I wouldn't include ldp:contains triples for these resources (i.e. I'd rely mostly on ldp-member triples), but I wouldn't draw a line in the sand on that point. In contrast to the ResourceMap serializations, the resources serialized at the /linkeddata/... endpoint would not include child and/or aggregated resources -- they would basically obey the "single-subject" restriction we see in Fedora (so they could include hash URIs).

To me, this seems like it has the advantage of following the ORE spec (as I understand it) and fitting into the models that both Fedora and Drupal provide, while also retaining the _semantics_ of ORE and linked data.
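
The Link header pointing from the ResourceMap to the Aggregation, as described above, can be sketched like this. The helper names are hypothetical, and the parser is deliberately simplified: it does not handle commas or semicolons inside quoted parameter values.

```python
import re


def build_link_header(target, rel):
    """Format a single Link header value for the given target and relation."""
    return '<{}>; rel="{}"'.format(target, rel)


# One link: <target> followed by zero or more ";name=value" parameters.
LINK_PATTERN = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')


def parse_link_header(value):
    """Return a list of (target, params) pairs from a Link header value."""
    links = []
    for match in LINK_PATTERN.finditer(value):
        target, raw_params = match.groups()
        params = {}
        for param in raw_params.split(";"):
            name, separator, raw_value = param.strip().partition("=")
            if separator:
                params[name.strip().lower()] = raw_value.strip().strip('"')
        links.append((target, params))
    return links


header = build_link_header("http://example.org/linkeddata/foo", "describes")
links = parse_link_header(header)
aggregations = [target for target, params in links
                if params.get("rel") == "describes"]
```

A linked data client that fetches either serialization of http://example.org/obj/foo could read this header and dereference the Aggregation without parsing the body at all.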

Closing since sprint is over. We can open another ticket to 'revisit' this concept later if required.
