Documentation: Build a re-usable File Service

Created on 13 Apr 2016  ·  17 Comments  ·  Source: Islandora/documentation

Collections, Basic Image, Video, Audio will all have to hold files or proxies for files.

This ticket is to design a generalized service that could be used from all of these and more.

PDX pcdm


:+1: fully needed. Will start doing some thinking.

https://github.com/daniel-dgi/porkpie

I'll leave that there, lest we forget again. :poppy:

@daniel-dgi :poopy:

Done in PDX

Initial Thoughts

POST/PUT Endpoint should:

- accept binary
- accept UUID of parent; if empty, REJECT (because files at this level should be part of a collection)
- accept isPreservationMaster bool (Ontology/Semantic to use for this?)
- create UUID for the file (not filesets if moving pcdm:works)
- ...
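To make the list above concrete, here is a minimal sketch of the validation step in Python. Everything named here (`FileUpload`, `validate_upload`, the field names) is hypothetical and only illustrates the rules listed; it is not the FileService code, and the actual service may look quite different.

```python
import uuid
from dataclasses import dataclass


@dataclass
class FileUpload:
    """Result of validating an upload; all field names are hypothetical."""
    file_uuid: str
    parent_uuid: str
    is_preservation_master: bool
    content: bytes


def validate_upload(content, parent_uuid, is_preservation_master=False):
    """Apply the rules from the list above to an incoming POST/PUT."""
    if not content:
        raise ValueError("a binary body is required")
    if not parent_uuid:
        # Files at this level must belong to a parent, so reject.
        raise ValueError("parent UUID is required")
    # uuid.UUID raises ValueError when the id is not a well-formed UUID.
    parent = str(uuid.UUID(parent_uuid))
    return FileUpload(
        file_uuid=str(uuid.uuid4()),  # minted for the file itself
        parent_uuid=parent,
        is_preservation_master=bool(is_preservation_master),
        content=content,
    )
```

A caller would turn the `ValueError` into whatever rejection response the endpoint settles on.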

Resources for brainstorming, conceptualizing and causing _mild_ panic attacks:

@DiegoPino's example/concept on future.islandora: http://future.islandora.ca:8080/fcrepo/rest/DAM/books/book1/pages/page1

Possible conceptualization broader than large image: https://raw.githubusercontent.com/wiki/duraspace/pcdm/yEd/islandora-large-image/Islandora-PCDM-Large-Image.jpg

https://github.com/duraspace/pcdm/wiki/PCDM-2.0

https://github.com/duraspace/pcdm/issues/53

I know @DiegoPino asked for this, but do we want a specified predicate on a File that defines it as the preservation master? Then we have to (possibly) check and update on any FileSet modification.

For example:

@prefix pcdm: <http://pcdm.org/models#> .

<#MyDog> a pcdm:Object ;
  pcdm:hasFileSet <#fs1> .

<#fs1> a pcdm:FileSet ;
  pcdm:hasFile  <#file1>  .

<#file1> a pcdm:File .

<#file1> would be our preservation master even if it was a low-res JPG. Then suppose we added a second file:

<#fs1> a pcdm:FileSet ;
  pcdm:hasFile  <#file1>, <#file2>  .

<#file1> a pcdm:File .

<#file2> a pcdm:File .

where <#file2> is a TIFF. We would then have to remove the predicate from <#file1> and add it to <#file2>.

We would have to perform this comparison on any PUT/POST to the <#fs1> fileset.

Just wondering whether that is a good option, or whether we should determine the preservation master only when we actually want to create a derivative.
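For illustration, the comparison that would have to run on every PUT/POST to the fileset could look something like the sketch below. The format ranking is purely an assumption made up for this example (the thread only says a TIFF should win over a low-res JPG), and the function names are hypothetical.

```python
# Hypothetical preference order: lower rank wins. Not taken from any ontology.
FORMAT_RANK = {"image/tiff": 0, "image/jp2": 1, "image/png": 2, "image/jpeg": 3}


def pick_preservation_master(files):
    """Return the id of the file that should carry the master predicate.

    `files` maps a file id (e.g. "#file1") to its MIME type. Unknown types
    rank last; ties go to the file that was added first.
    """
    if not files:
        return None
    return min(files, key=lambda fid: FORMAT_RANK.get(files[fid], len(FORMAT_RANK)))


def flag_changes(files, current_master):
    """List the predicate moves needed after a write to the FileSet."""
    new_master = pick_preservation_master(files)
    if new_master == current_master:
        return []
    changes = []
    if current_master is not None:
        changes.append(("remove", current_master))
    if new_master is not None:
        changes.append(("add", new_master))
    return changes
```

Deferring the decision to derivative time would mean calling only `pick_preservation_master` when a derivative is requested, with no predicate to keep in sync.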

@whikloj interesting point, maybe flagging a file at this level as preservation master does not make sense. It feels like once the conceptualization of FPR comes together it might make sense to revisit? Regardless I could stub it out in code and if it needs to be s/moved/removed it can be?

@br2490 👍

@whikloj, @br2490 FileSet?
Not sure if we are there yet. Are we officially assuming PCDM 2.0 as our semantic-structural model? If so, I need an ontology, not only example diagrams.
Can an object have multiple preservation masters attached, in different formats, describing different resources? In my mind, yes. I'm kinda lost.

How are you defining an Object? In my mind an object is a single resource.

So if that is a complex object that would have multiple files of different formats describing different resources... wouldn't that be multiple objects?

Not sure @whikloj. I can think of multiple use cases, but I'm probably out of library stuff (again, and again). I feel (damn feelings) we are handling this again as we did in old Islandora, and the whole OBJ + derivatives thing was a mess. What I would like, personal feelings again, is just a way to mark: hey, this binary resource, whatever its slug, should be used to make derivatives. Nothing more. And this other one is already a derivative of something else, so please, Camel, don't do anything with it. Not sure how to put this another way.

I make no assumptions, I only want to stub something and when y'all (the committers) agree on an Ontology we can work on the semantic restructure. I just like the idea of starting with something ~ even if the end result is totally different.

@br2490 totally == starting with something == 👍 . Bad practice during a sprint to be super picky, my mistake sorry!!

Re: https://github.com/br2490/pdx/blob/issue-189/src/FileService/README.md
From my point of view (and feel free to disagree)

While thinking about this I decided that we have to add a UUID to the NonRDFSource (read: binaries) /fcr:metadata. Otherwise, we can't get the file. We'd have to name them (i.e. PUT) and get them by name... UUID seems better.

So you need to ensure that if a UUID is not provided, that one is added to the RDF. I'm not sure how to ensure this happens as there will be two requests. One to PUT/POST the binary file and then one to PUT the fcr:metadata.

Perhaps Drupal doesn't give it a UUID and instead the FileService assigns the UUID and it gets synced back to Drupal through Islandora Sync??
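As a rough sketch of the "FileService assigns the UUID" option, the two requests could be planned like this. The predicate URI is a placeholder (no predicate is agreed in this thread), `plan_file_requests` is a hypothetical name, and the sketch assumes the metadata update goes out as a SPARQL-update PATCH to `/fcr:metadata`. It only builds request descriptions and sends nothing.

```python
import uuid

# Placeholder predicate; the thread does not settle on one.
UUID_PREDICATE = "http://example.org/ns#uuid"


def plan_file_requests(binary_uri, mime_type, file_uuid=None):
    """Describe the two requests needed to store a binary with a UUID.

    Returns (uuid, [binary request, metadata request]). A UUID is minted
    when the caller (e.g. Drupal) did not supply one.
    """
    file_uuid = file_uuid or str(uuid.uuid4())
    sparql = 'INSERT DATA { <%s> <%s> "%s" . }' % (
        binary_uri, UUID_PREDICATE, file_uuid)
    requests = [
        # First request: store the binary itself.
        {"method": "PUT", "uri": binary_uri,
         "headers": {"Content-Type": mime_type}},
        # Second request: record the UUID on the binary's description.
        {"method": "PATCH", "uri": binary_uri + "/fcr:metadata",
         "headers": {"Content-Type": "application/sparql-update"},
         "body": sparql},
    ]
    return file_uuid, requests
```

Returning the UUID alongside the plan gives the caller something to sync back to Drupal if the second request succeeds.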

Back to the README

  1. Should add files to filesets in an Object

Should we add a proxy for the File to the FileSet indirect container, much like how we add objects to a Collection? This is kind of a grey area, though, as I'm not sure there is much scope for file reuse. Thoughts?

  1. IF no fileset container exists, create one and report

Create a FileSet indirect container, add the File, and then report?

  1. Be agnostic as to ontology,...

So for this, I think you can pass the request along to the ResourceService and have it PUT/POST the binary for you. Then follow up with a PATCH to the `/fcr:metadata` endpoint.

  1. Should return something if successful else panic.

Should definitely return something: probably a 201 if the file is uploaded, otherwise return the response that caught you off guard.
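In code, that "201 or relay whatever came back" rule is small. A minimal sketch, with a hypothetical function name and a plain `(status, headers, body)` tuple standing in for whatever response object the service actually uses:

```python
def file_service_response(upstream_status, upstream_body=b"", location=None):
    """Map the upstream result to the FileService response.

    Returns a (status, headers, body) tuple.
    """
    if upstream_status == 201:
        # Success: report creation and, when known, where the file lives.
        headers = {"Location": location} if location else {}
        return 201, headers, b""
    # Anything else: pass the unexpected response through unchanged.
    return upstream_status, {}, upstream_body
```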

Regarding _Should accept_

  1. put/post file/{object} - creates pcdm:Object and attaches any binaries as pcdm:File. Object here hasMember, hasRelatedObj.

I would say you don't accept a pcdm:Object; that would be the ObjectService (@bryjbrown). You accept the UUID of an Object in the route and add the pcdm:FileSet/pcdm:File to it.

  1. patch file/{object} - append additional metadata.

Again, I would suggest that you only patch files, because what you'll be doing is locating the file (using the idToUri translation) and then appending `/fcr:metadata`. Leave Object patching to the PDX ObjectService.

All Patch requests can be passed to the ResourceService in the end.

  1. get file/{id} - do something.

So a GET can pass straight to the ResourceService, but what you could do is content negotiation. If the Accept: header is some form of RDF (e.g. text/turtle, application/ld+json, application/rdf+xml), then you actually want the `/fcr:metadata` information. Otherwise return the binary.

[ResourceService](https://github.com/Islandora-CLAW/Crayfish/blob/master/src/ResourceService/Controller/ResourceController.php#L21) just needs to know if you want the object or the metadata.
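The content negotiation described for GET could be sketched as below. The function names are hypothetical, the RDF type list is just the three types mentioned above, and the check deliberately ignores q-values, so treat it as a starting point only.

```python
# The RDF media types named in the comment above; extend as needed.
RDF_TYPES = {"text/turtle", "application/ld+json", "application/rdf+xml"}


def wants_metadata(accept_header):
    """True when the Accept header names one of the RDF media types."""
    for part in (accept_header or "").split(","):
        # Drop parameters such as ";q=0.9" and normalise case.
        media_type = part.split(";", 1)[0].strip().lower()
        if media_type in RDF_TYPES:
            return True
    return False


def resolve_get_target(file_uri, accept_header):
    """Pick the binary or its /fcr:metadata description for a GET."""
    if wants_metadata(accept_header):
        return file_uri.rstrip("/") + "/fcr:metadata"
    return file_uri
```

The resolved URI (or just the boolean) is then all the ResourceService needs to be told.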

  1. delete file/{object} - delete something.

Again, only deal with Files. You could (and this would be nice, but is not a requirement) verify that the UUID asked for returns a pcdm:File, and otherwise throw an Exception of some sort.

But you could do that sort of validation for all these routes. Not sure if that would add too much lag.

But for delete, again you can just pass this request to the ResourceService for the most part.
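The optional "is this really a pcdm:File?" check might look like the following sketch. It assumes the caller has already fetched the resource's rdf:type values, which is the extra round trip behind the lag concern; `NotAFileError` and `ensure_pcdm_file` are hypothetical names.

```python
PCDM_FILE = "http://pcdm.org/models#File"


class NotAFileError(Exception):
    """Raised when a UUID resolves to something other than a pcdm:File."""


def ensure_pcdm_file(file_uuid, rdf_types):
    """Return the UUID unchanged if its rdf:type values include pcdm:File.

    `rdf_types` is the collection of type URIs already fetched for the
    resource; fetching them costs one extra request per route call.
    """
    if PCDM_FILE not in set(rdf_types):
        raise NotAFileError("%s is not a pcdm:File" % file_uuid)
    return file_uuid
```

Each route (put, patch, get, delete) could call this before handing the request to the ResourceService.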

Cool 😎 I just glimpsed at this and will carefully read tonight. Thanks

In reply to Jared Whiklo's comment above (Jun 27, 2016, 4:27 PM).

@br2490 @bryjbrown can y'all give a status update on this for @dannylamb's benefit?

@ruebot @dannylamb I'm working on #222 (ObjectService) and @br2490 is working on #189 (FileService). We spent the June sprint discussing how PDX should implement PCDM, and settled on a very basic view of it (at least for the beginning) where we only use a subset of the http://pcdm.org/models# ontology.

I'm going to try to use the right words for this and probably fall way off the mark, but here goes: PDX's API should allow you to add the pcdm model type to a resource by specifying a resource UUID and the thing you want it to be (Collection/Object/File). It should also allow you to add pcdm relationships between resources with the appropriate predicates (hasFile, memberOf, etc) through the API by passing in the UUIDs of the resources to be linked.
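To give a feel for what such an API would do underneath, here is a small sketch that builds the SPARQL updates for "make this resource a pcdm:Object" and "link these two resources". It assumes the UUIDs have already been translated to resource URIs, and the function names and the allowed-predicate list are illustrative only, not the PDX API.

```python
PCDM = "http://pcdm.org/models#"
MODEL_TYPES = {"Collection", "Object", "File"}
RELATIONS = {"hasFile", "hasMember", "memberOf"}  # illustrative subset


def add_model_type(resource_uri, model):
    """SPARQL update that types a resource as a PCDM Collection/Object/File."""
    if model not in MODEL_TYPES:
        raise ValueError("unsupported model: %s" % model)
    return "INSERT DATA { <%s> a <%s%s> . }" % (resource_uri, PCDM, model)


def add_relationship(subject_uri, predicate, object_uri):
    """SPARQL update that links two resources with a PCDM predicate."""
    if predicate not in RELATIONS:
        raise ValueError("unsupported predicate: %s" % predicate)
    return "INSERT DATA { <%s> <%s%s> <%s> . }" % (
        subject_uri, PCDM, predicate, object_uri)
```

Restricting both calls to a small whitelist matches the idea of starting with only a subset of the ontology.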

I mocked up an idea for how the ObjectService API would look by blatantly copying the API that was in place for CollectionService, which you can see here: https://github.com/bryjbrown/pdx#objectservice. I have no idea whether it's even appropriate or not though, since I have 0 experience with API design and a beginner's understanding of LOD principles. Ben has some notes on plans for the API design of the FileService here: https://github.com/br2490/pdx/tree/issue-189/src/FileService

I would also add that since the majority of the work we've done thus far has been conceptual (trying to wrap our heads around PCDM, Silex and API design), I for one would be totally fine if @dannylamb (or any of the other community members) wanted to scrap my branch and start over in a more deliberate approach. Whatever makes it more usable in the long run I say.

@bryjbrown this is great, and very much appreciated! Let's put this on the next CLAW agenda. I think it will be very relevant to the Alpaca discussion, and help us prioritize where we focus the MVP efforts.
