Documentation: Serialize Media entities as LDP-RS describing the File, not itself

Created on 13 Jun 2017 · 17 Comments · Source: Islandora/documentation

Right now a Media entity, when serialized, has itself as the subject and contains a triple of the form <uri_of_media> iana:describes <uri_of_file>. It really needs to be <uri_of_file> iana:describedby <uri_of_media>, to be in line with how Fedora generates an LDP-RS for every LDP-NR that gets created. This amounts to adding a special case for Media entities in the jsonld module.

Here's what it looks like now (non-relevant triples removed for brevity):

{
    "@graph":[
        {
            "@id":"http:\/\/localhost:8000\/media\/1?_format=jsonld",
           ...
            "http:\/\/www.iana.org\/assignments\/relation\/describes":[
                {
                    "@id":"http:\/\/localhost:8000\/sites\/default\/files\/2017-06\/sample.jp2"
                }
            ]
        }
        ...
    ]
}

And here's what it should look like:

{
    "@graph":[
        {
            "@id":"http:\/\/localhost:8000\/sites\/default\/files\/2017-06\/sample.jp2",
           ...
            "http:\/\/www.iana.org\/assignments\/relation\/describedby":[
                {
                    "@id":"http:\/\/localhost:8000\/media\/1?_format=jsonld"
                }
            ]
        }
        ...
    ]
}
claw-jsonld drupal
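
The subject swap requested above can be sketched in a few lines of Python. This is only an illustration of the before/after shapes, not the jsonld module's code (which is PHP). The function name `flip_describes` is hypothetical, and it assumes the expanded JSON-LD layout shown in the two samples, with a single described file per media.

```python
import copy

DESCRIBES = "http://www.iana.org/assignments/relation/describes"
DESCRIBEDBY = "http://www.iana.org/assignments/relation/describedby"


def flip_describes(doc):
    """Return a copy of doc in which every node that iana:describes a file
    takes the file's URI as its @id and points back to the original
    subject with iana:describedby. (Hypothetical helper.)"""
    out = copy.deepcopy(doc)
    for node in out.get("@graph", []):
        targets = node.pop(DESCRIBES, None)
        if not targets:
            continue
        media_uri = node["@id"]
        # Assumption: exactly one described file, as in the samples above.
        node["@id"] = targets[0]["@id"]
        node[DESCRIBEDBY] = [{"@id": media_uri}]
    return out


before = {
    "@graph": [
        {
            "@id": "http://localhost:8000/media/1?_format=jsonld",
            DESCRIBES: [
                {"@id": "http://localhost:8000/sites/default/files/2017-06/sample.jp2"}
            ],
        }
    ]
}
after = flip_describes(before)
```

Note that the sketch only rewrites the subject node; any other node in the graph that refers to the media URI is left untouched.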

All 17 comments

@dannylamb is that something you want to have fixed as such? Since Media entities are not file entities, I'm not sure how to handle that. I would have guessed that media entities were a way of nicely managing images, etc., but the real Non-RDF Source payload would come from one of the file entities connected to them.
Would that not leave out all the properties that belong to the media entity but not to (one of) its files?

The jsonld module handles this, or at least would like to handle it, as generically and LDP-less as possible. Says the jsonld module 😺

@DiegoPino That's precisely the conundrum. The Drupal and LDP models are a bit at odds. So long as we're ok with the fact that the JSONLD we generate for Media has the wrong subject w/r/t LDP, then it's reasonable to do this conversion elsewhere.

And FWIW I'm totally ok with that.

In my understanding, the Media entity in Drupal is "a wrapper for the file" and any fields/values on a Media entity - for example: ebucore:height is 2394px, or mimetype is image/tiff, are semantically the properties of the _file_. It's just that file entities, in Drupal, can't have fields attached. So the fields go on the Media. Any other fields or properties you attach to a Media should, I think, describe the file proper (otherwise put it on the node).

The Media contains the same information, and is analogous to, the /fcr:metadata document describing the binary. However, it's different structurally - in Drupal it's "the middleman" tying a node to a file. In Fedora, the file itself points to the node, through its properties (which are accessed through the document at /fcr:metadata).

Taking the Media's JSONLD serialization, it would say: (using REALLY LAZY shorthand)

<DRUPAL/media/1> pcdm:fileOf <DRUPAL/node/1>,
      schema:sameAs <DRUPAL/_flysystem/fedora/stuff/filename> .

This does not make a lot of sense because the media is not, semantically, "the same as" the file nor "a file of" the node.

It's only in Fedora, once the subject is swapped out for the Fedora Binary, that it makes semantic sense:

<FEDORA/fcrepo/rest/stuff/filename>  pcdm:fileOf <DRUPAL/node/1> ,
     schema:sameAs <DRUPAL/_flysystem/fedora/stuff/filename> .

I don't want to put too much weight in the Media's jsonld here because it's misleading as LD, but it works as the in-transit-to-fedora construct.
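
To make the subject swap concrete, here is a rough Python sketch using the same lazy shorthand. The thread does not show how the swap is actually implemented, so the `swap_subject` helper and the tuple representation of triples are assumptions for illustration only.

```python
def swap_subject(triples, old_subject, new_subject):
    """Re-point every triple whose subject is old_subject at new_subject.
    Predicates and objects are left as they are. (Hypothetical helper.)"""
    return [
        (new_subject if s == old_subject else s, p, o)
        for (s, p, o) in triples
    ]


MEDIA = "DRUPAL/media/1"
BINARY = "FEDORA/fcrepo/rest/stuff/filename"

# What the Media's serialization says, in shorthand.
media_triples = [
    (MEDIA, "pcdm:fileOf", "DRUPAL/node/1"),
    (MEDIA, "schema:sameAs", "DRUPAL/_flysystem/fedora/stuff/filename"),
]

# The same statements once the Fedora Binary is the subject.
fedora_triples = swap_subject(media_triples, MEDIA, BINARY)
```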

Here's a diagram of the JSONLD of a node, media, and file, along with the fedora objects and their types (both according to HTTP headers, and to the documents they delivered).
[Image: Islandora-and-fedora-jsonld-2019-05-09]

Point is, I agree that the serialization would make semantic sense if you make the main subject (id) the URI of the file in Drupal rather than the Media in Drupal (though it already contains a schema:sameAs to that effect, so maybe ...). As far as I can tell (using a CLAW instance that is some days out of date), the original problematic triple, <uri_of_media> iana:describes <uri_of_file>, is not present, so I'm not sure what needs to be done for this issue.

@dannylamb so I did this and it has no effect. I am guessing that's because you are only grabbing the media element's graph, and moving this triple from media -> file to file -> media puts it outside that graph. So it's the same as removing it.

Whoa! missed the @rosiel comment. reading now.

Okay, I agree with @rosiel above. This is not working due to our serializing method, but even if it did it wouldn't necessarily make sense.

A simple way to add this (not that it makes sense) would be to replace iana:describes with iana:describedby and make both the subject and object the media element.

So <drupal/media/2> iana:describes <drupal/file/3> becomes <drupal/media/2> iana:describedby <drupal/media/2>, again this doesn't make sense.

But in Fedora it would become
<fedora/NonRdfSource/1234-5678> iana:describedby <drupal/media/2>.

I'm not sure it's worth the hassle though.

This issue is from 2017, and I don't see any iana:describes in the graph returned from a media in 2019 - I think it was removed a while ago.

Using curl, I see it in the header for /media/x?_format=jsonld. Link: <http://DOMAIN/_flysystem/fedora/2019-05/IMG_0606.JPG>; rel="describes"; type="image/jpeg". This statement is ... accurate, no?
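
For anyone who wants to check that header programmatically rather than by eye, here is a small Python sketch that pulls the rel="describes" target out of a Link header value. The function names are made up for this example, and the parser is deliberately simplified: it only handles quoted parameters like the ones in the header quoted above, not the full Link header grammar.

```python
import re

# One <uri> followed by zero or more ; name="value" parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_link_header(value):
    """Split a Link header value into dicts of uri plus parameters."""
    links = []
    for uri, params in LINK_RE.findall(value):
        entry = {"uri": uri}
        entry.update(dict(PARAM_RE.findall(params)))
        links.append(entry)
    return links


def describes_target(value):
    """Return the URI of the first link with rel="describes", or None."""
    for link in parse_link_header(value):
        if "describes" in link.get("rel", "").split():
            return link["uri"]
    return None


header = (
    '<http://DOMAIN/_flysystem/fedora/2019-05/IMG_0606.JPG>; '
    'rel="describes"; type="image/jpeg"'
)
file_uri = describes_target(header)
```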

To rewrite the original issue to reflect current behaviour:

Right now a Media entity, when serialized, has itself as the subject and _contains triples_ of the form <uri_of_media> ebucore:height '3024', but really it needs to be <uri_of_file> ebucore:height '3024' to be _semantically accurate_. _Also, the existence of a 'media document' describing the file is_ in line with how Fedora generates a LDP-RS for every LDP-NR that gets created, _since even in its HTTP headers it claims it_ iana:describes <uri_of_file>.

@rosiel That link header is indeed accurate. As is your summary about the subject uri. The missing piece we should add on top is an iana:describedby with the media's url in the RDF. That would tie it all up nicely.

To stick with your example, something like this in a jsonld GET response for a media

<uri_of_file> ebucore:height '3024'
<uri_of_file> iana:describedby <uri_of_media>

with a rel="describes" link header pointing to <uri_of_file>.
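
A quick way to sanity-check a response against that shape, sketched in Python. `is_consistent` is a hypothetical helper, not part of any Islandora module. It takes the parsed JSON-LD body, the media URI that was requested, and the file URI taken from the rel="describes" link header. It accepts the back-reference as either an `@id` or a plain `@value`, which is an assumption made so the check is lenient about how the object is serialized.

```python
DESCRIBEDBY = "http://www.iana.org/assignments/relation/describedby"
HEIGHT = "http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#height"


def is_consistent(doc, media_uri, file_uri):
    """True if the graph has a node whose @id is file_uri and which
    names media_uri under iana:describedby. (Hypothetical check.)"""
    for node in doc.get("@graph", []):
        if node.get("@id") != file_uri:
            continue
        refs = node.get(DESCRIBEDBY, [])
        if any(media_uri in (r.get("@id"), r.get("@value")) for r in refs):
            return True
    return False


# Placeholder URIs mirroring the shorthand above.
media = "http://example.org/uri_of_media"
file_ = "http://example.org/uri_of_file"

proposed = {
    "@graph": [
        {
            "@id": file_,
            HEIGHT: [{"@value": "3024"}],
            DESCRIBEDBY: [{"@id": media}],
        }
    ]
}
current = {"@graph": [{"@id": media, HEIGHT: [{"@value": "3024"}]}]}
```

With these inputs, `is_consistent(proposed, media, file_)` holds and `is_consistent(current, media, file_)` does not, since the current serialization still has the media as its subject.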

Ok, here's what we have now

{
   "@graph":[
      {
         "@id":"http:\/\/localhost:8000\/media\/1",
         "@type":[
            "http:\/\/pcdm.org\/models#File",
            "http:\/\/pcdm.org\/use#OriginalFile"
         ],
         "http:\/\/purl.org\/dc\/terms\/title":[
            {
               "@value":"Original Image",
               "@language":"en"
            }
         ],
         "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#label":[
            {
               "@value":"Original Image",
               "@language":"en"
            }
         ],
         "http:\/\/schema.org\/author":[
            {
               "@id":"http:\/\/localhost:8000\/user\/1"
            }
         ],
         "http:\/\/schema.org\/dateCreated":[
            {
               "@value":"2019-05-15T19:21:42+00:00",
               "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#dateTime"
            }
         ],
         "http:\/\/schema.org\/dateModified":[
            {
               "@value":"2019-05-15T19:22:12+00:00",
               "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#dateTime"
            }
         ],
         "http:\/\/www.ebu.ch\/metadata\/ontologies\/ebucore\/ebucore#height":[
            {
               "@value":"1018",
               "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#int"
            }
         ],
         "http:\/\/pcdm.org\/models#fileOf":[
            {
               "@id":"http:\/\/localhost:8000\/node\/1"
            }
         ],
         "http:\/\/www.ebu.ch\/metadata\/ontologies\/ebucore\/ebucore#hasMimeType":[
            {
               "@value":"image\/jpeg",
               "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#string"
            }
         ],
         "http:\/\/www.ebu.ch\/metadata\/ontologies\/ebucore\/ebucore#width":[
            {
               "@value":"904",
               "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#int"
            }
         ],
         "http:\/\/schema.org\/sameAs":[
            {
               "@value":"http:\/\/localhost:8000\/_flysystem\/fedora\/2019-05\/Flemming-Magic.jpg"
            }
         ]
      },
      {
         "@id":"http:\/\/localhost:8000\/user\/1",
         "@type":[
            "http:\/\/schema.org\/Person"
         ]
      },
      {
         "@id":"http:\/\/localhost:8000\/node\/1",
         "@type":[
            "http:\/\/pcdm.org\/models#Object"
         ]
      }
   ]
}

Feels like we've batted around two ways of doing this

  1. Just change schema:sameAs to iana:describes, and then process the rest to be more fedora/ldp-ish in Milliner. This is done with a simple config change using Context, and would result in the following from Drupal (edited for brevity):
{
   "@graph":[
      {
         "@id":"http:\/\/localhost:8000\/media\/1",
         "@type":[
            "http:\/\/pcdm.org\/models#File",
            "http:\/\/pcdm.org\/use#OriginalFile"
         ],
         "http:\/\/pcdm.org\/models#fileOf":[
            {
               "@id":"http:\/\/localhost:8000\/node\/1"
            }
         ],
         "http:\/\/www.iana.org\/assignments\/relation\/describes":[
            {
               "@value":"http:\/\/localhost:8000\/_flysystem\/fedora\/2019-05\/Flemming-Magic.jpg"
            }
         ]
         ...
      },
      ...
   ]
}

which isn't 100% over-the-top semantically correct, but is actually the more intuitive solution to folks coming from outside the ldp sphere. We'd then further process it in Crayfish/Alpaca to have it make sense in fedora and the triplestore.

  2. We replace the @id to be that of the file, and use iana:describedby to reference the media. This would look like (again, edited for brevity):
{
   "@graph":[
      {
         "@id":"http:\/\/localhost:8000\/_flysystem\/fedora\/2019-05\/Flemming-Magic.jpg",
         "@type":[
            "http:\/\/pcdm.org\/models#File",
            "http:\/\/pcdm.org\/use#OriginalFile"
         ],
         "http:\/\/pcdm.org\/models#fileOf":[
            {
               "@id":"http:\/\/localhost:8000\/node\/1"
            }
         ],
         "http:\/\/www.iana.org\/assignments\/relation\/describedby":[
            {
               "@value":"http:\/\/localhost:8000\/media\/1"
            }
         ]
         ...
      },
      ...
   ]
}

This is the most semantically correct, but may come off as strange to the uninitiated. It would require less processing to get into the right shape for Fedora and the Triplestore, though.
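
To compare the two options side by side, here is a Python sketch that derives each shape from the current output. It only illustrates the resulting documents: in the thread, option 1 is a Context config change rather than code, and the function names `option_1` and `option_2` are invented for this example. It assumes the media node carries a single schema:sameAs entry with the file URL as an `@value`, as in the sample above.

```python
import copy

SAME_AS = "http://schema.org/sameAs"
DESCRIBES = "http://www.iana.org/assignments/relation/describes"
DESCRIBEDBY = "http://www.iana.org/assignments/relation/describedby"


def option_1(doc, media_uri):
    """Keep the media as subject; rename schema:sameAs to iana:describes."""
    out = copy.deepcopy(doc)
    for node in out["@graph"]:
        if node.get("@id") == media_uri and SAME_AS in node:
            node[DESCRIBES] = node.pop(SAME_AS)
    return out


def option_2(doc, media_uri):
    """Make the file the subject and point back with iana:describedby."""
    out = copy.deepcopy(doc)
    for node in out["@graph"]:
        if node.get("@id") == media_uri and SAME_AS in node:
            # Assumption: one sameAs entry holding the file URL.
            file_uri = node.pop(SAME_AS)[0]["@value"]
            node["@id"] = file_uri
            node[DESCRIBEDBY] = [{"@value": media_uri}]
    return out


MEDIA = "http://localhost:8000/media/1"
FILE = "http://localhost:8000/_flysystem/fedora/2019-05/Flemming-Magic.jpg"

current = {
    "@graph": [
        {
            "@id": MEDIA,
            "http://pcdm.org/models#fileOf": [{"@id": "http://localhost:8000/node/1"}],
            SAME_AS: [{"@value": FILE}],
        },
        {"@id": "http://localhost:8000/node/1"},
    ]
}

shape_1 = option_1(current, MEDIA)
shape_2 = option_2(current, MEDIA)
```

In both cases the pcdm:fileOf statement travels with the node, so under option 2 it is the file, not the media, that is a file of the node.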

No. 2 makes sense.
No. 1 would be a regression back into the semantic flaw from 2017 that caused this issue to be created.

@rosiel @whikloj PRs are up^^

Testing instructions are in https://github.com/Islandora-CLAW/islandora/pull/136

@rosiel your diagram in https://github.com/Islandora-CLAW/CLAW/issues/662#issuecomment-491408492 is epic. Mind if I use it in my Open Repositories and iCamp slide decks, with full and genuflecting attribution?

@mjordan Yes, but no genuflecting please, and it was a product of collaborating with @elizoller.

[edit: also, unless things change by then, please include the fileOf arrow that gets crossed out and redirected to Drupal. ;) ]

OK, will nix the genuflecting, cocredit @elizoller, and note updates.

😃

These might be right?
[Image: Islandora 8 - Drupal Node and Fedora Resource - Service File]
[Image: Islandora 8 - Drupal Node and Fedora Resource - Original File]

@elizoller++
