Was told to create an issue for this, apologies if duplicated.
| Title (Goal) | Ability to partition Islandoras on the same Fedora |
| --- | --- |
| Primary Actor | Repository Admin, Sysadmin |
| Scope | Islandora Site Architecture |
| Level | Medium? |
| Story | In order to reduce the work of maintaining separate Fedora stacks, I as a sysadmin/repository admin would like to use a Drupal multisite with separate Islandoras on the same Fedora. As a repo admin who provides sites for others, I have some client sites that need permission to manage their own objects but not objects that belong to a different namespace/site. We sometimes need to present select objects as a whole 'site' of their own, with different associated themes or framing Drupal content. (This looks toward a related use case: 'can I set up an exhibit of select content?') I am worried about this in the context of two-way sync, which would push all Fedora content into all of my Islandora sites as Drupal content. |
@rosiel cool! This really needs to be discussed. My first guess: we could decide what to sync, and where to sync it, based on an arbitrary, configurable predicate. But this needs to be explored further, mostly because if sync goes from Fedora to Drupal 8 (for example, when the resource did not originate in Islandora), the sync utility would need a reverse map, something like:
ns1:predicateA == value1 -> sync with URL1
And that map would not be stored in Drupal but in some config available to Camel.
Or we could define that URL as an RDF property?
There are many ways to define the same thing.
Thanks a lot!!
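To make the reverse-map idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `SYNC_ROUTES` name, the example URLs, and the representation of a resource as (predicate, value) pairs are not part of Islandora, Camel, or Fedora.

```python
from typing import Dict, List, Tuple

# Hypothetical reverse map, kept in config outside Drupal:
# (predicate, value) -> base URL of the site that should receive the resource.
SYNC_ROUTES: Dict[Tuple[str, str], str] = {
    ("ns1:predicateA", "value1"): "https://site1.example.org",
    ("ns1:predicateA", "value2"): "https://site2.example.org",
}


def sync_targets(triples: List[Tuple[str, str]]) -> List[str]:
    """Return the site URLs a resource should be synced to.

    `triples` holds the (predicate, value) pairs of one resource.
    A resource that matches no route is synced nowhere.
    """
    targets: List[str] = []
    for predicate, value in triples:
        url = SYNC_ROUTES.get((predicate, value))
        if url is not None and url not in targets:
            targets.append(url)
    return targets


resource = [("dc:title", "A map"), ("ns1:predicateA", "value1")]
print(sync_targets(resource))
```

A resource with no matching predicate yields an empty list, which is one way to keep Fedora content from being pushed into every site.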
I think the easiest way to deal with this would be structure. From the repository root, you'd need to have separate containers for each multisite. Then you could re-index per container. That pattern could be applied to multitenancy with appropriate authz.
@dannylamb you mean LDP based? I can see some scalability issues with that, specifically if we use the default PID minter, which is handy for avoiding an unbalanced tree. It also makes filtering in a triple store kind of complex (like "show me all objects that are descendants of X"; what if that "descendant of" is 5 steps with different predicates?). Good topic for the next CLAW call!
Have I mentioned how much I hate that semantics and storage are jumbled up in Fedora?
"That pattern could be applied to multitenancy with appropriate authz." -> this sounds cool but is way over my head. What should I read to fill in my blanks?
@rosiel I think this sounds more complex than it is. In my mind @dannylamb is proposing a Fedora 4 repo structure of
Fedora 4 root
|- /site1
| |- /objects in site 1
|
|- /site2
   |- /objects in site 2
Then you can set authorization based on the root level elements, ie. Bob is admin of site1 and Jane is admin of site2. But neither can access the other's repository contents.
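As a rough illustration of authorization keyed on the root-level container, here is a small Python sketch. It does not use Fedora's actual authorization mechanism; `SITE_ADMINS`, `top_level_container`, and `can_manage` are hypothetical names, and the users are the ones from the example above.

```python
from typing import Dict, Set

# Hypothetical mapping of root-level containers to their admins.
SITE_ADMINS: Dict[str, Set[str]] = {
    "site1": {"bob"},
    "site2": {"jane"},
}


def top_level_container(path: str) -> str:
    """Return the first path segment under the repository root ('' for root)."""
    parts = [segment for segment in path.split("/") if segment]
    return parts[0] if parts else ""


def can_manage(user: str, path: str) -> bool:
    """A user may manage a resource only inside a container they administer."""
    return user in SITE_ADMINS.get(top_level_container(path), set())


print(can_manage("bob", "/site1/objects/abc"))
print(can_manage("bob", "/site2/objects/xyz"))
```

The point of the design is that the decision depends only on the first path segment, so objects can be arranged freely below each site container.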
But @DiegoPino is right that this might have issues with unbalanced trees. Perhaps we should pull @ruebot in and have him run one of his massive performance and scaling ingests to see how it works if you create 3-4 root-level objects and ingest a different ratio of objects into each.
Like
Fedora 4 root
|- /site1 (ingest objects)
|
|- /site2 (ingest 1/2 as many as site1)
|
|- /site3 (ingest 1/4 as many as site1)
|
|- /site4 (ingest 1/8 as many as site1)
and see how ingest and response times go? This test would run directly against Fedora and so could keep any PHP/Drupal issues out of its timing.
There is no (performance) problem at all with an unbalanced tree, at least from the Fedora side. The problem is having too many children of a single node/resource.
That's what the pair-tree PID minter protects against.
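For readers unfamiliar with pair-tree minting, the sketch below shows the general idea in Python: leading characters of an identifier become intermediate path segments, so no single node collects every object as a direct child. The number of levels and the segment width are assumptions chosen for illustration, not a statement of Fedora's exact defaults, and `pairtree_path` is a hypothetical helper.

```python
import uuid


def pairtree_path(identifier: str, levels: int = 4, width: int = 2) -> str:
    """Spread an identifier across intermediate containers.

    The first `levels * width` characters (hyphens ignored) become path
    segments, followed by the full identifier as the leaf.
    """
    flat = identifier.replace("-", "")
    segments = [flat[i * width:(i + 1) * width] for i in range(levels)]
    return "/" + "/".join(segments + [identifier])


# Fixed identifier so the result is reproducible.
print(pairtree_path("7c4d1b2e-aaaa"))  # /7c/4d/1b/2e/7c4d1b2e-aaaa

# With a random UUID, as a minter would typically use.
print(pairtree_path(str(uuid.uuid4())))
```

With two hexadecimal characters per segment, each intermediate node has at most 256 direct children at the next level, regardless of how many objects are ingested.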
@ajs6f when you say "too many children of a single node", do you mean just having a tonne of children under a single node, or do the children have to be direct children of the single node?
ie.
<root node>
|- <child 1>
   |- <sub 1>
   |  |- <sub sub 1>
   |- <sub 2>
   |  |- <sub sub 2>
versus
<root node>
|- <child 1>
   |- <sub 1>
   |- <sub 2>
   |- <sub 3>
   |- <sub 4>
@whikloj I guess @ajs6f means direct children. Otherwise any type of tree would end up being a disaster.
Yes, as @DiegoPino says, it's too many immediate/direct children that are a problem.
In fact, if you can guarantee by other means (particularly by controlling your own id minting) that you won't stick too many children under a single parent, then you _shouldn't_ use the hierarchy builder minter. You should just use PUT and stick things wherever it makes sense.
So I have two concerns here:
It sounds like you are suggesting that the hierarchical structure that is built into Fedora 4 Objects would be _sometimes_ meaningful and _sometimes_ arbitrary. Does this sound like a solid plan? (I am not being sarcastic; I actually don't know). Would it be better to include an extra, hereditary predicate and let pid-minters populate the hierarchy for optimal storage/retrieval?
No, what I am telling you is that the hierarchical structure that is built into Fedora 4 Objects _is now_ sometimes meaningful and sometimes arbitrary if you use the hierarchical ID minter. I'm suggesting you decide whether you can _avoid_ that. I don't know what the phrase "hereditary predicate" means.
On the Islandora Metadata Interest Group, a discussion was started on OAI-PMH support. In addition to some wanted features, the idea of namespaces came up. Our use case is different from that of @rosiel, and I wanted to add it here.
| Use Type | Description |
| ------------- | ------------- |
| Title (Goal) | Ability to distinguish and/or assign content to multiple institutions |
| Primary Actor | Sysadmin, Repository Admin, Repository curators |
| Scope | Islandora Site Architecture |
| Level | Medium? |
| Story | Currently, the Connecticut Digital Archive works with over 40 institutions who add and manage content in the repository and in multiple sites. To distinguish one institution's content from another's, CTDA implements namespaces. Each institution has a namespace that is a range. For example, 20002-29999 is the namespace range for UConn Archives & Special Collections. The reason for this is that UConn ASC can have general content in the 20002 namespace, research data in 20003, and university records in 20004. Each institution has such a range where the first one or two numbers never change. We use namespaces not only to distinguish content from different institutions, and different types of content within an institution, but also on various sites. For example, we have a site for UConn ASC and CT State Library. For CTDA, we really need an easy way to ensure that institutions and users can quickly determine if the content is theirs. Namespaces allow us to do that, especially as they appear in the PID, in the URL, etc. Going forward we need a way to ensure these institutional distinctions remain in place and can be continued in such a way that non-technical volunteers are easily able to assign content to a particular institution. |
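
The range-based scheme described in the story can be sketched as a simple lookup. This is a minimal Python illustration, not CTDA's implementation: only the 20002-29999 range comes from the story above, the second entry is made up, and the `namespace:id` PID shape and the `institution_for_pid` helper are assumptions.

```python
from typing import List, Optional, Tuple

# (low, high, institution). Only the first range is taken from the story;
# the second is a hypothetical placeholder.
NAMESPACE_RANGES: List[Tuple[int, int, str]] = [
    (20002, 29999, "UConn Archives & Special Collections"),
    (30002, 39999, "Example Institution B"),
]


def institution_for_pid(pid: str) -> Optional[str]:
    """Return the institution owning a PID, assuming a 'namespace:id' shape."""
    namespace = pid.split(":", 1)[0]
    if not namespace.isdigit():
        return None
    number = int(namespace)
    for low, high, name in NAMESPACE_RANGES:
        if low <= number <= high:
            return name
    return None


print(institution_for_pid("20003:1234"))
```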
@uconnjeustis can you create a separate issue for this if this is a separate use case? Also, I think it would be a really good idea to talk this out on a future CLAW call, so please do not hesitate in adding it to the agenda, and attending the meeting.
Not a CLAW-specific issue, either. Might be worth bringing up on a Fedora call-- some documentation of best practices would be good.
My use case, which is slightly different from though related to this issue, is now in Islandora-CLAW/CLAW-478. Please direct responses there. Thanks
I just came back from vacation and think I missed the last CLAW meeting. I'll check the schedule and try to hop on the next one.
@uconnjeustis ++
Should the current migration sprint account for how to make Fedora 3.x PID namespaces migrate over losslessly? Just askin'. Related issue: #822.
I would think mapping PID namespaces to LDP containers would be best.
I think organizing objects by stuffing them in a container per namespace would separate them out nicely if you really want to solidify the distinction. FWIW, so long as we stuff the PID in a field somewhere, we can then query on it to do things like "Get me all objects that were in namespace X".
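A minimal Python sketch of both ideas, mapping a 7.x PID to a per-namespace container path and filtering stored PIDs by namespace. The path layout and both helper names are assumptions for illustration; a real migration would go through Fedora's API and a real query through whatever index holds the PID field.

```python
from typing import Iterable, List


def container_path_for_pid(pid: str) -> str:
    """Map 'namespace:id' to '/namespace/id' (hypothetical layout)."""
    namespace, separator, local_id = pid.partition(":")
    if not separator or not namespace or not local_id:
        raise ValueError(f"not a namespace:id PID: {pid!r}")
    return f"/{namespace}/{local_id}"


def pids_in_namespace(pids: Iterable[str], namespace: str) -> List[str]:
    """Return the PIDs whose namespace part equals `namespace` exactly."""
    return [pid for pid in pids if pid.partition(":")[0] == namespace]


stored = ["lsu:1", "lsu-sc:7", "latech:3"]
print(container_path_for_pid("lsu-sc:7"))  # /lsu-sc/7
print(pids_in_namespace(stored, "lsu"))    # ['lsu:1']
```

Note that the filter compares the whole namespace, so `lsu-sc` objects are not returned for `lsu`; prefix matching would be a separate, deliberate choice.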
Do containers suffer from the many-direct-children scalability issue discussed above?
Fedora suffers from that problem. There's nothing inherent in LDP that causes that problem, but to the extent that you're committed to Fedora, you would have to deal with it.
Worth mentioning here: it's unhealthy to think about multisites in a D8/CLAW context the way they were applied in Islandora 7.x. Multisites, by definition, imply different DB tables (setting aside the Domain Access module), which means one site cannot access another site's entities. That makes splitting, or better said reusing, nodes/entities from one site in another extremely complex, not recommended, or even impossible without hacking (now speaking about the [domain access module](https://www.drupal.org/project/domain)).
For Islandora 7.x that was not an issue since no digital objects were ever stored in Drupal; everything was read live from ol' Fedora 3.
It's the opposite case here. The pipe goes in a single direction. So "namespacing", at least for that purpose, makes less sense. I would say that if UI-side "separation" is needed or desired, then simple taxonomy work, like a generic tag system (this object belongs to this group), plus awareness of that in each view that lists or displays objects and in context module configuration, should suffice. The moment you expose storage/backend implications like LDP containment, Fedora paths and minting, and depend on them in a system like CLAW that never accesses them directly or gives control over them, you are opening a Pandora's box or signing a contract you won't be able to keep in the long term.
FYI: There have been a lot of discussions about the whole multisite approach here: https://www.drupal.org/project/drupal/issues/2306013
FWIW, we have been using namespace prefixes in D7 to accomplish multisite without actually using multisite. We serve a consortium of ~20 members, each with its own namespace prefix; using this scheme lets us support the idea of 'sub-institutions' (to arbitrary depth, in theory).
I'm glad to share more, and at the very least, we have plenty of data like this that we could use to test a migration along the lines proposed above
We don't use a RELS-EXT to define the relationship, so every collection is really just a child of root.
~~~
root
~~~
Effectively, however, this flat example represents two top-level institutions, lsu and latech, and one subinstitution of lsu, lsu-sc:
~~~
root
~~~
If we're storing the 7.x PID as per #822, and we're creating taxonomies as described in #888, maybe we should provide an option to create and populate a taxonomy of PID namespaces and assign the relevant value to each new CLAW node on the fly during migration. That way, we get the ability immediately after migration to do some of the things in CLAW we were doing in the source 7.x with PID namespaces.
I'm not suggesting we do this during the migration sprint, but maybe after. Might be a good first issue for someone (like me but it doesn't necessarily have to be me) to take on.
Now that migrate_7x_claw migrates the 7.x object's PID to the corresponding D8 node's field_pid, we can get the 7.x object's namespace from that and do stuff with it. This could be handled with a Context Condition that parses out the namespace from the string stored in field_pid.
Related issue: #822.
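The check such a condition would perform can be sketched in a few lines. This is a language-neutral illustration in Python, not the Drupal plugin itself (which would be PHP); `pid_namespace`, `condition_matches`, and the example namespaces are hypothetical.

```python
from typing import Iterable, Optional


def pid_namespace(field_pid: str) -> Optional[str]:
    """Return the namespace part of a 'namespace:id' PID, or None if absent."""
    namespace, separator, _ = field_pid.partition(":")
    return namespace if separator and namespace else None


def condition_matches(field_pid: str, registered: Iterable[str]) -> bool:
    """True when the node's PID namespace is one of the registered namespaces."""
    namespace = pid_namespace(field_pid)
    return namespace is not None and namespace in set(registered)


registered_namespaces = ["islandora", "lsu"]  # example configuration values
print(condition_matches("islandora:42", registered_namespaces))  # True
print(condition_matches("latech:3", registered_namespaces))      # False
```

A node with an empty or malformed `field_pid` simply fails the condition, so no reaction fires for content that was not migrated from 7.x.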
Following from my previous comment, I've written a Context condition plugin that will be useful for objects migrated from 7.x. It tests the namespace part of a PID in a D8 islandora_object node's field_pid field, which we now get in migrations using https://github.com/Islandora-Devops/migrate_7x_claw.
Here's the configuration form of a context that uses it, with a reaction (which is part of the core Context module) being to use the Bartik theme:

Here's a screenshot of a node that has one of the registered namespaces:

And a screenshot of a node that does not have one of the registered namespaces (i.e., reaction isn't executed):

Currently, we don't have any context reactions that would be useful in a "multisite" setup (just to bring this back to @rosiel's original use case), but it would be possible to write some reactions that replicate 7.x multisite behavior.
If people think this Context condition will be useful, I can open a PR against https://github.com/Islandora-CLAW/islandora to add it.
Just throwing these here in case they are of use later.
Seeing @bondjimbond's awesome work on multitenancy, I would be happy to close this ticket, as the multitenancy use case is more thoroughly expanded in #1300, and that sounds like a more advisable setup for multitenant systems. Namespacing was never really the issue; it was more about dividing up content.
To summarize the output from this thread:
field_pid (based on the 7.x migrate module). [note: If you choose to migrate your content in a different way you have the freedom to distinguish your islandora content by other means - such as taxonomy terms or content types - which also work as context conditions. Context allows you to do a lot, but does not do access control - see below] Thank you @DiegoPino @dannylamb @whikloj @ajs6f @uconnjeustis @mjordan @jpeak5 @ruebot @Natkeeran for your work on this thread.
So... we good to close this thread?
@rosiel awesome summary and relating of issues. One thing I'd like to offer though:
Context allows you to do a lot, but does not do access control
That's not necessarily true. A while back I put together https://github.com/mjordan/ip_range_access specifically to use Context for access control. I'd love to get some additional eyes on it. I wrote that module to replace a capability of the 7.x Islandora Context module that we use to control access to some licensed vendor content we host in our Islandora repo, and that we make accessible from off campus via EZproxy.
I think there's a reason that Contexts doesn't come with a "deny access" reaction. It works on the node or media's page. This does not carry through to Views, blocks, or other ways of exposing content. So if you're using this, be very careful.
@rosiel thanks for the heads up. We haven't tested that module for those things yet but certainly will.