Documentation: Use case: Equivalent of "namespacing" Fedora to accommodate multisites

Created on 12 Oct 2016 · 37 Comments · Source: Islandora/documentation

Was told to create an issue for this, apologies if duplicated.

| Title (Goal) | Ability to partition Islandoras on the same Fedora |
| --- | --- |
| Primary Actor | Repository Admin, Sysadmin |
| Scope | Islandora Site Architecture |
| Level | Medium? |
| Story | In order to have less work to do maintaining separate Fedora stacks, I as a sysadmin/repository admin would like the ability to use a Drupal multisite with separate Islandoras on the same Fedora. As a repo admin who provides sites for others, I have some client sites that need to be given permission to manage their own objects but not objects that belong to a different namespace/site. We sometimes need to present certain (select) objects as a whole 'site' of its own with different associated themes or framing Drupal content. (this looks toward a related issue/use case of 'can i set up an exhibit of select content' ) I am worried about this in the context of two-way sync, which would push all fedora content into all of my islandora sites as Drupal content. |

Multi-tenancy use case


rosiel

👍2

All 37 comments

@rosiel cool! This really needs to be discussed. My first guess would be: we could decide what to sync, where to sync based on a given, arbitrary and configurable predicate. But this needs to be explored further, mostly because if sync is happening from fedora to Drupal 8 (resource was not originated by islandora for example) then that sync utility would need a reverse map for this, something like:
ns1:predicateA == value1 -> sync with URL1
And that map would not be stored in Drupal but in some config available to camel.
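
A minimal sketch of such a reverse map, assuming it lives in a plain config structure outside Drupal. The predicate names, values, and URLs below are placeholders, and `sync_targets` is a hypothetical helper, not part of Camel or Islandora:

```python
# Hypothetical reverse map: (predicate, value) -> base URL of the site to sync with.
# Every key and URL here is a placeholder for illustration only.
SYNC_MAP = {
    ("ns1:predicateA", "value1"): "https://site1.example.org",
    ("ns1:predicateA", "value2"): "https://site2.example.org",
}


def sync_targets(triples):
    """Return the sorted list of site URLs a resource should be synced to.

    `triples` is an iterable of (predicate, value) tuples describing one resource.
    Pairs that are not in SYNC_MAP are ignored.
    """
    return sorted({SYNC_MAP[t] for t in triples if t in SYNC_MAP})


resource = [("ns1:predicateA", "value1"), ("dc:title", "A photograph")]
print(sync_targets(resource))  # ['https://site1.example.org']
```

A resource that matches no entry yields an empty list, which a sync utility could treat as "do not push anywhere".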

or we could define that URL as an rdf property?

Many many ways to define the same

Thanks a lot!!

DiegoPino on 12 Oct 2016

I think the easiest way to deal with this would be structure. From the repository root, you'd need to have separate containers for each multisite. Then you could re-index per container. That pattern could be applied to multitenancy with appropriate authz.

dannylamb on 13 Oct 2016

@dannylamb you mean LDP based? I can see some scalability issues with that, specifically if we use the default PID minter, which is handy for avoiding an unbalanced tree. It also makes filtering in a triple store kinda complex (like "show me all objects that are descendants of X": what if that descent is 5 steps deep with different predicates?). Good talk for next CLAW call!

DiegoPino on 13 Oct 2016

Have I mentioned how much I hate that semantics and storage are jumbled up in Fedora?

dannylamb on 13 Oct 2016

😄1

"That pattern could be applied to multitenancy with appropriate authz." -> this sounds cool but is way over my head. What should I read to fill in my blanks?

rosiel on 14 Oct 2016

@rosiel I think this sounds more complex than it is. In my mind @dannylamb is proposing a Fedora 4 repo structure of

Fedora 4 root
|- /site1
|   |- /objects in site 1 
|
|- /site2
     |- /objects in site 2

Then you can set authorization based on the root level elements, ie. Bob is admin of site1 and Jane is admin of site2. But neither can access the other's repository contents.
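
The rule described here can be sketched as a path-prefix check. This is only an illustration of the idea, not Fedora's actual authorization mechanism; the `SITE_ADMINS` table and both function names are assumptions:

```python
# Hypothetical admin assignments keyed by top-level container.
SITE_ADMINS = {
    "/site1": {"bob"},
    "/site2": {"jane"},
}


def top_level_container(path: str) -> str:
    """Return the first path segment, e.g. '/site1/objects/42' -> '/site1'."""
    segments = [s for s in path.split("/") if s]
    return "/" + segments[0] if segments else "/"


def can_access(user: str, path: str) -> bool:
    """Allow access only if the user administers the container holding the path."""
    return user in SITE_ADMINS.get(top_level_container(path), set())


print(can_access("bob", "/site1/objects/42"))   # True
print(can_access("bob", "/site2/objects/42"))   # False
```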

But @DiegoPino is right that this might have issues of unbalanced trees. Perhaps we should pull @ruebot in and have him do one of his performance and scaling massive ingests to see how it works if you create 3-4 root level objects and ingest a ratio of objects into each.

Fedora 4 root
|- /site1 (ingest objects)
|
|- /site2 (ingest 1/2 as many as site1)
|
|- /site3 (ingest 1/4 as many as site1)
|
|- /site4 (ingest 1/8 as many as site1)

and see how ingest and response times go? This test would be directly on Fedora and so could avoid any issues of PHP/Drupal in its timing.

whikloj on 17 Oct 2016

There is no (performance) problem at all with an unbalanced tree, at least from the Fedora side. The problem is having too many children of a single node/resource.

ajs6f on 19 Oct 2016

That's what the pair-tree PID minter protects against.
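
For readers unfamiliar with the idea, here is a rough sketch of pair-tree style path minting. It is written from the description in this thread, not taken from Fedora's code; the function name and the defaults of four two-character segments are assumptions:

```python
def pairtree_path(identifier: str, pairs: int = 4, width: int = 2) -> str:
    """Prefix an identifier with short segments cut from its leading characters.

    Spreading resources over intermediate segments keeps the number of
    direct children under any single parent small.
    """
    compact = identifier.replace("-", "")
    segments = [compact[i * width:(i + 1) * width] for i in range(pairs)]
    return "/" + "/".join(segments + [identifier])


print(pairtree_path("7f1a2b3c-0000-4000-8000-000000000000"))
# /7f/1a/2b/3c/7f1a2b3c-0000-4000-8000-000000000000
```

With two hexadecimal characters per segment, each intermediate level can hold at most 256 distinct children.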

ajs6f on 19 Oct 2016

@ajs6f when you say "too many children of a single node", do you mean just having a tonne of children under a single node, or do the children have to be direct children of the single node?
ie.

<root node>
      |- <child 1>
             |- <sub 1>
             |     |- <sub sub 1>
             |- <sub 2>
             |     |- <sub sub 2>

versus

<root node>
      |- <child 1>
             |- <sub 1>
             |- <sub 2>
             |- <sub 3>
             |- <sub 4>

whikloj on 20 Oct 2016

@whikloj I guess @ajs6f means direct children. Otherwise any type of tree would end up being a disaster.

DiegoPino on 20 Oct 2016

Yes, as @DiegoPino says. It's too many immediate/direct children that are a problem.

ajs6f on 20 Oct 2016

In fact, if you can guarantee by other means (particularly by controlling your own id minting) that you won't stick too many children under a single parent, then you _shouldn't_ use the hierarchy builder minter. You should just use PUT and stick things wherever it makes sense.

ajs6f on 20 Oct 2016

👍1

So I have two concerns here:

1. How many children is too many?
2. As Islandora CLAW is attempting a lower barrier to entry, it might be a lot of work to create an id minting strategy for all use cases.

whikloj on 20 Oct 2016

1. I do not know. It's a good question. I'd take it to #fcrepo or the email list.
2. Yes, but I think you probably can, actually, _if_ you are controlling the Fedora IDs (as opposed to Drupal IDs). But I don't claim to fully understand the ID management in the current architecture. Maybe this is a good topic for a CLAW call? (I would be happy to join.)

ajs6f on 20 Oct 2016

It sounds like you are suggesting that the hierarchical structure that is built into Fedora 4 Objects would be _sometimes_ meaningful and _sometimes_ arbitrary. Does this sound like a solid plan? (I am not being sarcastic; I actually don't know). Would it be better to include an extra, hereditary predicate and let pid-minters populate the hierarchy for optimal storage/retrieval?

rosiel on 20 Oct 2016

No, what I am telling you is that the hierarchical structure that is built into Fedora 4 Objects _is now_ sometimes meaningful and sometimes arbitrary if you use the hierarchical ID minter. I'm suggesting you decide whether you can _avoid_ that. I don't know what the phrase "hereditary predicate" means.

ajs6f on 20 Oct 2016

On the Islandora Metadata Interest Group, a discussion was started on OAI-PMH support. In addition to some wanted features, the idea of namespaces came up. Our use case is different from that of @rosiel, and I wanted to add it here.

| Use Type | Description |
| ------------- | ------------- |
| Title (Goal) | Ability to distinguish and/or assign content to multiple institutions |
| Primary Actor | Sysadmin, Repository Admin, Repository curators |
| Scope | Islandora Site Architecture |
| Level | Medium? |
| Story | Currently, the Connecticut Digital Archive works with over 40 institutions who add and manage content in the repository and in multiple sites. To distinguish one institution's content from another's, CTDA implements namespaces. Each institution has a namespace that is a range. For example, 20002-29999 is the namespace range for UConn Archives & Special Collections. The reason for this is that UConn ASC can have general content in the 20002 namespace, research data in 20003, and university records in 20004. Each institution has such a range where the first one or two numbers never change. We use namespaces not only to distinguish content from different institutions, and different types of content within an institution, but also on various sites. For example, we have a site for UConn ASC and CT State Library. For CTDA, we really need an easy way to ensure that institutions and users can quickly determine if the content is theirs. Namespaces allow us to do that, especially as they appear in the PID, in the URL, etc. Going forward, we need a way to ensure these institutional distinctions remain in place and can be continued in such a way that non-technical volunteers are easily able to assign content to a particular institution. |
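
A small sketch of how a range-based namespace scheme like this could be resolved in code. It assumes PIDs of the form `namespace:id` with a numeric namespace; only the first range comes from the story above, the second entry is a made-up placeholder, and `institution_for_pid` is a hypothetical helper:

```python
from typing import Optional

# (first, last, institution). Only the first range is taken from the use case;
# the second is a hypothetical placeholder for illustration.
NAMESPACE_RANGES = [
    (20002, 29999, "UConn Archives & Special Collections"),
    (30002, 39999, "Example Institution (hypothetical)"),
]


def institution_for_pid(pid: str) -> Optional[str]:
    """Resolve a PID such as '20003:1234' to its owning institution, if any."""
    namespace = pid.split(":", 1)[0]
    if not namespace.isdigit():
        return None
    number = int(namespace)
    for first, last, name in NAMESPACE_RANGES:
        if first <= number <= last:
            return name
    return None


print(institution_for_pid("20003:1234"))  # UConn Archives & Special Collections
```

PIDs with a non-numeric or unregistered namespace resolve to `None`, so a caller can flag content that has not been assigned to an institution.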

uconnjeustis on 4 Jan 2017

@uconnjeustis can you create a separate issue for this if this is a separate use case? Also, I think it would be a really good idea to talk this out on a future CLAW call, so please do not hesitate to add it to the agenda and attend the meeting.

ruebot on 4 Jan 2017

Not a CLAW-specific issue, either. Might be worth bringing up on a Fedora call-- some documentation of best practices would be good.

ajs6f on 4 Jan 2017

👍1

My use case, which is slightly different from though related to this issue, is now in Islandora-CLAW/CLAW-478. Please direct responses there. Thanks

I just came back from vacation and think I missed the last CLAW meeting. I'll check the schedule and try to hop on the next one.

uconnjeustis on 6 Jan 2017

@uconnjeustis ++

ajs6f on 6 Jan 2017

Should the current migration sprint account for how to make Fedora 3.x PID namespaces migrate over losslessly? Just askin'. Related issue: #822.

mjordan on 27 Aug 2018

I would think mapping PID namespaces to LDP containers would be best.

ajs6f on 27 Aug 2018

👍1

I think organizing objects by stuffing them in a container per namespace would separate them out nicely if you really want to solidify the distinction. FWIW, so long as we stuff the PID on a field somewhere, we can then query on it to do things like "Get me all objects that were in namespace X".
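
Both ideas, a container per namespace and querying on a stored PID, can be sketched in a few lines. This assumes 7.x PIDs of the form `namespace:id`; `container_path` and `objects_in_namespace` are hypothetical names, and the records are plain dicts standing in for whatever actually stores the PID:

```python
def container_path(pid: str) -> str:
    """Map a PID like 'lsu-sc:42' to a container path like '/lsu-sc/42'."""
    namespace, sep, local_id = pid.partition(":")
    if not sep or not namespace or not local_id:
        raise ValueError(f"not a namespace:id PID: {pid!r}")
    return f"/{namespace}/{local_id}"


def objects_in_namespace(records, namespace):
    """Return the records whose stored 'pid' belongs to the given namespace."""
    return [r for r in records if r["pid"].partition(":")[0] == namespace]


records = [{"pid": "lsu-sc:42"}, {"pid": "latech:7"}, {"pid": "lsu-sc:43"}]
print(container_path("lsu-sc:42"))                  # /lsu-sc/42
print(objects_in_namespace(records, "lsu-sc"))      # the two lsu-sc records
```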

dannylamb on 27 Aug 2018

Do containers suffer from the many-direct-children scalability issue discussed above?

mjordan on 27 Aug 2018

Fedora suffers from that problem. There's nothing inherent in LDP that causes that problem, but to the extent that you're committed to Fedora, you would have to deal with it.

ajs6f on 27 Aug 2018

Worth mentioning here: it's unhealthy to think about multisites in a D8/CLAW context the way they were applied in Islandora 7.x. Multisites, by definition, imply different DB tables (not speaking about the domain access module), which means one site cannot access another site's entities. That makes splitting, or better said, reusing nodes/entities from one site in another extremely complex, not recommended, or even impossible without hacking (now speaking about the [domain access module](https://www.drupal.org/project/domain)).
For Islandora 7.x that was not an issue, since no DOs were ever stored in Drupal; everything was read live from ol' Fedora 3.
Opposite case here: the pipe goes in a single direction, so "namespacing", at least for that purpose, makes less sense. I would say that if UI-side "separation" is needed or desired, then simple taxonomy work like a generic tag system (this object belongs to this group), plus awareness of that in each view that lists/displays/context module stuff, should probably suffice. The moment you expose storage/backend implications like LDP containment, Fedora paths, and minting, and depend on them in a system like CLAW that never accesses them directly or gives control over them, you are opening Pandora's box or signing a contract you won't be able to keep in the long term.

FYI: There have been a lot of discussions about the whole multisite approach here: https://www.drupal.org/project/drupal/issues/2306013

DiegoPino on 27 Aug 2018

FWIW, we have been using namespace prefixes in D7 to accomplish multisite without actually using multisite. We serve a consortium of ~20 members, each with its own namespace prefix; using this scheme lets us support the idea of 'sub-institutions' (to arbitrary depth, in theory).

I'm glad to share more, and at the very least, we have plenty of data like this that we could use to test a migration along the lines proposed above

We don't use a RELS-EXT to define the relationship, so every collection is really just a child of root.

~~~

root

~~~

Effectively, however, this flat example represents two top-level institutions, lsu and latech, and one subinstitution of lsu, lsu-sc:

~~~
root

lsu-*
- lsu-sc-*
latech-*
~~~
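
The nesting implied by the prefixes can be recovered mechanically. The following is a sketch under the assumption that a sub-institution's prefix is its parent's prefix plus a hyphen-delimited suffix; the function names are hypothetical and this is not the code used in the D7 setup described above:

```python
from typing import Dict, List, Optional


def parent_namespace(namespace: str, known: List[str]) -> Optional[str]:
    """Return the longest known prefix that is a hyphen-delimited parent of `namespace`."""
    candidates = [k for k in known if k != namespace and namespace.startswith(k + "-")]
    return max(candidates, key=len) if candidates else None


def build_tree(known: List[str]) -> Dict[Optional[str], List[str]]:
    """Group prefixes by parent; top-level institutions are keyed by None."""
    tree: Dict[Optional[str], List[str]] = {}
    for ns in sorted(known):
        tree.setdefault(parent_namespace(ns, known), []).append(ns)
    return tree


print(build_tree(["lsu", "lsu-sc", "latech"]))
# {None: ['latech', 'lsu'], 'lsu': ['lsu-sc']}
```

Choosing the longest matching prefix is what allows nesting to arbitrary depth: a prefix such as `lsu-sc-maps` would attach to `lsu-sc` rather than to `lsu`.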

jpeak5 on 27 Aug 2018

If we're storing the 7.x PID as per #822, and we're creating taxonomies as described in #888, maybe we should provide an option to create and populate a taxonomy of PID namespaces and assign the relevant value to each new CLAW node on the migration fly. That way, we get the ability immediately after migration to do some of the things in CLAW we were doing in the source 7.x with PID namespaces.

I'm not suggesting we do this during the migration sprint, but maybe after. Might be a good first issue for someone (like me but it doesn't necessarily have to be me) to take on.

mjordan on 29 Aug 2018

Linking to https://github.com/Islandora-CLAW/CLAW/issues/926

dannylamb on 20 Sep 2018

Now that migrate_7x_claw migrates the 7.x object's PID to the corresponding D8 node's field_pid, we can get the 7.x object's namespace from that and do stuff with it. This could be handled with a Context Condition that parses out the namespace from the string stored in field_pid.

Related issue: #822.

mjordan on 22 Dec 2018

Following from my previous comment, I've written a Context condition plugin that will be useful for objects migrated from 7.x. It tests the namespace part of a PID in a D8 islandora_object node's field_pid field, which we now get in migrations using https://github.com/Islandora-Devops/migrate_7x_claw.
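
The test such a condition performs can be sketched as follows. The real plugin is a Drupal (PHP) Context condition; this Python version only illustrates the logic, and the function name and the set of registered namespaces are assumptions:

```python
def has_registered_namespace(field_pid: str, registered: set) -> bool:
    """Return True if the namespace part of a 7.x PID is in the registered set.

    A value without a colon is not treated as a PID and never matches.
    """
    namespace, sep, _local_id = field_pid.partition(":")
    return bool(sep) and namespace in registered


registered = {"islandora", "lsu-sc"}  # hypothetical configured namespaces
print(has_registered_namespace("islandora:123", registered))  # True
print(has_registered_namespace("other:123", registered))      # False
```

When the check returns True, the context's reactions would apply to the node; otherwise they are skipped.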

Here's the configuration form of a context that uses it, with a reaction (which is part of the core Context module) being to use the Bartik theme:

[Screenshot: context]

Here's a screenshot of a node that has one of the registered namespaces:

[Screenshot: node]

And a screenshot of a node that does not have one of the registered namespaces (i.e., reaction isn't executed):

[Screenshot: node2]

Currently, we don't have any context reactions that would be useful in a "multisite" setup (just to bring this back to @rosiel's original use case), but it would be possible to write some reactions that replicated 7.x multisite behavior.

If people think this Context condition will be useful, I can open a PR against https://github.com/Islandora-CLAW/islandora to add it.

mjordan on 23 Dec 2018

Just throwing these here in case they are of use later.

whikloj on 14 May 2019

Seeing @bondjimbond's awesome work on multitenancy, I would be happy to close this ticket as the multitenancy use case is more thoroughly expanded in #1300, and that sounds like a more advisable set up for multitenant systems. Namespacing was never really the issue; it was more about dividing up content.

To summarize the output from this thread:

The cookbook for a multitenant Islandora, described in #1300, involves separate Drupals per site. They write to their own "root" nodes within a Fedora.
Related to #822: if you are migrating objects from 7.x and want to do different things with them in Drupal depending on namespace, @mjordan wrote a context condition, NodeHadNamespace, that uses the value in field_pid (based on the 7.x migrate module). [Note: if you choose to migrate your content in a different way, you have the freedom to distinguish your Islandora content by other means, such as taxonomy terms or content types, which also work as context conditions. Context allows you to do a lot, but does not do access control; see below.]
Providing different users access to different content in a single Drupal installation is kinda tricky, but some methods mentioned above (which may also apply to #478) are:
-- Domain Access appears to let you "split" an islandora site based on domains, like site1.example.com, site2.example.com, or examplesite3.com.
-- Workspace allows you to have a "staging" and a "live" workspace so you can create content (nodes only, not media, as of Feb 2019) and easily deploy it when ready.
-- Permissions by Term is another solution - from Issue #823
-- Group is a heavy but thorough method of assigning content and users to different groups.

Thank you @DiegoPino @dannylamb @whikloj @ajs6f @uconnjeustis @mjordan @jpeak5 @ruebot @Natkeeran for your work on this thread.

So... we good to close this thread?

rosiel on 13 Feb 2020

@rosiel awesome summary and relating of issues. One thing I'd like to offer though:

Context allows you to do a lot, but does not do access control

That's not necessarily true. A while back I put together https://github.com/mjordan/ip_range_access specifically to use Context for access control. I'd love to get some additional eyes on it. I wrote that module to replace a capability of the 7.x Islandora Context module that we use to control access to some licensed vendor content we host in our Islandora repo, and that we make accessible from off campus via Ezproxy.

mjordan on 13 Feb 2020

I think there's a reason that Context doesn't come with a "deny access" reaction. It works only on the node's or media's own page; the restriction does not carry through to Views, blocks, or other ways of exposing content. So if you're using this, be very careful.

rosiel on 13 Feb 2020

👍1

@rosiel thanks for the heads up. We haven't tested that module for those things yet but certainly will.

mjordan on 13 Feb 2020
