Design and introduce a toolset allowing author disambiguation.
ORCiDs are good for production use where authors manage the workflow, but aren't comprehensive...
If e.g. we re-introduce the Browse By Author into OJS 3.x, we'll need to implement this.
See https://github.com/pkp/pkp-lib/issues/2818#issuecomment-340145300 for related discussion.
We could of course connect the users table to the author metadata.
The author metadata would work somewhat like versioning of the user metadata, but there would always be a user account connected to the author metadata, and browse by author would work through this. This is just off the top of my head.
Agreed, I think an "internal" user_id to author_id link would be useful. Your proposal would quickly solve the issue for submitting authors, but not secondary authors, and we'd have to be sure we handled the case where a user was submitting "on behalf of" someone else to avoid creating bad linkages.
I think it's also worth approaching this in a way that supports other IDs, e.g. I think there's a national researcher ID in Germany. I'd imagine a toolset allowing us to add potentially several IDs, e.g. "OJS user ID" (probably mediated by something more human-friendly like an email or username, which is also unique) alongside others, with some internal checks to make sure that they were all mutually consistent.
One option would be a universal OJS user account that is not tied to a single installation of OJS, but would work in any OJS installation, with the option of connecting your ORCID account to that account. But that would be an immense change.
Aside from that, we could of course create a "ghost" user account for the secondary authors that the editors could use to connect submissions with similar author metadata. This would be something that just the editors would handle.
This "ghost" account would then be activated if a user with the same email registered an account. After registration, the system would ask "Are these your articles?"
I like the idea of linking to universal or 3rd party author IDs from an OJS author profile. The German example could be author identifiers in the Integrated Authority File (GND) (http://www.dnb.de/EN/Standardisierung/GND/gnd.html). There are also commercial ones like ResearcherID, and authors might also like to link to their Twitter, ResearchGate, GitHub etc. profiles.
But then again: Mostly important for authors w/o ORCID iD. Authors with an actively maintained ORCID profile probably link to all their other identifiers from there.
Thanks, @mtub. I'm not sure how to help users (both editors and authors) strike a good balance -- if we encouraged everyone to add as many IDs as possible (email, Facebook, Twitter, GitHub, ResearchGate, ORCiD, GND, ResearcherID, Scopus Author ID, ...), it would be chaos. If authors provide only a partial set of the available IDs (think 50% providing GND and the other 50% providing Scopus Author ID), or if there are conflicting cardinalities (two authors share the same GitHub account but differ on some other ID), then we'll be no further ahead than we are now.
Frankly I'm at a loss for the best way to proceed -- we really don't want to add new fields for all the possibilities, but I'm also hesitant to prescribe a "best" single ID e.g. as a journal setting.
Overall I do think we'll need tools to allow the editors to curate the data in batch.
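As a rough illustration of the kind of batch check an editor tool might run, here is a minimal sketch that flags the two problems described above: records that share one identifier but disagree on another, and records that have no identifier type in common at all. The record layout, the identifier values, and the function names are all made up for the example; nothing here reflects existing OJS code.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical author records: record id -> {identifier type: value}.
# All values are placeholders.
records = {
    "a1": {"orcid": "orcid-A", "github": "jdoe"},
    "a2": {"orcid": "orcid-B", "github": "jdoe"},  # same GitHub, different ORCID
    "a3": {"gnd": "gnd-X"},
    "a4": {"scopus": "scopus-Y"},                  # nothing in common with a3
}


def find_conflicts(records):
    """Return pairs of records that agree on one ID but disagree on another type."""
    by_identifier = defaultdict(list)
    for rec_id, ids in records.items():
        for id_type, value in ids.items():
            by_identifier[(id_type, value)].append(rec_id)

    conflicts = set()
    for rec_ids in by_identifier.values():
        for left, right in combinations(sorted(rec_ids), 2):
            shared_types = records[left].keys() & records[right].keys()
            if any(records[left][t] != records[right][t] for t in shared_types):
                conflicts.add((left, right))
    return sorted(conflicts)


def find_incomparable(records):
    """Return pairs of records that have no identifier type in common."""
    return [
        (left, right)
        for left, right in combinations(sorted(records), 2)
        if not (records[left].keys() & records[right].keys())
    ]


print(find_conflicts(records))
print(find_incomparable(records))
```

The first list is what an editor would need to resolve by hand; the second shows how quickly partial ID coverage leaves records that cannot be compared at all.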
This is just a more detailed description of my suggestion above. I am not sure if it would solve all problems connected to this issue, but then again, I do not think that there is a solution that would cover everything.
The way ORCID works is carefully thought through. Email is basically the best default identifier there is. But since it cannot be fully trusted, ORCID just uses it as a way to get a confirmation from the author. Once that confirmation is received, the ORCID is attached to a piece of information, usually an article. ORCID of course assumes that the author is not lying when claiming work.
I cannot see any efficient way for this to work without author interaction, which of course becomes a bit of a problem with dead people and people without an email.
Since ORCID does not support unauthenticated ORCIDs, maybe OJS could have authenticated user accounts and unauthenticated user accounts. The "ghost" accounts I mentioned above.
With authenticated user accounts we could automatically add a database link between the users table and the submissions table. Authentication would be based primarily on ORCIDs. But a user could also claim published articles inside an OJS installation without an ORCID.
With unauthenticated user accounts we would give the editor the tools to connect the users table to the submissions table. The most important identifier here would be email, but the editor could also use name matching etc. The created link would be a "weak" link between the users table and the submissions table, but still something that could be used in the cases mentioned in the first message. If a person with the same email registered an actual account, OJS would ask them "Are these your articles?" and the user could claim them, preferably by using an ORCID, which would create a "strong" link between a user and an article.
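To make the weak/strong distinction concrete, here is a minimal in-memory sketch of that flow. The `Account` class, the function names, and the submission IDs are hypothetical, and a real implementation would confirm the claim through ORCID or email verification rather than a plain function call.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    email: str
    authenticated: bool = False                # False = editor-created "ghost" account
    links: dict = field(default_factory=dict)  # submission id -> "weak" or "strong"


accounts = {}  # keyed by lower-cased email


def editor_link(email, submission_id):
    """Editor attaches a submission to a (possibly ghost) account as a weak link."""
    key = email.lower()
    account = accounts.setdefault(key, Account(email=key))
    account.links.setdefault(submission_id, "weak")
    return account


def register(email):
    """A real user registers; return the weakly linked submissions to offer for claiming."""
    key = email.lower()
    account = accounts.setdefault(key, Account(email=key))
    account.authenticated = True
    return sorted(s for s, strength in account.links.items() if strength == "weak")


def claim(email, submission_id):
    """The authenticated user confirms a submission, upgrading the link to strong."""
    account = accounts[email.lower()]
    if not account.authenticated:
        raise PermissionError("only authenticated accounts can claim submissions")
    if submission_id not in account.links:
        raise KeyError(submission_id)
    account.links[submission_id] = "strong"


editor_link("A@Example.org", 101)       # editor curates two old articles
editor_link("a@example.org", 102)
pending = register("a@example.org")     # "Are these your articles?"
claim("a@example.org", 101)             # the user confirms only one of them
```

Submission 102 stays weakly linked until the user confirms or an editor removes it, so browse-by-author could still use it while marking it as unverified.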
We could also see what ISNI would have to offer here. Most of the old articles are already indexed by libraries. When a journal begins to digitize its old issues and uploads them to OJS, it usually just fills in the same metadata again. But what if there were a way of searching library metadata at this point and even attaching an ISNI to the author metadata? Of course there could be errors in ISNIs as well, but at least we would "only" make the same mistakes again.
At University Library Heidelberg we are very much interested in the disambiguation issue for authors and could also help in fleshing this out.
@asmecher Hi Alec,
you wrote:
> Frankly I'm at a loss for the best way to proceed -- we really don't want to add new fields for all the possibilities, but I'm also hesitant to prescribe a "best" single ID e.g. as a journal setting.
Do you think it is possible to add an identifiers table, with configurable identifiers?
And link this to authors and, optionally, users?
Maybe like this?
author_id, user_id (maybe null), identifier_type, identifier
Type could be configurable for the host? With some default types like VIAF already configured?
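A minimal sketch of what such a table could look like, using an in-memory SQLite database so it can be run as-is. The table name `author_identifiers`, the `CONFIGURED_TYPES` set, and both helper functions are assumptions made for illustration; only the four columns come from the proposal above.

```python
import sqlite3

# Hypothetical per-host configuration of accepted identifier types.
CONFIGURED_TYPES = {"orcid", "gnd", "viaf"}

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE author_identifiers (
        author_id       INTEGER NOT NULL,
        user_id         INTEGER,          -- NULL when no user account is linked
        identifier_type TEXT NOT NULL,
        identifier      TEXT NOT NULL,
        UNIQUE (author_id, identifier_type)
    )
""")


def add_identifier(author_id, identifier_type, identifier, user_id=None):
    """Store one identifier, rejecting types the host has not configured."""
    if identifier_type not in CONFIGURED_TYPES:
        raise ValueError(f"identifier type not configured: {identifier_type}")
    conn.execute(
        "INSERT INTO author_identifiers VALUES (?, ?, ?, ?)",
        (author_id, user_id, identifier_type, identifier),
    )


def authors_sharing(identifier_type, identifier):
    """Return all author rows that carry the given identifier."""
    rows = conn.execute(
        "SELECT author_id FROM author_identifiers "
        "WHERE identifier_type = ? AND identifier = ? ORDER BY author_id",
        (identifier_type, identifier),
    )
    return [row[0] for row in rows]


add_identifier(123, "gnd", "118575449", user_id=45)
add_identifier(456, "gnd", "118575449")        # second author row, same person
add_identifier(789, "viaf", "example-viaf-id")

print(authors_sharing("gnd", "118575449"))
```

The `UNIQUE (author_id, identifier_type)` constraint is one possible choice; it assumes an author row carries at most one value per identifier type.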
@ajnyga and @isgrim, thanks, those are some excellent ideas.
Two alternatives to the tables you propose, @isgrim, just for discussion's sake:
Don't introduce any new tables. Similar to the current ORCID implementation, use author_settings / user_settings to introduce other IDs:
```sql
-- OJS author ID 123 is OJS user ID 45
INSERT INTO author_settings (author_id, setting_name, setting_value, setting_type) VALUES (123, 'id:user_id', '45', 'int');
-- OJS author ID 123 is ORCID https://orcid.org/abcdefg; all ORCIDs are presumed to be authenticated
INSERT INTO author_settings (author_id, setting_name, setting_value, setting_type) VALUES (123, 'id:orcid', 'https://orcid.org/abcdefg', 'string');
-- OJS user ID 45 is ORCID https://orcid.org/abcdefg; all ORCIDs are presumed to be authenticated
INSERT INTO user_settings (user_id, setting_name, setting_value, setting_type) VALUES (45, 'id:orcid', 'https://orcid.org/abcdefg', 'string');
```
This would indicate that author_id 123 is user_id 45 (using internal IDs), and that both have ORCID https://orcid.org/abcdefg.
Advantages: No new tables required; symmetrical with current ORCID storage; each ID can be written as a plugin
Disadvantages: Queries to disambiguate authors may be funky
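To give a feel for what "funky" might mean here, the following sketch loads a simplified stand-in for `author_settings` into an in-memory SQLite database and finds author rows that share any `id:` setting. The four-column schema mirrors the INSERT statements above rather than the full OJS table, and author 456 is an invented second row for the same person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for author_settings (assumption: only these four columns).
conn.execute("""
    CREATE TABLE author_settings (
        author_id     INTEGER NOT NULL,
        setting_name  TEXT NOT NULL,
        setting_value TEXT,
        setting_type  TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO author_settings VALUES (?, ?, ?, ?)",
    [
        (123, "id:user_id", "45", "int"),
        (123, "id:orcid", "https://orcid.org/abcdefg", "string"),
        (456, "id:orcid", "https://orcid.org/abcdefg", "string"),  # hypothetical duplicate
        (456, "affiliation", "Example University", "string"),
    ],
)

# Self-join on the settings table: pairs of distinct author rows
# that carry the same value for the same 'id:' setting.
same_person = conn.execute("""
    SELECT a.author_id, b.author_id, a.setting_name
    FROM author_settings a
    JOIN author_settings b
      ON a.setting_name = b.setting_name
     AND a.setting_value = b.setting_value
     AND a.author_id < b.author_id
    WHERE a.setting_name LIKE 'id:%'
""").fetchall()

print(same_person)
```

The self-join and the string prefix match on `setting_name` are the price of avoiding a new table; every disambiguation query has to filter identifiers out of the general settings rows first.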
Introduce a more generic structure for IDs, something like...
Advantages: Can be used to affix identifiers to users, authors, and anything else; simple schema; easy to group by identifier, list heterogenous assets by identifier, etc.
Disadvantages: assoc_type/assoc_id is a weirdly normalized pattern (though common in OJS)
We would need to consider whether we'd want to migrate other kinds of identifiers, e.g. DOIs etc., to this kind of structure.
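The structure proposed for the second alternative is not shown above, so the following is only a guess at its general shape, based on the `assoc_type`/`assoc_id` pattern mentioned in the disadvantages. The table name, column names, and constant values are all hypothetical.

```python
import sqlite3

# Hypothetical assoc_type constants; the numeric values are illustrative only.
ASSOC_TYPE_USER = 1
ASSOC_TYPE_AUTHOR = 2

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE identifiers (
        assoc_type      INTEGER NOT NULL,  -- what kind of entity the row describes
        assoc_id        INTEGER NOT NULL,  -- that entity's internal ID
        identifier_type TEXT NOT NULL,
        identifier      TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO identifiers VALUES (?, ?, ?, ?)",
    [
        (ASSOC_TYPE_AUTHOR, 123, "orcid", "https://orcid.org/abcdefg"),
        (ASSOC_TYPE_USER, 45, "orcid", "https://orcid.org/abcdefg"),
        (ASSOC_TYPE_AUTHOR, 789, "gnd", "118575449"),
    ],
)


def assets_for(identifier_type, identifier):
    """List every entity, of any type, that carries the given identifier."""
    rows = conn.execute(
        "SELECT assoc_type, assoc_id FROM identifiers "
        "WHERE identifier_type = ? AND identifier = ? "
        "ORDER BY assoc_type, assoc_id",
        (identifier_type, identifier),
    )
    return rows.fetchall()


print(assets_for("orcid", "https://orcid.org/abcdefg"))
```

A single lookup returns both the user and the author row, which is the "list heterogeneous assets by identifier" advantage; the cost is that `assoc_id` cannot be covered by an ordinary foreign key.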
@asmecher Is this still on the horizon? It would be great if we could incorporate GND (the German authority file) or other name authorities (VIAF, etc.) as a disambiguating identifier.
I stumbled on a related issue while looking at the "authors" method in SearchHandler.inc.php. This was used for "Browse by author" in OJS 2 and can still be used for browse by author in 3.x, but the link is hidden. We use the link from a plugin to enable browsing by author without disambiguation, which works okay. But it would be nice to have a maintained "browse by author" functionality as a plugin or in the core.
@isgrim, the old OJS 2.x "browse by author" behavior is definitely deprecated, and we're hoping to use third-party ID-based services like ORCID to take its place. (OJS will never be particularly useful in hosting its own author list, since a single OJS install will only reflect a small amount of a typical author's work.)
Are there services offering browsable collections of works by an author using the GND ID for disambiguation, or would this need to be built into OJS?
@asmecher I don't know of a particular service, but at the German National Library you can query for all literature and musical works published in Germany that are linked with a given GND ID.
The link was displayed in the overview page for Martin Luther:
http://d-nb.info/gnd/118575449
The website is only in German and not very usable, so it's probably not a good idea.
There is also the WorldCat by the OCLC, I found a service listing catalogued works of an identity:
https://www.worldcat.org/identities/lccn-n79089628/ (for Martin Luther, as an example for a dead person)
https://www.worldcat.org/identities/lccn-n88027295/ (for David A. Patterson, as an example for an active author)
But it's using the LCCN (from the American Library of Congress).
I think the main point of configurable identifiers would be to have one primary identifier for a local OJS instance, and other identifiers with a "sameAs" or "seeAlso" relation. All the identifiers could then be part of the exported metadata for archiving or cataloguing purposes.
I talked with @lmaylein about this and he mentioned that another use for a primary identifier would be to prevent entering duplicate Author entries with slightly different spelling or spelling mistakes.
See for example a related issue for the Dataverse open data repository:
https://github.com/IQSS/dataverse/issues/5151
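As a sketch of how a primary identifier could prevent duplicate author entries, the example below reuses an existing entry whenever the primary ID matches, regardless of spelling, and only falls back to name matching to suggest candidates for editor review. The in-memory `authors` list and all function names are hypothetical; the GND ID is the Martin Luther example from above.

```python
import unicodedata


def normalize_name(name):
    """Collapse whitespace and case so trivially different spellings compare equal."""
    normalized = unicodedata.normalize("NFKC", name)
    return " ".join(normalized.split()).casefold()


authors = []  # hypothetical in-memory author list


def add_author(name, primary_id=None):
    """Return the existing entry when the primary identifier matches, else create one."""
    if primary_id is not None:
        for author in authors:
            if author["primary_id"] == primary_id:
                return author
    author = {"name": name, "primary_id": primary_id}
    authors.append(author)
    return author


def possible_duplicates(name):
    """Entries whose normalized name matches; candidates for editor review only."""
    key = normalize_name(name)
    return [a for a in authors if normalize_name(a["name"]) == key]


first = add_author("Martin Luther", primary_id="118575449")
second = add_author("Martin  Luther", primary_id="118575449")  # extra space, same ID
third = add_author("martin luther")                             # no ID: new entry

print(first is second, len(authors), len(possible_duplicates("Martin Luther")))
```

Name matching alone never merges entries here, since two different people can share a name; it only produces a list for an editor to check.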
@asmecher
Hi Alec,
do you know if this is still on the table?
We haven't made any progress on it, I'm afraid, but I do think it's an area for OJS to be improved and would be happy to help e.g. with requirements and design.
Thanks for the response @asmecher .
We are planning a plugin to incorporate another name identifier (GND) into the author metadata.
Maybe during development I will have some time to think about a more comprehensive approach to identifiers, but to start with we are looking for a quick solution with another metadata field.
No time frame yet, but will keep you posted.
Some ideas, very raw, just a braindump:
Here's a recent relevant discussion on the support forum about the current front-facing author index: https://forum.pkp.sfu.ca/t/ojs-3-disallow-excessive-spaces-in-names-building-author-indexes/55816
Found this old thread while looking through our internal wiki. Keeping it here for reference.
I want to take some time this year to work on this. Maybe in the coming OJS-DE workshop in Heidelberg in February.
https://forum.pkp.sfu.ca/t/integration-of-person-norm-data-and-authority-files-for-authors/2796/11