To expand on the DataCite DOI support we are working on in #24, we will also send some additional DataCite fields, which will help make our datasets discoverable in DataCite's index.
For example, this is what we currently send:
<?xml version="1.0" encoding="UTF-8"?>
<resource
xmlns="http://datacite.org/schema/kernel-3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://datacite.org/schema/kernel-3
http://schema.datacite.org/meta/kernel-3/metadata.xsd">
<identifier identifierType="DOI">10.7910/DVN/29606</identifier>
<creators>
<creator>
<creatorName>Blackwell, Matthew, Honaker, James, King, Gary</creatorName>
</creator>
</creators>
<titles>
<title>Replication data for: A Unified Approach To Measurement Error And Missing Data: Overview</title>
</titles>
<publisher>:unav</publisher>
<publicationYear>2015</publicationYear>
<resourceType resourceTypeGeneral="Text"/>
</resource>
This will need to be updated to the following, which includes metadata suggested by Martin Fenner from DataCite and by the Helmsley project (add contributors):
<?xml version="1.0" encoding="UTF-8"?>
<resource
xmlns="http://datacite.org/schema/kernel-3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-3 http://schema.datacite.org/meta/kernel-3/metadata.xsd">
<identifier identifierType="DOI">10.7910/DVN/XYZ02</identifier>
<creators>
<creator>
<creatorName>Castro, Eleni</creatorName>
<nameIdentifier schemeURI="http://orcid.org/" nameIdentifierScheme="ORCID">0000-0001-9767-8536</nameIdentifier>
<affiliation>IQSS</affiliation>
</creator>
<creator>
<creatorName>Barbosa, Sonia</creatorName>
</creator>
</creators>
<titles>
<title>Replication data for: Testing out DataCite</title>
</titles>
<publisher>Harvard Dataverse</publisher>
<publicationYear>2015</publicationYear>
<resourceType resourceTypeGeneral="Dataset"/>
<descriptions>
<description descriptionType="Abstract">It was a good idea to try testing out this metadata before implementation
</description>
</descriptions>
<contributors>
<contributor contributorType="ProjectLeader">
<contributorName>Starr, Joan</contributorName>
<affiliation>California Digital Library</affiliation>
</contributor>
</contributors>
</resource>
A complete mapping to DataCite will still need to be done as described in #2774 and #2778, but this will require additional resources and time to complete.
One suggestion: allow for '
If that belongs in a different issue (i.e., it's not relevant for this indexing), please let me know (or move it wherever it fits better).
Thanks for the suggestions, @pameyer. We are tracking what mapping we can support for the first phase here: https://docs.google.com/spreadsheets/d/1uADPbtVUEIXz5phtThjxU6gkAg-5jojxSlHtF959lEU/edit?usp=sharing
This feature needs to be enabled to send metadata to both:
@pdurbin @bmckinney
Going to move this out of In Progress for now, feel free to re-add the label and an owner if this is incorrect. Thanks!
@djbrooke sorry, I should have left a comment. On Thursday I said I'd created an issue called something like "DataCite XML: send more fields on published". Then I realized that this issue already exists and that @pameyer has already commented on it. @bmckinney and I discussed it and he agreed he'd work on it over the next two weeks or so, so I assigned it to him and put it in "Development" in https://waffle.io/IQSS/dataverse . My understanding is that it's relatively low effort, but I'll defer to @bmckinney to estimate it after talking more with @pameyer about his requirements. I'm fine with whatever Waffle column, but I believe this is one of the issues that @pameyer would say is a show stopper for the DNS cutover day, when https://data.sbgrid.org switches to being powered by Dataverse. I just created a GitHub label called "SBGrid" and applied it to this issue so that I can find it again.
Sounds good - feel free to move it over when work begins on it!
@bmckinney is working on this so I moved it to Development in Waffle.
@bmckinney The action happens in the DOIDataCiteServiceBean, specifically in getUpdateMetadataFromDataset and getMetadataFromStudyForCreateIndicator. (There could be a little cleanup done here to reduce repeated code.) There are methods on the DatasetVersion object that return specific field values, and you can use them as a pattern for new methods that return the additional metadata required. See getTitle() and getDescription().
@scolapasta and @bmckinney discussed this issue fairly extensively yesterday in an SBGrid sprint planning meeting, and it sounded like the decision was that @bmckinney will beef up the existing class rather than adding a new one. I'm happy to be corrected if I misunderstood.
Hi @bmckinney - we have this marked as "In Progress" - feel free to link to any branches or other work that explains the current state of this, or just move it back to "Ready" if it's not started. Thanks!
Thanks @bmckinney !
This is not a requirement for V1 of SBGrid, but there may be other groups interested in this.
@jggautier let's discuss.
Working example: https://dv.sbgrid.org/api/datacite/10.15785/SBGRID/264
@djbrooke and I briefly discussed the benefits to the community (including and outside of SBGrid) of the additions proposed in the working example, including...
Which should help generally with discoverability and perhaps with internationalization work.
I'll be talking with the team about the additional citation metadata @bmckinney worked on and helped me understand, to get a better sense of the amount of effort involved in sending more metadata.
I've written a list of questions and potential mapping in this doc: https://docs.google.com/document/d/1eIxfSwFIC1Paay9pUl5qfbgW7b1sw9EKQkcdDpmi8e4.
Since we've been talking about SPI a lot lately, including yesterday in the context of #3657, I'd like to point out that converting https://github.com/sbgrid/sbgrid-dataverse/blob/098b56b531f0c3e5cad689ef39eb54d321071587/mod-sbgrid/src/main/java/edu/harvard/iq/dataverse/export/DataciteExporter.java to a jar file might be a good first step. I also wrote about SPI today at https://groups.google.com/d/msg/dataverse-community/Nc8tX0s8lo8/pphF_LKUAQAJ
From Ashish Sharma at https://groups.google.com/d/msg/dataverse-community/LN7NR7iAFfw/H50cmAOnAQAJ
Does Dataverse support all elements of the DataCite 4.0 schema, or is it limited to a subset of fields?
For example, we'd like to be able to use the 'Version' relationship when a new version is created. This would involve updating the DOI metadata for the prior object to reflect the fact that it is no longer the latest version.
There are similar attributes in the schema that would need to be edited. There are UI changes, but I suspect that those are "easier" to implement.
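The 'Version' relationship asked about above could be expressed with DataCite's `relatedIdentifier` element. This is only a minimal sketch. It assumes each version has its own DOI, both DOIs are made-up placeholders, and nothing in this thread confirms that Dataverse would send it.

```xml
<!-- Fragment for the metadata record of the PRIOR version (placeholder DOI 10.5072/EXAMPLE/V1). -->
<!-- It declares that this record is the previous version of the newer DOI. -->
<relatedIdentifiers>
  <relatedIdentifier relatedIdentifierType="DOI" relationType="IsPreviousVersionOf">10.5072/EXAMPLE/V2</relatedIdentifier>
</relatedIdentifiers>
```

The record for the new version would carry the inverse relation, `relationType="IsNewVersionOf"`, pointing back at the prior DOI.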
From my notes at DVCM2017, Anita de Waard mentioned Scholarly Link eXchange as a search engine that builds links between publications and data. DOI metadata is one source search engines of this type can use for indexing; so expanding the DataCite metadata propagated to the DOI system allows these types of external sources to index datasets within Dataverse without requiring special-purpose implementations.
@jggautier - do you think we should close this one? Is this work covered in #4318?
I've always thought this issue was about sending DataCite more metadata in the DataCite schema. #4318 is about making it harvestable with OAI-PMH. It would be ideal if the same metadata in DataCite schema is harvestable over OAI-PMH and sent to DataCite on dataset publish.
One question I have is: Will the same metadata be harvestable and sent to DataCite?
It seems it wouldn't be, based on discussion in today's sprint planning.
So my other question is: Will the development work to make DataCite metadata available in both ways be easy enough that the effort can be part of #4318? Or should there be more than one issue (one to make it harvestable, another to send it to DataCite)?
To keep the scope of the other issue, #4318, focused on providing value to @LauraHuisintveld at DANS, @shlake at UVA and others interested in harvesting using the DataCite metadata format, let's keep this issue about sending more metadata to DataCite separate. I can rename the title to make it clear that it's about sending additional metadata to DataCite.
Thanks to @sekmiller and @pameyer who helped me understand a little better how the metadata is sent to DataCite versus made available through OAI-PMH. Seems like #4318 gets Dataverse closer to sending more metadata.
There's more mapping to do, either as part of this issue or an issue following this one, as part of a broader discussion about improving the connections between data and publications to take advantage of services like the Scholarly Link eXchange framework (related to https://github.com/IQSS/dataverse/issues/2778).
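One way the data-to-publication connection mentioned here might appear in DataCite metadata is a `relatedIdentifier` that points from the dataset to its article. This is a sketch under assumptions: the article DOI is a made-up placeholder, and the choice of `IsSupplementTo` is illustrative rather than a mapping this thread has agreed on.

```xml
<!-- Fragment for a dataset record: links the dataset to the publication it supplements. -->
<!-- 10.5072/EXAMPLE-ARTICLE is a placeholder, not a real article DOI. -->
<relatedIdentifiers>
  <relatedIdentifier relatedIdentifierType="DOI" relationType="IsSupplementTo">10.5072/EXAMPLE-ARTICLE</relatedIdentifier>
</relatedIdentifiers>
```

Link-building services can only use a relation like this if it is present in the metadata registered with the DOI, which is why the extra mapping matters.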
I got the impression from today's discussion that Dataverse sends data to DataCite on publish when either EZID or DataCite is used as a persistent ID provider. What about Handles? Is data sent to DataCite on publish when Handles are used rather than DOIs?
What about Handles? Is data sent to DataCite on publish when Handles are used rather than DOIs?
I assume that data is sent to DataCite on publish for Handles too.
Mostly I'm leaving a comment here because #4782, which is in flight, is also about sending more data to DataCite.
Also, code has been written for this issue (#2917) at https://github.com/sbgrid/sbgrid-dataverse/blob/feature/datacite-xml/mod-sbgrid/src/main/java/edu/harvard/iq/dataverse/export/datacite/DataciteDataModel.java but it seems to be more in the context of export. A flavor of DataCite export has been implemented in pull request #4664 for #4257.
DataCite doesn't register Handles, as far as I know – the local Handle server does. Metadata is a requirement for getting DOIs for datasets from DataCite, but I don't think they would accept metadata for datasets with Handles.
@bencomp that makes sense. Thanks!
When https://github.com/IQSS/dataverse/issues/5029 is released, Dataverse will know whether creators are people or organizations, and can include DataCite's "nameType" attribute. (See Martin Fenner's comment about nameTypes.)
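For illustration, this is roughly how the attribute would look on `creatorName`. It is a sketch, not what Dataverse sends: the attribute exists only in DataCite schema 4.1 and later (the samples earlier in this issue use kernel-3), and listing IQSS as an organizational creator is an assumption made for the example.

```xml
<!-- Fragment: nameType takes the values "Personal" or "Organizational". -->
<creators>
  <creator>
    <creatorName nameType="Personal">Castro, Eleni</creatorName>
  </creator>
  <creator>
    <!-- Hypothetical organizational creator, for illustration only. -->
    <creatorName nameType="Organizational">IQSS</creatorName>
  </creator>
</creators>
```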