We currently have the APIs using the database ID, but we need to support persistent IDs.
From #1717:
However, the DOI naming scheme contains slashes, and so does not play well with a REST API. We could introduce a scheme where the DOI is escaped, or the slashes are replaced with dashes, or maybe base64-encoded. Not sure any of these is a good idea - at least, it's not a very intuitive one. We could offer another endpoint that converts global IDs to local ones.
I definitely have a need to figure out internal dataset ID numbers when working with APIs.
For a long time I've been doing this at https://github.com/IQSS/dataverse/blob/master/scripts/search/assumptions
export FIRST_FINCH_DATASET_ID=$(curl -s "http://localhost:8080/api/dataverses/finches/contents?key=$FINCHKEY" | jq '.data[0].id')
And more recently I've been using an undocumented feature of the Search API to expose database IDs (looking them up by globalId/persistentId/DOI) but this requires turning on an experimental feature I haven't fully implemented at #1299 - https://github.com/IQSS/dataverse/blob/60e82b10168397f863624e70e51b12f6b38cb4c5/src/test/java/edu/harvard/iq/dataverse/api/SearchIT.java#L218
Anyway, my point is that this is an important endpoint for sure. /cc @rliebz
Hi,
This is a blocker issue for my project, because without the IDs I can't perform metadata updates, and I can't get the IDs because the get_contents() call takes too long to complete. I will try some of the workarounds described here, so thank you to the folks who posted those!
One possible suggestion for a simple solution here would be to URL-escape the DOIs and then use them in the REST format as usual, so you'd get something like https://dataverse.harvard.edu/api/datasets/doi%3A10.7910%2FDVN%2FUXTXA/versions/:latest
Anyway, if anyone has any additional suggestions for how to find the IDs or how to perform metadata updates using only DOI, I would love to hear them!
Thanks,
Garth
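The URL-escaping suggestion above can be sketched in a few lines of Python. This is only an illustration of the escaping itself: the helper names `escape_persistent_id` and `dataset_version_path` are hypothetical, and nothing here implies that the native API accepts an escaped DOI in the path.

```python
from urllib.parse import quote


def escape_persistent_id(persistent_id: str) -> str:
    """Percent-encode a persistent ID so it fits in one URL path segment.

    safe="" forces "/" and ":" to be escaped as well.
    """
    return quote(persistent_id, safe="")


def dataset_version_path(persistent_id: str, version: str = ":latest") -> str:
    """Build the path proposed in the comment above (hypothetical route)."""
    return f"/api/datasets/{escape_persistent_id(persistent_id)}/versions/{version}"


if __name__ == "__main__":
    print(dataset_version_path("doi:10.7910/DVN/UXTXA"))
```

Note that some servers and proxies reject or decode `%2F` inside a path before routing, which is one practical reason to be wary of this approach.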
While I was just trying to write a test for #2222 it was driving me crazy (again) that I can't see the dataset entity/database IDs from SWORD. I just pushed a proof of concept to correct this in 639d8c3.
the get_contents() call takes too long to complete
Right, get_contents is a method @garthg is calling from https://github.com/IQSS/dataverse-client-python and the corresponding issue about this slowness on the API side is #2122
Without this functionality of being able to look up datasets via DOI, the native "datasets" API ( http://guides.dataverse.org/en/4.0/api/native-api.html#datasets ) is way less useful. An example use case today from @aawinburn was "How do I get the file ID this PDF in my unpublished dataset?" Good question and #1795 was supposed to be the answer but you have to know the database id of the dataset. I've also answered this question at https://groups.google.com/d/msg/dataverse-community/fFrJi7NnBus/JUdOlOmhtQgJ encouraging people (for now) to get a list of file IDs via the SWORD statement ( http://guides.dataverse.org/en/latest/api/sword.html#display-a-dataset-statement ) mostly because SWORD operates via DOIs. See also https://github.com/infsci2711/MultiDBs-FilesAPIs2DBs-WebClient/issues/6
As I just mentioned in a thread on the Dataverse Google Group, #2416 was opened recently which is about how hard it is to discover file IDs from the GUI.
In addition #2438 is a new issue about what persistent IDs we could/should use for files.
Developers of the Dataverse client for Python would like the ability to use DOIs (not just database IDs) to operate on the native API. https://github.com/IQSS/dataverse-client-python/issues/28 has some discussion on this.
This would also be useful for the R client.
I should elaborate: there's a tension between the Native API's ability to get versions of a dataset (but only by dataset ID) and the SWORD API's ability to retrieve a dataset by DOI. It would be nice for these to be able to play together, particularly given that the Native API doesn't require an API key to view the contents of a public dataset, but the SWORD API does.
This is a blocker for my project as well, and I do not see why the Search API does not expose the dataset IDs by default.
As it turns out, several Dataverse installations I've tested do provide the IDs when the 'show_entity_ids=true' parameter is passed in the URL. However, this feature is not documented in the API docs.
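For anyone who wants to try the 'show_entity_ids=true' workaround, here is a minimal sketch that only builds the request URL. The helper name `search_url` is made up for illustration, the parameter is undocumented and may not be enabled on every installation, and the layout of the response is deliberately not assumed here.

```python
from urllib.parse import urlencode


def search_url(base_url: str, query: str, show_entity_ids: bool = True) -> str:
    """Build a Search API URL, optionally asking for database IDs.

    show_entity_ids is the undocumented parameter mentioned above;
    an installation may ignore it.
    """
    params = {"q": query}
    if show_entity_ids:
        params["show_entity_ids"] = "true"
    return f"{base_url}/api/search?{urlencode(params)}"


if __name__ == "__main__":
    print(search_url("http://localhost:8080", "finch"))
```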
See also #1717 which spawned this ticket. I think @michbarsinai @scolapasta and I need to get together and decide on an approach to try. Options include:
/api/datasets?persistentId=doi:10.7910/DVN/UXTXA
/api/datasets/doi%3A10.7910%2FDVN%2FUXTXA
/api/datasets/versions/:latest/doi:10.7910/DVN/UXTXA
@garthg means well when he suggests escaping the DOI in the URL like /api/datasets/doi%3A10.7910%2FDVN%2FUXTXA/versions/:latest (and @michbarsinai suggested the same at https://github.com/IQSS/dataverse/issues/1717#issuecomment-87414448 ) but my goodness is that hard on the eyes. I would much prefer using a query parameter like this: /api/datasets?persistentId=doi:10.7910/DVN/UXTXA which is exactly what we do on the dataset page: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/UXTXA
Another approach would be to put the DOI at the end of the URL, like we do with SWORD ( /dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.7910/DVN/UXTXA ) but I favor the query parameter approach.
Whatever we decide on we would, of course, continue to support the old way for a while. And I think we should continue to support looking up a dataset by id, even if we use a query parameter (/api/datasets?id=42).
Another option is to have a DOI endpoint. This would also allow a DOI to point to different types of items, which is, I think, one of the main goals of the DOI project.
Something along the lines of:
/api/doi/10.7910/DVN/UXTXA
Not sure how to deal with versions there - we could append them (/api/doi/12.3456/DVNE/UXTXA/versions/:latest) and use some semi-clever URL parsing. Or we could return a list of the versions, and have the client access a specific version via the existing API.
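To make the "semi-clever URL parsing" idea concrete, here is a rough sketch of how a path under the proposed /api/doi/ endpoint could be split into a DOI and an optional version. The endpoint is only a proposal in this thread and `split_doi_path` is a hypothetical helper; splitting on the last "/versions/" is an assumption that breaks if a version specifier ever contains that string.

```python
from typing import Optional, Tuple

DOI_PREFIX = "/api/doi/"
VERSION_MARKER = "/versions/"


def split_doi_path(path: str) -> Tuple[str, Optional[str]]:
    """Split a proposed /api/doi/... path into (persistent ID, version).

    The DOI itself may contain slashes, so the split happens on the
    last occurrence of "/versions/" rather than on every "/".
    """
    if not path.startswith(DOI_PREFIX):
        raise ValueError(f"not a DOI path: {path!r}")
    rest = path[len(DOI_PREFIX):]
    head, marker, tail = rest.rpartition(VERSION_MARKER)
    if marker:
        return "doi:" + head, tail
    return "doi:" + rest, None


if __name__ == "__main__":
    print(split_doi_path("/api/doi/10.7910/DVN/UXTXA/versions/:latest"))
```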
@RinkeHoekstra In case it's helpful, I wrote some Python that does cached lookup of dataverse IDs to make it slightly easier to manage this issue. Some code is on pastebin at: http://pastebin.com/ipdhEPXA . Obviously that's not a substitute for proper implementation through the API, but I wanted to pass it along just in case it's helpful.
@garthg thanks! I found similar code somewhere on Github and now have a workaround.
A separate issue is that the search API is rather picky as to how the DOI is quoted. For instance Python requests always quotes the query parameters in a GET request, but the API then searches for the quoted string rather than unquoting it first. But that is a separate issue ...
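To illustrate the quoting point, this small sketch shows what a DOI looks like after a client percent-encodes it as a query parameter, and that decoding the query string recovers the original value. It uses only the standard library; the function names are illustrative, and it says nothing about how the Search API itself handles the value.

```python
from urllib.parse import parse_qs, urlencode


def encode_query(params: dict) -> str:
    """Percent-encode query parameters the way most HTTP clients do."""
    return urlencode(params)


def decode_query(query_string: str) -> dict:
    """Decode a query string back to plain values (first value per key)."""
    return {key: values[0] for key, values in parse_qs(query_string).items()}


if __name__ == "__main__":
    encoded = encode_query({"q": "doi:10.7910/DVN/UXTXA"})
    print(encoded)
    print(decode_query(encoded))
```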
URL scheme for external persistent ids:
http://dataverse.org/api/datasets/:persistentid/:draft?persistentid=doi:10.2.3.4./open/ended/notation*&.
Implemented in feature branch 1837-persistent-id-in-dataset-api. To test: Use the regular API testing, but refer to the dataset as http://dataverse.org/api/datasets/:persistentid/:draft?persistentid=doi:<doi-goes-here>.
@scolapasta this is one of the issues I mentioned this morning for which code has been pushed to a branch made from 4.2.3 and a decision should be made whether to merge it in to the 4.2.3 branch or not.
Most recently, this issue is affecting this user:
I'm replying with workarounds but really we should just fix this issue. @michbarsinai implemented a fix at https://github.com/IQSS/dataverse/issues/1837#issuecomment-166017382 and it has since become pull request #2893.
Tested and merged.
You can see the fix in production at https://dataverse.harvard.edu/api/datasets/:persistentId?persistentId=doi:10.7910/DVN/ARKOTI
(That's the dataset @monogan said we could test with at https://github.com/IQSS/dataverse-client-r/issues/2#issuecomment-155613278 .)
Docs at http://guides.dataverse.org/en/4.3/api/native-api.html#datasets
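For client authors, a minimal sketch of building a URL in the merged format shown above. The helper name `dataset_url` is hypothetical; keeping ":" and "/" unescaped simply mirrors the production URL quoted in this thread, and whether the server also accepts the percent-encoded form is not verified here.

```python
from urllib.parse import urlencode


def dataset_url(base_url: str, persistent_id: str) -> str:
    """Build a native API dataset URL that looks the dataset up by persistent ID.

    ":persistentId" is a literal path segment; the real identifier goes
    in the persistentId query parameter.
    """
    query = urlencode({"persistentId": persistent_id}, safe=":/")
    return f"{base_url}/api/datasets/:persistentId?{query}"


if __name__ == "__main__":
    print(dataset_url("https://dataverse.harvard.edu", "doi:10.7910/DVN/ARKOTI"))
```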