dataverse 🚀 - DV cannot parse DOIs that don't have a separator in authority/prefix

Hey @tdilauro - @scolapasta is running point on migration issues and I'll make sure he checks this out today.

I'll also tag @landreev as it looks like he added code to handle what you mention in a commit that was in 4.5: https://github.com/IQSS/dataverse/commit/7ba9194

Anyway, we'll be in touch as soon as possible.

djbrooke on 20 Jan 2017

👍1

Almost forgot! We should also check with two of the recent upgraders from the Dataverse community - @donsizemore and @4tikhonov - to see if they have any helpful info here.

djbrooke on 20 Jan 2017

👍1

Hi @tdilauro I see from your post at https://groups.google.com/d/msg/dataverse-migration-wg/bFYugiKVvd8/IpsLxRB6AwAJ that you were having trouble migrating datasets with DOIs such as "doi:10.7281/T1J10120" to Dataverse 4.3 but can you please try this on Dataverse 4.6? I'm getting different behavior than you reported there. I'm not getting "IllegalArgumentException: Failed to parse identifier: doi:10.7281/T1J10120" but I also don't want to get your hopes up that DOIs like this are completely supported. I suspect there still may be an issue with DOIs like this. We'll see. 😄

On a related note, I played around with the code a bit and pushed some scratch work to https://github.com/pdurbin/dataverse/commit/ca6ecb49bb4fb1695e4a87ba54d768e22b2728c1 that I'd be happy to review with @scolapasta @landreev @sekmiller and others.

pdurbin on 20 Jan 2017

Hi Tim,

I'll be curious to here what happens when you try what Phil suggested, but
please let me know how else I can help with anything migration related.

(I have a general e-mail I sent out to those groups doing migration, will
send to you shortly)

Gustavo

On Fri, Jan 20, 2017 at 12:09 PM, Philip Durbin notifications@github.com
wrote:

Hi @tdilauro
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_tdilauro&d=CwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=xgAtlRJhZJjMyKX1J660N0IQRNfpw7XyvKRWo-rQjyo&m=uPO38ER7wWUbGudRmD-sQMN0cuknpns74nc8LnR8TRw&s=HKMPMX7WCKDZ2-0GLyckOPK7gK3k315enz0VXsiO-AM&e=
I see from your post at https://groups.google.com/d/
msg/dataverse-migration-wg/bFYugiKVvd8/IpsLxRB6AwAJ
https://urldefense.proofpoint.com/v2/url?u=https-3A__groups.google.com_d_msg_dataverse-2Dmigration-2Dwg_bFYugiKVvd8_IpsLxRB6AwAJ&d=CwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=xgAtlRJhZJjMyKX1J660N0IQRNfpw7XyvKRWo-rQjyo&m=uPO38ER7wWUbGudRmD-sQMN0cuknpns74nc8LnR8TRw&s=YyeE03Hsn969T_5hEp9Is4elObvN0BJC7GE9eMbUZ7A&e=
that you were having trouble migrating datasets with DOIs such as
"doi:10.7281/T1J10120" to Dataverse 4.3 but can you please try this on
Dataverse 4.6? I'm getting different behavior than you reported there. I'm
not getting "IllegalArgumentException: Failed to parse identifier:
doi:10.7281/T1J10120" but I also don't want to get your hopes up that DOIs
like this are completely supported. I suspect there still may be an issue
with DOIs like this. We'll see. 😄

On a related note, I played around with the code a bit and pushed some
scratch work to pdurbin@ca6ecb4
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_pdurbin_dataverse_commit_ca6ecb49bb4fb1695e4a87ba54d768e22b2728c1&d=CwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=xgAtlRJhZJjMyKX1J660N0IQRNfpw7XyvKRWo-rQjyo&m=uPO38ER7wWUbGudRmD-sQMN0cuknpns74nc8LnR8TRw&s=Q4YDnoCoxi7AEQG5w75KC4t1K7TeUsmKjduvroAUI_Y&e=
that I'd be happy to review with @scolapasta
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_scolapasta&d=CwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=xgAtlRJhZJjMyKX1J660N0IQRNfpw7XyvKRWo-rQjyo&m=uPO38ER7wWUbGudRmD-sQMN0cuknpns74nc8LnR8TRw&s=iPt8g6T78RlGqq7xLYws-eT9fc6uSPIl7ITlXHmahpE&e=
@landreev
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_landreev&d=CwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=xgAtlRJhZJjMyKX1J660N0IQRNfpw7XyvKRWo-rQjyo&m=uPO38ER7wWUbGudRmD-sQMN0cuknpns74nc8LnR8TRw&s=00UyD91Jl2KiRcN38M_tuh5DEclDeMkNqQtOGFRg8H0&e=
@sekmiller
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_sekmiller&d=CwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=xgAtlRJhZJjMyKX1J660N0IQRNfpw7XyvKRWo-rQjyo&m=uPO38ER7wWUbGudRmD-sQMN0cuknpns74nc8LnR8TRw&s=rqjTEQ6B1eIR_Wah1ckveT4_VWbBOuGG1R5pL09znP4&e=
and others.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_IQSS_dataverse_issues_3583-23issuecomment-2D274125417&d=CwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=xgAtlRJhZJjMyKX1J660N0IQRNfpw7XyvKRWo-rQjyo&m=uPO38ER7wWUbGudRmD-sQMN0cuknpns74nc8LnR8TRw&s=6yjchhqyJ0-9gQih9MhXSaGpPoExSQVC7fddklL0PjY&e=,
or mute the thread
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AEEhBFHhq0SNLF8eYjmSrwwOfELF2E99ks5rUOo4gaJpZM4LpWyJ&d=CwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=xgAtlRJhZJjMyKX1J660N0IQRNfpw7XyvKRWo-rQjyo&m=uPO38ER7wWUbGudRmD-sQMN0cuknpns74nc8LnR8TRw&s=hc54IAAx88CmPmQFqaTX17IbeFtErmi1hCF7nRUUacQ&e=
.

scolapasta on 20 Jan 2017

Thanks @pdurbin, @djbrooke, and @scolapasta.

@scolapasta: A key question for me right now is for which DV versions the migration tools and process are supported. I have performed the migration several times against 4.3 and below, but not of the most current releases that supports migration directly from 3.6.3.

tdilauro on 20 Jan 2017

Hey @tdilauro - I know that you and @scolapasta have been communicating about this, so feel free to leave any helpful notes or scripts here. If this is something that other installations will run into, we should share the knowledge if helpful. Thanks!

djbrooke on 26 Jan 2017

@tdilauro and I discussed this issue briefly at http://irclog.iq.harvard.edu/dataverse/2017-02-01#i_47985

pdurbin on 1 Feb 2017

I was able to work around this issue in 4.3.1 to complete our migration by stripping the DOIs from the 3.6.3-exported DDI and feeding this to batch migration. Since our Handle prefix was not real and not registered, I needed only replace the Handles with their corresponding DOIs in the dataset table once the batch migration was complete. Using the original exported DDI, I was able to create a mapping from the Handles to their corresponding DOIs for each of the datasets. Using the same information, I created a script to rename (mv) the associated directories.

My migration path was 3.6.2 -> 4.3.1, followed by a series of upgrades from 4.3.1 -> ... -> 4.6. I did not have time to test a migration directly from 3.6.2 to 4.6. So, I'm not sure if this issue is resolved or not.

tdilauro on 1 Feb 2017

@tdilauro I think there's still an issue here. Please take a look at https://github.com/IQSS/dataverse/blob/v4.6/src/test/java/edu/harvard/iq/dataverse/GlobalIdTest.java#L127 for example, which looks like this:

    @Test
    public void testBadIdentifierTwoParts(){
        System.out.println("testBadIdentifierTwoParts");

        exception.expect(IllegalArgumentException.class);
        exception.expectMessage("Failed to parse identifier: doi:2part/blah");
        new GlobalId("doi:2part/blah");
}

@tdilauro you identifiers have two parts (i.e. doi:10.7281/T1J10120) and it sounds like the Dataverse code should be able to parse them.

pdurbin on 1 Feb 2017

Yesterday @scolapasta mentioned "The shoulder separator should not be part of authority but rather the identifier per examples on EZID site" from #898 which seems related. That issue also references http://ezid.cdlib.org/learn/id_concepts

pdurbin on 2 Feb 2017

Thanks for making that connection to #898, @pdurbin. I'll add some comments there.

tdilauro on 2 Feb 2017

FWIW - We have a working fix for using shoulders that don't end in '/' in QDR (and very strangely our internal issue # was 898 also!). I didn't send it at the time since I left the hardcoding that assumes '/' is the doiSeparator (so it's only a partial fix that doesn't address other separators). It works with the sequential number and random string identifiers and is backward compatible in that existing identifiers (of the form 10.5072/FK2/ABCDEF) remain stored as is (with the shoulder as part of the authority) while ones without a final slash separate the authority and shoulder (which is stored as part of the identifier field - 10.5072 and FK2ABCDEF) in the DB. A new DoiShoulder key allows you to set the shoulder. If a PR is useful, let me know. (We weren't upgrading from 3.x but I think a DB update could migrate old DOIs if that's still useful).

qqmyers on 26 Apr 2018

👍1

@qqmyers yes, please make a pull request so we can debate the approach, at least. Thanks!

pdurbin on 26 Apr 2018

Sending over to @scolapasta based on a brief discussion

djbrooke on 26 Apr 2018

@qqmyers thanks for the PR and I think it's a good start to the solution, which is what #898 is all about (or at least, evolved into). That solution being that the shoulder should be stored as part of the identifier always (with our without a separator).

While we normally do favor small batches, in this case I worry that will make migration instructions challenging when we do get the complete fix for #898, since this would create a schism in installations, some with shoulders stored as part of the authority, some with shoulders stored as part of the identifier (and possibly some with both! if they changed their structure at some point)

I'm willing to be convinced otherwise, but until then I lean not testing / merging this PR.

Regardless, it's good to see that others are on the same page about storing the shoulder as part of the identifier and your comments and work on this can hopefully help me champion #898 as a priority (as it helps the goal of reducing forks) for us here, or encourage someone in the community to tackle the complete solution.

scolapasta on 3 May 2018

@scolapasta - I haven't checked thoroughly, but I think existing entries could be migrated to move the shoulder into the identifier with a db migration and a minor update to DatasetServiceBean.findByGlobalId() (there's ICPSR specific code there to handle identifiers with only one separator character that would instead be used all the time). I didn't make this change when we made the update for QDR to avoid a db migration step, but this change would make all ids stored with authority and 'shoulder+' in the database. If that's useful, let me know and I can do a quick test (we have '10.5072/FK2' authority entries in QDR's test databases, so I can migrate those and try the updated code I'm suggesting.).

If that's not a helpful step forward, that's fine. I didn't submit this back when we developed it for QDR because it was clear that more work would be needed to support other separators/ change of separator over time.

qqmyers on 4 May 2018

I agree that your proposed changes sound like what is needed for #898 and if you have the time to work on that, I would be very happy to help and review, as needed.

scolapasta on 4 May 2018

For housekeeping purposes, I'll move this issue back to the backlog. When you have a new PR, let us know. and you can probably connect it to both this and #898)

scolapasta on 4 May 2018

In the spirit of housekeeping, I closed pull request #4623 (the one being discussed).

pdurbin on 4 May 2018

I updated the code and put the two lines of sql needed to update the database in a comment on the PR. If that needs to go elsewhere, let me know.

qqmyers on 4 May 2018

One ~issue: I added a new DoiUseMixedCase setting to allow ICPSR to keep mixed case identifiers. The old code assumed any identifier that did not have two occurrences of the doiseparator were from ICPSR and did not have to be set toUpperCase(). Since there will now be others that have identifiers with only one doiseparator, I thought making it an explicit setting made sense. However, it might be cleaner to just drop the toUpperCase() altogether. The main use case seems to be looking for datasets based on the DOI in URIs like /dataset.xhtml?persistentId=doi:10.5072/FK2/ZPAREP - leaving the case alone should work...

qqmyers on 4 May 2018

@qqmyers so, my main feedback is that we should really just be treating the shoulder as a prefix to the identifier that has no meaning and that we add at generation time. So we shouldn't need to treat shoulders as separate cases in the generation, nor need a doi separator character, as that separator could just be part of the shoulder.

In other words a few example DOIs:
doi:authority/dvn/1234
doi:authority/dvn:1234
doi:authority/dvn1234
doi:authority/1234

In all these cases if we just treat what is before the ":" as the protocol, between the ":" and the "/" as the authority, and then anything after that "/" as the identifier, we simplify store as a setting that the shoulder is:
"dvn/"
"dvn:"
"dvn"
N/A

and then just always prepend the shoulder (treating the nonexistence of the setting as "") to what is generated. In that case, I think it simplifies the code some. What do you think?

Re: the queries, shouldn't it be looking for a "/" and not a the doi separator? Or are my examples above
incorrect? (the script itself will need to go into an update script, but we can add that when we decide to merge this as I don't know exactly which update script it will go in. I'll discuss with @kcondon to see what he thinks the best way to handle is)

scolapasta on 14 May 2018

@scolapasta - ~Yes. I think the doi separator had been being used for both the authority/shoulder separator and as the shoulder/identifier separator as well, but, both Handles and DOIs require the first to be a '/' and, as you note, the shoulder/identifier separator can just be considered part of the shoulder. So I think the doispeparator can be removed and that this change will allow other shoulder/identifier separators as discussed in #898 . I can do that.

Any thought on whether the toUpperCase and mixed case flag are needed or if the toUpperCase can just be dropped?

Also - I noticed that I've introduced a bug for handle processing. I'll fix that as well.

qqmyers on 15 May 2018

Hmm, if the code did do that (use separator for both) seems like a bug. Good that we're just getting rid of it, anyway.

I'll await your next changes and think about the upper case / mixed case question in the meanwhile.

scolapasta on 15 May 2018

Yes - I agree it was a bug. You can see in the pull request, in the removed code for DataSetServiceBean, that index2 and index3 are both using the supplied separator.

I updated the PR last night - the separator was getting set but not used in several other places, but I think I've now removed them and '/' is used directly for the authority/shoulder separator (since both handle and doi hardcode it).

With the update, Handles are treated as case sensitive, and DOIs are treated as case insensitive unless the mixedcase flag is set, in which case it is case sensitive as well. I think this works and the defaults are consistent with the spec, but it does mean that orgs with mixed case db entries would need to set the flag whereas things just worked for them before...

qqmyers on 15 May 2018

I see @sekmiller has conflicting changes from #2438 that I think will also fix the bug and essentially drop the toUpperCase() logic that forced DOIs to be case insensitive (the source of the globalid must match the case in the database). I think that works - equivalent to my suggestion of not having a mixed case flag and also dropping toUpperCase(). If that's OK, I think his code is cleaner and then this PR can be rejected. (Since Handle and DOI both require '/' as the separator, the logic for making that a variable and storing it in the db could still be removed, but that's just cleanup once the bug is gone.)

qqmyers on 15 May 2018

Well, I think the value of this PR (or maybe a new PR?) is still in moving the shoulder out of the authority into the identifier field, so having (updated) db scripts and then proper code to handle that (and the removal of the separator as separate from the shoulder) would be good. (essentially solving 898). Would you have the capacity to work on that?

scolapasta on 16 May 2018

I think the new code is agnostic to DB structure, so the sql script I sent earlier should still be fine. I'll go ahead and merge with the dev branch and verify that things work with the db change...

qqmyers on 16 May 2018

I've completed a basic merge with dev. With the changes from the doi for files branch changing how IDs are found in the db, here's what's in this issue/PR:
The db script to move any shoulder used from the authority to identifier.
A DoiShoulder key and code that allows generation of dataset ids that include a shoulder as a prefix to a random or sequential identifier that always applies to datasets an applies to datafiles when their ids are independent (not prefixed with the dataset id).
Removal of the doiSeparator key which was previously being overloaded as the authority/shoulder and shoulder/identifier separator. For DOIs and Handles, the former must be '/' and the code now assumes any shoulder/identifier separator is just part of the doiShoulder coming in and stored as part of the identifier column in the db.
Addition of shoulderWithRandomString and shoulderWithSequentialNumber options for id generation styles for datasets that prepend any shoulder to new ids.
Separation of the old idgenerationstyle flag into two flags: datasetIdgenerationStyle and datasetIdGenerationStyle. I did this because the shoulderwith... options probably don't make sense for files (why have a shoulder twice in a dependent file id), but it would allow files to be sequential while datasets are random as well. With the updated code here, datafiles will use a shoulder if it exists only if they are independent of the dataset ids, i.e. if you set a shoulder and are generating datasets using it, you'll also get datafile ids using your shoulder.

qqmyers on 17 May 2018

Minor simplification - removed the shoulderWithRandomString and shoulderWithSequentialNumber options for datasetIdGenerationStyle and just assume that if a doiShoulder is set, it should be used.

I think this is ready for review again.

qqmyers on 17 May 2018

This is great - we're very close, I think!

Comments:

Since DoiSeparator had been introduced as the character between the shoulder and the identifier - though I agree with you it was misused as the character after the authority - could we just get rid of it completely? i.e. all the places where you get the separator and hard code it is '/', just use '/' as needed directly. Similar to how we don't have a variable for the ":" between protocol and authority. I think this would just make the code much cleaner.
Since you removed the "shoulderWith..." styles, can we simplify again back to just idgenerationstyle, instead of datasetIdGenerationStyle and datafileIdGenerationStyle. We would lose the ability to allow files to be sequential while datasets that are random, but I don't think we really need that and again code simplicity is better for when we invariably need to revisit a year from now. (also less documentation / administration burden to deal with)

in DataFileServiceBean, rather than have to add the independent / dependent conditional for both style cases, we could just add the shoulder to the "prepend" logic, just before it (renaming from datasetId):

 if (doiDataFileFormat.equals(SystemConfig.DataFilePIDFormat.DEPENDENT.toString())){            
     prependString = datafile.getOwner().getIdentifier() + "/";
 } else {
     prependString = shoulder;
}

Something like that at least...

scolapasta on 18 May 2018

For the doiSeparator: The doiSeparator key is gone. What remains are all the separator sets and gets to store and retrieve the separator (which is always '/' so far) to the database. I think all of that could also be pulled out too and the column dropped from the db. Is that what you're suggesting? I agree its simpler and as long as there are no plans for a id scheme where there are choices for separator (so knowing the scheme won't allow you to infer the character used), I don't think it causes any trouble.
I can go ahead with this if you let me know we're talking about the same thing.
For the idgenerationstyle. I think it can be just one. While I like the option randomString, dependent sequential personally, QDR's not planning to use it. Not hard to change if it needs to be brought back someday. I'll go ahead and change this.
For the prepend logic, the issue which caused me to add the shoulder to the generateID calls is that the check for isIdentifierUniqueInDatabase needs to consider the shoulder or the checks are always true. Looking now for dependent file Ids with randomString, it looks like that may have the same issue - if isIdentifierUniqueInDatabase is called to compare only the random part following the dataset id to be prepended, it is always true. The logic for dependent sequential is different and includes the dataset part in the isUnique check.

Given that, if we agree that dependent randomString has a bug, I think I can still do something like you suggest and concatenate the dataset id and shoulder into one term and then just pass that as the last parameter to the generate calls. I'll go ahead with this to make a clear example.

qqmyers on 18 May 2018

Code committed for everything except possible removal of doiSeparator column from table and associated get/set code.

qqmyers on 18 May 2018

I went ahead and edited both your comment and mine to add numbers to each bullet.

Yes, we're talking about the same thing and what I was suggesting.
Agreed, I think if/when we find a need, we can re add.
Sure, go ahead and add, and I'll take a look. (my expectation would be that isIdentifierUniqueInDatabase should check the entire combination, i.e. the entire persistentIdentifier and not just the id part. But I'll review when you have the change.

scolapasta on 18 May 2018

Ha! Looks like you beat me by a minute. But if you can get that last part in, I can hopefully get the final review in later this evening.

scolapasta on 18 May 2018

OK - removed the doi separator related variables and remerged with the latest code. The separator column in the database should be removed, but... I believe the ids for files code already moves that from the dataset to dvobject table, so the code to create the separator column in dvobject and any code to move the values from dataset to dvobject can just be removed from the sql update already created for that work.
So - hopefully done at this point/ready for review.

qqmyers on 19 May 2018

@qqmyers thanks again for working on this - I think I mentioned before that #898 was one of my pet issues so I'm very happy this is getting done (note, I am going to connect the PR to that one too, as both these should get closed once this is merged). Great job with handling those last three things. I think there are now just a couple of very small things to polish up:

DatasetServiceBean:
default:
/* Should we throw an exception instead?? -- L.A. 4.6.2 */
return generateIdentifierAsRandomString(dataset, idServiceBean,"");

I think this should be shoulder instead of "". Let me know if you don't agree.

JsonParserTest.java:
assertEquals("10.5072/FK2", actual.getAuthority());

This would fail now, right? So remove "/FK2" like you do in the other tests.

While not a change you introduced at all, I'm curious why Dataset still has protocol and authority (since they were moved to dvobject). Seems like this was missed in the file doi branch and would leave a couple of empty columns in the db. We might just want to remove them here while we're at it.

scolapasta on 19 May 2018

@scolapasta - glad to help.

1) - that's fine. I think I made it "" as half-way to doing an exception...
2) The test didn't fail because it bounces off a mock service that also had /FK2 - I updated both to remove them.
3) I think this makes sense. I went ahead and removed the authority and protocol variables from Dataset, dropped them as unique constraints, updated the findByIdentifier query that was still using the dataset table and updated on reference to authority (replaced with this.getAuthority() from dvobject). I see these columns and globalIdCreateTime are already dropped in the 4.8.6 to 4.9 sql update, so I don't think further change is needed there (but it raises the question of whether globalIdCreateTime needs to be removed from Dataset as well and whether the unique constraints that were in Dataset should be added to DvObject). (FYI: all this compiles, but I haven't yet tested with an updated database since merging with the file id work...)

qqmyers on 19 May 2018

Latest set of comments (also note, while a few things below I deem to be necessary before merge, some are just "nice to haves"; additionally, if you want us to help work on any of it, let me know - you've done a fantastic job with it so far):

Thanks.
Good. Though this makes me wonder if we should somewhere defensively code to make sure that authority can't have a'/' in it, as that would break things. Not critical, as the workaround is to simply not do that.
I confirmed with @sekmiller, yes, let's remove globalidcreatetime (and identifier).
3a. We should have the unique constraints on DvObject.
3b. findByIdentifier on Dataset is not used and should be removed. (Really, there should be such a named query on DvObject and that should be used by things like: isIdentifierUniqueInDatabase)
A new one: in the file dois we wrote a query that was not ideal, but was deemed necessary* because of how we stored each part. Now with the cleaned up fixed version, we should revert back to a named query. Would you mind making this update in this branch?

The query is both in DatasetServiceBean and DataFileServiceBean:
String queryStr = "select s.id from dvobject s where s.protocol || ':' || s.authority || s.doiseparator || s.identifier = '" + globalId +"'";
Dataset foundDataset = null;
try {
Query query = em.createNativeQuery(queryStr);

(*) this was because there could be "/" in authority and "/" because it was a file doi, so for parsing it would matter whether you were dependent or not, and thus if you switched your doi attributes at some point (edge case) then it would break

scolapasta on 21 May 2018

(do we have write access to this fork? @sekmiller and I just discussed making the changes to the update script that will be necessary for this)

If we don't, then we can just work on them and send them to you to put in.

scolapasta on 21 May 2018

@sekmiller had me add his statements to the SQL upgrade script in the pull request. Here's the commit: a3bcb9b

pdurbin on 22 May 2018

OK - I think I've made all the requested updates:
2 - the updated findByGlobalIdentifier method in DvObject logs the issue if it doesn't find something of the form :/
3 - globalIdcreatetime and related code removed from Dataset
3a - unique constraints on protocol, authority, identifier added to DvObject (checked that nulls are not equal, so datafiles that don't yet have DOIs don't violate the constraint)
3b - removed findByIdentifier on Dataset
4- created a NamedQuery to find things by protocol, authority, and identifier. Put this in DvObject and added wrappers in Dataset and DataFile that handle checking for type. I did some quick testing here and found that my attempt at a typedquery didn't work, so I changed to explicitly test dtype in the query. That works (give a 404 if you use a DataFile DOI when looking for a dataset rather than a 500 error).

FWIW- my test used the update script 1 commit back.

qqmyers on 23 May 2018

Identifier still needs to be removed from dataset, but I've asked @sekmiller to just go in and extract it as I'm hoping to get this to QA today.

The only other thing is how we have duplicate code for parsing id's in DVObjectServiceBean and in global id. Would be good to consolidate at some point, but we can tackle separately* (unless for some reason QA takes longer than expected).

(*) if you think you can get to it this afternoon, then feel free to tackle now.

scolapasta on 23 May 2018

Try this. The only real difference I saw was that GlobalId strips any whitespace, semicolon and single quote from authority and identifier. I left that in and merged the better error checking in from DvObjectServiceBean. So you create a GlobalId from the a string, and then DvObjectServiceBean just reads the parsed protocol, authority, and identifier from it to populate the query.

qqmyers on 23 May 2018

Perfect! Thanks.

scolapasta on 23 May 2018

Issues so far:

[x] setup-all.sh needs to set :Authority and :DoiShoulder (Stephen)
[ ] update db script does not migrate multiple doi providers correctly.
[x] docs need to scrub mention of :DoiSeparator and Add :DoiShoulder (Derek?)
[ ] setup-optional-harvard.sh was modified to add prod :Authority and :Shoulder values but did not do so previously.

kcondon on 24 May 2018

setup-all.sh can drop setting the doiSeparator too.

qqmyers on 25 May 2018

@sekmiller and I just worked on 83a76a0 but I think it's weird that :DoiShoulder continues to be a misnomer:

screen shot 2018-05-25 at 10 44 34 am

pdurbin on 25 May 2018

The Handle ID generation code does not include the shoulder, so I think it is now DOI specific. The connection with Handles and the requirement to be '/' were both from when it was also used as the authority - identifier separator.

qqmyers on 25 May 2018

@qqmyers great! If you're confident that the line about the misnomer can be safely removed, can you please go ahead? Thanks!

pdurbin on 25 May 2018

Sorry - it is still a misnomer, but no longer has to be a '/'. (Regardless of protocol, identifiers are generated the same way in the DatasetServiceBean and DataFileServiceBean, which is then passed to the relevant Id bean for generation.)

qqmyers on 25 May 2018

👍1

@qqmyers thanks for taking a look and for fixing up the docs at cf7a720. It sounds like :DoiShoulder is still a misnomer, which is unfortunate, since it's a new setting.

pdurbin on 25 May 2018

Looks like there are 18 uses of 'DoiShoulder' in git - I can just go through and change them all ...

qqmyers on 25 May 2018

@qqmyers judging from baa9052 you're done already! Thanks!!

pdurbin on 25 May 2018

@kcondon reported that he doesn't want :Authority or :Shoulder to be set by the setup-optional-harvard.sh script so I fixed this in d497dee . I also merged the latest from develop into the branch in 5478470.

pdurbin on 31 May 2018

Dataverse: DV cannot parse DOIs that don't have a separator in authority/prefix

All 55 comments

Related issues