Users should be able to upload files via rsync. When rsync is available, users will not be able to upload via HTTP methods.
The technical underpinnings for rsync are present, but we need to decide on what the UI/UX should be for rsync uploads.
During the 2016-09-08 SBGrid Sprint Planning meeting ( https://docs.google.com/document/d/1wWSdKUOGA1L7UqFsgF3aOs8_9uyjnVpsPAxk7FObOOI/edit ) this issue was given an effort level of "5".
The code that @bmckinney demo'ed lives at https://github.com/bmckinney/dataverse-canonical/tree/dcm-beta and it works (it is currently deployed to https://dv.sbgrid.org), but it would probably benefit from @scolapasta and @mheppler taking a look. I'm going to assign this issue to them, me, and Bill so we can figure out what happens next to this code.
@bmckinney @djbrooke and I discussed this and I believe that what should happen next is for a branch to be pushed to the main IQSS repo so that @mheppler can easily switch to it and make UI improvements.
There is a plan for design/usability in Trello: https://trello.com/c/Nbte37k1/9-sbgrid-rsync-file-uploads-4-8. In terms of timing, is it accurate to say that this is planned to be presented at June's Dataverse Community Meeting as production-ready code or at least a prototype?
That's correct!
We did a pre-test of the mockups with @pameyer and several issues came up:
(Linked mockups: https://projects.invisionapp.com/share/F3C9JRA8Z#/screens/240095406_Rsync_Create_Dataset_1)
For uploads, just using "rsync" is either too much information or too little. For downloads, it's probably the best word to use.
Pete expected to be able to copy the location of the rsync script (rather than download it to his laptop) so he could paste it into a terminal window that is open (via SSH) to the remote server where his data is stored, fetch the script there with "wget", and run it.
The mockups led the data owner (Pete) to assume that the files he transferred were not encrypted during transfer.
Additional issues with the mockups:
UI IMPACT
As defined in the following mockup document.
Create Dataset
Dataset (view/no files)
Dataset (upload in progress, dependencies: #3942, #3561, #DCM-TBD)
Upload Files (popup)
code styling from Bootstrap from red to something less "WARNING"
Dataset (view/has file/success-upload)
Dataset (file table)
File
Notifications
User Guide
Developer Guide
Various "upload in progress" indicators have a dependency on both DCM changes (https://github.com/sbgrid/data-capture-module/issues/21), and supporting Dataverse <-> DCM API changes (#3942).
In standup this morning I mentioned a few recommendations for this issue, but it turns out most of them actually apply more to #3998. But there's still one that's relevant here.
On the Upload Files popup (over the Dataset page):
The red text on pink background ("bash ./upload6.bash") should instead be highlighted black text, as seen in this file page mockup:
This is because users report that when they see the red text, they think an error has occurred. I'll be making the same recommendation for the download pages in #3998.
Reviewed the UI IMPACT outlined above and moved a few more of the items to the File Download w/ Rsync UI/Workflow changes #3998 issue.
I'm looking into the checklist items under the "Dataset (upload in progress)".
@landreev sounds good. I'm looking at stuff under "Upload Files (popup)".
The dataset is now getting locked when the user downloads the rsync script.
The locked page shows the "dcm upload in progress" (the message is in the bundle), until the package is uploaded.
I was having all sorts of issues with actually displaying the image and getting the page properly refreshed... so let's review this quickly.
@mheppler I added code that properly identifies a package file as class "package". So the page is now showing the new custom icon; but it appears to be a little off center or something? - please see below:

First pass notes:
sr.py unnecessarily.
sr.py https://github.com/sbgrid/data-capture-module/issues/16 appears to be working as expected.
Continuing notes:
Can we decide who's going to work on addressing this stuff?
I stepped in last week to work on the dataset locks and status messages on the page. Do I own the rest of this ticket now? A lot of the things currently being discussed, I don't know much about at all. I'm sure I can figure it all out. But is there somebody else on the team who was involved in implementing these things earlier, for whom they make more sense already?
This thing is still sitting in code review (prematurely, it sounds like?); if I should just pull this back into the dev. column and work on it, let me know.
Some quick questions/ideas:
"Supported file formats" - what page is that? But yes, everything should be configurable. If something needs to be changed/hidden, when rsync is configured as the upload mode, I'm sure it can be done.
I didn't like the current lock message either; I can just go ahead and put the message you suggested into the bundle, unless anyone objects.
"Popup / download link continues to work when dataset is locked for transfer" - that I haven't been able to reproduce... Once the dataset is locked, the "download" button should be shaded and disabled... Is it some edge case, of having another browser window open, that still has the button enabled?? Yes, we can add an additional code check to the backing bean.
As for email notifications, are there any implemented currently? Or is this something that needs to be added?
Download link not copy-and-pasteable... Can we just replace the link with a button? Unless it's easy to add the target value to the link, to make it possible to copy it... @mheppler ?
As for adding text explaining that it's not possible... I don't know, there is a lot going on in that popup already - not sure it's worth it. I would probably address it in one of the 2 ways above.
Disabling further uploads after a successful DCM upload - has somebody already worked on this? Or is this something new that needs to be added?
In retrospect, I should've taken notes somewhere else and condensed them rather than using the comments on the github issue for that.
"supported file formats" is in the files panel of dataset.xhtml in create mode; but it does look likely that it's in a bundle already.
"Popup / download link continues to work when dataset is locked for transfer" - I think this is a feature rather than a bug (aka - user downloads script, forgets where, they can download it again even though the dataset is locked for transfer).
Email notifications have been implemented; but we deferred some wording/UX stuff until now.
Download link / copy-paste: I think either of those suggestions would work.
Oh, this part:
"Popup / download link continues to work when dataset is locked for transfer" - I think this is a feature rather than a bug (aka - user downloads script, forgets where, they can download it again even though the dataset is locked for transfer).
I thought we were going to handle this by having the expiration date on the lock - ?
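To make the expiring-lock idea concrete, here is a rough sketch of how such a check could behave. It is written in Python only for brevity (the real code is Java), and every name in it (`UploadLock`, `lifetime`, `can_start_new_upload`) is hypothetical rather than taken from the Dataverse code base. The point is just that the server decides from the lock's age, regardless of what a stale browser window still shows.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class UploadLock:
    """Hypothetical stand-in for the lock taken when the rsync script is downloaded."""
    started: datetime
    lifetime: Optional[timedelta] = None  # None: the lock never expires on its own

    def is_active(self, now: datetime) -> bool:
        # A lock without a lifetime stays active until something removes it explicitly.
        if self.lifetime is None:
            return True
        return now < self.started + self.lifetime


def can_start_new_upload(lock: Optional[UploadLock], now: datetime) -> bool:
    # Server-side check: allowed only when there is no lock or the lock has expired.
    return lock is None or not lock.is_active(now)
```

With a seven-day lifetime (an arbitrary value chosen for illustration), the check refuses a new upload on day one and allows it again on day eight.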
Reworking previous few comments from good/bad/ugly mix into a checklist
TODO:
/api/datasets/:persistentId/dataCaptureModule/checksumValidation w\out DCM. @pdurbin talked to @pameyer about this after standup on 2017-08-29 and we decided we won't fix this. When the good news comes from the DCM, the yellow "in progress" notification disappears and the package is shown below, after the page refreshes.
good:
ugly (aka - not worth fixing currently):
https://github.com/IQSS/dataverse/issues/4111 may be related to the delete issue.
@pameyer not sure if this warranted another to-do checklist item, so I didn't add it, but I'll be fixing the icons for Dataverse Package file types in Chrome.

@sekmiller the createPackageDataFile method in FileRecordWriter.java seems to be the one related to "when import has failed due to failure to rename package file".
I realize I didn't reference my commit yesterday, to automatically link it to this issue. But I did check some code that should have fixed the delete issue above. Will mark it on the checklist.
And thanks @pameyer for the checklist - it helps a lot.
Chrome is 👍

Another run-through:
TODO:
Good:
Ugly:
localhost:8088 has localhost in link target). Only link I've noticed with this behavior; very low probability of impacting users in production; does not need to be fixed in this issue.
From #3561:
Re: two message blocks upon submitting a dataset for review, the green-success msg should only display after the user first clicks the submit btn, then the yellow-warning msg should only display when the user returns to the dataset, as an indicator the dataset is still in review.
@pameyer @michbarsinai - I have added the 4th lock reason, specifically for a dcm upload in progress. I decided it would be cleaner/easier to work with this way.
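As a rough illustration of why a dedicated lock reason can be cleaner, the page can pick its message from the reason alone instead of inferring it from other state. This is a Python sketch, not the actual Java code. Only the existence of a fourth, DCM-upload reason comes from this thread; the other three reason names, the `LOCK_MESSAGES` table, the fallback text, and the "in review" wording are placeholders I made up.

```python
from enum import Enum
from typing import Optional


class LockReason(Enum):
    # Only the DCM-upload reason is confirmed by this thread; the other names are placeholders.
    INGEST = "ingest"
    WORKFLOW = "workflow"
    IN_REVIEW = "in_review"
    DCM_UPLOAD = "dcm_upload"


# Hypothetical bundle-style messages keyed by lock reason.
LOCK_MESSAGES = {
    LockReason.IN_REVIEW: "This dataset is in review.",
    LockReason.DCM_UPLOAD: "dcm upload in progress",
}


def lock_message(reason: Optional[LockReason]) -> Optional[str]:
    # No lock means no message block on the page.
    if reason is None:
        return None
    # Reasons without a dedicated message fall back to a generic one.
    return LOCK_MESSAGES.get(reason, "This dataset is locked.")
```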
@mheppler - yes, that's why the "two messages" issue was entered here. (we decided it was easier to address it after the merge). I'm working on it.
Did a quick test pre QA:
Regarding @kcondon's last notes:
Fixed issue 2 that @kcondon reported last night (closing the popup via the "x" box), by adding an ajax event="close" listener.
The branch should be ready for merging into the rsync download branch.
Two restAssured errors uncovered after merge of #3561 into develop:
I'll take a look at the failing REST Assured tests.
In ce0e97c I re-fixed #3443 about excluding email addresses from metadata export (they were being exposed), marked a method as deprecated to hopefully prevent regressions like this in the future, and added a "Privacy Considerations" section to the Installation Guide. I gave @matthew-a-dunlap a brain dump about the fix.
In 840c7a2 @matthew-a-dunlap and I fixed a bug in native file add that appears in develop as of 9d319bd where if you upload a .dat file you get a 500 error rather than a proper JSON response. The 500 was resulting from a null user attempting to create a lock:
Caused by: java.lang.IllegalArgumentException: Cannot lock a dataset for a null user
    at edu.harvard.iq.dataverse.DatasetLock.<init>(DatasetLock.java:117)
    at edu.harvard.iq.dataverse.DatasetLock.<init>(DatasetLock.java:104)
    at edu.harvard.iq.dataverse.DatasetServiceBean.addDatasetLock(DatasetServiceBean.java:540)
I also tested adding the same file via the GUI but was unable to reproduce the bug. I can only assume this means that the GUI and the API are taking separate execution paths, which is unfortunate. Anyway, should be fixed.
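For anyone following along, the failure mode can be sketched like this. It is a Python illustration under assumptions, not the code from 840c7a2: the `DatasetLock` class below only mimics the constructor check seen in the stack trace, and `add_file`, the 400 status code, and the JSON field names are hypothetical. The idea is to check for the user before creating the lock so the caller gets a JSON error instead of an unhandled exception.

```python
import json
from typing import Optional, Tuple


class DatasetLock:
    """Hypothetical stand-in that mirrors the constructor check from the stack trace."""

    def __init__(self, user: Optional[str], reason: str):
        if user is None:
            raise ValueError("Cannot lock a dataset for a null user")
        self.user = user
        self.reason = reason


def add_file(user: Optional[str], filename: str) -> Tuple[int, str]:
    # Guard before creating the lock, so a missing user becomes a JSON error, not a 500.
    if user is None:
        body = {"status": "ERROR", "message": "No user available to lock the dataset."}
        return 400, json.dumps(body)
    lock = DatasetLock(user, "ingest")
    body = {"status": "OK", "data": {"file": filename, "lockedBy": lock.user}}
    return 200, json.dumps(body)
```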