Dataverse: File Download w/ Rsync UI/Workflow changes

Created on 12 Jul 2017 · 23 Comments · Source: IQSS/dataverse

Download complement to #3348.

Backend dependency on #3561

SBGrid

All 23 comments

UI IMPACT

As defined in the following mockup document.

Dataverse

  • [x] Render logic to hide Metrics block
  • [x] Render logic to hide Guestbook link in Edit dropdown

Dataset (view/no files)

  • [x] Render logic to hide Metrics block
  • [x] Render logic to hide Guestbook panel in Terms tab

Dataset (view/has file/success-upload)

  • [x] Render logic to hide Download count
  • [x] Render logic to hide Download button
  • [x] Help text with link to User Guide (/user/dataset-management.html#file-handling-and-uploading)
  • [x] Display (in place of Download button) "Local Access" path, "Download Access" rsync commands for various locations (with location name country in parentheses), "Verify Data" command
  • [x] Dynamic path for Local Access
  • [x] Dynamic paths for Download Access
  • [x] Dynamic command for Verify Data
  • [x] Tooltips for "Local Access", "Download Access", "Verify Data" labels
  • [x] Style code snippets as "black text, grey background" (already delivered in #3348)
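The three values shown in place of the Download button (Local Access path, one rsync command per mirror labelled "name (country)", and the Verify Data command) can be assembled roughly as below. This is a minimal sketch, not the Dataverse implementation: the `Mirror` type, the storage-root layout, the `rsync://host/<dataset dir>` URL shape and the `files.sha` manifest name are all hypothetical assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Mirror:
    """A download location; name and country appear in the displayed label."""
    name: str
    country: str
    host: str  # hypothetical rsync host, e.g. "rsync.example.org"


def access_snippets(dataset_dir: str, local_root: str, mirrors: list[Mirror]) -> dict:
    """Build Local Access, Download Access and Verify Data snippets.

    dataset_dir is the package directory relative to the storage root,
    e.g. "10.5072/FK2/BQIIEF" (assumed layout).
    """
    local_path = f"{local_root.rstrip('/')}/{dataset_dir}"
    downloads = {
        # No trailing slash on the source, so rsync creates the package directory locally.
        f"{m.name} ({m.country})": f"rsync -av rsync://{m.host}/{dataset_dir} ."
        for m in mirrors
    }
    package = dataset_dir.rsplit("/", 1)[-1]
    # "files.sha" is an assumed manifest name; shasum -c checks each listed digest.
    verify = f"cd {package} && shasum -c files.sha"
    return {"local": local_path, "download": downloads, "verify": verify}
```

Keeping the snippets as plain strings makes them easy to render in the grey code-style boxes and to reuse unchanged on the file page.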

Edit Dataset Terms

  • [x] Render logic to hide Guestbook panel

File

  • [x] Render logic to hide Metrics block
  • [x] Render logic to hide Download count
  • [x] Render logic to hide Download button
  • [x] New "Data Access" tab, displayed first, with help msg and labels/values to display "Local Access" path, "Download Access" rsync commands for various locations (with location name country in parentheses), "Verify Data" command
  • [x] Dynamic path for Local Access
  • [x] Dynamic paths for Download Access
  • [x] Dynamic command for Verify Data
  • [x] Help text with link to User Guide (/user/dataset-management.html#file-handling-and-uploading)
  • [x] Tooltips for "Local Access", "Download Access", "Verify Data" labels
  • [x] Style code snippets as "black text, grey background" (already delivered in #3348)
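The file-page checklist boils down to a few flags keyed on whether the file is a package. A minimal sketch, assuming the package MIME type mentioned later in this thread (`application/vnd.dataverse.file-package`) is the only signal; `file_page_layout` and the `other_tabs` parameter are hypothetical names, and the real page also depends on the download-methods setting.

```python
from __future__ import annotations

PACKAGE_MIME = "application/vnd.dataverse.file-package"


def file_page_layout(content_type: str, other_tabs: list[str]) -> dict:
    """Decide what the file page renders for a given content type (sketch)."""
    is_package = content_type == PACKAGE_MIME
    return {
        "show_metrics": not is_package,
        "show_download_count": not is_package,
        "show_download_button": not is_package,
        # The new "Data Access" tab is displayed first, for package files only.
        "tabs": (["Data Access"] if is_package else []) + list(other_tabs),
    }
```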

User Guide

  • [x] Rsync Download > File Handling + Uploading > Dataset + File Management

From another round of testing:

  • hide the "guestbook" block under the Terms tab, since "package files" and guestbooks are not intended to work together.

Wrote some new messaging for the rsync download pages, would love to get @pameyer's verification that everything is accurate. Here are some earlier mockups, followed by the revised instructional text.

Data Access tab at the bottom of the file page:

[Mockup image: dataaccess]
There will be gray help text at the top, and tooltips on each of the three blue headers.

Gray instructions at top: "This data can be accessed through your computer's Terminal using the commands below. For more information, see our User Guide." ("User Guide" links to a guide page with more info)

Local Access tooltip: "If this data is locally available, this is its file path."

Download Access tooltip: "Download this data by running this command from your preferred mirror."

Verify Data tooltip: "This command runs a checksum to verify the integrity of the data you've downloaded."
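What the Verify Data command does can be illustrated in Python: recompute each file's digest and compare it with a manifest of `<hex digest>  <path>` lines, in the spirit of `shasum -c` (whose default algorithm is SHA-1). This is a sketch under assumptions: the manifest format and the choice of SHA-1 are not confirmed by this thread, and paths are resolved relative to the manifest's directory rather than the current directory.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha1_of(path: Path) -> str:
    """SHA-1 of a file, read in 1 MiB chunks to keep memory use flat."""
    h = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(manifest: Path) -> dict[str, bool]:
    """Check each '<hex digest>  <relative path>' line; True means the file is intact."""
    results: dict[str, bool] = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # shasum marks binary-mode entries with a leading '*'
        try:
            results[name] = sha1_of(manifest.parent / name) == expected.lower()
        except FileNotFoundError:
            results[name] = False  # a missing file counts as a failed check
    return results
```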


File table at the bottom of the dataset page:

[Mockup image: filetable]
This page will use the exact same help text as the other page. As discussed with @mheppler and @TaniaSchlatter, the gray general instructions should go on the right side of the file table, above "Local Access". The three headers will have the same tooltips as the headers on the file page.

Text styling

All red-on-pink text should be changed to highlighted black text, as in this earlier mockup:
[Mockup image: font2]

@dlmurphy looks good to me

One update - these UI changes should be conditional on the downloadMethods setting that was previously added (for this purpose).
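Making the UI conditional on the download-methods setting could look like the check below. A minimal sketch, assuming the setting is stored as a comma-separated string and that `rsal/rsync` is the value selecting rsync downloads; both are assumptions here, and any other method names are hypothetical.

```python
from __future__ import annotations

from typing import Optional

RSYNC_METHOD = "rsal/rsync"  # assumed value of the download-methods setting
PACKAGE_MIME = "application/vnd.dataverse.file-package"


def rsync_download_enabled(download_methods: Optional[str]) -> bool:
    """True if the (possibly unset) setting lists the rsync method."""
    if not download_methods:
        return False
    methods = {m.strip().lower() for m in download_methods.split(",")}
    return RSYNC_METHOD in methods


def show_rsync_access_ui(download_methods: Optional[str], content_type: str) -> bool:
    """Show the Data Access UI only when rsync is configured AND the file is a package."""
    return rsync_download_enabled(download_methods) and content_type == PACKAGE_MIME
```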

It doesn't say so explicitly in the checklist, but there is no "Download" button on the page for a package file, correct? (Just to confirm.) That is, the "Data Access" text on the dataset page, in the mockup above, replaces the current Download button, correct?

If so, the checklist really should say "render logic to hide download count AND download button"...

It sounds so far like this issue is mostly about changing the rendering of the dataset and other pages for "package" files.
So then, in order to work on it, you just need a file with the "package" mime type. One way of creating such a file - aside from actually running a successful DCM upload - would be to pick a dataset with a single file; find the file in the DataFile Postgres table; and change the ContentType field to application/vnd.dataverse.file-package.

(and, it sounds like, you don't need this Datafile to contain an actual physical package directory on the filesystem; at least not for the purposes of working on the page...)
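The content-type change can be tried against a throwaway copy of the table first. This sketch uses an in-memory SQLite stand-in; the table name `datafile` and column names `id`/`contenttype` are assumptions (Postgres folds the unquoted `ContentType` to lower case). On the real Postgres database the equivalent statement would be along the lines of `UPDATE datafile SET contenttype = 'application/vnd.dataverse.file-package' WHERE id = <id>;`.

```python
import sqlite3

PACKAGE_MIME = "application/vnd.dataverse.file-package"


def mark_as_package(conn: sqlite3.Connection, datafile_id: int) -> int:
    """Set one DataFile's content type to the package MIME type; returns rows changed."""
    cur = conn.execute(
        "UPDATE datafile SET contenttype = ? WHERE id = ?",  # parameterized, no string splicing
        (PACKAGE_MIME, datafile_id),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # In-memory stand-in for the DataFile table (assumed minimal schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE datafile (id INTEGER PRIMARY KEY, contenttype TEXT)")
    conn.execute("INSERT INTO datafile VALUES (1, 'text/plain')")
    print(mark_as_package(conn, 1))
```

Checking the returned row count guards against silently updating nothing when the id is wrong.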

@landreev - you are correct; there shouldn't be a "Download" button

Note from sprint planning meeting:

  • We closed out #3249. Does this need to be handled in this issue? If so, @pameyer can you please update the checklist before we get too many items done?

@djbrooke "Data Access" checklist items and mock-ups cover #3249; so I don't think there's anything to update from it into this one.

I have a few questions:

  1. In the checklist for the dataverse and dataset pages, the items consist of hiding the metrics block. Will the metrics block be hidden if rsync is enabled?

  2. I saw the renders @dlmurphy made. For the download links provided for various providers, would it make sense to have a dropdown with the provider names, so that when a particular provider is chosen, we show that specific link? This is more of a suggestion than a question. My argument is that if (and this is a questionable if) the list of providers increases in the future, the Data Access block or the Files tab would become very large accommodating all the providers.

@rbhatta99

  1. right; the information going into the metrics block comes from the dataverse application, and downloads not going through the application (aka - all rsync downloads) aren't counted.

  2. potentially; probably makes sense to keep this in mind when thinking about generalizing dataset locality
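The dropdown suggestion amounts to keeping mirrors in a lookup keyed by the displayed label and rendering only the command for the selected one. A hypothetical sketch (function names, labels and hosts are illustrative only):

```python
from __future__ import annotations


def dropdown_labels(mirrors: dict[str, str]) -> list[str]:
    """Labels for the dropdown, e.g. 'Name (Country)', in a stable sorted order."""
    return sorted(mirrors)


def rsync_command_for(mirrors: dict[str, str], selected: str, dataset_dir: str) -> str:
    """Return the rsync command for the single mirror the user picked."""
    if selected not in mirrors:
        raise ValueError(f"unknown mirror: {selected!r}")
    return f"rsync -av rsync://{mirrors[selected]}/{dataset_dir} ."
```

The page size then stays constant as mirrors are added; the trade-off is one extra click compared with listing every command.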

Merged branch '3348-RSync-FileUpload-Workflow' into 3998-rsync-download 3b7051292843318fbd0d4462b4042f9566f31e65

Just added commit 8ddef8ef4a934b83f72b17015968877e7388707e (but forgot to link it to this issue). Documentation should be all set for this issue now!

(filtered) notes from another run-through.

  • [ ] "submit for review" button is visually active when a dataset is locked for DCM upload
  • [x] possible escaping typo in email notifications for DCM success: Dataset <a href="https://localhost/dataset.xhtml?persistentId=doi:10.5072/FK2/BQIIEF" title="$title"&>$title</a> has been successfully uploaded and verified.
  • [ ] "upload files" button visually active (but correctly non-functional) when a dataset is locked for review
  • [ ] "dataset has been published" email goes out before a workflow has finished; move notification to “post-publish” from “pre-publish”?
  • [x] tooltip for download access needs rewording (suggests that the command is executed on the mirror vs downloading from it)
  • [ ] "reason for return" lost when a curator sends a dataset back to the depositor via API (~did not check UI,~ but @kcondon also found this).
  • [x] data access tab info might make users think that they can download/access unpublished data.
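If the DCM success notification is rendered as HTML, the stray `&` in the reported string goes away by building the link with proper attribute quoting and escaping. A minimal sketch using Python's `html.escape`; the function name is hypothetical, and the thread later notes that the email strings were reworded instead of enabling HTML.

```python
from html import escape


def dcm_success_message(dataset_url: str, title: str) -> str:
    """Success notification with an escaped, well-formed dataset link (sketch)."""
    safe_title = escape(title)  # escapes &, <, > and both quote characters
    return (
        f'Dataset <a href="{escape(dataset_url)}" title="{safe_title}">{safe_title}</a> '
        "has been successfully uploaded and verified."
    )
```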

Which users get sent notifications for events appears inconsistent; this needs further investigation to determine if these are bugs, user error, or intended behavior.

  • Dataset creator is notified about dataset creation through UI (not API)
  • DCM success notifies dataset creator, other users with the same permission level (dsContributor) on the same dataverse, curator (2x); same behavior for DCM failure.
  • "submit for review" notification goes to curator only
  • "return from review" notification to depositor (2x), and other users with same permission level on the same dataverse.
  • "dataset has been published" email goes to depositor (2x), other users with same permission level on the same dataverse; also sends a "you have been assigned a role" notification to the depositor.

Good stuff:

  • unanticipated case (major version of dataset published; metadata updated to add PDB ID, publication DOI, etc. that were not available at original publication time; submitted for review; minor version published) works as expected (assuming the RSAL can handle it, which won't be a problem).
  • run upload script -> generate intentional corruption -> DCM failure -> depositor re-runs script -> DCM success pathway keeps working
  • API usage still works

As part of the above commit, I slightly reworded the email strings. We can support HTML in our emails, but "just turning it on" breaks the line-break (return) formatting in our older emails, so it seems better to fix this later.

Authoritative List of Open Issues - IN ORDER OF CURRENT PRIORITY - Please add to and update this list as needed.

  • [x] ~create dataset API reports a "Bad api key" error for valid API keys.~

  • [x] occasionally (intermittently, 2 out of 12 times) a file fails to upload with a null pointer error in the logs. No rollback happens due to a rollback bug, and the dataset remains locked.
    [2017-09-21T15:50:05.600-0400] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.engine.command.impl.ImportFromFileSystemCommand] [tid: _ThreadID=50 _ThreadName=jk-connector(2)] [timeMillis: 1506023405600] [levelValue: 1000] [[ Directory/usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2/AZ3OJP does not exist.]]

Likely cause: NFS congestion. Handled by having the DCM retry the success API call (currently 1x), with a suggestion (on the DCM side) that a final failure should trigger an ops/curation alert.
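The retry-then-alert behaviour described above can be sketched as follows. This is an illustration under assumptions, not the DCM's actual code: `call` stands in for the success API request and returns True on success, `alert` stands in for whatever ops/curation alert mechanism is used, and the default of one retry mirrors the "currently 1x" above.

```python
import time
from typing import Callable


def notify_success_with_retry(
    call: Callable[[], bool],
    alert: Callable[[str], None],
    retries: int = 1,
    delay_s: float = 0.0,
) -> bool:
    """Try the success call up to retries + 1 times; alert once if every attempt fails."""
    for attempt in range(retries + 1):
        if call():
            return True
        if attempt < retries:
            time.sleep(delay_s)  # give congested storage a moment before retrying
    alert(f"DCM success call failed after {retries + 1} attempts")
    return False
```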

Done or deferred

  • ~possible escaping typo in email notifications for DCM success: Dataset <a href="https://localhost/dataset.xhtml?persistentId=doi:10.5072/FK2/BQIIEF" title="$title"&>$title</a> has been successfully uploaded and verified.~ (@matthew-a-dunlap )
  • [x] tooltip for download access needs rewording (suggests that the command is executed on the mirror vs downloading from it)
  • [x] data access tab info might make users think that they can download/access unpublished data.
  • ~clicking on the dataset link in the "success message" ignores port in url scheme for the link (aka - dev environment on localhost:8088 has localhost in link target). Only link I've noticed with this behavior; very low probability of impacting users in production; does not need to be fixed in this issue.~
  • ~"reason for return" lost when a curator sends a dataset back to the depositor via API~ (not lost; functionality for sending it never existed)
  • ~"dataset has been published" sends a "you have been assigned a role" notification to the dataset creator (should be sent on dataset creation/ when role is actually assigned.)~ #4141
  • ~"submit for review" ideally should go to dataverse contact in addition to curator (feature request)~ #4143
  • ~Dataset creator is not notified about dataset creation if dataset created using API~ #4142
  • ~There was a significant delay in file upload. If this is an intentional delay for testing, make sure it is removed.~ Confirmed with Pete that the delay was intentional
  • ~"dataset has been published" email goes out before a workflow has finished; move notification to “post-publish” from “pre-publish”?~ (@matthew-a-dunlap)
  • ~"submit for review" button is visually active when a dataset is locked for DCM upload~ (@sekmiller)
  • ~"upload files" button visually active (but correctly non-functional) when a dataset is locked for review~ (worked on by @rbhatta99)
  • ~Versions tab on file landing page never loads~ (worked on by @rbhatta99)
  • ~File replace should not be available once a file has been uploaded~ (@sekmiller)
  • ~deleting a package file from a draft dataset leaves an empty directory structure behind.~ (worked on by @rbhatta99)
  • ~Message block text format has a lot of trailing hyphens. Only the label is being passed.~ @mheppler
  • ~Upload button needs to stay active, showing instructions and download script access when ds locked before submitted for review.~ (@rbhatta99)
  • ~Add a "Welcome" message to the top of the script~ (@dlmurphy)
  • ~Make phrasing in the instructions – the order and word use – match the script (related to https://github.com/sbgrid/data-capture-module/issues/24)~ (@dlmurphy)
  • ~Page auto refresh blinks the CSS (the "CSS-seen-as-text-for-a-brief-moment" condition is Chrome only; but the files table gets improperly resized after an auto-refresh in all/most browsers)~
  • ~When rsync is configured files can still be uploaded / replaced via API.(@sekmiller - working on this in conjunction with delete messaging above. Completed for api and SWORD)~
  • ~Add conditional render logic to download instructions for unpublished draft _"Data files can not be accessed until the dataset draft has been published. For more information about downloading and verifying data, see our User Guide."_~ @pdurbin
  • ~loading a dataset page triggers a server log exception "Problem getting rsync script: null". Proximal cause appears to be DCM not returning JSON when the HTTP status isn't 200 (which is fixable on the DCM side - sbgrid/data-capture-module#22); root cause probably DV hitting sr.py unnecessarily.~ (@rbhatta)
  • ~success message is "the selected files have been updated"; could see this being confusing to end users. (on delete )~ (@landreev)
  • ~updated/streamlined publish email logic (see https://github.com/IQSS/dataverse/issues/4141, it is pulled into 3998 branch though)~ (@matthew-a-dunlap)
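The conditional draft message in the list above, which also addresses the concern that the Data Access info might suggest unpublished data is downloadable, reduces to choosing the help text and whether to render the commands from the release state. A minimal sketch; `data_access_help` and `is_released` are hypothetical names, and the two strings are the ones quoted in this thread.

```python
from __future__ import annotations

DRAFT_HELP = (
    "Data files can not be accessed until the dataset draft has been published. "
    "For more information about downloading and verifying data, see our User Guide."
)
RELEASED_HELP = (
    "This data can be accessed through your computer's Terminal using the "
    "commands below. For more information, see our User Guide."
)


def data_access_help(is_released: bool) -> tuple[str, bool]:
    """Return (help text, whether to render the access commands)."""
    if is_released:
        return RELEASED_HELP, True
    return DRAFT_HELP, False  # drafts: explain why, and show no commands
```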

For future reference, these are symptoms of multiple roles being assigned to users unnecessarily:

  • ~DCM success and failure notify all users with dsContributor on a dataset not only dataset creator~ (user error)
  • ~DCM success and failure notify curator 2x~ (not user facing)
  • ~"return from review" notification to depositor (2x)~ (user error)
  • ~"return from review" notification sent to all users with dsContributor level on the dataset~ (user error)
  • ~"dataset has been published" email goes to depositor (2x)~ (user error. will still occur when a user has multiple roles intentionally)
  • ~"dataset has been published" email sent to all users with dsContributor level on the same dataset~ (user error)
  • ~Returning dataset to author sends two notifications and emails to author.~ (user error)
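Since the doubled notifications come from one user holding several roles, collecting recipients as a de-duplicated set would send each person a single message even when the multiple roles are intentional. A hypothetical sketch; the `(user, role)` pairs and role names are illustrative and not Dataverse's data model.

```python
from __future__ import annotations


def notification_recipients(assignments: list[tuple[str, str]], notified_roles: set[str]) -> list[str]:
    """Each user with a notified role appears once, in first-seen order."""
    # dict.fromkeys keeps insertion order while dropping repeated users.
    return list(dict.fromkeys(user for user, role in assignments if role in notified_roles))
```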

From discussions with @scolapasta and @kcondon, it's likely that some of the user notification items can be resolved by changing which roles are given to which users. I'll investigate and update these items (and create issues as necessary).

I may have been able to fix that annoying "CSS blink after the page auto-refreshes" issue. (the "blink" itself was a Chrome-only issue; but the auto-refresh was also causing the files table to be resized incorrectly, and that was affecting all (or most) browsers. I've updated the checklist entry).

It was all caused by the "update=@all" in the autorefresh block. It's just that for whatever reason I couldn't get it to properly refresh all the affected page parts without resorting to "all", back when I was working on it for tabular ingest. (so, this was not anything introduced in these rsync branches; we've had this issue forever).

But, as I said, I think I got it to work, by updating just the affected components (the files table; the citation; the edit/publish/etc. buttons; and the message panel). But please re-test carefully. And see if I missed anything.

[Screenshot: 2017-09-21, 11:57:20 AM]

Wait, my commit with the message fix didn't seem to go through. Trying again.

@kcondon I've added a similar success message to file delete on the single file page as well.
