Dataverse: Allow users to upload a file to use as a dataset thumbnail

Created on 6 Jan 2017  ·  31 Comments  ·  Source: IQSS/dataverse

Currently, users can select one of the files in a dataset to serve as a thumbnail for a dataset. This is selected by editing the metadata for a file.

screen shot 2017-01-05 at 11 28 18 pm

Instead of selecting a file that's part of the dataset, allow a user to upload a file to serve as the thumbnail.

To do list

To anyone who picks up items on the to do list below, please ensure that all the tests in SearchIT.java continue to pass. The code is in the 3559-dataset-thumbnail branch.

  • [x] Switch from canIssue UpdateDatasetThumbnailCommand to Permission.ViewUnpublishedDataset or remove the canUpdateThumbnail check entirely. Removed in 63e40d0.
  • [x] Double check performance on search cards (see #3631). An extra lookup per dataset was added to the SearchServiceBean. Removed the extra lookup in bd280ad.
  • [x] Disallow restricted files as thumbnails. Enforce in getThumbnail and getThumbnailCandidates, but not in EditFilesPage because we are trying not to touch that old workflow. We can't write API tests around restricted files because it's a GUI-only feature (#2497). Fixed in fb12e02.
  • [x] Consolidate the dataset.isReleased logic from Datasets to DatasetUtil. Fixed in ceb629b.
  • [x] Add back in the concept of selecting a thumbnail automatically at runtime, if available, when a dataset logo hasn't been uploaded and when thumbnailfile_id is null. Fixed in 36e359f, but note that the method will not select a restricted file.
  • [x] There are two DataFileConverter.java classes. Rename either src/main/java/edu/harvard/iq/dataverse/dataaccess/DataFileConverter.java (existing) or src/main/java/edu/harvard/iq/dataverse/DataFileConverter.java (new in this branch) because this is a code smell: http://stackoverflow.com/questions/6647510/bad-practice-to-have-two-classes-of-the-same-name-in-different-packages/6647540#6647540 . Renamed in 36e359f.
  • ~When thumbnailfile_id is null, run the automatic thumbnail selection logic and persist the value of the file id to thumbnailfile_id. Progress in a66fce2.~ We decided not to do this after all.
  • [x] File size limit validation to UI
  • [x] Move sizeLimit="500000" from UI to a method on SystemConfig and write a failing test in SearchIT to exercise it.
  • [x] Add a way to test with an automated test what image shows on a search card. Progress in f590ede. Good enough as of a56829b.
  • [x] Make sure staging file is never left in the dataset directory after navigating away from the page or whatever. No extra cruft left behind. Probably put the staging file in a temp directory rather than the dataset directory. Fixed in 743c5b8.
  • [x] Filenames are missing from "Select Available File" as of a56829b, switched to a number for now. Fixed in dec5546.
  • [x] update the file upload tool tip to take a dynamic maximum size
  • [x] Hard code 500 KB as a size limit in a new method in SystemConfig. Plan to re-use the method for the Dataverse logo upload. Done as of 27c5e3b.
  • [x] Revert code having to do with thumbnail sizes. Done in c687022.
  • [x] Select Available File popup doesn't reflect that a file in that table is already selected (see screenshot above). Fixed in 7553fdb.
  • [x] Updates to User Guide > Dataset + File Management. Consider documenting "rules" above.
  • [x] Review accepted file types (.jpg vs. jpeg), make them consistent with logo at Dataverse level
  • [x] Capture save logic in DatasetWidgetsPage in command similar to UpdateDataverseThemeCommand or in a service bean to be re-used by the API. Done in db1c03e.
  • [x] UpdateDatasetThumbnailCommand remove staging path and use only input stream. Fixed in 771a4b8.
  • [x] Make sure EditDatafilesPage uses new UpdateDatasetThumbnailCommand. Done in 2f8f9cb.
  • [x] When you upload a file that's too big you see text squished together like "Invalid file sizecoffeeshop.png 531.2 KB". It should say "Invalid file size coffeeshop.png 531.2 KB" with a space between "size" and the filename.
  • [x] Fix NullPointerException from DataFile.getLabel when switching from one DataFile to another. Fixed in 87b728c. Not sure why this started happening.
  • [x] Rather than generating "data:image/png;base64," + imageDataBase64 on the fly, generate thumbnails based on imageDataBase64 and save them to disk. Use dataset_logo.thumb48 as the filename. Fixed in e7a429c.
  • [x] Create an API endpoint to return the thumbnail (we used to have this but it broke and #3616 was opened to address it some day). Started in 56c33cf. Will be secured as part of an item below. It's only for datasets, not dataverses or files.
  • [x] For datasets but not dataverses or files, switch from <h:graphicImage value=" to <h:graphicImage url=" on the dataset page. Fixed in ebfe4da.
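
The size-limit items above (moving sizeLimit="500000" out of the UI and fixing the squished "Invalid file size" text) could be sketched roughly like this. This is only an illustration: the class and method names are hypothetical and are not the actual SystemConfig code, and treating 1 KB as 1000 bytes is an assumption.

```java
import java.util.Locale;

final class ThumbnailSizeLimitSketch {

    // Assumption: mirrors sizeLimit="500000" from the UI; the real SystemConfig method may differ.
    static long getThumbnailSizeLimitBytes() {
        return 500_000L;
    }

    // True when the upload fits under the configured limit.
    static boolean isWithinLimit(long fileSizeBytes) {
        return fileSizeBytes >= 0 && fileSizeBytes <= getThumbnailSizeLimitBytes();
    }

    // Builds the error text with an explicit space between "size" and the filename.
    // Assumption: 1 KB is treated as 1000 bytes for display.
    static String invalidSizeMessage(String fileName, long fileSizeBytes) {
        double kilobytes = fileSizeBytes / 1000.0;
        return String.format(Locale.ROOT, "Invalid file size %s %.1f KB", fileName, kilobytes);
    }

    public static void main(String[] args) {
        long size = 531_200L;
        if (!isWithinLimit(size)) {
            System.out.println(invalidSizeMessage("coffeeshop.png", size));
        }
    }
}
```

Keeping the limit in one method means the upload tooltip, the UI validation, and any API check can all read the same value.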

QUESTIONS

  • Add functionality to allow SVG file types #2843 Out of scope for this issue.

Out of scope:

  • #3714 MyData thumbnails do not appear for dataverses or datasets
  • #3616 image_url from Search API results no longer yields a downloadable image
  • #2843 Dataverse Theme + Widgets - Support SVG upload for Logo Image
  • #3671 Dataverse Theme + Widgets - Broken TIF, TIFF upload for Logo Image
  • Out of scope for now but logging for later: the "old" way of changing a thumbnail (selecting thumbnails and "Metadata") should not be considered a feature, and should be removed.
  • Bring back the use of <span class="icon-dataset text-info" on search-include-fragment and datasets.xhtml. Figure out how to implement a render based on url="/api/datasets/#{result.entityId}/thumbnail" returning null in h:graphicImage. Try http://stackoverflow.com/questions/25570159/how-to-render-pgraphicimage-conditionally-depending-on-if-the-resource-exists/25570339#25570339 .
  • Ideally, rather than serving static images from the datasets API endpoint, we'd serve these static images from a CDN (#2647). Currently the reason we serve them from Glassfish is so that expensive and complicated permission logic can be enforced. We could avoid this by having a simple rule that only non-restricted files can be used as thumbnails.
  • Consider going ahead and saving dataset_logo_thumbnail.png.thumb64 and dataset_logo_thumbnail.png.thumb400 versions for future use. @TaniaSchlatter has some thoughts on placement of the thumbnail on the dataset page (make consistent with search cards) but didn't mention wanting a larger thumbnail.
  • SUGGESTION: Should we continue to use the new useGenericThumbnail boolean or not?
  • SUGGESTION: Is DatasetThumbnail object necessary?
  • Fix inconsistent bug where the "Edit" button on the dataset pg is gone unless you go back and forth between the dataverse and the dataset pg a few times. This is probably due to the dev1 server not having the Weld patch applied...
  • ~_(Usability testing recommendations TBD)_~ (see https://github.com/IQSS/dataverse/issues/3559#issuecomment-283681234 )
  • ~Remove existing thumbnailfile_id field from dataset table?~ We decided against this.
  • ~SUGGESTION: In order to simplify the getThumbnail logic, always copy the chosen thumbnail to a consistent location, such as dataset_thumb48.png. For example, if DataFile 10.5072/FK2/MHWMM8/15ab8bd180d-a3cde4ffa854.thumb48 is selected, copy it to 10.5072/FK2/MHWMM8/dataset_thumb48.png. If a dataset logo 10.5072/FK2/MHWMM8/dataset_logo.thumb48 is selected, copy it to 10.5072/FK2/MHWMM8/dataset_thumb48.png.~ We decided against this.
SBGrid File Upload & Handling Metadata Feature

All 31 comments

Is there background info for this request?

@TaniaSchlatter yes! This came out of a meeting yesterday with @pameyer @djbrooke @scolapasta and myself. Notes are at https://docs.google.com/document/d/1Cp0myGJKAMWQLIT8wHE8JLitUOsySUM0S7yQ_rx4Cgw/edit?usp=sharing and this specific item appears as "dataset logo" at https://docs.google.com/a/hkl.hms.harvard.edu/document/d/1idgIT_BOOGDuhR5j9RSOi-3NdrPhRS60N2O953LvzVM/edit?usp=sharing

Disciplines that will be supported as part of the SBGrid grant require that the collections of files in a dataset remain intact and untouched by any process. If a structural biologist wants to add a cool thumbnail to show off the nature of her dataset, she's currently not able to do so because adding that file for thumbnail purposes would change the nature of the dataset.

We can quickly create mockups to get this ready for development by looking at the Theme + Widgets page for a dataverse, and applying that to this new workflow for a dataset.

screen shot 2017-01-11 at 11 19 24 am

I just mentioned to @pameyer that I would expect the image_url field in the Search API to continue to be populated for datasets that have the new alternative thumbnail (once it's developed). Here's how it looks:

"image_url":"https://demo.dataverse.org/api/access/dsCardImage/2"

That's from http://guides.dataverse.org/en/4.6/api/search.html#advanced-search-example

Some tasks we identified in sprint planning today:

  • Determine the entry point for thumbnail updating
  • Determine where the file lives
  • Change the logic for uploading vs. file selection for thumbnails - we want these to both be available
  • Validate the file types (we don't want .exes)
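
As a rough idea of what "validate the file types" might look like, here is a minimal sketch of an extension allowlist. The allowlist contents and all names are assumptions for illustration; the thread does not state the final accepted types, and a real check would probably also inspect the file content rather than trusting the name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class ThumbnailFileTypeSketch {

    // Assumption: an illustrative allowlist, not the list Dataverse actually uses.
    private static final Set<String> ALLOWED_EXTENSIONS =
            new HashSet<>(Arrays.asList("jpg", "jpeg", "png", "tif", "tiff"));

    // Returns true only when the name ends in an allowlisted extension (case-insensitive).
    static boolean hasAllowedExtension(String fileName) {
        if (fileName == null) {
            return false;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all, or a trailing dot
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return ALLOWED_EXTENSIONS.contains(extension);
    }

    public static void main(String[] args) {
        System.out.println("coffeeshop.png allowed: " + hasAllowedExtension("coffeeshop.png"));
        System.out.println("installer.exe allowed: " + hasAllowedExtension("installer.exe"));
    }
}
```

An allowlist rejects anything unexpected by default, which matches the "we don't want .exes" intent better than a blocklist would.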

Discussed this feature quickly with @pdurbin and @pameyer to review a mocked up workflow (see attached) that is based off how we upload logos for a dataverse. Here are some decisions we've come to about it.

  • Entry point will be through the dataset's Edit button, where the link for Widgets will now read "Thumbnail + Widgets"
  • The Thumbnail Image upload component will live in a tabbed pg, which is currently just Dataset Widgets, just like the dataverse has a Theme + Widgets pg
  • The validation and sizing and all of that will be based on the existing dataverse logo and data file preview thumbnail logic, and we'll discuss any pitfalls as they arise in development
  • The current default functionality for generating dataset thumbnails through data files uploaded to the dataset will not change. Data owners will still be able to go to the Edit Files pg and set the thumbnail, even if they have manually uploaded a new thumbnail image
  • When a default thumbnail is being used by a dataset, it will be displayed in the new Thumbnail Image upload component (with no Remove button), and the user can manually upload a new image to overwrite it

Question to be answered:

  • Can this Thumbnail Image component also be added to the add dataset workflow, like we currently do with the data file upload component?

screen shot 2017-02-01 at 12 15 12 pm

@mheppler I added some wireframes in 9c9025a but I could use some help cleaning up the UI as well as thinking about all the implications on the bundle properties and the names of xhtml files. Here's a screenshot of what I have so far. The branch is "3559-dataset-thumbnail".

screen shot 2017-02-02 at 1 23 58 pm

First some terminology:

  • dataset file: A dataset file is part of the dataset and a thumbnail for both the file and its dataset may be created from the dataset file automatically if the file is an image, for example. This is existing functionality.
  • dataset logo: A dataset logo is not one of the files in the dataset. The thumbnail for your dataset can be created based on the dataset logo you upload rather than a dataset file. This issue is about adding the ability to upload a dataset logo.

After standup yesterday, @scolapasta @landreev @mheppler @kcondon and I met and decided the following:

  • Instead of the altthumbnail field on the dataset table idea I've been demoing, we'll store multiple resolutions of the dataset logo on the filesystem under, for example, /usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2/ZQJXKL with filenames like this:

    • dataset_logo_thumbnail.jpg

    • dataset_logo_thumbnail.jpg.thumb48

    • dataset_logo_thumbnail.jpg.thumb64

    • dataset_logo_thumbnail.jpg.thumb400
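
The naming scheme above could be expressed along these lines. This is a sketch only: the class, constants, and method are hypothetical, and the sizes 48, 64, and 400 are simply read off the filenames listed above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class DatasetLogoPathsSketch {

    // Names taken from the filenames in the list above.
    static final String LOGO_FILE_NAME = "dataset_logo_thumbnail.jpg";
    static final int[] THUMBNAIL_SIZES = {48, 64, 400};

    // Returns the original logo path followed by one path per thumbnail size.
    static List<Path> logoPaths(Path datasetDirectory) {
        List<Path> paths = new ArrayList<>();
        paths.add(datasetDirectory.resolve(LOGO_FILE_NAME));
        for (int size : THUMBNAIL_SIZES) {
            paths.add(datasetDirectory.resolve(LOGO_FILE_NAME + ".thumb" + size));
        }
        return paths;
    }

    public static void main(String[] args) {
        // Relative directory used for illustration; a real install would use its files directory.
        Path datasetDirectory = Paths.get("files", "10.5072", "FK2", "ZQJXKL");
        for (Path path : logoPaths(datasetDirectory)) {
            System.out.println(path);
        }
    }
}
```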

In addition, Mike and I have been talking about wanting something along these lines:

  • On the new "Dataset Thumbnail + Widgets" page:

    • Indicate if a thumbnail has already been set from a dataset file. Make sure the user understands that by uploading a dataset logo the existing thumbnail from a dataset file, if any, will no longer be used. Working well as of d4bbcff.

  • On the "Edit Files" page:

    • If a dataset logo is in place and the user clicks "Set Thumbnail" on a dataset file, a confirmation should be shown that the dataset logo will be deleted. Fixed in 08c660e.

I've already uploaded a screenshot of an early "Dataset Thumbnail + Widgets" prototype at https://github.com/IQSS/dataverse/issues/3559#issuecomment-277040204

Here's how the existing "Set Dataset Thumbnail" looks on the "Edit Files" page:

screen shot 2017-02-06 at 2 56 18 pm

Mockup for new Select Dataset Thumbnail popup.

screen shot 2017-02-10 at 10 16 21 am

This morning I demo'ed the code as of 6911217 to @mheppler @pameyer @scolapasta @dlmurphy @landreev @jggautier and @sekmiller and here's the UI/UX stuff I'm going to work on next.

  • Thumbnail + Widgets page

    • "Save Changes" should show success message and redirect to the dataset page. Fixed in 05e81da.

    • Bug: We see "Default Icon" when the search cards show a randomly selected image in the case when thumbnailfile_id is null on the dataset table. Fixed in b49388b

    • Bug: After selecting a new thumbnail from the radio button, the thumbnail shown is not refreshed. You have to refresh the page. Fixed in 13bbf4e

    • Need success messages for various actions. Fixed in 05e81da (the only success message is when you click "Save Changes").

    • Changes need to be staged. Fixed in 0e84f81

  • Edit Files page

    • Bug: Selecting a file didn't work the first time. The wrong thumbnail was selected. Cannot reproduce as of 86fd48f. Seems possibly related to #2677 and #684.

    • ~Changes need to be staged.~ Decision by @mheppler and @pdurbin on 2016-02-17 to leave the code alone and not stage changes on this page, mostly to prevent additional draft versions from being created (see #3633).

Once we get the UI/UX the way we want, @scolapasta @landreev and I plan to talk more about back-end implications, including upgrade scenarios.

In b49388b I exposed the existing concept of an "Automatically Selected Thumbnail". Here's a screenshot:

dataset_thumbnail_ _widgets_-_chemistry_project_-_2017-02-14_18 41 15

@landreev and I agreed that a useGenericThumbnail might make sense to add. It's a way to tell Dataverse, "I don't want any of the automatic thumbnails from files in my dataset. Just give me a generic dataset icon."

@mheppler and I just talked through some rules we want:

  • If you click the "Remove" button, it should always switch you to the default dataset icon.
  • If you upload 5 image files, Dataverse will automatically select one as your dataset thumbnail. We aren't changing this behavior but if you make any decisions about thumbnails by explicitly selecting the thumbnail from one of the dataset files or by uploading a dataset logo image (a non-dataset file), you have forever left behind the automatic selection feature.
  • Under "Don't make me think", we plan to remove the text that shows "Automatically Selected Thumbnail" because the user probably doesn't need to know about automatic selection. If she just created a dataset and uploaded 5 files, obviously one of them was chosen by the system as the thumbnail for the dataset.

I haven't implemented any of this stuff yet.

I just deployed d4bbcff to the dev1 server and gave @dlmurphy a walk through of the changes:

  • fix "double remove" bug
  • clicking Remove should always switch to default icon
  • don't show "Automatically Selected Thumbnail"
  • no return to autoselect once you make a choice
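
To make the rules above a bit more concrete, here is one way the decision could be sketched. It is not the code on the branch: the enum, the method, and its parameters are hypothetical, and it assumes that "Remove" is recorded by setting the useGenericThumbnail flag and that automatic selection only ever considers unrestricted files.

```java
final class ThumbnailChoiceSketch {

    enum Source { DATASET_LOGO, SELECTED_DATASET_FILE, AUTOMATIC_DATASET_FILE, DEFAULT_ICON }

    // Decides which image a dataset shows, in the order the rules above suggest.
    static Source choose(boolean useGenericThumbnail,
                         boolean logoUploaded,
                         Long thumbnailFileId,
                         boolean unrestrictedImageAvailable) {
        if (useGenericThumbnail) {
            return Source.DEFAULT_ICON;            // "Remove" always means the default icon
        }
        if (logoUploaded) {
            return Source.DATASET_LOGO;            // an uploaded logo replaces any file-based thumbnail
        }
        if (thumbnailFileId != null) {
            return Source.SELECTED_DATASET_FILE;   // explicit choice from the dataset's files
        }
        if (unrestrictedImageAvailable) {
            return Source.AUTOMATIC_DATASET_FILE;  // no decision made yet, so pick automatically
        }
        return Source.DEFAULT_ICON;
    }

    public static void main(String[] args) {
        System.out.println(choose(false, false, null, true));
        System.out.println(choose(true, false, null, true));
    }
}
```

Checking the generic flag first is what gives "no return to autoselect": once it is set, the presence of image files no longer matters.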

Next I plan to work on the concept of staging so that the Cancel button allows you to actually back out of changes. As of this writing the "Save Changes" button tells you you've updated the thumbnail but in truth it has already happened because the changes are not currently staged. I also added a boatload of API tests and all functionality is available via the API as well. I'm not sure if staging makes sense for the API or not.

> Next I plan to work on the concept of staging so that the Cancel button allows you to actually back out of changes. As of this writing the "Save Changes" button tells you you've updated the thumbnail but in truth it has already happened because the changes are not currently staged.

This should be fixed as of 0e84f81. I still need to circle back to the "Edit Files" page mentioned above.
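
For anyone following along, the staging idea amounts to keeping the pending choice separate from the saved one until "Save Changes" is clicked. A minimal sketch, assuming a simple in-memory holder (the class and its methods are hypothetical and say nothing about how 0e84f81 actually does it):

```java
final class StagedThumbnailSketch {

    private String savedThumbnail;   // what the dataset currently uses
    private String stagedThumbnail;  // pending choice shown on the page, not yet persisted

    StagedThumbnailSketch(String savedThumbnail) {
        this.savedThumbnail = savedThumbnail;
        this.stagedThumbnail = savedThumbnail;
    }

    // Called when the user picks or uploads something; nothing is persisted yet.
    void stage(String candidate) {
        stagedThumbnail = candidate;
    }

    // "Save Changes": only now does the staged value become the saved one.
    void save() {
        savedThumbnail = stagedThumbnail;
    }

    // "Cancel": throw away the staged value and fall back to what was saved.
    void cancel() {
        stagedThumbnail = savedThumbnail;
    }

    String getSavedThumbnail() {
        return savedThumbnail;
    }

    String getDisplayedThumbnail() {
        return stagedThumbnail;
    }

    public static void main(String[] args) {
        StagedThumbnailSketch page = new StagedThumbnailSketch("default-icon");
        page.stage("coffeeshop.png");
        page.cancel();
        System.out.println("after cancel: " + page.getSavedThumbnail());
    }
}
```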

There's also more backend refactoring I'd like to do. For example, there are unsecured API endpoints that I'd like to have respect permissionsWrapper.canIssueCommand(dataset, UpdateDatasetCommand.class) like the dataset thumbnails page. Or I could hide these API endpoints behind the "admin" API if we don't think people will ever want to use them.

We just noticed a couple things from usability testing:

  • The help text says only 500 KB is allowed but @dlmurphy was just able to upload a ~5 MB file
  • The user was unable to upload a TIFF file. I wonder if it had an extension other than "tff".

Update 2017-03-10: This list has been moved to the description above to avoid having to scroll around to find it. -- @pdurbin

Do we need to also add "changed dataset thumbnail" to the Dataset Version Differences summary and details?

@sekmiller not unless it is something that is saved in the metadata and versioned, which it is not at this time.

The biggest advantage of SVG (for this type of use) is resolution independence; my take is that this is less of an issue targeting desktop browsers than mobile browsers.

Usability testing recommendations from UX team:

  • [x] Add 48x48 thumbnail to dataset pg, consistent layout compared to dataset result card on dataverse pg. Added in 13aece1 but could use clean up.
  • ~Change the name of the “Edit” button to “Edit Dataset", this should also be applied to "Edit Dataverse" as well as "Edit File"~

Update: Just met with Mike and Julian. We've decided to table the recommendation on changing the "Edit Dataset" button (and other Edit buttons) for now due to some potential downsides to the change. We'll go over it again once Tania's in the office next week.

I just made pull request #3703 and put this issue in Code Review at https://waffle.io/IQSS/dataverse

Post-pull request To Do list:

  • [x] Bring back the use of <span class="icon-dataset text-info"
  • [x] After the <span class="icon-dataset text-info" logic is back, get REST Assured tests in SearchIT working again. Done in b587491
  • [x] Make the default icon look right on the Dataset Page. Done in 88f0e8b.
  • [x] Write more unit tests?
  • [x] Reduce logging. Done in d7b6667.
  • [ ] Should we be using the map for performance in SearchIncludeFragment?

@scolapasta @sekmiller @landreev and I discussed the code in pull request #3703 on Friday.

I believe the only thing left is that @landreev said he'd like to get on the branch and look at how, for datasets, I removed the use of dvobjectThumbnailsMap and dvobjectViewMap.

I removed @sekmiller and myself from this issue. If there are any more changes required from either of us, please let us know. With or without the proposed change above, we are ready to advance this issue to QA.

Please note that the main entry point for QA should be the new "Thumbnails + Widgets" section of the Dataset + File Management page in the User Guide: https://github.com/IQSS/dataverse/blob/82b662f50614c86ecf04914ef3342911643b4d37/doc/sphinx-guides/source/user/dataset-management.rst#thumbnails--widgets

Updated todo list from @scolapasta after @sekmiller and I met with him just now:

  • [ ] Leonid: Should we be using the map for performance in SearchIncludeFragment?
  • [x] Steve: Why is there a new null check in DataFile? Steve will revert and try to reproduce the bug. Done in 2b81fee. Bug was not reproducible.
  • [x] Phil: EditDatafilesPage.java Get rid of logger.info("unexpected... should never get here"). Done in 7aec7c9.
  • [x] Phil: Rename "logo" to "thumbnail" in Datasets.java. Done in 87cd847.
  • [x] Phil: Dataverses, comment out unsupported API for setting dataverse logo. Done in a1eec7d.
  • [x] Phil: FileUtil rename to DATA_URI_SCHEME. Done in efa691a.
  • [x] Steve: see if we can consolidate any methods - noted the calling structure in comment
  • [x] Steve: remove extra import from UpdateDatasetCommand. Done in aa134db.
  • [x] Phil: add ui:remove in header comment. Done in 4f0f601.
  • [x] Steve: wait for Leonid and investigate isDisplayImage boolean - removed isDisplayimage and valueSet booleans
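
For context on the DATA_URI_SCHEME rename, the "data:image/png;base64," + imageDataBase64 pattern mentioned in the description looks roughly like this. It is a sketch, not the FileUtil code: the class and method names are hypothetical, and only the constant's name and value come from this thread.

```java
import java.util.Base64;

final class DataUriSketch {

    // Prefix quoted in the issue description; the constant name follows the rename above.
    static final String DATA_URI_SCHEME = "data:image/png;base64,";

    // Encodes raw PNG bytes as a data URI that can be used directly as an image source.
    static String toDataUri(byte[] pngBytes) {
        return DATA_URI_SCHEME + Base64.getEncoder().encodeToString(pngBytes);
    }

    public static void main(String[] args) {
        // Placeholder bytes for illustration only, not a real PNG.
        byte[] placeholder = {1, 2, 3};
        System.out.println(toDataUri(placeholder));
    }
}
```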

Discussed:

  • Smarter way to test instead of Admin.java
  • For restricted files, don't hide the "Set Thumbnail" button on Edit Data Files page. Inconsistent with the rules on the new page but too much work.

@sekmiller
I'm done with the review (submitted it in the PR)

@kcondon heads up that I just opened #3671 with @sekmiller looking over my shoulder as I took screenshots and added it to the description of this issue as out of scope.

I added some more logging in c7aba71 to DatasetUtil and now I'm seeing this error on dataverse-internal: UnirestException caught attempting to GET https://dataverse-internal.iq.harvard.edu/api/datasets/336/thumbnail and exception was: com.mashape.unirest.http.exceptions.UnirestException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

When the error above is thrown, thumbnails do not appear anywhere that calls DatasetUtil.getThumbnailImageString (search cards and the dataset page).

After discussing with @scolapasta in 9ddd0a4 I went ahead and ripped out the slightly-odd concept of having backing beans call the new Dataverse API. No more Unirest, so no more UnirestException. I doubt @sekmiller will have a problem with this since he and I already discussed how we already have the dataset in our hands on the dataset page. Likewise, @landreev and I talked about how there's already a lookup of a dataset happening in the new thumbnail API, so in that last commit we are moving that lookup to the SearchServiceBean. @scolapasta points out that there's the potential for a performance hit in adding the lookup to the SearchServiceBean in the sense that around 4.2 we introduced a multi-phase/multi-pass approach to showing thumbnails and now the dataset thumbnails will have the expense of looking up the dataset in phase 1 rather than phase 2. I suspect the performance will be good enough but we can revisit if necessary.

@kcondon I already ran a build on dataverse-internal. Thanks for finding that bug. Dataset thumbnails no longer care about "siteUrl" as of this fix I just made.

Found a couple issues:

  • The default thumbnail from available files appears on the dataset page next to the title when the dataset is a draft, but disappears (reverts to the default icon) when the dataset is published, even though the dataset search card and the thumbnail/widgets page still show it as the thumbnail. I think this may be because the default thumbnail is not identified as the selected thumbnail when you edit file metadata. Fixed in 57331c4.
  • Image files which were too large to generate thumbnails or do not have thumbnails still appear in the select file list but with just their name. Since these are not really meaningful, maybe they should be filtered out? Fixed in 0a2b503.
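
The filtering discussed in the second item, together with the earlier rule that restricted files can't be thumbnails, might look something like the sketch below. The FileInfo type and the candidates method are hypothetical stand-ins, not the real DataFile or getThumbnailCandidates code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ThumbnailCandidateFilterSketch {

    // Hypothetical stand-in for the few facts the filter needs about a dataset file.
    static final class FileInfo {
        final String label;
        final boolean restricted;
        final boolean hasThumbnail;

        FileInfo(String label, boolean restricted, boolean hasThumbnail) {
            this.label = label;
            this.restricted = restricted;
            this.hasThumbnail = hasThumbnail;
        }
    }

    // Keeps only files that are unrestricted and actually have a generated thumbnail.
    static List<FileInfo> candidates(List<FileInfo> files) {
        List<FileInfo> result = new ArrayList<>();
        for (FileInfo file : files) {
            if (!file.restricted && file.hasThumbnail) {
                result.add(file);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<FileInfo> files = Arrays.asList(
                new FileInfo("coffeeshop.png", false, true),
                new FileInfo("secret.png", true, true),
                new FileInfo("huge.tiff", false, false));
        for (FileInfo file : candidates(files)) {
            System.out.println(file.label);
        }
    }
}
```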