Dataverse: Modular Explore

Created on 1 Mar 2017  Â·  49Comments  Â·  Source: IQSS/dataverse

Right now we have some external tools that we allow users to "explore" with (TwoRavens, GeoConnect). We'd like to make this more modular to make it easier for installations to add new tools.

Ideally, this would involve adding a new table that could know the url of the tool and other needed info (type of file?), but it may also involve having to write a handler for each. For this, we would want to follow the SPI model, and provide a default handler for simple apps.

Code Infrastructure UX & UI Feature

Most helpful comment

HI all,

I think tracking the tool would be a good idea from the start. ADA will certainly be interested in understanding whether users are using DataExplorer or TwoRavens (or some other tool).

Regards,
Steve

All 49 comments

Some more on the modularity and SPI topic:

Also, https://projects.iq.harvard.edu/dcm2017/agenda shows " SPIs and Modularity: Are you interested in learning about Dataverse's emergent modular architecture?"

  • We'll limit this to the file level for now
  • Towards to goal of small chunks, the scope of this issue is that we want an installation to be able to run Data Explorer and TwoRavens in parallel.
  • @scolapasta will edit this to add additional details

Repeating part of comment from #4230:

Besides the more obvious dynamic text and links we need for external tools, a couple of aspects that we need to handle in a modular way are:

When to show the links to the external tool
How to handle the link to the external tool (i.e. wha dynamic info to send)
In both these cases, we can think of building these in incremental ways, with each step building on the previous step, building from simplest, but supporting fewer tools, to most complex, but also most flexible.

For when to show the link, we have 3 steps:
A. simple list of tools, will only support one operation (i.e. Configure)
B. Add "Operation" to list, in order to support multiple operations (i.e. Configure vs. Explore)
C. Add "Subject of Operation" in order to support operations on different kind of objects (i.e. tools that work on tabular files, like DataExplorer, vs tools that work on geospatial, like World Map.

For how to handle link we similarly have 3 steps:

  1. Dictate the exact parameter names and values that an external tool will need.
  2. Define parameters dynamically from a list of "reserved" terms that we define.
  3. Define individual "Handler" Java classes and SPI interface, in order to handle any tool.

In order to support Modular Explore for Two Ravens and DataExplore, the minimum we technically need is A1. However, since we are working on Modular Configure first, that bumps us to B1. In addition, in this scenario we would need to contact the external tools and have them change their parameters and we would rather be more flexible. So the decision is that the definition for done for this issue is B2.

Note, that we would still not be able to support geoConnect / WorldMap in a modular way yet, until we support at least C2, but per the previous comment, that is out of scope for this issue.

Thanks @scolapasta for taking the lead on breaking this up before next week's backlog grooming session.

@michbarsinai I pinged you in Slack because I said we'll need to write a more intelligent parser. Please see 272b4c2 for a terrible kludge and tests with examples of what we need. Thanks.

We discussed this issue for several minutes in sprint planning today. To me, the absolute requirements are:

  • [x] Add a new reserved word for siteUrl which is required by Data Explorer (#4249). Done in 5ae8418
  • [x] The "Explore" button needs to be a drop down that dynamically includes "external tools" that have been added to the installation of Dataverse. Done in 5d18067.

In 5ae8418 I stubbed out some code for Modular Explore (has TODO's).

As I poke at the code behind the "Explore" button for the first time I'm realizing how complex it is. The business logic seems to be:

  • Should we show a single Explore button or a dropdown Explore button with multiple tools (TwoRavens and WorldMap) under it?
  • When you click the tool, does it open immediately (perhaps in a new tab), or is the user first presented with a popup to either agree to terms of use or write in a guestbook (or both?)?

This is assuming the file isn't restricted or that you're logged in and have access, of course.

Here's how a dropdown Explore button with a popup for a guestbook looks, an example from https://dataverse.harvard.edu/file.xhtml?fileId=3090851 where the user has clicked "WorldMap":

screen shot 2018-01-05 at 12 21 10 pm

One of the main reasons we're working on this issue is that we want to support external tools such as Data Explorer (#4249) so let's take a quick look at Data Explorer in action, an example at https://dataverse.scholarsportal.info/file.xhtml?fileId=8988 in the screenshot below. In this case, the user is only one click away from launching Data Explorer in a new tab:

screen shot 2018-01-05 at 12 10 08 pm

"Modular Explore" is being built on top of the "Modular Configure" work we did in #4230. It makes use of the same externaltool database table and all the docs we wrote on the new "External Tools" page at https://github.com/IQSS/dataverse/blob/8b9e125b2a4247162f0d760b535c5c5a08420894/doc/sphinx-guides/source/installation/external-tools.rst apply (thought we'll need to tweak them a bit). I bring this up because under the Modular Configure way of doing things, there is yet another popup where the description of the tool is seen along with a "Continue" button. (Please note that "Explore" tools like Data Explorer will not appear under "Configure" like this once we add a column to the externaltool table to differentiate tools as "explore" tools vs. "configure" tools.) Here's a screenshot:

screen shot 2018-01-05 at 12 04 58 pm

Personally, I think I would be annoyed by the extra click, having to click "Continue" each time. I'm wondering if I should just make Data Explorer behave like it does in production for Scholars Portal today, which is to say that an additional "Continue" popup isn't required when you first go to launch the tool. TwoRavens doesn't require a Continue button.

I sure hope this is making sense. In short, I'm thinking about all the various popups and popup logic. I want to be mindful of not increasing the scope of this issue dramatically, but I'm trying to consider what sort of user experience we want.

For what it's worth, I'm deploying code to https://dev1.dataverse.org but please check with me at http://chat.dataverse.org for the latest info since the code is changing a lot as I hack on it.

@pdurbin For what it's worth, I agree with you about avoiding extra clicks (aka - having to click to get past a popup that tells me what an external tool is).

Thanks for weighing in @pameyer . I see @kevinworthington gave your comment a thumbs up which I take to mean that he also doesn't want extra clicks just to see the description over and over again. I'm still interested in feedback from anyone who's interested, of course.

Meanwhile, I've been working on a way to distinguish between "explore" tools like Data Explorer and "configure" tools like PSI.

As of 233cdc1 (which I just deployed to https://dev1.dataverse.org ), the external tool manifest format (JSON) now supports configuring "explore" vs "configure" tools. Next I'll probably see if I can weave together the existing "Explore" button with the new dynamic list of "explore" tools, including making sure the popup logic is consistent. If I get blocked, I'll make noise.

As of 5d18067 I have dynamic explore tools showing up under the Explore button so I checked off the todo list item above.

In 0e1c1e9 I started working on a new todo:

  • [x] Show popup for terms of use, guestbook, etc with "Accept" button as needed, just like for TwoRavens. Done in ef87fde

I have this code working on a file page (screenshot below) but it doesn't work on the dataset page for some reason. (There's a TODO in that commit about this.) I've been chatting with @sekmiller about this and he's trying to help (thanks!). He's working on #4393 which is related since it has to do with guestbooks. We're want to make sure we resolve any merge conflicts, eventually.

screen shot 2018-01-08 at 2 09 47 pm

I just asked for help in ##jsf on freenode: https://javabot.evanchooly.com/logs/%23%23jsf/2018-01-09

I just went over fixes in f460d5b and ef87fde with @scolapasta and made pull request #4407.

My mental model of this issue has been:

  • Write enough code so that installations can install and use Data Explorer (#4249).

I think this has been delivered in pull request #4407.

@scolapasta said he would do some code review and make a todo list here.

Todos from Code Review:

  • [x] Add service bean method that includes type as a parameter. Done in 04be093.
  • [x] use externalTool instead of externalToolHandler for the xhtml (and corresponding backing beans); consolidate up related code. Done in 04be093.
  • [x] move constants from ExternalTollHandler to ExternalTool. Done in 515c351.
  • [x] move reserved word enum from ExternalToolServiceBean to ExternalTool. Done in 05dd4bf.
  • [x] consolidate getTools logic in DatasetPage. Done in b502c21.
  • [x] for null APItoken, don't pass key. Done in 6f48df7.
  • [x] Add comment to null exploreTool workaround (or resolve issue). Done (comments added) in 332c358.
  • [x] use ExternalTool logic for TwoRavens. Done in 480cb59.

Minor:

  • [x] remove a couple of extraneous comments in FileDownloadHelper and in File button fragment. Done in f9d11d9 and 04be093.
  • [x] remove update from db script. Done in eded8ef.

Discuss in the future (Gustavo, Steve, Phil):

  • potentially unneeded null checks
  • guestbookService modifyDatafileAndFormat method?

I did a bunch of the todo list items above.

@kevinworthington let me and @scolapasta know that his Data Explorer tool (#4249) will make use of an API token, if available, so I'm playing around with adding it to the dataExplorer.json manifest like this:

  {
    "key": "{apiToken}"
  }

But now I'm wondering... what if the user isn't logged in. key=null will appear in the URL. Is that ok? If that's not ok, do we omit that query parameter if the file is not restricted? Something to think about.

Hi @pdurbin,
Yes, it's ok to pass a "key" value of "null" if the user isn't logged-in for public tab files. No need to omit this parameter.

@kevinworthington thanks, that's helpful feedback. My preference is to get something working and then improve on it in a future pull request. I'm ready for more code review so I'm going to move this issue over at https://waffle.io/IQSS/dataverse

@scolapasta and I met yesterday and some additional todo items were added to the list above: https://github.com/IQSS/dataverse/issues/3657#issuecomment-357082662

I did all of the stuff on the list above with the exception of "use ExternalTool logic for TwoRavens" which @scolapasta and I have discussed as a nice to have in the sense that if it turns out to be difficult, we can defer that task until later.

I've been investigating converting TwoRavens into an external tool but this task is complicated by the following:

  • The Dataverse installer handles part of the configuration of TwoRavens and would need to be updated. At standup today it sounds like @landreev and @kcondon were go with removing this functionality.
  • There is a lot of code specific to TwoRavens in TwoRavensHelper.java that needs to be reviewed. @scolapasta and I agree that the whole file would go away as long as it doesn't break TwoRavens.
  • TwoRavens as a external tool would always appear under a dropdown, making several statements in the guides out of date.
  • TwoRavens is mentioned quite often in the guides as being the only option for exploring tabular files.

I pushed some FIXMEs and such at 002019f to give a sense of where changes in the code would need to be made to convert TwoRavens to an external tool. Confusingly, there's TwoRavens the tool and then there's the subset functionality which has TwoRavens branding on it.

Also @scolapasta and I met to discuss the "null exploreTool workaround" todo list item above ( https://github.com/IQSS/dataverse/issues/3657#issuecomment-357082662 ). We plan to leave the code as-is (keeping the fix in ef87fde, basically) but we're puzzled about why "tool" is non-null but become null sometimes in a backing bean method:

        tool: #{tool}
        <p:commandLink type="button" process="@this" styleClass="btn btn-default" rendered="#{guestbookResponse.fileFormat == 'externalTool'}"
                       action="#{fileDownloadHelper.explore(guestbookResponse, fileMetadata, tool)}" target="_blank"
                       update="guestbookUIFragment">
            #{bundle['acceptTerms']}
            <f:param name="DO_GB_VALIDATION" value="true"/>
        </p:commandLink>

Ok, @scolapasta @sekmiller and @djbrooke all seem interested in making TwoRavens into a external tool so I just created pull request #4426 that does this.

As a fallback position, we still have pull request #4407 which is smaller in scope and does not make TwoRavens into an external tool. The checklist above is complete so I'm moving this to code review.

I'm move this back into development based on feedback from @scolapasta . He noticed an out of date comment which I fixed in c91a7bd and a bug which I fixed in 54bd2ff (thanks!).

He had a question about if I removed this TwoRavens-specific bundle key on purpose and I did because it isn't being used in the code: dataset.message.files.ingestSuccess=The file(s) have been successfully ingested. You can now explore them with TwoRavens or download them in alternative formats.

I'm now chewing on the rest of the feedback.

I'm blocked on this item because I don't understand what the task is:

  • "to reduce duplicate code, let's have externalTool version of modifyDatafileAndFormat call the non externalTool version and just the extra bit"

This next comment I need to dig into more. What is being asked for? What is the priority of this change happening before we can move this issue to QA?

  • "I'd move the bandaid logic related to the popup to the FileDownloadHelper. Let's have the Service always assume that you pass it a tool and the helper be for any special logic the popup needs to do. Also add some comments to this if statement (I think the comment on the transient could be left or removed, but would like an explanation around the if, because I think that specifically could confuse us in the future)"

This comment has to do with documentation, mostly for @dlmurphy and @djbrooke

  • "I'd vote for removing TwoRavens even further from docs, e.g. from find-use-data.rst and dataset-management.rst., Maybe we could mention "Explore Tools" generally instead? Let's see what Derek and Danny think."

For this one:
• "to reduce duplicate code, let's have externalTool version of modifyDatafileAndFormat call the non externalTool version and just the extra bit"

you have duplicate code in both versions of the modifyDatafileAndFormat. Instead of that, have the one that takes the external tool do its thing and then call the other method.

For this one:
• "I'd move the bandaid logic related to the popup to the FileDownloadHelper. Let's have the Service always assume that you pass it a tool and the helper be for any special logic the popup needs to do. Also add some comments to this if statement (I think the comment on the transient could be left or removed, but would like an explanation around the if, because I think that specifically could confuse us in the future)"

Move the logic that you added as a bandaid for the weird null / not null issue with the popup from the ServiceBean to the FileDownloadHelper that calls that service bean (since it only happens for the popup). Add a comment on this part of the code explaining the weird null / not null case we see.

Both of these are 5 minute changes.

Discussed TwoRavens documentation with @pdurbin and @djbrooke this morning. I'm going to create a new issue about it, to be considered in our Wednesday Sprint Planning meeting.

In Slack @scolapasta and I worked through ce12f9c and talked about what became 02bd628 but that commit hasn't been blessed yet so I'm moving this issue to code review.

Along the way I also did a little code cleanup at 23844e8 and made doc improvements at f0b67cd. Once @dlmurphy makes the new issue above about docs, we can move link to it here.

The pull request we intend to put through QA is #4426 and I just closed the smaller-scope one.

Once this ticket makes it into QA I'm happy to summarize what to test in a comment or face to face or both.

Looks great. My one remaining comment is to think about what we want to put into the guestbook responses for this. It's currently "external tool", but I think "explore" would be better (we know that these are "explore" type external tools). Possibly consider also adding which tool? (something like "explore-tworavens" and "explore_dataexplorer" based on the name in the db?)

@jggautier Any thoughts? This could also be something we defer until later, but we might lose some info (which tool) now.

@scolapasta actually, we don't put "external tool" into the database. We put the word "Explore" into the database. "Explore" has always been what we put in the database for TwoRavens and in the pull request we continue to do so. Here's the code from FileDownloadHelper.java:

// Rather that putting "Explore" in the database, we *could* put externalTool.getDisplayName() for "Data Explorer" or whatever.
guestbookResponse.setDownloadtype("Explore");

@dlmurphy thanks for opening this: Revise Explore Tools documentation to reflect modularity #4429

Oh, ok! I saw this:
guestbookResponse.fileFormat != 'externalTool' and got confused... but if we're already saying Explore, that's good.

I'd still want to see what @jggautier has to say about tracking the the specific tool and whether he (or others) think it's important to start tracking now or not.

HI all,

I think tracking the tool would be a good idea from the start. ADA will certainly be interested in understanding whether users are using DataExplorer or TwoRavens (or some other tool).

Regards,
Steve

If @stevenmce approves, that's good enough for me. 😄 I implemented the change in c236079

I also added the following query to the Google doc referenced in the "useful database queries" issue at #4169. Here's how it looks on my laptop:

select downloadtype, count(downloadtype) from guestbookresponse group by downloadtype order by count desc;

 downloadtype  | count 
---------------+-------
 Download      |   124
 TwoRavens     |    33
 Data Explorer |     3
 Awesome Tool  |     1
(4 rows)

Glad it was straightforward looks good. I wonder if we should be using some sort of unique identifier*, rather then the display name.

(*) Not just inspired by being at conference about identifiers :)

This sounds great to me. @dlmurphy and I were planning to crash a tech open hours to discuss this. I think these are some related issues to consider:

  • With 4.6 in Dec. 2016, downloadtypes like Explore were consolidated into "Download", affecting the metric reports. How will separating actions like Explore again affect those reports? (timelines of downloadtypes in Harvard Dataverse)
  • Is it always appropriate to record uses of other modular tools as a downloadtype?

I had that thought too but I think we should think of these external tools as bit of an app store. At (the first?) community meeting we talked about having an "Appiverse" some day (see also #2269) and we're actually starting to make progress toward this. In an app store, the name of the app really matters. We would probably yell at someone who tried to promote an external tool called "TwoRavens" that isn't the real TwoRavens. In short, I think it's fine to persist the name of the tool. If we need to revisit this decision, let's do it in a new issue and future sprint. I am eager for this issue to make progress, to make its way to QA. The pull request delivers a lot of value as-is and opens the door for Data Explorer (#4249) and other read-only explore tools to be made available.

If we decide that identifiers of some sort are important, then i think we should do this now, and not have to change data later. That said, I see no reason that this can't start going through QA while we decide.

@jggautier after standup we talked about if there's a chance I messed up download counts by differentiating between "TwoRavens" and "Data Explorer" or whatever. I don't think so. As I hoped it looks like the code is simply counting rows like this: "select count(o.id) from GuestbookResponse o where o.datafile_id = " + dataFileId. That's from a method called getCountGuestbookResponsesByDataFileId in GuestbookResponseServiceBean.

@scolapasta that's fine. As you may have noticed, we dragged this issue into QA this morning during standup. Maybe we could tell third-party developers that they need to ask for a UUID from us for their application to be included in our app store (or a least be listed in the guides). I'm still trying to control the scope of this issue.

@kcondon I hope this helps:

Changes to consider testing:

  • TwoRavens: TwoRavens has been converted into an "external tool". Part of the installation instructions have been adjusted to include a curl command operating on a JSON file. Two database settings have been removed: :TwoRavensTabularView and :TwoRavensUrl. Note that the "subset" functionaly still has TwoRavens branding but is independent of TwoRavens.
  • WorldMap: The logic for when to show the WorldMap button has been adjusted to be aware of the presence of external tools.
  • Configure button: The Modular Configure code (#4230) has been refactored and should be retested. Note that external tools can now have a type "configure" vs "explore" as noted in the guides.
  • Download counts and types: Instead of "Explore" we now record the names of the external tools. The rules around download counts should not have changed. I'm not sure if they are documented or not.
  • Guides: The guides have been adjusted to mention external tools as appropriate but further work will be done for #4429 some day.

To think about:

  • What is the definition of done for integrating Data Explorer into Dataverse (#4249)? Should we go ahead and test Data Explorer now? It seems to work for me. 😄

Known issues:

  • When you launch an external tool from a popup the tool will take over the browser window rather than opening in a new tab. When there is no popup, the tool launches in a new tab, which is the desired behavior. This bug exists in production for TwoRavens and has been discussed with Gustavo, Steve, and Mike but we don't have a fix. The inconsistency is unfortunate.

I just gave a demo to of the code as of c236079 to @landreev @djbrooke @TaniaSchlatter @mheppler @dlmurphy @kcondon @jggautier @matthew-a-dunlap and @pameyer (thanks!). The code is on https://dev1.dataverse.org and the guides are at http://guides.dataverse.org/en/3657-tworavens-as-modular-explore/installation/external-tools.html

Notes to self:

  1. db update script is required to run this.
  2. Significant doc changes with instructions on how to add external tools, including new api endpoints?
  • [x] Revised popup title to "Dataset Terms" for download/explore workflows on dataset/file pgs.

screen shot 2018-01-23 at 4 15 36 pm

@pdurbin to respond to your App store comparison, yes the name is important, but behind the scenes what is used (at least in Android) is the package name. Beyond multiple tools with the same name, another concern is the same tool with a changing name, i.e having two rows that should map to the same thing, but don't (or when we internationalize and remove the name from the db).

I generally prefer to tackle issues which would require future database updates that modify data (as opposed to structure) preemptively. I don't see this as an expansion of scope (rather as a good coding practice); the expansion of scope already occurred when we decided to track the specific tool.

@scolapasta as I mentioned, I'm interested in controlling the scope of this issue. I'm happy to revert c236079 to make it so we go back to persisting "Explore" rather than "Two Ravens" or "Data Explorer" or whatever. That commit was so recent I expect it will revert cleanly and easily.

@djbrooke in the demo you seemed to like that individual tools are being tracked. What would you like?

Good discussion, thanks. I understand the reasoning for IDs instead of names, but let's keep this moving for now. On to QA!

Issues found so far:
-last line in db update script is missing a ;

@kcondon whoops, good catch. I added the missing semicolon in 92e0a43

As we discussed after standup yesterday there are technically no new API endpoints but the "manifest" format (JSON) has changed in that external tool authors must specify if their tool should appear under the "Explore" button or the "Configure" button.

Thanks. So this is not new?
curl -X POST -H 'Content-type: application/json' --upload-file twoRavens.json http://localhost:8080/api/admin/externalTools

Well, twoRavens.json is new (in the future, we'd like the TwoRavens team to host this file and #4429 is related) but the API endpoint is not new. It shipped with 4.8.5 and is documented at http://guides.dataverse.org/en/4.8.5/installation/external-tools.html#making-an-external-tool-available-in-dataverse

@kcondon just found a bug (thanks!) where two "Explore" buttons were being displayed side by side (one a dropdown and one a non-drop down). I fixed this in c6c0c87 and left a detailed comment about the fix in the commit.

x-Seeing two explore buttons on a shape file that has already been mapped.

Known issues:

  • When you launch an external tool from a popup the tool will take over the browser window rather than opening in a new tab. When there is no popup, the tool launches in a new tab, which is the desired behavior. This bug exists in production for TwoRavens and has been discussed with Gustavo, Steve, and Mike but we don't have a fix. The inconsistency is unfortunate.

This is now being tracked at #4742.

Was this page helpful?
0 / 5 - 0 ratings