Dataverse: Facets - "File Type" vs "filetype"

Created on 18 Sep 2018 · 7 Comments · Source: IQSS/dataverse

Somehow a new "Tabular Data" facet for File Type has appeared. I thought it was only in develop, but I am seeing it in production. It's fine if we're changing the style from lowercase with no spaces ("tabulardata"), but we should apply the change to all facets (e.g. "networkdata") and not just one. Also, it appears that both "tabulardata" and "Tabular Data" are being displayed together. We should fix this to be consistent.

[Screenshot: 2018-09-18 at 4:19:51 PM]

All 7 comments

Files labeled as "tabulardata" end at Aug 12, 2018, which we believe coincides with the release of 4.9.2 (Aug 8). Included in that release were the TSV ingest fixes in #4044, which included the commit bf1f90c7fa15ba1f694e56aac9dc4d4b71969c80 that introduced the new "Tabular Data" capitalized format.

  • Capitalized file types displayed in facets from MimeTypeFacets.properties
  • Added comment to FileUtil.java outlining suggested capitalization changes to the getFacetFileType function

Discussed this with @scolapasta and @landreev to determine where facet values were coming from that were not defined in MimeTypeFacets.properties. Gustavo pointed out a comment from Leonid in the getFacetFileType function in FileUtil.java:

           // if there's no defined "facet-friendly" form of this mime type
           // we'll truncate the available type by "/", e.g., all the 
           // unknown image/* types will become "image"; many other, quite
           // different types will all become "application" this way - 
           // but it is probably still better than to tag them all as 
           // "uknown". 
           // -- L.A. 4.0 alpha 1

To make the facet values that this function generates for unknown file types consistent with those defined in MimeTypeFacets.properties, Leonid volunteered to add capitalization to the function. Here is a list of the facets in Harvard Dataverse that have been generated by this function:

  • application
  • audio
  • video
  • chemical
  • binary
  • model
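
For illustration, the fallback described in Leonid's comment, combined with the proposed capitalization, might look roughly like the sketch below. This is not the actual code in FileUtil.java: the class name, the facetFileType and capitalize helpers, the single map entry standing in for MimeTypeFacets.properties, and the "Unknown" value for empty input are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the real FileUtil.getFacetFileType implementation.
final class FacetFileTypeSketch {

    // Assumed stand-in for MimeTypeFacets.properties (one illustrative entry).
    private static final Map<String, String> FACET_FRIENDLY = new HashMap<>();
    static {
        FACET_FRIENDLY.put("text/tab-separated-values", "Tabular Data");
    }

    // Returns the facet label for a MIME type.
    static String facetFileType(String mimeType) {
        if (mimeType == null || mimeType.isEmpty()) {
            return "Unknown"; // assumption: label used when no type is available
        }
        // Prefer the defined "facet-friendly" form when one exists.
        String friendly = FACET_FRIENDLY.get(mimeType);
        if (friendly != null) {
            return friendly;
        }
        // Otherwise truncate at "/" (e.g. image/* becomes "image"),
        // then capitalize so the result matches the defined labels.
        int slash = mimeType.indexOf('/');
        String prefix = slash > 0 ? mimeType.substring(0, slash) : mimeType;
        return capitalize(prefix);
    }

    // Upper-cases the first character and leaves the rest unchanged.
    static String capitalize(String s) {
        if (s.isEmpty()) {
            return s;
        }
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(facetFileType("text/tab-separated-values"));
        System.out.println(facetFileType("image/x-unlisted-format"));
        System.out.println(facetFileType("application/x-unlisted-format"));
    }
}
```

With this approach, the generated values listed above would come out as "Application", "Audio", "Video", and so on, alongside the labels already defined in the properties file.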

Here is what my localhost looks like with the changes so far.

[Screenshot: 2018-09-20 at 12:50:44 PM]

I am passing this issue to Leonid to complete that. Question: will we need to run a script to change the values stored for those facets?

Pulled the recent changes from Leonid, reindexed, and now my facet list is beautifully capitalized and consistent.

[Screenshot: 2018-09-20 at 1:45:57 PM]

I knew that breaking this out of the TSV issue was the right choice!

I just ran ./ec2-create-instance.sh 5067-file-type-facet-caps to deploy the branch to http://ec2-34-227-22-74.compute-1.amazonaws.com:8080/ and it seems to work. Note how it's "Image" in the screenshot below rather than "image":

[Screenshot: 2018-09-20 at 2:59:35 PM]

NOTE: When developing/testing, in order to see changes to these file type facet values, you need to clear Solr and reindex.

curl http://localhost:8080/api/admin/index/clear

curl http://localhost:8080/api/admin/index

http://guides.dataverse.org/en/4.9.2/admin/solr-search-index.html

Do you have any data on whether, and by how much, the permissions changes affect re-indexing speed? Hopefully it will be at least 4x faster (and possibly more).
