The title says it all! As part of the 4.5 release, it will be extremely helpful to include/exclude Harvested content when searching and filtering.
@pdurbin - can you please take this on while @landreev moves the rest of Harvesting to completion? Let me know if you need additional info. Thanks!
@landreev will provide details on what is needed and when so I'll pass this to him.
It's my understanding that for 4.5 we should use the existing isHarvested method to search and facet on and that I should make my changes to the 4.5-export-harvest branch. Extra methods ("from another dataverse") may be added in the future.
@landreev @scolapasta as we discussed I added a "source" facet in 527149b that shows "Harvested" vs. "Local". @scolapasta wanted it to only show when there is more than one type to show so that's what I implemented. Here's a mini screenshot:

Heads up to @sekmiller and @mheppler (who are also working on the 4.5-export-harvest branch) that you'll need to update your Solr schema to the version in the commit above.
Passing to QA.
@eaquigley @mheppler
Any UX, Design input needed?
Leaving in QA but removing my name until I get a pull request.
I think this should be updated to say "Dataset Source" instead of just "Source" (that is, if this is only for datasets), and instead of "Local" it should say "Harvard Dataverse".
Thoughts @mcrosas?
I think it's also files. Eventually it might be for dataverses.
Also, @eaquigley, another question that came up: this logic of not showing a facet if it only has one value (i.e., it won't filter the results at all), do we want to extend that to other facets? (Not necessarily in this release.)
Let's confirm whether or not it's for files, and if so, let's move it to QA as is. If it's just for Datasets let's change it to "Dataset Source" as Liz recommends.
I still think the Local label should be changed to "Harvard Dataverse" whether or not it's for more than datasets. Additionally, "Source" is already the name of a facet in the CHIA metadata schema, so we probably shouldn't be reusing labels for facets. @djbrooke @scolapasta @pdurbin
Regarding not showing the facet, I would still show the facet even if there is only one value. I would recommend having this "Source" facet be one that has to be selected and is not a default one that automatically appears in the facets.
@eaquigley I had no issue with "Harvard Dataverse" if that's what you / @mcrosas want. I was just responding to the Source / Dataset Source part.
OK, in general, we'll keep single-value facets. We don't yet have the ability to make source selectable (it is not one of the dynamic facets). So for this case, I think we should keep the logic of not showing the facet when there is only one type, since there will be installations without harvesting and maybe some with exclusively harvested content. This isn't perfect, as a search result won't show the single-value facet (in a case where the installation has both types, but the result only one), but it is the best we can do for now, I think.
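For anyone following along, the rule described above can be sketched roughly like this. It is only an illustration in Python, not the actual Dataverse code (which is Java); the function name `should_show_source_facet` and the dictionary of counts are hypothetical.

```python
def should_show_source_facet(facet_counts):
    """Return True only if at least two source values have hits.

    facet_counts is a hypothetical mapping of facet label -> hit count
    for the current search result.
    """
    values_with_hits = [label for label, count in facet_counts.items() if count > 0]
    return len(values_with_hits) > 1


# Result set with both local and harvested content: the facet is shown.
print(should_show_source_facet({"Local": 120, "Harvested": 30}))

# Installation without harvesting: the facet is hidden.
print(should_show_source_facet({"Local": 120}))

# The imperfect case: the installation has both types,
# but this particular result set contains only one of them.
print(should_show_source_facet({"Local": 120, "Harvested": 0}))
```

The last call is the limitation mentioned above: the decision is made from the current result set, so the facet is hidden even though the installation as a whole has both types.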
@scolapasta I know, I was bringing it back up though since @djbrooke's comment only mentioned the Dataset Source label so it was not clear if Local was going to be changed to Harvard Dataverse. Still concerned about "Source" being an existing facet for another metadata block...
https://dataverse.harvard.edu/dataverse/cfa uses "Type" to mean various astronomy things such as Cube, Image, and Table, whatever they are (sorry I'm not an astronomer):

Like "Source" I can imagine a future conflict with "Type". They're both such generic words.
Label suggestion: "Metadata Source".

Let's go with Metadata Source. Thanks @mheppler for the suggestion!
Thanks @pdurbin
As discussed yesterday, I just changed how the facet looks to this:

"Source" is now "Metadata Source" and "Local" is now "Root Dataverse" or whatever you renamed your root dataverse to (such as "Harvard Dataverse"). A reminder that this facet is unlike most of the others in that it disappears if you have narrowed your results to only local or only harvested results. See https://github.com/IQSS/dataverse/issues/3203#issuecomment-233445164
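A rough sketch of the labeling described above, in case it helps reviewers. This is illustrative Python, not the real implementation; the function name `metadata_source_facet`, the `is_harvested` key, and the sample records are all made up for the example.

```python
from collections import Counter


def metadata_source_facet(records, root_dataverse_name):
    """Count records per "Metadata Source" label.

    Harvested records get the label "Harvested"; everything else is
    labeled with the current name of the root dataverse.
    """
    labels = (
        "Harvested" if record["is_harvested"] else root_dataverse_name
        for record in records
    )
    return Counter(labels)


# Hypothetical search result with two local records and one harvested record.
records = [
    {"title": "A", "is_harvested": False},
    {"title": "B", "is_harvested": True},
    {"title": "C", "is_harvested": False},
]
facet = metadata_source_facet(records, "Harvard Dataverse")
for label, count in facet.items():
    print(f"{label} ({count})")
```

Because the label comes from the root dataverse name, an installation that renames its root dataverse would expect the facet label to change as well.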
@landreev pointed out that he was relying on "Local" being there so we added a new isHarvested Solr field. From a quick demo we think the code is working right but I'll pass this to him to review a1df98c and possibly use an alternative to addQueryRestrictions that I added.
Heads up to @sekmiller @mheppler @kcondon and anyone else on this branch that this change as well as renaming the Solr field from source to metadataSource will require everyone to update their Solr schema when they pull the latest from the 4.5-export-harvest branch.
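To make the two fields concrete, here is a rough sketch of what a search request against Solr could look like once both exist. It is illustrative Python only, not what the Dataverse code does internally: `build_solr_params` is a hypothetical helper, and treating `isHarvested` as a true/false field is an assumption. The `q`, `fq`, `facet`, and `facet.field` parameters are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_solr_params(query, harvested=None):
    """Build a Solr query string that facets on metadataSource.

    harvested=None  -> no restriction
    harvested=True  -> only harvested content
    harvested=False -> only local content
    """
    params = [
        ("q", query),
        ("facet", "true"),
        ("facet.field", "metadataSource"),
    ]
    if harvested is not None:
        # Assumption: isHarvested is indexed as a boolean-like field.
        value = "true" if harvested else "false"
        params.append(("fq", "isHarvested:" + value))
    return urlencode(params)


# Search local content only, with the Metadata Source facet requested.
print(build_solr_params("climate", harvested=False))

# Search everything, still requesting the facet.
print(build_solr_params("climate"))
```

Keeping the restriction in a filter query (`fq`) on `isHarvested` means it does not depend on the display label in `metadataSource`, which can change when the root dataverse is renamed.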
@pdurbin Basically works but found a weird issue: renaming root does not change the name of the local facet, even after an index clear/ reindex.
Restarting Glassfish might help.
Leonid fixed this, will retest, thanks!
This is working now. Closing.
Reopening. Found that harvested facet is including dataverses in the count on dvn-vm5. This may have happened in the last fix.
The facet says Harvard Dataverse (16,195), but when clicked there are only 14,404 datasets; the other 1,791 are dataverses (14,404 + 1,791 = 16,195).
This appears to be fixed and I'm closing it. Feel free to reopen if I'm missing something.
These are the steps that I took on dataverse.harvard.edu: