Is your feature request related to a problem or area of OpenRefine? Please describe.
Add a new facet of type Text (as opposed to current "List" facet which is usually used for text values). This is in order to have a facet that facets on textual values while not attempting to automagically transform non-text value cells into text. The new Text facet should list all cell values that are text, and create buckets (with counts) for non-text values
Describe the solution you'd like
Describe alternatives you've considered
See discussion in #1957 and #1662 and other linked PRs and issues
Additional context
See discussion in #1957 and #1662 and other linked PRs and issues
Just to note progress. I didn't manage to get as much time as I hoped in the last week to look at this, but I did get some time to think through the problem. My plan currently is to:
I may do some further refactoring of the ListFacet class once I've done this initial work, but it is this initial work that needs to be completed to get us to a point where we have the current List Facet behaviour and the new Text Facet behaviour supported with a reasonably minimum amount of code duplication.
In terms of actual progress I've written some extended tests for the current List Facet behaviour to make sure this continues to work in the current way
@ostephens what is your timeline for this? Is your current work pushed on some branch? If so maybe I can help?
@wetneb I've got time to look at this over this weekend - If I don't get the work completed, I'll share a branch with what I've done so we can share the work
Feel free to share your branch if you need a hand on this :)
Hi @wetneb - sorry I've been so slow on this.
You can see the progress (or lack of) I've made so far at https://github.com/ostephens/OpenRefine/tree/list-facet-refactor
I think the following commits are good & sensible steps:
https://github.com/ostephens/OpenRefine/commit/02501db87f21e2094cfa62f903b0bc02c0fe3e09
https://github.com/ostephens/OpenRefine/commit/4ce6769fc5a52fcb6263742fe2c9b384bf040e68
But apart from that I seem to end up going in circles. My original idea was to have an abstract class for a 'list facet' which would support grouping by any of the supported data types then be implemented by different types of list facet including the existing 'list facet' and a new 'text list facet' which would exhibit the new desired behaviour. But each time I look at this I start to think the whole way facets are structured feels not quite right, and end up fluctuating between just doing something v simple but duplicating loads of code, and revisiting the whole structure of the facet code (at which point I have to lie down in a dark room for half an hour).
So - sorry I've not really made progress with this and holding back 3.2 :(
btw code as committed to this branch fails as follows:
[ERROR] Failures:
[ERROR] TextListFacetTests.testBooleanSelection:340 NullPointer
[ERROR] TextListFacetTests.testDateSelection:214 NullPointer
[ERROR] TextListFacetTests.testIntegerSelection:277 NullPointer
[INFO]
[ERROR] Tests run: 597, Failures: 3, Errors: 0, Skipped: 0
Most helpful comment
In terms of actual progress I've written some extended tests for the current List Facet behaviour to make sure this continues to work in the current way