Openrefine: Add new text facet type

Created on 12 Apr 2019  路  7Comments  路  Source: OpenRefine/OpenRefine

Is your feature request related to a problem or area of OpenRefine? Please describe.
Add a new facet of type Text (as opposed to current "List" facet which is usually used for text values). This is in order to have a facet that facets on textual values while not attempting to automagically transform non-text value cells into text. The new Text facet should list all cell values that are text, and create buckets (with counts) for non-text values

Describe the solution you'd like

  • [ ] Add support for a new facet type of "Text"
  • [ ] Change option to "Create custom text facet" in menu to use this new facet (rather than the current use of List facet)
  • [ ] Convert pre-defined facets to use new Text facet type where appropriate

Describe alternatives you've considered
See discussion in #1957 and #1662 and other linked PRs and issues

Additional context
See discussion in #1957 and #1662 and other linked PRs and issues

enhancement facets

Most helpful comment

In terms of actual progress I've written some extended tests for the current List Facet behaviour to make sure this continues to work in the current way

All 7 comments

Just to note progress. I didn't manage to get as much time as I hoped in the last week to look at this, but I did get some time to think through the problem. My plan currently is to:

  • Create a new class for the current ListFacet behaviour which extends the ListFacet class
  • Create a new class for a new text fact which extends the ListFacet class
  • Shared code between these will continue to live in the ListFacet class. Where there are differences the new classes will override methods to implement the appropriate behaviour

I may do some further refactoring of the ListFacet class once I've done this initial work, but it is this initial work that needs to be completed to get us to a point where we have the current List Facet behaviour and the new Text Facet behaviour supported with a reasonably minimum amount of code duplication.

In terms of actual progress I've written some extended tests for the current List Facet behaviour to make sure this continues to work in the current way

@ostephens what is your timeline for this? Is your current work pushed on some branch? If so maybe I can help?

@wetneb I've got time to look at this over this weekend - If I don't get the work completed, I'll share a branch with what I've done so we can share the work

Feel free to share your branch if you need a hand on this :)

Hi @wetneb - sorry I've been so slow on this.

You can see the progress (or lack of) I've made so far at https://github.com/ostephens/OpenRefine/tree/list-facet-refactor

I think the following commits are good & sensible steps:
https://github.com/ostephens/OpenRefine/commit/02501db87f21e2094cfa62f903b0bc02c0fe3e09
https://github.com/ostephens/OpenRefine/commit/4ce6769fc5a52fcb6263742fe2c9b384bf040e68

But apart from that I seem to end up going in circles. My original idea was to have an abstract class for a 'list facet' which would support grouping by any of the supported data types then be implemented by different types of list facet including the existing 'list facet' and a new 'text list facet' which would exhibit the new desired behaviour. But each time I look at this I start to think the whole way facets are structured feels not quite right, and end up fluctuating between just doing something v simple but duplicating loads of code, and revisiting the whole structure of the facet code (at which point I have to lie down in a dark room for half an hour).

So - sorry I've not really made progress with this and holding back 3.2 :(

btw code as committed to this branch fails as follows:

[ERROR] Failures:
[ERROR]   TextListFacetTests.testBooleanSelection:340 NullPointer
[ERROR]   TextListFacetTests.testDateSelection:214 NullPointer
[ERROR]   TextListFacetTests.testIntegerSelection:277 NullPointer
[INFO]
[ERROR] Tests run: 597, Failures: 3, Errors: 0, Skipped: 0
Was this page helpful?
0 / 5 - 0 ratings