Openrefine: New behaviour of list facet breaks predefined facets

Created on 19 Feb 2019  路  7Comments  路  Source: OpenRefine/OpenRefine

The change in #1666 requires users to add .toString() at the end of the expression for their facets.
Some predefined facets have not been adapted for this.

This is a bad breaking change and should be reverted.

If we want to introduce a different behaviour for a facet, here is a better way to do it:

  • introduce in the backend a new facet with the new behaviour. The facet will have a different identifier and the previous facet will still be available.
  • update the UI to migrate some or all of the predefined facets to this new facet identifier with new behaviour.
  • if the old facet is no longer exposed in the UI, and after a long period of depreciation (so that people can adapt their workflows), the old facet may be dropped from the backend.
bug facets High

Most helpful comment

What I am proposing is to re-introduce exactly the change you made, but in a different way. We can have the best of both worlds: a new facet which has the new behaviour you proposed, without the breaking change it has incurred. For that to happen, we only need to introduce your new facet as a separate facet, with a different class name and facet identifier. This has the following advantages:

  • we can keep the existing facet available in the backend, meaning that running existing JSON workflows will not break;
  • migrating the UI to use your new facet will force us to update the facet ids there, migrating to either your new facet (and adding .toString()) or migrating to another new facet (such as a boolean facet). If we forget to update any facet (as we did), it will stay with the old facet implementation and will therefore not break. Suppose some extension adds some UI for new predefined facets: we will not break that either.

That ensures a smooth transition for both users and developers. I am happy to work on reintroducing these facets between 3.2-beta and 3.2.

I think it is fine to have breaking changes when they are clearly needed to bring a much needed improvement - but in this case we can very easily avoid breaking anything, so my gut feeling is that we should make sure OpenRefine workflows live up to their expectations of reproducibility and reusability.

All 7 comments

@wetneb We talked about this in #1666 here: https://github.com/OpenRefine/OpenRefine/pull/1666#issuecomment-421719191

Can it be reverted and still show users their data inconsistency as I mentioned in my comment? I don't think so, but open to discuss. My worry and Owen's was that the previous facet being available does not give users a good view on their messy data. That's very important to us, that we not hide data issues and I mentioned this in my comment.

But I guess your take is that the existing facets may hide data issues, but we'll provide new facets with new behavior. My worry there is that we stay in the same rut with folks missing data issues by using the old facets and not new facets. Anything you can do to make the new facets be named the same as old facets (from a display perspective to users) and in the same menu structure would certainly help there I guess. They get the new facets, but for compatibility reasons we will still have the old facets in some submenu with a different label.

@thadguidry yes I understand the motivation of the changes. I think the new facet is great but it needs to be introduced in a different way to avoid the bugs that this change has created. This has confused a lot of users.

So yes, if someone wants to work on making this new behaviour work for everyone, I would welcome a PR which would follow the steps above. In the meantime, the first step is to revert the change.

@wetneb OK thanks for confirming. Agree to revert and work towards better steps as you identified above.

I've commented against #1969 that I'm not against reverting the current changes - and I'm still happy if you want to go ahead with this. However, re-reading #1662 I'm not completely convinced that reverting the changes is really the right thing to do - #1662 is clear that there was some really funky behaviour in the previous situation (like selecting a value from the facet and not seeing all the rows from that facet).

In #1662 I mention (but don't directly suggest) the idea of introducing a "Boolean" facet - I feel that this would be a much better way of handling all of the predefined facets that use tests with a boolean outcome to create a custom text fact.

I acknowledge absolutely that the current change has created confusion for users (and I've had to re-write training material because of it so I definitely know this has caused problems!) - but the previous approach was also broken, but we'd got used to the way in which it was broken and so either ignored it or worked around it.

I'm worried that avoiding a breaking change here is just embedding an already broken approach. As long as the Custom Text Facet exists and works in the old way we have a problem - we can't change it, but it simply doesn't work correctly for columns of mixed value types. I can't help but wonder if the right thing to do would be to:

  • Acknowledge this is a breaking change, but stick with it
  • Ensure all predefined facets use 'toString()' as necessary
  • Introduce a new "Boolean" facet type and convert predefined facets with Boolean outcomes to that facet type

I'm not convinced leaving the Custom Text Facet as it is makes sense in the long term.

All this said, I don't want to hold changes up, and I'm not in a position to move forward changes at the moment - so if reverting is what needs to happen, and my thoughts above don't sway you, then please go ahead

Reading the last para above it comes across as a slightly passive aggressive attempt to wash my hands of any decision made here - that wasn't my intention, so apologies if it comes across that way. I'll 100% support any decision made because I can see the arguments both ways - unfortunately I just don't have the time right now to write code here :(

What I am proposing is to re-introduce exactly the change you made, but in a different way. We can have the best of both worlds: a new facet which has the new behaviour you proposed, without the breaking change it has incurred. For that to happen, we only need to introduce your new facet as a separate facet, with a different class name and facet identifier. This has the following advantages:

  • we can keep the existing facet available in the backend, meaning that running existing JSON workflows will not break;
  • migrating the UI to use your new facet will force us to update the facet ids there, migrating to either your new facet (and adding .toString()) or migrating to another new facet (such as a boolean facet). If we forget to update any facet (as we did), it will stay with the old facet implementation and will therefore not break. Suppose some extension adds some UI for new predefined facets: we will not break that either.

That ensures a smooth transition for both users and developers. I am happy to work on reintroducing these facets between 3.2-beta and 3.2.

I think it is fine to have breaking changes when they are clearly needed to bring a much needed improvement - but in this case we can very easily avoid breaking anything, so my gut feeling is that we should make sure OpenRefine workflows live up to their expectations of reproducibility and reusability.

New feature requests created for new facet types. I think we can close this one given the List facet has been reverted to the previous behaviour

Was this page helpful?
0 / 5 - 0 ratings