When creating a project from a CSV/TSV/separator-based file, there is no checkbox to strip leading/trailing whitespace. This is already an option when importing JSON and XML; it would be good to have it for CSV/TSV as well.
The main reason these options exist is backward compatibility: several of the importers used to always strip whitespace, and there was no way to import a file _as is_.
One thing I'm considering that might mitigate the need for this is a way to apply operations to multiple columns at once.
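To make the idea of applying one operation to several columns concrete, here is a minimal Java sketch. Rows are modeled as plain lists of strings, and `MultiColumnOps.applyToColumns` is a hypothetical helper, not part of OpenRefine; trimming is just one operation it could apply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical helper (not OpenRefine's API): applies one cell transformation
// to several columns in a single pass. Rows are plain lists of strings here.
final class MultiColumnOps {
    private MultiColumnOps() {}

    // Returns new rows in which every non-null cell whose column index is in
    // `columns` has been passed through `op`; all other cells are copied unchanged.
    static List<List<String>> applyToColumns(List<List<String>> rows,
                                             Set<Integer> columns,
                                             UnaryOperator<String> op) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> row : rows) {
            List<String> newRow = new ArrayList<>(row.size());
            for (int i = 0; i < row.size(); i++) {
                String cell = row.get(i);
                newRow.add(columns.contains(i) && cell != null ? op.apply(cell) : cell);
            }
            result.add(newRow);
        }
        return result;
    }
}
```

For example, `applyToColumns(rows, Set.of(0, 2), String::strip)` trims the first and third columns and leaves the second untouched.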
Another thing I should mention is that we now support multi-character separators, so if the problem is a program that writes files as `value1 , value2 , ...` or `value1, value2, ...`, you could simply set your separator to " , " or ", " respectively.
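The effect of a multi-character separator can be illustrated with a naive split. This is only an illustration: OpenRefine's CSV/TSV importer uses its own parser (which also handles quoting), and `SeparatorDemo.split` is a hypothetical helper built on `String.split` and `Pattern.quote`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: a naive split that ignores CSV quoting rules.
// It shows why ", " as the separator leaves no padding while "," does.
final class SeparatorDemo {
    private SeparatorDemo() {}

    static List<String> split(String line, String separator) {
        // Pattern.quote makes the separator match literally (so "|" or "." are not
        // treated as regex syntax); limit -1 keeps trailing empty fields.
        return Arrays.asList(line.split(Pattern.quote(separator), -1));
    }
}
```

Splitting `value1, value2, value3` on "," yields `value1`, ` value2`, ` value3` (leading spaces kept), while splitting on ", " yields clean values. The catch is that this only works when the padding is perfectly regular; a file that mixes "," and ", " would still need a trimming option.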
Agreed that the ability to apply operations to multiple columns would mitigate this.
So the task for this would be to add an option to the CSV importer to strip whitespace. It could be ticked by default in the UI, but the backend should assume by default that it is disabled if the option is not provided (for backwards compatibility).
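A minimal sketch of what the option would do to each parsed row, assuming the importer has already split a line into cells. `RowTrimmer.maybeTrim` is a hypothetical name. It uses `String.strip()` (Java 11+), which removes Unicode whitespace, whereas `String.trim()` only removes characters up to U+0020; which one to use is a design choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing the effect of the proposed "strip whitespace" option
// on one row that has already been split into cells.
final class RowTrimmer {
    private RowTrimmer() {}

    // trim == false: the row is returned exactly as parsed ("as is").
    // trim == true: each non-null cell loses leading and trailing whitespace.
    static List<String> maybeTrim(List<String> cells, boolean trim) {
        if (!trim) {
            return cells;
        }
        List<String> result = new ArrayList<>(cells.size());
        for (String cell : cells) {
            result.add(cell == null ? null : cell.strip());
        }
        return result;
    }
}
```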
Hi, can I take up this issue?
So once we have added the checkbox for CSV/TSV files, the importer in the backend should assume the option is disabled, even if the checkbox is checked by default. Can you please tell me where the JSON importer handles that option?
The JSON importer is unrelated to this issue, since it is only used when importing JSON files.
The CSV/TSV importer (called SeparatorBasedImporter in the backend) takes its options as the options parameter of the parseOneFile method. If this options object does not have a key corresponding to the new parameter you introduce, then it should assume that its value is false.
This will ensure that if someone uses an importer configuration created with an earlier version of OpenRefine, this importer configuration is still understood in the same way. Is that clearer?
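The backward-compatible default can be sketched as a small lookup that falls back to `false` when the key is missing. For simplicity the options are modeled here as a `Map<String, ?>`; the actual type of the `options` parameter of `parseOneFile`, any existing helper for reading it, and the key name `trimStrings` used below are all assumptions.

```java
import java.util.Map;

// Sketch of reading a boolean importer option with a backward-compatible default.
// OptionDefaults is a hypothetical helper; the Map-based options are a simplification.
final class OptionDefaults {
    private OptionDefaults() {}

    static boolean getBoolean(Map<String, ?> options, String key, boolean defaultValue) {
        Object value = options.get(key);
        // A missing key (e.g. a configuration saved by an older version) or a value
        // of the wrong type falls back to the default.
        return value instanceof Boolean ? (Boolean) value : defaultValue;
    }
}
```

For example, `OptionDefaults.getBoolean(options, "trimStrings", false)` (hypothetical key) returns `false` for a configuration saved before the option existed, so such configurations keep importing cells unchanged, while the UI can still send `true` explicitly when the checkbox is ticked.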
Sorry, I was too quick on this one. The trimming option for the JSON importer is defined in TreeImportingParserBase, which is shared by the XML and JSON importers.
However, as far as I can see, it does not actually seem to be taken into account, so it would probably be worth creating an issue for it. Also, clicking on the label for this option in the importer configuration panel checks/unchecks the wrong checkbox.
@wetneb Right, I was wondering where it was, because I couldn't get it to work for JSON itself. Thanks.
Should I fix the label problem in the JSON importer configuration panel in a separate pull request?
Perhaps it is cleaner if we do that in separate PRs, yes.