We are currently using a custom snapshot of OpenCSV which is years old. The OpenCSV project is still active and a lot of releases have been published since then. We should look into upgrading to a newer version, which would also let us get rid of the locally stored .jar.
Version 5.0 does not seem to support multi-character separators, which we currently rely on at least in the smartSplit function.
Multi-character separators were proposed but the patch did not make it upstream: https://sourceforge.net/p/opencsv/patches/44/
We don't seem to be the only ones to require this though: https://stackoverflow.com/questions/8653797/java-csv-parser-with-string-separator-multi-character
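For reference, the core of what a smartSplit-style helper has to do is a quote-aware split on a multi-character separator. A minimal sketch in plain Java (the class and method names here are hypothetical, not our actual implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class MultiCharSplit {
    // Split a line on a multi-character separator, ignoring separators
    // that appear inside double-quoted fields.
    public static List<String> smartSplit(String line, String sep) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '"') {
                // Toggle quoted state; keep the quote in the field.
                inQuotes = !inQuotes;
                current.append(c);
                i++;
            } else if (!inQuotes && line.startsWith(sep, i)) {
                // Unquoted separator: close the current field.
                fields.add(current.toString());
                current.setLength(0);
                i += sep.length();
            } else {
                current.append(c);
                i++;
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

For example, `smartSplit("a||b||\"c||d\"", "||")` yields three fields, with the quoted `||` left untouched.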
The OpenCSV project seems to prefer a strict RFC 4180 stance, which seems reasonable given their mission (i.e. they consider bits outside the "CSV standard" to not really be CSV). Even if you provided a new class/method, it would probably be rejected, but you never know and might want to ask them.
I still think investment in one of these would be better all around:
This might mean we should look at using Jackson CSV instead (I prefer it, since they acknowledge that CSV can be "messy" sometimes and support extensions; we would just need to implement something like CsvParser.Feature.ALLOW_MULTI_SEPARATORS)?
https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv#configuring-csvschema
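To make the gap concrete, here is roughly what the Jackson CSV configuration looks like today (assuming the jackson-dataformat-csv dependency). Note that `withColumnSeparator` takes a single `char`, which is exactly why a multi-character separator would need a new feature like the hypothetical `ALLOW_MULTI_SEPARATORS` above:

```java
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class JacksonCsvConfig {
    public static void main(String[] args) throws Exception {
        CsvMapper mapper = new CsvMapper();
        // The separator is a single char here -- a multi-character
        // separator is not expressible in the current CsvSchema API.
        CsvSchema schema = CsvSchema.emptySchema()
                .withColumnSeparator(';')
                .withQuoteChar('"')
                .withHeader();
        Object rows = mapper.readerFor(java.util.Map.class)
                .with(schema)
                .readValues("a;b\n1;2\n")
                .readAll();
        System.out.println(rows);
    }
}
```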
or Apache Commons CSV (which at one time wanted to unify development with OpenCSV, but didn't get far with them either, if I recall)
OpenCSV does not stick to RFC 4180; they also have a more flexible parser which accommodates non-standard needs.
But yeah, switching to another parser could also be an option. There seems to be quite a lot of them actually! https://github.com/uniVocity/csv-parsers-comparison
I did not say they stick to it?
Well, you wrote that "The OpenCSV project seems to prefer a strict stance of RFC 4180 which seems reasonable given their mission", and I think that is not a very accurate description of OpenCSV, since their default parser is much more flexible and accepts CSVs that do not conform to RFC 4180. They also have an RFC 4180 parser, but it is not the default one.
Thanks for the info, but I also see flexibility in other parsers. So switching, although painful, might be wiser. Up to you.
With the migration to spark, Spark SQL's own CSV parser is a natural choice since it allows efficient partitioning (so, scales well to large datasets).
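A minimal sketch of what that would look like from Java (assuming a Spark environment; the app name, master, and input path are placeholders):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CsvIngest {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("csv-ingest")      // placeholder app name
                .master("local[*]")         // assumption: local run
                .getOrCreate();

        // Spark splits the input into partitions, so parsing scales
        // across cores/executors for large datasets.
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .option("sep", ",")
                .csv("path/to/data.csv");   // placeholder path

        df.show();
        spark.stop();
    }
}
```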
OpenCSV rejected multi-character separators (a second time): https://sourceforge.net/p/opencsv/feature-requests/119/
There still don't seem to be any available Java CSV parsers which support multi-character separator strings.