Cats: Rethinking type class boilerplate generation

Created on 4 Dec 2019  路  15Comments  路  Source: typelevel/cats

This issue is a follow-up to #2553 that focuses on the Simulacrum question and gives a concrete proposal for a replacement.

The problem is that Simulacrum 1 is based on Macro Paradise's macro annotations, which means it has a couple of disadvantages:

  • It doesn't work on Dotty, which doesn't have macros that can do this kind of thing (and probably never will, as far as I can tell).
  • It can be difficult to troubleshoot when it doesn't do what you expect (you have to turn on tracing and dig through some pretty verbose output).

I've been experimenting with reworking Simulacrum's code generation as a set of Scalafix rules. This Scalafix-based approach resolves both of the macro-related issues:

  • It works exactly the same on Dotty as on Scala 2.
  • The generated code is right there in front of you.

This second part is also a disadvantage: in my current proof-of-concept you end up with a lot (like 2.2k lines) of boilerplate being added to the Cats source. It "reserves" the last part of the companion object of every type class for rule-managed code, and if contributors accidentally edit this part of these files, their changes will be overwritten.

I've tried a few ways of generating at least some of the syntax method boilerplate as managed source, in order to minimize the additions to checked-in code, but I don't think it's possible to do this in a way that makes sense without breaking binary compatibility. This might be a viable solution for Cats 3, though, and it would be pretty straightforward to change my proof-of-concept to make this configurable (i.e. your build indicates whether the generated code should go behind a banner in src/main/scala or in src_managed).

The alternative seems to be to implement the current Simulacrum functionality either as a Dotty compiler plugin or directly in the compiler.

If we do decide to go the Scalafix route, there's also a question of when. In theory it could happen at any time, since whether the code generation is done by a macro or a Scalafix rule shouldn't make any difference to the published artifacts.

Simulacrum isn't the only thing blocking cross-building on Dotty, though鈥攖he use of kind-projector is also a problem. My proof-of-concept also includes rules for rewriting kind-projector's syntax, but only for the sake of experimenting with Dotty cross-building (the result is an awful mess and not something I'd ever want to merge).

So personally I think we should switch out Simulacrum 1 with a Scalafix-based approach sooner rather than later, but we're still going to have to come up with a Dotty-friendly way to write type lambdas and polymorphic function value definitions before we can start cross-building.

Most helpful comment

My understanding though is that even when -Ykind-projector is implemented, it will only support the * syntax? Cats uses kind-projector's 位[伪 => F[G[伪]]] a lot, so we'd still either need to make Dotty support that syntax or kind-project support Dotty's [伪] =>> F[G[伪]].

I think it's reasonable to extend the scope of Ykind-projector to also support the lambda syntax, it's just a bit more work.

There's also the matter of kind-projector's value-level 位ZipStream ~> Stream syntax.

Eventually dotty should have polymorphic SAM which should help, meanwhile I agree that writing out the new looks fine too.

All 15 comments

The alternative seems to be to implement the current Simulacrum functionality either as a Dotty compiler plugin or directly in the compiler.

If there was no need to cross-compile with Scala 2, and Dotty features could be freely used, would any part of simulacrum still be needed ? If not, maybe the cats typeclasses could be written using Dotty features and somehow Scala 2 code would be generated from that.

My proof-of-concept also includes rules for rewriting kind-projector's syntax, but only for the sake of experimenting with Dotty cross-building (the result is an awful mess and not something I'd ever want to merge).

So dotty has a -Ykind-projector flag for this purpose, but it's not actually implemented yet :) https://github.com/lampepfl/dotty/blob/3a7dfd2c14e1bc146f79faef8fcad42efadcdaeb/compiler/src/dotty/tools/dotc/parsing/Parsers.scala#L1520-L1521 It probably wouldn't be too much work to fix this and support most of the kind projector syntax.

@smarter

would any part of simulacrum still be needed ?

@mpilquist's Spotted Leopards is an experiment in this direction. I'm not personally sold on this encoding of type classes (yet?)鈥擨'd prefer to keep type class operations and syntax methods distinct.

There's also the apply summoner methods, etc., but I guess we could do those as a one-time thing.

So dotty has a -Ykind-projector flag for this purpose, but it's not actually implemented yet

My understanding though is that even when -Ykind-projector is implemented, it will only support the * syntax? Cats uses kind-projector's 位[伪 => F[G[伪]]] a _lot_, so we'd still either need to make Dotty support that syntax or kind-project support Dotty's [伪] =>> F[G[伪]].

There's also the matter of kind-projector's value-level 位[ZipStream ~> Stream](_.value) syntax. Personally I think using this in Cats was a mistake, and that writing out new (Stream ~> ZipStream) is almost always much clearer. I guess I could put together a PR proposing that.

My understanding though is that even when -Ykind-projector is implemented, it will only support the * syntax? Cats uses kind-projector's 位[伪 => F[G[伪]]] a lot, so we'd still either need to make Dotty support that syntax or kind-project support Dotty's [伪] =>> F[G[伪]].

I think it's reasonable to extend the scope of Ykind-projector to also support the lambda syntax, it's just a bit more work.

There's also the matter of kind-projector's value-level 位ZipStream ~> Stream syntax.

Eventually dotty should have polymorphic SAM which should help, meanwhile I agree that writing out the new looks fine too.

I think it's reasonable to extend the scope of Ykind-projector to also support the lambda syntax, it's just a bit more work.

@smarter That's great to hear! I don't want to sound impatient, but do you happen to have any sense of the timeline for work on -Ykind-projector? (I'd be happy to try to help out.)

No one is actively working on it, I'd like to do it but probably won't get to it in the short term, so if you or anyone else is interested, please have a look :) Since it's mostly parsing stuff it should be doable without any deep knowledge of dotty internals, and I'm more than happy to help along (I'm always on https://gitter.im/lampepfl/dotty)

As a first hint, you can get dotty to output the parser AST by running it with -Xprint:parser, furthermore you can see the raw AST instead of the pretty-printed trees by adding -Yplain-printer (and Thicket(List()) is the same thing as EmptyTree)

FWIW the problem of generated members is relevant for monocle @Lenses too. There's no way to do that as generated sources, and I can't imagine the scalafix approach being very pleasant.

I think we really need to convince the Dotty team that being able to generate synthetic members is a really important feature.

@nafg You definitely could use generated sources for something like Monocle lenses (or derived type class instances, etc.). I'm experimenting with a couple of approaches for Circe codecs:

  • Users write object User extends UserCirceCodecGen and the rule generates a UserCirceCodecGen trait in managed source and updates it as needed as part of every build.
  • The rule generates definitions in the companion object (in checked-in code) and users have to run the rule manually when they change their data types.

I'm still not sure how these will feel in practice, but macro-based derivation definitely isn't pain-free either.

@smarter I'm looking at -Ykind-projector now and have a basic implementation working for *. What would be the best place to discuss the scope of the option (e.g. whether it should support -*, 位[`-伪` => ...], etc.)?

@travisbrown I think it depends on how much complexity it brings, I asked Martin and he's OK with lambda syntax stuff like 位[`-伪` => ...] so I suggest doing the minimum needed for cats.

(this could be discussed further on https://github.com/lampepfl/dotty/issues/7139)

@travisbrown boilerplate code generally feels bad, but I think that there are a couple of potential advantages to dropping the macro-based solution that you've only started to hint at with "The generated code is right there in front of you".

  1. Source navigation. I've seen coworkers get tripped up with a scenario like:

    • What's this bimap method that's being called on Either?

    • Click on bimap in IntelliJ. No source found.

    • Shrug. No way to know.

  2. ScalaDoc discoverability. From what I recall, _some_ Simulacrum generated methods end up in the ScalaDocs, but not all of them. I never understood why this was the case, and maybe it no longer is. But even if they are, using Scalafix may open up options to actually give them useful ScalaDocs?

I think we've delayed this decision long enough and @travisbrown has found a solution that should work in a backwards compatible way. I suggest we start implementing this if there are no objections within the next ~10 days?

I opened a PR (#3424) that splits this out from the Dotty cross-build (which has some open questions like which Dotty version we should support, since we're still missing e.g. the scalatestplus dependency for 0.24).

Done in #3424.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

kailuowang picture kailuowang  路  3Comments

non picture non  路  3Comments

mcanlas picture mcanlas  路  4Comments

fosskers picture fosskers  路  3Comments

chuwy picture chuwy  路  4Comments