In our projects, we often used generated classes like this (using cats):
case class Thing(i: Int, s: String)
import cats.implicits._
(parse[Int], parseString[String]).mapN(Thing.apply) // using applicative feature
Since the latest versions, the constructor also has an extra field unknownFields: _root_.scalapb.UnknownFieldSet, so e.g. this does not work anymore:
case class Thing(i: Int, s: String, unknownFields: _root_.scalapb.UnknownFieldSet)
import cats.implicits._
// Does not compile, missing the last parameter
(parse[Int], parseString[String]).mapN(Thing.apply)
// This works, but requires more writing for many parameters:
(parse[Int], parseString[String]).mapN(Thing(_, _))
I think this is fine, if we generate an extra method (like of, but with an empty UnknownFieldSet as the default):
case class Thing(i: Int, s: String, unknownFields: _root_.scalapb.UnknownFieldSet)
object Thing {
// Feel free to come up with a better name, this is just the first thing that came to my mind.
def create(i: Int, s: String): Thing =
Thing(i, s, _root_.scalapb.UnknownFieldSet.empty)
}
import cats.implicits._
// Using the new method, we can again do this:
(parse[Int], parseString[String]).mapN(Thing.create)
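The arity mismatch can be reproduced without ScalaPB or cats. Below is a minimal stand-alone sketch in which Option and a hand-rolled map2 stand in for the parser applicative, and UnknownFieldSet is stubbed out (all names here are illustrative, not the real generated code):

```scala
// Stub for the real scalapb.UnknownFieldSet, for illustration only.
case class UnknownFieldSet()

case class Thing(i: Int, s: String, unknownFields: UnknownFieldSet = UnknownFieldSet())

object Thing {
  // The proposed helper: same shape as apply, minus unknownFields.
  def create(i: Int, s: String): Thing = Thing(i, s)
}

// Hand-rolled applicative combinator standing in for cats' mapN.
def map2[A, B, C](fa: Option[A], fb: Option[B])(f: (A, B) => C): Option[C] =
  for { a <- fa; b <- fb } yield f(a, b)

// map2(Some(1), Some("x"))(Thing.apply)  // does not compile: apply takes
// three parameters, and defaults are not applied during eta-expansion.
val ok: Option[Thing] = map2(Some(1), Some("x"))(Thing.create) // eta-expands cleanly
```

This illustrates why the default value on unknownFields does not help here: Scala does not drop defaulted parameters when eta-expanding a method into a function value.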
Two things I haven't tested yet:
of might work as well.

Let me know what your preferred solution for this problem would be, I'd be happy to help with the implementation and of course with testing it!
There's an in-file option to disable this feature:
"This feature can be disabled by settings preserve_unknown_fields to false at the file or package level."
Would it be possible to have this as a generator option (like flat_package and others) to disable it at the project level? In most cases this is not something that we'd want.
@timo-schmid I'm thinking of removing unknownFields from of, as most users wouldn't want to be required to pass it. An overload might complicate other use cases, so I am hesitant about adding an overloaded version, but I can be convinced otherwise.
@tOverney we also have package scoped options, so you can specify the option once per project. Hope that works too.
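For reference, a package-scoped options file could look roughly like this. This is a sketch, not copied from the thread: the file name is arbitrary, and the exact option fields should be double-checked against the scalapb.proto shipped with your ScalaPB version:

```protobuf
// package_options.proto (hypothetical name); add its directory to PB.protoSources.
syntax = "proto2";

package com.mycompany;

import "scalapb/scalapb.proto";

// scope: PACKAGE applies these options to every proto under com.mycompany.
option (scalapb.options) = {
  scope: PACKAGE
  preserve_unknown_fields: false
};
```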
Agreed, from the user's perspective it's not needed most of the time. I also agree
with @tOverney that this feature will rarely be used; maybe preserve_unknown_fields should default to false? The of function would then act as a "normal" constructor.
One thing I also wonder: how will something like this affect code written with libraries like Doobie that derive schemas from the case class? In such a case, for every PB message used as a schema, one would have to implement a special schema for the UnknownFieldSet (which is then ignored).
Preserving unknown fields is the current behavior of the official protobuf implementations starting with v3.5. ScalaPB tries to follow the general behavior of the official implementation by default. In this case, this can potentially prevent data loss when there's a service that parses a proto and later serializes it for persistence or for passing forward. The Doobie example is a great one: should unknown fields be ignored, or serialized somehow into an extra column to be inspected later? This could save a team from losing data during a new version rollout.
I think that the main problem here is not the feature itself, but that in Scala, users need in some cases to explicitly deal with the new constructor field, while in other languages the field could be easily ignored. (Personal opinion is that many of us choose Scala because it tends to force us to deal with edge cases and missing data at compile time)
I feel that given it's possible to disable this feature by adding a single file to the project, we have an adequate solution for users for whom this causes trouble. I'll submit a change later that removes unknownFields from of.
Ok cool!
The explanation you gave actually makes a lot of sense. Thanks for explaining!
I released 0.10.2-SNAPSHOT with a fixed of that excludes unknownFields. Can you please try it out by adding
resolvers += Resolver.sonatypeRepo("snapshots")
to both your build.sbt and project/build.sbt and report back if this works?
Works like a charm! Thanks for the quick fix!
@thesamet Adding UnknownFieldSet to all generated case classes breaks anything using extractors (unapply). I think it would be a good idea to generate backwards compatible unapply-methods in the companions. Wdyt? (Ran into this issue when trying to update Cloudstate to akka-rpc 0.8.0)
Thanks for reporting. Let's figure out a good path forward. I am hesitant about adding a backwards-compatible unapply, since it's pretty common practice to evolve a schema by adding (and sometimes removing) fields from protos when adding features. I think that if we have unapplies that catch both n and n+1 parameters, I'll be adding a pitfall where the code can compile but not always work as intended.
In Cloudstate's case, is the concern about the work needed to add a trailing underscore capturing UnknownFieldSet in many places, or that the end outcome of having the trailing underscore is less desirable?
Since we now have a def of method that doesn't take UnknownFieldSet, we could create an unapply inside a companion object named of, so we can write case MyMessage.of(x, y) => ... - let me know if that would be useful in your case.
@thesamet Not sure what the best path forward is, but I don't think it unreasonable to generate an additional unapply without the UnknownFieldSet, because you know whether you want to use them. Having to always match-and-discard them is really noisy. Do you envision any other case where additional unapplies would make sense?
@viktorklang It looks like the generated unapply without the UnknownFieldSet will have to be on a different object so it doesn't conflict with the unapply scalac auto-generates. (I initially thought it would be possible to add an override for unapply or to replace it.) https://stackoverflow.com/questions/50350079/custom-extractor-of-case-class-is-not-working-in-pattern-matching
@thesamet Is it possible to pass preserve_unknown_fields as a parameter to the generator? I don't really want to add dependencies on scalapb protos in the protos I need to use.
@viktorklang, existing protos don't need to be changed to inhibit unknown fields; it is sufficient to add a single proto that depends on scalapb.proto and sets this for the entire package. If needed, this extra proto file can reside in a different directory that you add to PB.protoSources. Would that work for you?
I've been avoiding adding generator parameters that impact source compatibility, since they make it impossible for other proto generators (like akka-grpc) to know which settings were applied by the other generator. It's technically easy to add the parameter, but the cost of supporting users who run into trouble has been very high with the ones I added in the past.
@thesamet Hmmm, I can't think of any solution which has no unfortunate side
effects. All I know is the sudden appearance of UnknownFieldSet in
Cheers,
Totally understood. I hope that throwing in an extra proto file in the project's code base makes a reasonable solution so there shouldn't be any code breakage or changes to any other existing code or protos.
@thesamet Is the extra proto per package? Then I need to add 5-8 of those files :/
If everything is under a common parent package com.mycompany, or even just com, then it would apply to everything under it.
Unfortunately they are distinct. Thinking about this more, I get the sense
that generating 2 unapplies when the flag is true is probably the least
effort & most correct path forward. Since it truly isn't about the
Cheers,
Cool. I've been thinking of having the new unapply be generated under MyMessage.of, so pattern matching would look like case MyMessage.of(x, y) => .... Let me know if there are other proposals (my current understanding is that Scala doesn't allow overriding the unapply it automatically generates).
@thesamet I believe that you can generate 2 unapplies, one for the version without the UnknownFieldSet, and one for the version with the UnknownFieldSet.
So:
final case class Foo(name: _root_.scala.Predef.String = "", unknownFields: _root_.scalapb.UnknownFieldSet = _root_.scalapb.UnknownFieldSet.empty) extends scalapb.GeneratedMessage with scalapb.lenses.Updatable[Foo] {
  …
}
object Foo {
  def unapply(name: _root_.scala.Predef.String): Option[Foo] = …
  def unapply(name: _root_.scala.Predef.String, unknownFields: _root_.scalapb.UnknownFieldSet): Option[Foo] = …
  …
}
@thesamet Just an example of the consequence of the current default:
```diff
- case (emitter,
-   UserFunctionReply(Some(ClientAction(ClientAction.Action.Reply(Reply(some @ Some(payload))))), _)) =>
+ case (
+   emitter,
+   UserFunctionReply(Some(ClientAction(ClientAction.Action.Reply(Reply(some @ Some(payload), _)), _)), _, _)
+ ) =>
```
And another:
```diff
- Cart(Nil) <- getCart(GetShoppingCart(userId)) // Test initial state
- Empty() <- addItem(AddLineItem(userId, productId1, productName1, 1)) // Test add the first product
- Empty() <- addItem(AddLineItem(userId, productId2, productName2, 2)) // Test add the second product
- Empty() <- addItem(AddLineItem(userId, productId1, productName1, 11)) // Test increase quantity
- Empty() <- addItem(AddLineItem(userId, productId2, productName2, 31)) // Test increase quantity
- Cart(items1) <- getCart(GetShoppingCart(userId)) // Test intermediate state
- Empty() <- removeItem(RemoveLineItem(userId, productId1)) // Test removal of first product
+ Cart(Nil, _) <- getCart(GetShoppingCart(userId)) // Test initial state
+ Empty(_) <- addItem(AddLineItem(userId, productId1, productName1, 1)) // Test add the first product
+ Empty(_) <- addItem(AddLineItem(userId, productId2, productName2, 2)) // Test add the second product
+ Empty(_) <- addItem(AddLineItem(userId, productId1, productName1, 11)) // Test increase quantity
+ Empty(_) <- addItem(AddLineItem(userId, productId2, productName2, 31)) // Test increase quantity
+ Cart(items1, _) <- getCart(GetShoppingCart(userId)) // Test intermediate state
+ Empty(_) <- removeItem(RemoveLineItem(userId, productId1)) // Test removal of first product
```
It's definitely adding annoying noise. I totally hear you. Re overriding unapply, I could not get it to work on the same object (please double-check in case I missed anything; the signatures here are different from your code):
final case class Foo(x: Int, y: Int, z: String)
object Foo {
def unapply(f: Foo): Option[(Int, Int)] = Some((f.x, f.y))
}
object A {
def main(args: Array[String]): Unit = {
Foo(3, 4, "boo") match {
case Foo(x,y) => println("here")
}
}
}
leads to
[error] /tmp/fff/foo.scala:10:15: wrong number of arguments for pattern Foo(x: Int, y: Int, z: String)
[error] case Foo(x,y) => println("here")
[error] ^
[error] one error found
I tried adding the unapply under a sub-object. This also ends up being not so pretty and creates additional class files for the new sub-object. The number of class files has been a source of concern for other users. It's tricky to balance this one. Maybe the best choice would be a generator parameter to restore the old behavior :/
Perhaps the generated classes should not be case classes, and we should instead generate the appropriate 'apply' and 'unapply' methods ourselves - that way we can have the unknownFields field, but avoid having it show up in the unapply. WDYT?
@raboof That definitely crossed my mind in the past, and again as I was thinking about this ticket. We'd have to generate hashCode, equals, and copy which should be fairly easy. Any experience with how this may end up breaking source compatibility for users?
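For illustration, a hand-rolled version of what the generator would have to emit could look like the sketch below. The field types are simplified stand-ins (unknownFields stubbed as String), and this is not the actual generated code:

```scala
// Sketch: a plain class replacing the generated case class.
final class Foo(val name: String, val unknownFields: String) {
  // equals/hashCode include unknownFields, matching case-class semantics.
  override def equals(other: Any): Boolean = other match {
    case that: Foo => name == that.name && unknownFields == that.unknownFields
    case _         => false
  }
  override def hashCode: Int = (name, unknownFields).hashCode
  override def toString: String = s"Foo($name, $unknownFields)"
  def copy(name: String = name, unknownFields: String = unknownFields): Foo =
    new Foo(name, unknownFields)
}

object Foo {
  def apply(name: String, unknownFields: String = ""): Foo =
    new Foo(name, unknownFields)
  // Because no synthetic extractor exists for a plain class, the
  // hand-written unapply can simply omit unknownFields.
  def unapply(f: Foo): Option[String] = Some(f.name)
}

// Pattern matching no longer sees the extra field:
val matched = Foo("a") match { case Foo(n) => n }
```

The key point is that without a case class there is no compiler-generated unapply to conflict with, so the companion's extractor can have any arity.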
@thesamet @raboof Another option would be to generate the unapply without the unknownFields, with instead an option for people to retrieve them manually:
case f @ Foo(a, b) => f.unknownFields…
@viktorklang the unapply with the unknown fields isn't generated by ScalaPB. It's auto-generated by Scala and apparently can't be replaced. If there was a way to do that, it would be my first choice. See code and compiler error here: https://github.com/scalapb/ScalaPB/issues/778#issuecomment-599815048
@thesamet Apologies, the last part of my message was omitted:
"The only sensible way of achieving this is to put unknownFieldSet outside of the first parameter list of the case class, but that would also not take it into account for equality etc.:
case class Foo(a: A, b: B)(val unknownFields: UnknownFieldSet = …)"
@viktorklang That proposal sounded very promising, but I noticed that this would break how we call the constructor:
final case class Foo(x: Int=0, y: Int=0)(z: String="boo")
object Foo {
def apply(x: Int, y:Int): Foo = new Foo(x,y)("") // [*]
}
object A {
def main(args: Array[String]): Unit = {
Foo(3,4) // <-- does not compile
}
}
Compilation fails regardless of whether the line [*] is there.
With [*]:
[error] both method apply in object Foo of type (x: Int, y: Int)(z: String)Foo
[error] and method apply in object Foo of type (x: Int, y: Int)Foo
[error] match argument types (Int,Int)
[error] Foo(3,4)
Without [*]:
[error] Unapplied methods are only converted to functions when a function type is expected.
[error] You can make this conversion explicit by writing `apply _` or `apply(_,_)(_)` instead of `apply`.
[error] Foo(3,4)
[error] ^
[error] one error found
We'd all have to instantiate with Foo(3,4)(), which does not seem desirable.
@thesamet
Alright, that's unfortunate!
Then I guess we're back at of:
case class Foo(int: Int, unknownFields: String)
object Foo {
object of extends Function2[Int, String, Foo] {
def apply(int: Int, unknownFields: String): Foo = Foo(int, unknownFields)
def unapply(f: Foo) = Some(f.int)
}
}
scala> val Foo.of(x) = Foo(1, "bar")
x: Int = 1
Right, I still want to entertain the idea of using classes instead of case classes. I'll take a closer look at it soon.
@thesamet Cool, looking forward to see what works best!
I created a new branch https://github.com/scalapb/ScalaPB/tree/classes and published 0.11.0-SNAPSHOT to sonatype-snapshots. This experimental version emits classes instead of case classes with generated versions of hashCode, equals, toString, and of course - unapply! ScalaPB/sparksql-scalapb tests pass. Looks like shapeless can pick up val constructor params too. I am mostly concerned about users who use macros over the generated code. I verified that shapeless.Generic works correctly for the generated code.
I would appreciate it if you can try it out on your actual code base (but don't deploy to production yet!), so we can get some early feedback if this is going to be a painful change.
Just bringing up something that doesn't seem to have been covered much above:
I was hoping for the option to generate these case classes without unknownFields field for use in Spark. In my case, I want to write tables with schemas from these generated case classes. I don't want to "pollute" all my tables with an extra field that wouldn't make sense for data scientists running queries against these tables.
Hi @nwparker - when you use sparksql-scalapb the unknown fields do not become a column in a dataframe.
Thanks for the reply, Nadav. That nullifies my concerns. Thanks for commenting!
@thesamet One thing I'd like to settle is whether we should be including UnknownFieldSet in equals/hashCode at all.
@viktorklang We should be including unknownFieldSet in equals/hashCode, in order to match the semantics of protobuf-java.
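A tiny sketch of those semantics, with UnknownFieldSet stubbed out as a plain case class (the real scalapb type has a different shape; this only illustrates the equality behavior):

```scala
// Stub types for illustration only.
case class UnknownFieldSet(fields: Map[Int, Seq[Int]] = Map.empty)
case class Msg(x: Int, unknownFields: UnknownFieldSet = UnknownFieldSet())

val a = Msg(1)
val b = Msg(1, UnknownFieldSet(Map(99 -> Seq(1))))
// Because unknownFields participates in equals/hashCode, two messages that
// differ only in unknown fields are not equal, matching protobuf-java.
val equal = a == b // false
```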
I haven't received any additional feedback in the past 8 months that unknown fields are an issue in conjunction with pattern matching. The alternative of replacing case classes with classes comes with some cost, so I'm not inclined to merge. I'm looking forward to feedback on this, and will be happy to reconsider in the future. Closing for the time being due to the lack of feedback.