In our projects, we often used generated classes like this (using cats):
case class Thing(i: Int, s: String)
import cats.implicits._
(parse[Int], parseString[String]).mapN(Thing.apply) // using applicative feature
Since the latest versions, the constructor also has an extra field unknownFields: _root_.scalapb.UnknownFieldSet, so e.g. this does not work anymore:
case class Thing(i: Int, s: String, unknownFields: _root_.scalapb.UnknownFieldSet)
import cats.implicits._
// Does not compile, missing the last parameter
(parse[Int], parseString[String]).mapN(Thing.apply)
// This works, but requires more writing for many parameters:
(parse[Int], parseString[String]).mapN(Thing(_, _))
I think this is fine, if we generate an extra method (like of, but with an empty UnknownFieldSet as the default):
case class Thing(i: Int, s: String, unknownFields: _root_.scalapb.UnknownFieldSet)
object Thing {
// Feel free to come up with a better name, this is just the first thing that came to my mind.
def create(i: Int, s: String): Thing =
Thing(i, s, _root_.scalapb.UnknownFieldSet.empty)
}
import cats.implicits._
// Using the new method, we can again do this:
(parse[Int], parseString[String]).mapN(Thing.create)
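The arity mismatch can be reproduced without ScalaPB or cats. Below is a minimal stand-alone sketch in which Option and a hand-rolled map2 stand in for the parser applicative, and UnknownFieldSet is stubbed out (all names here are illustrative, not the real generated code):

```scala
// Stub for the real scalapb.UnknownFieldSet, for illustration only.
case class UnknownFieldSet()

case class Thing(i: Int, s: String, unknownFields: UnknownFieldSet = UnknownFieldSet())

object Thing {
  // The proposed helper: same shape as apply, minus unknownFields.
  def create(i: Int, s: String): Thing = Thing(i, s)
}

// Hand-rolled applicative combinator standing in for cats' mapN.
def map2[A, B, C](fa: Option[A], fb: Option[B])(f: (A, B) => C): Option[C] =
  for { a <- fa; b <- fb } yield f(a, b)

// map2(Some(1), Some("x"))(Thing.apply)  // does not compile: apply takes
// three parameters, and defaults are not applied during eta-expansion.
val ok: Option[Thing] = map2(Some(1), Some("x"))(Thing.create) // eta-expands cleanly
```

This illustrates why the default value on unknownFields does not help here: Scala does not drop defaulted parameters when eta-expanding a method into a function value.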
Two things I haven't tested yet:
of might work as well.

Let me know what your preferred solution for this problem would be, I'd be happy to help with the implementation and of course with testing it!
There's an in-file option to disable this feature:
"This feature can be disabled by settings preserve_unknown_fields to false at the file or package level."
Would it be possible to have this as a generator option (like flat_package and others) to disable it at the project level? In most cases this is not something that we'd want.
@timo-schmid I'm thinking of removing unknownFields from of, as most users wouldn't want to be required to pass it. An overload might complicate other use cases, so I am hesitant about adding an overloaded version, but I can be convinced otherwise.
@tOverney we also have package scoped options, so you can specify the option once per project. Hope that works too.
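For reference, a package-scoped options file could look roughly like this. This is a sketch, not copied from the thread: the file name is arbitrary, and the exact option fields should be double-checked against the scalapb.proto shipped with your ScalaPB version:

```protobuf
// package_options.proto (hypothetical name); add its directory to PB.protoSources.
syntax = "proto2";

package com.mycompany;

import "scalapb/scalapb.proto";

// scope: PACKAGE applies these options to every proto under com.mycompany.
option (scalapb.options) = {
  scope: PACKAGE
  preserve_unknown_fields: false
};
```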
Agreed, from the user's perspective it's not needed most of the time. I also agree
with @tOverney that this feature will rarely be used; maybe preserve_unknown_fields should default to false? The of function would then act as a "normal" constructor.
One thing I also wonder: how will something like this affect code written with libraries like Doobie that derive schemas from the case class? In such a case, for every PB message used as a schema, one would have to implement a special schema for the UnknownFieldSet (which is then ignored).
Preserving unknown fields is the current behavior of the official protobuf implementations starting with v3.5. ScalaPB tries to follow the general behavior of the official implementation by default. In this case, this can potentially prevent data loss when there's a service that parses a proto and later serializes it for persistence or for passing forward. The Doobie example is a great one: should unknown fields be ignored, or serialized somehow into an extra column to be inspected later? This could save a team from losing data during a new version rollout.
I think that the main problem here is not the feature itself, but that in Scala, users need in some cases to explicitly deal with the new constructor field, while in other languages the field could be easily ignored. (Personal opinion is that many of us choose Scala because it tends to force us to deal with edge cases and missing data at compile time)
I feel that given it's possible to disable this feature by adding a single file to the project, we have an adequate solution for users for whom this causes trouble. I'll submit a change later that removes unknownFields from of.
Ok cool!
The explanation you gave actually makes a lot of sense. Thanks for explaining!
I released 0.10.2-SNAPSHOT with a fixed of that excludes unknownFields. Can you please try it out by adding
resolvers += Resolver.sonatypeRepo("snapshots")
to both your build.sbt and project/build.sbt and report back if this works?
Works like a charm! Thanks for the quick fix!
@thesamet Adding UnknownFieldSet to all generated case classes breaks anything using extractors (unapply). I think it would be a good idea to generate backwards compatible unapply-methods in the companions. Wdyt? (Ran into this issue when trying to update Cloudstate to akka-rpc 0.8.0)
Thanks for reporting. Let's figure out a good path forward. I am hesitant about adding a backwards-compatible unapply, since it's pretty common practice to evolve a schema by adding (and sometimes removing) fields from protos when adding features. I think that if we have unapplies that catch both n and n+1 parameters, I'll be adding a pitfall where the code can compile but not always work as intended.
In Cloudstate's case, is the concern about the work needed to add a trailing underscore capturing UnknownFieldSet in many places, or that the end outcome of having the trailing underscore is less desirable?
Since we now have a def of method that doesn't take UnknownFieldSet, we could create an unapply inside a companion object named of, so we can write case MyMessage.of(x, y) => ... - let me know if that would be useful in your case.
@thesamet Not sure what the best path forward is, but I don't think it unreasonable to generate an additional unapply without the UnknownFieldSet, because you know whether you want to use them. Having to always match-and-discard them is really noisy. Do you envision any other case where additional unapplies would make sense?
@viktorklang It looks like the generated unapply without the UnknownFieldSet will have to be on a different object so it doesn't conflict with the unapply scalac auto-generates. (I initially thought it would be possible to add an override for unapply or to replace it.) https://stackoverflow.com/questions/50350079/custom-extractor-of-case-class-is-not-working-in-pattern-matching
@thesamet Is it possible to pass preserve_unknown_fields as a parameter to the generator? I don't really want to add dependencies on scalapb protos in the protos I need to use.
@viktorklang, existing protos don't need to be changed to inhibit unknown fields; it is sufficient to add a single proto that depends on scalapb.proto and sets this for the entire package. If needed, this extra proto file can reside in a different directory that you add to PB.protoSources. Would that work for you?
I've been avoiding adding generator parameters that impact source compatibility, since they make it impossible for other proto generators (like akka-grpc) to know which settings were applied by the other generator. It's technically easy to add the parameter, but the cost of supporting users who run into trouble has been very high with the ones I added in the past.
@thesamet Hmmm, I can't think of any solution which has no unfortunate side
effects. All I know is the sudden appearance of UnknownFieldSet in
Cheers,
Totally understood. I hope that throwing in an extra proto file in the project's code base makes a reasonable solution so there shouldn't be any code breakage or changes to any other existing code or protos.
@thesamet Is the extra proto per package? Then I need to add 5-8 of those files :/
If everything is under a common parent package com.mycompany, or even just com, then it would apply to everything under it.
Unfortunately they are distinct. Thinking about this more, I get the sense
that generating 2 unapplies when the flag is true is probably the least
effort & most correct path forward. Since it truly isn't about the
Cheers,
Cool. I've been thinking of having the new unapply be generated under MyMessage.of, so pattern matching would look like case MyMessage.of(x, y) => .... Let me know if there are other proposals (my current understanding is that Scala doesn't allow overriding the unapply it automatically generates).
@thesamet I believe that you can generate 2 unapplies, one for the version without the UnknownFieldSet, and one for the version with the UnknownFieldSet.
So:
final case class Foo(name: _root_.scala.Predef.String = "", unknownFields: _root_.scalapb.UnknownFieldSet = _root_.scalapb.UnknownFieldSet.empty) extends scalapb.GeneratedMessage with scalapb.lenses.Updatable[Foo] {
  …
}
object Foo {
  def unapply(name: _root_.scala.Predef.String): Option[Foo] = …
  def unapply(name: _root_.scala.Predef.String, unknownFields: _root_.scalapb.UnknownFieldSet): Option[Foo] = …
  …
}
@thesamet Just an example of the consequence of the current default:
```diff
- case (emitter,
-   UserFunctionReply(Some(ClientAction(ClientAction.Action.Reply(Reply(some @ Some(payload))))), _)) =>
+ case (
+   emitter,
+   UserFunctionReply(Some(ClientAction(ClientAction.Action.Reply(Reply(some @ Some(payload), _)), _)), _, _)
+ ) =>
```
And another:
```diff
- Cart(Nil) <- getCart(GetShoppingCart(userId)) // Test initial state
- Empty() <- addItem(AddLineItem(userId, productId1, productName1, 1)) // Test add the first product
- Empty() <- addItem(AddLineItem(userId, productId2, productName2, 2)) // Test add the second product
- Empty() <- addItem(AddLineItem(userId, productId1, productName1, 11)) // Test increase quantity
- Empty() <- addItem(AddLineItem(userId, productId2, productName2, 31)) // Test increase quantity
- Cart(items1) <- getCart(GetShoppingCart(userId)) // Test intermediate state
- Empty() <- removeItem(RemoveLineItem(userId, productId1)) // Test removal of first product
+ Cart(Nil, _) <- getCart(GetShoppingCart(userId)) // Test initial state
+ Empty(_) <- addItem(AddLineItem(userId, productId1, productName1, 1)) // Test add the first product
+ Empty(_) <- addItem(AddLineItem(userId, productId2, productName2, 2)) // Test add the second product
+ Empty(_) <- addItem(AddLineItem(userId, productId1, productName1, 11)) // Test increase quantity
+ Empty(_) <- addItem(AddLineItem(userId, productId2, productName2, 31)) // Test increase quantity
+ Cart(items1, _) <- getCart(GetShoppingCart(userId)) // Test intermediate state
+ Empty(_) <- removeItem(RemoveLineItem(userId, productId1)) // Test removal of first product
```
It's definitely adding annoying noise. I totally hear you. Re overriding unapply, I could not get it to work on the same object (please double-check in case I missed anything; the signatures here are different from your code):
final case class Foo(x: Int, y: Int, z: String)
object Foo {
def unapply(f: Foo): Option[(Int, Int)] = Some((f.x, f.y))
}
object A {
def main(args: Array[String]): Unit = {
Foo(3, 4, "boo") match {
case Foo(x,y) => println("here")
}
}
}
leads to
[error] /tmp/fff/foo.scala:10:15: wrong number of arguments for pattern Foo(x: Int, y: Int, z: String)
[error] case Foo(x,y) => println("here")
[error] ^
[error] one error found
I tried adding the unapply under a sub-object. This also ends up being not so pretty and creates additional class files for the new sub-object. The number of class files has been a source of concern for other users. It's tricky to balance this one. Maybe the best choice would be a generator parameter to restore the old behavior :/
Perhaps the generated classes should not be case classes, and we should instead generate the appropriate 'apply' and 'unapply' methods ourselves - that way we can have the unknownFields field, but avoid having it show up in the unapply. WDYT?
@raboof That definitely crossed my mind in the past, and again as I was thinking about this ticket. We'd have to generate hashCode, equals, and copy which should be fairly easy. Any experience with how this may end up breaking source compatibility for users?
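For illustration, a hand-rolled version of what the generator would have to emit could look like the sketch below. The field types are simplified stand-ins (unknownFields stubbed as String), and this is not the actual generated code:

```scala
// Sketch: a plain class replacing the generated case class.
final class Foo(val name: String, val unknownFields: String) {
  // equals/hashCode include unknownFields, matching case-class semantics.
  override def equals(other: Any): Boolean = other match {
    case that: Foo => name == that.name && unknownFields == that.unknownFields
    case _         => false
  }
  override def hashCode: Int = (name, unknownFields).hashCode
  override def toString: String = s"Foo($name, $unknownFields)"
  def copy(name: String = name, unknownFields: String = unknownFields): Foo =
    new Foo(name, unknownFields)
}

object Foo {
  def apply(name: String, unknownFields: String = ""): Foo =
    new Foo(name, unknownFields)
  // Because no synthetic extractor exists for a plain class, the
  // hand-written unapply can simply omit unknownFields.
  def unapply(f: Foo): Option[String] = Some(f.name)
}

// Pattern matching no longer sees the extra field:
val matched = Foo("a") match { case Foo(n) => n }
```

The key point is that without a case class there is no compiler-generated unapply to conflict with, so the companion's extractor can have any arity.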
@thesamet @raboof Another option would be to generate the unapply without the unknownFields, with instead an option for people to retrieve them manually:
case f @ Foo(a, b) => f.unknownFields…
@viktorklang the unapply with the unknown fields isn't generated by ScalaPB. It's auto-generated by Scala and apparently can't be replaced. If there was a way to do that, it would be my first choice. See code and compiler error here: https://github.com/scalapb/ScalaPB/issues/778#issuecomment-599815048
@thesamet Apologies, the last part of my message was omitted:
"The only sensible way of achieving this is to put unknownFieldSet outside of the first parameter list of the case class, but that would also not take it into account for equality etc.:
case class Foo(a: A, b: B)(val unknownFields: UnknownFieldSet = …)"
@viktorklang That proposal sounded very promising, but I noticed that this would break how we call the constructor:
final case class Foo(x: Int=0, y: Int=0)(z: String="boo")
object Foo {
def apply(x: Int, y:Int): Foo = new Foo(x,y)("") // [*]
}
object A {
def main(args: Array[String]): Unit = {
Foo(3,4) // <-- does not compile
}
}
Compilation fails regardless of whether the line [*] is there.
With [*]:
[error] both method apply in object Foo of type (x: Int, y: Int)(z: String)Foo
[error] and method apply in object Foo of type (x: Int, y: Int)Foo
[error] match argument types (Int,Int)
[error] Foo(3,4)
Without [*]:
[error] Unapplied methods are only converted to functions when a function type is expected.
[error] You can make this conversion explicit by writing `apply _` or `apply(_,_)(_)` instead of `apply`.
[error] Foo(3,4)
[error] ^
[error] one error found
We'd all have to instantiate with Foo(3,4)(), which does not seem desirable.
@thesamet
Alright, that's unfortunate!
Then I guess we're back at of:
case class Foo(int: Int, unknownFields: String)
object Foo {
object of extends Function2[Int, String, Foo] {
def apply(int: Int, unknownFields: String): Foo = Foo(int, unknownFields)
def unapply(f: Foo) = Some(f.int)
}
}
scala> val Foo.of(x) = Foo(1, "bar")
x: Int = 1
Right, I still want to entertain the idea of using classes instead of case classes. I'll take a closer look at it soon.
@thesamet Cool, looking forward to see what works best!
I created a new branch https://github.com/scalapb/ScalaPB/tree/classes and published 0.11.0-SNAPSHOT to sonatype-snapshots. This experimental version emits classes instead of case classes with generated versions of hashCode, equals, toString, and of course - unapply! ScalaPB/sparksql-scalapb tests pass. Looks like shapeless can pick up val constructor params too. I am mostly concerned about users who use macros over the generated code. I verified that shapeless.Generic works correctly for the generated code.
I would appreciate it if you can try it out on your actual code base (but don't deploy to production yet!), so we can get some early feedback if this is going to be a painful change.
Just bringing up something that doesn't seem to have been covered much above:
I was hoping for the option to generate these case classes without unknownFields field for use in Spark. In my case, I want to write tables with schemas from these generated case classes. I don't want to "pollute" all my tables with an extra field that wouldn't make sense for data scientists running queries against these tables.
Hi @nwparker - when you use sparksql-scalapb the unknown fields do not become a column in a dataframe.
Thanks for the reply, Nadav. That nullifies my concerns. Thanks for commenting!
@thesamet One thing I'd like to settle is whether we should be including UnknownFieldSet in equals/hashCode at all.
@viktorklang We should be including unknownFieldSet in equals/hashCode, in order to match the semantics of protobuf-java.
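A tiny sketch of those semantics, with UnknownFieldSet stubbed out as a plain case class (the real scalapb type has a different shape; this only illustrates the equality behavior):

```scala
// Stub types for illustration only.
case class UnknownFieldSet(fields: Map[Int, Seq[Int]] = Map.empty)
case class Msg(x: Int, unknownFields: UnknownFieldSet = UnknownFieldSet())

val a = Msg(1)
val b = Msg(1, UnknownFieldSet(Map(99 -> Seq(1))))
// Because unknownFields participates in equals/hashCode, two messages that
// differ only in unknown fields are not equal, matching protobuf-java.
val equal = a == b // false
```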
I haven't received any additional feedback in the past 8 months that unknown fields are an issue in conjunction with pattern matching. The alternative of replacing case classes with classes comes with some cost, so I'm not inclined to merge. I'm looking forward to feedback on this, and will be happy to reconsider in the future. Closing for the time being due to the lack of feedback.