Serde: Deserialize into a pre-existing struct

Created on 8 Apr 2017  路  16Comments  路  Source: serde-rs/serde

Example from @burntsushi - when iterating over rows of a CSV file, we would like to be able to reuse the same five String buffers on every row.

#[derive(Debug, Deserialize, PartialEq)]
struct MBTARow {
    trip_id: String,
    arrival_time: String,
    departure_time: String,
    stop_id: String,
    stop_sequence: i32,
    stop_headsign: String,
    pickup_type: i32,
    drop_off_type: i32,
    timepoint: i32,
}

The use case is similar to Clone::clone_from.

derive

Most helpful comment

I might look into this to further optimize webrender deserialization.

All 16 comments

The analogous approach would be something like:

pub trait Deserialize<'de>: Sized {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
        where D: Deserializer<'de>;

    fn deserialize_from<D>(&mut self, deserializer: D) -> Result<(), D::Error>
        where D: Deserializer<'de>
    {
        *self = Deserialize::deserialize(deserializer)?;
        Ok(())
    }
}

@nox mentioned being interested in working on this.

I might look into this to further optimize webrender deserialization.

Note the semantics we'd like in webrender is that it blindly clobbers the source struct. On failure it's in a partially overwritten (but memory-safe) state. We want this because serde errors are hard aborts for us (all sources are trusted), and we want to minimize work.

Not sure if that matches what y'all want.

Yes, partially overwritten but memory-safe state is acceptable.

The important thing is that this is about reusing allocations, not merging one deserialized struct into another. So if you have:

struct S {
    a: Option<T>,
    b: Option<T>,
}

Deserializing {"a": ..., "b": ...} followed by deserializing {"a": ...} over top of the same data needs to result in b=None, not retaining its previous value.

Blugh, this is getting to a bit of a clumsy mess. Just to check that I'm understanding this correctly: I basically need to duplicate the entire of the deserialization subsystem with _from variants?

Like every single method on Visitor will need a _from variant? Or I'll need to make a FromVisitor trait...?

@Gankro I hope not... Here is a quick prototype.

Status: this will be useless for webrender as-is because enums represent a fundamental barrier to this optimization. Currently pushing on https://github.com/rust-lang/rfcs/pull/2195 to fix this.

Is the fundamental barrier enums in general, or consecutive messages that contain different variants of the same enum field? If the common case is that consecutive messages contain the same variant, we can certainly optimize that.

The item we're deserializing is an enum whose variant is ~random. In theory, yes, we could do something clever if we got the same variant multiple times in a row, but this basically never happens for us. We need a way to build an enum in-place regardless of its old value.

Ok been a bit sick but I've had a chance to work through enough of this to have an idea of how it looks, and what I need to do for my use-case.

So the trick you showed me makes the API work nice and clean, and I should have structs deriving nice and well with that. Enums are more complicated, and I'll probably need to fork serde_derive to provide a webrender-custom implementation to meet our needs, since you won't want my solution.

So my enum of interest has two nice properties: it's Copy (no dtors), and it has a None-like (C-like?) variant. The fact that it's Copy isn't something I'm going to be able to rely on, though. There are some plans to break that. In addition, I'm not super happy with relying on Copyness since that does nothing for UB like bool = 3. Now in principle we can ignore that UB since we plan to abort of failure, but I want our system to be a bit more robust than that.

So I plan to write a derive(LeakyEnumDeserialize) that does the following:

fn deserialize_enum_from(&mut self, &mut deserializer) -> Result<(), ()> {
  drop_in_place(self);           // clear out old stuff
  self.tag = #nonelike_variant;  // mark all contents as invalid, by setting enum to "None"

  match read_tag()? {
    A => {
      // deserialize fields
      self.A.field1.deserialize_from(deserializer)?
      self.A.field2.deserialize_from(deserializer)?
      // Now that we're done, we can actually say we're A 
      self.tag = A;
      Ok(())
   ...

The basic idea here is the one we use for leak-amplification in Vec::Drain: we mark the enum as empty before starting our work, and then only at the end do we tell it that it has contents, for minimal overhead. For a Copy enum this is perfect, but for one with destructors this can end up leaking stuff. We could set up code to keep track of where we are in the parse and "unwind" our work, but I expect this will pessimize codegen for something that, in our application, is a show-stopping programming error.

Emitting this implementation will be conditional on detecting that the enum has a repr(int), no other reprs (for forward safety compatibility), and a none-like variant. I've already prototyped out the repr detecting logic, since it was unclear to me that repr() annotations would show up in a derive (it does). Presumably slurping up a None-like variant should be trivial.

Very promising early results having only implemented structs, tuples, and primitives (which can be upstreamed to y'all): https://gist.github.com/Gankro/63d1028a0b69e418add58c23fc828243

Need to do some actual profiling though, the assembly is a bit hard to follow since different changes lead to loop funrolling.

One annoying thing I'm seeing for a [f32; 16] is that even in my most optimized result we're still generating what appears to be:

check-len
copy-float
check-len
copy-float
check-len
...

rather than

copy-float
copy-float
copy-float
...

It should be within the contract of deserialize_from to do emit the latter, since we don't guarantee partial deserializations to have any particular form. That said I'm not sure if I can actually coax the three abstractions of serde+bincode+Read to understand this transformation.

Possibly fixing https://bugs.llvm.org/show_bug.cgi?id=34911 would be sufficient.

Implemented in #1094.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

dtolnay picture dtolnay  路  4Comments

dtolnay picture dtolnay  路  3Comments

swfsql picture swfsql  路  3Comments

tangkhaiphuong picture tangkhaiphuong  路  3Comments

dtolnay picture dtolnay  路  3Comments