Serde: Return Vec<u8> instead of string for deserializing

Created on 9 Sep 2016 · 13 Comments · Source: serde-rs/serde

I have a question about getting a vector of bytes out of the input data during deserialization. At the moment Serde returns strings for &[u8] and Vec<u8> types. I understand that some things are also not possible with vectors because Rust doesn't have specialization yet.

If I want to add support for Vec<u8> to a BERT deserializer, do I need to define my own Visitor for the de::SeqVisitor trait (as described there)? The BERT deserializer has a parse_binary method which copies the binary data into a buffer and returns it to the caller:

use std::io::{self, Read};
use byteorder::{BigEndian, ReadBytesExt};
use serde::de::{self, EnumVisitor, Visitor, Deserialize};
use super::errors::{Error, Result};

pub struct Deserializer<R: Read> {
    reader: R,
    header: Option<u8>,
}

impl<R: Read> Read for Deserializer<R> {
    #[inline]
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.reader.read(buf)
    }
}

impl<R: Read> de::Deserializer for Deserializer<R> {
    type Error = Error;

    forward_to_deserialize! {
        bool usize u8 u16 u32 u64 isize i8 i16 i32 i64 f32 f64 char
        str string unit seq seq_fixed_size bytes option map unit_struct
        tuple_struct struct struct_field tuple ignored_any newtype_struct
    }

    #[inline]
    fn deserialize<V: Visitor>(&mut self, visitor: V) -> Result<V::Value> {
        if self.header.is_none() {
            self.header = Some(try!(self.read_u8()));
        }

        let result = self.parse_value(visitor);
        self.header = None;
        result
    }

    #[inline]
    fn deserialize_enum<V: EnumVisitor>(
        &mut self, _enum: &'static str, _variants: &'static [&'static str],
        mut visitor: V
    ) -> Result<V::Value> {
        Err(Error::UnsupportedType)
    }
}

impl<R: Read> Deserializer<R> {
    /// Creates the BERT parser from an `std::io::Read`.
    #[inline]
    pub fn new(reader: R) -> Deserializer<R> {
        Deserializer {
            reader: reader,
            header: None,
        }
    }

    /// The `Deserializer::end` method should be called after a value has
    /// been fully deserialized. This allows the `Deserializer` to validate
    /// that the input stream is at the end.
    #[inline]
    pub fn end(&mut self) -> Result<()> {
        if try!(self.read(&mut [0; 1])) == 0 {
            Ok(())
        } else {
            Err(Error::TrailingBytes)
        }
    }

    #[inline]
    fn parse_value<V: Visitor>(&mut self, visitor: V) -> Result<V::Value> {
        let header = self.header.unwrap();
        self.header = None;
        match header {
            109 => self.parse_binary(header, visitor),
            _ => Err(Error::InvalidTag)
        }
    }

    // Example data: [0, 0, 0, 5, 118, 97, 108, 117, 101]
    // The first 4 bytes are the big-endian length; the rest is the payload
    // (not necessarily a string).
    #[inline]
    fn parse_binary<V: Visitor>(&mut self, _header: u8, mut visitor: V) -> Result<V::Value> {
        let length = try!(self.read_i32::<BigEndian>());
        let mut buffer = vec![0; length as usize];
        try!(self.reader.read_exact(&mut buffer));
        // No trailing semicolon: return the visitor's result to the caller.
        visitor.visit_byte_buf(buffer)
    }
}
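For comparison, the same BINARY_EXT layout (tag 109, then a 4-byte big-endian length, then the raw payload) can be parsed with the standard library alone. This is a minimal sketch, not the bert-rs code: `parse_binary_ext` and `BertError` are hypothetical names, and it returns the bytes directly instead of going through a Serde visitor.

```rust
use std::io::{self, Read};

/// Tag byte for a BERT binary, as matched in `parse_value` above.
const BINARY_EXT: u8 = 109;

/// Hypothetical error type for this sketch only.
#[derive(Debug)]
pub enum BertError {
    Io(io::Error),
    InvalidTag(u8),
}

impl From<io::Error> for BertError {
    fn from(e: io::Error) -> Self {
        BertError::Io(e)
    }
}

/// Reads one binary term: tag byte, 4-byte big-endian length, payload.
pub fn parse_binary_ext<R: Read>(reader: &mut R) -> Result<Vec<u8>, BertError> {
    let mut tag = [0u8; 1];
    reader.read_exact(&mut tag)?;
    if tag[0] != BINARY_EXT {
        return Err(BertError::InvalidTag(tag[0]));
    }

    let mut len_bytes = [0u8; 4];
    reader.read_exact(&mut len_bytes)?;
    // Decode as u32 so a length with the high bit set never becomes negative.
    let length = u32::from_be_bytes(len_bytes) as u64;

    // Read through `take` so a bogus length cannot force a huge allocation.
    let mut buffer = Vec::new();
    reader.by_ref().take(length).read_to_end(&mut buffer)?;
    if (buffer.len() as u64) < length {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated binary").into());
    }
    Ok(buffer)
}
```

Feeding it the example data with the tag prepended, `[109, 0, 0, 0, 5, 118, 97, 108, 117, 101]`, yields the bytes of `"value"`.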

All 13 comments

> At the moment Serde returns strings for &[u8] and Vec<u8> types

serde only does that when you deserialize to a String.

> If I want to add support for Vec<u8> to a BERT deserializer, do I need to define my own Visitor for the de::SeqVisitor trait (as described there)? The BERT deserializer has a parse_binary method which copies the binary data into a buffer and returns it to the caller:

Yup, that's the way to go. You also need to implement deserialize_seq and call visitor.visit_seq(your_seq_visitor).

This code compiles successfully, but my test for binaries fails with the following error:

---- test_deserializers::test_deserialize_binary stdout ----
    thread 'test_deserializers::test_deserialize_binary' panicked at 'called `Result::unwrap()` on an `Err` value: Custom("Invalid type. Expected `Seq`")', ../src/libcore/result.rs:788

At the moment, BinarySeqVisitor and the code that invokes it from Deserializer<R: Read> look like this:

impl<R: Read> Deserializer<R> {
    // ... some other implementation of Deserializer<R: Read>
    #[inline]
    fn parse_binary<V: Visitor>(
        &mut self, _header: u8, mut visitor: V
    ) -> Result<V::Value> {
        let length = try!(self.read_i32::<BigEndian>()) as usize;
        visitor.visit_seq(BinarySeqVisitor::new(self, Some(length)))
    }
}


struct BinarySeqVisitor<'a, R: 'a + Read> {
    de: &'a mut Deserializer<R>,
    length: Option<usize>
}

impl<'a, R: 'a + Read> BinarySeqVisitor<'a, R> {
    #[inline]
    fn new(de: &'a mut Deserializer<R>, length: Option<usize>) -> Self {
        BinarySeqVisitor { de: de, length: length }
    }
}

impl<'a, R: Read> de::SeqVisitor for BinarySeqVisitor<'a, R> {
    type Error = Error;

    fn visit<T: Deserialize>(&mut self) -> Result<Option<T>> {
        match self.length {
            Some(0) => return Ok(None),
            Some(ref mut len) => *len -= 1,
            _ => {}
        };
        match Deserialize::deserialize(self.de) {
            Ok(value) => Ok(Some(value)),
            Err(e) => Err(e)
        }
    }

    fn end(&mut self) -> Result<()> {
        if let Some(0) = self.length {
            Ok(())
        } else {
            Err(Error::TrailingBytes)
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        match self.length {
            Some(len) => (len, Some(len)),
            // Unknown length: no upper bound.
            None => (0, None)
        }
    }
}

impl<'a, R: Read> de::Visitor for BinarySeqVisitor<'a, R> {
    type Value = Vec<u8>;

    #[inline]
    fn visit_unit<E>(&mut self) -> std::result::Result<Vec<u8>, E>
        where E: de::Error,
    {
        Ok(Vec::new())
    }

    #[inline]
    fn visit_seq<V>(&mut self, mut visitor: V) -> std::result::Result<Vec<u8>, V::Error>
        where V: de::SeqVisitor,
    {
        let mut values = Vec::with_capacity(visitor.size_hint().0);

        while let Some(value) = try!(visitor.visit()) {
            values.push(value);
        }

        try!(visitor.end());
        Ok(values)
    }
}

Any ideas on how to fix it?

That means the visitor you are passing to parse_binary<V: Visitor> isn't one that expects to be given a sequence.
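The mechanism can be modelled without Serde. In the sketch below, every callback of a hypothetical `MiniVisitor` trait has a default that rejects the input and names what it received; a String-like visitor overrides only the byte callback and a Vec-like visitor only the sequence callback. The trait, method names and error strings are illustrative assumptions, not Serde's API, but they show why handing a sequence to a visitor that only understands bytes produces an "invalid type" error.

```rust
/// Hypothetical, simplified stand-in for a Serde-style visitor.
pub trait MiniVisitor: Sized {
    type Value;

    // Defaults: reject the data and report which shape was received.
    fn visit_byte_buf(self, _v: Vec<u8>) -> Result<Self::Value, String> {
        Err("invalid type: received bytes".to_string())
    }

    /// `_items` stands in for the elements of a sequence.
    fn visit_seq(self, _items: Vec<u8>) -> Result<Self::Value, String> {
        Err("invalid type: received a sequence".to_string())
    }
}

/// Like String's visitor: accepts raw bytes, rejects sequences.
pub struct StringVisitor;

impl MiniVisitor for StringVisitor {
    type Value = String;

    fn visit_byte_buf(self, v: Vec<u8>) -> Result<String, String> {
        String::from_utf8(v).map_err(|e| e.to_string())
    }
}

/// Like Vec<u8>'s visitor: accepts sequences, rejects raw bytes.
pub struct VecVisitor;

impl MiniVisitor for VecVisitor {
    type Value = Vec<u8>;

    fn visit_seq(self, items: Vec<u8>) -> Result<Vec<u8>, String> {
        Ok(items)
    }
}
```

Whichever callback the format chooses, only one of the two target types can accept it.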

Which signature needs to change? The code that invokes parse_binary and passes the visitor to it? Or perhaps parse_value? I really can't figure out how to change the code so that it works the way I expect.

@dtolnay could you share a bit more detail about how to write my own vector/sequence visitor, given the current state of my codebase?

@Relrin I'm not entirely sure, since I see no calls to parse_binary. Do you have the code uploaded somewhere?

Note that with serde it's often useful to use a debugger to step through the code (or even just spray logging statements everywhere).
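One cheap way to "spray logging" over a reader-based deserializer is to wrap the underlying reader so every chunk it hands out is recorded together with its offset. This is a minimal sketch; `LoggingReader` is a hypothetical helper, not part of Serde or bert-rs. Because the BERT `Deserializer<R>` above is generic over `R: Read`, a wrapper like this could be passed in place of the real reader.

```rust
use std::io::{self, Read};

/// Hypothetical debugging wrapper: forwards reads to the inner reader and
/// logs each chunk of bytes, so you can see what every parsing step consumes.
pub struct LoggingReader<R> {
    inner: R,
    offset: usize,
    pub log: Vec<String>,
}

impl<R: Read> LoggingReader<R> {
    pub fn new(inner: R) -> Self {
        LoggingReader { inner, offset: 0, log: Vec::new() }
    }
}

impl<R: Read> Read for LoggingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        // Record (and print to stderr) where the bytes came from.
        let line = format!("offset {}: read {:?}", self.offset, &buf[..n]);
        eprintln!("{}", line);
        self.log.push(line);
        self.offset += n;
        Ok(n)
    }
}
```

Reading the tag and then the length of a binary term produces one log line per read, which makes a misaligned read (such as an unexpected extra tag byte) easy to spot.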

@oli-obk The current code is part of the following pull request, which I referenced in this issue.

@dtolnay Do you have any ideas on how to fix this code?
I've tried changing the logic of the parse_binary method to:

#[inline]
fn parse_binary<V: Visitor>(
    &mut self, _header: u8, mut visitor: V
) -> Result<V::Value> {
    let length = self.read_i32::<BigEndian>().unwrap() as usize;
    let seq_visitor = BinarySeqVisitor::new(self, Some(length));
    seq_visitor.visit_seq(self)
}

But the compiler generates the following errors:

error[E0277]: the trait bound `deserializers::Deserializer<R>: serde::de::SeqVisitor` is not satisfied
   --> /Users/savicvalera/code/bert-rs/bert/src/deserializers.rs:132:21
    |
132 |         seq_visitor.visit_seq(self)
    |                     ^^^^^^^^^
    |
    = note: required because of the requirements on the impl of `serde::de::SeqVisitor` for `&mut deserializers::Deserializer<R>`

error[E0308]: mismatched types
   --> /Users/savicvalera/code/bert-rs/bert/src/deserializers.rs:132:9
    |
132 |         seq_visitor.visit_seq(self)
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected associated type, found struct `std::vec::Vec`
    |
    = note: expected type `std::result::Result<<V as serde::de::Visitor>::Value, errors::Error>`
    = note:    found type `std::result::Result<std::vec::Vec<u8>, _>`

@dtolnay Previously my implementation returned a String to the user after parsing a binary. Now I'm trying to extract the bytes from the binary and return them as-is (in our case as a Vec<u8>), but I can't figure out how to do it correctly. The comments above show the current state of the parse_binary method, which tries to use my sequence deserializer but fails.

That won't work because String can only be deserialized through visit_str / visit_string / visit_bytes / visit_byte_buf, and Vec can only be deserialized through visit_unit / visit_seq. So parse_binary needs to decide whether to support String or Vec. If you decide on Vec, I can walk you through that, but it will stop working for String.

Once we implement specialization for Vec (#309) the code I showed above will work for deserializing both String and Vec.

I filed #555 to implement visit_seq for deserializing String, and that way it will work for both String and Vec without requiring specialization.
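The idea behind #555 can be sketched in the same simplified style: a String visitor that also accepts a sequence of u8 elements by collecting them and then validating UTF-8. This is an illustration under assumptions (a hypothetical `MiniVisitor` trait, plain `String` errors), not the change that was actually made in Serde.

```rust
/// Hypothetical, simplified visitor trait (not Serde's API). A format calls
/// one of these callbacks; unimplemented ones reject the input.
pub trait MiniVisitor: Sized {
    type Value;

    fn visit_byte_buf(self, _v: Vec<u8>) -> Result<Self::Value, String> {
        Err("invalid type: received bytes".to_string())
    }

    /// `_items` yields the sequence elements one at a time.
    fn visit_seq<I: Iterator<Item = u8>>(self, _items: I) -> Result<Self::Value, String> {
        Err("invalid type: received a sequence".to_string())
    }
}

/// String visitor that accepts both shapes: raw bytes, or a sequence of
/// u8 elements collected before the UTF-8 check.
pub struct StringVisitor;

impl MiniVisitor for StringVisitor {
    type Value = String;

    fn visit_byte_buf(self, v: Vec<u8>) -> Result<String, String> {
        String::from_utf8(v).map_err(|e| e.to_string())
    }

    fn visit_seq<I: Iterator<Item = u8>>(self, items: I) -> Result<String, String> {
        // Collect the elements, then reuse the byte path for UTF-8 validation.
        self.visit_byte_buf(items.collect())
    }
}
```

With both callbacks implemented, a format that reports binaries as sequences would serve String and Vec<u8> targets alike.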

Your implementation in the PR (including parse_binary) is correct except for this part:

match Deserialize::deserialize(self.de) {
    Ok(value) => Ok(Some(value)),
    Err(e) => Err(e)
}

According to your test, you expect the binary data to look like this:

109,         // binary
0, 0, 0, 5,  // length
118,         // "v"
97,          // "a"
108,         // "l"
117,         // "u"
101          // "e"

But your implementation expects the following:

109,         // binary
0, 0, 0, 5,  // length
97,          // unsigned integer
118,         // "v"
97,          // unsigned integer
97,          // "a"
97,          // unsigned integer
108,         // "l"
97,          // unsigned integer
117,         // "u"
97,          // unsigned integer
101          // "e"
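The difference between the two layouts can be reproduced with a small std-only sketch that decodes the payload both ways. `read_elements_tagged` mimics the original `visit` (every element is parsed as a full term, so it must start with the small-integer tag 97), and `read_elements_raw` mimics the fix (every element is one raw byte). Both names are hypothetical, and only the single tag 97 is modelled.

```rust
use std::io::{self, Read};

/// Tag used for small unsigned integers in the layout above.
const SMALL_INTEGER_EXT: u8 = 97;

fn read_byte<R: Read>(r: &mut R) -> io::Result<u8> {
    let mut b = [0u8; 1];
    r.read_exact(&mut b)?;
    Ok(b[0])
}

/// Mimics the buggy path: each element is a full term, tag byte first.
pub fn read_elements_tagged<R: Read>(r: &mut R, len: usize) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(len);
    for _ in 0..len {
        let tag = read_byte(r).map_err(|e| e.to_string())?;
        if tag != SMALL_INTEGER_EXT {
            return Err(format!("unexpected tag {}", tag));
        }
        out.push(read_byte(r).map_err(|e| e.to_string())?);
    }
    Ok(out)
}

/// Mimics the fix: each element is a single raw byte.
pub fn read_elements_raw<R: Read>(r: &mut R, len: usize) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(len);
    for _ in 0..len {
        out.push(read_byte(r).map_err(|e| e.to_string())?);
    }
    Ok(out)
}
```

On the real payload `[118, 97, 108, 117, 101]` the tagged reader stops at the first byte, 118 ("v"), because it is not tag 97; the raw reader returns `b"value"`.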

The fix is to deserialize from something that knows to read a u8 without needing to see the "unsigned integer" tag every time:

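impl<'a, R: Read> de::SeqVisitor for BinarySeqVisitor<'a, R> {
    type Error = Error;

    fn visit<T: Deserialize>(&mut self) -> Result<Option<T>> {
        match self.length {
            Some(0) => return Ok(None),
            Some(ref mut len) => *len -= 1,
            None => {}
        };
        // Deserialize from `self`, not `self.de`: the Deserializer impl below
        // reads one raw u8 without expecting a type tag first.
        Deserialize::deserialize(self).map(Some)
    }

    // ...
}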

impl<'a, R: Read> de::Deserializer for BinarySeqVisitor<'a, R> {
    type Error = Error;

    fn deserialize<V>(&mut self, mut visitor: V) -> Result<V::Value>
        where V: Visitor
    {
        // Every element of a BERT binary is a bare byte, so hand it
        // straight to the visitor as a u8.
        visitor.visit_u8(try!(self.de.read_u8()))
    }

    forward_to_deserialize! {
        bool usize u8 u16 u32 u64 isize i8 i16 i32 i64 f32 f64 char str string
        unit option seq seq_fixed_size bytes map unit_struct newtype_struct
        tuple_struct struct struct_field tuple enum ignored_any
    }
}

Then in the unit test:

let binary: Vec<u8> = binary_to_term(&data).unwrap();
assert_eq!(b"value", binary.as_slice());
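The counting logic in BinarySeqVisitor (decrement the length per element, stop at zero, read one raw u8 per element) can also be exercised on its own. The sketch below is a std-only analogue, not Serde code: `RawByteSeq` is a hypothetical name, and it implements `Iterator` rather than `de::SeqVisitor`, so read errors surface as `Err` items.

```rust
use std::io::{self, Read};

/// Hypothetical analogue of BinarySeqVisitor: yields exactly `remaining`
/// raw bytes from the reader, with no per-element tag.
pub struct RawByteSeq<'a, R: Read> {
    reader: &'a mut R,
    remaining: usize,
}

impl<'a, R: Read> RawByteSeq<'a, R> {
    pub fn new(reader: &'a mut R, length: usize) -> Self {
        RawByteSeq { reader, remaining: length }
    }
}

impl<'a, R: Read> Iterator for RawByteSeq<'a, R> {
    type Item = io::Result<u8>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.remaining == 0 {
            return None; // corresponds to `Some(0) => return Ok(None)`
        }
        self.remaining -= 1;
        let mut byte = [0u8; 1];
        Some(self.reader.read_exact(&mut byte).map(|_| byte[0]))
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // The length prefix gives an exact count of remaining elements.
        (self.remaining, Some(self.remaining))
    }
}
```

Any bytes left in the reader after the last element are exactly what `Deserializer::end` reports as `TrailingBytes`.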

Side note: you can get rid of the impl de::Visitor for BinarySeqVisitor<'a, R> because nothing uses it.

Thanks @oli-obk and @dtolnay for helping me out! It works now 🙂
