Was not sure if I should file this against serde_json or here. As far as I can tell the bug is more likely in serde itself. While implementing #1179 I noticed that non string keys are not supported for flattening. However the same behavior happens if buffering happens for internally tagged enums:
Example that shows the problem:
#[macro_use]
extern crate serde_derive;
extern crate serde;
extern crate serde_json;
use std::collections::HashMap;
#[derive(Deserialize, Debug)]
#[serde(tag = "type")]
enum E {
M(HashMap<u32, ()>),
}
fn main() {
let j = r#" {"type": "M", "1": null} "#;
println!("{}", serde_json::from_str::<E>(j).unwrap_err());
}
This currently fails with this error:
invalid type: string "1", expected u32
However a HashMap<u32, ()> is otherwise entirely permissible by serde_json.
There are some other JSON-specific assumptions baked into Content too. For example the representation of Serde enums happens to be what serde_json uses but not necessarily shared by other formats.
I suspect we will need a format-specific optional API on Deserializer for buffering data in a way that is consistent for that format. The default implementation would basically be equivalent to today's Content but formats could override it to use their own representation. Even serde_json may want to switch and use something like serde_json::Value for buffering rather than Content.
We should start by identifying all the ways Content is used in our generated code and generalize those into an API that can be provided by the deserializer.
This is complicated by associated type defaults being unstable https://github.com/rust-lang/rust/issues/29661. I thought we might want a format-specific buffer type as a Deserializer associated type with the default being Content, but we will need to find a different way.
Here is a backdoor way to add an associated type.
struct Content;
trait Deserializer {
// Suppose we want something like this but associated type defaults are unstable.
type Buffer = Content;
}
trait Deserializer {
// Instead write it as an associated function that forwards to a generic callback.
fn deserialize_with_buffering<F>(callback: F) -> F::Value
where
F: DeserializeWithBuffer,
{
callback.run::<Content>()
}
}
trait DeserializeWithBuffer {
type Value;
fn run<Buffer>(self) -> Self::Value;
}
Is there any update on this?
Customizing the buffer type by Deserializer could provide a powerful alternative to #1327. The key constraint being satisfied in that approach (as compared to just stashing state in some thread local) was to accurately expose state during deserialization of untagged, internally tagged, and flattened members.
If a Deserializer can replace the buffer type, a library could provide a stateful Deserializer built on TLS by wrapping any existing Deserializer and wrapping that Deserializer's buffer type to correctly track the state.
I implemented a proof of concept in #1354.
Let's give this another attempt as associated type defaults (rust-lang/rust#29661) approaches stabilization. I believe the workaround without associated type defaults in #1354 could have been made to work but would have been too complicated to maintain.
Is it possible that this issue explains the behavior described in nox/serde_urlencoded#66? In that case, the author is trying to deserialize the key-value pair bar=123 into a variant B { bar: i32 } in an untagged enum. It seems like this should work. Deserialization fails because "data did not match any variant of untagged enum Q2". When using a similar enum Q1 where the variant is B { bar: String }, it works as expected, with bar = "123".
Yeah that looks like a consequence of this issue. #[serde(with = "serde_with::rust::display_fromstr")] on the field (https://docs.rs/serde_with/1.4.0/serde_with/rust/display_fromstr/index.html) would work around it in that case but obviously is not ideal.