We just saw this in #11118 and #11446: it looks like string_list is being used to record single values that are bools.
This makes it hard for tools to understand the intent and pushes the burden of parsing the data (which is now string data) onto the analysis side, increasing both the cost and the likelihood of ending up with bad data.
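To make the parsing burden concrete, here is a minimal sketch of what the analysis side would need in order to recover a boolean from such a value. The function name and the accepted spellings are assumptions made for illustration; none of this comes from the existing code or tooling.

```python
from typing import List, Optional


def bool_from_string_list(values: List[str]) -> Optional[bool]:
    """Try to recover a boolean from a single-element list of strings.

    Hypothetical helper: returns None whenever the payload is ambiguous,
    which is exactly the kind of bad data a native boolean type avoids.
    """
    # A boolean encoded as a list should hold exactly one element.
    if len(values) != 1:
        return None
    text = values[0].strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    # Any other spelling ("yes", "1", "") cannot be trusted.
    return None
```

Every consumer of the data would need to agree on rules like these, whereas a boolean metric needs no decoding at all.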
The code states the following:
// We purposefully make all of our preferences the string_list format to make data analysis
// simpler. While it makes things like booleans a bit more complicated, it means all our
// preferences can be analyzed with the same dashboard and compared.
Setting aside the fact that this will break GLAM, how does a list of strings holding a single value of the wrong type make analysis simpler?
cc @sblatz looks like this was a recent addition - do you know why we did this?
We had a conversation over Slack with @sblatz and Marissa Gorlick. Let me try to capture what was said, for future reference.
The decision to use string_list was made in order to make all "data types the same" to ease future dashboards, in response to Marissa's request for standardization.
While the intent was noble, we all agreed that this was not the right path to pursue, as "standardization" does not necessarily imply "all the data is sent as strings", which is a problem for both tools and storage.
The path forward here is:
- boolean type
- string type
- string_list type

While this is a very important change to make, we're in code freeze right now, so it can't happen right away.
Any standardization can happen on the analysis side, if needed, through UDFs.
I'm happy to provide support with reviews as needed!
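As a rough illustration of standardizing on the analysis side, the sketch below keeps the recorded values in their native types and only renders them to a common string form when a dashboard needs it. In practice this logic would live in a UDF in the data warehouse; Python is used here only to show the idea, and the function name and output format are assumptions.

```python
from typing import List, Union

# The three value shapes discussed above: boolean, string, string_list.
PreferenceValue = Union[bool, str, List[str]]


def to_display_string(value: PreferenceValue) -> str:
    """Render a typed preference value as one comparable string (hypothetical)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        return value
    # Remaining case: a list of strings, joined for display only.
    return ",".join(value)
```

The conversion is lossy by design, so it belongs at query time rather than at collection time, where the original type would be thrown away.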
cc @mdboom