PowerShell: Make Export-Csv and ConvertTo-Csv support hashtables (dictionaries)

Created on 6 Nov 2019 · 18 Comments · Source: PowerShell/PowerShell

Summary of the new feature/enhancement

In many contexts in PowerShell, custom objects and hashtables can conveniently be used interchangeably, such as in JSON serialization (ConvertTo-Json).

However, Export-Csv and ConvertTo-Csv currently do not support dictionaries ((ordered) hashtables, IDictionary instances) meaningfully: they serialize the dictionary _itself_.

Making these cmdlets serialize the key-value pairs, analogous to property-name-value pairs in [pscustomobject] input would be helpful.

# OK - custom object input
[pscustomobject] @{ prop=1 } | ConvertTo-Csv | Should -Be '"prop"', '"1"'

# Currently unsupported: hashtable input
@{ prop=1 } | ConvertTo-Csv | Should -Be '"prop"', '"1"'

The latter test fails, indicating the currently useless serialization of hashtables:

Expected @('"prop"', '"1"'), but got
@('"IsReadOnly","IsFixedSize","IsSynchronized","Keys","Values","SyncRoot","Count"',
 '"False","False","False","System.Collections.Hashtable+KeyCollection","System.Collections.Hashtable+ValueCollection","System.Collections.Hashtable","1"'

Proposed technical implementation details (optional)

Make Export-Csv and ConvertTo-Csv detect IDictionary input and serialize its key-value pairs instead of the dictionary object itself.
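
Until the cmdlets do this themselves, the idea can be approximated in script. The following is only a sketch of the detect-and-convert step, not the cmdlets' actual implementation: ConvertFrom-Dictionary is a hypothetical helper name, and non-string keys are assumed to be acceptable as stringified column names.

```powershell
# Hypothetical helper (not part of PowerShell): converts IDictionary input to
# custom objects so that ConvertTo-Csv serializes the entries rather than the
# dictionary object itself. Other input passes through unchanged.
function ConvertFrom-Dictionary {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $InputObject
    )
    process {
        if ($InputObject -is [System.Collections.IDictionary]) {
            # Copy the entries in enumeration order; keys are stringified
            # because CSV column names are strings (an assumption of this sketch).
            $ordered = [ordered] @{}
            foreach ($entry in $InputObject.GetEnumerator()) {
                $name = [string] $entry.Key
                $ordered[$name] = $entry.Value
            }
            [pscustomobject] $ordered
        }
        else {
            $InputObject
        }
    }
}

# -NoTypeInformation keeps Windows PowerShell from emitting a #TYPE line.
$csv = @{ prop = 1 } | ConvertFrom-Dictionary | ConvertTo-Csv -NoTypeInformation
$csv   # "prop" and "1", on separate lines
```

Note that a plain hashtable does not guarantee key order, so with more than one key the column order is only predictable for [ordered] input.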

Area-Cmdlets-Utility Issue-Enhancement

All 18 comments

Related #8855 (can we move it to the issue too?)

We need to add a new switch to avoid a breaking change.
There is also a question about Collection and related interfaces.

Current behaviour for ConvertTo-Csv results in data that is essentially useless:

PS>  $data = 1..10 | % { @{ Number = $_ } }
PS>  $data

Name                           Value
----                           -----
Number                         1
Number                         2
Number                         3
Number                         4
Number                         5
Number                         6
Number                         7
Number                         8
Number                         9
Number                         10

PS>  $data | convertto-csv
"IsReadOnly","IsFixedSize","IsSynchronized","Keys","Values","SyncRoot","Count"
"False","False","False","System.Collections.Hashtable+KeyCollection","System.Collections.Hashtable+ValueCollection","System.Collections.Hashtable","1"
"False","False","False","System.Collections.Hashtable+KeyCollection","System.Collections.Hashtable+ValueCollection","System.Collections.Hashtable","1"
"False","False","False","System.Collections.Hashtable+KeyCollection","System.Collections.Hashtable+ValueCollection","System.Collections.Hashtable","1"
"False","False","False","System.Collections.Hashtable+KeyCollection","System.Collections.Hashtable+ValueCollection","System.Collections.Hashtable","1"
"False","False","False","System.Collections.Hashtable+KeyCollection","System.Collections.Hashtable+ValueCollection","System.Collections.Hashtable","1"
"False","False","False","System.Collections.Hashtable+KeyCollection","System.Collections.Hashtable+ValueCollection","System.Collections.Hashtable","1"
"False","False","False","System.Collections.Hashtable+KeyCollection","System.Collections.Hashtable+ValueCollection","System.Collections.Hashtable","1"
"False","False","False","System.Collections.Hashtable+KeyCollection","System.Collections.Hashtable+ValueCollection","System.Collections.Hashtable","1"
"False","False","False","System.Collections.Hashtable+KeyCollection","System.Collections.Hashtable+ValueCollection","System.Collections.Hashtable","1"
"False","False","False","System.Collections.Hashtable+KeyCollection","System.Collections.Hashtable+ValueCollection","System.Collections.Hashtable","1"

Is there any reason someone would rely on this behaviour?

@vexx32 I think the design is a PowerShell generalization of psobject serialization/deserialization, as opposed to the data serialization/deserialization that we are discussing in this issue.

@iSazonov:

If there's a way to support this as part of a more generalized feature that may even improve performance for the [pscustomobject] case, then all the better.

I haven't looked at IDataView yet, and I don't know how realistic near-term use in PowerShell is - by contrast, enabling support for IDictionary specifically seems like a pretty quick enhancement to make (that could still benefit from later under-the-hood optimizations, as long as the behavior doesn't change).

Either way, I agree with @vexx32 that there's no backward-compatibility concern here and therefore no need for a new switch.

I think we should also consider adding a switch to ConvertFrom-Csv and Import-Csv in order to import the data -AsHashtable for symmetry.
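
As a rough illustration of what such an import could return: the -AsHashtable switch is only a suggestion in this thread, so the sketch below emulates the idea with existing cmdlets by copying each imported row's properties into an ordered hashtable. The sample CSV lines are made up for the example.

```powershell
# Sample CSV input (made up for this sketch).
$csvLines = '"one","two"', '"1","2"', '"3","4"'

# Emulate the suggested -AsHashtable behavior: one ordered hashtable per row.
$rows = @(
    $csvLines | ConvertFrom-Csv | ForEach-Object {
        $table = [ordered] @{}
        foreach ($property in $_.psobject.Properties) {
            $table[$property.Name] = $property.Value
        }
        $table   # dictionaries are not enumerated in the pipeline, so each row stays intact
    }
)

$rows[0]['one']   # the string '1': CSV import yields string values
```

As with the existing cmdlets, all values come back as strings, because CSV carries no type information for individual fields.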

Either way, I agree with @vexx32 that there's no backward-compatibility concern here and therefore no need for a new switch.

It is not clear whether we want to change the output:

Expected @('"prop"', '"1"'), but got
@('"IsReadOnly","IsFixedSize","IsSynchronized","Keys","Values","SyncRoot","Count"',
 '"False","False","False","System.Collections.Hashtable+KeyCollection","System.Collections.Hashtable+ValueCollection","System.Collections.Hashtable","1"'

Great idea, @vexx32 - see #11027.

@iSazonov:

To me, it's quite obvious that no one would rely on this output - the _only_ piece of information remotely of interest in this output that is _specific to the input object_ is the _entry count_ (column "Count") - and for that you obviously don't need CSV output.

I would agree that the current output is not useful and thus very unlikely to break someone.

If we want to address more advanced scenarios, we could design new Import/Export-TabularData cmdlets, where "Data" better reflects the focus on data processing.

I don't see a need to divide features into a new cmdlet at the moment. We're not spinning up impromptu SQL servers to process data, we're just adding a sensible input/output type. The addition isn't particularly significant, in my opinion, and shouldn't warrant additional commands.

What is the alternative for serializing/deserializing PowerShell objects?

@iSazonov, what do you mean?

#10916 This comes up from time to time. Users want to save an object and then restore it as a native object. It's worth enhancing our engine to support this, including for Export/Import-Csv, which is what the original design is for.

Also I still don't understand (see my comment above) why we only consider IDictionary if there are IList, IEnumerable, ICollection.

We're definitely _not_ talking about a general serialization feature here.

(The latter is what Export-Clixml is for; enhancing that to support type-faithful deserialization for more than the handful of currently supported well-known types would be great, but also sounds challenging. #10916, if I understand it correctly, actually proposes something different, which sounds even more challenging: it is not asking for type-faithful deserialization - categorical support for which is fundamentally impossible - but for proxy methods that call back to the remoting endpoint.)

why we only consider IDictionary if there are IList, IEnumerable, ICollection.

We're considering supporting IDictionary as a collection _element_ type, not as a _collection_ type - in the same way that ConvertTo-Json already does.

That is, the proposal is not only to support collections (enumerables, lists) _whose elements are_ [pscustomobject] instances, but also those _whose elements_ are IDictionary instances.

[pscustomobject] instances are primarily "property bags", and IDictionary instances (at least with string-typed keys) are conceptually related and, in practice, are sometimes used interchangeably - each type has its pros and cons, but, fundamentally, they are both a (possibly ordered) collection of key-value pairs.

To give a concrete example: With this proposal implemented, the following two commands will yield the same result:

# Collection of *custom objects*
[pscustomobject] @{ one = 1; two = 2 }, [pscustomobject] @{ one = 1; two = 2 } | ConvertTo-Csv

# Conceptually equivalent collection of *hash tables*
@{ one = 1; two = 2 }, @{ one = 1; two = 2 } | ConvertTo-Csv

That is, both commands would output:

"one","two"
"1","2"
"1","2"
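
For comparison, the existing ConvertTo-Json behavior that the proposal mirrors can be checked with a short sketch like the following. The variable names are only illustrative; a single key and [ordered] input are used because a plain hashtable does not guarantee key order.

```powershell
# ConvertTo-Json already serializes a dictionary's entries like an object's properties.
$fromObject    = [pscustomobject] @{ one = 1 } | ConvertTo-Json -Compress
$fromHashtable = @{ one = 1 } | ConvertTo-Json -Compress

$fromObject -eq $fromHashtable   # True: both are {"one":1}

# With several keys, [ordered] keeps the definition order predictable.
$fromOrdered = [ordered] @{ one = 1; two = 2 } | ConvertTo-Json -Compress
$fromOrdered   # {"one":1,"two":2}
```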

That is, the proposal is not only to support collections (enumerables, lists) whose elements are [pscustomobject] instances, but also those whose elements are IDictionary instances.

:-) Thanks for my education. I see your point.

My concern was about the following scenario:

Get-Date | Export-Csv c:\tmp\q.txt -IncludeTypeInformation
$a=Import-Csv C:\tmp\q.txt
$a.psobject
$a

While Export/Import-CliXml is universal, Export/Import-Csv give great UX and better performance for the special, tabular case, and I wouldn't want to lose this. Sorry that I was not accurate enough. I mistakenly thought that -IncludeTypeInformation was the default; it was in Windows PowerShell, but in Core it was changed (by me?! :upside_down_face:)

I would like to work on this one if it's available.

@ivanshen apologies, I forgot to make a note here; I submitted #11029 to add this functionality already. 🙂
