Powershell: Add -UnifyProperties parameter to Select-Object

Created on 27 Oct 2020 · 23 Comments · Source: PowerShell/PowerShell

Not sure if -UnifyProperties is the correct name for the parameter purpose described below.
Anyway, it would be nice to have an easy and _standard_ way to resolve the (common) issue where properties aren't displayed or picked up by the next cmdlet because the first object in the pipeline doesn't contain all the properties of the objects that follow.

See, for example, Not all properties displayed. I believe a new StackOverflow question with the same cause just came in: Trying to get all Teams with their owners, members and guest in a CSV using Powershell.

Proposed technical implementation details

Select-Object -UnifyProperties should behave similarly to the proposed Union-Object function (v0.2.1) described in the Not all properties displayed answer. This means it is expected to stall the pipeline to collect the property names of all objects in the pipeline and then use them as the property selection.
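
A minimal sketch of that behavior as an advanced function, for illustration only. The name Select-UnifiedProperty is hypothetical and this is not how the proposed switch would necessarily be implemented; it just shows the collect-then-select pattern described above, using only existing cmdlets:

```powershell
# Hypothetical helper, not part of PowerShell: emulates the proposed switch.
function Select-UnifiedProperty {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [psobject] $InputObject
    )
    begin {
        # Buffer for all input objects; output is deferred until the end block.
        $buffer = [System.Collections.Generic.List[psobject]]::new()
        # Property names in first-seen order, de-duplicated case-insensitively.
        $seen   = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)
        $names  = [System.Collections.Generic.List[string]]::new()
    }
    process {
        $buffer.Add($InputObject)
        foreach ($prop in $InputObject.psobject.Properties) {
            if ($seen.Add($prop.Name)) { $names.Add($prop.Name) }
        }
    }
    end {
        # Select the union of names on every object; absent ones become $null.
        if ($buffer.Count -gt 0) {
            $buffer | Select-Object -Property $names
        }
    }
}

[pscustomobject]@{name='joe';address='home'}, [pscustomobject]@{phone='1'} |
    Select-UnifiedProperty
```

Because nothing is emitted until the end block runs, memory use grows with the number of input objects, which is the trade-off discussed further down in the thread.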

Related

With the proposal in place, the properties in the issues below could be obtained by simply piping the objects to | Select-Object -UnifyProperties:

  • format-table should at least warn when it doesn't display properties #7871
[pscustomobject]@{name='joe';address='home'}, [pscustomobject]@{phone='1'} | Select-Object -UnifyProperties
name address phone
---- ------- -----
joe  home
             1
  • Cannot display two or more variables successively when "Select-Object" used #12825
$Adapters, $IPAddress | Select-Object -UnifyProperties
$printers, $process | Select-Object -UnifyProperties
Issue-Enhancement

All 23 comments

-Union seems to imply objects are joined / data is added or shared in some fashion, when in reality you're not really adding data to any objects, just padding out extra properties so the display format is clear.

I do agree it would be helpful for Select-Object to have this kind of functionality, but I don't think the parameter name communicates the intent very well.

Agreed. I am not a native English speaker and have no clue what the correct name should be. Any suggestions? -UniteProperties? -AlignObjects?
(I will change the title and content of this issue accordingly)

I'm not sure myself really. -StandardizeProperties would _maybe_ be ok but it feels like there should be a simpler term for it that I can't find at the moment.

I have changed the title/content using -UniteProperties for now as -Union is definitely wrong and confusing.

Isn't this all about Join-Object?

Typically I see Join-Object mentioned where you have two discrete sets of objects that you want to combine together based on a shared property/properties. My understanding of this suggestion (correct me if I'm wrong, please, @iRon7) is that we're mainly concerned with ensuring all the objects have the same property names, rather than merging related objects?

Correct, it is _not_ something like Join-Object, it is what @vexx32 describes: ensuring all the objects have the same property names

Another option would be to expose this proposed feature through a special (bogus) wildcard property (e.g. a double asterisk: -Property **):

# Wishful thinking...
[pscustomobject]@{name='joe';address='home'}, [pscustomobject]@{phone='1'} | Select-Object **

name address phone
---- ------- -----
joe  home
             1

@iRon7 your example is exactly what any implementation of join-object does including my own. So I vote for a new native cmdlet join-object.

Does it? Most Join-Object implementations I've seen would result in a single object. @iRon7's example is still two objects.

A single object is the result of a merge. A join just adds one object to another. That is my understanding.

That's not what's happening here, either? Neither object is being added to the other. Just ensuring that all the objects have the same set of property names.

I like the idea in general, but we need to distinguish between whether this is for _display formatting_ or really about _creating objects_ that have the union of all properties across all input objects.

If this is about display formatting, then Select-Object isn't the right cmdlet to extend - Format-Table would be, perhaps with an -AllProperties switch:

Here's a quick prototype:

function Format-TableAllProperties {
  [System.Collections.Generic.List[string]] $propNames = @()
  [System.Collections.Generic.HashSet[string]] $hashSet = @()
  $inputCollected = @($input)
  $inputCollected.ForEach({ 
    foreach ($name in $_.psobject.Properties.Name) {
      if ($hashSet.Add($name)) { $propNames.Add($name) }
    }
  })
  $inputCollected | Format-Table $propNames
}
PS> [pscustomobject] @{ one = 1; two = 2; three = 3 }, [pscustomobject] @{ one = 10; three = 30; four = 4 } | 
      Format-TableAllProperties

one two three four
--- --- ----- ----
  1   2     3 
 10        30 4

@dfinke had to put Update-FirstObjectProperties into ImportExcel for exactly this reason. There must be a stack of places which need it.

Whether adding it to Select-object is better than having a self contained command can be argued both ways.
That function could be implemented in a proxy command wrapping Select-Object, but I've grown to expect it to be its own command, so that's my bias. I also think Select-Object is one of those widely used cmdlets that people might prefer to leave alone.

So it sounds like _both_ a for-display and an extend-actual-objects solution may be desirable.

(I suspect you're aware of it, @jhoneill, and the function name suggests it, but just to make it explicit: the linked function uses only the _first_ input object as the source for the set of properties whose presence should be ensured on all subsequent ones).

@iRon7, can we get clarity on which one you were looking for - the examples in the OP suggest the former - and perhaps create a _separate_ issue for the respective other, or at least clearly distinguish these use cases?

@mklement0,

for _display formatting_ or really about _creating objects_

It is about creating objects (at least from my point of view). I had something more like this in mind:

function UniteProperties {                               # Select-Object -UniteProperties
  [System.Collections.Generic.List[string]] $propNames = @()
  [System.Collections.Generic.HashSet[string]] $hashSet = @()
  $inputCollected = @($input)
  $inputCollected.ForEach({ 
    foreach ($name in $_.psobject.Properties.Name) {
      if ($hashSet.Add($name)) { $propNames.Add($name) }
    }
  })
  $inputCollected | Select-Object $propNames
}

Current situation:

[pscustomobject] @{ one = 1; two = 2; three = 3 },
[pscustomobject] @{ one = 10; three = 30; four = 4 } |
    ConvertTo-Csv

"one","two","three"
"1","2","3"
"10",,"30"

Future situation:

[pscustomobject] @{ one = 1; two = 2; three = 3 },
[pscustomobject] @{ one = 10; three = 30; four = 4 } |
    UniteProperties | ConvertTo-Csv              # Select-Object -UniteProperties | ConvertTo-Csv

"one","two","three","four"
"1","2","3",
"10",,"30","4"

(I suspect you're aware of it, @jhoneill, and the function name suggests it, but just to make it explicit: the linked function uses only the _first_ input object as the source for the set of properties whose presence should be ensured on all subsequent ones).

Yes, it should be on all subsequent objects, because if you correct only the first object and then ... | Sort-Object | ..., it might lose some properties again (see the description in the Not all properties displayed answer).
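
To make the Sort-Object point concrete, here is a small sketch that pads only the first object and then reorders the pipeline. The variable names are illustrative, and -NoTypeInformation is added only so the output looks the same on Windows PowerShell and PowerShell 7:

```powershell
$a = [pscustomobject] @{ one = 1;  two = 2; three = 3 }
$b = [pscustomobject] @{ one = 10; three = 30; four = 4 }

# Pad only the first object with the property it is missing.
$a.psobject.Properties.Add([psnoteproperty]::new('four', $null))

# In the original order, the first object carries every column.
$a, $b | ConvertTo-Csv -NoTypeInformation

# After sorting, $b comes first and dictates the columns, so 'two' is dropped.
$a, $b | Sort-Object one -Descending | ConvertTo-Csv -NoTypeInformation
```

The second CSV has no "two" column, because ConvertTo-Csv takes its header from whichever object arrives first.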

Thanks for clarifying, @iRon7.

As for the parameter name, maybe -UnifyProperties is better?

As for a potential separate cmdlet instead: I struggle to even think of a good name, because there is no fitting approved verb that I see (there's some conceptual similarity to Add-Member, but adding it to that is worse, I think).

I think adding this to Select-Object is a good fit in terms of user expectations, even though the collect-all-input-up-front behavior is a departure, but addressing that through the help should suffice.

However, I don't know if there are implementation challenges, and we would need to decide whether we want to allow combining the new switch with any of the existing sub-selection functionality (-First, -Skip, -Unique, ...).

(I suspect you're aware of it, @jhoneill, and the function name suggests it, but just to make it explicit: the linked function uses only the _first_ input object as the source for the set of properties whose presence should be ensured on all subsequent ones).

There are two situations. One is where the first object determines what will be displayed / exported, and the other objects can safely be left with a subset of the fields.
Someone wedded to strict mode might hit problems with properties missing, so I wouldn't rule out ensuring presence, but

  • Adding many properties is slow compared to checking for their presence.
  • A piece of code which assumes properties are present is also likely to assume that one is a string, one is an integer, one is a Boolean etc., so the properties must be typed correctly.
  • If the type is bool or int and not the nullable version, one can't expect the receiving software to accept null.
  • Giving fields a value (zero, false, empty string) changes the data from "Not known" to "known and zero/false/empty", which can have unpredictable consequences.

Like I say, I wouldn't argue there's no need, but this is a dangerous thing to do. When you want to ensure a _known_ set of properties are all present on all objects, you just run through Select -property a,b,...,y,z .
The use case is (surely?) that the properties can't be known in advance, but you want to see all of them where they exist. Not that you want to fill in missing columns in each row with fictional values.
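
For the known-set case mentioned above, a short sketch of what Select-Object does today with an explicit property list. It pads with $null rather than a made-up value, and the psobject.Properties lookup still tells "padded" apart from "never there". The variable names are illustrative:

```powershell
$obj = [pscustomobject] @{ one = 10; three = 30 }

# Select-Object adds the missing property 'two' with a $null value;
# it does not substitute zero, false, or an empty string.
$padded = $obj | Select-Object -Property one, two, three

$paddedValueIsNull = $null -eq $padded.two
$paddedHasTwo      = $null -ne $padded.psobject.Properties['two']
$originalHasTwo    = $null -ne $obj.psobject.Properties['two']

"Padded value is null:  $paddedValueIsNull"
"Padded object has two: $paddedHasTwo"
"Original has two:      $originalHasTwo"
```

Whether $null is acceptable to a downstream consumer that expects a non-nullable bool or int is a separate question, as the bullets above point out.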

It is about _creating objects_ (at least from my point of view). I had something more like this in mind:

function UniteProperties {                               # Select-Object -UniteProperties
  [System.Collections.Generic.List[string]] $propNames = @()
  [System.Collections.Generic.HashSet[string]] $hashSet = @()
  $inputCollected = @($input)
  $inputCollected.ForEach({ 
    foreach ($name in $_.psobject.Properties.Name) {
      if ($hashSet.Add($name)) { $propNames.Add($name) }
    }
  })
  $inputCollected | Select-Object $propNames
}

Or more simply, and more PowerShell-styled:

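function UniteProperties {
    # [ordered] keeps the property names in first-seen order;
    # a plain @{} would not guarantee any particular column order
    $hash = [ordered]@{}
    $i = @($input)
    foreach ($obj in $i) { foreach ($p in $obj.psobject.Properties) { $hash[$p.Name] = $true } }
    $i | Select-Object ($hash.Keys | ForEach-Object ToString)
}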

And test with

$y [pscustomobject] @{ one = 1; two = 2; three = 3 },
[pscustomobject] @{ one = 10; three = 30; four = 4 } |
    UniteProperties 

$y | convertto-csv

$y[1].four.gettype()

gm -in $y[0] 

$y[0].four.gettype()

Have a look at what the last three lines do. That's part of what I was trying to explain to @mklement0 and probably didn't make sense. That may be what you want ...

@jhoneill,
I guess you meant: $y = [pscustomobject] ...
In that case, all the examples behave as I would expect:

$y | convertto-csv
All objects are converted to CSV format, where every property (independent of its type) is wrapped in double quotes, except for $Null, which is left empty:

"one","two","three","four"
"1","2","3",
"10",,"30","4"

$y[1].four.gettype()
The property of the second object ([1]) is set to an integer (four = 4):

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Int32                                    System.ValueType

gm -in $y[0]
I am not familiar with this syntax, but assume it is similar to $y[0] | gm.
The properties one, two, and three are set to an integer (as above) and the property four is null (Get-Member shows a general object type in the Definition column, which is unrelated to this proposal):

Name        MemberType   Definition
----        ----------   ----------
Equals      Method       bool Equals(System.Object obj)
GetHashCode Method       int GetHashCode()
GetType     Method       type GetType()
ToString    Method       string ToString()
four        NoteProperty object four=null
one         NoteProperty int one=1
three       NoteProperty int three=3
two         NoteProperty int two=2

$y[0].four.gettype()
This results in an error because the value is not supplied ($Null), similar to $Null.GetType():

InvalidOperation: You cannot call a method on a null-valued expression.

The result will be exactly the same if you manually define the properties for Select-Object:

$y = [pscustomobject] @{ one = 1; two = 2; three = 3 },
[pscustomobject] @{ one = 10; three = 30; four = 4 } |
    UniteProperties | Select-Object one, two, three, four

And still similar to just:

$z = [pscustomobject] @{ one = 1; two = 2; three = 3 },
[pscustomobject] @{ one = 10; three = 30; four = 4 }

Where the _expected_ difference is:

$y[0] | gm                                                        | $z[0] | gm
                                                                  | 
   TypeName: Selected.System.Management.Automation.PSCustomObject |    TypeName: System.Management.Automation.PSCustomObject
                                                                  | 
Name        MemberType   Definition                               | Name        MemberType   Definition
----        ----------   ----------                               | ----        ----------   ----------
Equals      Method       bool Equals(System.Object obj)           | Equals      Method       bool Equals(System.Object obj)
GetHashCode Method       int GetHashCode()                        | GetHashCode Method       int GetHashCode()
GetType     Method       type GetType()                           | GetType     Method       type GetType()
ToString    Method       string ToString()                        | ToString    Method       string ToString()
four        NoteProperty object four=null                         | one         NoteProperty int one=1
one         NoteProperty int one=1                                | three       NoteProperty int three=3
three       NoteProperty int three=3                              | two         NoteProperty int two=2
two         NoteProperty int two=2                                | 

In the current situation ($z[0]), the property four is missing, whereas with ... | Select-Object -UnifyProperties ($y[0]), the property four is set to $Null.

By default the result will be the same: both $z[0].four and $y[0].four eventually result in $Null.
There will be a difference under Set-StrictMode -Version Latest, where $y[0].four will conveniently return $Null and $z[0].four will return an error:

PropertyNotFoundException: The property 'four' cannot be found on this object. Verify that the property exists.
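
The strict-mode difference can be reproduced today with an explicit property list standing in for the proposed switch. This is a sketch only: $y is built with Select-Object one, two, three, four rather than the hypothetical -UnifyProperties, and the exact error text may vary between PowerShell versions:

```powershell
Set-StrictMode -Version Latest

$z = [pscustomobject] @{ one = 1; two = 2; three = 3 },
     [pscustomobject] @{ one = 10; three = 30; four = 4 }

# Stand-in for the proposed switch: name the unified property set by hand.
$y = $z | Select-Object -Property one, two, three, four

# Padded property: evaluates to $null without raising an error.
$y[0].four

try {
    # Missing property: raises a terminating error under strict mode.
    $z[0].four
}
catch {
    "Caught: $($_.Exception.Message)"
}

Set-StrictMode -Off
```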

This is yet another reason to apply the unification to the properties of all objects and not just those of the first one.

$y = [pscustomobject] @{ one = 1; two = 2; three = 3 },
     [pscustomobject] @{ one = 4; two = 5; three = 6 },
     [pscustomobject] @{ one = 10; three = 30; four = 4 } |
         UniteProperties | ConvertTo-Csv

(It will be inconsistent if $y[0].four and $y[1].four behave differently in certain StrictModes)

@mklement0,

As for the parameter name, maybe -UnifyProperties is better?

I am fine with this, although I did have some more thoughts about it: rather than choosing a _verb_ for a parameter, rely on the cmdlet's verb, which results in something like this: ... | Select -EveryProperty or simply ... | Select -AllProperties

Which also shows why I think it should be a feature of Select-Object. Besides, Select-Object already produces similar output if you define the properties yourself, like ... | Select-Object one, two, three, four (this is also supported by the fact that the prototype already uses the Select-Object cmdlet); it just needs to automatically figure out which properties are used.

even though the collect-all-input-up-front behavior is a departure

Some of the existing parameters of the Select-Object cmdlet already do this, such as -Last <int> (e.g. -Last 1000 for fewer objects) and (unexpectedly, see #11221) -Unique.

@iRon7:

My concern about -AllProperties and -EveryProperty is that it sounds like Select-Object * and doesn't express the aspect of _unifying_ (making _uniform_) the set of properties across all objects. In general, it is not uncommon for parameter names to use verbs (-Wait, -Skip, -Force, ...), if that is your concern.

And to be clear: I'm fine with adding this to Select-Object, and the collect-all-input-first is just a matter of documenting it properly.

If -Last really currently collects all input up front, it amounts to an inefficient implementation that should be changed in favor of a queue of the specified length, so that only the most recent N input objects are retained on an ongoing basis.
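
A rough sketch of the fixed-length queue idea, assuming a hypothetical function named Select-LastN. It illustrates the technique only and says nothing about how Select-Object -Last is actually implemented:

```powershell
# Hypothetical helper: keeps only the most recent $Count input objects.
function Select-LastN {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $InputObject,

        [Parameter(Mandatory, Position = 0)]
        [ValidateRange(1, [int]::MaxValue)]
        [int] $Count
    )
    begin {
        $queue = [System.Collections.Generic.Queue[object]]::new()
    }
    process {
        # Once the queue is full, drop the oldest item before adding the new one,
        # so memory use is bounded by $Count rather than by the input size.
        if ($queue.Count -eq $Count) { $null = $queue.Dequeue() }
        $queue.Enqueue($InputObject)
    }
    end {
        # Emit the retained items, oldest first.
        $queue
    }
}

1..10 | Select-LastN 3
```

Output still has to wait for the end block, but at no point are more than $Count objects held.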

Similarly, -Unique only needs to retain the _unique_ input objects - though, _depending on the input_, that may be all of them.
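
As an illustration of retaining only the unique items, here is a sketch with a hypothetical Select-FirstOccurrence function. It relies on .NET equality through a HashSet, so it is only meaningful for values such as strings and numbers, and it is not claimed to match the comparison rules of Select-Object -Unique:

```powershell
# Hypothetical helper: passes through the first occurrence of each value.
function Select-FirstOccurrence {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $InputObject
    )
    begin {
        # Only distinct values are retained, not the full input.
        $seen = [System.Collections.Generic.HashSet[object]]::new()
    }
    process {
        # HashSet.Add returns $true only for values not seen before.
        if ($seen.Add($InputObject)) { $InputObject }
    }
}

'a', 'b', 'a', 'c', 'b' | Select-FirstOccurrence
```

In this sketch each first occurrence is emitted as soon as it arrives, so the set of seen values is the only state kept; in the worst case (all input distinct) that is still the whole input.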

To put it differently: cmdlets that for conceptual reasons must defer pipeline output until their end block is processed may need to _look_ at all input objects first, but they don't have to necessarily _collect_ them _all_, whereas the feature we're discussing here does.
