PowerShell data/hashtable format conversion cmdlets

Created on 9 Dec 2019 · 20 comments · Source: PowerShell/PowerShell

Summary of the new feature/enhancement

Today, it's hard to manipulate PowerShell PSD1/hashtable syntax. Import-PowerShellDataFile provides some options, but it only goes one way (from PSD to objects), requires a file, and doesn't work with every file ending in .psd1 (module manifests are evaluated in "Restricted Language Mode", which allows things that ordinary data syntax does not).
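Without a file, the usual workaround today is to call the PowerShell parser directly. A minimal sketch, assuming PowerShell 5.0 or later (where `Ast.SafeGetValue()` is available); it is a workaround, not the proposed cmdlet:

```powershell
# Parse PSD (hashtable) syntax held in a string, without writing a file.
$text = @'
@{
    # Comments are part of the PSD syntax
    Name    = 'Example'
    Version = '1.0.0'
    Tags    = @('a', 'b')
    Enabled = $true
}
'@

$tokens = $null
$errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput($text, [ref]$tokens, [ref]$errors)
if ($errors.Count -gt 0) { throw "Parse failed: $($errors[0].Message)" }

# Locate the top-level hashtable literal (nested script blocks are not searched).
$hashtableAst = $ast.Find({
    param($node)
    $node -is [System.Management.Automation.Language.HashtableAst]
}, $false)

# SafeGetValue() only evaluates constant-like expressions and throws
# for anything dynamic, such as command invocations.
$data = $hashtableAst.SafeGetValue()
```

This covers the string case, but the reverse direction (objects to PSD text) still has no built-in equivalent.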

To make the PowerShell data syntax a first-class citizen within PowerShell, I think we would need:

  • ConvertTo- and ConvertFrom- cmdlets
  • Efficient and consistent [de]serialisation for hashtables, pscustomobjects and arbitrary objects in the usual way
  • Support for the extended restricted language mode syntaxes enabled by switches
  • An API available to do this without a cmdlet. One place this would help is the ModuleSpecification ToString() implementation, which today hand-writes the psd1 hashtable format.
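To see the hand-written format being referred to, a small sketch that prints `ModuleSpecification.ToString()` and reads the result back with the parser. It assumes PowerShell 5.0+ for `SafeGetValue()`; the exact string layout is an implementation detail and may differ between versions:

```powershell
# Build a module specification from a hashtable (an existing public constructor).
$spec = [Microsoft.PowerShell.Commands.ModuleSpecification]::new(@{
    ModuleName    = 'Pester'
    ModuleVersion = '5.0.0'
})

# ToString() emits hashtable-style text when a version is present.
$specText = $spec.ToString()
$specText

# Because the text is PSD syntax, the parser can read it back.
$tokens = $null
$errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput($specText, [ref]$tokens, [ref]$errors)
$roundTrip = $ast.Find({
    param($n)
    $n -is [System.Management.Automation.Language.HashtableAst]
}, $false).SafeGetValue()
```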

Proposed technical implementation details (optional)

I have a small prototype here, but I propose the following:

  • ConvertTo-Psd and ConvertFrom-Psd being the cmdlets
  • Following the standard JSON cmdlet conventions, with Depth and Compress or similar parameters
  • Having a switch for enabling restricted language mode parsing (which would require the full PowerShell parser), perhaps -WithFullParsing or similar. (Update: now that I understand this better, I don't think it fits well with this proposal; restricted language mode parsing is specific to module manifests and doesn't reflect broader usage of the PSD format.)
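For a sense of what the serialising half of the proposal involves, here is a minimal sketch under stated assumptions: the name `ConvertTo-PsdSketch` is hypothetical (chosen so it isn't mistaken for the proposed `ConvertTo-Psd`), its `-Level` parameter is internal recursion state, `-Depth`/`-Compress` only approximate the JSON cmdlets' behaviour, and unsupported types fall back to their string form:

```powershell
# Hypothetical sketch only: ConvertTo-PsdSketch is not a shipped cmdlet.
# Serialises dictionaries, PSCustomObjects, arrays, strings, integers,
# doubles, booleans and $null to PSD syntax.
function ConvertTo-PsdSketch {
    param(
        [object] $InputObject,
        [int] $Depth = 2,
        [switch] $Compress,
        [int] $Level = 0   # internal: current nesting level
    )

    # Quote a string as a single-quoted PSD literal ('' escapes ').
    $quote = { param($s) "'" + ($s -replace "'", "''") + "'" }

    if ($null -eq $InputObject) { return '$null' }
    if ($InputObject -is [bool]) { if ($InputObject) { return '$true' } else { return '$false' } }
    if ($InputObject -is [string]) { return & $quote $InputObject }
    if ($InputObject -is [int] -or $InputObject -is [long] -or $InputObject -is [double]) {
        return $InputObject.ToString([cultureinfo]::InvariantCulture)
    }
    # Past the requested depth, fall back to the object's string form.
    if ($Level -ge $Depth) { return & $quote $InputObject.ToString() }

    $nl      = if ($Compress) { '' } else { [Environment]::NewLine }
    $pad     = if ($Compress) { '' } else { '    ' * ($Level + 1) }
    $closing = if ($Compress) { '' } else { '    ' * $Level }

    # Treat PSCustomObjects as ordered dictionaries of their properties.
    if ($InputObject -is [System.Management.Automation.PSCustomObject]) {
        $dict = [ordered]@{}
        foreach ($p in $InputObject.PSObject.Properties) { $dict[$p.Name] = $p.Value }
        $InputObject = $dict
    }

    if ($InputObject -is [System.Collections.IDictionary]) {
        if ($InputObject.Count -eq 0) { return '@{}' }
        $entries = foreach ($key in $InputObject.Keys) {
            $k = [string]$key
            # Bare keys when they look like identifiers, quoted otherwise.
            $keyText = if ($k -match '^[A-Za-z_][A-Za-z0-9_]*$') { $k } else { & $quote $k }
            $value = ConvertTo-PsdSketch -InputObject $InputObject[$key] -Depth $Depth -Compress:$Compress -Level ($Level + 1)
            "$pad$keyText = $value"
        }
        $sep = if ($Compress) { '; ' } else { $nl }
        return '@{' + $nl + ($entries -join $sep) + $nl + $closing + '}'
    }

    if ($InputObject -is [System.Collections.IEnumerable]) {
        $items = @(foreach ($item in $InputObject) {
            "$pad" + (ConvertTo-PsdSketch -InputObject $item -Depth $Depth -Compress:$Compress -Level ($Level + 1))
        })
        if ($items.Count -eq 0) { return '@()' }
        $sep = if ($Compress) { ', ' } else { ',' + $nl }
        return '@(' + $nl + ($items -join $sep) + $nl + $closing + ')'
    }

    # Anything else (dates, enums, arbitrary objects) is written as a string.
    return & $quote $InputObject.ToString()
}

ConvertTo-PsdSketch -InputObject ([ordered]@{ Name = 'Example'; Tags = @('a', 'b') }) -Compress
# → @{Name = 'Example'; Tags = @('a', 'b')}
```

Writing only single-quoted strings keeps `$` expansion out of the output. A complete implementation would also need to escape the typographic single quotes (such as ‘ and ’) that PowerShell also treats as string delimiters.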
Labels: Issue-Enhancement, WG-Engine

All 20 comments

Thanks @rjmholt for opening this. I feel this is a long-overdue feature to make PSD files a first-class citizen in PowerShell.

To have full coverage there should also be an Export-PowerShellDataFile to go along with Import-PowerShellDataFile. The command names you proposed don't match the existing noun PowerShellDataFile. Personally, I have never liked PowerShellDataFile and would prefer something like PSData, since most commands use the PS abbreviation. Whatever is chosen as the noun should be consistent across all commands. If the noun on Import-PowerShellDataFile changes, an alias can be kept for backward compatibility.

@jaykul has done some work in this area with his Configuration module.

I am unable to reach your prototype. Is it a private repository?

> Is it a private repository?

Oh, it might be. I might copy it into a gist.

Would these convert commands use the existing PS type data for serialization configuration?

Update-TypeData parameters:

  • SerializationMethod
  • TargetTypeForDeserialization
  • SerializationDepth
  • InheritPropertySerializationSet
  • StringSerializationSource
  • PropertySerializationSet
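For reference, this is how those existing settings are attached to a type name today. A minimal sketch; the type name `Demo.PsdSketch.Server` is made up, and whether a PSD converter should honour these settings is exactly the open question here:

```powershell
# Existing ETS API: attach a serialization setting to a type name.
# 'Demo.PsdSketch.Server' is a made-up type name used only for illustration.
Update-TypeData -TypeName 'Demo.PsdSketch.Server' -SerializationDepth 1 -Force

# Read the setting back; a converter could in principle consult the same TypeData.
$typeData = Get-TypeData -TypeName 'Demo.PsdSketch.Server'
$typeData.SerializationDepth

# Objects opt in to the type name through PSTypeName.
$server = [pscustomobject]@{ PSTypeName = 'Demo.PsdSketch.Server'; Name = 'web01'; Port = 443 }
$server.PSObject.TypeNames[0]
```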

> Would these convert commands use the existing PS type data for serialization configuration?

We should decide questions like that in this issue; it just depends on what people need and want. My prototype is very simple, and so far I don't think it parses; it just serialises.

My ideal would be a subset of functionality that works without a PowerShell engine, so it could be embedded as its own library if needed; PowerShell could then override or augment that behaviour. Bear in mind that the more features and PowerShell integration we add, the less efficient the serialisation gets. That's probably not a concern in most cases, especially since the PSD format isn't going to be used for protocols or very large files, but it's worth mentioning.

> Bear in mind that the more features and more PowerShell integration we do, the less efficient the serialisation gets.

I'd propose discussing a global switch to JSON for all config files (psd1, format, type), with a fallback to the old formats for backward compatibility.
This strategy promises many benefits.

> I'd discuss switching globally to JSON format for all config files (psd1, format, type) with fallback to old one for backward compatibility

I would personally be in favour of that too, although that's a long journey to embark upon.

Why long? I have already ported ConvertTo-Json to the new System.Text.Json API (#11198), and it works great. I then started experimenting with porting PSConfiguration (powershell.config.json) to the new API. I haven't yet concluded how best to do it, but the prototype works.
In #10898 Dongbo changed the internals for types.ps1xml, which improved performance. If we switched to JSON, I expect we could get much better performance in the startup and runspace-creation scenarios. The same is true for formats.ps1xml.
Note that switching to JSON formats also improves UX: many modern applications and tools use the format, with good compatibility and interoperability.
I think we should not put this off; we need to work on it now. Moreover, the .NET Core team is still developing the System.Text.Json API in the 5.0 milestone, so we could request the features we need.

> Why long?

The issue isn't in the implementation but in compatibility. We'll have to support XML/PSD1 for the various config files indefinitely, so the benefits on this path must outweigh the cost of having two large codepaths to maintain.

I agree with all your points, but just want to urge caution there, since we have finite resources for code maintenance and there aren't compelling reasons for most users (or module authors) to migrate off of XML/PSD1.

> Core team is still working on the development of System.Text.Json API in 5.0 milestone and we could request feature we need

This would be good for config files. For the JSON cmdlets, it's likely less useful since I believe their emphasis is on UTF-8 specifically.

> We'll have to support XML/PSD1

Yes, and we can get that with a fallback (without breaking anything), while at the same time resolving most of the issues we have with the old formats in new code (if you look at the new ConvertTo-Json, you'll see the code just got simpler).
Migrating to the new formats will be as easy as "save in the new format".
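The fallback idea can be reduced to a minimal sketch under assumptions: the helper name `Import-ConfigWithFallback` and the `.json`-before-`.psd1` lookup order are hypothetical, and this is not how the engine loads its own config files today. It needs PowerShell 6.0+ for `ConvertFrom-Json -AsHashtable`:

```powershell
# Hypothetical helper: prefer a JSON config, fall back to the PSD1 one.
# Import-ConfigWithFallback is NOT an existing cmdlet.
function Import-ConfigWithFallback {
    param([Parameter(Mandatory)] [string] $BasePath)

    $jsonPath = "$BasePath.json"
    $psd1Path = "$BasePath.psd1"

    if (Test-Path -LiteralPath $jsonPath) {
        # New format, returned as a hashtable so callers see the same shape.
        return Get-Content -LiteralPath $jsonPath -Raw | ConvertFrom-Json -AsHashtable
    }
    if (Test-Path -LiteralPath $psd1Path) {
        # Backward-compatible fallback to the existing data-file format.
        return Import-PowerShellDataFile -LiteralPath $psd1Path
    }
    throw "No configuration found at '$jsonPath' or '$psd1Path'."
}
```

Under this scheme, migrating a file means saving a `.json` copy next to the old `.psd1`; callers keep receiving a hashtable either way.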

> their emphasis is on UTF-8 specifically.

They started with the most important scenario: the web. But there are many applications (like PowerShell) based on classic strings. I am sure System.Text.Json will fully support the string type; it already works well with strings today.

@iSazonov most module authors will support Windows PowerShell for a long time or indefinitely, so unless those versions are updated to support JSON, we will still need to use PSD files. It's the same issue with Markdown help: module authors will still need PSMAML for Windows PowerShell unless the new help system ships as a module on the Gallery that supports Windows PowerShell.

My proposal doesn't contain any breaking change.

@iSazonov I didn't say that it was, but what I am saying is that migrating to JSON does not benefit module authors that need to support Windows PowerShell.

@ThomasNieto It was announced 4 years ago that all new features would be added only to PowerShell Core. I think most module authors want to see their modules working on Unix, which implies a switch to PowerShell Core. I suspect MSFT is working hard on infrastructure today so that we see a huge transition from Windows PowerShell to the new version. The .NET Core team has already announced that many MSFT product groups will migrate to .NET Core 3.1 next year, and I expect community projects will too. It would not surprise me to see usage of PowerShell Core on Windows 10 grow by a factor of 10 or even 100.

15 years ago XML was the dominant format. Today the community is moving to JSON. I would say that for modern developers XML is becoming annoying, and moving to JSON is the right and timely step for PowerShell.

@iSazonov I expect the vast majority of community modules will continue to support Windows PowerShell even after PS7 is released, at least until MS deprecates Windows PowerShell. I doubt module authors will limit PS version support to take advantage of JSON module manifests when it is released.

Module authors will follow the needs of their users and choose an appropriate strategy. I suspect there are not many modules today that are forced to work on 3.0 or 4.0, but there are many more that can, because we keep backward compatibility.
This worked before and will continue to work: we do not force users to throw everything away and start from scratch; we add new features and remove very old ones. This gives them freedom of choice when they upgrade to new versions.

Just to link to an earlier, similar feature request: #2875

Just to comment on the JSON/PSD debate: PSD is inherently a more capable data format. It supports more types and comments*, has (imo) clearer boundaries between sections and data, and can be (and is) easily restricted with the AST.

* I understand JSONC exists, but it isn't consistently enforced like PSD comments are.
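The point about restricting PSD with the AST can be illustrated with the public parser API (`Parser.ParseInput` plus `Ast.SafeGetValue()`). A minimal sketch, assuming PowerShell 5.0+; `Get-PsdValue` is a hypothetical helper, not an existing cmdlet:

```powershell
# Hypothetical helper: parse PSD text and evaluate it with SafeGetValue(),
# which only accepts constant-like expressions.
function Get-PsdValue {
    param([string] $Text)
    $tokens = $null
    $errors = $null
    $ast = [System.Management.Automation.Language.Parser]::ParseInput($Text, [ref]$tokens, [ref]$errors)
    if ($errors.Count -gt 0) { throw "Parse failed: $($errors[0].Message)" }
    $hashtable = $ast.Find({
        param($n)
        $n -is [System.Management.Automation.Language.HashtableAst]
    }, $false)
    if ($null -eq $hashtable) { throw 'No hashtable literal found.' }
    return $hashtable.SafeGetValue()
}

# Comments and several value types are accepted.
$ok = Get-PsdValue '@{ Retries = 3; Ratio = 0.5; Nothing = $null <# inline comment #> }'

# A command invocation is rejected: SafeGetValue() throws instead of running it.
try {
    Get-PsdValue '@{ When = (Get-Date) }'
    $rejected = $false
} catch {
    $rejected = $true
}
```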

Personally, I find PSD more readable, and its syntax is already well supported in existing PowerShell tooling.

I think instead of trying to hamfist JSON in as the supported config format, why don't we support the first-class PowerShell data file format that already exists? We can work on improving the interop between JSON and PSD with new or existing cmdlets.

Partially in response to @endowdly but mostly because it's not in this thread yet, the big issue with PSD is that parsing it is coupled to PowerShell.

This means:

  • There's no lightweight parser for it anywhere; you need full PowerShell to parse it
  • Writing one would mean a competing parser that may disagree with the existing parser (but there's no standard to say who is right)
  • The PowerShell parser (in particular the tokeniser) has undergone changes over the years, so there are already several competing parsers, some of which are mutually exclusive (PS 3 vs 4 vs 5.1)

However, in truth this issue isn't to discuss the relative merits of PSD as a format. Quite the opposite; it should be as easy as possible to manipulate PSD so that you can freely convert from or to it as you prefer. If you like PSD, this proposal is good because it becomes easier to convert to PSD. If you prefer JSON, this proposal is good because it's now easier to convert away from PSD to JSON...
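For the PSD to JSON direction there is already a path with existing cmdlets, though it needs a temporary file; the missing piece is the final hashtable to PSD step. A sketch assuming PowerShell 6.0+ (for `ConvertFrom-Json -AsHashtable`):

```powershell
# PSD -> JSON with cmdlets that exist today. Import-PowerShellDataFile
# needs a file, so the data is written to a temporary .psd1 first.
$path = Join-Path ([System.IO.Path]::GetTempPath()) "psd-demo-$([guid]::NewGuid()).psd1"
Set-Content -LiteralPath $path -Value "@{ Name = 'Example'; Port = 8080; Tags = @('web', 'prod') }"
try {
    $data = Import-PowerShellDataFile -LiteralPath $path
    $json = $data | ConvertTo-Json -Depth 5 -Compress
} finally {
    Remove-Item -LiteralPath $path -ErrorAction SilentlyContinue
}
$json

# JSON -> hashtable also works, but there is no built-in
# cmdlet for turning that hashtable back into PSD text.
$roundTrip = $json | ConvertFrom-Json -AsHashtable
```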

I'd advocate against moving to JSON; .psd1 files are awesome. I use them for configuration all the time, and they have the best syntax of all the configuration formats I've encountered. Writing standard-compliant JSON by hand is tedious: there is no support for trailing commas or comments, and keys require quotes.

Maybe it could be possible to standardize a version of .psd1 where only basic types are allowed that could be parsed without a full PowerShell parser?
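One way to make that suggestion concrete is to write the subset down as a list of allowed syntax-tree nodes. A minimal sketch, assuming the subset is exactly hashtables, arrays, constants and `$true`/`$false`/`$null`; the function name and node list are hypothetical choices, not an agreed standard. It still uses the PowerShell parser, so it only pins down a target that a standalone parser could be checked against:

```powershell
# Hypothetical sketch: does a PSD document stay within a "basic types only" subset?
function Test-PsdBasicSubset {
    param([Parameter(Mandatory)] [string] $Text)

    $tokens = $null
    $errors = $null
    $ast = [System.Management.Automation.Language.Parser]::ParseInput($Text, [ref]$tokens, [ref]$errors)
    if ($errors.Count -gt 0) { return $false }

    # AST node types allowed in the subset; everything else is rejected.
    # (Comments are tokens, not AST nodes, so they are always allowed.)
    $allowed = @(
        'ScriptBlockAst', 'NamedBlockAst', 'StatementBlockAst', 'PipelineAst',
        'CommandExpressionAst', 'HashtableAst', 'ArrayExpressionAst',
        'ArrayLiteralAst', 'ConstantExpressionAst', 'StringConstantExpressionAst'
    )

    foreach ($node in $ast.FindAll({ $true }, $true)) {
        $typeName = $node.GetType().Name
        if ($typeName -eq 'VariableExpressionAst') {
            # Only the three constant-like variables are permitted.
            if ($node.VariablePath.UserPath -notin 'true', 'false', 'null') { return $false }
        }
        elseif ($typeName -notin $allowed) {
            return $false
        }
    }
    return $true
}
```

Expandable strings, sub-expressions, commands and environment variables all fall outside this subset, which is what would make a small independent parser feasible.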
