PowerShell: Get-Content should release its file handle as early as possible

Created on 28 Jun 2019 · 13 Comments · Source: PowerShell/PowerShell

Summary of the new feature/enhancement

Currently, a pipeline like this is impossible:

Get-Content $Path | ForEach-Object { $_ -f $a } | Set-Content $Path

This is due to Get-Content not releasing its file handle until the completion of its pipeline. Adding parentheses around the Get-Content call to force it to run to completion and collect output before proceeding does work but is a bit clunky, and there's no reason that this shouldn't also work:

Get-Content $Path -Raw | ForEach-Object { $_ -f $a } | Set-Content $Path

Currently this doesn't work, because the file handle is still not released until the pipeline's completion, despite all the data being read at once. There are other caveats to using -Raw, but this would at least enable simple operations with less clunky syntax.
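
For reference, the workaround with parentheses mentioned above looks like this:

(Get-Content $Path) | ForEach-Object { $_ -f $a } | Set-Content $Path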

Proposed technical implementation details (optional)

Something that will help alleviate the issue a little bit is some of the changes to command disposal behaviour in #9900, which causes commands to be disposed as soon as their role in the pipeline is fully completed (after EndProcessing / end{} is completed).

Beyond that, the rest of the changes can be implemented in Get-Content itself for -Raw, and it may be worth looking into whether there are other avenues for alleviating the issue.

For example, in Set-Content we can have a fallback behaviour whereby if it can't get a write handle on the file, it instead writes to a temporary file, and during EndProcessing() or with a Dispose() step it then copies the file over the original.
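
As a rough sketch only (Set-ContentDeferred is a hypothetical name, not an existing cmdlet), the temporary-file fallback could behave roughly like the following, assuming the reader's handle has been released by the time the end block runs (which is what the disposal change in #9900 helps with):

# Hypothetical sketch: buffer incoming objects to a temporary file during
# process{}, then replace the original file once upstream input has ended.
function Set-ContentDeferred {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(ValueFromPipeline)] $InputObject
    )
    begin   { $tmp = [System.IO.Path]::GetTempFileName() }
    process { Add-Content -LiteralPath $tmp -Value $InputObject }
    end     { Move-Item -LiteralPath $tmp -Destination $Path -Force }
}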

Area-Cmdlets-Management Issue-Enhancement

Most helpful comment

Sounds more like a need for an optional switch on Set-Content to make it buffer updates and write the content to the file in the end block.

All 13 comments

The pipeline executes BeginProcessing for every command in the pipeline, then ProcessRecord for every command in the pipeline, and then EndProcessing for every command in the pipeline. It is not clear what we could fix here.

Aye, I suspect the main opportunity here would be doing this for the Get-Content -Raw case (where the whole file is read at once and submitted to the pipeline as a single object) and then for Set-Content to attempt to overwrite at the end of the pipeline.

Potentially, Set-Content could wait until the pipeline is disposed before it actually writes the data to the file, or we could simply defer the write to a temporary file by default before copying it over the original if there is a problem like this that prevents writing during the pipeline.

The pipeline executes BeginProcessing for every command in the pipeline, then ProcessRecord for every command in the pipeline, and then EndProcessing for every command in the pipeline. It is not clear what we could fix here.

If every command in the pipeline always writes an object for ProcessRecord and never writes any objects in EndProcessing, that is correct. Otherwise, EndProcessing will run after the last input object is received for each command:

$one = { end { 0; Write-Host '1 end called' }}
$two = { process { $_ } end { Write-Host '2 end called' }}
$three = { process { } end { Write-Host '3 end called'; 0..2;}}
$four = { process { Write-Host $_ } end { Write-Host '4 end called' }}

& $one | & $two | & $three | & $four

1 end called
2 end called
3 end called
0
1
2
4 end called

In the original issue, the idea was that if you add Out-String before Set-Content then Get-Content should be disposed after Out-String has consumed all of its output. That would be after EndProcessing runs for Get-Content, but before it runs for Out-String.
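
That is, a pipeline along these lines:

Get-Content $Path | ForEach-Object { $_ -f $a } | Out-String | Set-Content $Path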

This looks like a workaround but cannot be a general fix, as it causes accumulation of data that for large files will lead to memory exhaustion. Even in other languages, the best practice is to write to an auxiliary file and then rename it over the original, which is safer. I guess this can be achieved with Rename-Item at the end of the pipeline.
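
Done manually today, that pattern might look something like this (Move-Item is used here rather than Rename-Item so an existing file is overwritten, and the .tmp suffix is just illustrative):

Get-Content $Path | ForEach-Object { $_ -f $a } | Set-Content "$Path.tmp"
Move-Item -LiteralPath "$Path.tmp" -Destination $Path -Force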

Sure, it can be done manually, but it would be nice if PS could handle that behind the scenes, in my opinion. :smile:

Also, that still can't be done in a single pipeline, I don't think, since currently Get-Content won't release the handle until the entire pipeline is complete regardless of the use case. :/

Sounds more like a need for an optional switch on Set-Content to make it buffer updates and write the content to the file in the end block.

That would be OK for small to medium files, but if you're writing large files it will probably need to buffer to a temporary file, or we'll have huge memory usage.

But that's what you're doing by using Get-Content -Raw, which is the use case you talked about in the first place. My point was that maybe you should be buffering downstream rather than upstream if you want to support one-liner content updates, so that it works whether you use -Raw or not, and it's explicit rather than implicit.

Even with downstream buffering the operation is dangerous. We should use intermediate/temporary file and rename it or use a transactional file system.

I agree. My point was just about doing "the work" downstream. Writing to a temp file and then moving it and/or doing transactional work would be the right way to go.

Sounds more like a need for an optional switch on Set-Content to make it buffer updates and write the content to the file in the end block.

That's sooo close to -outvariable:

Get-Content $Path | ForEach-Object { $_ -f $a } | Set-Content $Path
Get-Content $Path | ForEach-Object { $_ -f $a } -ov B; Set-Content $Path $B

(nb. small risk of using get-content -wait and then having a temporary file grow indefinitely)

Some filesystems support transactions, but unfortunately PS's provider does not.

But yeah writing to a temp file would be a good solution if the file is still open when Set-Content tries to get a write handle.

but unfortunately PS's provider does not.

The code is in the repo but was commented out at porting time (it was Windows-only).
