PowerShell: Get-Content should release its file handle as early as possible

Created on 28 Jun 2019 · 13 Comments · Source: PowerShell/PowerShell

Summary of the new feature/enhancement

Currently, a pipeline like this is impossible:

Get-Content $Path | ForEach-Object { $_ -f $a } | Set-Content $Path

This is due to Get-Content not releasing its file handle until the completion of its pipeline. Adding parentheses around the Get-Content call to force it to run to completion and collect output before proceeding does work but is a bit clunky, and there's no reason that this shouldn't also work:

Get-Content $Path -Raw | ForEach-Object { $_ -f $a } | Set-Content $Path

Currently this doesn't work, because the file handle is still not released until the pipeline's completion, despite all the data being read at once. There are other caveats to using -Raw, but this would at least enable simple operations with less clunky syntax.
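
For reference, the workaround with parentheses mentioned above looks like this:

(Get-Content $Path) | ForEach-Object { $_ -f $a } | Set-Content $Path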

Proposed technical implementation details (optional)

Something that will help alleviate the issue a little bit is some of the changes to command disposal behaviour in #9900, which causes commands to be disposed as soon as their role in the pipeline is fully completed (after EndProcessing / end{} is completed).

Beyond that, the rest of the changes can be implemented in Get-Content itself for -Raw, and it may be worth looking into whether there are other avenues for alleviating the issue.

For example, in Set-Content we can have a fallback behaviour whereby if it can't get a write handle on the file, it instead writes to a temporary file, and during EndProcessing() or with a Dispose() step it then copies the file over the original.
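
As a rough sketch only (Set-ContentDeferred is a hypothetical name, not an existing cmdlet), the temporary-file fallback could behave roughly like the following, assuming the reader's handle has been released by the time the end block runs (which is what the disposal change in #9900 helps with):

# Hypothetical sketch: buffer incoming objects to a temporary file during
# process{}, then replace the original file once upstream input has ended.
function Set-ContentDeferred {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(ValueFromPipeline)] $InputObject
    )
    begin   { $tmp = [System.IO.Path]::GetTempFileName() }
    process { Add-Content -LiteralPath $tmp -Value $InputObject }
    end     { Move-Item -LiteralPath $tmp -Destination $Path -Force }
}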

Area-Cmdlets-Management Issue-Enhancement

Most helpful comment

Sounds more like a need for an optional switch on Set-Content to make it buffer updates and write the content to the file in the end block.

All 13 comments

The pipeline executes BeginProcessing for every command in the pipeline, then ProcessRecord for every command in the pipeline, and then EndProcessing for every command in the pipeline. It is not clear what we could fix here.

Aye, I suspect the main opportunity here would be doing this for the Get-Content -Raw case (where the whole file is read at once and submitted to the pipeline as a single object) and then for Set-Content to attempt to overwrite at the end of the pipeline.

Potentially, Set-Content could wait until the pipeline is disposed before it actually writes the data to the file, or we could simply defer the write to a temporary file by default before copying it over the original if there is a problem like this that prevents writing during the pipeline.

The pipeline executes BeginProcessing for every command in the pipeline, then ProcessRecord for every command in the pipeline, and then EndProcessing for every command in the pipeline. It is not clear what we could fix here.

If every command in the pipeline always writes an object for ProcessRecord and never writes any objects in EndProcessing, that is correct. Otherwise, EndProcessing will run after the last input object is received for each command:

$one = { end { 0; Write-Host '1 end called' }}
$two = { process { $_ } end { Write-Host '2 end called' }}
$three = { process { } end { Write-Host '3 end called'; 0..2;}}
$four = { process { Write-Host $_ } end { Write-Host '4 end called' }}

& $one | & $two | & $three | & $four

1 end called
2 end called
3 end called
0
1
2
4 end called

In the original issue, the idea was that if you add Out-String before Set-Content then Get-Content should be disposed after Out-String has consumed all of its output. That would be after EndProcessing runs for Get-Content, but before it runs for Out-String.
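
That is, a pipeline along these lines:

Get-Content $Path | ForEach-Object { $_ -f $a } | Out-String | Set-Content $Path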

This looks like a workaround but cannot be a general fix, as it causes accumulation of data that for large files will lead to memory exhaustion. Even in other languages, the best practice is to write to an auxiliary file and then rename it over the original, which is safer. I guess this can be achieved with Rename-Item at the end of the pipeline.
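
Done manually today, that pattern might look something like this (Move-Item is used here rather than Rename-Item so an existing file is overwritten, and the .tmp suffix is just illustrative):

Get-Content $Path | ForEach-Object { $_ -f $a } | Set-Content "$Path.tmp"
Move-Item -LiteralPath "$Path.tmp" -Destination $Path -Force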

Sure, it can be done manually, but it would be nice if PS could handle that behind the scenes, in my opinion. :smile:

Also, that still can't be done in a single pipeline, I don't think, since currently Get-Content won't release the handle until the entire pipeline is complete regardless of the use case. :/

Sounds more like a need for an optional switch on Set-Content to make it buffer updates and write the content to the file in the end block.

That would be OK for small to medium files, but if you're writing large files it will probably need to buffer to a temporary file, or we'll have huge memory usage.

But that's what you're doing by using Get-Content -Raw, which is the use case you talked about in the first place. My point was that maybe you should be buffering downstream rather than upstream if you want to support one-liner content updates, so that it works whether you use -Raw or not, and it's explicit rather than implicit.

Even with downstream buffering the operation is dangerous. We should use intermediate/temporary file and rename it or use a transactional file system.

I agree. My point was just about doing "the work" downstream. Writing to a temp file and then moving it and/or doing transactional work would be the right way to go.

Sounds more like a need for an optional switch on Set-Content to make it buffer updates and write the content to the file in the end block.

That's sooo close to -outvariable:

Get-Content $Path | ForEach-Object { $_ -f $a } | Set-Content $Path
Get-Content $Path | ForEach-Object { $_ -f $a } -ov B; Set-Content $Path $B

(nb. small risk of using get-content -wait and then having a temporary file grow indefinitely)

Some filesystems support transactions, but unfortunately PS's provider does not.

But yeah writing to a temp file would be a good solution if the file is still open when Set-Content tries to get a write handle.

but unfortunately PS's provider does not.

The code is in the repo but was commented out at porting time (it was Windows-only).
