Currently, a pipeline like this is impossible:
Get-Content $Path | ForEach-Object { $_ -f $a } | Set-Content $Path
This is because Get-Content doesn't release its file handle until the pipeline completes. Adding parentheses around the Get-Content call to force it to run to completion and collect its output before proceeding does work, but it's a bit clunky, and there's no reason this shouldn't also work:
Get-Content $Path -Raw | ForEach-Object { $_ -f $a } | Set-Content $Path
Currently this doesn't work, because the file handle is still not released until the pipeline completes, despite all the data being read at once. There are other caveats to using -Raw, but this would at least enable simple operations with less clunky syntax.
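For reference, the parenthesized workaround mentioned above looks like this; the parentheses force Get-Content to run to completion and collect all of its output before Set-Content ever tries to open the file:
(Get-Content $Path) | ForEach-Object { $_ -f $a } | Set-Content $Path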
Something that will help alleviate the issue a little bit is some of the changes to command disposal behaviour in #9900, which cause commands to be disposed as soon as their role in the pipeline is fully completed (after EndProcessing / end{} is completed).
Beyond that, the rest of the changes can be implemented in Get-Content itself for -Raw, and it may be worth looking into whether there are other avenues for alleviating the issue.
For example, in Set-Content we can have a fallback behaviour whereby if it can't get a write handle on the file, it instead writes to a temporary file, and during EndProcessing() or with a Dispose() step it then copies the file over the original.
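As a rough script-level sketch of what that could look like (the function name and shape here are invented purely for illustration; a real implementation would live inside the cmdlet itself and handle encoding, -NoNewline, and error cases):

function Set-ContentDeferred {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, Position = 0)]
        [string] $Path,

        [Parameter(ValueFromPipeline)]
        [string[]] $Value
    )
    begin {
        # Write to a temporary file first, since the target may still be
        # held open by an upstream Get-Content on the same path.
        $tempFile = [System.IO.Path]::GetTempFileName()
    }
    process {
        Add-Content -LiteralPath $tempFile -Value $Value
    }
    end {
        # Mirrors the EndProcessing()/Dispose() idea above: replace the
        # original with the buffered temporary file. In the cmdlet itself
        # this step could be deferred to Dispose() if the upstream handle
        # is still held when end{} runs.
        Move-Item -LiteralPath $tempFile -Destination $Path -Force
    }
}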
The pipeline executes BeginProcessing for every command in the pipeline, then ProcessRecord for every command, and then EndProcessing for every command. It is not clear that we can fix this here.
Aye, I suspect the main opportunity here would be doing this for the Get-Content -Raw case (where the whole file is read at once and submitted to the pipeline as a single object) and then for Set-Content to attempt to overwrite at the end of the pipeline.
Potentially, Set-Content could wait until the pipeline is disposed before it actually writes the data to the file, or we could simply defer the write to a temporary file by default before copying it over the original if there is a problem like this that prevents writing during the pipeline.
The pipeline executes BeginProcessing for every command in the pipeline, then ProcessRecord for every command, and then EndProcessing for every command. It is not clear that we can fix this here.
If every command in the pipeline only writes objects from ProcessRecord and never writes any objects in EndProcessing, that is correct. Otherwise, EndProcessing will run after the last input object is received for each command:
$one = { end { 0; Write-Host '1 end called' }}
$two = { process { $_ } end { Write-Host '2 end called' }}
$three = { process { } end { Write-Host '3 end called'; 0..2;}}
$four = { process { Write-Host $_ } end { Write-Host '4 end called' }}
& $one | & $two | & $three | & $four
1 end called
2 end called
3 end called
0
1
2
4 end called
In the original issue, the idea was that if you add Out-String before Set-Content, then Get-Content should be disposed after Out-String has consumed all of its output. That would be after EndProcessing runs for Get-Content, but before it runs for Out-String.
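That is, the pipeline shape in question was along these lines (which today still fails the same way, since the handle is only released once the whole pipeline is disposed):
Get-Content $Path | ForEach-Object { $_ -f $a } | Out-String | Set-Content $Path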
This looks like a workaround, but it cannot be a general fix, as it accumulates data in memory, which for large files will lead to memory exhaustion. Even in other languages, the best practice is to write to an auxiliary file and then rename it over the original, which is safer. I guess this can be achieved with Rename-Item at the end of the pipeline.
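Something like this, for instance (a rough manual sketch; Move-Item -Force is used to put the temporary file in place of the original once the first statement has completed and the read handle has been released):

$temp = "$Path.tmp"
Get-Content $Path | ForEach-Object { $_ -f $a } | Set-Content $temp
Move-Item -LiteralPath $temp -Destination $Path -Force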
Sure, it can be done manually, but it would be nice if PS could handle that behind the scenes, in my opinion. :smile:
Also, I don't think that can currently be done in a single pipeline, since Get-Content won't release the handle until the entire pipeline is complete, regardless of the use case. :/
Sounds more like a need for an optional switch on Set-Content to make it buffer updates and write the content to the file in the end block.
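Usage might look something like the following (the switch name is purely hypothetical, invented here for illustration; no such parameter exists today):

# Hypothetical: buffer the piped input and only open/write the target file in the end block.
Get-Content $Path | ForEach-Object { $_ -f $a } | Set-Content $Path -Deferred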
That would be OK for small to medium files, but if you're writing large files it will probably need to buffer to a temporary file, or we'll have huge memory usage.
But that's what you're doing by using Get-Content -Raw, which is the use case you talked about in the first place. My point was that maybe you should be buffering downstream rather than upstream if you want to support one-liner content updates, so that it works whether you use raw or not, and it's explicit rather than implicit.
Even with downstream buffering the operation is dangerous. We should use an intermediate/temporary file and rename it, or use a transactional file system.
I agree. My point was just about doing "the work" downstream. Writing to a temp file and then moving it and/or doing transactional work would be the right way to go.
Sounds more like a need for an optional switch on Set-Content to make it buffer updates and write the content to the file in the end block.
That's sooo close to -outvariable:
Get-Content $Path | ForEach-Object { $_ -f $a } | Set-Content $Path
Get-Content $Path | ForEach-Object { $_ -f $a } -ov B; Set-Content $Path $B
(nb. small risk of using get-content -wait and then having a temporary file grow indefinitely)
Some filesystems support transactions, but unfortunately PS's provider does not.
But yeah, writing to a temp file would be a good solution if the file is still open when Set-Content tries to get a write handle.
but unfortunately PS's provider does not.
The code is in the repo, but it was commented out at porting time (it was Windows-only).