I'm calling an API repeatedly (500,000 times) and I noticed it runs 4-5x faster on PowerShell 5. The code below is not the actual API, but a minimal example that demos the issue.
I'm running this on 2 different "machines":
The results are pretty consistent. I noticed PowerShell 6 or 7 for Mac was really slow -- that's why I started benchmarking.
You can adjust $max on your machine to get a more reasonable run duration for comparison.
PowerShell Version|Duration (sec.)
:-:|:-:
5|1
6|4
7|5
<#
Ver. Sec
---- ---
5 1
6 4
7 5
#>
$max = 10
$start = Get-Date -Format "HH:mm:ss"
foreach ($i in 1..$max) {
Invoke-RestMethod "https://postman-echo.com/get?i=$i"
}
$end = Get-Date -Format "HH:mm:ss"
write-host "started at" $start
write-host " ended at" $end
Unless somebody is aware of why this might be, I'm willing to take a look.
PowerShell 6/7 seems to use the HttpClient class. The following takes about 5-7 seconds on my system under either PowerShell 7.0.1 or Windows PowerShell 5:
Measure-Command -Expression `
{
Add-Type -Assembly System.Net.Http
$Max = 100
$Size = 0
ForEach ($I in 1..$Max)
{
$Url = "http://postman-echo.com/get?i=$i"
$Client = [System.Net.Http.HttpClient]::new()
$Request = [System.Net.Http.HttpRequestMessage]::new('Get',$Url)
$CancelToken = [System.Threading.CancellationToken]::new($false)
$Option = [System.Net.Http.HttpCompletionOption]::ResponseContentRead
$Response = $Client.SendAsync($Request, $Option, $CancelToken).GetAwaiter().GetResult();
$Size += $Response.Content.ReadAsByteArrayAsync().Result.Length
}
Write-Host 'Total Size' $Size
}
Windows PowerShell uses System.Net.WebRequest; this request completes in about 2-3 seconds on my system under Windows PowerShell:
Measure-Command -Expression `
{
$Max = 100
$Size = 0
ForEach ($I in 1..$Max) {
$Url = "http://postman-echo.com/get?i=$i"
$Request = [System.Net.WebRequest]::Create($Url)
$Response = $request.BeginGetResponse($null, $null)
$Response.AsyncWaitHandle.WaitOne()
$Result= $Request.EndGetResponse($Response)
$Reader = [System.IO.BinaryReader]::new($Result.GetResponseStream());
$Size += $Reader.ReadBytes($Result.ContentLength).Length
}
Write-Host 'Total Size' $Size
}
Interestingly, this latter code also runs 2x-3x slower under PowerShell 6/7 than under Windows PowerShell. This kind of surprised me.
So the question may actually be three-fold: 1) Why the switch to HttpClient? This is not an obvious problem at first glance... just a curiosity. 2) Why is performance so much different between HttpClient and WebRequest under Windows PowerShell, and 3) Why the slowdown in WebRequest from Windows PowerShell to PowerShell Core (may be an underlying DotNet Core question).
WebClient was simply nonexistent in .NET Core during pretty much all of the development of 6.x, and we needed web cmdlets to work with. Even now, it's only really been brought back in for compatibility and isn't recommended for new code as I understand it.
Not sure about much beyond that, though. 🙂
PowerShell 5 is using a single persistent TCP connection, PowerShell 7 is opening a new connection for each HTTP request.
If the -DisableKeepAlive parameter is used with Invoke-RestMethod there is no difference in performance.
@NoMoreFood If you set $Request.KeepAlive = $false in your example the performance is also the same.
PowerShell 7 actually sends the Connection: Keep-Alive header, but then opens a new connection for every request.
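To make the difference concrete, here is a minimal Python sketch (standard library only) that counts how many TCP connections a local test server accepts when a client reuses one connection versus opening a new one per request. `CountingHandler` and `count_connections` are hypothetical names for illustration, and the sketch uses plain HTTP on localhost, so it shows connection reuse only and not the cost of a TLS handshake.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 lets the server keep a connection open between requests.
    protocol_version = "HTTP/1.1"
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        with CountingHandler.lock:
            CountingHandler.connections += 1
        super().setup()

    def do_GET(self):
        body = b"{}"
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the output quiet


def count_connections(reuse, requests_to_send=10):
    """Send GET requests to a local server and return how many TCP connections it accepted."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        shared = http.client.HTTPConnection(host, port) if reuse else None
        for i in range(requests_to_send):
            conn = shared if reuse else http.client.HTTPConnection(host, port)
            conn.request("GET", f"/get?i={i}")
            conn.getresponse().read()  # drain the body so the socket can be reused
            if not reuse:
                conn.close()
        if reuse:
            shared.close()
        return CountingHandler.connections
    finally:
        server.shutdown()
        server.server_close()


print(count_connections(reuse=True))   # 1
print(count_connections(reuse=False))  # 10
```

With HTTPS, each of those extra connections would also need its own TLS negotiation, which is where most of the added time would be expected to go.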
According to https://github.com/dotnet/runtime/issues/31267, .NET Core does not currently support TCP KeepAlive in System.Net.HttpClient.
There is an open issue to fix this: https://github.com/dotnet/runtime/issues/28721 and underlying issue https://github.com/dotnet/runtime/issues/1793, which are tracking the 5.0 milestone. Since PowerShell 7.1 will be targeting .NET 5.0, I expect this issue will be resolved by the end of 2020.
That's really good info! Thank you.
I did a little more digging and here's what I found. Node.js (using node-fetch) and Python (2/3, using requests) are also very slow, but JS in Chrome is as fast as PowerShell 5. I also used Wireshark, and noticed that the slow ones seem to negotiate HTTPS each time, but the fast ones do it only once. If you try it using HTTP instead of HTTPS, it's much faster. Unfortunately, the code I have to run requires HTTPS. hth.
// JavaScript
// To run this in a browser, first go to https://postman-echo.com/get to avoid CORS issues.
const fetch = require("node-fetch"); // Only for node.js. Remove for Chrome.
(async function () {
const MAX = 10;
const start = new Date();
for (var i = 0; i < MAX; i++) {
const r = await fetch(`https://postman-echo.com/get?i=${i}`);
const d = await r.json();
}
const end = new Date();
console.log(start);
console.log(end);
console.log(end - start);
})();
# Python 3 (without connection pooling)
import requests
from datetime import datetime
MAX = 10
start = datetime.now()
for i in range(MAX):
requests.get(f'https://postman-echo.com/get?i={i}').json()
end = datetime.now()
print("started at", start)
print(" ended at", end)
print(end - start)
It appears that HTTP persistent connections are supported in certain circumstances. If a single HttpClient is declared and reused for each request, TCP connection reuse occurs and performance is much better, especially when using HTTPS:
$Client = [System.Net.Http.HttpClient]::new()
$Client.BaseAddress = "https://postman-echo.com"
(Measure-Command {
(1..100).ForEach{
$Request = [System.Net.Http.HttpRequestMessage]::new('Get', "/get?i=$_")
$Response = $Client.SendAsync($Request).GetAwaiter().GetResult();
Write-Output $Response.Content.ReadAsStringAsync().Result
}
}).TotalSeconds
Edited to correct conflation between TCP KeepAlive and HTTP KeepAlive (persistent connections). (Thanks @scalablecory)
The problem here is that the HttpClient needs to be reused between calls of Invoke-RestMethod. This enables connection pooling.
Note that this is _not_ TCP Keepalive (a technology to detect broken idle connections), and the two issues linked above will not help to resolve this.
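As a rough illustration of that distinction, TCP keepalive is a per-socket option that asks the operating system to probe an idle connection. It says nothing about whether an HTTP client sends several requests over the same connection. A minimal Python sketch follows; `keepalive_enabled` is a hypothetical helper name.

```python
import socket


def keepalive_enabled(sock):
    """Report whether the TCP keepalive option is set on this socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
before = keepalive_enabled(sock)
# Ask the OS to send keepalive probes when the connection sits idle.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
after = keepalive_enabled(sock)
sock.close()

print("keepalive before:", before)
print("keepalive after: ", after)
```

HTTP connection reuse, by contrast, is decided by the client library: it works whether or not this socket option is set.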
So the difference in performance is due to the fact that Windows PowerShell 5 pools connections between separate instances of [System.Net.WebRequest], and PowerShell 7 does not.
As far as I can see, the only way to ensure use of a persistent connection (in PowerShell 7) is to use a raw HttpClient instead of the cmdlets Invoke-RestMethod/Invoke-WebRequest. Perhaps we could open a new issue to discuss whether to implement this functionality.
I guess I was only partially correct above. Python requests has a Session() object which allows for connection pooling. This only takes about 1 second as opposed to about 4 seconds for the example in my earlier comment (which doesn't use connection pooling).
# Python 3 (with connection pooling)
import requests
from datetime import datetime
session = requests.Session()
MAX = 10
start = datetime.now()
for i in range(MAX):
session.get(f'https://postman-echo.com/get?i={i}').json()
end = datetime.now()
print('started at', start)
print(' ended at', end)
print(end - start)
HttpClient creates a SocketsHttpHandler, which is where the connection pool is implemented.
If we share the HttpClient or SocketsHttpHandler per process, how do we resolve config conflicts between runspaces?
It seems we cannot use a global pool.
Perhaps we could add a new parameter to the web cmdlets to get a shared SocketsHttpHandler.
We could enhance WebRequestSession with SocketsHttpHandler.
We could utilize a thread-static member to work with that. That'll resolve a majority of cases, and we can have backup code to generate a new handler/client when needed on a secondary thread.
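The thread-static idea can be sketched by analogy in Python, where `threading.local` plays the role of a thread-static member. This is only an illustration of the caching pattern, not the PowerShell implementation (which would be C#); `get_connection` and the cache layout are hypothetical.

```python
import http.client
import threading

# Each thread sees its own attributes on this object (the thread-static analog).
_local = threading.local()


def get_connection(host, port=443):
    """Return this thread's cached connection for (host, port), creating it on first use."""
    cache = getattr(_local, "connections", None)
    if cache is None:
        cache = _local.connections = {}
    key = (host, port)
    if key not in cache:
        # Constructing the object does no network I/O; the socket opens on first request.
        cache[key] = http.client.HTTPSConnection(host, port)
    return cache[key]


first = get_connection("postman-echo.com")
second = get_connection("postman-echo.com")
print(first is second)  # True
```

A secondary thread calling `get_connection` gets its own separate instance, which corresponds to the "generate a new handler/client when needed" fallback described above.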
I have a concern about any implicit cache because it can have side effects. We have a pool for runspaces. We should also take ForEach -Parallel into account.
for fun, i tried the -SessionVariable / -WebSession options to see if it would help (kinda like Python above). it didn't.
These parameters did not work for me either to enable connection pooling.
We could enhance WebRequestSession with SocketsHttpHandler.
We could utilize a thread-static member to work with that. That'll resolve a majority of cases, and we can have backup code to generate a new handler/client when needed on a secondary thread.
Sorry I am not super familiar with PowerShell, so I am probably misunderstanding. HttpClient is thread-safe and does not need a per-thread handler/client -- generally one instance per process works just fine. The only reason to have more than one is if you're setting options on the SocketsHttpHandler itself.