Doing multipart requests with PowerShell is quite complicated and requires several extra steps for large file uploads. An easy example for a small file can be found on StackOverflow. An in-depth discussion can be found in this blog post.
It would be a huge improvement if the WebRequestPSCmdlets (Invoke-RestMethod and Invoke-WebRequest) could be enhanced so that they support Multipart messages directly.
For an implementation I would expect the following parameters:
.NET Core seems to have support for MultipartContent which may simplify the implementation.
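For context, here is a minimal sketch of the boilerplate currently needed to do this with the .NET types directly (the endpoint URL, field name, and file path are illustrative placeholders, not from any of the reports above):

```powershell
# Minimal multipart upload using HttpClient directly; the URL, field name,
# and path below are illustrative placeholders.
Add-Type -AssemblyName System.Net.Http
$client = New-Object System.Net.Http.HttpClient
$content = New-Object System.Net.Http.MultipartFormDataContent
$fileStream = [System.IO.File]::OpenRead('C:\temp\report.pdf')
$fileContent = New-Object System.Net.Http.StreamContent $fileStream
$content.Add($fileContent, 'file', 'report.pdf')
$response = $client.PostAsync('https://example.com/upload', $content).Result
$response.EnsureSuccessStatusCode()
```

This is roughly what the cmdlets could hide behind a parameter or two.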
This is horrendous in PowerShell and hasn't moved in an age. I personally would love to see some movement here, as uploading large files is very, very difficult.
For some pointers around the topic, see my comments here (albeit for PS 5.1), which link to some other sites, namely http://blog.majcica.com/2016/01/13/powershell-tips-and-tricks-multipartform-data-requests/
@markekraus Could you please look at this if you have free time?
@iSazonov sure. I will take a look at it. I have seen other requests in the past few years for multipart forms in general as several IoT APIs seem to only work with multipart forms.
Feel free to ping MSFT experts here if there are compatibility issues.
FWIW, we mainly use Python, Django, and Apache for our web interface stuff. This is all RESTful APIs. I am not a developer, just attempting to interact with the API, as we are seeing more sysadmins wanting to do this.
@swinster if you could provide me an exact scenario that would be very useful. Right now, all I have to go off of is to build out a test API that takes multipart forms and files and hope that fits most scenarios.
@markekraus sure. I work for Pexip, and the entire platform can be automated via API calls. This specific example relates to uploading an upgrade file to the platform, which will be in excess of 1 GB. The API documentation can be found here - https://docs.pexip.com/api_manage/api_configuration.htm?Highlight=upgrade
I can even post here the resultant function created to achieve this, although this is based on the information posted by Mario Majčica in his blog above.
@swinster awesome. If I need more I'll reach out.
@swinster Can you share your function with me? I want to make sure I'm on the right path.
This presents some interesting implementation issues. To truly support multipart form data, the cmdlets would need to accept multiple files, a field name for each file, possibly a content type for each file, and the ability to accept a dictionary of field/value pairs at the same time. Additionally, it appears some APIs are particular about the boundary used, so an optional boundary would need to be supplied. Attempting to make that mesh with the current cmdlets would be a significant rework, almost to the point of justifying separate cmdlets for multipart support.
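For illustration, the underlying .NET types already cover the multi-file and custom-boundary cases; a sketch (the boundary string, field name, and paths are made up):

```powershell
# Sketch: a caller-supplied boundary plus two files in one payload.
# The boundary, field name, and paths are illustrative only.
Add-Type -AssemblyName System.Net.Http
$content = New-Object System.Net.Http.MultipartFormDataContent 'my-custom-boundary'
foreach ($path in 'C:\temp\a.png', 'C:\temp\b.png') {
    $stream = [System.IO.File]::OpenRead($path)
    $part = New-Object System.Net.Http.StreamContent $stream
    $part.Headers.ContentType = New-Object System.Net.Http.Headers.MediaTypeHeaderValue 'image/png'
    $content.Add($part, 'images', (Split-Path $path -Leaf))
}
```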
The issue is that multipart form data is extremely flexible and radically different from other requests. It would be easy to tack on support for a single file or to convert a `-Body` dictionary, but it would not be simple to mix a file with form data or to support multiple files. Fitting all the multipart use-cases into the current cmdlets would touch a large portion of code to accommodate a valid but somewhat less common usage of the cmdlets.
IMO, it would also lead to a somewhat confusing UX. If `-InFile` were used for multipart files and only accepted multiple files on a multipart request, it would lead to issues with users supplying multiple paths on a standard request, getting errors, and wondering why it accepts an array of paths. Also, explaining that `-Body` and `-InFile` are mutually exclusive except when doing multipart requests would be somewhat confusing. If files were split into a separate and new parameter like `-MultipartFile`, that might cause confusion as to why `-InFile` doesn't work on multipart requests or why there needs to be a difference in the first place.
The problem I see with simplifying this is that many of the requests I've read for sending multipart form data through PowerShell have only their own use-case in mind. It might seem simple from that perspective, but when you look at a dozen or so of these requests you begin to see they almost all have distinctly different needs falling under the broad category of multipart form data. A narrow implementation would leave many use-cases in their current state; a broad implementation would impact a lot of code for a feature that is less commonly used.
With that in mind, I think a fair compromise and better approach would be to accept a `-Body` object/collection that can be used by the cmdlets to create the multipart form-data request. This could be simplified by exposing new cmdlet(s) and new classes for generating that object/collection. `-Body` could simply be adapted to accept `MultipartFormDataContent`, at least easing the burden of the user managing an `HttpClient`.
I could start by adding support for `MultipartFormDataContent` in `-Body`. From there it could be decided to add convenience cmdlets or possibly even wrapper classes to ease creation.
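To make that concrete, usage might look something like the following sketch (at the time of writing this was a proposal, not shipped behavior; `$uri` is a placeholder):

```powershell
# Sketch of the proposed UX: build the MultipartFormDataContent yourself
# and hand it to -Body. $uri is a placeholder endpoint.
Add-Type -AssemblyName System.Net.Http
$multipart = New-Object System.Net.Http.MultipartFormDataContent
$field = New-Object System.Net.Http.StringContent 'some value'
$multipart.Add($field, 'FieldName')
Invoke-RestMethod -Uri $uri -Method POST -Body $multipart
```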
@JamesWTruher @dantraMSFT @PaulHigin @SteveL-MSFT What are your thoughts?
@markekraus FWIW, here is the modified function from Mario's blog.
The main addition was the Basic Authentication headers, which also required a Base64-encoded version of the authentication credentials. Initially, I decrypted the `PSCredential` password via a separate function; however, this was changed to a single line. The Basic Authorization header is required because although you can provide simple credentials to the `HttpClient` (and indeed to `Invoke-WebRequest`), the first request is sent without this header, which is then challenged, and thus a second request is made. This means that the entire 1 GB file is uploaded twice: a huge waste of time and resources.

As an aside, I have a similar issue (WRT the Basic Authorization header) when downloading a large file (3-4 GB), which again leads to a two-request issue, and the download of the file is actioned twice. However, `Invoke-WebRequest` is even worse in that case, as the entire file is streamed into memory, and it caused the remote server to lock up when I tried to use this cmdlet. I eventually changed to use `WebClient` (I couldn't get the `HttpClient` to work with my lacking programming knowledge); however, `WebClient` doesn't implement a timeout property, so you have to build a class that inherits from and extends `WebClient`. All very, very complex and confusing, especially to someone like me with little programming expertise. As mentioned, this is an aside to this particular issue, but it is similar in nature.
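For anyone hitting the same `WebClient` limitation, the usual workaround is a small subclass; a sketch (the class name and timeout values are illustrative choices):

```powershell
# Sketch: extend WebClient with a configurable timeout via embedded C#.
# TimeoutWebClient and the 30-minute default are illustrative choices.
Add-Type -TypeDefinition @"
using System;
using System.Net;

public class TimeoutWebClient : WebClient
{
    public int TimeoutMilliseconds;

    public TimeoutWebClient()
    {
        TimeoutMilliseconds = 1800000; // 30 minutes
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        request.Timeout = TimeoutMilliseconds;
        return request;
    }
}
"@

$client = New-Object TimeoutWebClient
$client.TimeoutMilliseconds = 3600000  # allow an hour for very large downloads
```

With that aside out of the way, here is the function: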
```powershell
function Invoke-MultipartFormDataUpload
{
    [CmdletBinding()]
    PARAM
    (
        [string][parameter(Mandatory = $true)][ValidateNotNullOrEmpty()]$InFile,
        [string]$ContentType,
        [Uri][parameter(Mandatory = $true)][ValidateNotNullOrEmpty()]$Uri,
        [System.Management.Automation.PSCredential]$Credential
    )
    BEGIN
    {
        if (-not (Test-Path $InFile))
        {
            $errorMessage = ("File {0} missing or unable to read." -f $InFile)
            $exception = New-Object System.Exception $errorMessage
            $errorRecord = New-Object System.Management.Automation.ErrorRecord $exception, 'MultipartFormDataUpload', ([System.Management.Automation.ErrorCategory]::InvalidArgument), $InFile
            $PSCmdlet.ThrowTerminatingError($errorRecord)
        }

        # Infer the MIME type from the file extension; fall back to a generic binary type.
        if (-not $ContentType)
        {
            Add-Type -AssemblyName System.Web
            $mimeType = [System.Web.MimeMapping]::GetMimeMapping($InFile)
            if ($mimeType)
            {
                $ContentType = $mimeType
            }
            else
            {
                $ContentType = "application/octet-stream"
            }
        }
    }
    PROCESS
    {
        Add-Type -AssemblyName System.Net.Http
        $httpClientHandler = New-Object System.Net.Http.HttpClientHandler
        if ($Credential)
        {
            $networkCredential = New-Object System.Net.NetworkCredential @($Credential.UserName, $Credential.Password)
            $httpClientHandler.Credentials = $networkCredential
            $httpClientHandler.PreAuthenticate = $true
            $httpClient = New-Object System.Net.Http.HttpClient $httpClientHandler

            # Send the Basic Authorization header up front so the server does not
            # challenge the first (unauthenticated) request and force a second upload.
            $Base64Auth = [System.Convert]::ToBase64String([System.Text.Encoding]::GetEncoding("iso-8859-1").GetBytes([String]::Format("{0}:{1}", $Credential.UserName, $Credential.GetNetworkCredential().Password)))
            $httpClient.DefaultRequestHeaders.Add("Authorization", "Basic $Base64Auth")
        }
        else
        {
            $httpClient = New-Object System.Net.Http.HttpClient $httpClientHandler
        }

        # Timeout is a TimeSpan; an integer is converted as ticks (18000000000 ticks = 30 minutes).
        $httpClient.Timeout = 18000000000

        # Stream the file rather than reading it into memory.
        $packageFileStream = New-Object System.IO.FileStream @($InFile, [System.IO.FileMode]::Open)
        $contentDispositionHeaderValue = New-Object System.Net.Http.Headers.ContentDispositionHeaderValue "form-data"
        $contentDispositionHeaderValue.Name = "package"
        $contentDispositionHeaderValue.FileName = (Split-Path $InFile -Leaf)

        $streamContent = New-Object System.Net.Http.StreamContent $packageFileStream
        $streamContent.Headers.ContentDisposition = $contentDispositionHeaderValue
        $streamContent.Headers.ContentType = New-Object System.Net.Http.Headers.MediaTypeHeaderValue $ContentType

        $content = New-Object System.Net.Http.MultipartFormDataContent
        $content.Add($streamContent)
        try
        {
            $response = $httpClient.PostAsync($Uri, $content).Result
            if (!$response.IsSuccessStatusCode)
            {
                $responseBody = $response.Content.ReadAsStringAsync().Result
                $errorMessage = "Status code {0}. Reason {1}. Server reported the following message: {2}." -f $response.StatusCode, $response.ReasonPhrase, $responseBody
                throw [System.Net.Http.HttpRequestException] $errorMessage
            }
            return $response
        }
        catch [Exception]
        {
            $PSCmdlet.ThrowTerminatingError($_)
        }
        finally
        {
            if ($null -ne $httpClient)
            {
                $httpClient.Dispose()
            }
            if ($null -ne $response)
            {
                $response.Dispose()
            }
        }
    }
    END { }
}
```
I have also hardcoded the `ContentDispositionHeaderValue` name (`package`), which is all that was required for my needs.
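For reference, invoking the function looks like this (the host name and file path are hypothetical):

```powershell
# Hypothetical usage: upload an upgrade package with Basic Authentication.
$cred = Get-Credential
Invoke-MultipartFormDataUpload -InFile 'C:\upgrades\pexip_upgrade.bin' `
    -Uri 'https://manager.example.com/api/admin/upgrade/' -Credential $cred
```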
@swinster thanks. I just wanted to confirm you were doing something similar.
Authorization Basic is being tracked under #4274
I'm not familiar with the other issue (using `WebClient` for large files), but if it is a showstopper and someone hasn't already done so, you should open an issue on it.
@markekraus We discussed using https://github.com/AngleSharp/AngleSharp with @SteveL-MSFT. We could use the package to cover most features we need including multipart.
@iSazonov so that would be part of a larger rework, then, to use AngleSharp for parsing and submission?
We had a PowerShell Committee conclusion https://github.com/PowerShell/PowerShell/issues/3267#issuecomment-286917402 to use AngleSharp for parsing. I believe if the start goes well we could use AngleSharp more broadly. Yes, it may be a lot of work, so I did not start. Related: https://github.com/PowerShell/PowerShell/issues/2867
I think I may still work on accepting `MultipartFormDataContent` as a `-Body` value. It is something that would make this accessible now and still be relevant should AngleSharp prove to be used for more than just parsing.
The idea is that we use the AngleSharp library for web the same way we use the Newtonsoft library for JSON: we exclude low-level web coding and focus efforts on PowerShell web features. Right now we have to implement a web client, and it is very difficult to implement a full-featured web client; we can't compete with browsers. In Windows PowerShell we can use IE as a workaround. In PowerShell Core we need a portable solution. Otherwise, we are doomed to endless patching of "holes".
In order to not break anything, we could maintain a new fork of our web cmdlets with new names (a prefix) as an experimental solution and test AngleSharp that way.
Now that #4782 has been merged, it is time to consider some simplified, limited implementations.

What I think can be done without much pain and without making any breaking changes is to add `multipart/form-data` support to dictionary `-Body`'s and `-InFile`. This would be facilitated with something like an `-AsMultipart` switch.
In the case of `-InFile`, there would only be support for a single file, and it would be added as a `StreamContent`. The `-ContentType` parameter, since it would otherwise be ignored, can be re-purposed for specifying the MIME type of the file.
The `-Body` dictionary would be converted into `StringContent`. The keys would be field names and the values would be the content.
Finally, when `-AsMultipart` is supplied, the `-InFile` and `-Body` dictionary can be used together for a mixed content submission.
This would add support for many basic use cases, but would not address complex `multipart/form-data` submissions or multiple files in a single submission. For those use cases, the `MultipartFormDataContent` will still need to be manually created and supplied.
This would also require some error detection, such as when something other than a dictionary is supplied to `-Body` when `-AsMultipart` is used. Also, the `-ContentType` will probably cause issues if it is not a valid type (will need to verify). The current logic for conflict resolution between `-Body` and `-InFile` will have to be revisited.
@iSazonov Can you add/change the label `Area-Cmdlets-Utility`? I'm trying to make it easier to see all the web cmdlet issues.
OK, I have been mentally planning this one for the past month and I finally have working code
https://github.com/PowerShell/PowerShell/compare/master...markekraus:MultiPartSimplification
I'm looking for feedback from anyone in this thread.
I added 2 parameters: `-AsMultipart` and `-MultipartFileFieldName`. When submitting a file via multipart, a field name is required. The logic has been reworked to allow both `-InFile` and `-Body`, provided `-AsMultipart` is present and the body is either an `IDictionary` or a `MultipartFormDataContent`.
The new `-Body` usage from #4782 is still there and doesn't require `-AsMultipart`, but I expanded it to work with `-AsMultipart` and to accept an `-InFile` (`-Body` is already blocked by the `MultipartFormDataContent` check, so you can't supply both a `MultipartFormDataContent` and an `IDictionary`).
This is all somewhat painful, but I can confirm this works:
"Test Contents" | Set-Content -Path C:\temp\test.txt
$uri = Get-WebListenerUrl -Test Multipart
$Params = @{
AsMultipart = $true
Body = @{Testy="testest"}
Uri = $uri
Method = 'POST'
InFile = "C:\temp\test.txt"
ContentType = 'text/plain'
MultipartFileFieldName = 'TestFile'
}
$results = Invoke-RestMethod @Params
Should we adopt a similar syntax to curl?

```powershell
invoke-restmethod -Form @{key=value;upload="@<filepath>"}
```
@SteveL-MSFT I thought about that, but I have run into APIs that are picky about the Content-Type used for file uploads, particularly those that are expecting images. I guess in those cases the user could still create their own `MultipartFormDataContent` and supply it to `-Body`. Then there is the question of how to escape `@` in something like `irm -Form @{TwitterUsername="@markekraus"}`, and how to escape that escape sequence... etc.
Using something like that would definitely make the internal logic easier to deal with and would make multiple file uploads available.
@markekraus I think we should optimize for general usage (80/20 rule). I like using `@` as it's familiar to curl users and examples, but I'm slightly concerned about setting a precedent within PowerShell. I think escaping `@` would just follow normal PowerShell rules: `` irm -form @{twitterusername="`@steve_msft"} ``
> `` irm -form @{twitterusername="`@steve_msft"} ``

Wouldn't that just arrive at the cmdlet as `@steve_msft`, and then the cmdlet would try to process the `steve_msft` file? `@` has always been problematic in strings because it is such a commonly used internet character. It was always painful in curl if you needed an `@`-prepended string. In any case, there would need to be string parsing of some kind added.

I too worry about that precedent.
For general usage, I think my current proposal works. Most of the requests I have seen for multipart are for uploading a single file (including the one at the root of this issue). It doesn't introduce a new syntax, and it reuses parameters that users are familiar with.

```powershell
Invoke-RestMethod -InFile "C:\temp\test.txt" -Method POST -AsMultipart -MultipartFileFieldName Upload
```
@markekraus you're right, the cmdlet wouldn't know the difference. If most usage is for upload, then I would be fine with what you're proposing, except that `-Method POST` should be implied, and if `Upload` is pretty standard, perhaps that should be the default if `-InFile` is used.
I don't think you're going to get a ton of complaints about how you do it, as long as you don't break existing scenarios. People can put up with a little awkwardness, as long as it's documented with examples, and it works.
Having said that, I was going to say that since PowerShell is OO, it seems to me that we don't need special markers to tell filenames from strings, because we have objects -- it would make sense to me to use an object ...
I mean, from the previous example, if you had a FileInfo instead of a string ...
Invoke-RestMethod -Form @{ UserName = "Jaykul"; Avatar = (Get-Item .\avatar.png) }
... well, you would know what to do, wouldn't you?
We would add an accelerator [File]
@SteveL-MSFT I agree on the implicit POST. I don't think `multipart/form-data` is ever used with any other verb, and if it is, it could be implicit based on `-AsMultipart`.
@Jaykul That thought occurred to me after I signed off for the night. I'm OK with that. The other thing I thought of is that having `-Form` in addition to `-InFile` and `-Body` might be a bit confusing, but there really is no non-confusing way to go about all of this, IMO, so I guess we pick our poison.

What object type should it be detected on? I don't work on the filesystem side of the house often. Is `System.IO.FileSystemInfo` sufficient, or is there a better one?
@iSazonov You know, `[file]` is something that I think has been missing from the language. That would be worth pursuing regardless of this use case, IMO. It would certainly help, though:

```powershell
Invoke-RestMethod -Form @{ UserName = "markekraus"; Avatar = [file]".\avatar.png" }
```
`System.IO.FileSystemInfo` is the common base class for both `DirectoryInfo` and `FileInfo`. If this feature only handles a single file, then `FileInfo` would make more sense.
I had this thought this morning: what about also supporting streams? Stream support isn't widely used in PowerShell, but I think that's been a missed opportunity. In PHP, streams were very common for generating things on the fly like images (granted, I haven't touched PHP in a really long time).
@markekraus I think streams would be additive and could be done separately from this one?
We need new issues to discuss `[file]`/`[directory]` and streams.
For this specific implementation, it is actually simpler and trivial to add stream support to `-Form`. We will already be doing type evaluation on the dictionary values, and `StreamContent` takes a stream by default. It's actually more work, code-wise, to convert a supplied `FileInfo` to a stream and then to a `StreamContent` than it is to just wrap the supplied stream in a `StreamContent`. But if you think it's better to implement later, I'm OK with that.
If it's simpler to do it now, there's no reason to make it more complicated later.
My tuppence, albeit your implementation talk is out of my knowledge base: when sending large files (think 1 GB; indeed, we regularly send 4 GB files), some of my previous attempts to use PS have completely killed the sending system, as the entire thing is read into memory. Streaming the file is the only sane way to do this (see previous comments). IMHO, without this ability, the whole feature falls flat.
@swinster Both the current non-multipart `-InFile` and the multipart implementation (whichever one we end up going with) use a `StreamContent` object which streams the file to `HttpClient`. The web cmdlets themselves are not reading the entire file into memory, but that might be what the underlying CoreFX is doing. If you have confirmed this behavior persists in PowerShell Core, can you open a new issue with how to reproduce it?
@iSazonov That article is about the receiving (server->client) side, where I believe there is already an open issue about it. @swinster was saying a similar problem occurs on the sending (client->server) side. I believe we are already efficient on the sending side. I think that problem for sending did exist in earlier versions, but since the Core change to `HttpClient`, I don't think it is an issue anymore. I have not looked deeply into the receiving side, as that direction is complicated by all of the processing we do after the `HttpClient` call.
Removing `Review - Committee`; I don't think it's necessary for the committee to review this anymore. The current proposal seems fine.
@SteveL-MSFT which proposal? `-Form` or reusing the existing parameters? Honestly, I was really digging `-Form`.
I thought this had landed on `-Form`? (Or perhaps that's just what I thought because I prefer it?)
@SteveL-MSFT Ok. I was just making sure. I'm good with `-Form`, but since there were kind of multiple proposals floating about since my code post, I wanted to make sure we were all on the same page:
`-Form` will accept a dictionary where the keys will be the field names. String values will be processed as `StringContent`. `FileInfo` and `Stream` values will be processed as `StreamContent`. Any other value type will be converted with `LanguagePrimitives.ConvertTo<string>()` and sent as `StringContent`. The `Content-Type` of files and streams will be `application/octet-stream`. When `-Form` is supplied, the method will be `POST` and anything passed to `-Method` and `-ContentType` will be ignored. `-Form` will be mutually exclusive with `-Body` and `-InFile`, and a terminating error will occur if they are used together.
Note that users needing more control or advanced multipart features may still create and supply a `MultipartFormDataContent` object to the `-Body` parameter as provided in #4782.
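As an illustration of that escape hatch, a hand-built payload with a non-default Content-Type on the file part might look like this sketch (the field name and path are made up; `$uri` is a placeholder):

```powershell
# Sketch: manual MultipartFormDataContent for cases -Form won't cover,
# e.g. a file part that must be image/png instead of application/octet-stream.
$multipart = [System.Net.Http.MultipartFormDataContent]::new()
$stream = [System.IO.File]::OpenRead('C:\pictures\avatar.png')
$filePart = [System.Net.Http.StreamContent]::new($stream)
$filePart.Headers.ContentType = [System.Net.Http.Headers.MediaTypeHeaderValue]::new('image/png')
$multipart.Add($filePart, 'Avatar', 'avatar.png')
Invoke-RestMethod -Uri $uri -Method POST -Body $multipart
```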
@SteveL-MSFT You had reapplied the `Review - Committee` tag to this issue. Did you get a chance to discuss this one yesterday?
@markekraus no, we ran out of time, will try to get a resolution by next week although Joey and I are at PSConf next week...
@PowerShell/powershell-committee reviewed this and agree that the `-Form` syntax of using a hashtable makes sense, as well as adding appropriate accelerators for `[file]` and `[directory]`. Since an HTTP form supports GET in addition to POST, `-Method` is required when using `-Form`.
Can you elaborate on `[file]` and `[directory]`? The discussion here mentions `FileInfo` and `DirectoryInfo`, so it's not clear which type mappings are proposed.
On that same topic... should this accelerator implementation support provider items?

```powershell
[File]'somedrive:\path\to\file.txt'
```
If `[file]` -eq `[system.io.fileinfo]`, I think that will cause confusion, e.g.

```powershell
using namespace System.IO
[File]::WriteAllLines($path, $line)
```

Clearly that's not meant to be `FileInfo`, but if you don't see the `using`, you might be confused if you get used to seeing `[File]` in other places.
For me, the name isn't so important as just having easy accelerators for files and directories. I'm not a huge fan of `[system.io.fileinfo]` not throwing on the file not existing. Ideally, this accelerator would be used like this:
```powershell
function Get-Widget {
    param([PSFile]$File)
    # Do something with $File
}

Get-Widget -File 'x:\no\lo\existo.nope'
# parameter binding error for non-existent file

Get-Widget -File 'c:\some\real\file.txt'
# works with existing file

$form = @{
    StringField = 'stringValue'
    FileField   = [PSFile]'x:\no\lo\existo.nope'
}
# type conversion error on non-existing file

$form = @{
    StringField = 'stringValue'
    FileField   = [PSFile]'c:\some\real\file.txt'
}
# no error
```
Also, I'd like to note that the accelerators should not block the multipart implementation. I can implement this now without `[file]` and add support for that later. The accelerators should have been tied to a separate issue. While having them would make the UX for the `-Form` feature easier, for now the user can do:
```powershell
$form = @{
    FileField = Get-Item 'c:\some\real\file.txt'
}
iwr -Form $form -Method POST -Uri $uri
```
The accelerators should be a separate issue and PR. You can still use `[System.IO.FileInfo]` for now.
Just a quick question with regard to this issue: has anything made it to PowerShell v6 as yet?
Not sure if the underlying mechanism to retrieve a file will be altered, but it came to light the other day that the way I have this "working" at the moment in v5 is a little on the slow side. An HTTP download of a 4 GB file from a browser, from a server on the same network, takes approximately 30 seconds. In PowerShell v5, this takes around 1 hour!
@swinster PowerShell 6.0.0 includes the ability to supply a `System.Net.Http.MultipartFormDataContent` object to the `-Body` parameter. You can read about this in detail here: https://get-powershellblog.blogspot.com/2017/09/multipartform-data-support-for-invoke.html. The simplified `-Form` parameter will be added in 6.1.0 (assuming nothing comes up or blocks it). It is the first feature on my TODO list, but I have a few bugs and some code refactoring I would like to resolve first.
Awesome work, thanks @markekraus. Can't wait to be able to retrieve files with -outfile.
> Can't wait to be able to retrieve files with -outfile

@swinster Can you elaborate on this? You can already download files and use the `-OutFile` parameter to save them. This has been around for several versions at least.
The blog post I linked has an earlier version of what was planned for the simplified multipart/form-data support. The planned implementation has since been revised. You can see a "demo" of the currently planned simplified multipart/form-data support here https://get-powershellblog.blogspot.com/2017/12/powershell-core-web-cmdlets-in-depth_24.html#L21 and you can see the proposal here https://github.com/PowerShell/PowerShell/issues/2112#issuecomment-337735585
@markekraus apologies, the comment above was not in the correct place, as it related to the underlying method of data retrieval from a simple GET with `Invoke-WebRequest`, which _was_ in the order of 200 times slower than a browser request, but actually nothing to do with the multipart uploads. However, I have just run the same basic tests using `Invoke-WebRequest` to GET a large file in v6 and v5 simultaneously, and honestly there is a HUGE difference. Whoever made these changes to the implementation needs some applause as well, even if this is the wrong place!
An update for anyone following this issue. I am currently working on implementing the solution as approved by the PowerShell Committee with some minor adjustments. I hope to have a PR submitted in the coming days.
After reading RFC 7578 it became clear that a single field name can be supplied multiple times with different field values. The use case for this is an array of files (multiple file selection) or multiple values for a single form field. To accommodate that, I'm adjusting the implementation slightly to treat collections as multiple field values for the same field name.
for example:
```powershell
$Form = @{
    Files       = Get-ChildItem c:\temp -File
    Names       = "Bill", "Sue", "Jane"
    Cars        = [System.Collections.Generic.List[String]]::new([String[]]@("Smart", "Honda"))
    Description = "Multiple and single value support."
    Image       = [System.IO.FileInfo]"c:\picture\me.png"
}
Invoke-WebRequest -Method POST -Uri $uri -Form $Form
```
In the approved implementation, this would result in a collection being converted to a single string. In the implementation I am working on, `Names` would be supplied 3 times, with `Bill` as the first value, `Sue` as the second, and `Jane` as the third. It doesn't add much complexity, and initial testing shows it works for endpoints that support it. It provides more flexibility at minimal cost. After checking, I personally have an internal use case for this in a form that requires multiple values for a single field.
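For anyone who needs this behavior on a version without `-Form`, the equivalent manual construction for the `Names` field is a short sketch:

```powershell
# Sketch: RFC 7578 allows repeating a field name, once per value.
$content = [System.Net.Http.MultipartFormDataContent]::new()
foreach ($name in "Bill", "Sue", "Jane") {
    $content.Add([System.Net.Http.StringContent]::new($name), 'Names')
}
```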