PowerShell: Implement WebCmdlets to save response content to filename specified in Content-Disposition

Created on 10 Apr 2018 · 14 Comments · Source: PowerShell/PowerShell

Currently the WebCmdlets only allow saving the content of a response via -OutFile to a filename specified when invoking the cmdlets. Since many services respond with a Content-Disposition header, which describes the response content and includes a filename, it would be good if the WebCmdlets allowed saving a file to a path specified, e.g., via a parameter -OutPath, while using the filename from the Content-Disposition header. This would be similar to the behaviour of curl, wget, or web browsers.

Area-Cmdlets-Utility Issue-Enhancement

All 14 comments

I do not want to overload Invoke-WebRequest and Invoke-RestMethod with download specific functionality. Instead, I would like to consider this functionality in the design of the new Download Specific cmdlet.

Please comment on PowerShell/PowerShell-RFC#124

The current plan doesn't include using Content-Disposition, but I'm open to guidance on how to incorporate that for HTTP/HTTPS.

@markekraus I like the suggestion in the RFC you mentioned, but in my opinion what you are suggesting there is something like a download utility similar to wget or a download manager (e.g. something like CloudBerry, Cyberduck or others).

In my opinion the WebCmdlets are about handling Web Requests and their responses. The response to a POST, PUT, GET or DELETE request can contain content and the Content-Disposition header is adding information about the content such as a filename which should be used to save the content. A download utility, as you suggested in the RFC, should be targeted at downloading files (e.g. via a HTTP GET request) and not handling responses of POST, PUT, GET or DELETE requests, which may contain files in their responses.

As the WebCmdlets already include functionality to save the content to a file via the -OutFile parameter, I don't think we are overloading any functionality here.

I noted that -OutFile accepts String as input, and only the documentation states that the parameter should either be path plus filename or only filename (and then $PWD will be used as path).

Therefore my suggestion is to extend the current behaviour and documentation:

  • Allow the -OutFile parameter to contain either a path OR a path and a filename OR only a filename
  • In case the string specified is a directory, the filename will be extracted from the Content-Disposition header if available. If not available, the filename will be the last part of the URL without query parameters (e.g. for https://example.org/path/to/file.txt it would be file.txt)
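The resolution rule proposed in the two bullets above can be sketched in a few lines. This is a language-neutral illustration in Python, not PowerShell's implementation: the function names (`filename_from_header`, `filename_from_url`, `resolve_out_file`) and the `is_dir` parameter are hypothetical, and reducing the server-supplied name to its last path component is an added assumption that the proposal does not mention.

```python
import os
from email.message import Message
from urllib.parse import unquote, urlsplit


def _last_component(name):
    """Keep only the final path component of a server-supplied name."""
    # Assumption (not in the proposal): drop any directory part so a name
    # such as "../x" cannot point outside the target directory.
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    return None if name in ("", ".", "..") else name


def filename_from_header(content_disposition):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not content_disposition:
        return None
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    name = msg.get_filename()
    return _last_component(name) if name else None


def filename_from_url(url):
    """Return the last path segment of the URL, ignoring query and fragment."""
    segment = urlsplit(url).path.rsplit("/", 1)[-1]
    return _last_component(unquote(segment))


def resolve_out_file(out_file, url, content_disposition=None, is_dir=os.path.isdir):
    """Apply the proposed rule: a directory gets a derived filename appended."""
    if not is_dir(out_file):
        # A plain filename or path plus filename is used as given.
        return out_file
    name = filename_from_header(content_disposition) or filename_from_url(url)
    if not name:
        raise ValueError("no filename in Content-Disposition or URL")
    return os.path.join(out_file, name)
```

With a directory as the target and no header, `resolve_out_file("downloads", "https://example.org/path/to/file.txt")` falls back to `file.txt` inside `downloads`, matching the example in the second bullet.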

This would bring the WebCmdlets closer to the curl behavior and would meet what several users are expecting from the WebCmdlets when using the -OutFile parameter (see discussions on StackOverflow, Reddit).

This is an example of how curl handles the filename of the Content-Disposition header (parameter -O is required to save to a file and -J to extract the filename from the Content-Disposition header instead of from the URL):

    curl http://test.greenbytes.de/tech/tc2231/attwithasciifilename35.asis -O -J -v
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 217.91.35.233...
    * TCP_NODELAY set
    * Connected to test.greenbytes.de (217.91.35.233) port 80 (#0)
    > GET /tech/tc2231/attwithasciifilename35.asis HTTP/1.1
    > Host: test.greenbytes.de
    > User-Agent: curl/7.54.0
    > Accept: */*
    > 
    < HTTP/1.1 200 OK
    < Date: Thu, 12 Apr 2018 21:31:00 GMT
    < Server: Apache/2.4.33 (Ubuntu)
    < Cache-Control: no-cache
    < Content-Disposition: attachment; filename="00000000001111111111222222222233333"
    < Upgrade: h2
    < Connection: Upgrade
    < Cache-Control: max-age=3600
    < Expires: Thu, 12 Apr 2018 22:31:00 GMT
    < Vary: Accept-Encoding
    < Content-Length: 527
    < Content-Type: text/html; charset="ISO-8859-1"
    < 
    { [527 bytes data]
    100   527  100   527    0     0  10075      0 --:--:-- --:--:-- --:--:-- 10134
    * Connection #0 to host test.greenbytes.de left intact
    curl: Saved to filename '00000000001111111111222222222233333'

This page has several examples for testing the Content-Disposition header.

Rather than overload -OutFile, it may be better to have an -OutPath parameter that only takes a directory path.

I'm 100% against adding this.

There are way too many download-specific requests for these cmdlets that are better served by the new cmdlet. The code for dealing with downloads in the web cmdlets is not ideal because all of the value of these cmdlets is focused on processing web responses, not downloaded content.

Cmdlets and functions should "do one thing". The web cmdlets already provide a download feature as a convenience, and perhaps that was a mistake in the initial design. Any further download features should be focused on the new download-specific cmdlets.

Also, the existing web cmdlets do not need to reach full parity with curl. PowerShell does. curl is overloaded, and that carries with it a ton of baggage.

Processing web content, downloading binary data (this includes text downloads), and processing web streams are 3 distinct tasks. Yes, they share a common protocol, HTTP. But that protocol is overloaded with a bunch of different functionalities.

PowerShell can and should accommodate those functions. But I see no compelling reason why they should all be done in the existing cmdlets. It makes them too complex and difficult to change. Adding more download features makes it harder to make feature changes that deal with processing web content, and adding features to process web content makes it harder to deal with downloads.

We can avoid that coupling behavior by using separate cmdlets.

@SteveL-MSFT Even with that, we now have to add logic in these cmdlets to deal with naming when Content-Disposition isn't available, and we have to figure out the behavior when a file with the chosen name already exists. Do we overwrite? Do we error out? Do we add an -AllowClobber? We now have to complicate the resume feature and extend run-time checks to include more mutually exclusive params. It's just too complex and too restricting for too little gain, IMO. With the use of -PassThru or -ResponseHeadersVariable the user has access to the Content-Disposition header and can rename the file if they wish.
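The "rename the file afterwards" workaround, together with the overwrite-or-error question raised above, can be sketched as follows. This is a Python illustration rather than PowerShell code: `rename_download` and its `allow_clobber` flag are hypothetical names chosen to mirror the -AllowClobber idea, and `new_name` is assumed to be a filename the caller has already extracted from the response headers.

```python
import os


def rename_download(saved_path, new_name, allow_clobber=False):
    """Rename an already-saved file to new_name inside the same directory."""
    target = os.path.join(os.path.dirname(saved_path), new_name)
    if os.path.abspath(target) == os.path.abspath(saved_path):
        # The file already has the desired name; nothing to do.
        return target
    if os.path.exists(target) and not allow_clobber:
        # "Error out" policy: refuse to replace an existing file.
        raise FileExistsError(f"{target} already exists")
    # "Overwrite" policy (or no conflict): os.replace overwrites the target.
    # Note: the exists() check and the rename are not atomic together.
    os.replace(saved_path, target)
    return target
```

Making the conflict policy an explicit argument shows the decision the comment describes: whichever default is chosen, the cmdlet (or the caller doing the rename) has to pick one.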

@markekraus I'm with you 100% that we want full parity with curl, but that doesn't mean everything has to be jammed into just one or two cmdlets.

@markekraus I don't understand why you insist that handling attachments is something which needs to be done in a different Cmdlet. This issue is not about downloading a file. It is about extracting the filename for the response content from the Content-Disposition header, which is considerably different. A user may need to issue a complex Web Request with HTTP Header, Body and a Method different than GET (most often POST, but also PUT or DELETE) and where the response happens to contain an attachment which should be saved locally. As -OutFile already exists as a parameter, this issue is only about allowing to use the filename provided in the response instead of specifying it upfront. This issue is not about adding download functionality to WebCmdlets.

The Cmdlet you proposed in the RFC is about downloading (e.g. retrieving) a file. I agree that functionality specific to downloading files could (and probably should) be handled in a different Cmdlet and that also includes extracting the filename from the Content-Disposition header. There are some features which would really bloat the existing WebCmdlets like multipart downloads (e.g. via multiple parallel HTTP Range Requests), Async IO for saving files, resumable downloads, protocols other than HTTP and probably several more features. Saving an attachment in response to a complex Web Request is not something which requires a different Cmdlet in my opinion, as it would bloat the Download Cmdlet and require it to allow specifying HTTP Header, Body and different HTTP Methods which are what the existing WebCmdlets seem to be designed for. The Download Cmdlet could reuse the existing WebCmdlets and enhance them with additional features for downloading files.

There is a good reason that both wget and curl exist and are used in different use cases. wget handles downloading of files, similar to what you suggest. curl handles Web Requests and their responses.

I'd be glad if others would add their view here. It may also be a good idea to exactly specify what the community (and the PowerShell Committee) want to achieve with the existing WebCmdlets and what they do not want to achieve with them, maybe even through dedicated RFCs for the WebCmdlets.

I don't understand why you insist that handling attachments is something which needs to be done in a different Cmdlet.

I know I sound like a broken record, but: All of the value of the web cmdlets derives from their ability to process web content. That is to say, doing meaningful things with HTML, XML, JSON, and plain-text. Their code is geared towards it. Adding features to support downloads or "attachments" or any other way you want to paint them is a completely different functionality. That is why.

You can already rename the file after the fact by inspecting the content disposition header on the response. So this lack of convenience is non-blocking. But, adding this feature is a non-trivial amount of work that requires limiting the cmdlets' flexibility to extend and improve features relevant to their core functionality.

If you think you may need POST functionality in order to retrieve remote files, please comment on the RFC.

As I mentioned here https://github.com/PowerShell/PowerShell/issues/6537#issuecomment-377690052, with Invoke-WebRequest we can already achieve 2 of 3 things, so the single-responsibility principle is already not strictly followed. Therefore, adding the third (1a. without specifying destination file name) with -Save would not be such a violation that we require a totally new cmdlet, now that Invoke-WebRequest is used quite heavily in the wild.

with Invoke-WebRequest we can already achieve 2 of 3 things

And that is a design decision that has not served us well.

I'm closing this as the conversation has gone stale and there is decent enough consensus and reason not to implement this in the existing cmdlets.

decent enough consensus

With the recent addition of #11671, we now have 3 distinct requests to implement the same thing.

I find @ffeldhaus's proposal and reasoning convincing: this is not about "feature creep" or "responsibility creep", this is about a simple, non-breaking enhancement to an _existing_ feature that has great value.

I suggest we revisit this.

If someone is willing to take this on: Implementing it as an experimental feature has now been green-lighted: https://github.com/PowerShell/PowerShell/issues/11671#issuecomment-578904101
