In the .dvc/config file, I provide a query for the http remote
['remote "http"']
url = http://localhost:8080/?param1=value1&param2=value2
When the server receives the download/upload request, the query is missing; I only get the file path
# sample
GET /85/ced7cc818c291d93a831db49318ff1
I would expect to get something like this
GET /85/ced7cc818c291d93a831db49318ff1?param1=value1&param2=value2
After checking the source code, looks like the parameters are correctly parsed at
dvc/tree/http.py:45 - self.path_info = self.PATH_CLS(url)
but for some reason they are lost in the upload/download functions in the same file. I've tried to trace the execution, but it is too hard to follow. Maybe you can give me some guidance.
Output of dvc version:
// Installed from source
$ dvc version
DVC version: 1.6.6+747cb6.mod
---------------------------------
Platform: Python 3.6.9 on Linux-5.3.0-28-generic-x86_64-with-Ubuntu-18.04-bionic
Supports: All remotes
Cache types: hardlink, symlink
Repo: dvc, git
Hi @MetalBlueberry, this is expected behavior in DVC. We do not currently support passing query parameters into the requests made to an HTTP remote.
Could you give some explanation of your use case here? I'm guessing your server is expecting some auth related information in the query parameters?
Implementation-wise, this would be straightforward for us to support. As you noted, we do parse the query params for the original URL in the remote config, but the params are dropped when we append remote file paths to that base URL. To get the behavior you are expecting, we should just need to override URLInfo.replace()
https://github.com/iterative/dvc/blob/747cb649f7ecc681267e8b2d14c225b664546b80/dvc/path_info.py#L155-L156
The HTTPURLInfo specific implementation would also need to include self._extra_parts in addition to _base_parts.
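As a rough illustration of the behavior being asked for (not DVC's actual code), the sketch below appends a file path to a base URL while keeping the configured query string. It uses only the standard library; `join_keeping_query` is a hypothetical helper name, and the real fix would live in URLInfo.replace() as described above.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def join_keeping_query(base_url: str, *parts: str) -> str:
    """Append path segments to base_url while preserving its query and fragment.

    Hypothetical helper for illustration only; it is not part of DVC.
    """
    split = urlsplit(base_url)
    # Join the new segments onto the base path ("/" if the base has none).
    # Note: a segment that starts with "/" would reset the path.
    new_path = posixpath.join(split.path or "/", *parts)
    # Rebuild the URL, carrying the original query and fragment along.
    return urlunsplit(
        (split.scheme, split.netloc, new_path, split.query, split.fragment)
    )


base = "http://localhost:8080/?param1=value1&param2=value2"
print(join_keeping_query(base, "85", "ced7cc818c291d93a831db49318ff1"))
# http://localhost:8080/85/ced7cc818c291d93a831db49318ff1?param1=value1&param2=value2
```

With a plain string join on the path alone, the `?param1=value1&param2=value2` part would be dropped, which matches the behavior reported at the top of this issue.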
We are working on a custom remote based on the http remote. The idea of query parameters is to provide configuration options to the custom remote to perform specific actions depending on those parameters.
If we can modify dvc to pass query params, it would be really handy. If not, we will probably encode the data in the URL or in the custom header used for the password.
@pmrowla feels like a bug to me, to be honest. Is there a strong reason not to pass URLs "as-is"?
@shcheklein we actually discussed this issue internally during grooming and agreed there's no real reason for us to not support it.
The only real question was that for a "normal" web server the extra URL parts like fragment (http://.../path#fragment) and query parameters are really endpoint/path specific, but that's not relevant for our purposes w/DVC remotes, so it should be fine for us to just propagate any configured query parameters to all of the HTTP remote requests.
Yeah, makes sense, thanks @pmrowla !
To my mind HTTP is already a well-defined protocol. As far as I remember, it defines what is sent to the server (e.g. fragments are skipped, the query is sent), and it doesn't dictate any rules or imply any meaning on top of the URL for how a server should organize itself. So it's usually totally fine to use a query instead of a regular path to pass something like a directory name.
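To make the fragment/query distinction concrete, here is a small sketch using only the standard library. It derives the request target a client would put on the request line: the path plus the query, with the fragment left out. `request_target` is a hypothetical name chosen for this example, and the URL is made up.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path-and-query part of a URL, as it would appear after GET.

    Hypothetical helper for illustration; the fragment is deliberately
    dropped because clients do not send it to the server.
    """
    split = urlsplit(url)
    target = split.path or "/"
    if split.query:
        target += "?" + split.query
    return target


print(request_target("http://localhost:8080/85/abc?param1=value1#fragment"))
# /85/abc?param1=value1
```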
I've tested #4517 with my current implementation and the params are passed seamlessly to my application.
Thank you for your guidance @pmrowla