Curl: Windows Encoding bug.

Created on 4 Feb 2018  Â·  5Comments  Â·  Source: curl/curl

I did this

echo "./curl.exe -v www.baidu.com/?=你好" | file -
/dev/stdin: UTF-8 Unicode text
./curl.exe -v www.baidu.com/?=你好
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 14.215.177.39...

  • TCP_NODELAY set
  • Connected to www.baidu.com (14.215.177.39) port 80 (#0)
  • > GET /?=â–’â–’â–’ HTTP/1.1
  • > Host: www.baidu.com
  • > User-Agent: curl/7.53.1

I expected the following

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 14.215.177.39...

  • TCP_NODELAY set
  • Connected to www.baidu.com (14.215.177.39) port 80 (#0)
  • > GET /?=你好 HTTP/1.1
  • > Host: www.baidu.com
  • > User-Agent: curl/7.53.1

curl/libcurl version

User-Agent: curl/7.53.1
[curl -V output]

operating system

Windows 10 Git-bash

more detail

* echo "./curl.exe -v www.baidu.com/?=你好" | file -* shows my bash script encoding is Utf-8. I don`t know why encoding of string passed to curl will change to GB2312, and my git-bash and my target server do not support GB2312. GB2312 encoding is obtained by WireShark

URL cmdline tool not-a-bug

All 5 comments

The problem is here that the "你好"-piece is not a legal URL (by some definition of the term, see also URL-interop problems).

curl doesn't encode the characters at all, it will send the exact byte sequence it gets and in your case that is probably GB2312 if that's what your windows shell uses! If you want the URL to be using an UTF-8 encoded version of that string, you must unfortunately convert that yourself first to the actual URL you want to send.

On a Linux machine, that is easily done using iconv. I don't know how you do that on windows.

On a Linux machine, that is easily done using iconv. I don't know how you do that on windows.

msysgit comes with iconv, though I can't get the characters to show, probably because my character set is english (437).

As though I used iconv to convert string encoding to uft8, curl still recognize that as a system encoding string and I still can`t get my expect result.
image

curl still recognize that as a system encoding string

I don't understand what this means. curl has no knowledge of "system encoding" and it doesn't recognize such. When you set contents for libcurl to send in a multipart formpost, it will send the exact bytes you tell it to. libcurl doesn't know about encoding, it knows about bytes.

Thx, I understand now, it know about bytes. maybe I should find a method to change to 'filename' field from Window ANSI to UTF8

Was this page helpful?
0 / 5 - 0 ratings