When I run this snippet:
require('http').get({
  host: 'www.extra.com.br',
  path: '/EsporteLazer/AcessoriosdeTreino/?Filtro=C418_C2129_M71802'
}, (res) => {
  console.log(res.rawHeaders);
});
I get this output:
[ 'Content-Type',
'text/html',
'Location',
'http://www.extra.com.br/EsporteLazer/AcessoriosdeTreino/PanoMÃ¡gico/?Filtro=C418_C2129_M71802',
'Server',
'Microsoft-IIS/8.5',
'X-dynaTrace',
'PT=60245451;PA=1467083649;SP=Monitoring;PS=-2089690959',
'dynaTrace',
'PT=60245451;PA=1467083649;SP=Monitoring;PS=-2089690959',
'X-Frame-Options',
'SAMEORIGIN',
'X-AspNet-Version',
'4.0.30319',
'X-SERVER',
'CHELSEA004',
'Content-Length',
'0',
'Expires',
'Sun, 31 Jul 2016 19:27:23 GMT',
'Cache-Control',
'max-age=0, no-cache, no-store',
'Pragma',
'no-cache',
'Date',
'Sun, 31 Jul 2016 19:27:23 GMT',
'Connection',
'close',
'Set-Cookie',
'akaau=1469993543~id=4df66c9861fbbe2b5efa29ea1b91cee6; path=/' ]
The Location value is incorrectly encoded:
http://www.extra.com.br/EsporteLazer/AcessoriosdeTreino/PanoMÃ¡gico/?Filtro=C418_C2129_M71802
If I make the same request with curl, I get the correct encoding:
$ curl -I http://www.extra.com.br/EsporteLazer/AcessoriosdeTreino/?Filtro=C418_C2129_M71802
With this request I get the following Location:
Location: http://www.extra.com.br/EsporteLazer/AcessoriosdeTreino/PanoMágico/?Filtro=C418_C2129_M71802
The header values in your example are strings decoded with the 'binary' encoding. Converting the value back to a Buffer and decoding it as UTF-8 produces the expected result (Buffer.toString() defaults to 'utf8'):
require('http').get({
  host: 'www.extra.com.br',
  path: '/EsporteLazer/AcessoriosdeTreino/?Filtro=C418_C2129_M71802'
}, (res) => {
  // same with res.headers.location
  console.log(Buffer.from(res.rawHeaders[3], 'binary').toString());
});
@claudiorodriguez When I use res.headers instead of res.rawHeaders, should I see the same behaviour?
Is res.headers.location binary too?
Header values in Node.js are parsed using the 'binary'/'latin1' encoding, which is ISO-8859-1. So if a server sends UTF-8 (or really any other character set) instead, you will run into the issue you have here. The easy solution in this particular case is to do what @claudiorodriguez suggested: create a Buffer from the value with the 'binary' encoding and call .toString() on it.
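As a minimal sketch of that workaround, the re-decoding step can be wrapped in a small helper. The name decodeUtf8Header is hypothetical and not part of Node's API, and the fallback to the original string when the bytes are not valid UTF-8 is an assumption added here, not something suggested in this thread. The sample value is simulated locally rather than fetched from the server.

```javascript
// Hypothetical helper (not part of Node's API): re-decode a header value that
// Node decoded as latin1 but that the server actually sent as UTF-8.
function decodeUtf8Header(value) {
  // 'latin1' is an alias of 'binary': each character becomes one byte again.
  const decoded = Buffer.from(value, 'latin1').toString('utf8');
  // Assumption: invalid UTF-8 decodes to U+FFFD, which suggests the value
  // really was latin1, so the original string is kept in that case.
  return decoded.includes('\uFFFD') ? value : decoded;
}

// Simulate what Node hands back when a server sends a UTF-8 "á" in a header.
const asSeenInNode = Buffer.from('PanoMágico', 'utf8').toString('latin1');
console.log(asSeenInNode);                   // PanoMÃ¡gico
console.log(decodeUtf8Header(asSeenInNode)); // PanoMágico
```

This only helps when you already know, or are willing to guess, that the server sends UTF-8. For a value that is genuinely ISO-8859-1 the helper returns its input unchanged.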
res.headers and res.rawHeaders are both encoded the same way. The only difference between the two is that the latter is a flat array containing the original header names and values, in order, with duplicates preserved. The former is an object with duplicates either dropped or merged into one value, depending on the header name, and with header names lowercased.
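To illustrate the layout of res.rawHeaders, here is a rough sketch that groups the flat array into name/value pairs so that duplicates can be looked up case-insensitively. The helper name rawHeaderPairs and the sample array are made up for illustration; in real code the array would come from res.rawHeaders.

```javascript
// Hypothetical helper: group a rawHeaders-style flat array
// [name, value, name, value, ...] into [name, value] pairs.
function rawHeaderPairs(rawHeaders) {
  const pairs = [];
  for (let i = 0; i + 1 < rawHeaders.length; i += 2) {
    pairs.push([rawHeaders[i], rawHeaders[i + 1]]);
  }
  return pairs;
}

// Made-up sample with a duplicated header, in the shape rawHeaders uses.
const sample = ['Content-Type', 'text/html', 'Set-Cookie', 'a=1', 'Set-Cookie', 'b=2'];

// Original casing and duplicates are preserved, so compare names in lowercase.
const cookies = rawHeaderPairs(sample)
  .filter(([name]) => name.toLowerCase() === 'set-cookie')
  .map(([, value]) => value);
console.log(cookies); // [ 'a=1', 'b=2' ]
```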
Thank you guys. Since this is the expected behaviour, I will close this issue.
Why is Node.js not parsing it as UTF8? It's also not documented, AFAIK, that headers are in binary encoding.
@sindresorhus It's in the HTTP 1.1 specification.
Should we be specific and write that down in the Node.js documentation?