Phantomjs: Custom HTTP header fields order

Created on 31 Dec 2013 · 29 Comments · Source: ariya/phantomjs

According to RFC2616 (http://www.w3.org/Protocols/rfc2616/rfc2616.html)

the order in which header fields with differing field names are received is not significant

however, some implementations may pay particular attention to the order of these fields.

Are there any plans to support custom orderings?

E.g., have "Connection: Keep-Alive" come before "Accept-Encoding: gzip".

A webpage property similar to this would probably work:
page.headerFieldsOrder = ["Accept", "Accept-Language", "Host", "Connection"...];
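To illustrate what such a property might do, here is a minimal sketch in plain JavaScript. Both `headerFieldsOrder` (the name proposed above) and the `orderHeaders` helper are hypothetical and are not part of the PhantomJS API. The sketch assumes that listed fields come first in the given order and that any unlisted fields follow in their original relative order.

```javascript
// Hypothetical helper, not part of PhantomJS.
// Returns [name, value] pairs arranged according to `order`.
function orderHeaders(headers, order) {
    var rank = {};
    order.forEach(function (name, i) {
        // Header field names are case-insensitive.
        rank[name.toLowerCase()] = i;
    });
    var entries = Object.keys(headers).map(function (name, i) {
        var key = name.toLowerCase();
        var listed = Object.prototype.hasOwnProperty.call(rank, key);
        return {
            name: name,
            value: headers[name],
            idx: i,
            // Unlisted fields sort after all listed ones.
            rank: listed ? rank[key] : order.length + i
        };
    });
    entries.sort(function (a, b) {
        return (a.rank - b.rank) || (a.idx - b.idx);
    });
    return entries.map(function (e) { return [e.name, e.value]; });
}

// Example: make "Connection" come before "Accept-Encoding".
var ordered = orderHeaders(
    { "Accept-Encoding": "gzip", "User-Agent": "test", "Connection": "Keep-Alive" },
    ["Connection", "Accept-Encoding"]
);
```

The result is an array of pairs rather than an object, since an array makes the intended order explicit.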

Thanks

All 29 comments

What benefit does it have?

It may not be a critical enhancement but it would likely improve the flexibility of the tool.

I have been fiddling around with PhantomJS and CDNs and found that some services, such as Incapsula, may be looking at the order of the HTTP request headers, in addition to their values, to determine the type of browser.

Here are the images of two GET requests, the first made by Firefox 26 and the second by phantomjs using customHeaders to mimic Firefox.

Firefox: [image: firefox]

PhantomJS with customHeaders: [image: phantomjs]

Below is the code I used to set the headers. Some of the field values may not be compatible, however, my goal was to get two identical HTTP responses from the server.

var page = require('webpage').create();

// Header values chosen to mimic Firefox 26.
page.customHeaders = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive"
};

Given the same header values I would expect identical responses, but for some reason this doesn't happen. The order of the fields may make a difference.

Thoughts?

I also find this feature useful. Any plans to add it soon?

:+1: +1

+1
Without the ability to modify the order of HTTP headers, it is impossible to fetch sites protected by Incapsula.

+1

+1

I have successfully made PhantomJS fetch a page from an Incapsula-protected site (www.enjin.com), with modification of two files:

  1. qhttpnetworkrequest.cpp (fixed the HTTP header field ordering)
  2. webpage.cpp (renamed the phantom-callback object, so that the Incapsula JS test does not find it)

gist (including example script): https://gist.github.com/GunsAkimbo/aa6ac81bd55dd1802637

It's not pretty, just a proof of concept of the changes needed in order not to be stopped by Incapsula.

So in principle we are interested in making changes like these. However:

  • We are trying to minimize the number of patches we carry relative to upstream Qt. Please discuss that part of your changes with the Qt developers. They'll probably be more receptive to a patch that allows an application to set the ordering itself, than a patch that hardwires an order.
  • Nothing stops Incapsula or whoever from looking for __phantom as well as, or instead of, phantom. The right change there would be to enable the controller script to _remove_ the phantom intrusion entirely, if it isn't needed, and/or rename it as it sees fit.

  1. Nope, rejected with reason "Out of scope": https://bugreports.qt.io/browse/QTBUG-49659
  2. I know, it was not meant as a real patch, just something quick and dirty for anyone who really wanted to compile something that "works".

I'm using PhantomJS for a website screenshotting service, and I'm unable to do much on Incapsula-protected sites. This is clearly an issue for multiple people; I don't fully understand why it's not being addressed.

@yegors Well, PJS uses an external library (Qt) for network communication, and I think the problem lies partly there, which means we cannot fix that in this repo. The other part is to hide the global object with the fixed name "_phantom", or to have the ability to rename it at runtime before loading a page.

You could try to compile a build yourself, with the changes mentioned in the gist I linked to a few posts back. Those changes made it possible to pass through the protection, but I'm not sure if both changes were necessary.

As an alternative, you could look into https://phantomjscloud.com/site/index.html
I have had good results using this service for Incapsula-protected sites.

@GunsAkimbo It's pretty unfortunate that Qt refuses to fix it on their end. A custom compile seems like the best (only) option at this point, as a third-party service is out of the question for our applications. I will try your patch and see what happens.

Whew, thanks guys for documenting this! I would have wasted hours trying to support Incapsula. I will move my scripts to Firefox/Selenium now.

@GunsAkimbo I'm trying to implement your fix.
Where is qhttpnetworkrequest.cpp located? I can't find it in this repo.

@opahopa That file belongs to the Qt repo. It used to be referenced in the .gitmodules file; judging by the history, I guess that has been changed.

@GunsAkimbo any idea how to change the headers order now?

Since this problem was marked as out of scope by Qt, I believe we can close it too, because future versions of PhantomJS will use the system-installed (or original) version of Qt.

Also, the RFC states that the order of HTTP headers doesn't matter.

Thanks!

The order does actually matter. Some anti-bot systems use it to identify
phantom and block requests with a particular order.


Yes, I know that. But the problem is that implementing this feature requires a custom version of Qt. We want to move away from a custom (patched) version to the original version.

@maximilianh You can put PhantomJS behind a proxy that reorders the headers as you want. Such a proxy could be implemented in any language without much trouble, or maybe there is an existing solution.
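The core of such a proxy could look roughly like the sketch below, written in JavaScript for Node. It covers only the rewriting step for plain HTTP: given the head of a raw HTTP/1.x request (request line plus header lines, terminated by an empty line), it emits the same head with the fields rearranged. The function name `reorderRequestHead` is made up for illustration, the socket handling of a real proxy is omitted, and the sketch assumes well-formed header lines with no obsolete line folding.

```javascript
// Hypothetical helper for a header-reordering proxy (plain HTTP only).
// `head` is the raw request head ending in "\r\n\r\n"; `order` lists
// field names in the order they should be sent.
function reorderRequestHead(head, order) {
    var lines = head.split("\r\n").filter(function (line) {
        return line.length > 0;
    });
    var requestLine = lines.shift();   // e.g. "GET / HTTP/1.1"
    var rank = {};
    order.forEach(function (name, i) {
        rank[name.toLowerCase()] = i;
    });
    var fields = lines.map(function (line, i) {
        var name = line.slice(0, line.indexOf(":")).toLowerCase();
        var listed = Object.prototype.hasOwnProperty.call(rank, name);
        // Unlisted fields go after the listed ones, in arrival order.
        return { line: line, idx: i, rank: listed ? rank[name] : order.length + i };
    });
    fields.sort(function (a, b) {
        // Tie-break on arrival index so repeated fields keep their order.
        return (a.rank - b.rank) || (a.idx - b.idx);
    });
    var sorted = fields.map(function (f) { return f.line; });
    return [requestLine].concat(sorted).join("\r\n") + "\r\n\r\n";
}

var rewritten = reorderRequestHead(
    "GET / HTTP/1.1\r\n" +
    "Accept-Encoding: gzip\r\n" +
    "User-Agent: test\r\n" +
    "Host: example.com\r\n" +
    "Connection: Keep-Alive\r\n\r\n",
    ["Host", "Connection", "Accept-Encoding"]
);
```

Working on the raw text rather than on a parsed header object avoids relying on whatever ordering an HTTP library applies when it serializes headers.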

@annulen Do you know of any proxy service providers, or proxy libraries in Python/Ruby/NodeJS, that can reorder the headers? I have tried many libraries that can modify the headers, but they cannot reorder them. Any help is appreciated. Thanks.

Is header reordering required for HTTPS, or is plain HTTP enough?

@annulen: It would be better to have both if possible. Otherwise plain HTTP is also OK.

For HTTPS it would require "bumping" SSL connections, which would significantly complicate the proxy code, even if we don't consider things like using client certificates or validating the server certificate on the client side. If HTTPS is needed, it is indeed a much easier solution to fix the order on the client side, i.e. to patch Qt.

If you are only concerned with the position of the Host header, it would be better to write a patch for https://bugreports.qt.io/browse/QTBUG-51557; it will be accepted.

Do we have any solution other than a proxy?

Fix the code, seriously.

There are 2 independent issues here:

  • QtNetwork adds some headers automatically if they are not set by the application, and does so by appending them to the end of the list instead of maintaining some meaningful order. These headers are Proxy-Connection/Connection, Accept-Encoding, Accept-Language, User-Agent, and Host.

    • It's possible to work around this by specifying all headers explicitly. A new QNetworkRequest should be created and filled in the desired order. An explicit Accept-Encoding means that PhantomJS or QtWebKit needs to decode the content encoding with zlib itself instead of relying on QNAM automagic.

    • It's weird to set Host manually, so solving QTBUG-51557 won't hurt.

  • PhantomJS uses QVariantMap to store the customHeaders property, so the order of the headers is not preserved and is replaced with lexicographic order. This can be worked around by using a container that preserves order.
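The second issue can be pictured with a small JavaScript sketch. It does not use Qt. It assumes, as described above, that a key-sorted map hands the header names back in lexicographic order, and it compares that with an order-preserving container (an array of pairs). The `sortedMapOrder` function is a hypothetical stand-in for the sorted map, and the header values are the ones from the customHeaders example earlier in the thread.

```javascript
// Order-preserving representation: an array of [name, value] pairs.
var headerPairs = [
    ["User-Agent", "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0"],
    ["Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"],
    ["Accept-Language", "en-US,en;q=0.5"],
    ["Accept-Encoding", "gzip, deflate"],
    ["Connection", "keep-alive"]
];

// Hypothetical stand-in for a key-sorted map: yields names sorted by key.
function sortedMapOrder(pairs) {
    return pairs.map(function (p) { return p[0]; }).sort();
}

// Order the script author intended.
var intended = headerPairs.map(function (p) { return p[0]; });
// User-Agent, Accept, Accept-Language, Accept-Encoding, Connection

// Order after a round trip through a key-sorted map.
var afterSortedMap = sortedMapOrder(headerPairs);
// Accept, Accept-Encoding, Accept-Language, Connection, User-Agent
```

With an array of pairs, the order written in the script is the order that survives, which is why an order-preserving container works around the problem.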

Update: a patch for QTBUG-51557 will be included in Qt 5.10.1, see https://codereview.qt-project.org/#/c/216980/.

Hello, I implemented @GunsAkimbo's concept to bypass Incapsula.
You can download PhantomJS at
https://drive.google.com/drive/folders/1Y0XqQ89hQUhDj9_EPW-kja8V1vX4Catf?usp=sharing

There are 2 files: one for Windows and one for Linux.
