_[email protected] commented:_
Which version of PhantomJS are you using? Tip: run 'phantomjs --version'.
phantomjs 1.4.1
What steps will reproduce the problem?
- I want to fetch the same URL multiple times sequentially. For example, based on examples/netsniff.js:

    page.open("http://www.baidu.com", function(status) {
        // require a new page
        // ++count; exit when count exceeds
        // page open the same url
    });

- Each open creates a HAR file. The first one has the full interaction info, as expected, but the subsequent HAR files are quite small: only "GET /" is recorded in them.
- Watched with Fiddler or Wireshark, the packets for subsequent page.open() calls only go to the first page of the host, with none of the follow-up requests.
What is the expected output? What do you see instead?
All HAR files should be the same or nearly the same size.
Instead, the 2nd, 3rd, ... are much smaller than the 1st HAR.
Which operating system are you using?
Windows 7 x64
Did you use binary PhantomJS or did you compile it from source?
binary
Please provide any additional information below.
It seems some cache is at work.
I have written another script that acts as a server: it reads an input file periodically and fetches up to 8 URLs at a time. If I feed some old URLs back into it after, for example, a 32-URL job, I get a correct HAR.
Disclaimer:
This issue was migrated on 2013-03-15 from the project's former issue tracker on Google Code, Issue #357.
:star2: 7 people had starred this issue at the time of migration.
_[email protected] commented:_
Also having this issue.
I'm using Phantom to benchmark frontend performance, and I want to be able to force-reload all resources, but logging resources with page.onResourceRequested shows that resources like javascript and css files are not loaded when reloading a given page.
It would be great if there were an option to turn off caching altogether for cases like these.
_[email protected] commented:_
Another use case for forcing full reload: forms with a csrf token to protect against csrf attacks. Even though the page is reloaded (using the same WebPage), the token stays the same (and thus, the form can't be submitted successfully).
The workaround I'm using for the moment is to have a different WebPage object for each reload.
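A minimal sketch of that workaround, for illustration only: each load gets its own WebPage object, which is closed once its callback fires. The name `openWithFreshPages` and the `createPage` factory parameter are hypothetical, and the sketch assumes a PhantomJS release that provides `page.close()`. Note that other reports in this thread suggest a fresh page does not always bypass the shared cache, so treat it as something to try rather than a guaranteed fix.

```javascript
// Hypothetical helper (not part of PhantomJS): opens `url` `runs` times in
// sequence, creating a new page object for every load.
// `createPage` is an assumed factory; in a PhantomJS script it could be
//   function () { return require('webpage').create(); }
function openWithFreshPages(createPage, url, runs, onDone) {
  var statuses = [];

  function next() {
    if (statuses.length >= runs) {
      onDone(statuses); // every run has finished
      return;
    }
    var page = createPage(); // fresh page for this run
    page.open(url, function (status) {
      statuses.push(status); // 'success' or 'fail' in PhantomJS
      page.close();          // drop the page before the next run
      next();
    });
  }

  next();
}

// Usage inside a PhantomJS script (shown as a comment so the helper itself
// stays independent of the runtime):
//
//   openWithFreshPages(
//     function () { return require('webpage').create(); },
//     'http://example.com/', 3,
//     function (statuses) { console.log(statuses.join(', ')); phantom.exit(); }
//   );
```

Passing the factory in, rather than calling `require('webpage')` inside the helper, keeps the loop easy to exercise with a stub page.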
_[email protected] commented:_
Another case for force full reload:
I am working on improving the performance of a project, and I need to record the loading speed of the page. Opening the page just once is not an accurate measurement; I need to load it several times to get the average time cost. However, after the first open, all the static files are cached. It's very inconvenient that I cannot clear the cached files.
_[email protected] commented:_
+1 for more control of the memory cache. It's obvious that some kind of memory caching is happening. Even with --disk-cache=false, repeated requests to the same page either result in 304s or Phantom simply not making HTTP requests for static resources.
This is preventing me from taking measurements of average page load time, since only the first run is a true complete page load. Subsequent runs pull things from a cache, meaning the numbers I get are useless.
_[email protected] commented:_
It's possible to clear out the memory cache, eg this patch will clear out memory cache between reloads:
    diff --git a/src/webpage.cpp b/src/webpage.cpp
    index bf6d814..304ae6c 100644
    --- a/src/webpage.cpp
    +++ b/src/webpage.cpp
    @@ -559,6 +559,8 @@ void WebPage::applySettings(const QVariantMap &def)
         opt->setAttribute(QWebSettings::JavascriptCanOpenWindows, def[PAGE_SETTINGS_JS_CAN_OPEN_WINDOWS].toBool());
         opt->setAttribute(QWebSettings::JavascriptCanCloseWindows, def[PAGE_SETTINGS_JS_CAN_CLOSE_WINDOWS].toBool());
    +    QWebSettings::clearMemoryCaches( );
    +
         if (def.contains(PAGE_SETTINGS_USER_AGENT))
             m_customWebPage->m_userAgent = def[PAGE_SETTINGS_USER_AGENT].toString();
It doesn't seem like this has been implemented, right? At least I keep having this issue with the latest PhantomJS from binary (x64).
No, this hasn't been implemented. Looks like it would be pretty easy... we would just need to wrap the QWebSettings::clearMemoryCaches(); call up to expose it to webpage module instances.
What about a setting on the WebPage like:
page.settings.forceRefresh = [false|true]
The caveat is that I'm assuming this will clear the cache for _all_ webpage module instances, as this is a static method call.
As such, it should either:
- be an instance method with a name that says so, such as WebPage#clearHttpCacheForAllWebPages:

    var WP = require('webpage');
    var page = WP.create();
    page.clearHttpCacheForAllWebPages();

- or live on the webpage module itself rather than on instances:

    var WP = require('webpage');
    WP.clearHttpCache();

I would personally prefer the latter.
That's true. In that case the latter scenario with a static method makes more sense indeed.
Shouldn't clearCache() instead of clearHttpCache() be enough though?
I'll try to implement it. Are there any other ideas?
@Tomtomgo: I'd recommend keep the "http" in the method name somewhere as there are many types of independently managed caches/storage inside of QtWebKit (WebSQL, LocalStorage, AppCache, Favicon cache, etc.), and this method would only be clearing the one.
We can consider adding clear methods for other such caches as needed/possible in the future, too.
BTW, I looked into the HTTP cache a bit more and the real sticking point is that you can only have one HTTP cache per QNetworkAccessManager (QNAM) instance. Qt recommends that a single QNAM instance is enough for a whole Qt application, and as such PhantomJS uses a singleton QNAM.
To differentiate caches between webpage instances, you'd need to alter the internals to create a separate QNAM instance.
Okay thanks! I'll wait a bit for other replies and otherwise implement the static WP.clearHttpCache().
Oh, hey, actually: here's another idea!
Each QWebPage instance does get its own QWebSettings instance, so you could probably "disable" caching for a single page by applying the QWebSettings::PrivateBrowsingEnabled WebAttribute to that single instance.
What's the status of this feature? I'd really like to see it implemented.
I did not implement it.
Very much in need for this feature :( . Any hack currently available?
@ariya Thanks for getting this in. Looking forward very much to 2.x
Was this feature lost in the transition to 2.0.0? I'm getting 304s after upgrading to 2.0, even with --disk-cache=false and/or --max-disk-cache-size=0.
I get the 304s as well.
I'm running into the same issue while using PhantomJS version 2.0.0. A workaround is to append a random request parameter to the URL, which forces a cache miss.
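As a rough illustration of that workaround, the sketch below appends a throwaway query parameter to a URL. The function name `addCacheBuster` and the default parameter name `_cb` are made up for this example and are not part of PhantomJS. It only changes the URL you pass to `page.open()`, so sub-resources such as CSS and JavaScript files, which are requested under their own URLs, may still be served from the cache.

```javascript
// Hypothetical helper (not part of PhantomJS): returns `url` with an extra
// random query parameter so the request does not match a cached entry.
function addCacheBuster(url, paramName) {
  var name = paramName || '_cb'; // '_cb' is an arbitrary, assumed default
  // Timestamp plus random suffix, both base 36, so the token is URL-safe.
  var token = Date.now().toString(36) + Math.random().toString(36).slice(2);

  // Split off any #fragment so the parameter lands in the query string.
  var hashIndex = url.indexOf('#');
  var base = hashIndex === -1 ? url : url.slice(0, hashIndex);
  var fragment = hashIndex === -1 ? '' : url.slice(hashIndex);

  // Pick '?' when there is no query yet, otherwise '&' (or nothing if the
  // query already ends in a separator).
  var last = base.charAt(base.length - 1);
  var separator;
  if (base.indexOf('?') === -1) {
    separator = '?';
  } else if (last === '?' || last === '&') {
    separator = '';
  } else {
    separator = '&';
  }

  return base + separator + encodeURIComponent(name) + '=' + token + fragment;
}
```

In a PhantomJS script this would be used as `page.open(addCacheBuster('http://example.com/'), callback)`, with a new token generated for every load.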
phantomjs --version
2.1.1
I call
page.settings.clearMemoryCaches = true;
page.clearMemoryCache();
before page.open and before page.close.
Even if I destroy the page and create a new one, PhantomJS still caches objects, even HTML served with no-cache headers ({ "name": "Cache-Control", "value": "no-store, no-cache, must-revalidate, post-check=0, pre-check=0, private" }, { "name": "Pragma", "value": "no-cache" }).
How did I test it?
On the first open, the HTML is around 19K; on the second open, it is 4030 bytes. The response is reported as 200 OK, but I think it is really a 304. The HTML is cached by Cloudflare ({ "name": "CF-Cache-Status", "value": "HIT" }), so the server cannot be returning different HTML for each call. Also, the Date HTTP header from the server is two seconds later than on the first request, so an HTTP request really is made to the server, but with an ETag, If-Modified-Since, or some related conditional header.
Why does it say in the changelog
2015-01-23: Version 2.0.0
New features
* Implemented clearing of memory cache (issue 10357)
yet the method does not exist anywhere in the code and does not work? Where was it implemented and how do we use it? It's a very basic feature that many people requested.
@Vitallium Thanks. I see it there. I think the problem is that I'm using phantomjs-node and they didn't expose that API. I'll check with that developer.