I'm curious to know what changes there are between running as headless true vs false. When I run a login to Amazon using headless: true I get an error from Amazon via the screenshot. But when I set headless: false I watch it work just fine, no error.
So I'm trying to figure out what headless: true is doing that is different from when it's not headless.
Thanks for any suggestions.
There could be any number of things going on. They could be looking for the Headless added to the UA string and blocking that. Or they could be using some techniques to detect automated access and prevent it.
If it works in non-headless and fails in headless then the site itself is doing something to prevent automated access. So you'd need to figure out what that is and work around it or move on. Some things are easy to get around (like modifying the UA string) while others are non-trivial to bypass.
I am also facing the same issue.
When Headless is false
page url ===> http://lvh.me:3000/dashboard
When Headless is true
page url ===> about:blank
(node:29206) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1)
Can anyone provide an actual example file to run that reproduces this issue?
I will try to find something public that I can post. My example is confidential so I can't share it.
@Garbee just FYI, I'm setting the UA, so I don't think that's it. And I'm performing things like delays and mouse movement etc. Since the only difference is the headless: true it leads me to believe that there is something going on in the lib, and not on the site that I'm scraping. But I will keep trying and hopefully will find an example to post.
Are there other kinds of debugging maybe that can help point to where an issue might be?
@Garbee Here is the code. This happens only for localhost; if I give an actual website URL (http://www.google.com, etc.) it works with both options.
const browser = await puppeteer.launch({headless: true});
const page = await browser.newPage();
await page.goto('localhost:3000', {
networkIdleTimeout: 1000,
waitUntil: 'networkidle',
timeout: 3000000
});
console.log(page.url());
Output:
about:blank
Expected output:
localhost:3000
If headless is false I am getting the expected output.
I'll thicken the plot. I've started debugging the POST requests to my amazon login. When headless is set to true, Amazon is making an additional POST request that I don't recognize. That doesn't exist when headless is set to false. So that says to me something else is changing with this setting that I don't yet know.
I've also inspected the request and response for both headless and non-headless. They seem to be identical in nature.
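One way to make that comparison less error-prone is to diff the captured request headers programmatically instead of by eye. The sketch below is only an illustration: diffHeaders is a hypothetical helper, and it assumes you have already collected one header object per mode, for example from request.headers() inside a page.on('request', ...) handler.

```javascript
// Hypothetical helper: compare two header maps (one captured with
// headless: false, one with headless: true) and report what differs.
function diffHeaders(headful, headless) {
  // Header names are case-insensitive, so compare them in lower case.
  const normalize = (headers) =>
    Object.fromEntries(
      Object.entries(headers).map(([name, value]) => [name.toLowerCase(), value])
    );
  const a = normalize(headful);
  const b = normalize(headless);
  const diff = { onlyInHeadful: [], onlyInHeadless: [], changed: [] };
  for (const name of Object.keys(a)) {
    if (!(name in b)) diff.onlyInHeadful.push(name);
    else if (a[name] !== b[name]) diff.changed.push(name);
  }
  for (const name of Object.keys(b)) {
    if (!(name in a)) diff.onlyInHeadless.push(name);
  }
  return diff;
}

// Example with made-up header values:
console.log(
  diffHeaders(
    { 'User-Agent': 'Chrome', 'Accept-Language': 'en-US' },
    { 'user-agent': 'HeadlessChrome' }
  )
);
```

A header that shows up under onlyInHeadful is a candidate for something the site may be checking.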
In non-headless mode, screenshots work differently because my screen is in HiDPI mode (MacBook Retina). Here's one of the 'different' screenshots:

Remember the protocol is required for urls in goto.
@LoganDark that is a different issue completely. Please file your own for triage and discussion.
Different issue? Well, I didn't know that because of the title.
Reading the issue description as well, nothing stands out to me that would make my issue completely different. Here are the parts that made me think my issue did belong here:
I'm curious to know what changes there are between running as headless true vs false.
So I'm trying to figure out what headless: true is doing that is different from when it's not headless.
@Garbee Yes giving the protocol in goto solves the issue.
await page.goto('http://localhost:3000', {
networkIdleTimeout: 1000,
waitUntil: 'networkidle',
timeout: 3000000
});
console.log(page.url());
If I don't give the protocol for google.com, I get the error "Error: Protocol error (Page.navigate): Cannot navigate to invalid URL undefined", whereas for the case above I get about:blank. The error handling is done differently for localhost. Shouldn't it give the protocol error too?
await page.goto('www.google.com', {
networkIdleTimeout: 1000,
waitUntil: 'networkidle',
timeout: 3000000
});
console.log(page.url());
@LoganDark Sorry about the poorly worded title for the issue. There is nothing I can do about that. Your issue is with screenshot functionality while this was opened about some navigational problems. They are entirely distinct separated issues. Therefore a new issue is required to focus on your problem.
@kaushik-sundar Throwing an error for missing the protocol is a good idea IMO. I'll need to look into it though as it could be non-trivial to setup well due to the number of allowed protocols.
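Until something like that exists in the library, a small guard on the caller's side can catch the missing protocol before it reaches goto. The following is a minimal sketch, not Puppeteer's behavior: normalizeUrl and ALLOWED_PROTOCOLS are hypothetical names, and the allow-list is an assumption you would adjust. It relies on the standard URL class, which parses 'localhost:3000' as a URL whose scheme is "localhost" while rejecting 'www.google.com' outright; that may be related to why the two inputs fail differently above.

```javascript
// Assumed allow-list of protocols; adjust to what your scripts navigate to.
const ALLOWED_PROTOCOLS = ['http:', 'https:', 'file:', 'about:', 'data:'];

// Hypothetical helper: return an absolute URL, prepending a default
// protocol when the input has none (or has an unexpected one).
function normalizeUrl(input, defaultProtocol = 'http:') {
  let parsed = null;
  try {
    parsed = new URL(input);
  } catch (err) {
    // Not parseable as an absolute URL, e.g. 'www.google.com'.
  }
  if (parsed && ALLOWED_PROTOCOLS.includes(parsed.protocol)) {
    return parsed.href;
  }
  // Covers both 'www.google.com' and 'localhost:3000' (parsed as scheme "localhost:").
  return new URL(`${defaultProtocol}//${input}`).href;
}

console.log(normalizeUrl('localhost:3000'));
console.log(normalizeUrl('www.google.com'));
console.log(normalizeUrl('https://example.com/a'));
```

The result could then be passed to goto, as in page.goto(normalizeUrl('localhost:3000')).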
My apologies on the title, but I do agree that protocol issue is separate. My issue is more related to something about the request from the browser is different when headless is on vs off, causing the site in question to act differently.
Here is a gist of the problem. With params.isHeadless as false the browser opens and the form successfully logs in, whereas with it true I get an auth error page (which I actually _cannot_ replicate through normal means no matter what kinds of correct/incorrect credential permutations I try to use).
Since the problem is behind an auth wall (or rather, the act of authenticating itself) I cannot share the _exact_ code with my own credentials. However if you have or create your own vendorcentral account you should be able to see this behavior.
I wrote the code in such a way that it works for some other services as well, such as imgur. For this, just change params.url (to https://imgur.com/signin for example). It works on Imgur, which implies that Amazon _is_ doing something explicit, however we have been as of yet unable to determine what that is, because as @optikalefx has said we have tried sporadic mouse movement, delayed typing, etc.
Note: I'll open another unrelated issue for this eventually as I need to do more research and experimentation, but I found that page.press('Enter') does not actually press the enter key. At least for me and my environment.
but I found that page.press('Enter') does not actually press the enter key
Try page.press('Return') as well..?
@LoganDark That didn't work either. I probably shouldn't have brought it up here at all, completely unrelated. Let's ignore it.
I'm curious to know what changes there are between running as headless true vs false.
@optikalefx The major change is a user agent - chrome headless identifies itself as HeadlessChrome. Try running the following script in headless and headful modes:
const puppeteer = require('puppeteer');
(async() => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
console.log(await page.evaluate(() => navigator.userAgent));
await browser.close();
})();
User agent is sent with every request as a user-agent header. If there's a need, user-agent could be changed with the page.setUserAgent method.
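Rather than hard-coding a user agent, one option is to take the browser's own string and only drop the "Headless" marker. This is a sketch under assumptions: toHeadfulUserAgent is a hypothetical helper, and the sample string is illustrative. In a real script the input would come from browser.userAgent() and the result would be passed to page.setUserAgent.

```javascript
// Hypothetical helper: turn a headless Chrome user agent into the
// string a headful Chrome of the same version would send.
function toHeadfulUserAgent(userAgent) {
  return userAgent.replace('HeadlessChrome/', 'Chrome/');
}

// Illustrative input; in a script, use `await browser.userAgent()` instead.
const sample =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/66.0.3359.181 Safari/537.36';
console.log(toHeadfulUserAgent(sample));

// Usage inside an async Puppeteer script (not run here):
//   await page.setUserAgent(toHeadfulUserAgent(await browser.userAgent()));
```

Keeping the rest of the string untouched means the advertised version still matches the browser that is actually running.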
In non-headless mode, screenshots work differently because my screen is in HiDPI mode (MacBook Retina). Here's one of the 'different' screenshots:
@LoganDark please, file a separate issue.
Here is a gist of the problem.
@rosshadden try overriding user-agent in your gist. If this doesn't help, please file a separate issue.
From @Garbee:
@LoganDark that is a different issue completely. Please file your own for triage and discussion.
From @Garbee again:
Therefore a new issue is required to focus on your problem.
From @aslushnikov
@LoganDark please, file a separate issue.
Yeah, 3 times already I've been told to file a different issue.
I haven't. And I won't right now.
Stop telling me to.
@aslushnikov we need to re-open this ticket IMO. I'm sorry that this issue had unrelated things in it. Setting the user-agent doesn't change anything - as in something is still different about the request. The result of that user-agent log after it's set is exactly what I set it to.
Can you think of anything else that changes when headless is set to true? Something that Amazon is able to detect? Maybe something about cookies? Maybe you could guide me in the right direction in the code and I can look through myself. Being unfamiliar with the codebase would make having a quick guidance very helpful.
There are a few ways Amazon can be detecting headless access. Nothing can really be done internally about them if Amazon is implementing any techniques like this.
The only primary difference is the Headless in the UA string. Beyond that, everything should be functioning the same from the user perspective of headless, as stated before.
@Garbee super interesting. So, why can't we just define things like language, plugins etc? I can't set things on navigator, but I can polyfill other methods to prevent detection. Maybe you guys can set the navigator settings?
It looks like I can polyfill navigator using
Object.defineProperties(navigator, {
'plugins': {
value: ['adBlock'],
writable: true
}
});
Well I polyfilled everything in that article, and it passes all of those tests after the goto statement. But it still is getting caught. quite interesting.
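One thing that may matter here is timing: overrides applied after goto only exist once the site's own scripts have already had a chance to run. The sketch below packages the overrides as a single function so they can be installed earlier. It is an illustration, not a known fix for Amazon: applyNavigatorOverrides is a hypothetical name, the values are placeholders, and it is exercised on a plain object rather than a real navigator.

```javascript
// Hypothetical helper: define navigator-like properties via getters.
// In a page, `nav` defaults to the real navigator object.
function applyNavigatorOverrides(nav = navigator) {
  Object.defineProperty(nav, 'languages', {
    get: () => ['en-US', 'en'], // placeholder values
    configurable: true
  });
  Object.defineProperty(nav, 'plugins', {
    get: () => [{ name: 'placeholder-plugin' }], // placeholder value
    configurable: true
  });
}

// Demonstration on a stand-in object:
const fakeNavigator = {};
applyNavigatorOverrides(fakeNavigator);
console.log(fakeNavigator.languages, fakeNavigator.plugins.length);

// Usage inside an async Puppeteer script, before page.goto (not run here):
//   await page.evaluateOnNewDocument(applyNavigatorOverrides);
```

page.evaluateOnNewDocument runs the function in each new document before the page's scripts, which is the assumption this sketch depends on.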
@aslushnikov While my gist doesn't have a UA set, setting it was the first thing @optikalefx tried when we discovered this problem. What I can do is update my gist with setting the UA and the polyfills/workarounds we have tried since.
@optikalefx @rosshadden Chrome headless is built atop of content/ layer and doesn't include chrome/ layer, whereas chrome headful includes both content/ and chrome/ layers. So naturally, there might be multiple subtle ways to detect headless.
More on chromium architecture could be found here:
As mentioned in the article @Garbee posted the headless version does not have languages set on the navigator object.
Note also that the headless version will not have languages set in its Accept-Language Header. Some sites (ASP.NET in my experience) require this header to be set. Other sites are looking for this header specifically to identify headless browsers.
I copied the value from an example request generated by my normal chrome install. There is probably a more minimal setting for this header that works.
await page.setExtraHTTPHeaders({
'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8'
});
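If you would rather derive the header value from a list of languages than copy it from a browser, something like the following may work. It is a sketch: buildAcceptLanguage is a hypothetical helper, and the q-value scheme (no q for the first language, then 0.9, 0.8, and so on) is inferred from the value above rather than taken from any specification of what Chrome sends.

```javascript
// Hypothetical helper: build an Accept-Language value from an ordered
// list of language tags, most preferred first.
function buildAcceptLanguage(languages) {
  return languages
    .map((lang, i) => {
      if (i === 0) return lang; // first entry carries an implicit q=1
      const q = Math.max(0.1, 1 - i * 0.1).toFixed(1);
      return `${lang};q=${q}`;
    })
    .join(',');
}

console.log(buildAcceptLanguage(['en-GB', 'en-US', 'en']));

// Usage inside an async Puppeteer script (not run here):
//   await page.setExtraHTTPHeaders({
//     'Accept-Language': buildAcceptLanguage(['en-GB', 'en-US', 'en'])
//   });
```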
@koreus7 - Solution worked for Amazon issue reported by @optikalefx
Full code to scrape Amazon behind the login wall, optimized and working in headless mode (avoids bot detection)
// Get addresses from Amazon Address Book
const puppeteer = require('puppeteer');
(async () => {
// Syntactic Sugar
const Navigate = async (url) => {
await page.goto(url);
}
const EnterText = async (selector, text) => {
await page.click(selector);
await page.keyboard.type(text);
}
const ClickNavigate = async (selector, waitFor = -1) => {
await page.click(selector);
if (waitFor >= 0) {
await page.waitFor(waitFor*1000)
}
else {
await page.waitForNavigation();
}
}
// Main Flow
const C_HEADLESS = true
const C_OPTIMIZE = true
const C_SLOWMOTION = 0 // slow down by X ms
const browser = await puppeteer.launch({
headless: C_HEADLESS,
slowMo: C_SLOWMOTION
});
const page = await browser.newPage();
// To ensure Amazon doesn't detect it as a Bot
await page.setExtraHTTPHeaders({
'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8'
});
// No unwanted resources
if (C_OPTIMIZE) {
await page.setRequestInterception(true);
const block_ressources = ['image', 'stylesheet', 'media', 'font', 'texttrack', 'object', 'beacon', 'csp_report', 'imageset'];
page.on('request', request => {
//if (request.resourceType() === 'image')
// resourceType is a method; includes() also matches 'image' at index 0
if (block_ressources.includes(request.resourceType()))
request.abort();
else
request.continue();
});
}
// Creds
const USER_EMAIL = "YOUR_EMAIL_HERE"
const USER_PASSWORD = "YOUR_PASSWORD_HERE"
// Home Page constants
const U_HOMEPAGE = 'https://amazon.com'
const U_LOGIN_PAGE = 'https://www.amazon.com/ap/signin?clientContext=135-8638983-8261231&openid.return_to=https%3A%2F%2Fwww.amazon.com%2Fa%2Faddresses&openid.identity=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.assoc_handle=usflex&openid.mode=checkid_setup&marketPlaceId=ATVPDKIKX0DER&openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&pageId=usflex&openid.ns=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0&openid.pape.max_auth_age=900&siteState=clientContext%3D143-3525329-4850620%2CsourceUrl%3Dhttps%253A%252F%252Fwww.amazon.com%252Fa%252Faddresses%2Csignature%3Dnull'
const S_LOGIN_LINK = '#nav-link-accountList'
// Optimized the flow to reach the address book faster. The trick is to manually try to go to the target page before login, get hit
// by the Amazon login wall, and capture the URL, which will now have the return page set in the openid.return_to field.
// This helps to land on the target page directly after login without having to browse through the heavy home page.
// Caution: Trying to go to Address Book directly (any page with sensitive information) will challenge the user with additional password screen.
// Commented, since this is now optimized
// ------------------------------------------
// // Go to Home Page
// await Navigate(U_HOMEPAGE)
//
// // Go to Login Page
// await ClickNavigate(S_LOGIN_LINK, 1)
// ------------------------------------------
// Go directly to Login Page
await Navigate(U_LOGIN_PAGE) // USER-ACTION
// Login Page constants
const S_EMAIL_TEXT = '#ap_email'
const S_CONTINUE_BUTTON = '#continue'
const S_PASSWORD_TEXT = '#ap_password'
const S_SIGNIN_BUTTON = '#signInSubmit'
// Login - Step 1
await EnterText(S_EMAIL_TEXT, USER_EMAIL); // USER-ACTION
await ClickNavigate(S_CONTINUE_BUTTON); // USER-ACTION
// Login - Step 2
await EnterText(S_PASSWORD_TEXT, USER_PASSWORD); // USER-ACTION
await ClickNavigate(S_SIGNIN_BUTTON); // USER-ACTION
// Enter password again - Secondary Protection - This is required only if you try to land on the page with sensitive information directly
await EnterText(S_PASSWORD_TEXT, USER_PASSWORD); // USER-ACTION
await ClickNavigate(S_SIGNIN_BUTTON); // USER-ACTION
// AddressBook constants
const U_ADDRESSBOOK = 'https://www.amazon.com/a/addresses'
const S_ADDRESS_TILE = '.normal-desktop-address-tile'
const S_ADDRESS_FULLNAME = '#address-ui-widgets-FullName'
const S_ADDRESS_LINEONE = '#address-ui-widgets-AddressLineOne'
const S_ADDRESS_LINETWO = '#address-ui-widgets-AddressLineTwo'
const S_ADDRESS_CITYSTATEPOSTALCODE ='#address-ui-widgets-CityStatePostalCode'
const S_ADDRESS_COUNTRY = '#address-ui-widgets-Country'
const S_ADDRESS_PHONENUMBER = '#address-ui-widgets-PhoneNumber'
const S_ADDRESS_NODEFAULT = '.address-section-no-default'
const S_ADDRESS_DEFAULT = '.default-section'
const S_ADDRESS_DEFAULT_FRESH = '#ya-myab-fresh-address-icon'
const S_ADDRESS_DEFAULT_AMAZON = '#ya-myab-default-shipping-address-icon'
// Commented, since this is now optimized
// ------------------------------------------
// // Go to AddressBook
// await Navigate(U_ADDRESSBOOK)
// ------------------------------------------
// Get All Addresses
const allAddressElements = await page.$$(S_ADDRESS_TILE);
const getAddresses = allAddressElements.map(async (addressElement) => {
let defaultAddressforAmazon = false
let defaultAddressforFresh = false
const defaultAddressElement = await addressElement.$(S_ADDRESS_DEFAULT)
if (defaultAddressElement !== null) {
const defaultAddressForAmazonElement = await defaultAddressElement.$(S_ADDRESS_DEFAULT_AMAZON)
defaultAddressforAmazon = defaultAddressForAmazonElement ? true: false
const defaultAddressForFreshElement = await defaultAddressElement.$(S_ADDRESS_DEFAULT_FRESH)
defaultAddressforFresh = defaultAddressForFreshElement ? true: false
}
const fullNameElement = await addressElement.$(S_ADDRESS_FULLNAME)
const fullName = await (await fullNameElement.getProperty('innerHTML')).jsonValue();
const addressLineOneElement = await addressElement.$(S_ADDRESS_LINEONE)
const addressLineOne = await (await addressLineOneElement.getProperty('innerHTML')).jsonValue();
const addressLineTwoElement = await addressElement.$(S_ADDRESS_LINETWO)
const addressLineTwo = addressLineTwoElement ? await (await addressLineTwoElement.getProperty('innerHTML')).jsonValue() : '';
const cityStatePostalCodeElement = await addressElement.$(S_ADDRESS_CITYSTATEPOSTALCODE)
const cityStatePostalCode = await (await cityStatePostalCodeElement.getProperty('innerHTML')).jsonValue();
const countryElement = await addressElement.$(S_ADDRESS_COUNTRY)
const country = await (await countryElement.getProperty('innerHTML')).jsonValue();
const phoneNumberElement = await addressElement.$(S_ADDRESS_PHONENUMBER)
let phoneNumber = await (await phoneNumberElement.getProperty('innerHTML')).jsonValue();
phoneNumber = phoneNumber.split(':')
phoneNumber = phoneNumber[1].trim()
return {
FullName: fullName,
AddressLineOne: addressLineOne,
AddressLineTwo: addressLineTwo,
CityStatePostalCode: cityStatePostalCode,
Country: country,
PhoneNumber: phoneNumber,
DefaultAddressforAmazon: defaultAddressforAmazon,
DefaultAddressforFresh: defaultAddressforFresh
}
});
let addresses = await Promise.all(getAddresses)
console.log(addresses)
await browser.close();
})();
This is an absolute pearl. Thanks for sharing the code above.
I would also like to add that, for our implementation, we turned on 2FA and will keep it on. We have set up a number with Twilio (or a Twilio-like service) to receive the SMS code, and our login script then fetches that code from Twilio to enter into the 2FA prompt. We require this because Amazon sometimes asks for it, and rather than handling an occasional prompt with retry code, we just always assume 2FA.
For what it's worth I've also found that adding the following user agents override can help smooth over differences in some cases:
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36')
The UA I've provided is just an example. You can use any valid UA that matches an existing browser.
I noticed another difference, when in non-headless mode the address seems to change localhost to 127.0.0.1 which means it's difficult to assert on the URL.
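If the only goal is a stable assertion, one workaround is to normalize loopback hosts on both sides before comparing. This is a sketch with hypothetical names (normalizeLoopback, LOOPBACK_HOSTS); it treats localhost, 127.0.0.1 and [::1] as the same host, which is an assumption that holds for typical local setups.

```javascript
// Hosts treated as equivalent for assertion purposes (assumption).
const LOOPBACK_HOSTS = new Set(['localhost', '127.0.0.1', '[::1]']);

// Hypothetical helper: rewrite any loopback host to 'localhost' so URLs
// from headless and non-headless runs compare equal.
function normalizeLoopback(url) {
  const parsed = new URL(url);
  if (LOOPBACK_HOSTS.has(parsed.hostname)) {
    parsed.hostname = 'localhost';
  }
  return parsed.href;
}

console.log(normalizeLoopback('http://127.0.0.1:3000/dashboard'));
console.log(normalizeLoopback('http://localhost:3000/dashboard'));
```

In a test, the comparison would then be between normalizeLoopback(page.url()) and the normalized expected URL.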
As @jondlm said, setting the user agent makes headless Selenium behave the same as non-headless Selenium. Thanks.
@koreus7 setting the languages works like a charm!
I got it working by adding these two:
await page.setExtraHTTPHeaders({
'Accept-Language': 'en-US,en;q=0.9'
});
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36');
Many thanks to @koreus7 and @jondlm. It won't work if either one of them is left out.
P.S. I was trying to access the site www.blibli.com.
I've made a fake user agent generator that works pretty fine!
function* generateUserAgent() {
let webkitVersion = 10;
let chromeVersion = 1000;
const so = [
'Windows NT 6.1; WOW64',
'Windows NT 6.2; Win64; x64',
"Windows NT 5.1; Win64; x64",
'Macintosh; Intel Mac OS X 10_12_6',
"X11; Linux x86_64",
"X11; Linux armv7l"
];
let soIndex = Math.floor(Math.random() * so.length);
while (true) {
yield `Mozilla/5.0 (${so[soIndex++ % so.length]}) AppleWebKit/537.${webkitVersion} (KHTML, like Gecko) Chrome/56.0.${chromeVersion}.87 Safari/537.${webkitVersion} OPR/43.0.2442.991`;
webkitVersion++;
chromeVersion++;
}
}
const userAgents = generateUserAgent();
// ...
await page.setUserAgent(userAgents.next().value);
So does switching headless between true and false change the user agent and other things?
I have two different tests that work in headless: false mode but fail in headless: true mode, due to differences in font rendering and in the time needed for a button to become clickable, but I cannot share them because the website is confidential.
I think headless true/false should not change the rendering process.
Should I consider setting a common user agent to make the behaviour more consistent?
Thanks.
My case is completely the opposite of the OP's situation. I got Amazon's robot check with headless: false, and bypassed it with headless: true. I solved this issue thanks to @koreus7. Many thanks 👍
Using @koreus7 and @jondlm comments solved my problem
Recently, I had the same experience of getting blocked for using a headless browser while scraping a popular website. Even after adding proper headers and a user agent, it didn't work.
I finally used puppeteer-extra with the stealth plugin, which fixed the problem.
This thread helped me a lot to figure out what all could go wrong.
Thanks @Garbee @optikalefx
@Bhabaranjan19966 so this https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra with this https://www.npmjs.com/package/puppeteer-extra-plugin-stealth ? i will try, thanks.
Not working for me: headless and GUI modes render the page slightly differently.

Yes, those are the two repositories that fixed my problem. @andreabisello
I'm having this same issue with peapod.com right now. In headful mode, my program runs successfully. In headless mode, I'm screenshotting to debug and see that the link is clicked, spinner is activated, but the page never changes. How can I debug this better? @aslushnikov , could you provide me some guidance?
The stealth mode did the trick for me too! TYVM
None of these suggested solutions work on Mac OS X. To reproduce:
en-US or en, so that applications use that locale.
en-US on non-headless mode at least.
What I am trying to do is set up testing with Puppeteer for my browser extension Spellbook.
I have the first test now passing on Mac OS X (using some Finnish strings), and it is probably failing on other systems when you do yarn run test:puppeteer, because I use every method of setting the locale: https://github.com/peterhil/spellbook/commit/3480a73ed841f81cfac1ab99137820ea2aa5b6d6
For what it's worth I've also found that adding the following user agents override can help smooth over differences in some cases:
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36')
The UA I've provided is just an example. You can use any valid UA that matches an existing browser.
Add this just below where page is defined.