Hey there! I was wondering if anyone has had success using Puppeteer within a serverless environment like AWS Lambda or Google Cloud Functions. I was curious about a few things:
Thanks!
Puppeteer requires Node version 7.10 or greater
... so you can't currently use it on Lambda, which supports Node 6.10.
To run headless Chrome on AWS Lambda, this project might be a better match: https://github.com/adieuadieu/serverless-chrome/tree/develop/packages/lambda. It uses Node v6 and has builds that fit within the Lambda file size limits.
Should be possible. You just need to transpile the code to Node 6.10 before deploying to Lambda/Cloud Functions. We're also likely going to switch Chromeless to use Puppeteer which we hope will ease the Serverless-thing for the community.
To address @sean-hill specifically (based on experience with serverless-chrome and Chromeless):
@adieuadieu
"We're also likely going to switch Chromeless to use Puppeteer which we hope will ease the Serverless-thing for the community."
This seems like the right move. Good to know. And the fonts do not seem to be an issue in Puppeteer from my ad-hoc testing so far. More on that when I'm able to dig in more.
Great guys, thanks for the feedback! Will close this for now until AWS Lambda becomes more awesome.
puppeteer now supports node 6
I'm curious how chrome's dependencies would be handled on Lambda, e.g. X11, fontconfig, etc?
Has anybody succeeded in deploying to Lambda and would like to share?
@deathemperor try this https://github.com/sambaiz/puppeteer-lambda-starter-kit
thanks @bluepeter, after asking the question I was able to find and deploy that repo for my use. It was quite hard at first because I've never deployed any Lambda app before.
Have you taken a look at chromeless? It's purpose-built for running on AWS Lambda, although it comes at the cost of a different API. The main contributor there has also written a fair deal on the subject. I also have a service that solves the problem of running Chrome in a production environment here.
I tried chromeless but it has an issue when trying to get html content described here https://github.com/graphcool/chromeless/issues/276. Right now I'm OK with puppeteer on Lambda.
@joelgriffith chromeless on Lambda wasn't workable for us. It wasn't respecting the --proxy-server flags passed to it, and that was a no-go.
@deathemperor do you have any hints (or sharable code) on how to get puppeteer running on Lambda? I'm struggling with the puppeteer-lambda-starter-kit.
Sure thing. What issue do you have? I'll try to help as much as I can.
Hi @deathemperor, I'm having a similar issue to @jborden13.
My requirement is to run Chrome in AWS (I know there are Docker and Heroku options, but I want to know if it is possible with AWS) and connect to it with puppeteer.connect(). Am I on the right track if I use the puppeteer-lambda-starter-kit?
Thanks!
Yes, puppeteer-lambda-starter-kit is the way to go. I'm scraping data with it on a daily basis now. It's kind of confusing to get started with if you don't know Lambda and/or Puppeteer.
I've forked the repo and made ready-to-deploy changes here https://github.com/deathemperor/puppeteer-lambda-starter-kit if you're interested.
Cool! Thanks @deathemperor
@deathemperor your starter-kit is great stuff - thank you! Question for you, have you ever run into the tmp directory filling up and throwing an error like:
Error: ENOSPC: no space left on device, mkdtemp '/tmp/puppeteer_dev_profile-XXXXXX'
I've tried clearing the tmp dir out like crazy, but it keeps coming back.
@jborden13 no I have not sorry.
@deathemperor
I'm curious about this. Now that AWS Lambda has doubled the potential memory available to functions (3008 MB at last count), how long does Puppeteer take to initialize? Speed is important for what I'm trying to achieve.
@tomgallagher the issue isn't memory, it's disk space. The error Error: ENOSPC: no space left on device, mkdtemp '/tmp/puppeteer_dev_profile-XXXXXX' occurs over time because user profile data is not released by the Chrome process. If you have light usage you may not see this error. You can also get rid of the error by forcing Lambda to provision new containers by deploying a new version of your function, even if there are technically no updates to the code. It's a hack, but that works.
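For anyone hitting the ENOSPC error above, here is a minimal sketch of clearing leftover profile directories by hand. It assumes the directories use the "puppeteer_dev_profile-" prefix shown in the error message and that it runs after browser.close(), since deleting a profile that a running Chrome still uses would break it. The function names are made up for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Recursively delete a file or directory. Written by hand so it also works
// on older Node runtimes that lack fs.rmSync.
function removeRecursive(target) {
  const stat = fs.lstatSync(target);
  if (stat.isDirectory()) {
    for (const entry of fs.readdirSync(target)) {
      removeRecursive(path.join(target, entry));
    }
    fs.rmdirSync(target);
  } else {
    fs.unlinkSync(target);
  }
}

// Remove leftover Chrome profile directories from a temp directory.
// The "puppeteer_dev_profile-" prefix is taken from the error message;
// it is an assumption that every leftover profile uses it.
function cleanProfiles(tmpDir = os.tmpdir()) {
  const removed = [];
  for (const entry of fs.readdirSync(tmpDir)) {
    if (entry.startsWith('puppeteer_dev_profile-')) {
      removeRecursive(path.join(tmpDir, entry));
      removed.push(entry);
    }
  }
  return removed; // names of the directories that were deleted
}
```

On Lambda, os.tmpdir() should resolve to /tmp, so calling cleanProfiles() with no arguments at the end of the handler would target the same directory as in the error.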
Another alternative to Lambda, which is what we are doing at the moment, is to use AWS Batch on Spot instances. These can be very cost effective and don't have the limits that Lambda has.
We have found ways around Lambda space limits... at least we think so. Set all caches to zero, clear cache after use. We did this initially not for disk space issues, but because we didn't want a "polluted" Chrome instance when browsing a new site (i.e., interference w/ a prior user's cookies/cache/etc).
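One possible way to do the "clear after use" part is sketched below. It is not necessarily what was done above. It assumes `page` is a Puppeteer Page and uses the DevTools protocol commands Network.clearBrowserCookies and Network.clearBrowserCache, plus page.setCacheEnabled(false). The function name is made up.

```javascript
// Wipe cookies and the HTTP cache, then disable caching for later requests
// made by this page. `page` is assumed to be a Puppeteer Page object.
async function resetBrowsingState(page) {
  // Open a raw DevTools protocol session for this page's target.
  const client = await page.target().createCDPSession();
  await client.send('Network.clearBrowserCookies');
  await client.send('Network.clearBrowserCache');
  await client.detach();
  // Keep the cache from being repopulated by subsequent navigations.
  await page.setCacheEnabled(false);
}
```

Calling this before navigating to a new site should give each site a clean slate. Whether it also keeps /tmp usage down enough to avoid ENOSPC is something to verify in your own setup.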
@toddwprice
on Lambda, the memory allocated is also a proxy for computing power, I believe. So more computing power = faster initialization of Puppeteer.
@tomgallagher true, true. I was referring to the out of disk space issue though not the startup time. I wasn't aware though that the memory limit is ~3GB now which is great to know.
@bluepeter do you have a reference to the API for managing the caching behavior, or an example? Either would be most appreciated.
I have used this https://github.com/deathemperor/puppeteer-lambda-starter-kit, but it doesn't seem to work on Lambda. It freezes at page creation. Any ideas how to solve this problem?
@kenorbi I've built this tool which I believe is simpler to use than the other solutions. Give it a shot! I've gotten it to work with Lambda.
@sean-hill I'm new to Lambda and tried to find a way to package your tool and run code, but I still fail...
You provide a "usage" example, but where do I plug that code into your tool's structure?
Do you have a sample ZIP file whose contents I could look at, one that can be uploaded to AWS Lambda and run?
Thanks !
@pperron — Have a look at this guide by @nadeesha. It walks through how to run puppeteer on Lambda.
@adieuadieu Thanks for the link, but it would still help a lot to have an example ZIP of a Puppeteer Lambda function. That way we could just open the package, see how it is done, and fine-tune it to our needs. Even with your link, I'm still left with the question of how to build that ZIP file so that it runs on AWS Lambda.
Thanks !
Hi all,
Has anyone managed to make Puppeteer and Chromium work recently on AWS?
I have tried:
https://github.com/sambaiz/puppeteer-lambda-starter-kit
and
https://github.com/deathemperor/puppeteer-lambda-starter-kit
My final code is:
https://github.com/sambaiz/puppeteer-lambda-starter-kit
Replace index.js:
https://github.com/sambaiz/puppeteer-lambda-starter-kit/blob/master/src/index.js
By:
https://github.com/deathemperor/puppeteer-lambda-starter-kit/blob/master/src/index.js
Also, I'm on Windows 7, so to build the package I removed or changed a lot of things in the scripts section of package.json.
I have created packages with and without babel and lint. I have also tried different versions of Puppeteer and Chromium.
I always get this error on AWS:
{
  "errorMessage": "Failed to launch chrome! spawn /tmp/headless_shell ENOENT\n\n\nTROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md\n",
  "errorType": "Error",
  "stackTrace": [
    "",
    "",
    "TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md",
    "",
    "onClose (/var/task/node_modules/puppeteer/lib/Launcher.js:299:14)",
    "ChildProcess.helper.addEventListener.error (/var/task/node_modules/puppeteer/lib/Launcher.js:290:64)",
    "emitOne (events.js:116:13)",
    "ChildProcess.emit (events.js:211:7)",
    "Process.ChildProcess._handle.onexit (internal/child_process.js:196:12)",
    "onErrorNT (internal/child_process.js:372:16)",
    "_combinedTickCallback (internal/process/next_tick.js:138:11)",
    "process._tickDomainCallback (internal/process/next_tick.js:218:9)"
  ]
}
Config AWS:
I use the "Upload a file from Amazon S3" option because uploads through the UI always end in a timeout, and the same goes for the CLI command.
Runtime: Node.js 8.10
Handler: index.handler
Execution role: lambda_basic_execution. I have also tried a custom role that has full access to Lambda and S3, just in case.
Timeout: 30 sec
Memory: 3008 MB.
If someone can guide me a little bit
thanks
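A small diagnostic sketch for the "spawn /tmp/headless_shell ENOENT" error above. It only checks whether the binary is present and executable before launching. The path comes from the error message and the helper name is made up. It does not explain why the file is missing, but it can separate "archive was never extracted" from "extracted without execute permission", which may happen when the archive is built on Windows.

```javascript
const fs = require('fs');

// Report whether a file exists and whether the current user may execute it.
function checkBinary(binaryPath) {
  const result = { exists: fs.existsSync(binaryPath), executable: false };
  if (result.exists) {
    try {
      fs.accessSync(binaryPath, fs.constants.X_OK);
      result.executable = true;
    } catch (err) {
      // The file is present but cannot be executed by this user.
    }
  }
  return result;
}

// Example: log the state of the binary named in the error message.
console.log(checkBinary('/tmp/headless_shell'));
```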
The puppeteer-lambda-starter-kit project uses a Chrome binary built specifically for AWS Lambda. The Puppeteer version should match the Chrome version.
@FrankTheCat, maybe the Puppeteer version has been updated in your project. The package.lock does not lock the version (https://github.com/sambaiz/puppeteer-lambda-starter-kit/blob/master/package.json#L14).
Try to change to "puppeteer": "1.1.1"
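To illustrate the pinning suggestion: a caret range such as "^1.1.1" lets npm install any 1.x release at or above 1.1.1, so the installed Puppeteer can drift away from the bundled Chrome binary. Below is a rough sketch that lists dependencies not pinned to an exact version. The helper name and the sample dependency object are made up, and the check is deliberately simple (it treats anything other than plain x.y.z as unpinned).

```javascript
// Return the names of dependencies whose version is not an exact "x.y.z".
// Pre-release tags and ranges such as "^1.1.1" or "~4.0.1" count as unpinned.
function findUnpinned(dependencies) {
  const exact = /^\d+\.\d+\.\d+$/;
  return Object.keys(dependencies).filter(
    (name) => !exact.test(dependencies[name])
  );
}

// Sample input for illustration only.
const sampleDeps = { puppeteer: '^1.1.1', tar: '4.0.1' };
console.log(findUnpinned(sampleDeps)); // prints [ 'puppeteer' ]
```

In a real project you would pass require('./package.json').dependencies instead of the sample object.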
@cirdes
"The puppeteer-lambda-starter-kit project use a chrome binary specific for AWS lambda. The puppeteer version should match the chrome version."
Yep, I have also tried once with: https://github.com/adieuadieu/serverless-chrome/releases
Version: stable-headless-chromium-68.0.3440.106-amazonlinux-2017-03.zip
I made the tar.gz with 7-Zip, following a tutorial found on Stack Overflow, and replaced the content of headless_shell.tar.gz, using the right filename.
Tutorial (not exactly the one I found, but close):
https://www.techwalla.com/articles/how-to-create-a-tgz-file-in-windows
"@FrankTheCat, maybe the puppeteer version has been updated in your project. The package.lock does not lock the version(https://github.com/sambaiz/puppeteer-lambda-starter-kit/blob/master/package.json#L14). Try to change to "puppeteer": "1.1.1""
Thanks for the hint, but unfortunately I get the same error.
I have tried with the original index.js and with deathemperor's index.js.
If someone wants to try my package with puppeteer version 1.1.1:
https://www.dropbox.com/s/diqi8gtvv40ez8i/sambaiz-original-puppeteer-1.1.1.zip?dl=0
https://www.dropbox.com/s/pyurmg3peiw7hya/sambaiz-original-puppeteer-1.1.1-with-index-change.zip?dl=0
Thanks
Thanks all for your great work :)
I finally managed to deploy the sambaiz package. I also updated Chromium to the latest stable version (HeadlessChrome/68.0.3440.106) and Puppeteer to the latest version (1.7.0).
https://www.dropbox.com/s/p4t7zod2nf97cwn/sambaiz-puppeteer.zip?dl=0
If you want to build your own package and you are on Windows, you can use a package.json like this:
{
  "name": "puppeteer-lambda-starter-kit",
  "version": "1.1.2",
  "description": "Starter Kit for running Headless-Chrome by Puppeteer on AWS Lambda",
  "scripts": {
    "package": "npm run package-prepare",
    "package-prepare": "npm run babel && copy package.json dist && cd dist && npm config set puppeteer_skip_chromium_download true -g && npm install --production",
    "babel": "mkdir dist && \"./node_modules/.bin/babel\" src --out-dir dist",
    "local": "npm run babel && copy node_modules dist && node dist/starter-kit/local.js",
    "package-nochrome": "npm run package-prepare && cd dist && zip -rq ../package.zip ."
  },
  "dependencies": {
    "babel": "^6.23.0",
    "puppeteer": "^1.1.1",
    "tar": "^4.0.1"
  },
  "devDependencies": {
    "aws-sdk": "^2.111.0",
    "babel-cli": "^6.26.0",
    "babel-preset-env": "^1.6.0"
  }
}
@FrankTheCat
Hello
I tried to run your project on AWS Lambda,
but when I use the page.type function in my script, an error occurs:
"errorMessage":"Protocol error (Input.insertText): 'Input.insertText' wasn't found","errorType":"Error","stackTrace":["Promise (/var/task/node_modules/puppeteer/lib/Connection.js:202:56)","new Promise (
Have you ever had this kind of error?
If you have, any pointers would be a big help.
Thx.
@yunho2141
I don't use page.type in my project. Which version of Chromium do you use?
Also, after I finally managed to deploy the sambaiz package, I found that the project has a new branch (Puppeteer 1.7.0): https://github.com/sambaiz/puppeteer-lambda-starter-kit/tree/puppeteer1.7.0 .
If you use the latest version of Puppeteer but not the latest version of Chromium, maybe try the Chromium version from the puppeteer1.7.0 branch.
@FrankTheCat
Do you only change headless_shell file on puppeteer 1.7.0 branch?
when I run 'npm run package' it over 50MB..
so I'm trying to scale it down...
Could you help me?
@yunho2141
I didn't use the 1.7.0 branch. I only found out that this branch was available after I finally managed to deploy the master branch. Also, I manually gzipped the latest stable version from https://github.com/adieuadieu/serverless-chrome/releases
The 1.7.0 branch probably uses the latest dev version.
So "Do you only change headless_shell file on puppeteer 1.7.0 branch?":
Once you have made the package (a zip of the contents of dist), it will be smaller than 50 MB. My package is about 43 MB. If yours is larger, maybe you did not exclude babel from the dist folder when you created it, or you have the Chromium gzip in there twice.
"when I run 'npm run package' it over 50MB.."
Do you use the package.json from the branch or mine? Which OS do you use?
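A quick sketch for checking a built package against the 50 MB figure discussed above before uploading. The limit is assumed to be 50 * 1024 * 1024 bytes for a zipped package uploaded directly. The function names and the package.zip filename are only illustrative.

```javascript
const fs = require('fs');

// Assumed limit for a zipped deployment package uploaded directly to Lambda.
const DIRECT_UPLOAD_LIMIT_BYTES = 50 * 1024 * 1024;

// Describe a package size in MB and say whether it fits under the limit.
function describePackageSize(sizeBytes, limitBytes = DIRECT_UPLOAD_LIMIT_BYTES) {
  return {
    sizeMb: (sizeBytes / (1024 * 1024)).toFixed(1),
    fits: sizeBytes <= limitBytes,
  };
}

// Convenience wrapper that reads the size of a zip file on disk.
function checkZip(zipPath) {
  return describePackageSize(fs.statSync(zipPath).size);
}
```

For example, checkZip('package.zip') on a 43 MB archive would report that it fits.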
Hey guys, I am using Puppeteer Lambda, which works well for running Chrome in Lambda.
You need to run:
CUSTOM_CHROME=true npm install puppeteer-lambda
Then set the environment variables:
CUSTOM_CHROME: true
PUPPETEER_SKIP_CHROMIUM_DOWNLOAD: true
access key, secret access key
const browser = await puppeteerLambda.getBrowser({ headless: true, slowMo: 100, args: ['--no-sandbox', '--disable-setuid-sandbox', '--single-process', '--start-fullscreen', '--window-size=1413,749']});
It's a very helpful package.
Everything works well... until I hit a problem with:
await page.goto(url);
I got stuck with a timeout error. It won't connect to the URL.
I'm facing the same issue. got "Navigation Timeout Exceeded: 30000ms exceeded"
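Not a fix for the root cause, but a sketch that may help narrow down the navigation timeout above. It passes an explicit timeout and a less strict waitUntil condition to page.goto and logs the failure instead of throwing. `page` is assumed to be a Puppeteer Page and the helper name is made up. The 30000 ms in the error is the default navigation timeout. Note that raising it only helps if the Lambda function's own timeout is longer, and it will not help at all if the function has no outbound network access.

```javascript
// Navigate with an explicit timeout, waiting only for DOMContentLoaded.
// Returns true on success and false if navigation failed (e.g. on timeout).
async function tryGoto(page, url, timeoutMs = 60000) {
  try {
    await page.goto(url, { timeout: timeoutMs, waitUntil: 'domcontentloaded' });
    return true;
  } catch (err) {
    // Log the reason so it shows up in the function's logs.
    console.error(`Navigation to ${url} failed: ${err.message}`);
    return false;
  }
}
```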
@kenorbi Could you explain how to use your cleanup function in an actual puppeteer script? Thanks!
@kenorbi do you have docs on how the deployment should be done?
@aigars-jekabsons
Sharing this here in case it's of use to anyone: https://github.com/alixaxel/chrome-aws-lambda
Currently working on compatibility with the new Node 10 Lambda runtime.
I'm facing the same issue. Has anyone already found a solution?
@alixaxel Experiencing the same issue with chrome-aws-lambda.
node 8
@kjr247 Which issue exactly? This thread is long and not specifically related to my project.
Perhaps it would be better if you open an issue on https://github.com/alixaxel/chrome-aws-lambda.
Hello,
Can anyone explain to me why these examples of Puppeteer on Lambda have an option to generate a screenshot?
As I understand it, we use Puppeteer on Lambda just to do something (take a screenshot, create a PDF) and return the content.
Why do these examples have the option of generating a file inside Lambda, when AWS Lambda has no way to create local files except in the /tmp folder?
@saki85 I guess you could upload from /tmp to somewhere like S3.
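A rough sketch of that idea: write the screenshot to the temp directory, upload it, then delete the local copy so /tmp does not fill up. It assumes `page` is a Puppeteer Page and `s3` is an AWS SDK v2 S3 client (new AWS.S3()). The function name, bucket, and key are placeholders.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Take a screenshot into the temp directory, upload it to S3, and clean up.
// `page` is assumed to be a Puppeteer Page, `s3` an AWS SDK v2 S3 client.
async function screenshotToS3(page, s3, bucket, key) {
  const file = path.join(os.tmpdir(), `screenshot-${Date.now()}.png`);
  await page.screenshot({ path: file, fullPage: true });
  try {
    const body = fs.readFileSync(file);
    await s3
      .putObject({ Bucket: bucket, Key: key, Body: body, ContentType: 'image/png' })
      .promise();
  } finally {
    // Free temp space for later invocations in the same container.
    if (fs.existsSync(file)) fs.unlinkSync(file);
  }
  return `s3://${bucket}/${key}`;
}
```

If the file on disk is not needed at all, page.screenshot() without a path returns the image as a Buffer, which could be passed as Body directly.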