Puppeteer: Deploying on AWS Lambda

Created on 17 Aug 2017 · 49 Comments · Source: puppeteer/puppeteer

Hey there! I was wondering if anyone has had success using puppeteer within a serverless environment like AWS Lambda or Google Cloud Functions. I was curious about a few things:

How would downloading Headless Chrome work in a serverless environment?
How long does it roughly take to download and start navigating to a page?

Thanks!

sean-hill

👍21

Puppeteer requires Node version 7.10 or greater

... so you can't currently use it on Lambda, which supports Node 6.10.

bluepeter on 17 Aug 2017

To run headless Chrome on AWS Lambda, this project might be a better match: https://github.com/adieuadieu/serverless-chrome/tree/develop/packages/lambda. It uses Node v6 and has builds that fit within the Lambda file size limits.

riston on 17 Aug 2017

👍5

Should be possible. You just need to transpile the code to Node 6.10 before deploying to Lambda/Cloud Functions. We're also likely going to switch Chromeless to use Puppeteer which we hope will ease the Serverless-thing for the community.

To address @sean-hill specifically (based on experience with serverless-chrome and Chromeless):

Usually you package all of your resources before deploying them to the "serverless" service. How this works differs a bit between Lambda and Cloud Functions.
Given a Lambda Function with 1536MB of memory, there's about 500-600ms of overhead before you can start interacting with headless Chrome.

adieuadieu on 17 Aug 2017

👍15

@adieuadieu

We're also likely going to switch Chromeless to use Puppeteer which we hope will ease the Serverless-thing for the community.

This seems like the right move. Good to know. And the fonts do not seem to be an issue in Puppeteer from my ad-hoc testing so far. More on that when I'm able to dig in more.

toddwprice on 18 Aug 2017

Great guys, thanks for the feedback! Will close this for now until AWS Lambda becomes more awesome.

sean-hill on 18 Aug 2017

puppeteer now supports node 6

elyobo on 19 Sep 2017

🎉26 👍15 ❤5

I'm curious how chrome's dependencies would be handled on Lambda, e.g. X11, fontconfig, etc?

eblanshey on 24 Oct 2017

Would anybody who has succeeded in deploying to Lambda like to share?

deathemperor on 13 Nov 2017

@deathemperor try this https://github.com/sambaiz/puppeteer-lambda-starter-kit

bluepeter on 13 Nov 2017

👍2 🎉1

thanks @bluepeter, after asking the question I was able to find and deploy that repo for my use. It was quite hard at first because I've never deployed any Lambda app before.

deathemperor on 14 Nov 2017

Have you taken a look at chromeless? It's purpose-built for running on AWS Lambda, though it comes at the cost of a different API. The main contributor there has also written a fair deal on the subject. I also have a service that solves the problem of running Chrome in a production environment here.

joelgriffith on 14 Nov 2017

I tried chromeless but it has an issue when trying to get html content described here https://github.com/graphcool/chromeless/issues/276. Right now I'm OK with puppeteer on Lambda.

deathemperor on 15 Nov 2017

👍2 ❤1

@joelgriffith chromeless on Lambda wasn't equitable for us. it wasn't respecting the --proxy-server flags passed to it and that was a no go.

jborden13 on 18 Nov 2017

@deathemperor do you have any hints (or sharable code) on how to get puppeteer running on Lambda? I'm struggling with the puppeteer-lambda-starter-kit.

jborden13 on 19 Nov 2017

👍1

Sure thing. What issue do you have? I'll try to help as much as I can.

deathemperor on 20 Nov 2017

Hi @deathemperor, I'm having a similar issue to @jborden13.
My requirement is to run Chrome in AWS (I know there are Docker and Heroku options, but I want to know if it is possible with AWS) and connect to it with puppeteer.connect(). Am I on the right track if I use the puppeteer-lambda-starter-kit?
Thanks!

adarmanto on 27 Nov 2017

Yes, puppeteer-lambda-starter-kit is the way to go. I'm scraping data with it on a daily basis now. It's kind of confusing to get started with if you don't know Lambda and/or puppeteer.

I've forked the repo and made ready-to-deploy changes here https://github.com/deathemperor/puppeteer-lambda-starter-kit if you're interested.

deathemperor on 28 Nov 2017

👍5

Cool! Thanks @deathemperor

adarmanto on 28 Nov 2017

@deathemperor your starter-kit is great stuff - thank you! Question for you, have you ever run into the tmp directory filling up and throwing an error like:

Error: ENOSPC: no space left on device, mkdtemp '/tmp/puppeteer_dev_profile-XXXXXX'

I've tried clearing the tmp dir out like crazy, but it keeps coming back.

jborden13 on 10 Dec 2017

👍4

@jborden13 no I have not sorry.

deathemperor on 12 Dec 2017

@deathemperor
I'm curious about this. Now that AWS Lambda has doubled the potential memory available to functions (3008 MB at last count), how long does Puppeteer take to initialize? Speed is important for what I'm trying to achieve.

tomgallagher on 18 Mar 2018

@tomgallagher the issue isn't memory, it's disk space. The error Error: ENOSPC: no space left on device, mkdtemp '/tmp/puppeteer_dev_profile-XXXXXX' occurs over time because user profile data is not released by the Chrome process. If you have light usage you may not see this error. You can also get rid of the error by forcing Lambda to provision new containers by deploying a new version of your function, even if there are technically no updates to the code. It's a hack, but that works.
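
For reference, a minimal sketch of clearing those leftover profile directories. This is not code from any commenter: the function name `removeStaleProfiles` is hypothetical, only the `puppeteer_dev_profile-` prefix comes from the error message above, and `fs.rmSync` assumes Node.js 14.14 or later (newer than the runtimes discussed in this thread). It should only run after the browser has been closed, since a running Chrome still uses its profile directory.

```javascript
const fs = require('fs');
const path = require('path');

// Removes leftover Chrome profile directories from a temp directory.
// The 'puppeteer_dev_profile-' prefix is taken from the ENOSPC error message;
// the function name and the '/tmp' default are illustrative assumptions.
function removeStaleProfiles(tmpDir = '/tmp', prefix = 'puppeteer_dev_profile-') {
  const removed = [];
  for (const entry of fs.readdirSync(tmpDir, { withFileTypes: true })) {
    if (entry.isDirectory() && entry.name.startsWith(prefix)) {
      const fullPath = path.join(tmpDir, entry.name);
      // recursive + force: delete the whole tree and ignore missing paths
      fs.rmSync(fullPath, { recursive: true, force: true });
      removed.push(fullPath);
    }
  }
  return removed;
}
```

Calling `removeStaleProfiles()` at the end of a handler, after `browser.close()`, would return the list of directories it deleted.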

toddwprice on 18 Mar 2018

Another alternative to Lambda, which is what we are doing at the moment, is to use AWS Batch on Spot instances. These can be very cost effective and don't have the limits that Lambda has.

toddwprice on 18 Mar 2018

We have found ways around Lambda space limits... at least we think so. Set all caches to zero, clear cache after use. We did this initially not for disk space issues, but because we didn't want a "polluted" Chrome instance when browsing a new site (i.e., interference w/ a prior user's cookies/cache/etc).
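
To illustrate the "clear after use" idea (this is a sketch, not the commenter's actual code), the Chrome DevTools Protocol offers `Network.clearBrowserCookies` and `Network.clearBrowserCache`, which can be sent through a CDP session. The helper name `clearBrowsingState` is hypothetical, and `page` is assumed to be a Puppeteer Page from a version that provides `page.target().createCDPSession()`.

```javascript
// Clears cookies and the HTTP cache for the browser behind `page`.
// `page` is assumed to be a Puppeteer Page; the helper name is made up
// for this example.
async function clearBrowsingState(page) {
  const client = await page.target().createCDPSession();
  try {
    await client.send('Network.clearBrowserCookies');
    await client.send('Network.clearBrowserCache');
  } finally {
    // Always release the session, even if one of the commands failed
    await client.detach();
  }
}
```

It could be called right before closing a page, so the next invocation in the same container starts from a clean state.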

bluepeter on 18 Mar 2018

@toddwprice
on Lambda, the memory allocated is also a proxy for computing power, I believe. So more computing power = faster initialization of Puppeteer.
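
One way to check this for a given memory setting is simply to time the launch. A minimal sketch, assuming a hypothetical helper `timeLaunch` that accepts any async function standing in for the real launch call (for example `() => puppeteer.launch(options)`); no measurements are implied here.

```javascript
// Times an async "launch" function and returns its result with the
// elapsed time in milliseconds. `launchFn` is a placeholder for whatever
// actually starts the browser.
async function timeLaunch(launchFn) {
  const start = process.hrtime.bigint();
  const browser = await launchFn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { browser, elapsedMs };
}
```

Logging `elapsedMs` from a few cold and warm invocations at different memory sizes would show how much the setting matters for a particular function.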

tomgallagher on 18 Mar 2018

@tomgallagher true, true. I was referring to the out of disk space issue though not the startup time. I wasn't aware though that the memory limit is ~3GB now which is great to know.

@bluepeter do you have a reference to the API for managing the caching behavior, or an example? Either would be most appreciated.

toddwprice on 18 Mar 2018

I have used this https://github.com/deathemperor/puppeteer-lambda-starter-kit, but it doesn't seem to work on Lambda. It freezes at page creation. Any ideas how to solve this problem?

kenorbi on 13 Jun 2018

@kenorbi I've built this tool which I believe is simpler to use than the other solutions. Give it a shot! I've gotten it to work with Lambda.

sean-hill on 27 Jun 2018

👍2

@sean-hill I'm new to Lambda and tried to find a way to package your tool and run the code, but it still fails...
You provide a "usage" example, but where do I plug that code into your tool's structure?
Do you have a sample ZIP file whose contents I can look at, and that can be uploaded to AWS Lambda and run?

Thanks !

pperron on 11 Jul 2018

@pperron — Have a look at this guide by @nadeesha. It walks through how to run puppeteer on Lambda.

adieuadieu on 12 Jul 2018

👍1

@adieuadieu Thanks for the link, but it would still help a lot to have a ZIP example of a puppeteer Lambda function. That way we could just open the ZIP package, see how it is done, and fine-tune it to our needs. Even with your link, I am still left with the question of how to build that ZIP file so that it runs on AWS Lambda.

Thanks !

pperron on 16 Jul 2018

👍1

Hi all,

Has anyone managed to make puppeteer and Chromium work recently on AWS?

I have tried:
https://github.com/sambaiz/puppeteer-lambda-starter-kit
and
https://github.com/deathemperor/puppeteer-lambda-starter-kit

My final code is:
https://github.com/sambaiz/puppeteer-lambda-starter-kit
with index.js:
https://github.com/sambaiz/puppeteer-lambda-starter-kit/blob/master/src/index.js
replaced by:
https://github.com/deathemperor/puppeteer-lambda-starter-kit/blob/master/src/index.js

Also, I'm on Windows 7, so to build the package I removed or changed a lot of things in the scripts section of package.json.
I have created packages with and without babel and lint. I have also tried different versions of puppeteer and Chromium.

I always get this error on aws:
{
"errorMessage": "Failed to launch chrome! spawn /tmp/headless_shell ENOENT\n\n\nTROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md\n",
"errorType": "Error",
"stackTrace": [
"",
"",
"TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md",
"",
"onClose (/var/task/node_modules/puppeteer/lib/Launcher.js:299:14)",
"ChildProcess.helper.addEventListener.error (/var/task/node_modules/puppeteer/lib/Launcher.js:290:64)",
"emitOne (events.js:116:13)",
"ChildProcess.emit (events.js:211:7)",
"Process.ChildProcess._handle.onexit (internal/child_process.js:196:12)",
"onErrorNT (internal/child_process.js:372:16)",
"_combinedTickCallback (internal/process/next_tick.js:138:11)",
"process._tickDomainCallback (internal/process/next_tick.js:218:9)"
]
}

Config AWS:
I use the "Upload a file from Amazon S3" option because uploading through the UI always ends in a timeout, and the same goes for the CLI command.
Runtime: Node.js 8.10
Handler: index.handler
Execution role: lambda_basic_execution. I have also tried a custom role that has full access to Lambda and S3, just in case.
Timeout: 30 sec
Memory: 3008 MB

I would appreciate it if someone could guide me a little.

thanks
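
For context, `spawn ... ENOENT` generally means the file at that path could not be found when the process was started. A minimal pre-launch check, sketched under assumptions: the path `/tmp/headless_shell` is the one from the error message above, and the function name `checkChromeBinary` is hypothetical. It is a diagnostic aid, not part of any starter kit.

```javascript
const fs = require('fs');

// Returns null when `binaryPath` exists and is executable, otherwise a
// short description of what is wrong. The default path is copied from the
// error message; the function name is illustrative.
function checkChromeBinary(binaryPath = '/tmp/headless_shell') {
  try {
    fs.accessSync(binaryPath, fs.constants.F_OK);
  } catch (err) {
    return `missing: ${binaryPath} (was the archive extracted?)`;
  }
  try {
    fs.accessSync(binaryPath, fs.constants.X_OK);
  } catch (err) {
    return `not executable: ${binaryPath}`;
  }
  return null;
}
```

Logging the result just before the launch call would show whether the binary was never unpacked or was unpacked without execute permission.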

KeeperFrancis on 17 Aug 2018

The puppeteer-lambda-starter-kit project use a chrome binary specific for AWS lambda. The puppeteer version should match the chrome version.

@FrankTheCat, maybe the puppeteer version has been updated in your project. The package.lock does not lock the version(https://github.com/sambaiz/puppeteer-lambda-starter-kit/blob/master/package.json#L14).

Try to change to "puppeteer": "1.1.1"
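
In npm's range syntax, "^1.1.1" accepts any 1.x release at or above 1.1.1, while "1.1.1" accepts only that exact version. A small sketch for spotting unpinned dependencies in a package.json object; the function names are hypothetical and the check is deliberately simplified (it does not cover every range form npm accepts).

```javascript
// True when the range names one exact version such as "1.1.1",
// false for ranges such as "^1.1.1" or "~1.1.1". Simplified on purpose.
function isExactVersion(range) {
  return /^\d+\.\d+\.\d+$/.test(range.trim());
}

// Lists which of the given dependency names are present but not pinned.
function unpinnedDependencies(pkg, names) {
  const deps = pkg.dependencies || {};
  return names.filter((name) => name in deps && !isExactVersion(deps[name]));
}
```

For example, `unpinnedDependencies(require('./package.json'), ['puppeteer'])` would return `['puppeteer']` while the entry still uses a caret range.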

cirdes on 17 Aug 2018

@cirdes

"The puppeteer-lambda-starter-kit project use a chrome binary specific for AWS lambda. The puppeteer version should match the chrome version."
Yep, I have also tried once with: https://github.com/adieuadieu/serverless-chrome/releases
Version: stable-headless-chromium-68.0.3440.106-amazonlinux-2017-03.zip
I made the tar.gz with 7-Zip, following a tutorial found on Stack Overflow, and replaced the content of headless_shell.tar.gz, using the right filename.

Tutorial (not exactly the one I found, but similar):
https://www.techwalla.com/articles/how-to-create-a-tgz-file-in-windows

"@FrankTheCat, maybe the puppeteer version has been updated in your project. The package.lock does not lock the version(https://github.com/sambaiz/puppeteer-lambda-starter-kit/blob/master/package.json#L14).

Try to change to "puppeteer": "1.1.1"
"

Thanks for the hint, but unfortunately I get the same error.
I have tried with the original index.js and with deathemperor's index.js.

If someone wants to try my package with puppeteer version 1.1.1:
https://www.dropbox.com/s/diqi8gtvv40ez8i/sambaiz-original-puppeteer-1.1.1.zip?dl=0
https://www.dropbox.com/s/pyurmg3peiw7hya/sambaiz-original-puppeteer-1.1.1-with-index-change.zip?dl=0

Thanks

KeeperFrancis on 17 Aug 2018

Thanks all for your great work :)

I finally managed to deploy the sambaiz package. I also updated Chromium to the latest stable version (HeadlessChrome/68.0.3440.106) and puppeteer to the latest version (1.7.0).

https://www.dropbox.com/s/p4t7zod2nf97cwn/sambaiz-puppeteer.zip?dl=0

If you want to build your own package and you are on Windows, you can:

Download: https://github.com/sambaiz/puppeteer-lambda-starter-kit
Replace package.json with mine:

{
  "name": "puppeteer-lambda-starter-kit",
  "version": "1.1.2",
  "description": "Starter Kit for running Headless-Chrome by Puppeteer on AWS Lambda",
  "scripts": {
    "package": "npm run package-prepare",
    "package-prepare": "npm run babel && copy package.json dist && cd dist && npm config set puppeteer_skip_chromium_download true -g && npm install --production",
    "babel": "mkdir dist && \"./node_modules/.bin/babel\" src --out-dir dist",
    "local": "npm run babel && copy node_modules dist && node dist/starter-kit/local.js",
    "package-nochrome": "npm run package-prepare && cd dist && zip -rq ../package.zip ."
  },
  "dependencies": {
    "babel": "^6.23.0",
    "puppeteer": "^1.1.1",
    "tar": "^4.0.1"
  },
  "devDependencies": {
    "aws-sdk": "^2.111.0",
    "babel-cli": "^6.26.0",
    "babel-preset-env": "^1.6.0"
  }
}

Change the version of node in .babelrc to 8.10
npm install babel (if it's not already installed)
npm run package
Copy chrome/headless_shell-67.0.3361.0.tar.gz to dist
Rename dist/headless_shell-67.0.3361.0.tar.gz to headless_shell.tar.gz
Zip the content of dist and you have your package ready to deploy
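
The copy and rename steps above can also be scripted. A minimal sketch, assuming a hypothetical helper `stageChromium`; both arguments are placeholders for local paths, and only the target name `headless_shell.tar.gz` comes from the steps above.

```javascript
const fs = require('fs');
const path = require('path');

// Copies a Chromium tarball into the build directory under the fixed name
// used in the steps above. Creates the directory if it does not exist yet.
function stageChromium(tarballPath, distDir) {
  const target = path.join(distDir, 'headless_shell.tar.gz');
  fs.mkdirSync(distDir, { recursive: true });
  fs.copyFileSync(tarballPath, target);
  return target;
}
```

Running `stageChromium('chrome/headless_shell-67.0.3361.0.tar.gz', 'dist')` after `npm run package` would leave the tarball in place for zipping.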

KeeperFrancis on 29 Aug 2018

@FrankTheCat
Hello

I'm trying to run your project on AWS Lambda,
but when I use the page.type function in my script, an error occurs.

"errorMessage":"Protocol error (Input.insertText): 'Input.insertText' wasn't found","errorType":"Error","stackTrace":["Promise (/var/task/node_modules/puppeteer/lib/Connection.js:202:56)","new Promise ()","CDPSession.send (/var/task/node_modules/puppeteer/lib/Connection.js:201:12)","Keyboard.sendCharacter (/var/task/node_modules/puppeteer/lib/Input.js:151:24)","Keyboard.type (/var/task/node_modules/puppeteer/lib/Input.js:166:20)","ElementHandle.type (/var/task/node_modules/puppeteer/lib/ElementHandle.js:173:31)","","process._tickDomainCallback (internal/process/next_tick.js:228:7)"]}

Have you had this kind of error?

If you have, any pointers would be a big help.

Thx.

yunho2141 on 3 Sep 2018

@yunho2141
I don't use page.type in my project. What version of Chromium do you use?

Also, after I finally managed to deploy the sambaiz package, I found that the project has a new branch (Puppeteer 1.7.0): https://github.com/sambaiz/puppeteer-lambda-starter-kit/tree/puppeteer1.7.0

If you use the latest version of puppeteer but not the latest version of Chromium, maybe use the Chromium version from the puppeteer1.7.0 branch.
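
A "wasn't found" protocol error means the running browser does not implement the DevTools command that Puppeteer sent, which fits the version-mismatch explanation. To see which Chromium is actually running, `browser.version()` in Puppeteer returns a string like the "HeadlessChrome/68.0.3440.106" quoted earlier in this thread. A small sketch for pulling out the major version; the helper name `chromeMajorVersion` is hypothetical.

```javascript
// Extracts the major version from a string such as
// "HeadlessChrome/68.0.3440.106". Returns null when nothing matches.
function chromeMajorVersion(versionString) {
  const match = /\/(\d+)\./.exec(versionString);
  return match ? Number(match[1]) : null;
}
```

Logging `chromeMajorVersion(await browser.version())` at startup makes it easier to compare the deployed binary against the Chromium release a given puppeteer version was built for.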

KeeperFrancis on 4 Sep 2018

@FrankTheCat

Do you only change headless_shell file on puppeteer 1.7.0 branch?

when I run 'npm run package' it over 50MB..

so I'm trying to scale down...

Could you give me help?

yunho2141 on 4 Sep 2018

@yunho2141
I didn't use the 1.7.0 branch. I only found out that this branch existed after I finally managed to deploy the master branch. I also manually gzipped the latest stable version from https://github.com/adieuadieu/serverless-chrome/releases

The 1.7.0 branch probably uses the latest dev version.

So "Do you only change headless_shell file on puppeteer 1.7.0 branch?":
Once you have made the package (a zip of the contents of dist), it will be smaller than 50 MB. My package is about 43 MB. If yours is larger, maybe you did not exclude babel when you created the dist folder, or you have the Chromium gzip in there twice.

"when I run 'npm run package' it over 50MB.."
Do you use the package.json from the branch or mine? Which OS do you use?
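
A quick way to catch an oversized package before uploading is to compare the zip's size against the limit. A minimal sketch, assuming a hypothetical helper `checkPackageSize`; the 50 MB default is the figure discussed above and should be checked against the current Lambda limits.

```javascript
const fs = require('fs');

// 50 MB is the figure from this thread; treat it as an assumption.
const DEFAULT_LIMIT_BYTES = 50 * 1024 * 1024;

// Reports the size of a zip file and whether it fits under the limit.
function checkPackageSize(zipPath, limitBytes = DEFAULT_LIMIT_BYTES) {
  const sizeBytes = fs.statSync(zipPath).size;
  return { sizeBytes, limitBytes, fits: sizeBytes <= limitBytes };
}
```

Running `checkPackageSize('package.zip')` as the last build step gives an early warning when something like a duplicated Chromium archive has slipped into dist.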

KeeperFrancis on 6 Sep 2018

Hey guys, I am using Puppeteer Lambda, which is a good way to use Chrome in Lambda.
You need to run:

CUSTOM_CHROME=true npm install puppeteer-lambda

Then set these environment variables:
CUSTOM_CHROME: true
PUPPETEER_SKIP_CHROMIUM_DOWNLOAD: true
access key, secret access key

const browser = await puppeteerLambda.getBrowser({ headless: true, slowMo: 100, args: ['--no-sandbox', '--disable-setuid-sandbox', '--single-process', '--start-fullscreen', '--window-size=1413,749']});

It's a very helpful package.

Everything works well... until I hit a problem with:
await page.goto(url);
I get stuck with a timeout error; it won't connect to the URL.
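
The thread does not establish a cause for this timeout. As a diagnostic aid only, here is a sketch that navigates with an explicit timeout and records failed requests, using Puppeteer's `requestfailed` page event and the `timeout` and `waitUntil` options of `page.goto`. The helper name `gotoWithDiagnostics` and its defaults are hypothetical, and `page` is assumed to be a Puppeteer Page.

```javascript
// Navigates to `url` and collects failed requests, so a timeout comes with
// some evidence of what went wrong. Helper name and defaults are illustrative.
async function gotoWithDiagnostics(page, url, { timeout = 30000, waitUntil = 'domcontentloaded' } = {}) {
  const failures = [];
  const onFailed = (request) => {
    const failure = request.failure();
    failures.push({ url: request.url(), error: failure ? failure.errorText : 'unknown' });
  };
  // Older emitters only offer removeListener, newer ones prefer off
  const removeMethod = typeof page.off === 'function' ? 'off' : 'removeListener';
  page.on('requestfailed', onFailed);
  try {
    const response = await page.goto(url, { timeout, waitUntil });
    return { ok: true, status: response ? response.status() : null, failures };
  } catch (err) {
    return { ok: false, error: err.message, failures };
  } finally {
    page[removeMethod]('requestfailed', onFailed);
  }
}
```

If `failures` lists network-level errors for every request, the function probably has no outbound connectivity; if it is empty, the page may simply need a longer timeout or a different `waitUntil` value.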

ratkorle on 8 Oct 2018

@ratkorle
"Everything works well... until I hit a problem with: await page.goto(url); I get stuck with a timeout error; it won't connect to the URL."

I'm facing the same issue. I got "Navigation Timeout Exceeded: 30000ms exceeded".

etanxing on 30 Nov 2018

"@kenorbi I've built this tool which I believe is simpler to use than the other solutions. Give it a shot! I've gotten it to work with Lambda."

@kenorbi Could you explain how to use your cleanup function in an actual puppeteer script? Thanks!

2803media on 18 Dec 2018

@kenorbi do you have docs on how the deployment should be done?

aigars-jekabsons on 3 May 2019

@aigars-jekabsons

Sharing this here in case it's of use to anyone: https://github.com/alixaxel/chrome-aws-lambda

Currently working on compatibility with the new Node 10 Lambda runtime.

alixaxel on 15 May 2019

👍2

@etanxing
"I'm facing the same issue. got 'Navigation Timeout Exceeded: 30000ms exceeded'"

I'm facing the same issue. Does anyone already have a solution?

okomura on 9 Jul 2019

@alixaxel Experiencing the same issue with chrome-aws-lambda.

node 8

kjr247 on 20 Aug 2019

@kjr247 Which issue exactly? This thread is long and not specifically related to my project.

Perhaps it would be better if you open an issue on https://github.com/alixaxel/chrome-aws-lambda.

alixaxel on 20 Aug 2019

👍1

Hello,

Can anyone explain to me why these examples of Puppeteer on Lambda have an option to generate a screenshot?

As I understand it, we use Puppeteer on Lambda just to do something (take a screenshot, create a PDF) and return the content.

Why do these examples have the option of generating a file inside Lambda, when AWS Lambda has no way to create local files except in the /tmp folder?
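
For comparison, a sketch of the "return the content" approach described above. When `page.screenshot()` is called without a `path` option, Puppeteer returns the image data instead of writing a file. The function name `screenshotResponse` is hypothetical, and the response shape (`statusCode`, `headers`, `isBase64Encoded`, `body`) assumes the function sits behind an API Gateway proxy integration; other callers would need a different return value.

```javascript
// Takes a PNG screenshot in memory and wraps it in a response object.
// `page` is assumed to be a Puppeteer Page; nothing is written to disk.
async function screenshotResponse(page) {
  // No `path` option: the image bytes are returned to the caller
  const image = await page.screenshot({ type: 'png' });
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'image/png' },
    isBase64Encoded: true,
    // Buffer.from accepts both a Buffer and a Uint8Array
    body: Buffer.from(image).toString('base64'),
  };
}
```

Examples that pass a `path` under /tmp usually do so to upload the file afterwards or to inspect it while debugging locally; that reading is an assumption, since the examples in question are not quoted here.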