diff --git a/README.md b/README.md
index 26fe2d9..87caedb 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,8 @@
 ## lambdium
 ### headless chrome + selenium webdriver in AWS Lambda
 
+**Lambdium allows you to run a Selenium Webdriver script written in Javascript inside of an AWS Lambda function bundled with [Headless Chromium](https://developers.google.com/web/updates/2017/04/headless-chrome).**
+
 *This project is now published on the [AWS Serverless Application Repository](https://serverlessrepo.aws.amazon.com), allowing you to install it in your AWS account with one click. Install in your AWS account [here](https://serverlessrepo.aws.amazon.com/#/applications/arn:aws:serverlessrepo:us-east-1:156280089524:applications~lambdium).* Quickstart instructions are in the [`README-SAR.md` file](https://github.com/smithclay/lambdium/blob/master/README-SAR.md).
 
 This uses the binaries from the [serverless-chrome](https://github.com/adieuadieu/serverless-chrome) project to prototype running headless chromium with `selenium-webdriver` in AWS Lambda. I've also bundled the chromedriver binary so the browser can be interacted with using the [Webdriver Protocol](https://www.w3.org/TR/webdriver/).
@@ -13,7 +15,7 @@ The function interacts with [headless Chromium](https://chromium.googlesource.co
 
 Since this Lambda function is written using node.js, you can run almost any script written for [selenium-webdriver](https://www.npmjs.com/package/selenium-webdriver). Example scripts can be found in the `examples` directory.
 
-#### Requirements
+#### Requirements and Setup
 
 * An AWS Account
 * The [AWS SAM Local](https://github.com/awslabs/aws-sam-local) tool for running functions locally with the [Serverless Application Model](https://github.com/awslabs/serverless-application-model) (see: `template.yaml`)
@@ -21,7 +23,9 @@ Since this Lambda function is written using node.js, you can run almost any scri
 * `modclean` npm modules for reducing function size (optional)
 * Bash
 
-#### Fetching dependencies
+_Note:_ If you don't need to build, customize, or run this locally, you can deploy it directly from a [template on the AWS Serverless Application repository](https://serverlessrepo.aws.amazon.com/#/applications/arn:aws:serverlessrepo:us-east-1:156280089524:applications~lambdium) and skip all of the below steps.
+
+#### 1. Fetching dependencies
 
 The headless chromium binary is too large for Github, you need to fetch it using a script bundled in this repository. [Marco Lüthy](https://github.com/adieuadieu) has an excellent post on Medium about how he built chromium for for AWS Lambda [here](https://medium.com/@marco.luethy/running-headless-chrome-on-aws-lambda-fa82ad33a9eb).
 
@@ -29,45 +33,45 @@ The headless chromium binary is too large for Github, you need to fetch it using
     $ ./scripts/fetch-dependencies.sh
 ```
 
-#### Running locally with SAM Local
+##### 2. Cleaning up the `node_modules` directory to reduce function size
 
-SAM Local can run this function on your computer inside a Docker container that acts like AWS Lambda. To run the function with an example event trigger that uses selenium to use headless chromium to visit `google.com`, run this:
+It's a good idea to clean the `node_modules` directory before packaging to make the function size significantly smaller (making the function run faster!). You can do this using the `modclean` package:
+
+To install it:
 
 ```sh
-    $ sam local invoke Lambdium -e event.json
+    $ npm i -g modclean
 ```
 
-### Deploying
-
-#### Creating a bucket for the function deployment
-
-This will create a file called `packaged.yaml` you can use with Cloudformation to deploy the function.
-
-You need to have an S3 bucket configured on your AWS account to upload the packed function files. For example:
+Then, run:
 
 ```sh
-    $ export LAMBDA_BUCKET_NAME=lambdium-upload-bucket
+    $ modclean --patterns="default:*"
 ```
 
-##### Reducing function size for performance (and faster uploads!)
+Follow the prompts and choose 'Y' to remove extraneous files from `node_modules`.
 
-It's a good idea to clean the `node_modules` directory before packaging to make the function size significantly smaller (making the function run faster!). You can do this using the `modclean` package:
+#### 3. Running locally with SAM Local
 
-To install it:
+SAM Local can run this function on your computer inside a Docker container that acts like AWS Lambda. To run the function with an example event trigger that uses selenium to use headless chromium to visit `google.com`, run this:
 
 ```sh
-    $ npm i -g modclean
+    $ sam local invoke Lambdium -e event.json
 ```
 
-Then, run:
+### Deploying
+
+#### Creating a S3 bucket for the function deployment
+
+This will create a file called `packaged.yaml` you can use with Cloudformation to deploy the function.
+
+You need to have an S3 bucket configured on your AWS account to upload the packed function files. For example:
 
 ```sh
-    $ modclean --patterns="default:*"
+    $ export LAMBDA_BUCKET_NAME=lambdium-upload-bucket
 ```
 
-Follow the prompts and choose 'Y' to remove extraneous files from `node_modules`.
-
-##### Packaging the function for Cloudformation using SAM
+#### Packaging the function for Cloudformation using SAM
 
 ```sh
     $ sam package --template-file template.yaml --s3-bucket $LAMBDA_BUCKET_NAME --output-template-file packaged.yaml
@@ -83,7 +87,7 @@ This will create the function using Cloudformation after packaging it is complet
 
 If set, the optional `DEBUG_ENV` environment variable will log additional information to Cloudwatch.
 
-## Invoking the function
+### Running the function
 
 Post-deploy, you can have lambda run a Webdriver script. There's an example of a selenium-webdriver simple script in the `examples/` directory that the Lambda function can now run.
 
diff --git a/examples/visitgoogle.js b/examples/visitgoogle.js
index aa3e9f5..8c91695 100644
--- a/examples/visitgoogle.js
+++ b/examples/visitgoogle.js
@@ -11,5 +11,5 @@ $browser.findElement($driver.By.name('btnK')).click();
 $browser.wait($driver.until.titleIs('Google'), 1000);
 $browser.getTitle().then(function(title) {
   console.log("title is: " + title);
+  console.log('Finished running script!');
 });
-console.log('Finished running script!');
\ No newline at end of file
diff --git a/index.js b/index.js
index eb9e8bd..ffde4c1 100644
--- a/index.js
+++ b/index.js
@@ -4,26 +4,35 @@ const fs = require('fs');
 
 const chromium = require('./lib/chromium');
 const sandbox = require('./lib/sandbox');
-const { log } = require('./lib/helpers');
+const log = require('lambda-log');
 
-console.log('Loading function');
+log.info('Loading function');
+
+// Create new reusable session (spawns chromium and webdriver)
+if (!process.env.CLEAN_SESSIONS) {
+  $browser = chromium.createSession();
+}
 
 exports.handler = (event, context, callback) => {
   context.callbackWaitsForEmptyEventLoop = false;
 
-  if (process.env.CLEAR_TMP) {
-    log('attempting to clear /tmp directory')
-    log(child.execSync('rm -rf /tmp/core*').toString());
+  if (process.env.CLEAN_SESSIONS) {
+    log.info('attempting to clear /tmp directory')
+    log.info(child.execSync('rm -rf /tmp/core*').toString());
+  }
+
+  if (process.env.DEBUG_ENV || process.env.SAM_LOCAL) {
+    log.config.debug = true;
+    log.config.dev = true;
   }
 
-  if (process.env.DEBUG_ENV) {
-    //log(child.execSync('set').toString());
-    log(child.execSync('pwd').toString());
-    log(child.execSync('ls -lhtra .').toString());
-    log(child.execSync('ls -lhtra /tmp').toString());
+  if (process.env.LOG_DEBUG) {
+    log.debug(child.execSync('pwd').toString());
+    log.debug(child.execSync('ls -lhtra .').toString());
+    log.debug(child.execSync('ls -lhtra /tmp').toString());
   }
 
-  log('Received event:', JSON.stringify(event, null, 2));
+  log.info(`Received event: ${JSON.stringify(event, null, 2)}`);
 
   // Read input
   const inputParam = event.Base64Script || process.env.BASE64_SCRIPT;
@@ -32,17 +41,18 @@ exports.handler = (event, context, callback) => {
   }
 
   const inputBuffer = Buffer.from(inputParam, 'base64').toString('utf8');
-  if (process.env.DEBUG_ENV) {
-    log(`Executing "${inputBuffer}"`);
+  log.debug(`Executing script "${inputBuffer}"`);
+
+  // Creates a new session on each event (instead of reusing for performance benefits)
+  if (process.env.CLEAN_SESSIONS) {
+    $browser = chromium.createSession();
   }
 
-  // Start selenium webdriver session
-  $browser = chromium.createSession();
   sandbox.executeScript(inputBuffer, $browser, webdriver, function(err) {
-    if (process.env.DEBUG_ENV) {
-      log(child.execSync('ps aux').toString());
+    if (process.env.LOG_DEBUG) {
+      log.debug(child.execSync('ps aux').toString());
+      log.debug(child.execSync('cat /tmp/chromedriver.log').toString())
     }
-
     if (err) {
       callback(err, null);
     }
diff --git a/lib/chromium.js b/lib/chromium.js
index e2dcf03..3ddaad0 100644
--- a/lib/chromium.js
+++ b/lib/chromium.js
@@ -1,5 +1,4 @@
 const child = require('child_process');
-const { log } = require('./helpers');
 const os = require('os');
 const path = require('path');
 const chrome = require('selenium-webdriver/chrome');
@@ -44,9 +43,15 @@ const defaultChromeFlags = [
 const HEADLESS_CHROME_PATH = 'bin/headless-chromium';
 const CHROMEDRIVER_PATH = '/var/task/bin/chromedriver';
 exports.createSession = function() {
-  const service = new chrome.ServiceBuilder(CHROMEDRIVER_PATH)
-    .enableVerboseLogging()
-    .build();
+  var service;
+  if (process.env.LOG_DEBUG || process.env.SAM_LOCAL) {
+    service = new chrome.ServiceBuilder(CHROMEDRIVER_PATH)
+      .loggingTo('/tmp/chromedriver.log')
+      .build();
+  } else {
+    service = new chrome.ServiceBuilder(CHROMEDRIVER_PATH)
+      .build();
+  }
 
   const options = new chrome.Options();
 
@@ -60,16 +65,3 @@ exports.createSession = function() {
 
   return chrome.Driver.createSession(options, service);
 }
-const spawnProcess = function(localPath, flags) {
-  const opts = {
-    cwd: os.tmpdir(),
-    shell: true,
-    detached: true,
-  };
-
-  const proc = child.spawn(path.join(process.env.LAMBDA_TASK_ROOT, localPath),
-    flags,
-    opts
-  );
-  return proc;
-};
diff --git a/lib/helpers.js b/lib/helpers.js
deleted file mode 100644
index d07d3a3..0000000
--- a/lib/helpers.js
+++ /dev/null
@@ -1,6 +0,0 @@
-const net = require('net');
-const child = require('child_process');
-
-exports.log = function() {
-  console.log.apply(this, arguments);
-};
diff --git a/lib/sandbox.js b/lib/sandbox.js
index dfc6c26..0759bda 100644
--- a/lib/sandbox.js
+++ b/lib/sandbox.js
@@ -1,4 +1,5 @@
 const vm = require('vm');
+const log = require('lambda-log');
 
 exports.executeScript = function(scriptText, browser, driver, cb) {
   // Create Sandbox VM
@@ -14,7 +15,7 @@ exports.executeScript = function(scriptText, browser, driver, cb) {
   try {
     script.runInContext(scriptContext);
   } catch (e) {
-    console.log('[script error]', e);
+    log.error(`[script error] ${e}`);
     return cb(e, null);
   }
 
@@ -27,8 +28,19 @@ exports.executeScript = function(scriptText, browser, driver, cb) {
     }
   });
   */
-
-  $browser.quit().then(function() {
-    cb(null);
-  });
-}
\ No newline at end of file
+  // https://github.com/GoogleChrome/puppeteer/issues/1825#issuecomment-372241101
+  // Reuse existing session, likely some edge cases around this...
+  if (process.env.CLEAN_SESSIONS) {
+    $browser.quit().then(function() {
+      cb(null);
+    });
+  } else {
+    browser.manage().deleteAllCookies().then(function() {
+      return $browser.get('about:blank').then(function() {
+        cb(null);
+      }).catch(function(err) {
+        cb(err);
+      });
+    });
+  }
+}
diff --git a/package-lock.json b/package-lock.json
index b7e8b41..6e67c8e 100644
--- a/package-lock.json
+++ b/package-lock.json
@@ -92,6 +92,11 @@
         "readable-stream": "2.0.6"
       }
     },
+    "lambda-log": {
+      "version": "1.3.0",
+      "resolved": "https://registry.npmjs.org/lambda-log/-/lambda-log-1.3.0.tgz",
+      "integrity": "sha1-b2YnzkOjLhT1WN826htcGUMGuV0="
+    },
     "lie": {
       "version": "3.1.1",
       "resolved": "https://registry.npmjs.org/lie/-/lie-3.1.1.tgz",
diff --git a/package.json b/package.json
index 7826f30..46961eb 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "lambdium",
-  "version": "0.1.0",
+  "version": "0.1.1",
   "description": "headless chromium in lambda prototype",
   "main": "index.js",
   "scripts": {
@@ -9,6 +9,7 @@
   "author": "Clay Smith",
   "license": "ISC",
   "dependencies": {
+    "lambda-log": "^1.3.0",
     "selenium-webdriver": "^3.6.0"
   }
 }
diff --git a/template.yaml b/template.yaml
index fa25ab1..915f381 100644
--- a/template.yaml
+++ b/template.yaml
@@ -9,10 +9,9 @@ Resources:
       Runtime: nodejs6.10
       FunctionName: lambdium
       Description: headless chromium running selenium
-      MemorySize: 1024
-      Timeout: 10
+      MemorySize: 1156
+      Timeout: 20
       Environment:
         Variables:
-          DEBUG_ENV: "true"
           CLEAR_TMP: "true"
       CodeUri: .