Commit

Merge pull request #14 from smithclay/cleanup-0.1.1
Use standard logging library, reuse session for performance
smithclay authored Mar 14, 2018
2 parents 0194b8d + 48b0778 commit 247cbe6

Showing 9 changed files with 92 additions and 75 deletions.
50 changes: 27 additions & 23 deletions README.md
````diff
@@ -1,6 +1,8 @@
 ## lambdium
 ### headless chrome + selenium webdriver in AWS Lambda

 **Lambdium allows you to run a Selenium Webdriver script written in Javascript inside of an AWS Lambda function bundled with [Headless Chromium](https://developers.google.com/web/updates/2017/04/headless-chrome).**

+*This project is now published on the [AWS Serverless Application Repository](https://serverlessrepo.aws.amazon.com), allowing you to install it in your AWS account with one click. Install in your AWS account [here](https://serverlessrepo.aws.amazon.com/#/applications/arn:aws:serverlessrepo:us-east-1:156280089524:applications~lambdium).* Quickstart instructions are in the [`README-SAR.md` file](https://github.com/smithclay/lambdium/blob/master/README-SAR.md).
+
 This uses the binaries from the [serverless-chrome](https://github.com/adieuadieu/serverless-chrome) project to prototype running headless chromium with `selenium-webdriver` in AWS Lambda. I've also bundled the chromedriver binary so the browser can be interacted with using the [Webdriver Protocol](https://www.w3.org/TR/webdriver/).
@@ -13,61 +15,63 @@ The function interacts with [headless Chromium](https://chromium.googlesource.co

 Since this Lambda function is written using node.js, you can run almost any script written for [selenium-webdriver](https://www.npmjs.com/package/selenium-webdriver). Example scripts can be found in the `examples` directory.

-#### Requirements
+#### Requirements and Setup

 * An AWS Account
 * The [AWS SAM Local](https://github.com/awslabs/aws-sam-local) tool for running functions locally with the [Serverless Application Model](https://github.com/awslabs/serverless-application-model) (see: `template.yaml`)
 * node.js + npm
 * `modclean` npm modules for reducing function size (optional)
 * Bash

-#### Fetching dependencies
+_Note:_ If you don't need to build, customize, or run this locally, you can deploy it directly from a [template on the AWS Serverless Application repository](https://serverlessrepo.aws.amazon.com/#/applications/arn:aws:serverlessrepo:us-east-1:156280089524:applications~lambdium) and skip all of the below steps.
+
+#### 1. Fetching dependencies

 The headless chromium binary is too large for Github, you need to fetch it using a script bundled in this repository. [Marco Lüthy](https://github.com/adieuadieu) has an excellent post on Medium about how he built chromium for AWS Lambda [here](https://medium.com/@marco.luethy/running-headless-chrome-on-aws-lambda-fa82ad33a9eb).

 ```sh
 $ ./scripts/fetch-dependencies.sh
 ```

-#### Running locally with SAM Local
+##### 2. Cleaning up the `node_modules` directory to reduce function size

-SAM Local can run this function on your computer inside a Docker container that acts like AWS Lambda. To run the function with an example event trigger that uses selenium to use headless chromium to visit `google.com`, run this:
+It's a good idea to clean the `node_modules` directory before packaging to make the function size significantly smaller (making the function run faster!). You can do this using the `modclean` package:

+To install it:
+
 ```sh
-$ sam local invoke Lambdium -e event.json
+$ npm i -g modclean
 ```

-### Deploying
-
-#### Creating a bucket for the function deployment
-
-This will create a file called `packaged.yaml` you can use with Cloudformation to deploy the function.
-
-You need to have an S3 bucket configured on your AWS account to upload the packed function files. For example:
+Then, run:

 ```sh
-$ export LAMBDA_BUCKET_NAME=lambdium-upload-bucket
+$ modclean --patterns="default:*"
 ```

-##### Reducing function size for performance (and faster uploads!)
+Follow the prompts and choose 'Y' to remove extraneous files from `node_modules`.

-It's a good idea to clean the `node_modules` directory before packaging to make the function size significantly smaller (making the function run faster!). You can do this using the `modclean` package:
+#### 3. Running locally with SAM Local

-To install it:
+SAM Local can run this function on your computer inside a Docker container that acts like AWS Lambda. To run the function with an example event trigger that uses selenium to use headless chromium to visit `google.com`, run this:

 ```sh
-$ npm i -g modclean
+$ sam local invoke Lambdium -e event.json
 ```

-Then, run:
+### Deploying

+#### Creating a S3 bucket for the function deployment
+
+This will create a file called `packaged.yaml` you can use with Cloudformation to deploy the function.
+
+You need to have an S3 bucket configured on your AWS account to upload the packed function files. For example:
+
 ```sh
-$ modclean --patterns="default:*"
+$ export LAMBDA_BUCKET_NAME=lambdium-upload-bucket
 ```

-Follow the prompts and choose 'Y' to remove extraneous files from `node_modules`.
-
-##### Packaging the function for Cloudformation using SAM
+#### Packaging the function for Cloudformation using SAM

 ```sh
 $ sam package --template-file template.yaml --s3-bucket $LAMBDA_BUCKET_NAME --output-template-file packaged.yaml
@@ -83,7 +87,7 @@ This will create the function using Cloudformation after packaging it is complet
 If set, the optional `DEBUG_ENV` environment variable will log additional information to Cloudwatch.
-## Invoking the function
+### Running the function
 Post-deploy, you can have lambda run a Webdriver script. There's an example of a selenium-webdriver simple script in the `examples/` directory that the Lambda function can now run.
````
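The handler in `index.js` reads the script from `event.Base64Script`, falls back to the `BASE64_SCRIPT` environment variable, and decodes it from base64 to UTF-8. A minimal sketch of preparing such an event, assuming the script is available as a string. `buildEvent` and `decodeScript` are hypothetical helpers, and throwing on a missing script is an assumption, since the diff elides what the handler does in that case:

```javascript
// Hypothetical helper: wraps selenium-webdriver script text in the event shape
// that index.js reads (event.Base64Script).
function buildEvent(scriptSource) {
  return { Base64Script: Buffer.from(scriptSource, 'utf8').toString('base64') };
}

// Hypothetical helper mirroring the decoding step in index.js: prefer the event
// field, fall back to the BASE64_SCRIPT environment variable.
function decodeScript(event, env = process.env) {
  const inputParam = event.Base64Script || env.BASE64_SCRIPT;
  if (!inputParam) {
    // Assumption: the real handler's behavior here is not shown in the diff.
    throw new Error('No script supplied in event.Base64Script or BASE64_SCRIPT');
  }
  return Buffer.from(inputParam, 'base64').toString('utf8');
}

// Example script in the style of examples/visitgoogle.js.
const script = "$browser.get('https://www.google.com');";
const event = buildEvent(script);

console.log(JSON.stringify(event, null, 2)); // event payload to send to the function
console.log(decodeScript(event));             // round-trips to the original script
```

The printed JSON can be saved to a file and passed to `sam local invoke Lambdium -e <file>`, the same way the README uses `event.json`.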
2 changes: 1 addition & 1 deletion examples/visitgoogle.js
````diff
@@ -11,5 +11,5 @@ $browser.findElement($driver.By.name('btnK')).click();
 $browser.wait($driver.until.titleIs('Google'), 1000);
 $browser.getTitle().then(function(title) {
 console.log("title is: " + title);
+console.log('Finished running script!');
 });
-console.log('Finished running script!');
````
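This move matters because `getTitle()` returns a promise. In the old version the final `console.log` ran synchronously, before the title callback, so "Finished running script!" was printed before the script had actually finished. A minimal sketch of the two orderings, with a hypothetical `fakeGetTitle` promise standing in for the WebDriver call:

```javascript
// Hypothetical stand-in for $browser.getTitle(): resolves on a later tick,
// like a WebDriver round trip would.
function fakeGetTitle() {
  return new Promise((resolve) => setTimeout(() => resolve('Google'), 10));
}

// Old placement (0.1.0): the "finished" log sits after the promise chain.
function oldPlacement(log) {
  const done = fakeGetTitle().then((title) => log.push('title is: ' + title));
  log.push('Finished running script!'); // runs before the title arrives
  return done;
}

// New placement (0.1.1): the "finished" log runs inside the callback.
function newPlacement(log) {
  return fakeGetTitle().then((title) => {
    log.push('title is: ' + title);
    log.push('Finished running script!');
  });
}

async function compare() {
  const oldLog = [];
  const newLog = [];
  await oldPlacement(oldLog);
  await newPlacement(newLog);
  return { oldLog, newLog };
}

compare().then(({ oldLog, newLog }) => {
  console.log('old order:', oldLog);
  console.log('new order:', newLog);
});
```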
46 changes: 28 additions & 18 deletions index.js
````diff
@@ -4,26 +4,35 @@ const fs = require('fs');

 const chromium = require('./lib/chromium');
 const sandbox = require('./lib/sandbox');
-const { log } = require('./lib/helpers');
+const log = require('lambda-log');

-console.log('Loading function');
+log.info('Loading function');

+// Create new reusable session (spawns chromium and webdriver)
+if (!process.env.CLEAN_SESSIONS) {
+$browser = chromium.createSession();
+}
+
 exports.handler = (event, context, callback) => {
 context.callbackWaitsForEmptyEventLoop = false;

-if (process.env.CLEAR_TMP) {
-log('attempting to clear /tmp directory')
-log(child.execSync('rm -rf /tmp/core*').toString());
+if (process.env.CLEAN_SESSIONS) {
+log.info('attempting to clear /tmp directory')
+log.info(child.execSync('rm -rf /tmp/core*').toString());
 }

+if (process.env.DEBUG_ENV || process.env.SAM_LOCAL) {
+log.config.debug = true;
+log.config.dev = true;
+}
+
-if (process.env.DEBUG_ENV) {
-//log(child.execSync('set').toString());
-log(child.execSync('pwd').toString());
-log(child.execSync('ls -lhtra .').toString());
-log(child.execSync('ls -lhtra /tmp').toString());
+if (process.env.LOG_DEBUG) {
+log.debug(child.execSync('pwd').toString());
+log.debug(child.execSync('ls -lhtra .').toString());
+log.debug(child.execSync('ls -lhtra /tmp').toString());
 }

-log('Received event:', JSON.stringify(event, null, 2));
+log.info(`Received event: ${JSON.stringify(event, null, 2)}`);

 // Read input
 const inputParam = event.Base64Script || process.env.BASE64_SCRIPT;
@@ -32,17 +41,18 @@ exports.handler = (event, context, callback) => {
 }

 const inputBuffer = Buffer.from(inputParam, 'base64').toString('utf8');
-if (process.env.DEBUG_ENV) {
-log(`Executing "${inputBuffer}"`);
+log.debug(`Executing script "${inputBuffer}"`);
+
+// Creates a new session on each event (instead of reusing for performance benefits)
+if (process.env.CLEAN_SESSIONS) {
+$browser = chromium.createSession();
 }
-
-// Start selenium webdriver session
-$browser = chromium.createSession();
 sandbox.executeScript(inputBuffer, $browser, webdriver, function(err) {
-if (process.env.DEBUG_ENV) {
-log(child.execSync('ps aux').toString());
+if (process.env.LOG_DEBUG) {
+log.debug(child.execSync('ps aux').toString());
+log.debug(child.execSync('cat /tmp/chromedriver.log').toString())
 }

 if (err) {
 callback(err, null);
 }
````
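In 0.1.1, `index.js` creates the session at module load. AWS Lambda runs module-level code once per container, so warm invocations reuse the same Chromium and chromedriver processes, while setting `CLEAN_SESSIONS` goes back to one session per event. A minimal sketch of that pattern, assuming a hypothetical `createSession` factory that only counts sessions instead of spawning a browser. Where `index.js` assigns to an undeclared `$browser` global, the sketch keeps the session in a closure:

```javascript
// Builds a handler in the shape of index.js: module-level state lives as long
// as the container, so a session created here survives warm invocations.
function makeHandler(createSession, cleanSessions) {
  // Cold start: only create the shared session when sessions are reused.
  let browser = cleanSessions ? null : createSession();

  return function handler(event) {
    if (cleanSessions) {
      browser = createSession(); // fresh session for every event
    }
    return browser; // in index.js this is passed on to sandbox.executeScript
  };
}

// Hypothetical stand-in for chromium.createSession(): counts sessions.
function countingFactory() {
  let created = 0;
  return {
    create: () => ({ id: ++created }),
    count: () => created,
  };
}

const reused = countingFactory();
const warmHandler = makeHandler(reused.create, false);
[{}, {}, {}].forEach((event) => warmHandler(event));
console.log('sessions with reuse:', reused.count()); // 1

const clean = countingFactory();
const cleanHandler = makeHandler(clean.create, true);
[{}, {}, {}].forEach((event) => cleanHandler(event));
console.log('sessions with CLEAN_SESSIONS:', clean.count()); // 3
```

The trade-off, as the comment in `lib/sandbox.js` hints, is that browser state such as cookies can carry over between events, which is why the reuse path resets the session rather than quitting it.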
26 changes: 9 additions & 17 deletions lib/chromium.js
````diff
@@ -1,5 +1,4 @@
 const child = require('child_process');
-const { log } = require('./helpers');
 const os = require('os');
 const path = require('path');
 const chrome = require('selenium-webdriver/chrome');
@@ -44,9 +43,15 @@ const defaultChromeFlags = [
 const HEADLESS_CHROME_PATH = 'bin/headless-chromium';
 const CHROMEDRIVER_PATH = '/var/task/bin/chromedriver';
 exports.createSession = function() {
-const service = new chrome.ServiceBuilder(CHROMEDRIVER_PATH)
-.enableVerboseLogging()
-.build();
+var service;
+if (process.env.LOG_DEBUG || process.env.SAM_LOCAL) {
+service = new chrome.ServiceBuilder(CHROMEDRIVER_PATH)
+.loggingTo('/tmp/chromedriver.log')
+.build();
+} else {
+service = new chrome.ServiceBuilder(CHROMEDRIVER_PATH)
+.build();
+}

 const options = new chrome.Options();

@@ -60,16 +65,3 @@ exports.createSession = function() {
 return chrome.Driver.createSession(options, service);
 }

-const spawnProcess = function(localPath, flags) {
-const opts = {
-cwd: os.tmpdir(),
-shell: true,
-detached: true,
-};
-
-const proc = child.spawn(path.join(process.env.LAMBDA_TASK_ROOT, localPath),
-flags,
-opts
-);
-return proc;
-};
````
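After this change several environment variables toggle overlapping behavior across `index.js` and `lib/chromium.js`, and `CLEAR_TMP` is no longer read. The sketch below summarizes the checks visible in this diff as a hypothetical `debugSettings` helper (not part of lambdium). Because every check is a plain truthiness test on a string, any non-empty value, including `"false"`, turns a flag on:

```javascript
// Hypothetical summary of the flag checks in this diff (not lambdium code).
function debugSettings(env) {
  return {
    // index.js: sets log.config.debug and log.config.dev on lambda-log
    lambdaLogDebug: Boolean(env.DEBUG_ENV || env.SAM_LOCAL),
    // index.js: pwd, ls, ps aux and chromedriver.log dumps via log.debug
    shellDiagnostics: Boolean(env.LOG_DEBUG),
    // lib/chromium.js: ServiceBuilder(...).loggingTo('/tmp/chromedriver.log')
    chromedriverLog: env.LOG_DEBUG || env.SAM_LOCAL ? '/tmp/chromedriver.log' : null,
    // index.js + lib/sandbox.js: clear /tmp/core*, new session per event, quit afterwards
    cleanSessions: Boolean(env.CLEAN_SESSIONS),
  };
}

console.log(debugSettings({ DEBUG_ENV: 'true' })); // the value set in template.yaml
console.log(debugSettings({ SAM_LOCAL: 'true', LOG_DEBUG: '1' }));
console.log(debugSettings({ CLEAN_SESSIONS: 'false' }).cleanSessions); // true
```

The `cat /tmp/chromedriver.log` call runs only under `LOG_DEBUG`, and `lib/chromium.js` writes that file whenever `LOG_DEBUG` is set, so those two agree. If, as we understand it, lambda-log drops `log.debug` messages unless `config.debug` is true, the diagnostics also need `DEBUG_ENV` or `SAM_LOCAL` to actually appear.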
6 changes: 0 additions & 6 deletions lib/helpers.js

This file was deleted.

24 changes: 18 additions & 6 deletions lib/sandbox.js
````diff
@@ -1,4 +1,5 @@
 const vm = require('vm');
+const log = require('lambda-log');

 exports.executeScript = function(scriptText, browser, driver, cb) {
 // Create Sandbox VM
@@ -14,7 +15,7 @@ exports.executeScript = function(scriptText, browser, driver, cb) {
 try {
 script.runInContext(scriptContext);
 } catch (e) {
-console.log('[script error]', e);
+log.error(`[script error] ${e}`);
 return cb(e, null);
 }

@@ -27,8 +28,19 @@ exports.executeScript = function(scriptText, browser, driver, cb) {
 }
 });
 */
-
-$browser.quit().then(function() {
-cb(null);
-});
-}
+// https://github.com/GoogleChrome/puppeteer/issues/1825#issuecomment-372241101
+// Reuse existing session, likely some edge cases around this...
+if (process.env.CLEAN_SESSIONS) {
+$browser.quit().then(function() {
+cb(null);
+});
+} else {
+browser.manage().deleteAllCookies().then(function() {
+return $browser.get('about:blank').then(function() {
+cb(null);
+}).catch(function(err) {
+cb(err);
+});
+});
+}
+}
````
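The new ending of `executeScript` either quits the session (`CLEAN_SESSIONS`) or resets it for the next event by deleting cookies and loading `about:blank`. A self-contained sketch of that flow with Node's `vm` module follows. It assumes the sandbox exposes `$browser`, `$driver` and `console` to the script (the example script uses the first two; the context setup itself is elided in this diff) and uses a hypothetical `makeFakeBrowser` in place of a real selenium-webdriver session. Unlike the original, the sketch also routes a failed `quit()` to the callback:

```javascript
const vm = require('vm');

// Hypothetical stand-in for a selenium-webdriver session: records each call and
// returns a resolved promise, the way WebDriver commands return promises.
function makeFakeBrowser() {
  const calls = [];
  const record = (...call) => {
    calls.push(call);
    return Promise.resolve();
  };
  return {
    calls,
    get: (url) => record('get', url),
    quit: () => record('quit'),
    manage: () => ({ deleteAllCookies: () => record('deleteAllCookies') }),
  };
}

// Sketch of the 0.1.1 flow: run the script in its own context, then either
// quit the session or reset it so the next event can reuse it.
function executeScript(scriptText, browser, driver, cleanSessions, cb) {
  const context = vm.createContext({ $browser: browser, $driver: driver, console });
  try {
    new vm.Script(scriptText).runInContext(context);
  } catch (e) {
    return cb(e); // syntax errors and synchronous throws end up here
  }

  const cleanup = cleanSessions
    ? browser.quit()
    : browser.manage().deleteAllCookies().then(() => browser.get('about:blank'));

  cleanup.then(() => cb(null), (err) => cb(err));
}

const browser = makeFakeBrowser();
executeScript("$browser.get('https://www.google.com');", browser, {}, false, (err) => {
  // Expect err === null and calls: get google.com, deleteAllCookies, get about:blank
  console.log(err, browser.calls);
});
```

As in the original, only synchronous errors are caught; a WebDriver command that rejects later is not reported through `cb`. The original presumably relies on selenium-webdriver 3's promise manager to queue the cleanup commands behind the script's own commands; the fake browser has no such queue.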
5 changes: 5 additions & 0 deletions package-lock.json

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion package.json
````diff
@@ -1,6 +1,6 @@
 {
 "name": "lambdium",
-"version": "0.1.0",
+"version": "0.1.1",
 "description": "headless chromium in lambda prototype",
 "main": "index.js",
 "scripts": {
@@ -9,6 +9,7 @@
 "author": "Clay Smith",
 "license": "ISC",
 "dependencies": {
+"lambda-log": "^1.3.0",
 "selenium-webdriver": "^3.6.0"
 }
 }
````
5 changes: 2 additions & 3 deletions template.yaml
````diff
@@ -9,10 +9,9 @@ Resources:
 Runtime: nodejs6.10
 FunctionName: lambdium
 Description: headless chromium running selenium
-MemorySize: 1024
-Timeout: 10
+MemorySize: 1156
+Timeout: 20
 Environment:
 Variables:
 DEBUG_ENV: "true"
-CLEAR_TMP: "true"
 CodeUri: .
````
