
OpenVPN is a popular solution for deploying VPN servers to enable secure point-to-site connectivity to your cloud resources.  You can be up and running with an OpenVPN server in your AWS Virtual Private Cloud (VPC) in about 30 minutes thanks to the availability of the OpenVPN image (AMI) in the AWS Marketplace.  OpenVPN also has a high-availability failover mode built right in.  Unfortunately, it doesn’t work in AWS.  That’s where this solution comes in.

Why doesn’t OpenVPN’s high-availability failover mode work in AWS?

According to OpenVPN’s high-availability failover documentation, AWS strips out UCARP/VRRP traffic which is how the OpenVPN servers send a heartbeat to each other.

Platform compatibility
This method unfortunately does not work on all platforms. For example on Amazon AWS, broadcast UCARP/VRRP traffic is simply filtered away, so this model cannot be used on Amazon AWS.

OK, how are we going to do this in AWS?  Route53!

If we were building an application that simply accepted traffic on a known port, we’d be able to put a load balancer in front of our servers to balance the traffic between them, but a VPN works differently – a secure tunnel is created between the client and the server and all traffic is sent through this tunnel.  To solve this, we’ll need load balancing and failover at the DNS level.  This is where Route53’s traffic policies come in.

Route53’s traffic policies allow you to route traffic to different endpoints based on rules and health checks.  In this scenario, to keep it simple, we’ll use just two OpenVPN servers and an evenly weighted rule to send the same amount of traffic to each server.  For the health check, we’ll monitor the web admin panel, which runs under the same service (openvpn_as) as the VPN service.

Limitations of using DNS for failover

  1. When you create a DNS record, you specify a TTL (time to live) which tells the client how long the DNS record is valid for before they should retrieve a new one.  60 seconds is typical.  This means that in a failover scenario, your users’ VPN clients may not see the new DNS record for 60 seconds.  This is on top of the time it took your health check to fail.  In my setup, this is 4 minutes.
  2. Clients disobeying DNS record TTLs.  This shouldn’t apply in this scenario – after all you’re providing access to resources for members of your organization or  your customers.  It is, however, important to keep this limitation in mind for other possible uses of Route53 traffic policies.
  3. DNS caching. Even though you are specifying a TTL, there’s no guarantee a consuming client or network will respect this.  Again, this most likely won’t apply in this scenario.
  4. Cost.  Creating a Route53 traffic policy costs a flat $50/month.

Good with all that?  Let’s build.

Requirements:

  1. A VPC with two availability zones.  This ensures redundancy in the event one availability zone goes offline.
  2. An OpenVPN server running in each availability zone with a common user database.
  3. DNS Zone hosted in Route53.
  4. An IAM Role that can create DNS records in Route53.
  5. LetsEncrypt wildcard certificates.

Below is a simplified topology diagram.  To save space, I’ve omitted the subnet(s) that would hold resources such as application servers and databases.

OpenVPN Route53 AWS VPC Topology

Build a VPC

To build out the VPC for this demo, I’m just going to use a CloudFormation template that does the following (an equivalent AWS CLI sketch follows the list):

  1. Creates a VPC ‘OpenVPN Demo’ in region US-EAST-1 (10.0.0.0/24)
  2. Creates an Internet Gateway
  3. Updates the default route table to add a default route (0.0.0.0/0) that sends non-local outbound traffic through the Internet Gateway.
  4. Creates two subnets ‘DMZ Subnet A’ and ‘DMZ Subnet B’ located in availability zones US-EAST-1A and US-EAST-1B respectively.
  5. Associates the routing table with the 2 DMZ Subnets.
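
If you’d rather not use CloudFormation, here’s a rough AWS CLI sketch of the same VPC build.  Treat it as illustrative only – the resource IDs are placeholders for the values returned by the earlier commands, and the per-subnet CIDR split is my own assumption.

# Create the VPC and tag it
aws ec2 create-vpc --cidr-block 10.0.0.0/24 --region us-east-1
aws ec2 create-tags --resources vpc-01234567 --tags Key=Name,Value="OpenVPN Demo"

# Create and attach an Internet Gateway
aws ec2 create-internet-gateway
aws ec2 attach-internet-gateway --internet-gateway-id igw-01234567 --vpc-id vpc-01234567

# Default route for all non-local outbound traffic
aws ec2 create-route --route-table-id rtb-01234567 --destination-cidr-block 0.0.0.0/0 --gateway-id igw-01234567

# Two DMZ subnets in different availability zones
aws ec2 create-subnet --vpc-id vpc-01234567 --cidr-block 10.0.0.0/25 --availability-zone us-east-1a
aws ec2 create-subnet --vpc-id vpc-01234567 --cidr-block 10.0.0.128/25 --availability-zone us-east-1b

# Associate the route table with both subnets
aws ec2 associate-route-table --route-table-id rtb-01234567 --subnet-id subnet-0123456a
aws ec2 associate-route-table --route-table-id rtb-01234567 --subnet-id subnet-0123456b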

Launch an OpenVPN server in each Availability Zone

To launch the two OpenVPN servers, we’ll navigate to the EC2 ‘Launch Instance’ Wizard and then ‘AWS Marketplace’ on the left nav.  When the marketplace loads, type ‘OpenVPN’ into the search box and you should see something like:

Select the OpenVPN AMI

We want the selection listed first since it is the ‘Bring Your Own License’ model.  This version of OpenVPN includes 2 free connected devices (users), which will be plenty for the purposes of this demo.  This is also a good time to note that purchasing the OpenVPN license through Amazon is a much worse deal ($900/year for 10 users) than buying the license through the OpenVPN site, where it’s $150/year for 10 users.  🤔

Once we’ve clicked ‘Select’ on the BYOL AMI, we’ll see some product details and pricing info.  Since we’re using the BYOL version the additional cost above the EC2 instance is $0.00 for all instance sizes.  On the next dialog, we’ll choose an instance size of ‘t2.micro’ which will run us $8.47/month.  Note that the new ‘t3’ instances aren’t available as of the time of this writing.  When they become available, choose ‘t3.micro’.

On the next dialog:

  1. Choose the ‘OpenVPN Demo’ VPC in the Network dropdown.
  2. Select the ‘DMZ Subnet A’ Subnet
  3. Select ‘Enable’ in the Auto-Assign Public IP dropdown.

OpenVPN EC2 Step 3 Instance Details

Click Next to see the storage options.  Stick with the default EBS – General Purpose choice of 8GB.

On Step 5, add a tag, ‘Name’, and call this instance ‘OpenVPN A’.

Step 5: Add Tags

On Step 6, rename the security group to something a little more user friendly, like ‘OpenVPN Default’.  Also be sure to change the SSH rule so it’s bound to only your IP (Choose ‘My IP’ in the dropdown).   The other 3 ports (TCP 443, TCP 943, and UDP 1194) are for the VPN connections and web administration.

Step 6: Configure Security Groups

Finally, create a new key or select an existing key that will be used to connect to the instance via SSH to continue setup.

Select Key Pair

Next you’ll need to repeat these steps for the second OpenVPN instance, which will go into ‘DMZ Subnet B’.  Be sure to change the appropriate values (subnet and Name tag).  You can use the same key pair ‘OpenVPNDemo’.

To connect and configure your OpenVPN instances, you can follow this guide in the OpenVPN docs.   All of the default values will be fine for the purposes of this exercise.  Keep in mind that in a production scenario you’ll want to set up a separate user store using a MySQL plugin or by connecting to an external service such as Active Directory (see AWS Directory Service).

Create an IAM role with access to Route 53 and assign it to our instances

Next we’ll create an IAM role with permissions to modify our Route53 DNS records and then assign it to our OpenVPN instances.  This IAM role is needed so our LetsEncrypt client (Certbot) can programmatically create DNS records to validate our wildcard cert requests.

We’ll need our Route 53 Hosted Zone ID for a step in this process.  We can get this value from the Route 53 console.  The Zone ID for my domain is ‘Z3LSP2JGNYI6DC’.

Route53 Zone ID

Then we’ll go to the IAM Roles console and click ‘Create Role’.

On the next step, we’ll choose ‘AWS Service’ as the trusted entity and EC2 as the service that will be using (assuming) this role.

IAM Role Entity and Service

Now if we look at the requirements page for the Certbot Route53 plugin, we can see that we need to add the following permissions to our IAM role:

route53:ListHostedZones
route53:GetChange
route53:ChangeResourceRecordSets

They’ve also provided a handy JSON policy doc that we can update with the Hosted Zone ID we looked up earlier and drop in via the IAM editor.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "route53:ListHostedZones",
                "route53:GetChange"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect" : "Allow",
            "Action" : [
                "route53:ChangeResourceRecordSets"
            ],
            "Resource" : [
                "arn:aws:route53:::hostedzone/Z3LSP2JGNYI6DC"
            ]
        }
    ]
}

In the IAM role step, we’ll click ‘Create Policy’.

IAM Create Policy

Here we’ll select the JSON tab.

Select JSON Policy Tab

and then we’ll paste in the JSON policy above:

JSON IAM Policy

Next we’ll enter a name and description for the policy.

Review IAM Policy

Then we’ll assign this policy to the role and name the role ‘Route53_LetsEncrypt’.

Assign Policy to Role

Create Role

Now that we have our ‘Route53_LetsEncrypt’ role, we need to attach it to our OpenVPN EC2 instances.  This gives our LetsEncrypt client, Certbot, the ability to interact with Route53 without needing to deal with AWS API keys.  This is easier to manage and most importantly – more secure!

In the EC2 console, find the OpenVPN A and B servers we created earlier and attach the ‘Route53_LetsEncrypt’ IAM role to the instances.

Attach IAM Role

Attach Route53_LetsEncrypt Role

Note: This IAM role could’ve been created and assigned as part of the EC2 launch instance wizard.  I intentionally chose to move this to a separate step since it’s important and worth calling out.

Set Up LetsEncrypt Wildcard Certificates

SSH to your OpenVPN instance:

ssh openvpnas@{Your OpenVPN Public IP} -i OpenVPNDemo.pem 

Then add the repository containing certbot, update the package index, and install certbot:

sudo apt-get -y install software-properties-common
sudo add-apt-repository -y ppa:certbot/certbot
sudo apt-get -y update
sudo apt-get -y install certbot

Once certbot finishes installing, check the version:

$ certbot --version
certbot 0.26.1

Now we’ll install pip and the Route53 plugin for Certbot.  Be sure to specify the version of certbot installed above.

sudo apt-get update && sudo apt-get -y upgrade
sudo apt-get install python-pip
sudo -H pip install certbot_dns_route53==0.26.1
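
Optionally, you can confirm certbot picked up the Route53 plugin before moving on.  Running the command below should list dns-route53 among the installed authenticator plugins (if it doesn’t, double-check the pip install above):

sudo certbot plugins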

Let’s test our certbot install and Route53 plugin using the --dry-run flag to generate a wildcard cert and ensure everything is working properly:

$ sudo certbot certonly -d "*.dan-russell.com" --dns-route53 --email dan@dan-russell.com --agree-tos --non-interactive --dry-run
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator dns-route53, Installer None
Obtaining a new certificate
Performing the following challenges:
dns-01 challenge for dan-russell.com
Waiting for verification...
Cleaning up challenges

IMPORTANT NOTES:
- The dry run was successful.
- Your account credentials have been saved in your Certbot
configuration directory at /etc/letsencrypt. You should make a
secure backup of this folder now. This configuration directory will
also contain certificates and private keys obtained by Certbot so
making regular backups of this folder is ideal.

If you get the message above, it means you’ve properly configured your IAM role, installed certbot, the certbot route53 plugin and you’re able to generate a wildcard cert via LetsEncrypt.

Next we’ll create a shell script and add it to crontab so it runs certbot weekly and applies the cert to OpenVPN via the command line.  Note that the DOMAIN variable is set to the top-level domain (dan-russell.com) and not the wildcard (*.dan-russell.com), because LetsEncrypt saves the certs under the top-level domain.  If you change the DOMAIN variable to the wildcard value, things will break.

Create the following script with your preferred Linux text editor and save it with a .sh extension.  I called mine ‘RenewAndApplyCert.sh’.

#!/bin/sh

# Change to your domain!
DOMAIN="dan-russell.com"

# Renew the Cert.  Assumes wildcard based on top level domain above
sudo certbot certonly -n -d "*.$DOMAIN" --dns-route53 --agree-tos --email dan@dan-russell.com --non-interactive

# STOP openVPN
service openvpnas stop

# Apply the certs to OpenVPN using configuration scripts
/usr/local/openvpn_as/scripts/confdba -mk cs.ca_bundle -v "`cat /etc/letsencrypt/live/$DOMAIN/fullchain.pem`"
/usr/local/openvpn_as/scripts/confdba -mk cs.priv_key -v "`cat /etc/letsencrypt/live/$DOMAIN/privkey.pem`" > /dev/null
/usr/local/openvpn_as/scripts/confdba -mk cs.cert -v "`cat /etc/letsencrypt/live/$DOMAIN/cert.pem`"

# START OpenVPN
service openvpnas start

Make your shell script executable:

chmod +x RenewAndApplyCert.sh

Now add it to /etc/crontab and set it to run weekly:

# /etc/crontab: system-wide crontab
# Unlike any other crontab you don't have to run the `crontab'
# command to install the new version when you edit this file
# and files in /etc/cron.d. These files also have username fields,
# that none of the other crontabs do.

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# m h dom mon dow user command
17 * * * * root cd / && run-parts --report /etc/cron.hourly
25 6 * * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
47 6 * * 7 root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
52 6 1 * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )
 0 1 * * 1 root /home/openvpnas/RenewAndApplyCert.sh

Don’t forget to repeat this step for your other server ‘OpenVPN B’, so they’re both running wildcard certs.  Once that’s done, you’re ready to move to the final step.

Create a Traffic Policy in Route 53

Now that we have two OpenVPN servers running in different availability zones, we’ll set up a Route53 traffic policy that balances traffic between the two instances and fails over in the event one instance goes down or the entire availability zone goes offline.

The first thing we’ll do is create health checks for each of our VPN instances.  Go to the Health Check console in Route53 then click ‘Create Health Check’.   Enter a descriptive name (OpenVPN A), set the protocol to HTTPS and specify the public IP address of your OpenVPN A instance.  You can use either port 443 or 943.  Leave the default settings under ‘Advanced Configuration’ which checks the server’s status every 30 seconds.

OpenVPN Health Check

Click ‘Next’ then skip creating a CloudWatch alarm.  You should see a message indicating your Health Check has been created successfully.  Repeat the same steps for your other OpenVPN instance (OpenVPN B).  You should now see something like this, indicating both health checks are working:

Route53 Health Check
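
If you prefer the command line, each health check can be created with something roughly like the call below.  This is just a sketch – the IP address is a placeholder for your instance’s public IP, and the root resource path and failure threshold of 3 are my own assumptions.

aws route53 create-health-check \
  --caller-reference openvpn-a-check-1 \
  --health-check-config '{"IPAddress":"203.0.113.10","Port":443,"Type":"HTTPS","ResourcePath":"/","RequestInterval":30,"FailureThreshold":3}'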

Go to Route53’s Traffic Policies console and click ‘Create Traffic Policy’.  You’ll be prompted to enter a name.  Call it ‘OpenVPN’.  Now you’ll be presented with a visual editor that looks something like:

Route53 Traffic Policy

Next we’ll add a ‘Weighted Rule’ and specify the same weight for each server (10).  Then we select the appropriate health check for each of our servers.

Traffic Policy Weighted Rule

Now in the ‘Connect to…’ area we’ll add each of our OpenVPN instances as endpoints by IP.  Remember to specify the Public IP of each instance here.  This gives us:

Traffic Policy Endpoints

Click ‘Save’.  Now we’re presented with the policy record screen, where we determine which DNS A record is associated with the traffic policy we just created.  Here I’ve chosen to create an A record ‘vpn.dan-russell.com’ with a 60 second TTL that’s associated with this traffic policy.  After clicking ‘Create policy records’, the traffic policy will be created.  This takes a few minutes.

Route53 Policy Record

Testing

To make sure this all works, we’ll load our OpenVPN client and import our VPN connection (OpenVPN > Import > From Server) then enter the DNS name we created (vpn.dan-russell.com).   Once we’ve imported the server, we’ll connect (OpenVPN > vpn.dan-russell.com > Connect).

Now we’ll make sure failover works.  From your command prompt, query the DNS entry you created using dig.

$ dig vpn.dan-russell.com

; <<>> DiG 9.10.3-P4-Ubuntu <<>> vpn.dan-russell.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23440
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;vpn.dan-russell.com.		IN	A

;; ANSWER SECTION:
vpn.dan-russell.com.	40	IN	A	18.204.19.143

In this instance, we can see the A record returned an IP of 18.204.19.143 which references the OpenVPN B instance.

To test failover, I’m going to log into the OpenVPN B instance and manually stop the OpenVPN service.

$ sudo service openvpnas stop

Now, after a few failed health checks, the traffic policy record we created will fail over the DNS so that ‘vpn.dan-russell.com’ points at OpenVPN A.

OpenVPN Health Checks

Let’s run the dig command.

$ dig vpn.dan-russell.com

; <<>> DiG 9.10.3-P4-Ubuntu <<>> vpn.dan-russell.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26991
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;vpn.dan-russell.com. IN A

;; ANSWER SECTION:
vpn.dan-russell.com. 60 IN A 18.209.230.2

Now we can see that our vpn.dan-russell.com A record is pointed at OpenVPN A’s public IP 18.209.230.2.  Success!

If we launch our VPN client again and connect to ‘vpn.dan-russell.com’, we’ll be successfully connected to OpenVPN A.

Questions?  Comments?

Please let me know what you thought of this post in the comments.  Did I provide too much detail?  Too little?  Have suggestions on how to improve this implementation?


As an avid fisherman on Lake Champlain, I’m constantly checking the National Weather Service’s lake forecast.  This tells me three key pieces of information:

  1. How windy is it?
  2. Which direction is the wind blowing in?
  3. How big are the waves?

These three pieces of information will determine where I’ll launch the boat and where I’ll fish.  And in some cases whether I’ll fish at all.  Fishing in 3-4 foot waves can be pretty miserable.

I’ve recently been using the Twilio SMS API to handle messaging for work and I noticed they had some newer workflow options to build SMS and voice flows, so I thought it’d be interesting to build a simple SMS weather bot.  To do this, I knew I’d need to tie a few different pieces together.  Here they are in order:

  1.  A way to scrape the forecast page every morning when the new forecast is published and extract just the text I needed.  For this I used an AWS Lambda function with a Cloudwatch Event running on a CRON schedule.
  2. An easy way to store the scraped forecast data so I don’t have to scrape it more than once per day.  DynamoDB fit the bill here.
  3. A way to serve up the stored forecast data. API Gateway + Lambda.
  4. An SMS # to receive and send messages.  Twilio.
  5. A workflow to build logic against the SMS # so I can call the API that serves up the forecast data.  Twilio Studio Flow.

Note: The lake forecast has recently expanded to include 5 days of data.  To keep things simple for this demo, I’m only using three forecast periods: “today”, “tonight” and “tomorrow”.

You can try out my Lake Weather SMS bot by sending any message to: +1 202-999-3555

AWS + Twilio Lake Weather Forecast Application

Here’s a diagram showing the overall process and how everything fits together.

Note: You can build this entire application for free

Please note that free trials are available for both Twilio and AWS which will allow you to build this entire application free of charge.  You can read more about the AWS Trial and the Twilio Trial on their respective web sites.

Let’s get building!

The Lake Champlain Forecast page

Let’s take a look at the page containing the forecast we’ll be parsing (https://forecast.weather.gov/product.php?site=BTV&issuedby=BTV&product=REC&format=txt&version=1&glossary=0).  If you view source, you’ll see a preformatted <pre> tag (<pre class="glossaryProduct">) that contains the text we care about:

<pre class="glossaryProduct">
000
SXUS41 KBTV 070656
RECBTV
NYZ028>031-034-035-VTZ001>012-016>019-072115-

Recreational Forecast
National Weather Service Burlington VT
256 AM EDT Fri Sep 7 2018

.The Lake Champlain Forecast...

.TODAY...North winds 5 to 10 knots. Waves around 1 foot.
.TONIGHT...North winds 5 to 10 knots, becoming northwest 10 to
20 knots after midnight. Waves around 1 foot, building to 1 to
2 feet after midnight.
.SATURDAY...North winds 10 to 20 knots, becoming 10 to 15 knots in
the afternoon. Waves 1 to 2 feet.
.SATURDAY NIGHT...Northeast winds 10 to 15 knots, becoming north
5 to 10 knots after midnight. Waves 1 to 2 feet.
.SUNDAY...North winds around 5 knots, becoming northeast in the
afternoon. Waves 1 foot or less, subsiding to around 1 foot in the
afternoon.

The parts we care about are TODAY, TONIGHT and SATURDAY (tomorrow).  If we look closely we’ll see that each piece we care about is loosely separated by three dots ('...').  If we process the text contained within the <pre> tag as a string and split it into an array on the three dots, we’ll end up with:

First array item (position 0 in JavaScript):

000
SXUS41 KBTV 070656
RECBTV
NYZ028>031-034-035-VTZ001>012-016>019-072115-

Recreational Forecast
National Weather Service Burlington VT
256 AM EDT Fri Sep 7 2018

.The Lake Champlain Forecast

Second array item (position 1):

.TODAY

Third array item (position 2) – Today’s forecast:

North winds 5 to 10 knots. Waves around 1 foot.
.TONIGHT

Fourth array item (position 3) – Tonight’s forecast:

North winds 5 to 10 knots, becoming northwest 10 to
20 knots after midnight. Waves around 1 foot, building to 1 to
2 feet after midnight.
.SATURDAY

This gives us enough to get started with.  We know we’ll want most of the text from the 3rd, 4th and 5th (not shown) items in the array to get forecasts for ‘today’, ‘tonight’ and ‘tomorrow’.  We’ll also need to deal with the following day’s forecast title (.TONIGHT, .SATURDAY, etc.) which comes along for the ride.  If we look carefully, we’ll see that we can use a regex to split this string again, since the title is a period followed by no space and then an upper case letter.  This results in the following RegEx in JavaScript: (/\r?\.[A-Z]/).  Once we split the string using this RegEx, we’ll take the first item in the array as our forecast.
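
Here’s a quick standalone sketch of that two-step split.  The sample string is abbreviated; on the real page, today’s forecast lands at index 2 rather than 1 because the header block comes first.

// Abbreviated sample of the forecast text
const raw = '.TODAY...North winds 5 to 10 knots. Waves around 1 foot.\n' +
            '.TONIGHT...North winds 5 to 10 knots, becoming northwest 10 to\n' +
            '20 knots after midnight.\n.SATURDAY...North winds 10 to 20 knots.';

// First split on the three dots
const splitForecast = raw.split('...');

// splitForecast[1] holds today's text plus the next period's header (".TONIGHT"),
// so split again on a period followed by an upper case letter and keep the first piece
const today = splitForecast[1].split(/\r?\.[A-Z]/)[0].trim();

console.log(today); // "North winds 5 to 10 knots. Waves around 1 foot."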

The Lambda Web Scraper

To build a scraper in Lambda, we’re going to use NodeJS and load in a few libraries:

npm install --save request-promise cheerio aws-sdk

  1. request-promise – Allows us to make a web request to the NOAA forecast page.
  2. cheerio – Parses the HTML easily.
  3. aws-sdk – Includes the DynamoDB SDK.

Here’s the code of index.js in its entirety:

const rp = require('request-promise');
const cheerio = require('cheerio');
var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB({apiVersion: '2012-08-10', region: 'us-east-1'});

exports.handler = (event, context, callback) => {

    // Set the options for our Scrape.  The URI to scrape and a User Agent so we don't get blocked.
    const options = {
        uri: 'https://forecast.weather.gov/product.php?site=BTV&issuedby=BTV&product=REC&format=txt&version=1&glossary=0',
        headers: {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'
        },
        transform: function (body) {
          // Load the retrieved page into Cheerio so we can parse it
          return cheerio.load(body);
        }
      };

  // Get the weather page and parse it 
  rp(options)
  .then(($) => {
    let rawForecast = $('pre').html();
    // Split the weather data on '...'
    let splitForecast = rawForecast.split("...");

    // Dynamo DB params
    var params = {
        ExpressionAttributeNames: {
         "#F": "Forecast"
        }, 
        ExpressionAttributeValues: {
         ":f": {
           S: splitForecast[2].split(/\r?\.[A-Z]/)[0] // Splits the weather data on .[A-Z] so we don't get the next day's prefix
          }
        }, 
        Key: {
         "forecastperiod": {
           S: "today"
          }
        }, 
        ReturnValues: "ALL_NEW", 
        TableName: "LakeForecast", 
        UpdateExpression: "SET #F = :f"
       };
       dynamodb.updateItem(params, function(err, data) {
         if (err) console.log(err, err.stack); // an error occurred
         else {
                //Update tonight
                params.ExpressionAttributeValues[":f"].S = splitForecast[3].split(/\r?\.[A-Z]/)[0];
                params.Key.forecastperiod.S = "tonight";
                dynamodb.updateItem(params, function(err, data) {
                if (err) console.log(err, err.stack); // an error occurred
                else {
                     //Update tomorrow
                    params.ExpressionAttributeValues[":f"].S = splitForecast[4].split(/\r?\.[A-Z]/)[0];
                    params.Key.forecastperiod.S = "tomorrow";
                    dynamodb.updateItem(params, function(err, data) {
                        if (err) console.log(err, err.stack); // an error occurred
                        else {
                            // Let Lambda know we're all done processing.
                            callback(null, 'all done processing!');
                        }        
                      });
       
                }        
              });
         }        
       });
  })
  .catch((err) => {
    callback(err);
  });
};

There’s definitely some room for improvement with the text parsing (splitting on ‘…’, etc.) but this will do for now.  Once we’ve created our index.js we’ll need to zip up the files to include our node_modules folder which contains the libraries and their dependencies.

Our lambda function will need to be assigned an IAM role that allows it to execute the Lambda function (Lambda basic execution) and write data to DynamoDB.  If you aren’t familiar with IAM, let me know in the comments and I’m happy to help.

Note that when you call the DynamoDB ‘updateItem’ method, a record that doesn’t already exist with the key you provide is created/inserted – it’s really an ‘upsert’ operation.  This is handy for our needs because we don’t need to get the item first to see if it exists.  We can just call ‘updateItem’ and let DynamoDB figure it out behind the scenes.

Cloudwatch Events

To run this Lambda daily, we’re going to set up a CloudWatch Event to run every day at 09:00 (9 AM) UTC, which is after the Lake Champlain forecast is published each morning around 3 AM EDT/2 AM EST.  To do this, choose ‘CloudWatch Events’ as a trigger at the top of the Lambda page and then scroll down to customize the CloudWatch Event.  I entered a Cron expression of:

cron(0 9 * * ? *)

You can read more about the Cron expressions accepted by Lambda in this AWS doc.

Lambda Cloudwatch Event Trigger

Cloudwatch Event

DynamoDB

To hold our forecast data, we’ll just create a DynamoDB table, ‘LakeForecast’, with a key of ‘forecastperiod’, and we’ll pass in a ‘Forecast’ attribute when we load data.  You won’t need a secondary index since we’ll always be retrieving data using the exact key (e.g., today, tonight, tomorrow).  Here’s how our DynamoDB table will look once we’ve scraped and populated some records:

DynamoDB Forecast Table

API Gateway + Lambda

To retrieve the data from the Twilio flow using an API request, we need to set up an API Gateway with a POST method at the root.  When we want to retrieve our forecast, we’ll POST a JSON message specifying the forecast period we want to retrieve:

{
    "forecastperiod" : "today"
}

This JSON body will travel all the way through to our Lambda method where it’ll appear as the ‘event’ variable.  This means you’ll need to add a model under ‘Request Body’ in the Method Request with a content type of ‘application/json’.  In the Integration Request, we’ll allow the Request body passthrough as seen below.

API Gateway POST Method

 

API Gateway Method Request

API Gateway Integration Request Mapping Template:

API Gateway Mapping Template

 

Our API Gateway POST method will be integrated with a new lambda function, ‘lambda-weather-get-forecast’, we’ll create to retrieve the appropriate forecast data from DynamoDB:

API Gateway Integration Request

Lambda-weather-get-forecast index.js:

var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB({apiVersion: '2012-08-10', region: 'us-east-1'});

exports.handler = (event, context, callback) => {
    console.log(event);
    
    var params = {
        Key: {
         "forecastperiod": {
           S: event.forecastperiod.toLowerCase()
          }
        }, 
        TableName: "LakeForecast"
       };
       dynamodb.getItem(params, function(err, data) {
         if (err) callback(err); // an error occurred
         else {
             let reply = { message: data.Item.Forecast.S };
             callback(null, reply );          // successful response
         }
       });
}

When this Lambda function executes, it will return data like:

{ 
   "message": "North winds 5 to 10 knots. Waves around 1 foot."
}

This Lambda function will need an IAM role that can execute Lambda and read from DynamoDB.

Note that I originally created a resource with a path parameter of {forecastperiod}, planning to use a GET request to /today, /tonight, /tomorrow, but unfortunately the Twilio widget for API requests doesn’t support a GET method with a response type of application/json – only URL-encoded forms.  Odd, right?
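
Before wiring up Twilio, you can sanity-check the deployed POST method with a quick request-promise script like the sketch below.  The invoke URL is a placeholder for your own API Gateway endpoint.

const rp = require('request-promise');

// Quick test of the forecast API
rp({
  method: 'POST',
  uri: 'https://YOUR_API_ID.execute-api.us-east-1.amazonaws.com/prod',
  body: { forecastperiod: 'today' },
  json: true // stringifies the request body and parses the JSON response
})
  .then((reply) => console.log(reply.message)) // e.g. "North winds 5 to 10 knots. Waves around 1 foot."
  .catch((err) => console.error(err.message));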

Twilio SMS #

Creating a new SMS # within Twilio is a really straightforward process.  You can do so on the Twilio create number page.  If you’re using the free trial, keep in mind that you’ll only be able to receive messages from Twilio on cell phones that you’ve verified.

Once you’ve created your Twilio number, go to this page and click on the number you’ve just created.  You’ll see a page like this:

This page is where you can integrate your Twilio number with webhooks, functions, etc.  In this case, I’ve specified that when a new message arrives, I want to kick off a ‘Studio Flow’ called ‘LakeWeatherSMS’.  This is where we’ll continue in the next and final step.

Twilio Flow

Twilio Flow gives us the ability to build workflows that are triggered from an SMS message, a phone call or a REST API call.  Flow uses Twilio Studio which means a large part of the workflow is built using a visual editor which is really nice since it illustrates the flow from each widget based on criteria such as success/fail, match/no match, etc.  Let’s take a look at the ‘LakeWeatherSMS’ flow at a high level then we’ll break down each piece (widget) individually.

Twilio LakeWeatherSMS Flow

Starting at the top of the flow, we have our trigger which fires on an incoming message.  This trigger was enabled in the previous step when we specified ‘LakeWeatherSMS’ as the flow to trigger when a new message arrives.  The arrow from the Trigger to the ‘sanitizeText’ widget indicates the execution direction and hints that information about the trigger is available to the sanitizeText function widget.

Twilio functions are serverless functions (FaaS), much like Lambda, that can be invoked via an API call or integrated into a Twilio flow.

Twilio Trigger

In the sanitizeText function I’ve specified a parameter ‘msgText’, which is mapped to the variable {{trigger.message.Body}} – as you might expect, the body of the SMS message received.  This parameter is passed to the sanitizeText function as the event, which looks like:

{
   msgText: 'Today'
}

The sanitizeText function code is as follows:

exports.handler = function(context, event, callback) {
    console.log(event);
    
    var response = {
        cleanText: event.msgText.toLowerCase()
    }

	callback(null, response);
};

which returns:

{
   cleanText: 'today'
}

Yep, that’s it.  All it does is return a JSON object with one field, ‘cleanText’, that contains the event message text set to lower case.  This is helpful as most phones automatically capitalize the first letter of a text message (today becomes Today, etc.).  To simplify the matching case for the next widget, I want it set to lowercase.  I also wanted to see how Twilio functions worked for future applications.

Twilio Flow Section 2

In the next section, the execution path splits based on the message body we received from our Twilio function.  If the variable {{widgets.sanitizeText.parsed.cleanText}} (returned by the sanitizeText function) matches any of ‘today’, ‘tonight’, or ‘tomorrow’, then the execution path continues down the right side to call the API.  If it doesn’t, we continue down the left side and send a message, “Sorry, we couldn’t understand your request. Please reply with which forecast you would like: today, tonight or tomorrow.”  One important thing to note here is that the ‘send_instructions’ widget does not have any additional steps.  This means that the flow execution ends and a new incoming message would trigger a new flow execution.

The ‘split_incoming’ widget looks like this:

Twilio Split Based On Message Config

Twilio Split Based On Message Transitions

On the left side, we just name the widget and which variable we’re splitting on.  On the right side, we determine what our next widgets will be based on whether the text matches or not.  Pretty straightforward.

Now let’s crack open the send_instructions widget to see how messages are sent from Twilio:

Send Message Config

Send Message Transitions

Here we can see the Message Body as well as the Send Message From and Send Message To Variables.  The Send Message From variable, {{flow.channel.address}}, maps to our Twilio SMS #.  The Send Message To variable, {{contact.channel.address}}, maps to the original message address that triggered the flow.  As you can see there are no transitions specified which means execution will end once this Send Message widget is executed.

Let’s take a look at the right side of our execution now.

Twilio API Call and Send Message

In this section, we’re continuing because our SMS contained the words, ‘today’, ‘tonight’, or ‘tomorrow’.  We’ll take this text and call the API Gateway we created to retrieve our forecast and then send a message back which contains the forecast text.  Then the execution ends.

Twilio HTTP Request Config

Twilio HTTP Request Transitions

Let’s take a look under the hood in the HTTP Request widget.  We’ve specified our Request URL which points at the API Gateway we created earlier as well as a Content Type of ‘application/json’ and then the Request Body that we’re posting to our API.  Here you can see we’re referencing a Twilio variable, {{widgets.sanitizeText.parsed.cleanText}}, which is the lower case text we got back from our Twilio function – ‘today’, ‘tonight’ or ‘tomorrow’.

On the transitions pane, we decide which widget to execute next based on Success or Fail.  In this demo, I’ve only chosen to execute the ‘send_forecast’ widget next on success.   In a failure scenario, we could send a message to the user along the lines of, “Sorry, unable to retrieve your forecast, please try again later.”

Now it’s time to look at our final widget, send_forecast, which sends an SMS to the user with the weather forecast text we received back from our API Gateway which looks something like:

{ 
   "message": "North winds 5 to 10 knots. Waves around 1 foot."
}

Twilio Send Message Config

In the Message Body field of the config, we can reference this value as {{widgets.call_weather_api.parsed.message}} which parses the JSON for us and makes the ‘message’ property easily accessible.  As we did before, we’re specifying where the message should be sent from (our Twilio SMS #) and who it should go to (the SMS # that triggered this workflow execution).

See this workflow in action

You can test this workflow out yourself by sending a text message to +1 202-999-3555.  If you specify a message text of ‘today’, ‘tonight’ or ‘tomorrow’, you’ll get a reply with the weather forecast.  Otherwise you’ll receive instructions on how to request a forecast.

Our new Twilio bot in action

Conclusion

Wow!  We just built an SMS bot that can retrieve the Lake Champlain Weather forecast by integrating AWS and Twilio products.

This application is just scratching the surface of what can be built with Twilio’s Flow product.  Flow could be used to build things like an SMS bot to provide order status for online grocery delivery, airport delays, and so on.  By integrating Flow with AWS via an API call, we’re able to tap into the huge variety of managed services that AWS offers.  In this example, our application is entirely serverless since we’re using all managed services such as API Gateway, Lambda, and DynamoDB on the AWS side and everything in Twilio’s stack is serverless.

A favor to ask

Did you find this post helpful?  Did I leave something important out that would help you and others build this application?  Have suggestions on how I could improve this blog post?  Please let me know in the comments below.  Thanks!


Why would the Vermont legislature endanger the livelihood of its independent business constituents?

Yesterday, at 3:34 PM, I received the following email from Amazon:

Hello,

We are writing from the Amazon Associates Program to notify you that your Associates account will be closed and your Amazon Services LLC Associates Program Operating Agreement will be terminated effective January 6, 2015. This is a direct result of Vermont’s state tax collection legislation (32 V.S.A. § 9701(9)(I)). As a result, we will no longer pay any advertising fees for customers referred to an Amazon Site after January 5, nor will we accept new applications for the Associates Program from Vermont residents.

Please be assured that all qualifying advertising fees earned prior to January 6, 2015, will be processed and paid in full in accordance with your regular advertising fee schedule. Based on your account closure date of January 6, 2015, any final payments will be paid by March 31, 2015.

Amazon strongly supports federal legislation creating a simplified framework to uniformly resolve interstate sales tax issues. We are working with states, retailers, and bipartisan supporters in Congress to get legislation passed that would allow us to reopen our Associates program in Vermont.

We thank you for being part of the Amazon Associates Program, and hope to be able to re-open our program to Vermont residents in the future.

Sincerely,
The Amazon Associates Team

This email announced immediate termination of all Vermonters’ Amazon Affiliate accounts.

What is the Amazon Affiliates program?

The Amazon Affiliate program allows me to earn commission on referrals that originate from links on my websites and YouTube channel.  When I published this Nissan Pathfinder Knock Sensor repair video  on YouTube in 2010, I added Amazon Affiliate tagged links to tools and parts I used in the video.  When someone clicked the link and made a purchase on Amazon, I received a commission between 5 and 10%.  Since 2010 this has generated passive income for me of a few thousand dollars.  This will have a large impact on Vermont creatives that blog and depend on this revenue stream for a large portion of their income.   The Amazon affiliate link can be seen in the screenshot of my video below.  This one link has generated 2,115 clicks and 248 orders since January 2012.

Amazon Affiliate Link Vermont Amazon Tax

OK, So what changed?

In 2011, the Vermont legislature amended the Sales and Use Tax law’s definition of a ‘vendor’ to include internet affiliate programs such as Amazon’s (source).  This was dubbed the ‘Amazon tax’.  It’s not clear when this law change took effect, but it may have been the first of this year.  An excerpt of this Vermont statute, 32 V.S.A. § 9701(9)(I), can be found below:

(I) For purposes of subdivision (C) of this subdivision (9), a person making sales that are taxable under this chapter shall be presumed to be soliciting business through an independent contractor, agent, or other representative if the person enters into an agreement with a resident of this State under which the resident, for a commission or other consideration, directly or indirectly refers potential customers, whether by a link on an Internet website or otherwise, to the person if the cumulative gross receipts from sales by the person to customers in the State who are referred to the person by all residents with this type of an agreement with the person are in excess of $10,000.00 during the preceding tax year. For purposes of subdivision (C) of this subdivision (9), the presumption may be rebutted by proof that the resident with whom the person has an agreement did not engage in any solicitation in the State on behalf of the person that would satisfy the nexus requirements of the United States Constitution during the tax year in question.

The vagueness of this line is of particular concern:

directly or indirectly refers potential customers, whether by a link on an Internet website or otherwise

This has the potential to impact other much more significant revenue streams that Vermont bloggers depend on such as Google’s AdSense.  (Google AdSense is a program that pays you to put ads on your website.)  It’s unclear at this point the extent to which this law change will impact the income of Vermont’s bloggers and content creators.

For me personally, it means I will be shutting down a website that relied heavily on revenue from Amazon affiliate links.  It also makes me question whether it’s worth posting ‘how to’ videos such as the one above that take a significant amount of time to produce.

Questions or concerns?  Please leave a comment below.

 

Special thanks to Cairn Cross (@vtcairncross) for providing me information regarding the ‘Amazon tax’ passed by the Vermont legislature.

You can also follow this story on VPR online at “Amazon shutters Vermont program over tax issue.”

To take advantage of the new cross device measurement reports in Google Analytics, you need to enable the User ID functionality.  For these reports to be of any value, you need to be tracking user interaction across all of your web and mobile touch points against the same GA Property ID.  The property ID is that funky UA-XXXXXX-XX value that Google assigns you when you create a new Google Analytics property.
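
As a quick reference, enabling the User ID on the web side looks roughly like this with analytics.js.  The property ID and user value below are placeholders – the user value would come from your own authentication system:

<script>
  ga('create', 'UA-XXXXXXX-XX', 'auto');
  // Set the User ID for signed-in visitors before sending any hits
  ga('set', 'userId', 'USER_12345');
  ga('send', 'pageview');
</script>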

A typical hierarchy with views might look like this:

  • Client/Property Name (The UA-XXXXXXX-XX is assigned here)
    • Web Rollup View (Tracks all web hits – no filters)
    • Web Site Subdomain (For tracking a particular portion of your site  – filters to a specific subdomain)
    • Web Site Subdirectory (For tracking a particular portion of your site – filters to a specific subdirectory)
    • Mobile App Rollup (Tracks both iOS and Android hits for segmentation and aggregate reporting)
    • Mobile App iOS (Only tracks iOS SDK hits – Filtered to iOS operating system)
    • Mobile App Android (Only tracks Android SDK hits – Filtered to Android operating system)
    • User ID (Tracks authenticated users across all platforms)

In the above hierarchy, the first three views are ‘Web Site’ type and the next three are ‘Mobile’ type.  The last view is the User ID view, which only records traffic when a User ID is passed along with the Google Analytics pageviews and events.  This enables those snazzy cross device measurement reports.

What you may not realize is that as of version 3.0.0 of the iOS and Android SDKs released on August 6th, 2013, the mobile SDKs were modified to record hits in the same manner as the web version.  While web and mobile profiles differ in tracking pageviews (screens in mobile), the sessions are similar.  As a result, your mobile sessions may be reflected in your web profiles and vice versa.  Oh the humanity!

Hindenburg

How do I know if I’m affected?

The quickest way to see if you’re affected by this is to look at the Audience > Technology > Browser & OS report.  If you see a browser ‘GoogleAnalytics’, then you’re affected.  This is the browser the Android and iOS SDKs pretend to be when they record hits.

GoogleanalyticsBrowser

 

See how the report is showing the ‘GoogleAnalytics’ browser sessions, but no pageviews?  You can imagine the havoc this wreaks on your computed metrics such as bounce rate, pages/session and so on.

 

How do I fix it?

To prevent mobile SDK hits from sneaking into your web views, add the following rule to all of your web views:

Screen Shot 2014-06-08 at 6.07.19 PM

 

** Hint: Once you save this rule, you can use it in other views.

To prevent web hits from showing in your mobile profiles, add this rule:

Screen Shot 2014-06-08 at 6.09.45 PM

Once you add these rules, you’ll be accurately reporting web hits and sessions in your web views and mobile hits and sessions in your mobile views.   Keep in mind that you should not add either of the above rules to your User ID views.   Doing so will remove the data needed to generate the cross device measurement reports.

Questions or comments?  Get in touch with me on Google+ or leave a comment below.

Update 6/9/2014: Jump to the end of this post for a solution that obviates the need for two tags.

A week ago, I migrated a site to use Universal Analytics (UA) from classic GA.  As part of the migration, I wanted to enable the new User ID functionality, which lets you pass a User ID for authenticated (signed-in) users with each GA hit.  This gives Google Analytics a common identifier to ‘connect the dots’ across visits where cookies aren’t shared.  That way when a user visits from multiple machines and mobile devices, you have visibility into their behavior through the new Cross Device Measurement reports in Google Analytics.

The site being upgraded to UA was using Google Tag Manager, which makes migrations of this nature a breeze.  I disabled the classic Google Analytics tag and added the new Universal Analytics tag.  To enable the User ID feature, you need to add a row in the ‘Fields to Set’ section.

Screen Shot 2014-06-08 at 4.26.16 PM

 

The value field is populated with my {{uid}} macro, which is the customer identifier that is passed to the data layer when a user is authenticated.  What I didn’t realize is that when the user isn’t signed in, the {{uid}} is empty, but the &uid field is still passed with the Google Analytics hit.  This caused a significant drop in pageviews, because, for some reason, when the &uid field is empty, Google Analytics doesn’t track those hits.

The workaround is to create two versions of your Universal Analytics tag to handle authenticated and non-authenticated scenarios.  For the authenticated users, you want a firing rule similar to this:

Screen Shot 2014-06-08 at 4.37.07 PM

 

This rule uses RegEx to test that the {{uid}} is a valid GUID (the customer identifier on this particular site) and fires on all pages.

For the non-authenticated (no User ID to populate), modify your Universal Analytics tag so the ‘Fields to Set’ is not being populated with the ‘&uid’ field.  The firing rule for this tag will simply be the ‘All Pages’ rule.  Now add a blocking rule using the rule created above (All Authenticated Pageviews) that only fires when the {{uid}} macro is a valid GUID.

Now  you have two versions of your Universal Analytics tag.  One will fire when users are authenticated and pass the User ID.  The other tag will fire for anonymous users and not populate the User ID field.  This will resolve the issue of pageviews not being properly recorded due to the empty User ID field.

Don’t forget to do this same workaround for any event or ecommerce tags you have implemented in Google Tag Manager.

As a final note, I have reached out to the Google Analytics team asking if this is the intended behavior as I suspect others will encounter this same issue.  I will update this post based on their response.

Update 6/9/2014: Thanks to Simo Ahava, there is a solution that eliminates the need for two tags.  For this version, you’ll only need one Universal GA tag, one rule (All Pages) and two macros ({{uid}} and {{Get User ID}}).  In the Universal GA tag’s fields to set, you’ll use a JavaScript macro, {{Get User ID}}, that checks whether the User ID value is empty.  If it is, the macro simply returns, which sets the {{Get User ID}} value to undefined.  Google Analytics does not pass undefined values as part of the hit, so you avoid the problem I encountered previously.  Here’s what the JavaScript macro looks like:

Screen Shot 2014-06-09 at 7.40.59 PM
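
In case the screenshot is hard to read, the macro looks roughly like this – a sketch based on the approach described above, not necessarily the exact code in the image:

// Custom JavaScript macro: {{Get User ID}}
function() {
  var uid = {{uid}};
  // When no user is signed in, return nothing so the macro resolves to undefined
  // and Google Analytics omits the &uid field from the hit entirely.
  if (!uid) {
    return;
  }
  return uid;
}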

Happy Measuring!

Questions or comments?  Get in touch with me on Google+ or leave a comment below.

In many software-as-a-service (SaaS) companies, applications are multi-tenant.  This means you’ll have multiple clients sharing the same code base.  However, clients will often want their own Google Analytics profiles or even their own accounts.  To meet this need while also keeping things manageable in Google Tag Manager (GTM), you’ll want to store your clients’ Google Analytics (GA) Profile IDs in your database alongside their configurations.  Then, once an instance of a client’s site is loaded, you can pass the client’s GA Profile ID into the GTM data layer.

First, create a macro to hold the client’s GA Profile ID in GTM:

Screen Shot 2014-05-14 at 6.50.14 PM

Now create a tag instance that uses this macro for the GA Profile ID:

Screen Shot 2014-05-14 at 6.58.22 PM

The last thing we’ll do in GTM is create a rule that fires our new tag after we check that the GA Profile ID is populated:

Screen Shot 2014-05-14 at 7.02.13 PM

 

Now in our data layer on the multi-tenant application, we’ll pass the client’s GA Profile ID:

<script>
 var dataLayer = [{
  'gaProfileId': "[Dynamic code to insert the client's GA Profile ID]"
 }];
</script>

Now you can scale your Google Analytics using Google Tag Manager!  When the gaProfileId in our data layer is populated with a value starting with ‘UA-’, our rule will be true and our Universal GA tag will fire.  This same logic applies to adding additional dynamic values that may be client-specific but useful in the client’s GA profile.  If you have an aggregate profile that collects data across all clients, you can pass a unique client ID value as a custom dimension, which will enable you to filter by client in your aggregate profile.
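
For example, if you also wanted that per-client custom dimension, a tenant’s data layer might look like the sketch below.  The values and the ‘clientId’ key are hypothetical – you’d map ‘clientId’ to a custom dimension in your Universal GA tag:

<script>
 var dataLayer = [{
  'gaProfileId': 'UA-1234567-1', // this client's GA property ID
  'clientId': 'client-001'       // unique client identifier for the aggregate profile
 }];
</script>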

Questions or comments?  Get in touch with me on Google+ or leave a comment below.


I’m an active member of the LinkedIn Google Analytics group and a fellow group member, Ruth, recently asked how to set the unique Google Analytics user ID from the GA cookie as a custom variable.  I responded with the JavaScript code that would do this in classic Google Analytics.  Before long Ruth asked how to do this in Universal Analytics and finally in Google Tag Manager.  I thought it would make sense to post it here as well.

First off, why would you want to do this?  Setting a custom variable with the unique user ID from GA will allow you to track individual user behavior in reports by building a custom segment against a particular user ID.  You can also use this user ID to connect non-authenticated user behavior (before a user creates an account on your site) to an authenticated user (after they create an account).  This can be immensely helpful in attribution.

How does this code work? 
The code snippets below all perform the same basic steps:

  1. Attempt to read the __utma cookie value.  (Read about the GA cookies in depth here)
  2. Make sure the cookie has data in it (not undefined).
  3. Split the cookie values into an array based on the dot delimiter.
  4. Extract the appropriate User ID from the array (2nd slot in classic GA, 3rd slot in Universal Analytics).  Note that JavaScript arrays are zero-based so the ‘cookieValues[1]’ below  is really accessing the 2nd item of the array.
  5. Assign this User ID as a visitor scope custom variable in GA or as a custom dimension in Universal Analytics.
  6. Fires an event with a category of ‘Custom Variables’ and action of ‘Set UserId’ with non-interaction set to true.

Before getting into the code itself, a few assumptions are in order:

  1. You’re only firing one GA Profile ID on the site.
  2. You don’t already have a custom variable (custom dimension in Universal) in the first variable slot.  If you do, just modify the slot # to the first available slot.

This code needs to be implemented on your site after your Google Analytics code has had a chance to run.  This ensures that the __utma cookie that contains this unique user ID has been set on the user’s browser.  If you have your GA code running in the <head> section of your site, this code should be implemented near the closing </body> tag.

Here’s the script to set the GA user ID as a custom variable in the asynchronous version of classic GA:

Important Note: If you plan to upgrade to Universal Analytics in the near future, the functionality below will no longer work when you switch to Universal Analytics.  If you have not already implemented this User ID tracking, I would suggest implementing Universal Analytics first and then implementing the Universal Analytics logic in the 2nd script section below.

<script type="text/javascript"> 
function readCookie(name) { 
  name += '='; 
  for (var ca = document.cookie.split(/;\s*/), i = ca.length - 1; i >= 0; i--) 
  if (!ca[i].indexOf(name)) 
    return ca[i].replace(name, ''); 
} 

var gaUserCookie = readCookie("__utma"); 

if (gaUserCookie != undefined) { 
  var cookieValues = gaUserCookie.split('.'); 

    if (cookieValues.length > 1) { 
      var userId = cookieValues[1]; 
      try {
        _gaq.push(['_setCustomVar',1,'gaUserId',userId,1]);
        _gaq.push(['_trackEvent', 'Custom Variables', 'Set UserId','',0,true]);
      } catch(e) {}
  }  
}  
</script>

If you do need to change the variable slot in the code above, change the slot number (the 1 immediately after '_setCustomVar' in the line below) to a number between 2 and 5.

_gaq.push(['_setCustomVar',1,'gaUserId',userId,1]);

Universal Analytics:

In Universal Analytics, you are setting a custom dimension instead of a custom variable.  You’ll need to add the custom dimension in the Universal Analytics profile setup prior to implementing the code below.  If you already have a custom dimension in the first dimension slot (dimension1), you’ll need to change the ‘dimension1’ value in the code below to the appropriate ‘dimensionX’ value.

<script type="text/javascript"> 
function readCookie(name) { 
  name += '='; 
  for (var ca = document.cookie.split(/;\s*/), i = ca.length - 1; i >= 0; i--) 
  if (!ca[i].indexOf(name)) 
    return ca[i].replace(name, ''); 
} 

var gaUserCookie = readCookie("_ga"); 

if (gaUserCookie != undefined) { 
  var cookieValues = gaUserCookie.split('.');
  if (cookieValues.length > 2 ) 
  { 
    var userId = cookieValues[2]; 
    try {
      ga('set', 'dimension1', userId);
      ga('send', 'event', 'Custom Variables', 'Set UserId', {'nonInteraction': 1});
    } catch(e) {}
   }  
}  
</script>

In Google Tag Manager with Universal Analytics:

You can implement the script below as a ‘Custom HTML’ tag firing on all pages (or just your landing pages, if preferred).  For the script below to work, you’ll also need to create an additional tag in GTM:

  1. Create a data layer macro to hold the userId value.
  2. Add an Analytics tag to set the custom dimension  in Universal Analytics passing in the userId macro you created above.  You can use a tag type of event to do this.  You’ll need to populate the event category and event action.  Be sure to set the non-interaction to ‘True’ to avoid artificially decreasing bounce rate.
  3. Set a firing rule for the tag of:  {{event}} equals setUserId

<script type="text/javascript"> 
function readCookie(name) { 
  name += '='; 
  for (var ca = document.cookie.split(/;\s*/), i = ca.length - 1; i >= 0; i--) 
  if (!ca[i].indexOf(name)) 
  return ca[i].replace(name, ''); 
} 
var gaUserCookie = readCookie("_ga"); 
if (gaUserCookie != undefined)  { 
  var cookieValues = gaUserCookie.split('.');
  if (cookieValues.length > 2 )  { 
    var userId = cookieValues[2]; 
    dataLayer.push({'event':'setUserId', 'userId': userId}); 
  } 
} 
</script>

Have any implementation questions or comments?  Get in touch with me by leaving a comment below or on Google+.


As a developer, I find client-side errors can be a real pain.  Developers usually do a good job logging server-side application errors, but often ignore client-side errors.  As the web evolves, we’re seeing more and more single-page applications that are heavily reliant on client-side service calls using technologies like AJAX.  This makes sites heavily dependent on JavaScript.

Google Tag Manager just released a new listener tag type, the ‘JavaScript Error Listener’.  The JavaScript Error Listener will allow you to capture client-side errors that would otherwise be invisible to the server and log them to Google Analytics or another service of your own choosing.  For my example, I’ll be logging JavaScript errors to Google Analytics as an event with the error itself as the event label.  Bear in mind that the maximum size for an event label is 500 Bytes which will let us store 500 characters.

The first thing we need to do is add the JavaScript Error Listener to our tag container:

[Screenshot: adding the JavaScript Error Listener tag to the GTM container]

Now that we have the JavaScript Error Listener in place with a firing rule of ‘All Pages’, I’m going to publish a debug version of the container so I can inspect the gtm.pageError object that will be pushed to the data layer when a JavaScript error occurs.  To do this, I’ve created a simple test page where clicking a link triggers a JavaScript error because no function with that name exists (a sketch of such a page follows the steps below).  With the test page open, I:

  1. Opened the Chrome developer tools
  2. Clicked the link on the test page to cause the JavaScript error
  3. Typed ‘dataLayer’ in the console to inspect the dataLayer and expanded the last item (most recently added) to see the contents of the gtm.pageError event.
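For reference, a minimal test page along those lines might look like the following.  This is purely illustrative (the link text and the causeError name are made up); the point is simply that the function is never defined, so clicking the link throws an uncaught ReferenceError for the listener to pick up:

<a href="#" onclick="causeError(); return false;">Click me to trigger a JavaScript error</a>
<!-- Note: causeError() is intentionally never defined anywhere on this page. -->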

[Screenshot: the gtm.pageError event expanded in the dataLayer in the Chrome console]

Now I can see that the ‘gtm.errorMessage’ field is populated with some useful information about the uncaught exception.  To log this to Google Analytics as an event, I’ll need to add a new tag in GTM with a firing rule of {{event}} equals gtm.pageError and create a new macro for the gtm.errorMessage object.
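In other words, when an uncaught error occurs, the listener pushes something like the object below onto the data layer.  The values shown here are hypothetical, and while gtm.errorMessage is the field we care about, the other field names are simply what I’d expect from the listener, so confirm them against your own container in debug mode:

dataLayer.push({
  'event': 'gtm.pageError',            // the firing rule keys off this event name
  'gtm.errorMessage': 'Uncaught ReferenceError: causeError is not defined',  // becomes our event label
  'gtm.errorUrl': 'http://www.example.com/test-page.html',
  'gtm.errorLineNumber': 1
});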

Here’s the rule to fire the GA event when an error occurs:

[Screenshot: firing rule set to {{event}} equals gtm.pageError]

The macro to pull in the JavaScript error message:

[Screenshot: data layer macro for gtm.errorMessage]

and finally the GA event tag itself:

[Screenshot: the Google Analytics event tag configuration]

Now when JavaScript errors occur on your site, you’ll be able to see them in your Google Analytics event reports:

[Screenshot: JavaScript errors appearing in the Google Analytics event reports]
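One optional refinement: since event labels are capped at 500 bytes (as noted earlier), you could swap the plain data layer macro for a Custom JavaScript macro that truncates long error messages before they reach the tag.  A sketch, where {{errorMessageRaw}} is a hypothetical data layer macro reading gtm.errorMessage:

function() {
  var msg = {{errorMessageRaw}} || '';
  // Trim to 500 characters to stay within GA's event label limit (assumes single-byte characters).
  return msg.substring(0, 500);
}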

To take this a step further, I’d suggest setting up custom alerts to send an email when JavaScript errors occur on your site.

Questions or comments?  Get in touch with me on Google+ or leave a comment below.


[Photo: Justin Cutroni, Analytics Evangelist at Google]

Last night, I had the pleasure of attending our local Web Analytics Wednesday meeting (yes, I know, on a Tuesday) at ThoughtFaucet in Burlington, VT. As Vermonters working in the web analytics space, we’re fortunate to have Justin Cutroni, Analytics Evangelist for Google, right here in our backyard. The format for the meeting was open Q&A. We started off with a few general questions, but before long Justin was diving deep and sharing some nuggets that I’ll pass along to you:

  1. Universal Analytics is the future of Google Analytics.  Universal Analytics will be out of beta very, very soon.  Classic Google Analytics won’t go away, but all future enhancements will only be released to Universal Analytics.  Justin encouraged planning for the migration to Universal Analytics now – especially for larger sites with more time between releases.  Justin also recommends using Google Tag Manager to assist with this implementation.
  2. Remarketing is currently the only feature in classic Google Analytics that is not yet available in Universal Analytics.
  3. Remember those cross-device measurement reports that Google announced at I/O back in May of last year?  Justin indicated we can expect them “very, very soon.” This will allow you to measure multi-device paths to conversion.  For instance, you’ll be able to track a user as they start building their cart on their mobile phone through to checkout on their laptop.  Keep in mind that in order for these reports to work, you’ll need to use Universal Analytics and provide a ‘UID’ (User ID) to connect the dots between any given user’s desktop, phone and tablet.
  4. There will always be a free version of Google Analytics.  With the launch of Google Analytics Premium, this has been a common concern.  With so many of us relying on a free tool, it’s good to know that it’s here to stay.
  5. For the Google Analytics API, Justin said that his team’s goal is to make every report, metric and dimension that’s available in the web UI also available through the API.
  6. Industry benchmarks will be available again soon in a different, more useful format.  Justin said they decided to remove them because they led to more confusion than helpful insights.
  7. Google Tag Manager (GTM) will soon undergo a big UI refresh to help management of tag containers with a large number of tags, rules, and macros.  GTM will also soon allow you to set a firing order for tags in a particular container.  This is a big deal as tags currently fire in random order unless you use this workaround.


Justin also delved into a number of other topics, such as building a measurement plan and performing an analytics audit, as well as big data and the ‘internet of things’.  I won’t get into those here other than to say that it’s clear from hearing Justin talk that we can expect our roles as data analysts to change dramatically.  Rather than analysts dissecting the data to build recommendations, we’ll be designing the systems that do it for us.  Exciting times!

Thanks again to Justin for taking time away from his family to answer our endless questions and to @Gahlord for hosting the event at ThoughtFaucet.

Have questions about this post?  Get in touch with me by leaving a comment below or on Google+.


For those of you using the WordPress Thesis theme (as this site is), you may be wondering how to properly implement Google Tag Manager.  The built-in Thesis option for scripts places scripts such as Google Analytics at the end of the page, just before the closing </body> tag.  This is not an ideal location for Google Tag Manager.  Google Tag Manager should be located just after the opening <body> tag, and if you implement a data layer, it needs to be in the <head> section.  Fortunately, Thesis provides us with a way to do this using a feature called hooks.  A hook is a way to inject a particular piece of code or function into a specific location within WordPress.  There is a list of the many available hook locations on the Thesis support site.

To add a hook, you’ll need to edit the custom_functions.php file in the Thesis theme.  To do this, log into your WordPress admin dashboard, expand the ‘Appearance’ menu and click ‘Editor’.  We’ll add two functions: one for the Tag Manager script and one for the data layer.  Then we’ll call add_action twice to hook the functions we just created into our desired locations.

Let’s get to it.

  1. Create the function to inject Google Tag Manager
    function injectTagManager() {
    ?>
    <!-- Google Tag Manager -->
    <noscript><iframe src="//www.googletagmanager.com/ns.html?id=GTM-5JKJ9S"
    height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>
    <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
    new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
    j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
    '//www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
    })(window,document,'script','dataLayer','GTM-5JKJ9S');</script>
    <!-- End Google Tag Manager -->
    <?php
    }
  2. Call the add_action function, referencing the function you just created, to hook it into the ‘thesis_hook_before_html’ location.
    add_action('thesis_hook_before_html','injectTagManager');
  3. Create the function to inject the data layer
    function injectDataLayer() {
    ?>
    <script>
    var dataLayer = [];
    </script>
    <?php
    }
  4. Finally, call the add_action function to inject the data layer in the <head> section of every page.
    add_action('wp_head','injectDataLayer');

That’s it!  Hit save, then visit your site with your handy Google Tag Assistant Chrome extension to verify that Google Tag Manager is loading along with any contained tags, such as Google Analytics.

[Screenshot: Google Tag Assistant showing Google Tag Manager and Google Analytics loading]

Here’s what my custom_functions.php file looks like in its entirety:

<?php
/* By taking advantage of hooks, filters, and the Custom Loop API, you can make Thesis
 * do ANYTHING you want. For more information, please see the following articles from
 * the Thesis User’s Guide or visit the members-only Thesis Support Forums:
 * 
 * Hooks: http://diythemes.com/thesis/rtfm/customizing-with-hooks/
 * Filters: http://diythemes.com/thesis/rtfm/customizing-with-filters/
 * Custom Loop API: http://diythemes.com/thesis/rtfm/custom-loop-api/
---:[ place your custom code below this line ]:---*/
function injectTagManager() {
?>
<!-- Google Tag Manager -->
<noscript><iframe src="//www.googletagmanager.com/ns.html?id=GTM-5JKJ9S"
height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>
<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
'//www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
})(window,document,'script','dataLayer','GTM-5JKJ9S');</script>
<!-- End Google Tag Manager -->
<?php
}
add_action('thesis_hook_before_html','injectTagManager');
function injectDataLayer() {
?>
<script>
var dataLayer = [];
</script>
<?php
}

add_action('wp_head','injectDataLayer');
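As an aside, the data layer doesn’t have to start out empty.  If you want to make page-level values available to your GTM tags, you could have injectDataLayer seed it instead; the pageCategory variable below is just a hypothetical example:

<script>
var dataLayer = [{
  'pageCategory': 'blog'  // hypothetical value; push whatever page-level data your tags need
}];
</script>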

Happy Measuring!

Have questions about this post?  Get in touch with me by leaving a comment below or on Google+.
