The Power of Regular Expressions and Google Analytics

Author:

2011-05-27 13:16
Uncategorized

There is an awful lot you can do with Google Analytics, even though it is free (in most cases, anyway). But to get the most out of it, you should learn regular expressions. They help when you want to create advanced filters, set up goal tracking, or segment your traffic. To build any of these, you need to understand the basics of regular expressions. The subject is a science in itself, but with some trial and error you can create remarkably useful things.

Content

Backslash – \
Pipe – |
Question Mark – ?
Parentheses – ()
Point – .
Brackets, Hyphen and Braces – [-] {x,y}
Star – *

Backslash – \

We begin with the backslash, since it is probably the character you will use and need the most. A backslash is used when we want a character that has a special meaning in regular expressions to be treated as ordinary text. For example, if we have a URL containing “?”, we have a problem, because “?” has a function of its own in regular expressions. By putting a backslash in front of it, we remove that function. This is called escaping the character.

 

Example:
[Image: Regular Expression - backslash]

Let's say we have an online shop that uses different ID numbers in its URLs for the different categories on the website. We can then use the “Search and Replace” filter to create clear reports for people who are not familiar with the website's URL structure. A problem you will run into when creating this filter is that the URL contains a “?”. But by putting a \ before the “?”, we remove the special function of the “?”.
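The effect of escaping can be tried outside Google Analytics. The following is a minimal Python sketch; the URL is made up for illustration, and Python's re module is not the engine Google Analytics uses, although the backslash escape behaves the same way for this case.

```python
import re

# Hypothetical shop URL; the path and the "id" parameter are invented for this example.
url = "/category.php?id=12"

# Unescaped: "?" makes the preceding "p" optional, so the literal "?" in the URL
# is never matched and the search fails.
unescaped = re.search(r"/category.php?id=12", url)

# Escaped: "\?" matches a literal question mark and "\." a literal point.
escaped = re.search(r"/category\.php\?id=12", url)

print("unescaped pattern matched:", unescaped is not None)
print("escaped pattern matched:", escaped is not None)
```

Only the escaped pattern finds the URL, which is why the “?” needs a backslash in the Search and Replace filter.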

Pipe – |

The pipe, or vertical bar, call it what you want, is very simple, and most of you probably already know what it does and how it works. You use | whenever you want to express “or”. It is often useful when we want to check the keywords people used to find a website.

 

Example:
Say that we would like to compare two different types of visitors to see which ones convert best. Maybe we want to compare those who find your site with keywords like “free” and “cheap” with all other search traffic. We could then create an advanced segment that matches keywords against the regular expression “free|cheap”.
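As a rough illustration of what such a segment does, here is a small Python sketch. The keyword list is invented, and the split into two lists only imitates the segment; it is not how Google Analytics is implemented.

```python
import re

# Invented search keywords, standing in for a keyword report.
keywords = [
    "free dance lessons",
    "cheap dance shoes",
    "dance group stockholm",
    "freestyle classes",
]

# "free|cheap" matches any keyword that contains "free" OR "cheap".
pattern = re.compile(r"free|cheap")

segment = [k for k in keywords if pattern.search(k)]
others = [k for k in keywords if not pattern.search(k)]

print("segment:", segment)
print("others:", others)
```

Note that “freestyle classes” also lands in the segment, because the pattern matches anywhere inside the keyword. That is worth keeping in mind when you choose the words for a segment.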

Question Mark – ?

As I mentioned before, the question mark also has a function in regular expressions: it marks the preceding character or expression as optional. Let's say that in the keyword report we want to filter out the company name. Your company is called The Dance Group, but a common misspelling is The Dannce Group, with two n's. We could then write “The Dann?ce Group”, which filters out both The Dance Group and The Dannce Group: since the second “n” is marked as optional, it does not need to be present.
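A quick way to convince yourself is to test the pattern against a few spellings. This is a minimal Python sketch; the extra misspellings are made up, and re.fullmatch is used so that the whole keyword has to match.

```python
import re

# The second "n" is optional: "n?" matches zero or one "n".
pattern = re.compile(r"The Dann?ce Group")

spellings = [
    "The Dance Group",    # correct spelling
    "The Dannce Group",   # common misspelling with two n's
    "The Dannnce Group",  # three n's: one too many for "n?"
    "The Dace Group",     # the first "n" is not optional
]

matches = {s: pattern.fullmatch(s) is not None for s in spellings}
for spelling, matched in matches.items():
    print(spelling, "->", matched)
```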

Parentheses – ()

Parentheses work in regular expressions just as they do in math. If this falls under “things I should have learned but never did,” don't worry, we are going to go over a few examples. If we look up our old notes from math class, we hopefully find something like this:

  • 7+3*3=16 – i.e. three times three is nine, seven plus nine equals sixteen
  • (7+3)*3=30 – i.e. seven plus three is ten, ten times three equals thirty

 

You may remember how your math teacher always said, “multiply first.” But by using parentheses, we can change the order in which a calculation is carried out. Parentheses can be used in a similar way in regular expressions. Here is the same idea with a regular expression instead of math:

  • /map-one|two/contact matches /map-one or two/contact
  • /map-(one|two)/contact matches /map-one/contact or /map-two/contact

 

With the help of parentheses, we can therefore group part of an expression without affecting the rest of the string.
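The difference between the two patterns shows up clearly when both are run against the same paths. The paths in this Python sketch are made up for illustration, and re.search is used so that a pattern may match anywhere in the path.

```python
import re

# Without parentheses the pipe splits the WHOLE expression:
# "/map-one" OR "two/contact".
ungrouped = re.compile(r"/map-one|two/contact")

# With parentheses the pipe only applies inside the group:
# "/map-one/contact" OR "/map-two/contact".
grouped = re.compile(r"/map-(one|two)/contact")

paths = ["/map-one/contact", "/map-two/contact", "/map-one/about", "/two/contact"]

for path in paths:
    print(path, "ungrouped:", bool(ungrouped.search(path)),
          "grouped:", bool(grouped.search(path)))
```

The ungrouped pattern also accepts /map-one/about and /two/contact, which is rarely what you want.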

 

Example:
Imagine a website with various forms scattered across it, where every completed form has the same value to us, so we want to collect all of them under the same goal. The problem is that the forms use different end destinations: some land on /sent, some on /sent.html and others on /sent.php. We could then specify the goal URL as /sent(\.html|\.php)?

 

This way we cover all three final destinations, since the question mark makes everything inside our parentheses optional. If we removed the question mark, /sent would not be included, only /sent.html and /sent.php.
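The goal pattern can be checked with a short Python sketch. The /sent.asp destination is invented as a counter-example, and re.fullmatch is used here as an assumption so that the whole path has to match; how strictly Google Analytics matches depends on the match type chosen for the goal.

```python
import re

# The group "(\.html|\.php)" is optional because of the trailing "?".
with_optional = re.compile(r"/sent(\.html|\.php)?")

# Same pattern without the "?": an extension is now required.
without_optional = re.compile(r"/sent(\.html|\.php)")

destinations = ["/sent", "/sent.html", "/sent.php", "/sent.asp"]

covered = [d for d in destinations if with_optional.fullmatch(d)]
covered_strict = [d for d in destinations if without_optional.fullmatch(d)]

print("with ?:   ", covered)
print("without ?:", covered_strict)
```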

Point – .

As you saw in the previous example, we used a backslash before the points in the pattern. That is because the sign “.” also has a function in regular expressions: the point matches any single character whatsoever. So strictly speaking the pattern would still match without the backslashes, simply because we probably won't encounter anything other than a point in that position. On its own the point might not be used often, but it can be very useful together with other expressions.

  • Da.is – will match Dagis, Da.is, Da_is, etc., but not Dais
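The bullet above can be verified with a few lines of Python. This is only a sketch: re.fullmatch requires the whole word to match, which is an assumption made here to keep the result easy to read.

```python
import re

# "." stands for exactly one character, whatever it is.
pattern = re.compile(r"Da.is")

words = ["Dagis", "Da.is", "Da_is", "Dais"]
results = {w: pattern.fullmatch(w) is not None for w in words}

for word, matched in results.items():
    print(word, "->", matched)
```

“Dais” fails because the point has to consume one character, and there is none between “Da” and “is”.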

 

Example:
Say we want to filter out traffic from a specific IP range. If your company holds all IP addresses from 83.75.140.250 to 83.75.140.255, we could write 83\.75\.140\.25. as the regular expression. The backslashes make the first three points literal, and the final point matches the last digit, so all six addresses are covered. The pattern also covers 83.75.140.256 through 83.75.140.259, but that will never happen, since those are not valid IP addresses.
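Here is a small Python sketch of the IP example, written with escaped points. The stricter pattern with anchors and [0-5] is not from the example above; it is an assumption added here to show one way of tightening the match, and the test addresses are made up.

```python
import re

# The pattern from the example: the final "." matches any single character.
loose = re.compile(r"83\.75\.140\.25.")

# A stricter variant (an addition for illustration): "^" and "$" anchor the
# pattern to the whole address, and "[0-5]" limits the last digit to 0-5.
strict = re.compile(r"^83\.75\.140\.25[0-5]$")

addresses = ["83.75.140.250", "83.75.140.255", "83.75.140.249", "183.75.140.251"]

loose_hits = [ip for ip in addresses if loose.search(ip)]
strict_hits = [ip for ip in addresses if strict.search(ip)]

print("loose: ", loose_hits)
print("strict:", strict_hits)
```

The loose pattern also catches 183.75.140.251, because without anchors it may match in the middle of an address.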

Brackets, Hyphen and Braces – [-] {x,y}

Brackets are generally used together with the hyphen “-” and with braces ({ }), but they can also be used on their own. In a list of keywords we could, for example, use a filter such as [abc]; we would then see all keywords containing a, b, or c. If we instead write 1[abc]2, we would only see keywords containing 1a2, 1b2 or 1c2. But as I said, brackets are most useful together with “-”. We can use the following techniques:

 

  • [a-z] – All lowercase letters from a to z
  • [A-Z] – All uppercase letters from A to Z
  • [0-9] – All digits from zero to nine
  • [a-zA-Z0-9] – All of the above (note that you do not use a pipe or a comma)

 

And get ready for what you can do when you begin to use braces. If we take the example above and add braces, we get 1[abc]{1,2}2. We now go beyond the first example and also match 1ab2, 1ac2, 1ba2, 1bc2 and so on, in addition to 1a2, 1b2 and 1c2.

 

Braces work by repeating the preceding character or group between x and y times.
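Brackets and braces are easy to experiment with. In this minimal Python sketch the test strings are invented, and re.fullmatch is used as an assumption so that each string has to match from start to end.

```python
import re

single = re.compile(r"1[abc]2")          # exactly one of a, b or c between 1 and 2
repeated = re.compile(r"1[abc]{1,2}2")   # one OR two of a, b or c between 1 and 2
alnum = re.compile(r"[a-zA-Z0-9]{3}")    # exactly three letters or digits

tests = ["1a2", "1d2", "1ab2", "1abc2"]

single_hits = [t for t in tests if single.fullmatch(t)]
repeated_hits = [t for t in tests if repeated.fullmatch(t)]

print("1[abc]2      ->", single_hits)
print("1[abc]{1,2}2 ->", repeated_hits)
print("[a-zA-Z0-9]{3} on 'aZ9':", alnum.fullmatch("aZ9") is not None)
print("[a-zA-Z0-9]{3} on 'a-9':", alnum.fullmatch("a-9") is not None)
```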

Star – *

The star is perhaps the character in regular expressions that is most often used incorrectly, since it does not have the wildcard function it usually has elsewhere. The star means that the preceding character is repeated anywhere from zero to countless times. So Be*st will match Bst, Best and Beeeeest. The star by itself is maybe not so useful, but together with the point we can create pretty cool things. A filter that I use frequently makes it possible to see the whole URL that a visitor came from.
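The behaviour of the star, alone and combined with the point, can be sketched in Python as follows. The words and the URL are made up, and re.fullmatch is an assumption used to require that the whole string matches.

```python
import re

# "e*" means zero or more "e" characters.
star = re.compile(r"Be*st")

words = ["Bst", "Best", "Beeeeest", "Bast"]
star_hits = [w for w in words if star.fullmatch(w)]
print("Be*st ->", star_hits)

# ".*" means "any character, zero or more times", so it matches any
# single-line string, including an empty one.
anything = re.compile(r".*")
url = "http://www.example.com/some/page?id=3"  # hypothetical referring URL
print(".* matches the URL:", anything.fullmatch(url) is not None)
print(".* matches an empty string:", anything.fullmatch("") is not None)
```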

 

Example:
If we create a filter using the following settings,

  • Filter Type: Custom filter > Advanced
  • Field A -> Extract A: Referral = (.*)
  • Field B -> Extract B: leave blank
  • Output To -> Constructor: User Defined = $A1
  • Field A Required: Yes
  • Field B Required: No
  • Override Output Field: Yes
  • Case Sensitive: No

 

With this filter, we will be able to see the full URL from which each visitor reached the site. The report can be found under Visitors -> User Defined. Remember that the filter only starts working from the moment you create it; earlier data is not affected.
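To make the idea behind the filter concrete, here is a rough Python imitation of it. The function apply_advanced_filter and the referring URL are hypothetical; this only mimics the Extract A and Constructor steps and says nothing about how Google Analytics implements them.

```python
import re
from typing import Optional


def apply_advanced_filter(referral: str,
                          extract_a: str = r"(.*)",
                          constructor: str = "$A1") -> Optional[str]:
    """Hypothetical stand-in for the filter: extract from Field A, then build the output."""
    # "Case Sensitive: No" is imitated with the IGNORECASE flag.
    match = re.search(extract_a, referral, flags=re.IGNORECASE)
    if match is None:
        return None  # Field A is required, so nothing is written without a match
    # "$A1" stands for the first parenthesised group extracted from Field A.
    return constructor.replace("$A1", match.group(1))


referral = "http://www.example.com/blog/post?id=3"  # invented referring URL

full_url = apply_advanced_filter(referral)
host_only = apply_advanced_filter(referral, extract_a=r"^https?://([^/]+)")

print(full_url)
print(host_only)
```

With (.*) the whole referring URL ends up in the output field. Swapping in a narrower group, as in the second call, would keep only the host name.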

 

Hopefully you have now learned a little more about regular expressions and how you can use them with Google Analytics. Later on, we will dive deeper into regular expressions, but until then I will let you test and explore the material you find here. Please leave a comment about how you would like to use regular expressions, or how you use them with Google Analytics today.

 
  • Niranjan Pande

    thank you so much for publishing this article

  • https://se.linkedin.com/in/svenssonkristian Kristian Svensson

    Once you get to know regex and combine it with GA you can use it a lot better. A note on the Parenthesis-() would be that you can also use them as a capturing group which you can make reference to e.g. “^start your (engines) now” you can extract “engines” by using $1 – which would be the 1st capturing group. Very powerful when you need to work with GA filters and segments. Thanks for a nice article!