

Automatically inspecting and testing with Google Analytics

Abstract

Google Analytics testing for LINE TODAY

Using Google Analytics (GA) is a common way to track traffic on websites. With GA, you can learn important information such as the number of visitors on a specific page, and even break them down by device model or country. The operations team can use this data to analyze visitor behavior. Based on that analysis, the team can strategically optimize the website's content and design to provide a better user experience.

For the LINE TODAY app, we wanted the operations team to have confidence in the reports and content generated by GA. To achieve this, we had to make sure that the GA tracking code worked properly and that custom dimensions were set correctly. After every release, LINE TODAY's development and QA teams have been using a proxy tool and an in-house Python parser to verify that GA events are sent correctly.

What is mitmproxy

mitmproxy is a handy network analysis tool, written in Python, that sits between a client and a server as a man-in-the-middle proxy so that it can track traffic. It is handy because it supports addons (Python scripts), which are useful for later analysis. You can inspect and capture the traffic you want from a console on your computer. The addon mechanism also comes with a set of APIs that let developers extract the traffic information they need.
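The following minimal sketch makes the addon mechanism concrete. It is not taken from our tool. mitmproxy calls an addon's `request` method for every client request, and a script exposes its addons through a module-level list named `addons`. The `HostCounter` class is hypothetical. It logs with Python's standard `logging` module, which mitmproxy 9 and later route into their event log. Older releases used `ctx.log` instead.

```python
import logging
from collections import Counter


class HostCounter:
    """Hypothetical example addon: counts requests per destination host."""

    def __init__(self):
        self.hosts = Counter()

    def request(self, flow):
        # mitmproxy calls this hook once for every client request it sees.
        self.hosts[flow.request.host] += 1

    def done(self):
        # Called when the addon is unloaded or mitmproxy shuts down.
        for host, count in self.hosts.most_common():
            logging.info("%s: %d requests", host, count)


# mitmproxy loads every addon object listed in this module-level variable.
addons = [HostCounter()]
```

Saved as a file, this can be loaded with `mitmdump -s <file>.py` in the same way as the tracker script shown later.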

The tool

The following diagram illustrates how mitmproxy, with its addon mechanism, intercepts GA traffic from the LINE TODAY app.

Step    Description
1–2     A user navigates to a LINE TODAY page, such as a news article page, on a client.
3       The client sends a LINE TODAY GA pageview or event to the GA server.
4       The addon Python script hooks into mitmproxy and changes its behavior so that all GA pageviews and events are intercepted; each request is tracked by the addon for later analysis.
5       mitmproxy passes the request on to the GA server.
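Step 4 depends on picking GA hits out of all the other traffic that passes through the proxy. The following sketch shows one way to do this with only the standard library. It assumes Universal Analytics hits sent to a `/collect` endpoint on google-analytics.com, where the Measurement Protocol carries the hit type in the `t` parameter. `parse_ga_hit` is a hypothetical helper. It reads only the URL query string, so it does not cover hits sent in a POST body.

```python
from urllib.parse import parse_qsl, urlsplit

# Hit types to track; the Measurement Protocol sends the hit type as "t".
TRACKED_HIT_TYPES = {"pageview", "event"}


def parse_ga_hit(url):
    """Return the query parameters of a GA pageview/event hit, or None."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Accept google-analytics.com and its subdomains, nothing else.
    if host != "google-analytics.com" and not host.endswith(".google-analytics.com"):
        return None
    if not parts.path.endswith("/collect"):
        return None
    # parse_qsl percent-decodes values; keep_blank_values keeps e.g. "_gid=".
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    if params.get("t") not in TRACKED_HIT_TYPES:
        return None
    return params
```

Because `parse_qsl` decodes percent-encoding, a path such as `dp=TW_portal_%E9%9B%BB%E5%BD%B1` comes back as readable text.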

Test Automation of GA

Next, we introduce two mitmproxy-based solutions for verifying that GA events are sent correctly. The semi-automated solution provides a real-time output log for manual verification. The fully automated solution runs the tests and reports an overall result for the GA verification.

Semi-automated solution

Using the semi-automated monitoring tool, we can verify in real time whether GA events are sent out properly.

Set up

Before setting up, install mitmproxy and its CA certificate on the device. The certificate lets mitmproxy decrypt the HTTPS traffic the device sends. After installing, set the proxy port in the device's proxy settings. In this post we use port 8081. We avoid 8080, mitmproxy's default port, in case another service is already using it.
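A port that is already in use prevents the proxy from starting, so it can help to check the port before launching mitmproxy. The following minimal sketch uses the standard `socket` module. `port_is_free` is a hypothetical helper. The default `host=""` binds to all interfaces, which we assume matches where mitmproxy listens by default.

```python
import socket


def port_is_free(port, host=""):
    """Return True if a TCP listener could bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another process already holds the port.
            return False
    return True


if __name__ == "__main__":
    for candidate in (8080, 8081):
        state = "free" if port_is_free(candidate) else "in use"
        print(f"port {candidate}: {state}")
```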

The overall flow of setting up the semi-automated testing is as follows:

  1. Implement your addon Python script to intercept the desired traffic.
  2. Run mitmproxy with your addon.
  3. Visit webpages on your device and verify the GA traffic on your console.
  4. Check the real-time output log for verification.

The following is an example of an addon Python script that tracks GA pageview and event traffic. When a GA request passes through the proxy, the addon writes the relevant parameters to mitmproxy's event log with ctx.log. mitmdump then displays those log lines on the console.

from mitmproxy import ctx


class Tracker:
    def __init__(self):
        # 1. Which hit types do you want to monitor, e.g. pageview and event
        self.event_type = ['pageview', 'event']

        # 2. Add the custom dimensions and other parameters to print here,
        #    based on your requirements.
        self.check_keys = ['dp', 'ea', 'el', 'cd17', 'cd4']
        # (further initialization elided)

    def request(self, flow):
        # 3. Parse only the requests sent to the Google Analytics server
        if 'google-analytics' not in flow.request.url:
            return

        query = flow.request.query
        # GA sends the hit type in the "t" query parameter
        hit_type = query.get('t')
        if hit_type not in self.event_type:
            return

        # Header, e.g. "Pageview sends to GA", followed by the parameters
        ctx.log.info('-' * 50)
        ctx.log.info('%s sends to GA' % hit_type.capitalize())
        ctx.log.info('-' * 50)
        ctx.log.info('Host: ' + flow.request.url)
        ctx.log.info('type = %s' % hit_type)
        for a_key in self.check_keys:
            if a_key in query:
                ctx.log.info('%s = %s' % (a_key, query[a_key]))

    def done(self):
        """
            Called when the addon shuts down, either by being removed from
            the mitmproxy instance, or when mitmproxy itself shuts down.
        """


addons = [
    Tracker()
]

mitmdump is the command-line companion to mitmproxy. It provides tcpdump-like functionality, so together with the Python addon we can both watch the traffic and record logs.

$ mitmdump -s [Your_Parser_File].py -p 8081 --set flow_detail=0

The option -p 8081 makes the proxy listen on port 8081. The option --set flow_detail=0 sets how much detail mitmdump displays for each flow. Level 0 is almost quiet, so the console mainly shows what the addon logs.

Real-time logs

Suppose a user navigates from another category to the movie main page. You will then see log output like the following.

# When a user clicks the movie main page tab, a click event is triggered.
 
--------------------------------------------------
Event sends to GA
--------------------------------------------------
Host: https://www.google-analytics.com/collect?v=xxxxx&t=event&_s=14&dl=https%3A%2F%2Ftoday.line.me%2FTW&ul=en-us&de=UTF-8&dt=LINE%xxxxx&tid=UA-xxxxxxxxx-1&_gid=&z=xxxxxxxxx
type = event
ea = portal_bar_click
el = bar_movie_click
cd17 = ios
cd4 = TW
..........
....
......
 
 
# A pageview is sent after the event has been sent.
--------------------------------------------------
Pageview sends to GA
--------------------------------------------------
Host: https://www.google-analytics.com/collect?v=1&_v=j68&a=xxxxxxxx&t=pageview&_s=15&dl=https%3A%2F%2Ftoday.line.me%2FTW&dp=TW_portal_%E9%9B%BB%E5%BD%B1&ul=en-us&de=UTF-8&dt=LINE%20TODAY&sd=24-bit&sr=2560x1440&vp=2560x791&je=0&_u=SCCAAEABE~&jid=&gjid=&tid=UAxxxxxxxxx-1&z=xxxxxxxxxxx
type = pageview
dp = TW_portal_movie
cd17 = ios
cd4 = TW
..........
....
......

Because the logs appear in real time, we can check the console output while browsing to make sure that every key in each GA request is correct.
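Reading the console by eye works, but the same check can also be scripted. The following sketch assumes the log format shown above, in which each hit starts with a `Host:` line followed by `key = value` lines. It also assumes a hypothetical `REQUIRED_KEYS` table that lists the keys each hit type must carry. Adjust both to your own tracking plan.

```python
# Hypothetical tracking plan: keys each hit type must carry.
REQUIRED_KEYS = {
    "pageview": ["dp", "cd17", "cd4"],
    "event": ["ea", "el", "cd17", "cd4"],
}


def parse_console_log(text):
    """Split mitmdump console output into one dict per logged GA hit.

    Each hit starts at a "Host: <url>" line; the "key = value" lines that
    follow are added to that hit. Separator and title lines are ignored.
    """
    hits = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Host: "):
            hits.append({"Host": line[len("Host: "):]})
        elif hits and " = " in line:
            key, value = line.split(" = ", 1)
            hits[-1][key.strip()] = value.strip()
    return hits


def missing_keys(hits, required_by_type):
    """Return {hit index: [missing keys]} for hits lacking required keys."""
    problems = {}
    for index, hit in enumerate(hits):
        required = required_by_type.get(hit.get("type"), [])
        absent = [key for key in required if key not in hit]
        if absent:
            problems[index] = absent
    return problems
```

An empty result from `missing_keys` means every logged hit carried the keys listed for its type.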

Fully-automated solution

To save time on verifying GA requests manually, we integrate mitmdump into our automation framework so that the verification runs as part of the automated tests. But how can such tests confirm that pageviews were sent correctly? We record a known-good PV (pageview) log as a file. When a test run finishes, the parser tool generates a new log file, which is compared with the known-good one. If the two files are identical, we conclude that the pageviews in this test were sent correctly.
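A strict file comparison works only if nothing in the log changes from run to run. If the log includes the full hit URL, as the `Host:` lines above do, GA's random cache-buster parameter `z` will differ every time. The following minimal sketch drops such lines before comparing. `VOLATILE_PREFIXES` and `compare_pv_logs` are hypothetical names, and which lines count as volatile depends on your log format.

```python
import difflib
from pathlib import Path

# Lines starting with these prefixes are expected to vary between runs;
# "Host: " lines carry the full hit URL, including the z= cache buster.
VOLATILE_PREFIXES = ("Host: ",)


def normalized_lines(path):
    """Read a PV log, dropping blank lines and lines expected to vary."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [
        line.rstrip()
        for line in lines
        if line.strip() and not line.startswith(VOLATILE_PREFIXES)
    ]


def compare_pv_logs(actual_path, expected_path):
    """Return a unified diff as a list of lines; empty means the logs match."""
    return list(
        difflib.unified_diff(
            normalized_lines(expected_path),
            normalized_lines(actual_path),
            fromfile=str(expected_path),
            tofile=str(actual_path),
            lineterm="",
        )
    )
```

Returning the diff instead of a plain boolean means a failing run also shows which lines differ.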

Set up and Implementation

Because the whole flow is automated, no manual setup or implementation is needed for each run beyond the mitmproxy installation described earlier. You just get the code from Git and run it. The following diagram and table show the overall process.

Step    Description
1       Start a process to run mitmdump.
2       Start another process to launch Selenium WebDriver.
3       Selenium WebDriver automates the web browser.
4–5     The browser navigates to LINE TODAY pages and sends LINE TODAY GA pageviews or events.
6       The addon tracks the desired GA requests for later analysis. After all the tests have finished, the GA logs are collected in an output file.
7       mitmproxy passes the request on to the Google Analytics server.
8       The generated GA log file is compared with the "correct" file, and the assertion result is shown.
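The same start, browse, stop, and compare sequence can also be driven from plain Python with `subprocess`. The following minimal sketch assumes that a `browse` callable stands in for the Selenium steps (2 to 5). Here it runs in the same process rather than a separate one. It also assumes the addon is saved as `tracker.py`. Both names are hypothetical. The returned log file can then be compared with the known-good file.

```python
import subprocess
from pathlib import Path

# Hypothetical command line, matching the mitmdump invocation used earlier.
MITMDUMP_CMD = ["mitmdump", "-s", "tracker.py", "-p", "8081", "--set", "flow_detail=0"]


def run_with_proxy(proxy_cmd, browse, log_path, grace_seconds=10):
    """Run browse() while proxy_cmd writes its console output to log_path."""
    with open(log_path, "w", encoding="utf-8") as log_file:
        # Step 1: start the proxy; stdout and stderr both go to the log file.
        proxy = subprocess.Popen(proxy_cmd, stdout=log_file, stderr=subprocess.STDOUT)
        try:
            browse()  # Steps 2-5, e.g. Selenium visiting LINE TODAY pages
        finally:
            # Stop the proxy (SIGTERM on POSIX); kill it if it hangs.
            if proxy.poll() is None:
                proxy.terminate()
            try:
                proxy.wait(timeout=grace_seconds)
            except subprocess.TimeoutExpired:
                proxy.kill()
                proxy.wait()
    # Step 8 compares this file with the known-good one.
    return Path(log_path)
```

Using `try`/`finally` ensures the proxy process is stopped even if the browsing step raises an exception, much like a Robot Framework teardown.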

Example

The following example integrates mitmdump with Robot Framework to capture traffic while GUI tests run in the framework. We defined a keyword, Start Mitmdump Proxy Process, that launches a process running mitmdump.

Verify GA Pageview Event in PC Web
    Start Mitmdump Proxy Process
    # Here you can put the keywords to navigate all the pages on your website
    # ..
    # ..
    # ..
    [Teardown]    Run Keywords    Terminate Mitmdump Proxy Process   Compare Two PC PV Files

Another process then navigates to the pages on your website whose GA tracking you want to verify. When testing is done, terminate the mitmdump proxy process and compare the generated PV log file with the correct one to judge whether the GA tracking code works correctly.

The following example defines the keyword that starts the mitmdump process. mitmdump's console output is redirected to in.txt. After testing is completed, this file is compared against the file containing the correct results.

# Keyword
Start Mitmdump Proxy Process
    ${result} =  Start Process    mitmdump -s YOUR/FILE/PATH/xxx.py -p 8081 --set flow_detail=0 > in.txt    shell=True

If the in.txt file and the file containing the correct result are identical, you will see something like the following on your console.

Conclusion

If your team relies on Google Analytics to optimize your website, you may want a tool that ensures GA requests are sent correctly. If you also verify GA with automated tests and have Jenkins run them before each release, your confidence in GA reports will grow. Give mitmproxy a try for intercepting traffic from your app or website.
