This is a continuation of my GA4 real-time analytics project using BigQuery, Google Analytics and Looker Studio.
Code to remove query params from URL
There is an article by Matt Mazur published in 2018 that includes the code needed to remove the query params.
His article includes the full BigQuery syntax to strip the params – so take a read. However, as I’m plugging the code into an existing query I only needed to take the Case statement, which looks something like this:
current_url,
CASE
WHEN STRPOS(current_url, "?") > 0 THEN SUBSTR(current_url, 0, STRPOS(current_url, "?") – 1)
WHEN STRPOS(current_url, "#") > 0 THEN SUBSTR(current_url, 0, STRPOS(current_url, "#") – 1)
ELSE current_url
END AS url
Here’s an explanation of why you might need this code, and specifically how it’s going to help my dashboard project.
BigQuery tables contain URLs with parameters and fragments
One thing Google Analytics likes to do is to track URLs with their parameters and fragments included. Which can be useful for example, WordPress uses ?s= as it’s search parameter. And because GA will track the parameter it means you can see what words people are searching on your website and how many times.
However, this means GA will also track URLs with UTM data as a separate URL from the non-parameter. This isn’t good because it basically classes them as a separate page, when in reality we want the stats for both URLs to be classed as one page.
This is one reason to find a way to strip the Parameters from the URL in BigQuery and make them ‘clean’.
However, I also have a very specific reason why I want clean URLs
Using ‘clean’ URLs as a join key to merge two data sets
As part of the big analytics and dashboard project I’m working on, we’re using BigQuery to merge data from Google Analytics with “meta” data we’ve manually added to pages on our site and then outputted to a feed.
The meta data I’m talking about here is information such as content owners, page purpose, department etc. Which will allow the dashboards to segment URLs for the marketing teams to analyse.
To merge the data I’m using the URL as the join key, because it makes the most sense. However, the join key looks for a match, which means the URLs won’t merge if they have parameters or fragments added.
Leave a Reply
You must be logged in to post a comment.