SQL in BigQuery – First steps in Raw Data Google Analytics 4
How to make an SQL report from Raw Data of Google Analytics 4? First, you need to have data. Have already? Log in to Google BigQuery
On the left panel choose your project (sugestowo in my screen) -> analytics_XXXXX -> event_s and click Query table. Now the magic starts 😉
I heard the opinion, that if you know English, you know SQL. I partially agree 🙂 It’s not that hard. Let’s start with the basics:
SELECT * FROM `sugestowo.analytics_263648625.events_20210303`
Select – it means that you are choosing something
“*” – means ALL (all columns)
FROM – means which table would you like to get data from
When you click run, you will get all the data for the 3rd of March 2021:
As you can see in the green in my example query processed 363.3 KB it’s very small amount. Queries easily take 10 GB. And Google provides you with 1TB free per month (each next TB cost ~5USD).
You can check a number of rows in your data looking at the bottom of the page (580), or by running:
SELECT count(*) FROM `sugestowo.analytics_263648625.events_20210303`
count(*) means to count all rows, so it provides you with 580. You can change your column name easily with “as”:
SELECT count(*) as NumberOfRows FROM `sugestowo.analytics_263648625.events_20210303`
As it is data from Google Analytics 4 let’s check which events are measured, event_name shows that:
SELECT event_name as NumberOfRows FROM `sugestowo.analytics_263648625.events_20210303`
But it gives … 580 rows of different types. So let’s make them unique, meaning … distinct. Yes, that’s the function 😉
SELECT distinct(event_name) as NumberOfRows FROM `sugestowo.analytics_263648625.events_20210303`
BigQuery / SQL Counting and grouping by
How to calculate how many times each event was fired? You need both “event_name” and “count(*)” so the first approach would be:
SELECT event_name, count(*) as NumberOfRows FROM `sugestowo.analytics_263648625.events_20210303`
But it will not work. Google will show you what’s wrong
You need to “group by” as google doesn’t know how to count(*) if there is an additional column. You need to make in the query group by event_name. This way it will now that you want to calculate a number of raws for each specific event_name:
SELECT event_name, count(*) as NumberOfRows FROM `sugestowo.analytics_263648625.events_20210303` group by event_name
To make it easier you do not need to provide event_name after group by. “1” will be enough as event_name is the first column
Date(s) handling in BigQuery SQL – Google Analytics 4
Let’s get to something more complicated. As on the end of a table name (in from), there is a date (20210303), how to create a table, which will take into account range of dates, for example between 1st and 3rd of march? Fairly simple. At the begining of your statement declare data variables:
DECLARE a DATE DEFAULT "2021-03-01"; DECLARE b DATE DEFAULT "2021-03-03"; #And in the next line: SELECT event_name, count(*) FROM `sugestowo.analytics_263648625.events_*` WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',a) AND FORMAT_DATE('%Y%m%d',b) group by 1
Why I added “#”? Because with the hash you are commenting:
How to make queries (codes) more user-friendly?
Let’s dive into “with as”. This way you can create a “helping” query for example:
with AllData as ( SELECT * FROM `sugestowo.analytics_263648625.events_*` WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',a) AND FORMAT_DATE('%Y%m%d',b) )
Then in your statement, you can use it in an easier way – by making it from “AllData”. Let’s look at it:
If you want to use many such sub-queries you need to do something like:
with FirstSubQuery as (XXXXXX), SecondSubQuery as (YYYYY), ThirdSubQuery as (ZZZZZ)
As you can see you provide “with” only one time. And you need to add a comma after “)” to let BigQuery SQL know, that you are starting the next subquery. If you have only one, you don’t use “,” after “)”.
Filtering results in BigQuery
You won’t guess it – to filter results you need to add “where” to a statement. So for example I would like to take only those raws of data in which event_name is outband_link
select * from AllData where event_name="outband_link"
Something more advanced? I want to check which users (cookies) made such activity on the 3rd of March:
#Don't forget to change dates to 3rd of march (beggining and end) select user_pseudo_id from AllData where event_name="outband_link"
How about making those users distinct? (I don’t want duplicates)
select distinct(user_pseudo_id) from AllData where event_name="outband_link"
And now the funny things start. Lets’ check which of those users (who made an outband_link on the 3rd of March) were on a site between 28th of Feb till 2nd of March.
We need to start with getting those users. It is fairly simple just declare two additional dates, and get all user_pseudo_id
select distinct(user_pseudo_id) from AllData2 #where AllData2 is also declared, but based on date range between 28th of Feb and 2nd of March
SQL Left join in BigQuery based on user cookies
Now we can select both, export them to Google Sheets and check it. However, there is of course better way 🙂 The query is:
#Start Date DECLARE a DATE DEFAULT "2021-03-03"; #End date DECLARE b DATE DEFAULT "2021-03-03"; #Start Date DECLARE c DATE DEFAULT "2021-02-28"; #End date DECLARE d DATE DEFAULT "2021-03-02"; #All data betwenn a and b date with AllData as (SELECT * FROM `sugestowo.analytics_263648625.events_*` WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',a) AND FORMAT_DATE('%Y%m%d',b) ), AllData2 as ( SELECT * FROM `sugestowo.analytics_263648625.events_*` WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',c) AND FORMAT_DATE('%Y%m%d',d) ), DistinctUserIDMarchOutbound as (select distinct(user_pseudo_id) from AllData), DistinctUserIDFebMarchVisit as (select distinct(user_pseudo_id) as user_pseudo_id_v1 from AllData2) select * from DistinctUserIDMarchOutbound left join DistinctUserIDFebMarchVisit on user_pseudo_id=user_pseudo_id_v1
The only new thing here is left join. First I have a table of unique users who made an event outbound link on the 3rd of march (“select * from DistinctUserIDMarchOutbound”), then left join (add to this table to the additional columns) DistinctUserIDFebMarchVisit. Match data based on user_pseudo_id. The results are here:
Handling nested data with Cross Join in BigQuery
Let’s look at exactly one raw of data, for example, some random pageview:
Yes, this is one raw. The data in event_params is nested. You can unnest this data by:
CROSS JOIN UNNEST (event_params) AS params
But what the heck. It’s 12 rows now!
Let’s look to the right:
Unnest took the nested data:
And created a copy for each of them, but on the very right, it divided it. So to take only page URL on which user was in this event you need to:
- Filter event_params.key to be equal page_location
- Have a column: event_params.value.string_value
Let’s do it:
#Start Date DECLARE a DATE DEFAULT "2021-03-03"; #End date DECLARE b DATE DEFAULT "2021-03-03"; with AllData as (SELECT * FROM `sugestowo.analytics_263648625.events_*` WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',a) AND FORMAT_DATE('%Y%m%d',b) ) Select value.string_value from AllData CROSS JOIN UNNEST (event_params) AS params where event_timestamp=1614805011838408 and user_pseudo_id="1984128012.1614805012" and event_name="page_view" and key="page_location"
It was an example with only one user. How to get all the users, and their pages? In select add “user_pseudo_id” and in where delete not necessary conditions:
How to get users and visited URL’s
Finally how to get distinct values? Use group by. So the final query is:
#Start Date DECLARE a DATE DEFAULT "2021-03-03"; #End date DECLARE b DATE DEFAULT "2021-03-03"; with AllData as (SELECT * FROM `sugestowo.analytics_263648625.events_*` WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',a) AND FORMAT_DATE('%Y%m%d',b) ) Select user_pseudo_id, value.string_value from AllData CROSS JOIN UNNEST (event_params) AS params where key="page_location" group by 1,2
If you prefer video version of tutorials here you will find it:
Those are the basics of reporting with Google Analytics 4 raw data. Let me know in a comment how do you like this article 🙂
Leave a Reply