Joao Correia

How to export Google Analytics data from BigQuery with standard SQL

This post is an update to “how to export Google Analytics data from BigQuery,” I wrote two years ago. If you are unfamiliar with Google Analytics BigQuery data or would like to review the export process, it is still worth reading.

In June 2016 Google introduced BigQuery, 1.11 wich features a revamped SQL dialect named “standard”. The previous SQL is now called “legacy.”

I thought it would be a good idea to update the BigQuery data export SQL script it to the new “standard.”

Below is a snippet of the SQL query. You can find the complete SQL queries on my GitHub.

Standard SQL

The #standardSQL signals BigQuery the SQL version you are using.

#standardSQL
SELECT
  -- Hits are recorded in UTC. Customize your timezone to convert to your timezone.
  FORMAT_TIMESTAMP("%Y-%m-%d %H:%M:%S", TIMESTAMP_SECONDS(SAFE_CAST(visitStartTime+hits.time/1000 AS INT64)), "America/Los_Angeles") AS hit_timestamp,
  visitNumber,
  visitId,
  fullVisitorId,
  FORMAT_TIMESTAMP("%Y-%m-%d %H:%M:%S", TIMESTAMP_SECONDS(SAFE_CAST(visitStartTime AS INT64)), "America/Los_Angeles") AS hit_visitStartTime,
  CONCAT(SUBSTR(date,0,4),"-",SUBSTR(date,5,2),"-",SUBSTR(date,7,2)) AS date,
  trafficSource.referralPath,
  trafficSource.campaign,
  trafficSource.source
  ...
 -- Events
  hits.eventInfo.eventCategory,
  hits.eventInfo.eventAction,
  hits.eventInfo.eventLabel,
  hits.eventInfo.eventValue,

  -- Custom Dimensions (Add your custom dimensions by adding a line for each dimension and chaging the index)
  (SELECT MAX(IF(index=1, value, NULL)) FROM UNNEST(hits.customDimensions)) AS dimension1,
  
  -- Custom Metrics (Add your custom metrics by adding a line for each metric)
  (SELECT MAX(IF(index=1, value, NULL)) FROM UNNEST(hits.customMetrics)) AS metric1

  FROM  `dataset_id.ga_sessions_YYYYMMDD` , UNNEST(hits) as hits

Legacy SQL

The #legacySQL signals BigQuery the SQL version you are using.

#legacySQL
SELECT
  STRFTIME_UTC_USEC(SEC_TO_TIMESTAMP(visitStartTime + hits.time/1000),"%Y-%m-%d %H:%M:%S") as hit.timestamp,  
  -- ROUND(visitStartTime + hits.time/1000) as hits.timestamp,   /* Use for UNIX timestamp instead of timestamp */
  visitNumber,
  visitId,
  fullVisitorId,
  STRFTIME_UTC_USEC(SEC_TO_TIMESTAMP(visitStartTime),"%Y-%m-%d %H:%M:%S") as hit.visitStartTime,  
  LEFT(date,4)+"-"+SUBSTR(date,5,2)+"-"+RIGHT(date,2) as date,
  trafficSource.referralPath,
  trafficSource.campaign,
  trafficSource.source,
  ...
  -- Events
  hits.eventInfo.eventCategory,
  hits.eventInfo.eventAction,
  hits.eventInfo.eventLabel,
  hits.eventInfo.eventValue,
  
   -- Custom Dimensions (Add your custom dimensions by adding a line for each dimension)
  MAX(IF (hits.customDimensions.index = 1, hits.customDimensions.value,  NULL)) WITHIN RECORD AS dimension1,
  
  -- Custom Metrics (Add your custom metrics by adding a line for each metric)
  MAX(IF (hits.customMetrics.index = 1, hits.customMetrics.value,  NULL)) WITHIN RECORD AS metric1,
  
FROM [dataset_id.ga_sessions_YYYYMMDD]

WRITTEN BY

Joao Correia

SHARE OR SAVE IT FOR LATER