Advanced SQL Techniques in BigQuery

Table of Contents

Introduction
Window Functions
Arrays & STRUCTs
Temporary User‑Defined Functions
Performance Tips
Conclusion

Introduction

Google BigQuery is a powerful, fully‑managed data warehouse that supports ANSI‑SQL with several extensions. In this post we explore some of the more advanced capabilities that can unlock deeper analytical insights while keeping queries efficient.

Window Functions

Window functions let you perform calculations across a set of rows related to the current row without collapsing the result set.

Running Total Example

SELECT
  date,
  sales,
  SUM(sales) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM `my_dataset.sales_table`
ORDER BY date;

Percentile Rank

SELECT
  user_id,
  score,
  PERCENT_RANK() OVER (ORDER BY score DESC) AS rank
FROM `my_dataset.scores`;

Arrays & STRUCTs

BigQuery’s support for repeated fields (arrays) and nested records (STRUCTs) enables complex, hierarchical data modeling.

Flattening a Repeated Field

SELECT
  user_id,
  event.timestamp,
  event.type
FROM `my_dataset.events`,
UNNEST(events) AS event;

Aggregating Into an ARRAY

SELECT
  country,
  ARRAY_AGG(city ORDER BY city) AS cities
FROM `my_dataset.locations`
GROUP BY country;

Temporary User‑Defined Functions

Define reusable logic inline with CREATE TEMP FUNCTION. This is especially handy for complex transformations.

CREATE TEMP FUNCTION normalize_email(email STRING)
RETURNS STRING
LANGUAGE js AS """
  return email.trim().toLowerCase();
""";

SELECT
  user_id,
  normalize_email(email) AS clean_email
FROM `my_dataset.users`;

Performance Tips

Use SELECT ... FROM … WHERE … filters early to prune data.
Prefer IN over OR for multiple value checks.
Leverage CLUSTER BY on frequently filtered columns.
Materialize intermediate results with CREATE TEMP TABLE when re‑using them.
Use APPROX_COUNT_DISTINCT when exact counts aren’t required.

Conclusion

Mastering these advanced features can drastically improve the expressiveness and performance of your BigQuery workloads. Experiment with them, combine techniques, and keep an eye on query execution details in the UI.