Introduction
Google BigQuery is a powerful, fully‑managed data warehouse that supports ANSI‑SQL with several extensions. In this post we explore some of the more advanced capabilities that can unlock deeper analytical insights while keeping queries efficient.
Window Functions
Window functions let you perform calculations across a set of rows related to the current row without collapsing the result set.
Running Total Example
SELECT
date,
sales,
SUM(sales) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM `my_dataset.sales_table`
ORDER BY date;
Percentile Rank
SELECT
user_id,
score,
PERCENT_RANK() OVER (ORDER BY score DESC) AS rank
FROM `my_dataset.scores`;
Arrays & STRUCTs
BigQuery’s support for repeated fields (arrays) and nested records (STRUCTs) enables complex, hierarchical data modeling.
Flattening a Repeated Field
SELECT
user_id,
event.timestamp,
event.type
FROM `my_dataset.events`,
UNNEST(events) AS event;
Aggregating Into an ARRAY
SELECT
country,
ARRAY_AGG(city ORDER BY city) AS cities
FROM `my_dataset.locations`
GROUP BY country;
Temporary User‑Defined Functions
Define reusable logic inline with CREATE TEMP FUNCTION
. This is especially handy for complex transformations.
CREATE TEMP FUNCTION normalize_email(email STRING)
RETURNS STRING
LANGUAGE js AS """
return email.trim().toLowerCase();
""";
SELECT
user_id,
normalize_email(email) AS clean_email
FROM `my_dataset.users`;
Performance Tips
- Use
SELECT ... FROM … WHERE …
filters early to prune data. - Prefer
IN
overOR
for multiple value checks. - Leverage
CLUSTER BY
on frequently filtered columns. - Materialize intermediate results with
CREATE TEMP TABLE
when re‑using them. - Use
APPROX_COUNT_DISTINCT
when exact counts aren’t required.
Conclusion
Mastering these advanced features can drastically improve the expressiveness and performance of your BigQuery workloads. Experiment with them, combine techniques, and keep an eye on query execution details in the UI.