Data governance in GCP: the Marie Kondo of BigQuery
- perdidosenmatematica
- 30 Jun
- 10 min read
Compulsive hoarding disorder is a mental health condition marked by a persistent difficulty in discarding or parting with possessions—regardless of their actual value. Sound familiar?
If you work in data, or have ever been part of a data-driven company, chances are you've already witnessed what I’m calling the Data Hoarding Disorder: the endless stockpiling of datasets, tables, views, logs, backups—all “just in case”.
But why does it happen? The honest answer is: there’s no real reason. Still, if we try to make sense of something that might not have any, here are a few likely suspects:
Fear of losing information — classic digital FOMO.
The belief that cleaning up is risky or a waste of time — “best leave it”.
A ‘more is better’ mindset — as if storage were free and searching painless.
No automated cleanup tools or policies — no one owns it, so no one acts.
From my experience working with multiple clients, I can confirm that many BigQuery projects live in a state of chronic digital clutter: orphaned datasets, forgotten tables, broken views... everyone sees them, no one deletes them.
Where there’s data, there’s power. But when that data is disorganised, duplicated or simply forgotten, it becomes noise, cost, and vulnerability instead of value.
In this piece, you'll learn:
How to spot unnecessary data hoarding in BigQuery.
A simple way to audit your environment—no extra tools needed.
What metadata is already available to govern your data better.
How to make decisions without checking tables one by one.
Why this approach saves time, money and frustration.
The BrotDani Method: Auditing BigQuery Without Losing Your Mind
Marie Kondo is a world-renowned organisation consultant and author, famous for creating the KonMari Method — a system that encourages people to keep only the items that “spark joy.” Her method recommends tidying by category rather than by physical location. More than just order, it’s about aligning your space with your values and priorities.
Daniela Rodriguez-Brot Rey — better known in certain Slack circles as Daniela Brot — is a (still) little-known communicator bridging data science and data engineering. One day, she was given a daunting task: audit the BigQuery estate of several client projects with over ten thousand tables. The goal was clear: understand what was there and clean it up. And so, the BrotDani Method was born: a practical approach to auditing and documenting BigQuery, based entirely on metadata.
The BrotDani Method offers a tangible way to apply data governance principles to BigQuery, using metadata as a starting point and visualisation as a decision-making tool.
In short, it involves:
Mapping what exists: Gathering information about all BigQuery assets — datasets, tables, views, jobs, structure, size, usage, and more.
Crossing real usage with structure: Combining activity (last query, last update) with technical traits (number of columns, partitions, clustering, size, naming conventions).
Detecting common issues: Unused views, outdated tables, forgotten backups and tests, poorly designed or expensive structures.
Generating automated recommendations: Based on a simple set of CASE WHEN rules, each object is classified into possible actions: delete, optimise, document, rename, monitor.
The BrotDani Method is a way to audit BigQuery with clarity and purpose. It doesn’t just list what’s there — it diagnoses, classifies, and suggests specific actions to improve the health of your data ecosystem.
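The rule-based classification in step 4 can be sketched as a single CASE WHEN expression. The sketch below is illustrative only: the column names assume a consolidated metadata table like the one the method builds, and the thresholds and rule order are my assumptions, not the method's actual rules.

```sql
-- Hypothetical recommendation rules; thresholds are illustrative assumptions
SELECT table_id
  , CASE
      -- never queried and untouched for a year: candidate for deletion
      WHEN days_since_last_query IS NULL
       AND days_since_last_update > 365            THEN 'delete'
      -- dormant for six months: keep an eye on it before acting
      WHEN days_since_last_query > 180             THEN 'monitor'
      -- active but undocumented: fix the metadata first
      WHEN description IS NULL                     THEN 'document'
      -- active but off-convention name
      WHEN NOT naming_snake_case                   THEN 'rename'
      -- large and unpartitioned: likely expensive to scan
      WHEN total_logical_gb > 100
       AND partition_column IS NULL                THEN 'optimise'
      ELSE 'no action'
    END AS recommendation
FROM table_detail  -- hypothetical name for the consolidated metadata table
```

Because the rules are plain SQL, changing a threshold or adding a new rule is a one-line edit rather than a re-audit.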
Where does all this information come from?
INFORMATION_SCHEMA views in BigQuery are system-generated, read-only views that expose the metadata of your objects — tables, views, columns, jobs, and so on. No external tools, no obscure scripts, and no special permissions beyond access to the project or dataset.
The official documentation lists around 70 INFORMATION_SCHEMA views you can query to retrieve metadata in BigQuery. Some are marked with a cross (†), which means they have variants like BY_PROJECT, BY_USER, or BY_FOLDER. Others show a flask icon, indicating they are experimental.
There are views for almost everything: listing datasets, columns or routines; checking permissions; monitoring sessions; or reviewing automatic system recommendations.
Keep in mind that each view has its own way of being queried. For some, it’s enough to specify the region:
SELECT *
FROM `region-{eu/us}.INFORMATION_SCHEMA.SCHEMATA`
Others only accept dataset-level qualifiers — meaning you’ll need to specify the full path, including project and dataset, to query them.
SELECT *
FROM `{project_id}.{dataset_id}.INFORMATION_SCHEMA.PARTITIONS`
Some accept both formats — region-level and dataset-level — giving you more flexibility when querying across your BigQuery environment.
SELECT *
FROM `{project_id}.{dataset_id}.INFORMATION_SCHEMA.TABLES`
SELECT *
FROM `region-{eu/us/...}.INFORMATION_SCHEMA.TABLES`
There are other cases where project-level qualifiers are required, and filtering becomes mandatory to avoid overly large or restricted queries.
SELECT *
FROM `{project_id}`.`region-{eu/us/...}`.INFORMATION_SCHEMA.OBJECT_PRIVILEGES
WHERE object_schema = "{dataset_id}"
AND object_name = "{table_id}";
All of this is detailed in the syntax section or on each individual view’s documentation page.
What was used in the BrotDani method?
As a quick reminder, the goal of the BrotDani Method is to audit and document BigQuery projects using only the available metadata—no external tools required. The idea is to build a living inventory that combines structure with actual usage, allowing for clear decisions: what to delete, what to document, what to optimise, and so on.
This method relies on seven main INFORMATION_SCHEMA views, all queried at the regional level.
View | What does it provide? |
SCHEMATA | identifies all datasets, their location, creation and modification dates — a snapshot of your data estate. |
TABLES | lists all tables and views, including type and creation dates — helps detect proliferation and redundancies. |
JOBS | reveals who runs what, when, and at what cost — crucial for understanding usage and accountability. |
TABLE_STORAGE | gives storage metrics (logical/physical bytes, time travel, fail-safe) — supports cost and performance decisions. |
TABLE_OPTIONS | retrieves descriptive metadata (labels, descriptions, tags) — essential for assessing documentation quality. |
COLUMNS | shows table structures and clustering — useful for detecting oversized or badly designed tables. |
VIEWS | maps views to their source tables — helpful for identifying broken links or unnecessary views. |
Together, these views allow you to connect structure, usage, and documentation — the three pillars for informed decision-making in data governance.
Execution of the BrotDani Method
The execution of the method involved designing a series of SQL queries that cross-reference structure, usage, and documentation to build a living inventory of the entire BigQuery ecosystem.
These queries were organised into five main blocks:
Analysis block | General objective | Main query (views used) |
Dataset Detail | Map all project datasets | SCHEMATA |
Dimension Table/View | Standardise and group objects by logical name | TABLES |
Job Detail | Retrieve actual usage, cost, duration, and errors | JOBS + referenced_tables + labels |
Table Detail | Consolidate structure, size, usage, and configuration per table | TABLE_STORAGE, TABLE_OPTIONS, COLUMNS, JOBS |
View Detail | Audit views: origin, documentation, and activity | VIEWS, TABLE_OPTIONS, COLUMNS, JOBS |
Each block follows its own logic to combine relevant metadata. In the case of tables and views, this combination enables an automatic recommendation per object (e.g., delete, optimise, document, no action, etc.), based on simple rules defined using CASE WHEN statements.
Dataset detail
Returns the list of all datasets in the project, including creation and last modified dates, as well as their regional location.
What’s it for?
Helps identify how many datasets exist, when they were created, and whether they’re active or abandoned.
Enables cross-referencing with other views (like JOBS or TABLES) to spot datasets with no recent use or no clear owner.
Acts as a starting point to build an inventory of the BigQuery ecosystem.
Resulting table:
Field | Meaning |
project_id | Project ID (from catalog_name) |
dataset_id | Full dataset path (project_id.dataset_name) |
dataset_name | Dataset name (schema_name) |
creation_date | Dataset creation date |
last_modified_date | Last modified date |
location | Region where the dataset is hosted (e.g. EU) |
Code:
SELECT catalog_name AS project_id
  , CONCAT(catalog_name, ".", schema_name) AS dataset_id
  , schema_name AS dataset_name
  , EXTRACT(DATE FROM creation_time) AS creation_date
  , EXTRACT(DATE FROM last_modified_time) AS last_modified_date
  , location
FROM `region-eu`.INFORMATION_SCHEMA.SCHEMATA
Dimension table view
Groups and standardises tables and views to create a dimension of logical objects, removing numerical suffixes and harmonising similar names (e.g. date-stamped versions of the same table).
What’s it for?
Identifies how often the same table has been recreated with different dates or suffixes.
Helps detect patterns of automated creation or unsustainable practices (such as excessive duplication).
Serves as a key to join with other views (like storage or jobs) and consolidate metrics by logical group.
Resulting table:
Field | Meaning |
project_id | Project ID (from table_catalog) |
dataset_id | Dataset the table or view belongs to |
table_id | Full object path, using unified name (_* as wildcard) |
table_view_name | Logical name: removes suffixes like dates or versions |
table_type | Whether it is a BASE TABLE or a VIEW |
first_creation_date | Earliest creation date among all objects sharing the same logical name |
last_creation_date | Most recent creation date among all objects sharing the same logical name |
Code:
SELECT table_catalog AS project_id
  , CONCAT(table_catalog, ".", table_schema) AS dataset_id
  , CONCAT(table_catalog, ".", table_schema, ".",
      REGEXP_REPLACE(table_name, r'_\d{1,8}\*?$', '_*')) AS table_id
  , REGEXP_REPLACE(table_name, r'_\d{1,8}\*?$', '_*') AS table_view_name
  , table_type
  , MIN(EXTRACT(DATE FROM creation_time)) AS first_creation_date
  , MAX(EXTRACT(DATE FROM creation_time)) AS last_creation_date
FROM `region-eu`.INFORMATION_SCHEMA.TABLES
GROUP BY ALL
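To see what the normalisation regex does in isolation, you can run it against a literal. The table name below is an invented example; the pattern is the one from the query above.

```sql
-- The regex collapses a trailing date or version suffix into a wildcard
SELECT REGEXP_REPLACE('events_20240101', r'_\d{1,8}\*?$', '_*') AS table_view_name
-- returns 'events_*'
```

Grouping by this normalised name is what lets hundreds of date-sharded copies collapse into a single logical object in the dimension.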
Job detail
Explores the history of jobs run within the project, extracting metadata on queries, users, costs, durations, errors, and referenced tables.
What’s it for?
Helps identify who accessed which table and when it was last queried.
Enables analysis of estimated costs and detection of expensive or failed queries.
Essential for linking actual usage to the objects analysed (tables, views) and assigning recent activity.
Resulting table:
Field | Meaning |
creation_date | Job creation date (formatted as YYYYMMDD) |
user_email | User who ran the job, or the requestor (if labelled) |
job_type, statement_type | Job type (e.g., QUERY, LOAD) and executed SQL statement |
duration_seconds | Job duration in seconds |
state, error_message | Final job state and any error message |
total_bytes_processed | Raw data processed |
total_slot_ms, total_bytes_billed | Slot consumption and billed bytes |
estimated_cost_usd | Estimated cost in USD |
referenced_table | Table accessed or modified |
table_id_in_query | Table inferred from SQL if not listed as referenced_table (useful for SELECT, INSERT, etc.) |
destination_table.* | Destination table (if applicable) |
looker_studio_report_id | Report ID if triggered from Looker Studio |
is_scheduled_query | Whether the job was a Scheduled Query |
Code:
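The published query is in the repository linked at the end of this post. As an orientation, a minimal sketch over JOBS might look like the following. It covers only a subset of the fields listed above, and the on-demand price per TiB is an assumption you should adjust to your own region and billing model.

```sql
-- Minimal sketch only; the full query also handles labels, destination
-- tables, Looker Studio detection, and tables inferred from the SQL text.
SELECT
    FORMAT_DATE('%Y%m%d', EXTRACT(DATE FROM creation_time)) AS creation_date
  , user_email
  , job_type
  , statement_type
  , TIMESTAMP_DIFF(end_time, start_time, SECOND) AS duration_seconds
  , state
  , error_result.message AS error_message
  , total_bytes_processed
  , total_slot_ms
  , total_bytes_billed
  -- Assumed on-demand price of 6.25 USD per TiB; adjust to your contract
  , ROUND(total_bytes_billed / POW(1024, 4) * 6.25, 4) AS estimated_cost_usd
  , CONCAT(ref.project_id, '.', ref.dataset_id, '.', ref.table_id) AS referenced_table
FROM `region-eu`.INFORMATION_SCHEMA.JOBS
LEFT JOIN UNNEST(referenced_tables) AS ref
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 180 DAY)
```

Note that JOBS only retains recent history (180 days), so "last queried" dates derived from it are bounded by that window.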
Table detail
Consolidates structure, size, usage and technical configuration of all tables in the project. It merges multiple sources to enrich each table with key metadata.
What’s it for?
Audits logical/physical size, partitioning, clustering, and number of columns.
Assesses whether tables are properly documented (description, labels, tags).
Detects inactivity, inefficient or poorly designed structures, and applies rules to generate automatic recommendations such as delete, optimise, document, etc.
Resulting table:
Field | Meaning |
table_id | Unique identifier of the table |
total_logical_gb / total_physical_gb | Logical and physical size in GB |
total_rows | Total number of rows |
naming_snake_case | Whether the table name follows naming conventions |
num_columns | Number of columns in the table |
description, labels, tags | Useful metadata for data cataloguing |
last_date_query | Last time the table was queried |
storage_last_modified_date | Last actual content update |
query_made_by | Who queried the table (user, Looker, or process) |
days_since_last_query / days_since_last_update | Key metrics for data clean-up |
recommendation | Suggested action: delete, document, optimise, etc. |
Code:
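The full query is published in the repository linked at the end of this post. As a rough orientation, the core join between storage and usage might be sketched like this; it is not the method's actual query, and it omits TABLE_OPTIONS, COLUMNS, and the recommendation rules.

```sql
-- Minimal sketch: size metrics from TABLE_STORAGE joined to last-query
-- activity derived from JOBS. Not the published query.
WITH usage AS (
  SELECT ref.project_id, ref.dataset_id, ref.table_id
       , MAX(EXTRACT(DATE FROM j.creation_time)) AS last_date_query
  FROM `region-eu`.INFORMATION_SCHEMA.JOBS AS j
     , UNNEST(j.referenced_tables) AS ref
  GROUP BY 1, 2, 3
)
SELECT
    CONCAT(s.project_id, '.', s.table_schema, '.', s.table_name) AS table_id
  , ROUND(s.total_logical_bytes / POW(1024, 3), 2) AS total_logical_gb
  , ROUND(s.total_physical_bytes / POW(1024, 3), 2) AS total_physical_gb
  , s.total_rows
  , REGEXP_CONTAINS(s.table_name, r'^[a-z0-9]+(_[a-z0-9]+)*$') AS naming_snake_case
  , u.last_date_query
  , DATE_DIFF(CURRENT_DATE(), u.last_date_query, DAY) AS days_since_last_query
FROM `region-eu`.INFORMATION_SCHEMA.TABLE_STORAGE AS s
LEFT JOIN usage AS u
  ON u.project_id = s.project_id
 AND u.dataset_id = s.table_schema
 AND u.table_id = s.table_name
```

From these metrics, the recommendation column is then derived with the CASE WHEN rules described earlier.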
View detail
Extracts information about views created in the project: their origin (base table), structure, documentation, and recent usage.
What’s it for?
Helps detect views that haven’t been queried in a long time or have poorly defined names.
Makes it easier to check whether views are documented or reference non-existent objects.
Generates recommendations such as delete view, add description, or change name, based on defined rules.
Resulting table:
Field | Meaning |
view_id / view_name | Full name and logical name of the view |
origin_table / origin_table_group | Base table from which the view is built |
naming_snake_case | Whether it follows a naming convention |
description, labels, tags | Metadata for documentation |
last_date_query, query_made_by | Last time it was used, and by whom |
days_since_last_query | Time since the last execution |
num_columns | Structural complexity |
recommendation | Suggested action: delete, document, rename, etc. |
Code:
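Again, the published query lives in the repository linked below. A minimal sketch of the view side, joining VIEWS with TABLE_OPTIONS to check documentation, could look like this (illustrative only; the full query also brings in COLUMNS, JOBS activity, and the recommendation rules):

```sql
-- Minimal sketch: view inventory with naming check and description lookup
SELECT
    CONCAT(v.table_catalog, '.', v.table_schema, '.', v.table_name) AS view_id
  , v.table_name AS view_name
  , REGEXP_CONTAINS(v.table_name, r'^[a-z0-9]+(_[a-z0-9]+)*$') AS naming_snake_case
  , o.option_value AS description
FROM `region-eu`.INFORMATION_SCHEMA.VIEWS AS v
LEFT JOIN `region-eu`.INFORMATION_SCHEMA.TABLE_OPTIONS AS o
  ON o.table_catalog = v.table_catalog
 AND o.table_schema  = v.table_schema
 AND o.table_name    = v.table_name
 AND o.option_name   = 'description'
```

The `view_definition` column of VIEWS is what allows mapping each view back to its source tables and flagging references to objects that no longer exist.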
The full code is published in this public GitHub repository: Link to repository
BrotDani Model Schema
There’s no magic behind the BrotDani Method—just structure. A relational model was built, where tables are connected through primary and foreign keys. This setup brings together real usage, technical structure, and descriptive metadata into a final table that enables clear recommendations and informed decision-making.

What every data steward wants
This approach goes beyond technical auditing: it provides an automated and scalable way to cover many of the core responsibilities a tactical data steward should handle.
Specifically, it addresses:
Usage and activity auditing: Understand which tables and views are used, when, and by whom — and detect abandoned resources.
Data documentation: Check whether objects have descriptions, labels, or other metadata that clarify their purpose.
Naming convention control: Evaluate whether names follow a defined format (like snake_case) to ensure consistency.
Storage and performance optimisation: Identify opportunities to partition, cluster, or reduce columns — lowering costs and improving performance.
Obsolete asset removal: Combine real usage with structure and metadata to suggest deletions or revisions.
Living data catalogue: Build a comprehensive view of your BigQuery ecosystem with health logic per object and full traceability.
This solution covers roughly 80% of the tactical data governance work needed for BigQuery. Best of all: it’s automated, replicable, and sustainable. What often feels like a never-ending project becomes a practical system with real impact.
Too long; didn't read
Auditing BigQuery isn’t just a technical task — it’s a core practice of data governance. Without proper controls, projects quickly become cluttered with unnecessary tables, views, and datasets — driving up costs, creating confusion, and putting data quality at risk.
The BrotDani Method offers a practical and automatable approach to auditing and documenting BigQuery, using metadata alone. By querying INFORMATION_SCHEMA, it builds a relational model that combines real usage, technical structure, and descriptive metadata to generate automatic recommendations: delete, optimise, document, or monitor.
The result isn’t just a one-off diagnosis — it’s a living, scalable inventory that covers around 80%, if not more, of the tactical responsibilities of a BigQuery data steward. A concrete solution to support informed decision-making and foster a more organised, efficient, and sustainable data culture.
Thanks for reading!
And don’t forget—you can “like” this post without logging in.
A heartfelt thank you to Esther Ponce for the shared learning and support.
Hi, I'm Daniela. For many years, I was a mathematics teacher at schools and universities, and I also edited and wrote educational books. Today, I work as a Data Science Engineer and share content about mathematics, statistics, and data science on this site and social media. Creating and sharing knowledge is my way of staying connected to teaching. I strongly believe in two ideas that have always guided me: "acquiring new knowledge keeps you relevant" and "sharing is living."