
Data Governance in GCP: BigQuery’s Bodyguard

  • Daniela RBR
  • Jun 19
  • 15 min read

Updated: Jun 30

"Knowledge is power" — an old phrase that's more relevant than ever. We live in the era of Big Data, and as Stephen Baker wrote in his book: The Numerati know everything about you. But when he said that, back in 2008, he was referring to a small group of experts. Today, almost 20 years later, it's estimated that over 97 million professionals work in the data world — a massive, booming sector, but with a shortage of talent. At the same time, many companies have become highly data-dependent — the so-called data-driven companies — which means they handle huge volumes of information.


Where there’s a lot of data, there’s a lot of power. And as Aunt May wisely told Peter: “With great power comes great responsibility.”


In this post, I’ll walk you through:

  • Why limiting access to data is essential.

  • What responsibilities you take on when assigning permissions.

  • How to manage access in BigQuery while staying GDPR-compliant.

  • How to audit who granted access to what, to whom, and when.



Why should access to stored data be limited?

Limiting access to the data we store isn’t just a good security or data governance practice — it’s also a legal requirement.


The General Data Protection Regulation (GDPR) has been in effect in Europe since 2018, and fully enforced across all European systems since 2023 [link to the law]. This regulation mandates that any company processing personal data must follow the principle of data minimization: collect and expose only the data that is strictly necessary, and only to those who genuinely need it to perform their duties.


Article 5 of the GDPR sets out the core principles that must guide the processing of personal data.


Article 5 – Principles relating to processing of personal data: 1. Personal data shall be: (c) adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed (‘data minimisation’); (e) kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed; (f) processed in a manner that ensures appropriate security of the personal data, including protection against unauthorised or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organisational measures (‘integrity and confidentiality’). 2. The controller shall be responsible for, and be able to demonstrate compliance with, paragraph 1 (‘accountability’). [Article 5 - link]

In short, it states that data must be used for legitimate purposes, be limited to what is strictly necessary, kept up to date, retained only for as long as needed, and —above all— properly protected against unauthorized access. On the other hand, Article 32 focuses on the security of processing and requires companies to implement measures that are proportionate to the risk, in order to ensure the confidentiality, integrity, and availability of the data. [Article 32 - link]

And when I mentioned "great responsibility," I was referring directly to the consequences of failing to comply with this law. Article 83 of the GDPR lays out the penalties for non-compliance with data protection regulations.


Article 83 – General conditions for imposing administrative fines: 2. Fines are imposed based on the specific circumstances of each incident, considering factors such as severity, duration, intent, corrective actions taken, and cooperation with the supervisory authority. 4. “Less severe” violations (e.g., obligations of data controllers under Articles 8–11, 25–39, 42–43) can be fined up to €10 million or 2% of the global annual turnover, whichever is higher. 5. “More severe” violations (e.g., breaches of core data processing principles, data subject rights, international data transfers, or failure to comply with supervisory orders) can lead to fines of up to €20 million or 4% of the global annual turnover, whichever is higher. [Article 83 - link]

So, what happens if you fail to comply? Fines can reach up to €20 million or 4% of the company’s total global annual turnover, whichever is higher. That’s definitely a lot of money.



What responsibility do you take on when granting access to data in BigQuery?

The title of this article says it all: it’s not just about storing data — it’s about guarding and protecting it, like a bodyguard. When you manage access in BigQuery, you’re not just deciding who can query a table; you’re taking on a key role in legal compliance, in safeguarding sensitive data, and in preventing leaks.


In the previous section, I referenced Article 5 of the GDPR, which lays out principles like data minimization and proactive accountability. In practice, this means that a company must know what data it collects, justify why it stores it and who can access it, and avoid storing or exposing more than necessary. In BigQuery, this translates into having a clear dataset structure, setting access at the project or role level, and enforcing both technical and organizational minimization policies.
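
A simple starting point for that inventory is BigQuery's INFORMATION_SCHEMA. A minimal sketch, assuming the data lives in the EU multi-region and using a hypothetical project id:

-- List every table the project stores and when it was created.
-- Adjust `region-eu` and the project id to your own setup.
SELECT
  table_schema AS dataset,
  table_name,
  table_type,
  creation_time
FROM `my_project`.`region-eu`.INFORMATION_SCHEMA.TABLES
ORDER BY dataset, table_name;

Knowing exactly which datasets and tables exist is what makes the rest of the minimization work, and the access decisions below, possible.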


We also looked at Article 32, which turns security into an active duty, not just a technical recommendation. It calls for access restrictions using Row or Column Level Security, encryption wherever possible, backups, system resilience, and regular audits of the effectiveness of these measures. In addition, anyone accessing the data must do so under clear and documented instructions.


In short, granting access in BigQuery is a technical task, yes — but also a legal and ethical responsibility. That’s why there’s a dedicated role to ensure it’s done right: data governance.



Resource Hierarchy

Although BigQuery is a product within Google Cloud Platform (GCP), it doesn’t live in isolation — its security and access controls follow GCP’s resource hierarchy [docs: hierarchy]. This is important to keep in mind, since permissions are inherited down the chain.



In the case of BigQuery, the relevant levels for granting permissions are:

  1. Organization (optional): general access to all projects, only if the company is linked to a domain.

  2. Project: access to all enabled APIs, including BigQuery; this is usually where general usage is defined.

  3. Dataset: access to groups of tables; you can control reading, writing, view creation, etc.

  4. Table: more granular access; querying, inserting, and updating can be allowed or restricted.

  5. Row/Column: fine-grained control through Row-Level Security and Column-Level Security, useful for applying dynamic rules based on the user.



Role hierarchy  

Just as BigQuery resources follow a hierarchical structure, so do roles and permissions [doc: roles and permissions]. Understanding this hierarchy is key to applying an effective security policy and enforcing access control based on the principle of least privilege.


Basic roles in GCP

Google Cloud Platform includes three basic roles that define general access to a project: Owner, Editor, and Viewer. The predefined Browser role is usually considered alongside them:

  • Owner: full control; can create, modify, and delete resources, manage IAM permissions, and handle billing.

  • Editor: can create, modify, and delete resources across all services, but cannot manage IAM permissions or billing.

  • Viewer: can view resources and metadata within the project, but cannot make changes.

  • Browser: can view the project and folder hierarchy, but cannot see or access the resources within the project.

These roles can be selected from IAM by clicking "Grant access" and choosing the "Basic" category.


To allow someone to find and browse a project from the Google Cloud Console, it's necessary to assign them a basic role at the project level, such as Viewer or Browser. These roles ensure the project appears in their list and becomes accessible through the graphical interface [GCP Console].


On the other hand, when you give someone access to a specific resource in GCP — like BigQuery — they can only use it if they have a direct link to that resource within the project. Without that link, they won’t be able to find the project on their own by searching in the console. The access exists, but it's hidden.


So, when granting access to people (this doesn't apply to service accounts), it’s essential to combine basic roles with resource-specific roles, ensuring both technical access and visibility within their working environment.


Predefined Roles in BigQuery

Predefined roles are managed by Google Cloud and contain the necessary permissions to perform common tasks for a specific service. In the case of BigQuery, these roles can be grouped into four main categories:

  • Data roles: dataOwner, dataEditor, dataViewer, metadataViewer

  • Execution roles: jobUser, user, readSessionUser

  • Administrative roles: admin, resourceAdmin, connectionAdmin, connectionUser

  • Composite role: studioUser (integrates permissions across multiple services)


As mentioned earlier, these roles can be assigned through the IAM console and are easily searchable by resource.

If we focus exclusively on roles related to querying or viewing datasets and tables in BigQuery, we find the following (ordered from most to least powerful):


  • BigQuery Data Owner: 🧱 Full control over datasets, including modifying, deleting, and setting access controls.

  • BigQuery Data Editor: ✍️ Can read and modify data (insert, update, delete), but cannot change access controls.

  • BigQuery Data Viewer: 👁️ Can view metadata and read data from tables and views.

  • BigQuery User: 🔧 Can run queries and create their own datasets, but only sees data if they have permission on the underlying resources.

  • BigQuery Job User: 🏃 Can run jobs (queries, loads, etc.) but needs explicit access to datasets or tables.


It's essential to understand the scope of each role individually. To make this easier, Google’s official documentation provides a complete IAM roles and permissions index, where you can look up and explore the details of every role: link



How to Grant Permissions in BigQuery (Without Losing Your Mind)

The title of this section isn't random. In recent months, managing permissions in GCP—especially for BigQuery—has become part of my regular tasks. And while it may seem easy at first glance, once you get into it, you realize it’s anything but simple.

Still, like most things in life, once you figure out the logic, everything becomes much more manageable.


As explained in the documentation, granting IAM permissions involves three key components:

  • Principal: Who are you giving access to?

  • Role: What level of access are you granting?

  • Resource: What is the principal going to work with?
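
These three components also map directly onto BigQuery's Data Control Language, if you prefer SQL over the IAM panel. A minimal sketch (the project, dataset, and email are hypothetical):

-- Role: what level of access is being granted.
GRANT `roles/bigquery.dataViewer`
-- Resource: the dataset (schema) the grant applies to.
ON SCHEMA `my_project.analytics`
-- Principal: who receives the access.
TO "user:ana@example.com";

The same statement accepts groups and service accounts as principals, and REVOKE undoes the grant.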


Step 1: Always assign a basic role

As explained earlier, it's necessary to assign a basic role so that the person can find the project without needing a direct link. Assuming the goal is for them to work only with BigQuery, the most appropriate choice is Browser. This allows them to locate the project and browse its hierarchy, but when they try to open any resource, they'll be told they don't have permissions.


Browser (roles/browser): Read-only access to browse a project’s hierarchy, including folders, organization, and IAM policy. This role does not include permission to view or interact with the resources inside the project. [link]

This step only applies to human users — service accounts do not require a basic role to locate the project in the console.


Step 2: Define Access

Once the user can locate the project, the next step is to define what they should have access to. This is where the well-known Principle of Least Privilege comes into play: if someone needs to query a table, they don’t need permissions to view or edit all datasets. If they only need to run queries, they don’t need to create new datasets.


In short: grant only the permissions strictly necessary to do their job — no more, no less. To achieve this, it’s essential to define both the resource and the role.
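
To make this concrete in DCL form (names are hypothetical), you can scope a viewer role to the single table the person actually needs, and revoke it once the need disappears:

-- Read access to one table rather than the whole dataset.
GRANT `roles/bigquery.dataViewer`
ON TABLE `my_project.sales.orders`
TO "user:analyst@example.com";

-- Tighten later: remove the grant when it is no longer justified.
REVOKE `roles/bigquery.dataViewer`
ON TABLE `my_project.sales.orders`
FROM "user:analyst@example.com";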


Decision Flow: Type of Role

Decision Flow: Resource-Level Access


Step 3: Granting Access

The decision trees help clarify what access needs to be granted and to which resource — the next step is to make it happen.


But granting access can be just as complex —or even more so— than defining it. Not all permissions are assigned the same way: some are resolved with a simple IAM click, while others require more fine-grained conditions or specific configurations within BigQuery.


This is where the difference between giving access and giving access well starts to matter.


General Access: Assigning Roles Without Complexity

When someone needs to work with all the data in a BigQuery project, the most straightforward way is to assign the role from the IAM panel. Just click “Grant access”, select the appropriate user or group, and choose the relevant role.


This type of access applies to all datasets, tables, and views within the project, making it ideal when a full view of the data environment is required.


Conditional Access: permissions with rules

Not all access is black or white. Sometimes, you need to grant permissions—but only under specific conditions. For example: allowing someone to query data only during certain hours, until a set date, within a specific dataset, or only if the resources have a particular label. This is where IAM Conditions come into play.


IAM Conditions in Google Cloud let you add conditional rules to access policies, giving you more granular control over who can do what, when, and under which circumstances.


When assigning a role, these conditions can be added (note: this doesn’t apply to basic roles).


There are two ways to do it: using the Condition Builder or the Condition Editor.


While there are two ways to define these conditions, the Condition Builder is the more user-friendly option: it offers a guided interface to create simple rules without writing any code. Once you select this option, the first thing it asks is to choose the type of condition. You'll find two main groups:

  • Time-based conditions, which let you define a time window during which access is granted. This is useful, for example, when giving temporary access to contractors or for time-bound projects.

  • Resource-based conditions, which allow you to limit access based on resource attributes—such as the resource name (resource.name) or labels (resource.matchLabels). This is ideal when you want to restrict access to a specific dataset or a subset of resources.


On the other hand, the Condition Editor lets you work directly with expressions written in CEL (Common Expression Language)—a more powerful and flexible syntax. It's ideal when you need to combine multiple conditions or apply more complex logic. [link to documentation]


For example:

resource.name == "projects/{project_id}/datasets/{dataset_id}" 
&& 
request.time < timestamp("2025-12-31T23:59:59Z")

Before wrapping up this section, there are a few important details worth keeping in mind when working with IAM Conditions or more customized access controls in BigQuery. These may seem small, but not knowing them can drive you crazy later on:

  1. Avoid using more than 12 logical operators in a single condition (this is the documented limit for condition expressions). In practice, exceeding it can cause the condition to fail silently or not be saved.

  2. If you grant access to a specific table, you must also grant access to the dataset that contains it. Otherwise, the person might be able to query the table (if they have the full path) but won’t see it listed or be able to browse to it in the UI.

  3. Conditions don’t apply to basic roles like Viewer or Editor. They only work with custom or compatible predefined roles.

  4. If you use labels in a condition, you can’t mix them with other types of conditions. This means you might need to grant the same role multiple times under different condition sets. Also, make sure the labels are properly applied and synced, otherwise the condition won’t work.

  5. When granting conditional access to multiple users at once, editing one condition might affect them all—or not—depending on how it was set up.

  6. And finally, as a good practice: document the conditions you apply, when and why. That way, others on your team can understand the access logic—and so can future you. You can even use the description field in IAM for this.


Restricted Access: protecting sensitive data

In scenarios where access needs to be limited to specific parts of a table —whether by rows, columns, or transformations— restricted access mechanisms come into play. BigQuery offers three powerful tools for this: Row-Level Security, Column-Level Security, and Authorized Views. Each serves a different purpose, and using them correctly is key to protecting data without disrupting the team's work.


Column-Level Security (CLS)

CLS in BigQuery is a feature that lets you restrict access to specific columns within a table, without blocking access to the entire table. It's useful for hiding sensitive columns based on the user's profile. Policy tags are applied to the columns via Data Catalog, and access is controlled through permissions on those tags. This means two people can query the same table but see different levels of detail.


To use CLS, the following requirements must be met:

  1. The project must belong to an organization in GCP.

  2. The Data Policy API must be enabled.

  3. You must have the following permissions:

    • Policy Tag Admin, to manage policy tags (configured from IAM).

    • Fine-Grained Reader or Masked Reader, to access data governed by those tags.
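
To give a feel for what CLS looks like from the analyst's side, here is a minimal sketch with hypothetical table and column names. If the email column carries a policy tag the user is not authorized to read, a plain SELECT * fails, while excluding the protected column works:

-- Fails with an access-denied error if `email` is protected by a policy tag
-- the user cannot read:
-- SELECT * FROM `my_project.crm.customers`;

-- Works: query everything except the protected column.
SELECT * EXCEPT (email)
FROM `my_project.crm.customers`;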


Row-Level Security (RLS)

RLS lets each person see only certain rows of a table, based on defined rules (for example, customer_id = 19). It's useful when you want to share a single table across multiple users without exposing all its content.


This type of permission is configured directly in BigQuery using SQL. For example:

CREATE ROW ACCESS POLICY only_customer_19
ON `mi_proyecto.mi_dataset.transactions`
GRANT TO ("user:persona@empresa.com")
FILTER USING (customer_id = 19);

Once set, you'll be able to view the policies from the table's “Schema” tab.


It sounds simple, but it's not that straightforward. When a table has RLS policies, only users included in at least one policy will be able to see any data. If someone isn’t covered by any of them, they’ll see nothing — even if they have other permissions.

Also, RLS policies are evaluated using OR logic: if a person meets at least one condition, they’ll see the matching rows. So if you first give someone restricted access and later add a policy that grants full access to all users in the project, that person will also see everything. In practice, the most permissive condition always wins.
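
To make that OR logic concrete, here is a hedged sketch that builds on the policy above; the second, broader policy silently widens what the first user can see:

-- Hypothetical second policy: every user in the domain sees all rows.
CREATE ROW ACCESS POLICY all_rows_for_domain
ON `mi_proyecto.mi_dataset.transactions`
GRANT TO ("domain:empresa.com")
FILTER USING (TRUE);

-- persona@empresa.com now matches both policies, so they see every row,
-- not just those with customer_id = 19.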



Authorized Views

Just by reading what we’ve covered so far, it’s easy to see that granting access using RLS or CLS isn't exactly simple. Fortunately, there’s an alternative solution: authorized views.


A regular view in BigQuery works like a saved query, but in order to see its results, the user must also have access to the source table. That means if someone doesn’t have permission on the base table, even with access to the view, they won’t see any data.


An authorized view breaks that dependency: it allows someone to query the view without having any access to the original table. This is achieved by explicitly authorizing the view to access the source data — and then sharing only the view with the user.


An authorized view can be used as an alternative to RLS or CLS. If we apply a filter in the WHERE clause, it acts similarly to row-level security. If we carefully select specific columns in the SELECT clause, we can hide sensitive fields — just like column-level security.
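
As a sketch of that idea (all names are hypothetical), the view below keeps only non-sensitive columns and filters rows, so sharing it behaves like CLS and RLS combined; the authorization step itself is described next:

-- A view that exposes only what the consumer should see.
CREATE OR REPLACE VIEW `my_project.shared.customer_19_transactions` AS
SELECT
  transaction_id,   -- keep only non-sensitive columns (column-level effect)
  amount,
  created_at
FROM `my_project.private.transactions`
WHERE customer_id = 19;   -- restrict rows (row-level effect)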


To make a view an authorized view, you must explicitly grant it access to the source table. This is done by configuring the dataset that contains the source table to authorize the view — even if that view lives in another dataset or project.


From that moment on, any user with access to the view will be able to query the data it exposes — without needing direct access to the original table.




Access Auditing

Granting permissions is a key part of data governance — but it doesn’t end there. In many cases, access is assigned by different teams, or even by people outside the governance group who still have the authority to share data. That’s why it’s essential to track who gave access to whom, when, and how.


Google Cloud Platform automatically generates logs that record all activity across its services and resources. This functionality is part of Cloud Logging, which belongs to GCP’s observability toolkit. If someone grants access to a user, you’ll find the details in the corresponding log inside the Log Explorer. [Cloud Logging documentation]


Checking logs in the Log Explorer isn’t impossible, but it’s not the easiest way to regularly monitor activity. Records are displayed in raw JSON, which isn’t user-friendly and makes filtering or structured analysis harder. Fortunately, GCP allows you to create a sink, a pipeline that redirects those logs into a BigQuery table — so you can query them using SQL and even cross-reference them with other data sources. This approach turns technical logs into a clear, traceable, and easily exploitable audit trail.


To set it up, go to the Log Router section in GCP and create a new sink. When prompted for a destination, select a BigQuery dataset as the storage location for your logs.


Then, in the filtering section, it’s essential to restrict the logs to only those related to permission assignments. You can use the following filter for that:

protoPayload.methodName="SetIamPolicy"
logName="projects/{project_id}/logs/cloudaudit.googleapis.com%2Factivity"

With this configuration, GCP will start sending a log entry to BigQuery every time someone modifies IAM permissions, whether that means adding, removing, or updating roles. These logs are stored in date-sharded tables (one table per day) that update continuously.


At first glance, that table might seem endless and hard to read.


However, the following query turns it into something much more useful and clear:

SELECT DISTINCT
  timestamp AS modification_date,
  protopayload_auditlog.authenticationInfo.principalEmail,   -- who made the change
  pi.permissionType,
  bindingDeltas.action,                                       -- ADD or REMOVE
  bindingDeltas.role,                                         -- which role was granted or revoked
  bindingDeltas.member,                                       -- who received (or lost) it
  bindingDeltas.condition.title,
  bindingDeltas.condition.expression
FROM `project_id.controles_gcp.cloudaudit_googleapis_com_activity_*`,
  UNNEST(protopayload_auditlog.authorizationInfo) AS pi,
  UNNEST(protopayload_auditlog.servicedata_v1_iam.policyDelta.bindingDeltas) AS bindingDeltas
ORDER BY 1 DESC

Final table schema
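
If the audit history grows large, you can also limit the scan to recent shards. A small sketch, reusing the same (hypothetical) dataset name:

-- Restrict the wildcard scan to the last 30 days of audit shards.
SELECT timestamp, protopayload_auditlog.authenticationInfo.principalEmail
FROM `project_id.controles_gcp.cloudaudit_googleapis_com_activity_*`
WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
ORDER BY timestamp DESC;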

But it doesn’t end there — sometimes excessive permissions are granted, and GCP helps surface this easily through the "Security insights" column in the IAM panel.


When you see something like "9941/9972 excess permissions", it refers to a security metric that highlights how many of the permissions granted to a user, service, or account have not been used within a recent observation period. [recommender]


In this example, we see two users: one used only 121 out of over 9,000 permissions, and the other just 31. This is a clear sign of excessive privilege, suggesting their access could be significantly reduced without impacting their actual usage.



Too long, didn't read

Granting access in BigQuery isn’t just a technical task—it involves meeting legal responsibilities like those required by GDPR, which mandates limiting and justifying who accesses data and why.


GCP allows you to manage access at different levels (project, dataset, table, column, or row) using a variety of roles. To do it right, it's essential to apply the principle of least privilege: give only the necessary access, to the specific resource, with the minimal required power.


Every IAM permission has three components: who you grant it to (principal), what they can do (role), and on what (resource). You can also add specific conditions or use tools like RLS, CLS, and authorized views for more granular control.


Finally, GCP enables access auditing via logs exportable to BigQuery, making it easier to track activity and support a secure, well-governed, and compliant access strategy.



Thanks for reading!

And don’t forget—you can “like” this post without logging in.

Big thanks to Carlos Clement for suggesting the idea of using a sink to audit access in BigQuery.

Hi, I'm Daniela. For many years, I was a mathematics teacher at schools and universities, and I also edited and wrote educational books. Today, I work as a Data Science Engineer and share content about mathematics, statistics, and data science on this site and social media. Creating and sharing knowledge is my way of staying connected to teaching. I strongly believe in two ideas that have always guided me: "acquiring new knowledge keeps you relevant" and "sharing is living."



