GCP - Dataflow Enum
Tip
Learn & practice AWS Hacking: HackTricks Training AWS Red Team Expert (ARTE)
Learn & practice GCP Hacking: HackTricks Training GCP Red Team Expert (GRTE)
Learn & practice Az Hacking: HackTricks Training Azure Red Team Expert (AzRTE)
Support HackTricks
- Check the subscription plans!
- Join the 💬 Discord group or the telegram group or follow us on Twitter 🐦 @hacktricks_live.
- Share hacking tricks by submitting PRs to the HackTricks and HackTricks Cloud github repos.
Basic Information
Google Cloud Dataflow is a fully managed service for batch and streaming data processing. It enables organizations to build pipelines that transform and analyze data at scale, integrating with Cloud Storage, BigQuery, Pub/Sub, and Bigtable. Dataflow pipelines run on worker VMs in your project; templates and User-Defined Functions (UDFs) are often stored in GCS buckets.
Components
A Dataflow pipeline typically includes:
Template: YAML or JSON definitions (and Python/Java code for flex templates) stored in GCS that define the pipeline structure and steps.
Launcher (Flex Templates): A short-lived Compute Engine instance may be used for Flex Template launches to validate the template and prepare containers before the job runs.
Workers: Compute Engine VMs that execute the actual data processing tasks, pulling UDFs and instructions from the template.
Staging/Temp buckets: GCS buckets that store temporary pipeline data, job artifacts, UDF files, and flex template metadata (.json).
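To illustrate why these buckets matter, this is roughly what a flex template metadata file stored in GCS looks like (the project, image and parameter names here are hypothetical). The `image` field points at the container the launcher will run, which is why template buckets are interesting enumeration targets:

```json
{
  "image": "gcr.io/my-proj/my-flex-pipeline:latest",
  "sdkInfo": { "language": "PYTHON" },
  "metadata": {
    "name": "my-flex-pipeline",
    "parameters": [
      { "name": "input", "label": "Input path", "helpText": "GCS path to read" }
    ]
  }
}
```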
Batch vs Streaming Jobs
Dataflow supports two execution modes:
Batch jobs: Process a fixed, bounded dataset (e.g. a log file, a table export). The job runs once to completion and then terminates. Workers are created for the duration of the job and shut down when done. Batch jobs are typically used for ETL, historical analysis, or scheduled data migrations.
Streaming jobs: Process unbounded, continuously arriving data (e.g. Pub/Sub messages, live sensor feeds). The job runs until explicitly stopped. Workers may scale up and down; new workers can be spawned due to autoscaling, and they will pull pipeline components (templates, UDFs) from GCS at startup.
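The two modes are easy to tell apart in enumeration output. As a sketch, assuming you saved `gcloud dataflow jobs list --format=json` output (the job IDs and names below are hypothetical sample data):

```python
import json

# Hypothetical sample of `gcloud dataflow jobs list --format=json` output;
# real output contains more fields (location, creationTime, ...).
sample = '''
[
  {"id": "2024-01-01_00_00_00-111", "name": "nightly-etl", "type": "Batch", "state": "Done"},
  {"id": "2024-01-01_00_00_00-222", "name": "pubsub-ingest", "type": "Streaming", "state": "Running"}
]
'''

jobs = json.loads(sample)
# Streaming jobs keep running (and keep pulling templates/UDFs from GCS when
# autoscaling spawns new workers), so they are the long-lived targets.
streaming = [j["name"] for j in jobs if j.get("type") == "Streaming"]
batch = [j["name"] for j in jobs if j.get("type") == "Batch"]
print("streaming:", streaming)
print("batch:", batch)
```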
Enumeration
Dataflow jobs and related resources can be enumerated to gather service accounts, template paths, staging buckets, and UDF locations.
Job Enumeration
To enumerate Dataflow jobs and retrieve their details:
# List Dataflow jobs in the project
gcloud dataflow jobs list
# List Dataflow jobs (by region)
gcloud dataflow jobs list --region=<region>
# Describe job (includes service account, template GCS path, staging location, parameters)
gcloud dataflow jobs describe <job-id> --region=<region>
Job descriptions reveal the template GCS path, staging location, and worker service account, all useful for identifying the buckets that store pipeline components.
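The exact field layout of a job description varies by SDK and launch method, so a robust approach is to scan the raw JSON for service-account emails and `gs://` URIs rather than hardcoding paths. A minimal sketch (the project, account, and bucket names are hypothetical sample data standing in for `gcloud dataflow jobs describe ... --format=json` output):

```python
import json
import re

# Hypothetical fragment of a `gcloud dataflow jobs describe` JSON response.
sample = json.dumps({
    "environment": {
        "serviceAccountEmail": "df-worker@my-proj.iam.gserviceaccount.com",
        "tempStoragePrefix": "gs://my-proj-df-temp/tmp",
    },
    "pipelineDescription": {
        "displayData": [
            {"key": "templateLocation",
             "strValue": "gs://my-proj-df-templates/flex/pipeline.json"}
        ]
    },
})

# Pull out worker service accounts and every referenced GCS bucket.
emails = sorted(set(re.findall(r"[\w.-]+@[\w.-]+\.iam\.gserviceaccount\.com", sample)))
buckets = sorted({uri.split("/")[2] for uri in re.findall(r"gs://[^\s\"']+", sample)})
print("service accounts:", emails)
print("buckets to inspect:", buckets)
```

The same scan works on any saved describe output, regardless of whether the job was launched from a classic or flex template.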
Template and Bucket Enumeration
Buckets referenced in job descriptions may contain flex templates, UDFs, or YAML pipeline definitions:
# List objects in a bucket (look for .json flex templates, .py UDFs, .yaml pipeline defs)
gcloud storage ls gs://<bucket>/
# List objects recursively
gcloud storage ls gs://<bucket>/**
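A recursive listing can be large, so it helps to triage it for the extensions that define what workers actually execute. A sketch over a saved listing (the bucket and object names below are hypothetical):

```python
# Hypothetical saved output of `gcloud storage ls gs://<bucket>/**`.
sample_listing = """
gs://my-proj-df-templates/flex/pipeline.json
gs://my-proj-df-templates/udf/transform.js
gs://my-proj-df-templates/udf/enrich.py
gs://my-proj-df-temp/staging/beam-sdk.jar
gs://my-proj-df-templates/defs/pipeline.yaml
""".split()

# Flex template metadata (.json), UDFs (.js/.py), and YAML pipeline
# definitions are the objects worth downloading and inspecting.
INTERESTING = (".json", ".py", ".js", ".yaml", ".yml")
hits = [obj for obj in sample_listing if obj.endswith(INTERESTING)]
for obj in hits:
    print(obj)
```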
Privilege Escalation
Post Exploitation
GCP - Dataflow Post Exploitation
Persistence