From a Misconfigured Apache Airflow Instance to AWS Account Compromise

Avinash Jain (@logicbomb)
Feb 2, 2022

It’s been a long time since I last shared my findings with the security community, and I think this write-up is worth sharing. In short, during recon I found an Apache Airflow instance running an older version with a security misconfiguration. I exploited it to bypass authentication, then escalated to sensitive pages and functionality, which exposed credentials that gave me access to the organization’s internal tooling and cloud platform.

Apache Airflow is an open-source tool to programmatically author, schedule, and monitor workflows. It is one of the most robust platforms data engineers use to orchestrate pipelines: it lets them visualize pipeline dependencies, progress, logs, and code, trigger tasks, and check success status. Notably, Apache Airflow is the #1 starred open-source workflow application on GitHub.

Recon

The misconfiguration, tracked as CVE-2020-17526, has already been exploited in the wild, and because Airflow is such a popular open-source tool, the misconfiguration is widespread. Since the CVE is relatively recent, there is a good chance of finding unpatched older versions on the internet. My goal was simply to find an Apache Airflow instance running a vulnerable 1.x.x version. I started by enumerating a list of domains/targets and gathering their subdomains to check whether Apache Airflow was running on any of them. Subfinder and a quickly written script came to the rescue here. I was fairly sure that dozens of these instances would still be in use across organizations, and that is exactly what happened.
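The script itself is not shown in this post. As a rough sketch of what the fingerprinting step could look like, the functions below flag hosts whose fetched homepage HTML mentions "Airflow" in its title. The title heuristic, the function names, and the example hostnames are all assumptions for illustration; fetching the pages is left out.

```python
from __future__ import annotations

import re

# Hypothetical heuristic: the Airflow web UI's <title> mentions "Airflow".
# This is an assumption for illustration, not an official detection signature.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def looks_like_airflow(html: str) -> bool:
    """Return True if the page's <title> contains 'airflow' (case-insensitive)."""
    match = TITLE_RE.search(html)
    return bool(match and "airflow" in match.group(1).lower())


def filter_airflow_hosts(pages: dict[str, str]) -> list[str]:
    """Given {hostname: fetched_html}, return the sorted hostnames that look like Airflow."""
    return sorted(host for host, html in pages.items() if looks_like_airflow(html))


if __name__ == "__main__":
    sample = {
        "airflow.example.com": "<html><head><title>Airflow - DAGs</title></head></html>",
        "www.example.com": "<html><head><title>Welcome</title></head></html>",
    }
    print(filter_airflow_hosts(sample))
```

A title match is only a hint; each candidate still needs a version check before it can be called vulnerable.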

My hypothesis grew stronger when I ran a quick Shodan search to see how many instances are exposed on the internet and vulnerable to CVE-2020-17526. On top of that, older versions of Apache Airflow do not enable authentication by default. A simple search revealed more than 300 Airflow instances publicly exposed on the internet without any authentication.

I ran my script to find how many of them were on an older version, and the count came out as high as 75. Publicly exposed, misconfigured instances that allow internet-wide access are ideal candidates for exploitation by attackers, so it is no surprise that CVE-2020-17526 has been in the news so much.
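The version-checking script is not reproduced here either. A minimal sketch of the comparison step, assuming the affected range given in the public CVE-2020-17526 description (webserver versions prior to 1.10.14 running with the default `[webserver] secret_key`), could look like this. How the version string is collected from each instance is left out, and the names and inventory are hypothetical.

```python
from __future__ import annotations

import re

# Assumption (from the public CVE-2020-17526 description): webserver versions
# prior to 1.10.14 are affected when [webserver] secret_key is left at its default.
FIRST_FIXED = (1, 10, 14)


def parse_version(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release, e.g. 'v1.10.14rc1' -> (1, 10, 14)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip().lstrip("v"))
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_potentially_vulnerable(version: str) -> bool:
    """True if the release sorts below the first fixed version (tuple comparison)."""
    return parse_version(version) < FIRST_FIXED


if __name__ == "__main__":
    # Hypothetical inventory: hostname -> version string collected from each instance.
    inventory = {"etl.example.com": "1.10.9", "ml.example.com": "2.2.3"}
    for host, version in sorted(inventory.items()):
        print(host, version, is_potentially_vulnerable(version))
```

A version below 1.10.14 only marks an instance as potentially vulnerable: per the same CVE description, instances that changed the default `secret_key` are not affected.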

Exploiting CVE-2020-17526

Coming back to the finding: once I had discovered a bunch of Airflow instances, the next step was to test them for CVE-2020-17526. A bit of background first:

Airflow’s web interface uses Flask’s stateless, signed cookies to store authentication data. Because the session is stateless, the server keeps no record of what the cookie should contain; it trusts any cookie whose signature verifies, including the user_id attribute in the session JSON, which identifies the logged-in user. By default, Airflow signs the session cookie with the hardcoded key temporary_key. If this key is not changed, flask-unsign can crack the cookie’s signing key, after which the session JSON can be modified to impersonate another user, such as an admin, and re-signed with temporary_key.
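To make the point about signed cookies concrete, here is a simplified model: the server serializes the session, attaches an HMAC tag, and later accepts any cookie whose tag verifies, keeping no state of its own. It uses HMAC-SHA256 over base64url-encoded JSON, which is an assumption for illustration; Flask’s real cookies go through itsdangerous with a different serialization, a timestamp, and key derivation, so this sketch does not produce cookies Airflow would accept. What it shows is why the scheme is exactly as strong as its key: a cookie signed with the same key verifies, and rotating the key invalidates it. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Simplified model of a stateless signed-cookie session (NOT Flask's exact format).


def _b64encode(data: bytes) -> str:
    """URL-safe base64 without '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64decode(text: str) -> bytes:
    """Inverse of _b64encode: restore the padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(session: dict, key: str) -> str:
    """Serialize the session and append an HMAC-SHA256 tag over the payload."""
    payload = _b64encode(json.dumps(session, sort_keys=True).encode("utf-8"))
    tag = hmac.new(key.encode("utf-8"), payload.encode("utf-8"), hashlib.sha256).digest()
    return f"{payload}.{_b64encode(tag)}"


def verify(cookie: str, key: str) -> Optional[dict]:
    """Return the session if the tag matches the key, else None.

    The server keeps no session state: a valid tag is the only check.
    """
    payload, _, tag = cookie.rpartition(".")
    expected = hmac.new(key.encode("utf-8"), payload.encode("utf-8"), hashlib.sha256).digest()
    try:
        presented = _b64decode(tag)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    if not hmac.compare_digest(expected, presented):
        return None
    return json.loads(_b64decode(payload))


if __name__ == "__main__":
    default_key = "temporary_key"  # the default signing key discussed above
    cookie = sign({"user_id": "2"}, default_key)
    print(verify(cookie, default_key))                # {'user_id': '2'}
    print(verify(cookie, "some-rotated-random-key"))  # None
```

`hmac.compare_digest` is used for the comparison so the check runs in constant time rather than short-circuiting on the first differing byte.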

And this is what I did: I decoded the session cookie, forged the user_id attribute (which designates the user ID you are logged in as), tried 1 for admin, and re-signed it.
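Because the session cookie is signed rather than encrypted, anyone holding it can read its contents. The sketch below decodes the payload without any key, assuming the itsdangerous cookie layout Flask uses: an optional leading "." marking zlib compression, then payload, timestamp, and signature separated by dots. It is read-only: it verifies no signature, and values of types that Flask’s tagged JSON serializer wraps (bytes, tuples and the like) come back in that tagged form. The function name is hypothetical.

```python
import base64
import json
import zlib


def decode_flask_session(cookie: str) -> dict:
    """Read the payload of an itsdangerous-style Flask session cookie.

    Assumed layout: [optional '.'] payload '.' timestamp '.' signature,
    where a leading '.' means the payload was zlib-compressed.
    No key is used and the signature is NOT verified.
    """
    compressed = cookie.startswith(".")
    if compressed:
        cookie = cookie[1:]
    payload = cookie.split(".", 1)[0]
    # The payload is URL-safe base64 with the '=' padding stripped.
    data = base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4))
    if compressed:
        data = zlib.decompress(data)
    return json.loads(data)
```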

Next, I replaced the session cookie in the browser and navigated to the home page. I found myself successfully logged in to the tool as admin.

Privilege Escalation

Now there was a goldmine in front of me. I went through the code of each DAG looking for the most common developer mistake, i.e. hardcoded plaintext credentials, and found a Slack token belonging to one of the users.
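Hardcoded secrets like this Slack token are straightforward to catch before they land in a DAG folder. As a minimal sketch, the scanner below walks .py files and flags lines matching two well-known token shapes: Slack tokens (an xox prefix, a letter, and a hyphen) and AWS access key IDs (AKIA or ASIA followed by 16 uppercase letters or digits). The patterns are illustrative rather than exhaustive, and the function names are hypothetical; dedicated secret scanners cover many more formats.

```python
from __future__ import annotations

import re
from pathlib import Path

# Illustrative token shapes, not an exhaustive list.
SECRET_PATTERNS = {
    "slack_token": re.compile(r"\bxox[a-z]-[0-9A-Za-z-]{10,}"),
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each line matching a secret pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def scan_dag_folder(folder: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under `folder` (e.g. a DAGs folder) and collect hits per file."""
    results = {}
    for path in Path(folder).rglob("*.py"):
        hits = scan_text(path.read_text(errors="ignore"))
        if hits:
            results[str(path)] = hits
    return results
```

Running `scan_dag_folder` in CI against the DAG repository would surface such tokens before deployment; the flagged secrets belong in a secrets backend or Airflow connection instead.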

I used SlackPirate to extract the sensitive information the token had access to.

Exploring the other options in the Airflow instance, I also found:

  1. AWS keys hardcoded in the Connections tab.
  2. The airflow.cfg configuration left wide open, exposing the Postgres connection string.
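Both findings can be checked against an airflow.cfg file. The sketch below flags an enabled `expose_config` setting and a metadata-database URI with an embedded password. It assumes `expose_config` lives under [webserver] and controls whether the UI shows the configuration, and that `sql_alchemy_conn` sits under [core] (Airflow 1.x) or [database] (newer 2.x releases); the function name and finding strings are hypothetical.

```python
from __future__ import annotations

import configparser
from urllib.parse import urlsplit


def audit_airflow_cfg(cfg_text: str) -> list[str]:
    """Flag an exposed config page and a plaintext DB password in airflow.cfg text.

    Assumptions: `expose_config` is under [webserver]; `sql_alchemy_conn` is
    under [core] (Airflow 1.x) or [database] (newer 2.x releases).
    """
    # interpolation=None: airflow.cfg contains '%' in log format strings.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    findings = []

    expose = parser.get("webserver", "expose_config", fallback="False")
    if expose.strip().lower() in ("true", "1", "yes", "on"):
        findings.append("webserver.expose_config is enabled")

    for section in ("core", "database"):
        uri = parser.get(section, "sql_alchemy_conn", fallback="")
        # urlsplit exposes the password part of user:password@host, if any.
        if uri and urlsplit(uri).password:
            findings.append(f"{section}.sql_alchemy_conn embeds a plaintext password")
    return findings
```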

Logging in with the AWS credentials gave me access to their AWS account, albeit with limited permissions. This is a scary example of how a CVE exploited in the wild, chained with bad security practices and misconfigurations, can lead to multiple exposures.

Remediation

It is strongly recommended to upgrade your Airflow instances to the latest version, or at least to change the default value of the `[webserver] secret_key` config, to mitigate the attack. Ref: https://lists.apache.org/thread/rxn1y1f9fco3w983vk80ps6l32rzm6t0
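A quick way to act on the second option is to check whether `[webserver] secret_key` is still the default and, if so, generate a random replacement. This is a minimal sketch: it treats a missing value conservatively as unsafe (an assumption), and the function names are hypothetical. `secrets.token_urlsafe` is from Python’s standard library.

```python
import configparser
import secrets

# The default signing key discussed above.
DEFAULT_SECRET_KEY = "temporary_key"


def uses_default_secret_key(cfg_text: str) -> bool:
    """True if [webserver] secret_key is missing (treated as unsafe) or still the default."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    value = parser.get("webserver", "secret_key", fallback="").strip()
    return value in ("", DEFAULT_SECRET_KEY)


def new_secret_key() -> str:
    """Generate a random, URL-safe value for [webserver] secret_key (32 random bytes)."""
    return secrets.token_urlsafe(32)


if __name__ == "__main__":
    print(new_secret_key())
```

After putting the new value into airflow.cfg, restart the webserver so it picks up the key. Sessions signed with the old key stop verifying, so users will need to log in again.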

Conclusion: Lessons for Organizations

I hope this blog also brings much-needed attention to a few practices for organizations deploying any tool or service: regularly harden it, review it for misconfigurations, and secure its default configs. Keep track of your assets by continuously discovering everything you expose publicly, beyond just IPs and subdomains, and stay vigilant about the security attacks happening around you.

That’s it about this finding. Thanks for reading!

Security Engineer @Microsoft | DevSecOps | Speaker | Breaking stuff to learn | Featured in Forbes, BBC | Acknowledged by Google, NASA, Yahoo, UN, etc.