If one word (well, actually three) could sum up the year 2021 in terms of information security, it would be “supply chain attack.”
In a software supply chain attack, hackers tamper with the code of third-party software components in order to compromise the ‘downstream’ applications that use them. 2021 saw a substantial increase in such attacks: high-profile security incidents like the SolarWinds, Kaseya, and Codecov breaches have shaken companies’ confidence in the security practices of third-party service providers.
You might wonder what this has to do with secrets. In a nutshell, a lot. Take, for example, the Codecov case (which we’ll return to shortly): it’s a textbook illustration of how hackers use hardcoded credentials to obtain initial access to their victims’ systems and then harvest more secrets further down the chain.
Despite being a prime target in hackers’ playbooks, secrets in source code remain one of the most neglected vulnerabilities in application security. In this post, we’ll discuss secrets and why keeping them out of source code is now the most important step in securing the software development lifecycle.
What exactly is a secret?
Secrets are digital authentication credentials used in applications, services, and infrastructures (API keys, certificates, tokens, and so on). A secret authenticates systems to facilitate interoperability, similar to how a password (plus a device in the case of 2FA) authenticates a person. But there’s a catch: secrets, unlike passwords, are supposed to be shared.
To ship new features on a regular basis, software engineering teams must integrate more and more building blocks. The number of credentials in use across teams (development squads, SRE, DevOps, security, etc.) is growing rapidly. To make the code easier to change, developers sometimes keep keys in an insecure location, but this frequently results in the keys being forgotten and inadvertently exposed.
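One common way to avoid keeping a key in the code itself is to read it from the process environment at startup. The snippet below is a minimal sketch of that idea, not a complete secrets-management solution; the names load_secret and MissingSecretError are hypothetical and chosen only for illustration.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def load_secret(name: str) -> str:
    """Return the secret stored in the environment variable `name`.

    Failing loudly is deliberate: a missing secret should stop the program
    rather than tempt anyone to paste a fallback value into the code.
    """
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(f"environment variable {name} is not set")
    return value
```

A caller would write something like api_key = load_secret("PAYMENT_API_KEY"), where PAYMENT_API_KEY is again a made-up variable name. The value then lives in the deployment environment or a secrets manager instead of in a file tracked by version control.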
Hardcoded secrets are a unique kind of vulnerability in the application security landscape. First, because source code is a very leaky asset that is constantly copied, checked out, and forked onto many machines, the secrets it contains leak too. But, more importantly, don’t forget that code has a memory of its own.
Any codebase is managed with a version control system (VCS), which keeps a history of every change made to it over time, sometimes spanning decades. The concern is that still-valid secrets might be buried anywhere on this timeline, giving the attack surface a new dimension. Unfortunately, most security analyses are performed only on a codebase’s current, ready-to-deploy state. In other words, they are completely blind to credentials living in an old commit or even a never-deployed branch.
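To make the "memory" point concrete, here is a rough sketch of how the whole history of a Git repository, rather than only its latest state, could be searched for one credential pattern. It assumes the git command-line tool is installed; the function names are hypothetical, and the single regular expression (the well-known shape of an AWS access key ID) is only an illustration of what a real scanner would look for.

```python
import re
import subprocess
from typing import Iterator, List, Tuple

# Illustrative pattern only: the documented shape of an AWS access key ID.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_in_patches(log_text: str, pattern: re.Pattern) -> Iterator[Tuple[str, str]]:
    """Yield (commit_hash, added_line) for every added line matching `pattern`.

    `log_text` is expected to be the output of `git log -p`.
    """
    commit = "unknown"
    for line in log_text.splitlines():
        if line.startswith("commit "):
            # Header line of a new commit: remember its hash.
            commit = line.split()[1]
        elif line.startswith("+") and not line.startswith("+++"):
            # A line added by this commit (skipping the "+++ b/file" header).
            if pattern.search(line):
                yield commit, line[1:].strip()


def scan_repository(path: str, pattern: re.Pattern = AWS_KEY_ID) -> List[Tuple[str, str]]:
    """Scan the patches of every commit reachable from any ref in `path`."""
    log_text = subprocess.run(
        ["git", "-C", path, "log", "-p", "--all"],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout
    return list(find_in_patches(log_text, pattern))
```

Because the scan reads the patch of each commit, a key that was added and later deleted is still reported, along with the commit that introduced it. The --all flag extends the search to branches that were never merged or deployed.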
Six million secrets pushed to GitHub
Last year, by monitoring commits pushed to GitHub in real time, GitGuardian detected more than 6 million exposed secrets, more than double the figure from 2020. A credential was found in three out of every 1,000 commits, up 50% from the previous year.
Among the most significant of these secrets were those granting access to company resources. It’s no surprise that an attacker trying to gain a foothold in a corporate system would look first at its public GitHub repositories, and then at those owned by its employees. Many developers use GitHub for personal projects, and company credentials can be leaked there by accident (yep, it happens all the time!).
When attackers use real company credentials, they operate as authorized users, which makes the misuse difficult to detect. A credential can be compromised a mere 4 seconds after being pushed to GitHub, so it should be revoked and rotated immediately to neutralize the risk of a breach. Out of embarrassment or for lack of technical expertise, people often take the wrong path out of this situation, which is understandable.
Another serious mistake companies make is to tolerate secrets in non-public repositories. GitGuardian’s State of Secrets Sprawl report highlights that private repositories hide far more secrets than their public counterparts. The hypothesis is that private repositories give their owners a false sense of security, making them less concerned about secrets lurking in the codebase.
That overlooks the fact that these forgotten secrets could someday have a devastating impact if harvested by hackers.
To be fair, application security teams are well aware of the problem. But the amount of work needed to investigate, revoke, and rotate the secrets committed every week, or to dig through years of uncharted history, is simply overwhelming.
Headline breaches… and the rest
There is, nevertheless, a sense of urgency. Hackers routinely search GitHub with “dorks”: search queries built around easily recognizable patterns that reveal leaked secrets. And GitHub isn’t the only place where they are active; any registry (such as Docker Hub) or source code leak can become a goldmine for finding attack vectors.
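To give a sense of how recognizable these patterns are, the sketch below flags lines that look like three widely known credential formats. The detector names and the function find_candidates are hypothetical, the list is far from exhaustive, and a real scanner would also validate candidates to cut down on false positives.

```python
import re
from typing import List, Tuple

# Illustrative, non-exhaustive detectors keyed by a descriptive name.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_personal_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_candidates(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, detector_name) for each line matching a detector."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits
```

The same expressions that help an attacker find leaked keys can be run by defenders over their own files, images, and logs before anything is published.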
You only need to look at recently disclosed breaches for proof. Codecov is a code coverage tool that is popular among open-source projects. It was compromised last year by attackers who gained access by extracting a static cloud account credential from its official Docker image. Once they had access to the official source code repository, they tampered with a CI script and harvested hundreds of secrets from Codecov’s user base.
Twitch’s entire codebase was recently leaked, exposing over 6,000 Git repositories and 3 million documents. Despite plenty of evidence of a high degree of AppSec maturity, over 7,000 secrets could be found in it, including hundreds of AWS, Google, Stripe, and GitHub keys. Only a handful of them would be enough to launch a full-scale attack on the company’s most critical systems. This time no customer data was leaked, but that was mainly down to luck.
Uber was not so fortunate a few years ago. An employee accidentally published company code to his own public GitHub repository. Hackers found in it keys to a cloud service provider that granted access to Uber’s infrastructure. The result was a massive breach.
The bottom line is that you can’t predict when a secret will be exploited, but you should be aware that malicious actors are watching your developers and hunting for your code. Remember, too, that these incidents are only the tip of the iceberg, and that there are likely many more breaches involving secrets that are never publicly reported.
Conclusion
Secrets are an essential part of any software stack, and because they are so powerful, they must be well protected. Their distributed nature and modern software development practices make it difficult to keep track of where they end up, whether in source code, production logs, Docker images, or instant messaging apps. Because even a single secret can be exploited in an attack leading to a significant breach, a secret detection and remediation capability is a necessity. Such incidents occur on a weekly basis, and as companies adopt more services and infrastructure, the number of leaks is growing rapidly. The sooner you take action, the easier it will be to protect source code against future threats.
Note: This post was authored by Thomas Segura, a GitGuardian technical content writer. Thomas has worked for a number of large French organizations as an analyst and software engineering consultant.