Secrets are what computers use to authenticate to other computers. For instance, an application sending a connection string to a database is its way of saying, “I am this specific web app; please let me query your database.” When the connection succeeds, that’s the database’s way of saying, “Sure thing!” Computers don’t have eyes, ears, or brains, so they can’t ‘recognize’ someone the way humans can; they have to use secrets.
A secret can be a password, an API secret, a certificate, a hash, a connection string, etc. Most importantly, secrets should not be shared, and they should only be saved in your secret management tool. But I am getting ahead of myself.
This is a talk I gave in April 2023 at BSides San Francisco, “Hunting Secrets”. Similar topic!
When we save secrets into our code, another programmer can come along and use that secret, for better or for worse. They can log into your database, connect to your API, or do anything else the secret allows. Sometimes this can seem quite helpful; for instance, when I was a programmer, if a client forgot their password, I used to log into the database, grab a copy of their password, run it through our decryption tool, and read it to them over the phone. My whole team used to do it. Now I know that it’s more secure to have the user receive a password-reset link in their email (to validate they are who they say they are), that the client’s password should have been salted and hashed (a one-way cryptographic method), and that the password to the database should have been kept in a secret management tool (making it unretrievable by human beings). Secrets in our code allow for all sorts of potential attacks, breaches, and embarrassments.
If you want to find out whether you have secrets in your code, you can use a tool called a secret scanner. There are many on the market, and many of them are free. They use a variety of techniques to find secrets, but most commonly they use regexes (regular expressions) to look for high-entropy strings (extremely long and random-looking runs of characters) and keywords (password, secret, key, etc.).
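To make that concrete, here is a toy sketch of keyword-plus-entropy matching. This is my own illustration, not how any particular product works; the patterns and the entropy threshold are invented for demonstration:

```python
import math
import re

# Toy scanner: flag a line only when a suspicious keyword is paired
# with a long, high-entropy quoted string. Illustrative thresholds only.
KEYWORD_RE = re.compile(r"(password|passwd|secret|api[_-]?key|token)\s*[:=]", re.IGNORECASE)
CANDIDATE_RE = re.compile(r"[\"']([A-Za-z0-9+/=_\-]{20,})[\"']")

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character; random strings score high."""
    if not s:
        return 0.0
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def looks_like_secret(line: str) -> bool:
    if not KEYWORD_RE.search(line):
        return False
    # Flag only if the quoted value is both long AND random-looking.
    return any(shannon_entropy(m) > 4.0 for m in CANDIDATE_RE.findall(line))
```

Real scanners (gitleaks, TruffleHog, and friends) layer hundreds of vendor-specific patterns and verification checks on top of this basic idea, but entropy plus keywords is the heart of it.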
When I work somewhere doing AppSec, I try to get read-only access to the code repositories as soon as possible (for many reasons, not just this). Once I have it, I download all the code from all the projects I can in a zip. I unzip it, point my secret scanner at it, and then settle in for a few hours to go hunting around in the code. Putting on music and getting a tasty warm beverage (hot chocolate, anyone?) can make this a more enjoyable activity. It’s not exactly riveting.
Start by looking at the first finding. Sometimes, it’s something really obviously bad, such as:
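Picture something like this (an entirely made-up example, but painfully close to real findings):

```python
# Hardcoded connection string: hostname, database, username, and
# password, sitting in plain text in the repo for anyone to read.
CONNECTION_STRING = (
    "Server=prod-db.example.com;Database=customers;"
    "User Id=admin;Password=SuperSecret123!;"
)
```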
That’s a secret for sure! The next step is to rotate that secret. Rotating this secret means changing the password to something new on the system it is used for. Then you check that new secret into your secret management tool (more on this soon), and then (the hard part) you update the code in this application to fetch the secret from your secret management tool instead, and publish the updated code. Do not, under any circumstance, reuse the value you found. That secret has been ‘spoiled’, ‘spilled’, or ‘spilt’. It is no longer usable, as someone malicious might have it saved somewhere or already be actively using it.
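What the fixed code looks like depends on your secret management tool. As a hedged sketch, assume the rotated secret is injected into the process environment at runtime (a common hand-off pattern between secret managers and applications) under a variable name I made up, `DB_PASSWORD`:

```python
import os

def get_db_password() -> str:
    # The secret never appears in the source; it arrives at runtime from
    # the secret manager (via the deployment pipeline, an agent, etc.).
    password = os.environ.get("DB_PASSWORD")
    if password is None:
        raise RuntimeError("DB_PASSWORD was not injected; check your secret manager configuration")
    return password
```

Many tools also offer an SDK so the application can read the vault directly instead of going through the environment; either way, the point is the same: the value lives in the vault, not in the repo.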
You are going to need to follow this process for every secret you find. Sometimes it means regenerating a certificate, creating a new API key, etc. It’s a bit of a pain, but it’s a lot better than having a data breach or another type of security incident to deal with.
Special note: when you find a secret in the code, depending on what you found, you may want to trigger the incident response (IR) process to investigate whether the secret has been used improperly. When you find a secret, you can't know if you were the first, second, or tenth person to find it. Kicking off your IR process is a real-life application of the 'assume breach' secure design concept.
Preventing Secrets in the Code
Code repositories (also known as version control, or a ‘repo’) support several types of ‘events’ that can be used to trigger automation. When someone merges their code back into the main branch, you can automatically run tests to verify it integrates nicely. When code is checked in, the repo can prompt someone else to review the changes before they are merged into all the other code. The event we are interested in is the ‘pre-commit hook’, which fires on the developer’s machine just before a commit is created.
The moment someone checks in code that contains a secret, they have spilled it. The secret will be in the history, in backups, and maybe even in the logs. You must rotate it. Even if you realize your mistake only 5 minutes later, the damage is done.
A pre-commit hook allows you to run your secret scanning tool on only the new or changed code you are checking in; if the tool finds a secret, it stops the check-in. It gives the user an error message explaining that it thinks it has found a secret and blocks the code from being checked in. This means the secret has not been spilled, and no secret rotation is required! If your code does not have a secret in it, your check-in continues, and any other events you set up do their thing. The scan takes so little time that it is almost unnoticeable to the end user.
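Here is a minimal sketch of such a hook in Python. It is my own toy version (in practice you would wire a real scanner like gitleaks into the hook), and the regex is the same invented keyword-plus-quoted-value pattern, not anything standard:

```python
import re
import subprocess

# Toy pattern: a suspicious keyword, then a quoted value of 8+ characters.
SECRET_RE = re.compile(
    r"(password|secret|api[_-]?key|token)\s*[:=]\s*[\"'][^\"']{8,}[\"']",
    re.IGNORECASE,
)

def scan_staged_diff(diff_text: str) -> list:
    """Return the added lines (prefix '+') that look like secrets."""
    return [line for line in diff_text.splitlines()
            if line.startswith("+") and SECRET_RE.search(line)]

def main() -> int:
    # Scan only what is about to be committed: the staged diff.
    diff = subprocess.run(["git", "diff", "--cached"],
                          capture_output=True, text=True).stdout
    hits = scan_staged_diff(diff)
    for hit in hits:
        print("Possible secret found, commit blocked:", hit)
    return 1 if hits else 0

# Saved as .git/hooks/pre-commit (and marked executable), Git runs this
# before every commit; a non-zero exit status aborts the check-in:
#
#     import sys
#     sys.exit(main())
```

The key design point is scanning only the staged diff rather than the whole repo: that keeps the hook fast enough that developers barely notice it.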
Secret management tools did not exist when I started programming. In fact, they are somewhat ‘new on the scene’ and not yet widely adopted. Secret management tools manage secrets for machines; they are not password managers, which manage secrets for humans. They are still fantastic, though!
When using secret management tools, we generally create a new vault (an instance of encrypted secrets) per system (the application those secrets belong to). We do this so that if one vault is compromised somehow (perhaps lost or corrupted), only one system is harmed. We also do this to ensure each vault is accessible to the system it supports; you wouldn’t want to open a hundred holes in your firewall so that every system could reach one central vault.
When we check a secret into a secret management tool, we say goodbye to it forever. We do not keep a copy elsewhere, because we can trust the secret management tool to keep it safe for us. It’s encrypted in the vault and retrieved only programmatically (humans cannot ‘reveal’ the secret in plaintext). Your CI/CD pipeline, your application, your APIs, and so on can all retrieve it. This means your secrets are managed in an automated way, leaving very little room for human error. Trust me, it’s a good deal!
As you work through finding all the secrets, take note of false positives so you can suppress them in the future. An example I ran into myself: there was a license key for a mail merge program, but the company that made the program had gone out of business years ago. This meant that using the key all over the place didn’t break any licensing agreement, and there was no need to protect it, because it could be used as many times as they liked. It wasn’t really a secret anymore. We suppressed that license key from then on.
You should create rules to suppress these false positives, as weird situations like the one mentioned above will become annoying over time.
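A suppression rule can be as simple as an allowlist of patterns checked before a finding is reported. Most scanners have their own config format for this; here is a toy sketch of the idea, where the license-key format is invented, standing in for the defunct vendor's key from the story above:

```python
import re

# Allowlist of confirmed non-secrets. Findings matching any of these
# patterns are dropped instead of reported.
ALLOWLIST = [
    re.compile(r"MAILMERGE-LICENSE-[A-Z0-9]{16}"),          # made-up key format
    re.compile(r"EXAMPLE|PLACEHOLDER|CHANGEME", re.IGNORECASE),  # obvious dummies
]

def is_suppressed(finding: str) -> bool:
    """True if a raw finding matches a known false-positive pattern."""
    return any(pattern.search(finding) for pattern in ALLOWLIST)
```

Keep the allowlist tight and specific: a pattern that is too broad will quietly suppress real secrets, which defeats the whole exercise.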
If you work at an organization with a lot of technical debt, cleaning up all of your secrets can take quite a lot of time. That said, if you have an intern, co-op student, or junior application security person on your team, this task is ideal for them. It’s lots of work, easy to do, and looks good on a resume. It also greatly reduces your organization’s risk, which is always a big win.
Semgrep is a fast, open-source code scanning tool for finding bugs, detecting dependency vulnerabilities, and enforcing code standards.