Software supply chain security is hard

tl;dr Software Composition Analysis (SCA) tools are often noisy and create cross-team friction, yet we need them to manage the ever-increasing security exposure from developing applications with open source libraries. This post explores how reachability analysis can sharpen such tooling and make dependency scanning more impactful.

The software supply chain is flooded with vulnerabilities

I’ve been super lucky as a developer in this day and age. The open source community is huge, and there are powerful libraries for just about anything. Rather than spending hours digging into Qt or GTK to build my own graphs, I can spend less than 5 seconds importing matplotlib.pyplot and calling plot() – no need to reinvent the wheel!

But convenience has a catch: we have to trust code written by thousands of developers we’ve never met. In most cases this trust is well founded, but incidents like npm event-stream and Apache Log4j highlight weaknesses in the open source model: code changes constantly, maintainer intent becomes harder to ascertain, and the list of vulnerabilities that maintainers need to address keeps growing. In 2017 alone, NVD disclosed 14,645 vulnerabilities, more than double the number disclosed in 2016!

It’s no surprise that we need tools to navigate this large security surface area.

Today’s tools are too noisy

While there is some variance in how SCA tools approach scanning and alerting, fundamentally they all work in the same way: they read your manifest files and lockfiles, then compare the packages and versions listed there against a database of known vulnerabilities to figure out how safe your open source dependencies are. This type of check is simple but noisy; it flags the packages you use that are vulnerable but doesn’t account for how you actually use those packages in practice.
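To make that concrete, here is a minimal sketch of a version-only check. The advisory table, the `scanLockfile` helper, and the pinned versions are all hypothetical and only illustrate the idea; real tools parse actual lockfile formats and handle full semver ranges.

```javascript
// Hypothetical advisory database: package name -> first fixed version.
const ADVISORIES = {
  lodash: { id: "CVE-2018-16487", fixedIn: "4.17.11" },
};

// Compare two plain "major.minor.patch" versions; returns -1, 0, or 1.
// Pre-release tags and version ranges are out of scope for this sketch.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] < pb[i] ? -1 : 1;
  }
  return 0;
}

// Flag every pinned dependency whose version is below the fixed version.
// Nothing here looks at how the package is actually used.
function scanLockfile(lockedDeps) {
  const findings = [];
  for (const [name, version] of Object.entries(lockedDeps)) {
    const advisory = ADVISORIES[name];
    if (advisory && compareVersions(version, advisory.fixedIn) < 0) {
      findings.push({ name, version, advisory: advisory.id });
    }
  }
  return findings;
}

// Hypothetical resolved versions, as a lockfile might pin them.
const findings = scanLockfile({ lodash: "4.17.10", express: "4.18.2" });
console.log(findings);
```

The check reports lodash purely because of its version number, which is exactly where the noise comes from.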

I’ve spoken to security and developer teams big and small, and they’ve echoed the same sentiment: “SCA tools are false positive factories!” Take npm’s built-in SCA tool, npm audit: when it was released, it sparked widespread confusion across the community:

“I just ran ‘create-react-app’, why do I have vulnerabilities already?”
“Why are these vulnerabilities when these are just dev dependencies?”
“Almost all of these npm-audit detected vulnerabilities are false positives”

And this is just one case where the tool didn’t live up to expectations; there is a whole slew of SCA tools out there that evoke a similar reaction. Now imagine you’re an AppSec engineer tasked with securing hundreds of repositories, you have limited political capital with developers, and you see something like this every day. That’s just so frustrating!

How exactly are today’s tools flawed?

Let’s take a quick look at CVE-2018-16487. It affects lodash versions < 4.17.11. But your code is vulnerable only when it uses specific methods – merge(), mergeWith(), and defaultsDeep() – and only when they are called with input that pollutes ‘__proto__’. If you don’t use these methods in this way, the CVE does not apply.

Traditional SCA tools would flag every single instance where you import lodash < 4.17.11 because they only check for the version range – and that’s ridiculous! Why flag lodash as vulnerable when the usage of its 100+ other methods isn’t vulnerable at all? This is overkill and burns out AppSec engineers and developers alike.

Dependency scanning needs to be better – and it can be.

Reachability analysis to the rescue

Reachability boils down to this: for each vulnerable package you use, determine whether a vulnerable method is actually called, and whether it is called in a vulnerable way.

Figure 1: merge() is called with a malicious object

In the above code, we have a reachable example: lodash is imported (as _) and merge() is called with an object (maliciousObj) that pollutes the prototype. This is the behavior described in CVE-2018-16487 and is what we’d expect to be vulnerable. If we were using an SCA tool, we would want this case to be flagged as reachable.
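For readers who want to see the effect, here is a minimal sketch of how such a call can end up polluting the prototype. `naiveMerge` is a hypothetical stand-in for an unpatched deep merge, not lodash’s actual source, and the `isAdmin` payload is made up for illustration.

```javascript
// Stand-in for an unpatched deep merge; this is NOT lodash's source code.
function naiveMerge(target, source) {
  for (const key of Object.keys(source)) {
    const value = source[key];
    if (value !== null && typeof value === "object") {
      if (target[key] === null || typeof target[key] !== "object") {
        target[key] = {};
      }
      naiveMerge(target[key], value);
    } else {
      target[key] = value;
    }
  }
  return target;
}

// JSON.parse creates "__proto__" as an own property, as attacker-supplied JSON would.
const maliciousObj = JSON.parse('{"__proto__": {"isAdmin": true}}');

// target["__proto__"] resolves to Object.prototype, so the merge writes into it.
naiveMerge({}, maliciousObj);

// An unrelated, freshly created object now inherits the injected property.
const polluted = {}.isAdmin;
console.log(polluted); // true

// Clean up so the rest of the process is not affected.
delete Object.prototype.isAdmin;
```

The merge target was an empty object literal, yet every object in the process picked up the attacker’s property, which is why this call pattern deserves a high-priority finding.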

Figure 2: merge() is called, but not with an object that pollutes __proto__

Here we have an unreachable example: although lodash is imported and merge() is called, merge() is not given any malicious object that could pollute the prototype. The vulnerability is not reachable, so this shouldn’t be flagged as a priority because it isn’t essential to fix right now!

Unfortunately, traditional SCA tools will flag both of these code samples because they only check whether you import lodash and whether its version is < 4.17.11.
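By contrast, a reachability-aware check looks at the call sites. The sketch below is a deliberately crude, string-based approximation: it assumes lodash is bound to `_`, and it treats any variable assigned from `JSON.parse(...)` as untrusted. Both assumptions, along with the `isReachable` helper and the two snippets, are hypothetical; a real analysis would work on parsed syntax trees and track data flow.

```javascript
const VULNERABLE_METHODS = ["merge", "mergeWith", "defaultsDeep"];

// Crude heuristic: a variable is "untrusted" if it is assigned from JSON.parse(...).
function untrustedVariables(source) {
  const names = new Set();
  const assignment = /(?:const|let|var)\s+(\w+)\s*=\s*JSON\.parse\(/g;
  for (const match of source.matchAll(assignment)) names.add(match[1]);
  return names;
}

// Reachable only if a vulnerable method is called AND receives an untrusted variable.
function isReachable(source) {
  const untrusted = untrustedVariables(source);
  const call = new RegExp(
    `_\\.(?:${VULNERABLE_METHODS.join("|")})\\(([^)]*)\\)`,
    "g"
  );
  for (const match of source.matchAll(call)) {
    const args = match[1].split(",").map((arg) => arg.trim());
    if (args.some((arg) => untrusted.has(arg))) return true;
  }
  return false;
}

// Hypothetical application code: merge() receives parsed request data.
const reachableSnippet = `
const _ = require("lodash");
const maliciousObj = JSON.parse(requestBody);
_.merge({}, maliciousObj);
`;

// Hypothetical application code: merge() only receives a local literal.
const unreachableSnippet = `
const _ = require("lodash");
const defaults = { retries: 3 };
_.merge({}, defaults);
`;

console.log(isReachable(reachableSnippet));   // true
console.log(isReachable(unreachableSnippet)); // false
```

Both snippets would trip a version-only check, but only the first one is reported here, which is the prioritization signal that teams triaging many repositories are missing.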

False positives like this are manageable for an individual project but quickly spiral out of control for folks managing many repositories. From teams we support at r2c, we hear that having to go through thousands of negligible findings is a major source of tension. AppSec teams have SLAs to meet, but developers have code velocity to maintain – often placing both parties at odds with each other.

To address these issues, SCA tools have to evolve to work for large-scale security teams.

Conclusion

As code reuse becomes more and more popular, growing reliance on open source dependencies will undoubtedly widen the attack surface that applications expose. It’s more important than ever for security engineers to mitigate such risks, but existing tools are noisy and make that challenging.

We saw today how reachability analysis focuses on vulnerabilities that are actual threats. Armed with this, security engineers can prioritize issues that impact their codebase, spend less political capital convincing developers to fix issues, and overall reduce friction across teams. Triaging negligible findings is a painful experience – I’m confident future application of reachability could make security easier.

We’ve been thinking about this problem space for some time at r2c. Stay tuned here in the upcoming days – we think you’ll be excited to learn more about what we’ve come up with!

Update: we’re extremely excited to introduce Semgrep Supply Chain and would love to hear what you think about it!
