Comparing Reachability Analysis methods: Semgrep's distinct approach

Introduction

There's plenty of evidence supporting the need to scan your software supply chain for vulnerabilities:

70-90% of an application’s codebase is open-source software
Supply chain attacks increased at a rate of 742% per year between 2019 and 2022
More than 20,000 CVEs are published each year, and the number is growing
The GitHub Advisory Database contains more than 215,000 security advisories for open-source projects!

While the industry has made significant strides in identifying project dependencies via software composition analysis (SCA), I invite you to try most software supply chain solutions: you will find that most of the vulnerabilities they report don’t actually affect your project.

Recognizing this, the industry has coined the term “reachability” and has pursued more advanced SCA capabilities to determine which vulnerabilities are worth investigating and remediating.

However, reachability is not well-defined, nor is it consistent amongst available solutions. In this blog post we'll outline the various things people can mean when they use the term "reachability", and go over the pros and cons of different approaches to SCA and reachability.

Reachability is especially important in a world where organizations are likely to remediate only 10% of their vulnerabilities each month, regardless of how many they have, according to a collaborative study conducted by Security Scorecard and Cyentia Institute. The study included data from 1.6M organizations.

What is Software Composition Analysis (SCA)?

“Software composition analysis (SCA) is a practice in the fields of Information technology and software engineering for analyzing custom-built software applications to detect embedded open-source software and detect if they are up-to-date, contain security flaws, or have licensing requirements [1].”

SCA may encompass the evaluation of a project's manifest file and lockfile, static analysis, dynamic analysis, or a combination of these methods. Today, when you see the term SCA, it most likely refers to manifest and lockfile analysis, also known as traditional SCA. On its own, this isn’t sufficient to determine reachability.

At a high level, the types of SCA have the following capabilities:

Manifest analysis: Analyzes the application's manifest file to identify the open-source components it declares, offering a basic view of dependencies but no insight into how the application uses them.

Lockfile analysis: Examines the lockfile for a detailed snapshot of the specific dependency versions in use, providing accurate version tracking but no insight into whether those dependencies are actually executed or reachable in the application.

Static analysis: Reviews the source code without execution to determine how dependencies are integrated, revealing which parts of third-party libraries are referenced and potentially vulnerable.

Dynamic analysis: Observes the application during runtime to capture real-time data on dependency interaction and usage, offering the most precise insights into reachability and runtime vulnerabilities.

In summary, while manifest and lockfile analyses offer a surface-level view of the components in use, static and dynamic analyses delve deeper, providing a more nuanced understanding of how these components are utilized and potentially exploited in an application. For reachability analysis, a combination of these methods is often the most effective approach; Semgrep leverages manifest, lockfile, and static analysis to determine when a vulnerability is reachable.
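To make the "surface-level view" concrete, here is a minimal sketch of a version-only lockfile check. Everything in it is an assumption for illustration: the package names, the requirements.txt-style lockfile, and the advisory are hypothetical, and this is not how any particular SCA product is implemented.

```python
import re

# Hypothetical lockfile contents in requirements.txt style (name==version).
LOCKFILE = """\
examplelib==1.4.2
otherlib==3.0.1
"""

# Hypothetical advisory: examplelib is affected in versions below 1.5.0.
ADVISORY = {"package": "examplelib", "fixed_in": (1, 5, 0)}


def parse_lockfile(text):
    """Return {package: version tuple} for every pinned 'name==x.y.z' line."""
    pins = {}
    for line in text.splitlines():
        match = re.fullmatch(r"\s*([A-Za-z0-9_.-]+)==(\d+(?:\.\d+)*)\s*", line)
        if match:
            version = tuple(int(part) for part in match.group(2).split("."))
            pins[match.group(1).lower()] = version
    return pins


def is_affected(pins, advisory):
    """Version-only check: True if the pinned version is below the fixed version."""
    version = pins.get(advisory["package"])
    return version is not None and version < advisory["fixed_in"]


pins = parse_lockfile(LOCKFILE)
print(is_affected(pins, ADVISORY))  # True
```

The check says the project is affected, yet it never looks at the project's source code, so it cannot tell whether the vulnerable part of the package is ever called.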

The Impact of Reachability

Modern development environments often need more precision than traditional SCA can provide. Traditional SCA tools overwhelm teams with information, identifying tens or even hundreds of thousands of vulnerabilities and making it impossible to distinguish theoretical vulnerabilities from those that pose an immediate risk. This strains security and developer resources and increases the risk of overlooking critical vulnerabilities.

Contrast Security's 2021 State of Open Source Security Report mentions that applications invoke only 31% of library classes, if they invoke any at all. Shockingly, the report also disclosed that 62% of libraries found in applications are never used at runtime.

Reachability analysis changes this dynamic by bringing clarity and focus to vulnerability management. It fits agile and DevSecOps practices, where speed, efficiency, and accuracy are key. By ensuring that developers only spend time on vulnerabilities that could realistically compromise the application, reachability analysis enables teams to maintain a high development velocity without compromising on security.

A Semgrep study from 2022 assessed 1,100 open-source projects. Of 1,614 Dependabot alerts, Semgrep’s reachability analysis determined that only 31 were reachable in the project’s code. That is only ~2% of findings!

Comparing Types of Reachability Analysis

Similar to SCA, there are various types of reachability analysis. For example, a tool may label a vulnerability as reachable if it impacts a direct dependency or if an internet-facing application uses it. For Semgrep, a vulnerability is classified as reachable if your project uses the vulnerable component (e.g., class or function) that introduces the risk.

Semgrep researchers manually review security advisories and reverse engineer patches to identify the vulnerable components of a package. We then create a Semgrep rule that detects the use of those components in your application.
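The idea of function-level reachability can be sketched with Python's standard `ast` module. This is an illustration only, not Semgrep's rule engine: the package `examplelib` and its function `unsafe_parse` are hypothetical stand-ins for a vulnerable component named in an advisory.

```python
import ast

# Hypothetical vulnerable component: examplelib.unsafe_parse.
VULNERABLE_MODULE = "examplelib"
VULNERABLE_FUNCTION = "unsafe_parse"


def find_vulnerable_calls(source):
    """Return the line numbers where the vulnerable function is called."""
    tree = ast.parse(source)
    module_aliases = set()    # names bound to the module (import examplelib as ex)
    function_aliases = set()  # names bound to the function (from examplelib import ...)

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == VULNERABLE_MODULE:
                    module_aliases.add(alias.asname or alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module == VULNERABLE_MODULE:
            for alias in node.names:
                if alias.name == VULNERABLE_FUNCTION:
                    function_aliases.add(alias.asname or alias.name)

    lines = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in function_aliases:
            lines.append(node.lineno)
        elif (isinstance(func, ast.Attribute)
              and func.attr == VULNERABLE_FUNCTION
              and isinstance(func.value, ast.Name)
              and func.value.id in module_aliases):
            lines.append(node.lineno)
    return sorted(lines)


USES_IT = "import examplelib as ex\nresult = ex.unsafe_parse(data)\n"
ONLY_IMPORTS = "import examplelib\nresult = examplelib.safe_parse(data)\n"

print(find_vulnerable_calls(USES_IT))       # [2]
print(find_vulnerable_calls(ONLY_IMPORTS))  # []
```

Both sample projects depend on the same package, but only the first one calls the vulnerable function, so only the first would be treated as reachable under this definition.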

Figure: SCA reachability comparison

Conclusion

Adopting varied reachability analysis methods marks a significant evolution in identifying and prioritizing vulnerabilities. Each technique enables security teams to prioritize vulnerabilities that would otherwise be overwhelming. The “right” solution may also depend on your use case. For standard applications, we find that reachability analysis is most effective when identifying the vulnerable usage of a package and less so when basing it on contextual information. However, a team heavily reliant on containerized applications will likely appreciate the contextual analysis.

Semgrep's approach to reachability analysis, which integrates manifest, lockfile, and static analyses, stands out for its speed and alignment with modern development practices. This method narrows the vast field of potential vulnerabilities to those most relevant and actionable, enabling teams to focus their efforts more efficiently. This precision is valuable in agile and DevSecOps environments, where maintaining a balance between speed, efficiency, and security is critical.

However, Semgrep's reachability analysis has its limitations. If you’re familiar with the everlasting static vs. dynamic analysis debate, the same trade-offs exist in the supply chain security space. Static analysis excels at efficiently scanning codebases for potential vulnerabilities without executing them, though it cannot observe behavior that only surfaces at runtime, an area where dynamic analysis can provide additional insights. In particular, static analysis often produces false positives when the security advisory specifies that the risk requires user-defined input to be passed to a function, because a vulnerable function can be flagged as used even when no user input ever reaches it.
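The false-positive case can be illustrated with a small sketch. It assumes a hypothetical function `unsafe_parse` from a hypothetical package `examplelib` whose advisory only applies when user-defined input is passed in. The literal-versus-non-literal split below is a deliberately naive stand-in for real data-flow analysis and does not describe how Semgrep works.

```python
import ast

# Hypothetical project code: both lines call the vulnerable function,
# but only the second call could carry user-defined input.
SOURCE = '''\
from examplelib import unsafe_parse

unsafe_parse("static config string")
unsafe_parse(request_body)
'''


def classify_calls(source, function_name):
    """Map each call's line number to the kind of arguments it receives."""
    results = {}
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == function_name):
            all_literal = all(isinstance(arg, ast.Constant) for arg in node.args)
            results[node.lineno] = (
                "literal argument" if all_literal else "non-literal argument"
            )
    return results


print(classify_calls(SOURCE, "unsafe_parse"))
```

A check that only looks for usage of the function reports both calls, even though the call on line 3 passes a fixed string and cannot be influenced by a user.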

References

[1] Prana, Gede Artha Azriadi, et al. "Out of sight, out of mind? How vulnerable dependencies affect open-source projects." Empirical Software Engineering 26 (2021): 1-34.
