Codebase-Aware Reachability Analysis Coverage for Rust

Reachability analysis has been a breakthrough for AppSec practitioners struggling to keep dependency upgrades under control. By deprioritizing findings when a vulnerable third-party package is not actually used by their codebase, teams can filter out significant chunks of their vulnerability backlogs as false positives.

But reachability analysis is language-dependent: it requires a deep understanding of each language's semantics, features, and runtime behaviors. Semgrep is recognized among developers and AppSec teams for its extensive polyglot coverage, and we’re excited to introduce reachability analysis for Rust as the next addition to our suite of offerings.

Why Rust?

A versatile programming language known for its performance, memory safety, and concurrency, Rust is arguably the hottest new language on the planet. More than 3 million developers write Rust, increasingly for critical systems that were once written in C and C++.

Companies from Microsoft to Amazon regard it as key to their future. The chat platform Discord uses Rust to speed up its system, Dropbox uses it to sync files to your computer, and Cloudflare uses it to process more than 20% of all internet traffic.

According to Stack Overflow’s annual poll of devs around the world, Rust has been rated the most “loved” programming language for seven years running. Even the US government is avidly promoting software in Rust as a way to make its processes more secure.

As much as Semgrep has always been trusted by AppSec, it has been equally loved by developers. So the growing popularity of Rust among the dev community was an exciting opportunity for us to bring all the benefits of Semgrep’s codebase-aware reachability analysis to this ecosystem.

Not all reachability is created (or performs) equal

The efficacy of reachability analysis, measured by the number of false positives it identifies, can vary considerably with the quality and depth of the analysis. We’ve always maintained that for reachability analysis to be meaningfully complete, it must analyze not only the packages being imported and the functions being called, but also how arguments are passed between those functions in the context of your codebase.

Package-level reachability, or dependency reachability, scans manifest files for listed dependencies and uses import statements to determine which packages are used and which are not. But it only looks as far as the package being imported, and cannot tell whether the source code actually calls any vulnerable function in that package.

Let’s say, for example, that your code uses version 1.18.0 of the Rust libraries "deno" and "deno_runtime". Since there is a GitHub Security Advisory for this version of the library, package-level reachability analysis would trigger the following rule:

# Affected versions of deno and deno_runtime are vulnerable to Improper Neutralization of Escape, Meta, or Control Sequences. A lack of ANSI escape filtering in Deno's interactive run-permission prompts for `op_spawn_child` and `op_kill` lets a malicious script inject control codes into the prompt text. By embedding ANSI sequences that clear and overwrite the first two lines, an attacker can spoof the displayed command name or request source, misleading the user into granting run permissions for unintended programs.
  severity: HIGH
  metadata:
    confidence: HIGH
    category: security
    cve: CVE-2023-28446
    cwe:
      - 'CWE-150: Improper Neutralization of Escape, Meta, or Control Sequences'
    ghsa: GHSA-vq67-rp93-65qf
    owasp:
      - A06:2021 - Vulnerable and Outdated Components
    sca-fix-versions:
      - deno: 1.31.2
      - deno_runtime: 1.31.2
    sca-reachable-if: you run untrusted JavaScript/TypeScript code using the Deno runtime (e.g. deno run)
    depends-on-either:
      - namespace: cargo
        package: deno
        version: '>=1.8.0, <1.31.2'
      - namespace: cargo
        package: deno_runtime
        version: '>=1.8.0, <1.31.2'
    languages:
      - rust

In Semgrep’s view, this is insufficiently helpful, as engineers still have to invest effort in triaging supposedly “reachable” findings to actually figure out which ones to prioritize.
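To make the limitation concrete, here is a minimal sketch of what a package-level check boils down to. It is an illustration only, not Semgrep's implementation: `Advisory`, `parse_version`, and `package_level_findings` are hypothetical names, and the version parser assumes plain `major.minor.patch` strings.

```rust
/// Hypothetical advisory record: affected packages and a half-open version range.
struct Advisory<'a> {
    packages: &'a [&'a str],
    min_inclusive: (u64, u64, u64),
    max_exclusive: (u64, u64, u64),
}

/// Parse a plain `major.minor.patch` version into a comparable tuple.
/// Pre-release and build suffixes are deliberately not handled in this sketch.
fn parse_version(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.trim().split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// Package-level check: looks only at (name, version) pairs from the manifest.
/// It never inspects first-party code, so it cannot tell whether the
/// vulnerable functionality is actually used.
fn package_level_findings(deps: &[(&str, &str)], advisory: &Advisory) -> Vec<String> {
    let mut findings = Vec::new();
    for (name, version) in deps {
        let listed = advisory.packages.iter().any(|p| p == name);
        let in_range = parse_version(version)
            .map_or(false, |v| v >= advisory.min_inclusive && v < advisory.max_exclusive);
        if listed && in_range {
            findings.push(name.to_string());
        }
    }
    findings
}
```

With the range from the rule above (`>=1.8.0, <1.31.2`), a manifest listing deno 1.18.0 is flagged regardless of what the application does with it, which is why every such finding still needs manual triage.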

Function-level reachability looks beyond the dependency graph into your actual application code. It maps out imported dependencies and performs static analysis on your first party code to build a detailed model of which dependencies are actually used within your application.

Suppose you’re using version 0.7.8 of the Rust library “cookie::Cookie”, which would be considered reachable by package-level reachability. It contains a function:

fn parse_user_cookie(raw: &str) {

that per GHSA, is vulnerable to Improper Input Validation.

# When parsing a cookie's Max-Age attribute, Cookie calls time::Duration::seconds on the raw value. If an attacker supplies a very large Max-Age (greater than ~2^64/1000 but ≤2^64), this call panics, crashing the client or server and causing a denial of service.
severity: HIGH
metadata:
  confidence: HIGH
  category: security
  cve: CVE-2017-18589
  cwe:
    - 'CWE-20: Improper Input Validation'
  ghsa: GHSA-vjrq-cg9x-rfjp
  owasp:
    - A03:2021 - Injection
    - A05:2025 - Injection
    - A06:2021 - Vulnerable and Outdated Components
  publish-date: '2021-08-25T20:43:02Z'
  references:
    - https://github.com/advisories/GHSA-vjrq-cg9x-rfjp
    - https://nvd.nist.gov/vuln/detail/CVE-2017-18589

Function-level analysis tells us that the vulnerability is reachable only if you are both using an unsafe version of cookie::Cookie and calling parse_user_cookie.
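The two-part condition can be sketched as follows. This is a simplified illustration under stated assumptions, not how Semgrep models code: `Project` and `function_level_reachable` are hypothetical names, and the set of called functions is assumed to have been collected by an earlier static-analysis pass.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of a scanned project.
struct Project {
    /// Result of the package-level check: an affected version is in the manifest.
    vulnerable_dependency_present: bool,
    /// Names of functions that first-party code calls, as an earlier
    /// analysis pass might have collected them.
    called_functions: HashSet<String>,
}

/// Function-level check: the finding is reachable only when an affected
/// version is present AND at least one vulnerable function is called.
fn function_level_reachable(project: &Project, vulnerable_functions: &[&str]) -> bool {
    project.vulnerable_dependency_present
        && vulnerable_functions
            .iter()
            .any(|f| project.called_functions.contains(*f))
}
```

A project that pins an affected version but never calls the vulnerable function is no longer reported, which is exactly the class of findings that package-level analysis cannot rule out.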

Still, there are cases where your code calls a vulnerable function in an affected version of a package, yet the conditions for exploitation are so specific that function-level analysis flags the finding as reachable when it is not.

Dataflow reachability builds upon function-level analysis, just as function-level analysis builds upon dependency-level analysis. By looking into function arguments in the context of your code, it can understand how data flows between functions.

Let’s use an example where your application calls the function take_from in an affected version of the package bcder (any version before 0.7.3), which is covered by CVE-2023-39914. Function-level reachability analysis alone would flag this finding as reachable.

# Affected versions of bcder are vulnerable to Improper Handling of Syntactically Invalid Structure / Improper Handling of Undefined Values. Malformed or invalid ASN.1 input can trigger panics in bcder during both immediate and delayed decoding, causing unhandled crashes (denial-of-service) instead of graceful error handling.
  technology:
    - rust
    - namespace: cargo
      package: bcder
      version: '<0.7.3'
  languages:
  - rust
  mode: taint
  pattern-sources:
  - pattern-either:
    - pattern: Mode::Der.decode(...)
    - pattern: Mode::Ber.decode(...)
    - pattern: Mode::Cer.decode(...)
  pattern-sinks:
  - pattern-either:
    - pattern: Oid::take_from(...)
    - pattern: BitString::take_from(...)

Here is an example where both conditions are met (an affected bcder version is in use, and data flows from decode into take_from), making the vulnerability reachable:

use bcder::Mode;
use bcder::decode::SliceSource;
use bcder::string::BitString;
use bcder::Oid;
fn parse_asn1(input: &[u8]) -> Result<(), bcder::decode::DecodeError<std::convert::Infallible>> {
    Mode::Der.decode(SliceSource::new(input), |cons| {
        // ruleid: ssc-050b0f30-9933-b0c2-b26e-268a25542907
        let oid = Oid::take_from(cons)?;
        // ruleid: ssc-050b0f30-9933-b0c2-b26e-268a25542907
        let bits = BitString::take_from(cons)?;
        println!("OID: {}", oid);
        println!("Unused bits: {}", bits.unused());
        Ok(())
    })
}
fn main() {
    let data = b"\x06\x00\x03\x01\x08";
    // ok: ssc-050b0f30-9933-b0c2-b26e-268a25542907
    let _ = parse_asn1(data);
}

Our dataflow analysis shows, however, that the vulnerable code is reachable only if the data passed to take_from comes from the decode function. Specifically, data must flow from a Mode::Ber, Mode::Cer, or Mode::Der decode call into take_from. Unless these conditions are met, the vulnerable code is not exploitable, and the finding would not be flagged as a priority.
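The source-to-sink condition can be illustrated with a toy taint tracker. This is a deliberately small sketch, not Semgrep's engine: it handles only straight-line code of the form `target = callee(args...)`, ignores closures and control flow, and the names `Call` and `tainted_sinks` are hypothetical.

```rust
use std::collections::HashSet;

/// One straight-line statement of the form `target = callee(args...)`.
struct Call<'a> {
    target: &'a str,
    callee: &'a str,
    args: Vec<&'a str>,
}

/// Walk the statements in order, marking a variable as tainted when it is
/// produced by a source call or derived from a tainted argument. A sink is
/// reported only if at least one of its arguments is tainted.
fn tainted_sinks<'a>(program: &[Call<'a>], sources: &[&str], sinks: &[&str]) -> Vec<&'a str> {
    let mut tainted: HashSet<&str> = HashSet::new();
    let mut hits = Vec::new();
    for call in program {
        let arg_tainted = call.args.iter().any(|a| tainted.contains(a));
        let is_sink = sinks.iter().any(|s| *s == call.callee);
        let is_source = sources.iter().any(|s| *s == call.callee);
        if is_sink && arg_tainted {
            hits.push(call.callee);
        }
        if is_source || arg_tainted {
            tainted.insert(call.target);
        }
    }
    hits
}
```

Fed a program where `Oid::take_from` receives a value produced by `Mode::Der.decode`, the tracker reports the sink. If the same sink receives a value that never passed through a listed source, nothing is reported, even though a function-level check would see the `take_from` call in both cases.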

The bcder example shows how Semgrep’s dataflow reachability helps even when exploitability hinges on highly specific preconditions that remain opaque to other types of reachability analysis. By identifying conditional exploitability, it weeds out false alerts that would otherwise appear true, reducing false positives by over 96%, more than any other solution on the market.

Conclusion

At its heart, reachability analysis for SCA is about determining whether and how your first-party code interacts with vulnerable third-party code, and that depends on the specifics of the language being analyzed. Making sure that an SCA solution supports all the languages used by your organization, whether directly or indirectly, is essential for getting reachability insights. With the addition of reachability coverage for Rust, Semgrep now offers full depth of reachability coverage for 12 languages.

Check out our docs page on how to get started!
