As mentioned in our previous post comparing Semgrep and Bandit, GitLab recently announced that they are transitioning a majority of GitLab SAST analyzers to Semgrep! This transition will phase out ESLint along with Bandit, so we would like to compare ESLint to Semgrep to provide a further analysis of what this switch will entail.

This post covers:

Security coverage: What does each tool detect?
Custom rules: What do custom rules look like? And, following good engineering principles, how are they tested?
Performance: How fast is each tool?
Usage in CI/CD: How can each be run continuously?

For the curious, here’s a quick summary. More details are below!

Semgrep vs ESLint summary

Security coverage

ESLint (v7.29.0) includes 261 rules for JavaScript that are not deprecated or removed. Semgrep is an engine for scanning code and therefore doesn’t ship with rules itself. However, Semgrep has access to a community-maintained registry with over 1,000 rules for many languages. As of June 2021, the Semgrep registry has 404 rules for JavaScript, which provides a subset of what ESLint covers in addition to other things. The Semgrep registry also provides groups of rules called rulesets, including two rulesets that contain subsets of ESLint rules: p/eslint-plugin-security and p/gitlab-eslint, which is maintained by GitLab.

p/gitlab-eslint contains a small subset of all the ESLint rules (13 rules total). Each rule in p/gitlab-eslint tries to match the same findings from the corresponding ESLint rule as closely as possible. The table below shows the differences in results between Semgrep (using p/gitlab-eslint) and ESLint when both are run on the lodash repository. We classify two findings as “the same” when both the starting line number and file path match.
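The matching criterion above can be sketched in a few lines of JavaScript. This is only an illustration of the comparison, not the script used for this post: the function names are hypothetical, the input shapes are assumptions based on each tool's JSON output, and file paths are assumed to be already normalized to the same form (for example, both relative to the repository root).

```javascript
// Normalize Semgrep JSON output.
// Assumed shape: { results: [{ path, start: { line } }] }
function fromSemgrep(report) {
  return report.results.map((r) => ({ path: r.path, line: r.start.line }));
}

// Normalize ESLint JSON output.
// Assumed shape: [{ filePath, messages: [{ line }] }]
function fromEslint(report) {
  return report.flatMap((file) =>
    file.messages.map((m) => ({ path: file.filePath, line: m.line }))
  );
}

// Two findings are "the same" when file path and starting line both match.
const keyOf = (finding) => `${finding.path}:${finding.line}`;

// Split the findings into shared ones and ones unique to each tool.
function compareFindings(semgrepFindings, eslintFindings) {
  const semgrepKeys = new Set(semgrepFindings.map(keyOf));
  const eslintKeys = new Set(eslintFindings.map(keyOf));
  return {
    both: [...semgrepKeys].filter((k) => eslintKeys.has(k)),
    semgrepOnly: [...semgrepKeys].filter((k) => !eslintKeys.has(k)),
    eslintOnly: [...eslintKeys].filter((k) => !semgrepKeys.has(k)),
  };
}
```

Keying on path and starting line only means that two different rules firing on the same line would count as one finding, so in practice the comparison would be run one rule at a time.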

Let’s dive into a few differences, starting with security/detect-object-injection. Here is one of these instances:

key = iteratee(value);
if (hasOwnProperty.call(result, key)) {
  ++result[key];
}

This seems like an appropriate catch, as key depends on the user-inputted value. Let’s look further into why Semgrep doesn’t catch this.

The corresponding Semgrep rule has a clause that explicitly filters out cases of Arr[key] if key is assigned from another variable earlier in the same file, before it is used. This clause is pasted below:

[... snipped ...]
- id: ESLint.detect-object-injection
  patterns:
  - pattern: $O[$ARG]
  - pattern-not-inside: |
      $ARG = $V;
      ...
      <... $O[$ARG] ...>;
[... snipped ...]

This pattern-not-inside clause was originally added to reduce the number of false positives from this rule, and leads to Semgrep not catching this specific example.
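To make the concern behind this finding more concrete, here is a small sketch of how a caller-controlled key can change the behavior of bracket access. The function names are hypothetical and the code is only loosely modeled on the snippet above; it is not lodash's implementation.

```javascript
// Counts values by a caller-supplied key, using the bracket access the rule flags.
function countByObject(values, iteratee) {
  const result = {};
  for (const value of values) {
    const key = iteratee(value);
    if (Object.prototype.hasOwnProperty.call(result, key)) {
      ++result[key];
    } else {
      // For the key "__proto__" this assignment reaches the inherited
      // prototype setter instead of creating an own property.
      result[key] = 1;
    }
  }
  return result;
}

// Same logic with a Map, whose keys never touch the prototype chain.
function countByMap(values, iteratee) {
  const result = new Map();
  for (const value of values) {
    const key = iteratee(value);
    result.set(key, (result.get(key) || 0) + 1);
  }
  return result;
}

const input = ["a", "__proto__", "a", "__proto__"];
const identity = (x) => x;

// The plain-object version silently loses the "__proto__" count.
console.log(Object.keys(countByObject(input, identity)));
// The Map version keeps it.
console.log(countByMap(input, identity).get("__proto__"));
```

Whether a given finding is exploitable still depends on where the key comes from, which is why both tools leave the final judgment to the reviewer.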

On the other hand, there are 110 cases of detect-object-injection that Semgrep finds and ESLint does not find. Here is one such example:

function last(array) {
  const length = array == null ? 0 : array.length;
  return length ? array[length - 1] : undefined;
}

Although there’s no obvious vulnerability in this code, the finding is what the rule intends to detect, since the index length - 1 depends on the externally controlled array parameter. If you want to cut down on findings like this one, Semgrep rules are easy to extend with additional clauses that filter them out. See the Custom rules section for more details, or visit the Semgrep docs.

Let’s now look at another case that Semgrep detected but ESLint did not: an example that the rule gitlab.eslint.detect-non-literal-regexp flagged.

Here is the code snippet:

var values = [
    new Foo(),
    new Boolean(),
    new Date(),
    Foo,
    new Number(),
    new String(),
    new RegExp(),
  ],
  expected = lodashStable.map(values, stubTrue);

This seems to be a false positive on Semgrep’s part, as RegExp is only a concern when called with a variable argument. ESLint specifically checks that RegExp is called with at least one argument, and thus avoids this false positive. Below is the code snippet from ESLint that performs this check:

if (args && args.length > 0 && args[0].type !== "Literal") {
  var token = context.getTokens(node)[0];
  return context.report(
    node,
    "Found non-literal argument to RegExp Constructor"
  );
}

However, the corresponding Semgrep rule could easily be modified to not catch this case by requiring that an argument be present with new RegExp($ARG, ...):

patterns:
  - pattern: |
      new RegExp($ARG, ...)
  - pattern-not: |
      new RegExp("...", ...)

We’ve made an MR to GitLab’s Semgrep analyzer in order to remove this false positive case from the Semgrep ESLint rules. We’re always looking to improve our Semgrep rules, so please feel free to create an issue if you see any way to make our rules better!
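As an illustration of why the rule only cares about non-literal arguments, the sketch below contrasts the three situations. The helper names are hypothetical and not taken from either tool.

```javascript
// Escape regex metacharacters so that user input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// No argument: nothing user-controlled reaches the constructor.
const empty = new RegExp();

// Non-literal argument: the case the rule is meant to flag, because the
// caller decides what the pattern means.
function matchesRaw(userInput, candidate) {
  return new RegExp(userInput).test(candidate);
}

// Escaping first keeps the pattern dynamic but removes its special meaning.
function matchesEscaped(userInput, candidate) {
  return new RegExp(escapeRegExp(userInput)).test(candidate);
}
```

Note that the escaped variant still passes a non-literal argument, so a syntactic check like the ones discussed here would most likely still report it; reviewing and, if appropriate, ignoring such a finding remains a manual step.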

From this investigation, ESLint appears to have fewer false positives than Semgrep: it did not report the case of new RegExp called without an argument. Moreover, both Semgrep and ESLint reported unique results that were worth delving into, as seen from the examples for the rule detect-object-injection. ESLint therefore seems to have less noise while providing thorough coverage with its rules, while Semgrep benefits from its easy-to-understand syntax and the ability to make rapid changes to rules.

Custom rules

ESLint

ESLint rules are written in JavaScript code using the ESLint parser. ESLint translates JavaScript code into an AST under the hood, and lets users traverse the AST easily using its API. To write a custom rule, you must first set up an ESLint plugin. Following that, you can write your ESLint check as JavaScript code and use the very helpful AST Explorer to validate your rule. The ESLint syntax makes it quite simple to write basic rules, such as this one that disallows octal literals, but more complex rules require a deeper understanding of the AST. An example of a more complex check is the rule we discussed above that detects object injection.

If you’re curious about writing ESLint custom rules, here is an article detailing how to create ESLint plugins and write a simple rule.
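For a sense of what such a rule looks like, here is a minimal sketch of a rule module that reports direct calls to eval. The rule name no-eval-call and its message are hypothetical; the meta and create(context) structure follows ESLint's documented rule format.

```javascript
// A hypothetical rule, "no-eval-call"; the name and message are illustrative.
const noEvalCall = {
  meta: {
    type: "problem",
    docs: { description: "disallow direct calls to eval()" },
    schema: [], // the rule takes no options
  },
  create(context) {
    // ESLint invokes each visitor for every AST node of the matching type.
    return {
      CallExpression(node) {
        if (node.callee.type === "Identifier" && node.callee.name === "eval") {
          context.report({ node, message: "Unexpected call to eval()." });
        }
      },
    };
  },
};

// In a plugin, the rule file exports this object.
module.exports = noEvalCall;
```

Even this simple check needs the author to know that a function call is a CallExpression whose callee is an Identifier, which is the AST knowledge that more complex rules depend on even more heavily.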

Semgrep

Semgrep parses code and search queries into an internal AST representation. This means that Semgrep queries (henceforth called “patterns”) look similar to the code that will be matched. For example, to detect the presence of eval, the Semgrep pattern is eval(...). The ellipsis is a Semgrep construct; you can read more about the Semgrep syntax in the documentation.

More sophisticated rules are expressed in a YAML file that composes multiple patterns together. Detecting object injections in Semgrep can be expressed in a YAML file like the one shown below. The rule uses the pattern: clause to find all occurrences of array[key] and the pattern-not: and pattern-not-inside: clauses to filter out safe constructions of array[key]. Metavariables (written as $ followed by capital letters or numbers) are also a Semgrep construct that you can become familiar with in the documentation.

rules:
  - id: ESLint.detect-object-injection
    languages:
      - javascript
      - typescript
    message:
      Bracket object notation with user input is present, this might allow an
      attacker to access all properties of the object and even its prototype,
      leading to possible code execution.
    metadata:
      cwe: "CWE-94: Improper Control of Generation of Code ('Code Injection')"
    patterns:
      - pattern: $O[$ARG]
      - pattern-not: $O["..."]
      - pattern-not: "$O[($ARG : float)]"
      - pattern-not-inside: |
          $ARG = [$V];
          ...
          <... $O[$ARG] ...>;
      - pattern-not-inside: |
          $ARG = $V;
          ...
          <... $O[$ARG] ...>;
      - metavariable-regex:
          metavariable: $ARG
          regex: (?![0-9]+)
    severity: WARNING

Testing rules

Just like writing code without tests is ill-advised, so is writing static analysis checks without tests to ensure they work as expected.

After all, you don’t want to think you’re finding and blocking certain bad code patterns, only to later learn your rule had some sort of subtle bug. Rule tests also provide valuable documentation, as they make it easy to quickly grok what code a rule is and isn’t supposed to flag.

ESLint

ESLint encourages rules to be written with a set of unit tests — in fact, each submitted rule to the ESLint core must have unit tests in order to be accepted. The test file is named the same way as the source file but lives in tests/lib/. You can run one test file with:

npm run test:cli tests/lib/rules/...

and can use the command

npm test

to run all tests.

Testing in ESLint builds on the Mocha framework, and ESLint provides RuleTester to help with test-writing. The RuleTester#run() method runs the tests; to run only one test case or a subset of them, add only: true to those test cases.

In ESLint unit tests, you can specify code that the tested rule will flag and then check that the correct errors were reported. This system is simple and easy to use, and allows for quick test-writing and fast test execution. However, these test cases are often short (containing only one line of code) and therefore don’t reflect the complexity of real source code.

Semgrep

Semgrep supports creating unit tests for each rule by defining test cases in source code (e.g., my-rule.js) for each corresponding Semgrep YAML file (e.g., my-rule.yml).

You can then check that your patterns match the intended code by running semgrep --test. You can annotate lines you expect to match or not match today, as well as lines you plan to have match in the future (for example, after you improve a rule).
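As a rough sketch of what such a test file can look like, assume a hypothetical rule with the ID my-rule, defined in my-rule.yml, whose pattern is eval(...). The annotation comments sit on the line before the code they describe; the function names are made up for illustration.

```javascript
// my-rule.js: test fixture for a hypothetical my-rule.yml whose pattern is eval(...)

function flagged(userInput) {
  // ruleid: my-rule
  return eval(userInput);
}

function allowed(text) {
  // ok: my-rule
  return JSON.parse(text);
}

function notYetFlagged(userInput) {
  // The simple pattern does not match this indirect call yet, so it is
  // marked as a finding we would like the rule to produce in the future.
  // todoruleid: my-rule
  return globalThis["eval"](userInput);
}
```

Here ruleid marks a line the rule must flag, ok marks a line it must not flag, and todoruleid marks a line that should be flagged once the rule improves.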

The following is an example from the docs. You can see many examples of rules and their unit tests at the official Semgrep rules GitHub: https://github.com/returntocorp/semgrep-rules.

Performance

This is a runtime test using a 2019 MacBook Pro (2.6 GHz 6-Core Intel Core i7) on four repositories. The runtime was measured in wall-clock time for an entire invocation of the command. For a better comparison, Semgrep was run in single-threaded mode because, at the time of this writing, ESLint does not support multi-threaded scans.

Semgrep vs ESLint runtime

ESLint is much faster on every repository we tested, and the gap is widest on the largest one: ESLint is around 14 times faster on juice-shop, compared with about 6 times faster on lodash, sysdig, and socketio. From this comparison, ESLint’s performance appears to scale better on larger repositories.

If we run Semgrep using multithreading with 8 jobs using the command

semgrep --json -j 8 -f p/gitlab-eslint <repository>

scans are up to 2.5x faster on small and medium-sized repositories. However, multithreading doesn’t seem to help Semgrep as much on larger repositories: the run on juice-shop still takes close to a minute.

As the Semgrep maintainers, this was an interesting finding for us! We are looking into speeding this up.

Semgrep vs ESLint runtime

Usage

Integrations during development

Both ESLint (docs) and Semgrep (docs) can be run with pre-commit.

ESLint has a Sublime Text extension, a Vim plugin, an Emacs plugin, and plugins for Eclipse, TextMate, Atom, VS Code, and more. Feel free to check out the ESLint integrations page for more information.

Semgrep has a VS Code extension, an IntelliJ IDEA plugin, and a Vim plugin. See the extension docs for more details.

Integrations during CI/CD

As CLI tools, ESLint and Semgrep can be easily inserted into any build system that supports running arbitrary CLI tools (read: nearly all of them).

ESLint has a GitLab analyzer, though it is being deprecated in favor of Semgrep, as well as community-contributed configs for other CI providers.

ESLint has several community-contributed GitHub Actions that can scan the changed files, specific files in a repository, or perform a full repository scan. However, it seems that these actions lack the ability to let the user choose to block the build or upload results.

Semgrep’s officially supported GitHub Action can be configured to scan only the changed files or do a full repo scan, write PR comments, block the build (or let it pass), and upload results to GitHub’s Advanced Security tab in SARIF format for review within GitHub.

Semgrep also has example configurations for other CI providers, including GitLab, Buildkite, CircleCI, Jenkins, and more (docs).

Ignoring lines of code

Both ESLint (docs) and Semgrep (docs) support ignoring a result on a specific line of code. You can use // eslint-disable-line for ESLint or # nosemgrep for Semgrep.

Both scanners also support ignoring specific rules on a given line of code. For Semgrep, the comment is # nosemgrep: rule-id-1, rule-id-2; for ESLint, it is // eslint-disable-line rule-id-1, rule-id-2.

ESLint also supports disabling rule warnings for a section of code using block comments of the following format:

/* eslint-disable */
<code>
/* eslint-enable */

Furthermore, ESLint allows for disabling rule alerts on the line after the comment with // eslint-disable-next-line. These features let users determine exactly which lines to disable, reducing ambiguity significantly.
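Put together, the ignore comments described in this section might look like the sketch below in a JavaScript file. The function names and the Semgrep rule ID my-rule are hypothetical, and the example assumes Semgrep recognizes nosemgrep inside a // comment as well as a # comment.

```javascript
function parseLegacyValue(text) {
  // Suppress one ESLint rule on this line only.
  return eval(text); // eslint-disable-line no-eval
}

function parseLegacyValueAgain(text) {
  // Suppress one ESLint rule on the next line only.
  // eslint-disable-next-line no-eval
  return eval(text);
}

function parseForSemgrep(text) {
  // Suppress one Semgrep rule on this line; "my-rule" is a hypothetical rule ID.
  return eval(text); // nosemgrep: my-rule
}

/* eslint-disable no-eval */
// Every no-eval report between the two block comments is suppressed.
const parseMany = (texts) => texts.map((t) => eval(t));
/* eslint-enable no-eval */
```

Naming the rule in each comment, rather than disabling everything, keeps other rules active on the same lines.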

Ignoring paths

ESLint supports a .eslintignore configuration file (docs) where you can specify files for ESLint to not run on. ESLint also allows you to specify files to ignore through the ignorePatterns key in configuration files. Semgrep similarly supports path-based excludes in a .semgrepignore file (docs). These files can be checked in to a repository to take effect.

Ignoring rules

Inside the ESLint configuration file, you can tell ESLint to only run certain checks, skip certain checks, run checks from specific plugins, and warn or error for a rule (docs).

Semgrep’s rule configuration file lists out every rule Semgrep should run. As such, you can mix-and-match rules inside a configuration file; if you want to disable a rule, you can save a configuration and remove the unwanted rules.

Other features

Semgrep is multilingual, supporting Python, JavaScript, Go, Ruby, and more, which means Semgrep can scan multi-language projects. Additionally, for any coverage that may be missing, Semgrep’s pattern syntax makes it easy to add new rules.

Semgrep understands certain language semantics. For example, in JavaScript, Semgrep will match variations of import statements. The pattern import "module-name"; will still match the code import * as name from "module-name" and import { foo , bar } from "module-name/path/to/specific/un-exported/file";. Other semantic features include detecting unordered keyword arguments (the order in which you write function arguments in a Semgrep pattern doesn’t matter) and constant propagation (which can determine if a literal value—a constant—has not been modified).

Semgrep sports a number of experimental features, one of which is autofix. While limited in functionality, Semgrep’s autofix enables simple expressions to be fixed with the click of a button.

Summary

Semgrep vs ESLint summary

We hope you found this informative!

If there are any other aspects about the comparison that we should cover, or if we’re missing anything, please let us know!

JavaScript static analysis comparison: ESLint vs Semgrep
