Reduce false positives in semgrep scan

The semgrep scan command can be used to quickly perform SAST scans. However, you may encounter false positives as you work through your findings. This document presents different strategies to reduce false positives and increase true positives in your scans.

Customize your rules

A Semgrep Community rule that generates a high rate of false positives is said to be noisy. If you find a noisy rule, you can:

  • Fork and customize that rule to improve its performance
  • Remove the rule from the scan

Set up local rules

To have more granular control over the rules in a ruleset, you must add the ruleset to your machine, then configure Semgrep to use those local rules.

  1. Navigate to the Semgrep community rules repository.
  2. Fork or clone the repository to create a local copy of all the rules:
    • To clone, click Code, then copy and run the cloning command in your CLI. This creates a semgrep-rules repository.
    • To fork, click Fork and follow the steps provided by GitHub. You must also clone the forked repository to your machine.
  3. In your CLI, navigate to your semgrep-rules repository.
  4. Find the rules you want to use and copy them into a folder within your target codebase. Give the folder a descriptive name, such as semgrep-rules.
  5. To use the local rules, run the following command:
    semgrep scan --config='SEMGREP_RULES_FOLDER/'
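
The copy-and-scan steps above can also be scripted. The following is a minimal sketch, not part of Semgrep: the helper names copy_rules and scan_command, the glob pattern, and the folder layout are assumptions made for illustration. Only the semgrep scan --config invocation comes from the steps above.

```python
import shutil
from pathlib import Path


def copy_rules(rules_repo: Path, dest: Path, glob_pattern: str) -> list[Path]:
    """Copy rule files matching glob_pattern from rules_repo into dest.

    The directory layout relative to rules_repo is preserved, so rules that
    share a file name in different folders do not overwrite each other.
    """
    copied = []
    for src in sorted(rules_repo.glob(glob_pattern)):
        if not src.is_file():
            continue
        target = dest / src.relative_to(rules_repo)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)
        copied.append(target)
    return copied


def scan_command(rules_folder: Path) -> list[str]:
    """Build the `semgrep scan` invocation for a local rules folder."""
    return ["semgrep", "scan", f"--config={rules_folder}/"]
```

For example, you could call copy_rules with the path to your semgrep-rules clone, a destination folder inside your codebase, and a glob such as python/**/*.yaml (assuming that is where the rules you want are stored), then pass the result of scan_command to subprocess.run.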

Customize a rule from a Semgrep Community ruleset

  1. Edit the noisy rule to improve its performance.
  2. Test your rule improvements by entering:
    semgrep scan --config='SEMGREP_RULES_FOLDER/NAME_OF_IMPROVED_RULE.yaml'
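
To illustrate what improving a noisy rule can mean, consider the target code below. It assumes a hypothetical rule, not taken from the Semgrep Community rules, that reports every subprocess.run call with shell=True. Such a rule would flag both functions, although only the second passes caller-controlled text to the shell. One possible refinement is to exclude calls whose command is a string literal, for example with a pattern-not clause, so that only the second function is reported.

```python
import subprocess


def greet_fixed() -> str:
    # Constant command string: no caller input reaches the shell, so a
    # finding on this call is a likely false positive.
    result = subprocess.run(
        "echo hello", shell=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def greet_user(name: str) -> str:
    # Caller-controlled text is interpolated into the shell command, so a
    # finding on this call is a likely true positive (command injection).
    result = subprocess.run(
        f"echo hello {name}", shell=True, capture_output=True, text=True
    )
    return result.stdout.strip()
```

Running the test command above against code like this, before and after editing the rule, shows whether the change removed the unwanted finding while keeping the relevant one.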

Remove the rule from the scan

Delete the rule from the folder containing your Semgrep rules.

Use advanced analyses and Pro rules

Optimizing rules can be a time-consuming process. Often, a rule is not inherently noisy; it simply lacks the additional analysis needed to detect true positives while ignoring false positives.

Semgrep Code provides cross-function (interprocedural) and cross-file (interfile) analyses. These analyses both reduce false positives and detect true positives that Semgrep OSS can't find.

For some languages and frameworks, such as Java or the Python Django framework, Semgrep also provides advanced analyses that take into account the language's characteristics, framework-specific dataflows, and the like. These analyses are available by default once you've signed in to Semgrep.

Note: Semgrep Code is free for up to 10 users.

Sign in to Semgrep

You need a GitHub or GitLab account to sign in to Semgrep.

  1. Enter the following command:
    semgrep login
  2. Follow the steps to create an account and proceed.
  3. Optional: Enter semgrep ci to run a scan. By default, these scans use Semgrep Pro rules, cross-function analysis, and language-specific improvements.

Tip: You can't use the --config option with semgrep ci once you are logged in. To use your custom rules, add them to your Policies page.

Analyses and improvements available by default

The following features are enabled by default and help reduce false positives.

Pro rules

Semgrep Pro rules are high-confidence, professionally maintained rules provided exclusively by Semgrep.

Languages with Pro rules coverage:
  • C#
  • Go
  • Java
  • PHP
  • Python
  • Ruby
  • Swift
  • TypeScript

The goal of Pro rules is to provide a set of well-supported rules with improved coverage across languages and vulnerability types. Semgrep Pro rules are written using Semgrep’s latest features and, in general, target users who are looking to produce accurate, actionable findings.

Cross-function analysis

Cross-function analysis means that interactions between functions are taken into account. This improves taint analysis, which tracks unsanitized variables flowing from a source to a sink through arbitrarily many functions.

To see cross-function analysis in action, run the interactive example.
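
As a small illustration of a flow that crosses function boundaries, the sketch below passes untrusted input through two helper functions before it reaches a SQL query. The function names and the users table are hypothetical. Whether a particular rule reports this code depends on the sources and sinks it defines.

```python
import sqlite3


def read_username(form: dict) -> str:
    # Source: the value comes from caller-supplied (untrusted) form data.
    return form["username"]


def build_query(username: str) -> str:
    # The untrusted value is placed into the SQL text without sanitization.
    return f"SELECT id FROM users WHERE name = '{username}'"


def find_user(conn: sqlite3.Connection, form: dict) -> list:
    # Looking at this function alone, `query` is just the result of a call.
    # Following the data through read_username and build_query shows that it
    # carries untrusted input.
    query = build_query(read_username(form))
    # Sink: executing SQL text built from untrusted input.
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, form: dict) -> list:
    # A parameterized query keeps the untrusted value out of the SQL text.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (read_username(form),)
    ).fetchall()
```

An analysis restricted to the body of find_user has no way to know what build_query returns. Tracking the value across the three functions is what connects the source to the sink.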

Language-specific improvements

Languages such as Java and frameworks such as Django, FastAPI, and Flask have specific improvements that take into account language features and implicit dataflows.

Enable cross-file analysis

Cross-file analysis (also known as interfile analysis) takes into account how information flows between files. In particular, cross-file analysis includes cross-file taint analysis, which tracks unsanitized variables flowing from a source to a sink through arbitrarily many files. Other analyses performed across files include constant propagation and type inference.

Cross-file analysis is usually contrasted with intrafile (per-file) analysis, where each file is analyzed as a standalone block of code.
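
The sketch below illustrates why constant propagation across files matters. For brevity it shows two imagined files, settings.py and checksums.py, in a single block. The file names, the constant, and the function are assumptions made for illustration, and the example describes the general idea rather than the exact findings Semgrep reports.

```python
import hashlib

# --- settings.py (imagined as a separate file) ---
HASH_ALGORITHM = "sha256"


# --- checksums.py (imagined as a separate file that imports HASH_ALGORITHM) ---
def checksum(data: bytes) -> str:
    # Seen on its own, this file does not reveal the algorithm name, so a
    # rule about weak algorithms such as "md5" cannot decide whether this
    # call is safe. Propagating the constant from settings.py across files
    # resolves the argument to "sha256".
    return hashlib.new(HASH_ALGORITHM, data).hexdigest()
```

With per-file analysis, the argument to hashlib.new stays an unknown value. With cross-file analysis, its value is available where the call is made.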

To run a scan with cross-file analysis, use the following command:

    semgrep ci --pro

Run SCA and SAST scans with one command

The semgrep ci command can also run SCA scans with the Semgrep Supply Chain product, which makes use of the same analyses mentioned in this document to determine reachability and reduce false positives.

Dataflow and interfile analyses in particular ensure that Semgrep Supply Chain provides a high true positive rate while reducing false positives. Read the Doyensec Software Composition Analysis Benchmark to learn more.

