Reduce false positives in semgrep scan
The semgrep scan command can be used to quickly perform SAST scans. However, you may encounter false positives as you work through your findings. This document presents different strategies to reduce false positives and increase true positives in your scans.
Customize your rules
A Semgrep Community rule that generates a high rate of false positives is said to be noisy. For a noisy rule, you can:
- Fork and customize that rule to improve its performance
- Remove the rule from the scan
Set up local rules
To have more granular control over the rules in a ruleset, you must add the ruleset to your machine, then configure Semgrep to use those local rules.
- Navigate to the Semgrep community rules repository.
- Fork or clone the repository to create a local copy of all the rules.
  - To clone, click Code then copy and run the cloning command in your CLI. This creates a semgrep-rules repository.
  - To fork, click Fork and follow the steps provided by GitHub. You must also clone the forked repository to your machine.
- In your CLI, navigate to your semgrep-rules repository.
- Find the rules you want to use and copy them into a folder within your target codebase. Give the folder a descriptive name, such as semgrep-rules.
- To use the local rules, run the following command:
semgrep scan --config='SEMGREP_RULES_FOLDER/'
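The copy step above can also be scripted. The following is a minimal Python sketch, not part of Semgrep: the helper names copy_rules and scan_command are hypothetical, and it assumes you have already cloned the rules repository locally. It copies the selected rule files into a folder in your codebase and builds the scan command shown above without running it.

```python
import shutil
from pathlib import Path


def copy_rules(clone_dir: Path, rules_dir: Path, rule_paths: list[str]) -> list[Path]:
    """Copy selected rule files from a local clone into rules_dir.

    rule_paths are paths relative to clone_dir. Files are copied flat,
    so two rules with the same file name would overwrite each other.
    """
    rules_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for rel in rule_paths:
        src = clone_dir / rel
        if not src.is_file():
            raise FileNotFoundError(f"rule not found in clone: {src}")
        dest = rules_dir / src.name
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied


def scan_command(rules_dir: Path) -> list[str]:
    """Build the argument list for scanning with the local rules folder."""
    return ["semgrep", "scan", f"--config={rules_dir}/"]
```

For example, copy_rules(Path("../semgrep-rules"), Path("semgrep-rules"), ["PATH/TO/RULE.yaml"]) copies one rule (the rule path is a placeholder), and scan_command(Path("semgrep-rules")) returns an argument list you can pass to subprocess.run.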
Customize a rule from a Semgrep Community ruleset
- Edit the noisy rule to improve its performance.
- Test your rule improvements by entering:
semgrep scan --config='SEMGREP_RULES_FOLDER/NAME_OF_IMPROVED_RULE.yaml'
Remove the rule from the scan
Delete the rule from the folder containing your Semgrep rules.
Use advanced analyses and Pro rules
Optimizing rules can be a time-consuming process. Often, a rule is not inherently noisy; it lacks the additional analysis needed to detect true positives while ignoring false positives.
Semgrep Code provides cross-function (interprocedural) and cross-file (interfile) analyses. These analyses both reduce false positives and detect true positives that Semgrep Community Edition (CE) can't find.
For some languages and frameworks, such as Java or the Python Django framework, Semgrep also provides advanced analyses that take into account the language's characteristics, framework-specific dataflows, and the like. These analyses are available by default once you've signed in to Semgrep.
Semgrep Code is free for up to 10 users.
Sign in to Semgrep
You need a GitHub or GitLab account to sign in to Semgrep.
- Enter the following command:
semgrep login
- Follow the steps to create an account and proceed.
- Optional: Enter semgrep ci to run a scan. By default, these scans use Semgrep Pro rules, cross-function analysis, and language-specific improvements.
You can't use the --config option with semgrep ci once you are logged in. To use your custom rules, add them to your Policies page.
Analyses and improvements available by default
The following features are enabled by default and help reduce false positives.
Pro rules
Semgrep Pro rules are high-confidence, professionally maintained rules provided exclusively by Semgrep.
Languages with Pro rules coverage:
- C#
- Go
- Java
- PHP
- Python
- Ruby
- Swift
- TypeScript
The goal of Pro rules is to provide a set of well-supported rules with improved coverage across languages and vulnerability types. Semgrep Pro rules are written using Semgrep’s latest features and, in general, target users who are looking to produce accurate, actionable findings.
Cross-function analysis
Cross-function analysis means that interactions between functions are taken into account. This improves taint analysis, which tracks unsanitized variables flowing from a source to a sink through arbitrarily many functions.
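To make the idea concrete, here is a minimal Python sketch of a tainted value crossing function boundaries. It is an illustration only, not taken from Semgrep's rules or examples: the function names and the sqlite3 setup are assumptions, and whether a given rule reports this flow depends on how that rule defines its sources and sinks. The point is that the source and the sink sit in different functions, so an analysis that examines each function in isolation has no way to connect them.

```python
import sqlite3


def get_username(request_args: dict) -> str:
    # Source: untrusted, user-controlled input enters the program here.
    return request_args["username"]


def build_query(name: str) -> str:
    # Propagator: the tainted value is interpolated into SQL unchanged.
    return f"SELECT id FROM users WHERE name = '{name}'"


def find_user_unsafe(conn: sqlite3.Connection, request_args: dict) -> list:
    # Sink: the query string built from tainted input is executed.
    # The taint travelled get_username -> build_query -> execute.
    query = build_query(get_username(request_args))
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, request_args: dict) -> list:
    # Same source, but the value is bound as a parameter, so it is
    # never interpreted as SQL.
    name = get_username(request_args)
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()
```

In this sketch, find_user_unsafe is the kind of flow a cross-function taint analysis is meant to surface, while find_user_safe shows a form that should not be reported.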
To see cross-function analysis in action, run the interactive example.
Language-specific improvements
Languages such as Java and frameworks such as Django, FastAPI, and Flask have specific improvements that take into account language features and implicit dataflows.
Enable cross-file analysis
Cross-file analysis (also known as interfile analysis) takes into account how information flows between files. In particular, cross-file analysis includes cross-file taint analysis, which tracks unsanitized variables flowing from a source to a sink through arbitrarily many files. Other analyses performed across files include constant propagation and type inference.
Cross-file analysis contrasts with intrafile (per-file) analysis, in which each file is analyzed as a standalone block of code.
To run a scan with cross-file analysis, use the following command:
semgrep ci --pro
The semgrep ci command can also run SCA scans with the Semgrep Supply Chain product, which makes use of the same analyses mentioned in this document to determine reachability and reduce false positives.
Dataflow and interfile analyses in particular ensure that Semgrep Supply Chain provides a high true positive rate while reducing false positives. Read the Doyensec Software Composition Analysis Benchmark to learn more.