Running Semgrep in continuous integration (CI) without Semgrep Cloud Platform
Run Semgrep in your Continuous Integration (CI) pipeline to scan your repository for code vulnerabilities and other issues. This guide explains how to set up Semgrep in your pipeline without the use of Semgrep Cloud Platform, also known as a stand-alone setup.
There are three general steps to setting up Semgrep in your CI pipeline:
- Set up the CI job or action to scan with Semgrep and receive an exit code.
- Test that the Semgrep job is scanning your repository and that you are able to view scan results in your CI provider's log.
- Configure the CI job's parameters. It is easier to troubleshoot any parameters after testing that the job runs successfully.
Figure 1. Steps to run Semgrep in CI without Semgrep Cloud Platform.
This guide defines a job or CI job as a script executed within a certain environment and managed by a CI provider.
Reasons for configuring additional CI job parameters
By configuring a job's parameters, you are able to achieve the following goals:
- Run Semgrep on a schedule. Run full scans on mainline branches at the time that is least disruptive to developer teams.
- Run Semgrep with custom rules. Apply rules specific to your organization's business goals and coding conventions.
- Run Semgrep when an event triggers. Run Semgrep when a pull or merge request (PR or MR) is created. These event triggers or event hooks are dependent on your CI provider.
- Run Semgrep on relevant files and blocks of code. Configure Semgrep to ignore files and folders such as test files, configuration files, and files from other vendors.
- Configure a Semgrep CI job to pass even when findings are detected. By default, stand-alone configurations fail when any finding is detected, but you can configure the job to pass when findings are reported.
- Output, export, or save findings to a file. Semgrep can save to a number of file formats, including SARIF and JSON.
Limitations of Semgrep stand-alone CI scans
Running Semgrep in CI without Semgrep Cloud Platform has the following limitations compared to running Semgrep in CI with Semgrep Cloud Platform:
- Stand-alone CI jobs cannot send [PR comments](/semgrep-cloud-platform/github-pr-comments) or MR comments. These comments describe the finding and help developers resolve vulnerabilities and other code issues.
- Stand-alone CI jobs cannot selectively block a pull or merge request based on a finding, because there are no user-defined rule modes to distinguish between rules. In contrast, Semgrep Cloud Platform's Policies page lets users define certain actions, such as blocking PRs or MRs on a finding generated by a reliable rule.
- Findings are dumped to a log, so there is no record of a finding's status, such as open, ignored, or fixed.
Setting up the CI job
semgrep ci is the command used to run Semgrep in a CI environment. In most cases, this is the recommended command to run in the CI job. This command is a subset of the semgrep scan command. The semgrep ci command has the following characteristics:
- semgrep ci is git-aware. This means it is able to detect branches and git states.
- semgrep ci makes use of environment variables to configure its behavior.
- semgrep ci performs a full scan by default. This is the recommended behavior. Semgrep can be configured to perform a diff-aware scan.
- semgrep ci can be run on a local .git repository at any time to test its behavior before running it within a CI environment.
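As a minimal sketch of that last point, the shell function below runs the CI command from a local clone before you wire it into a CI job. The function name run_local_ci_scan and the return code 2 for the "not a Git repository" case are assumptions made for illustration; semgrep ci --config auto is the example command this guide uses elsewhere.

```shell
# Hypothetical helper: try out the CI command from a local clone.
run_local_ci_scan() {
    # semgrep ci is git-aware, so check for a Git work tree first.
    if ! git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        echo "run_local_ci_scan: not inside a Git repository" >&2
        return 2  # arbitrary code chosen for this sketch
    fi
    # The function's exit status is the exit status of semgrep ci.
    semgrep ci --config auto
}
```

Call run_local_ci_scan from the root of your repository and inspect the exit status with echo $? to see what the CI job would receive.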
Running Semgrep with template or sample CI configuration files
To integrate any CI provider in the following list, follow the instructions in Sample CI configuration.
Running Semgrep through other CI providers
Use either of the following methods to run Semgrep through other CI providers.
Direct Docker usage
Reference or add the returntocorp/semgrep Docker image directly. The method to add the Docker image varies based on the CI provider. This method is used in the BitBucket Pipelines code snippet.
Install semgrep within your CI job
If you cannot use the Semgrep Docker image, install Semgrep as a step or command within your CI job:
- Add pip3 install semgrep into the configuration file as a step or command, depending on your CI provider's syntax.
- Run any valid semgrep ci command, such as semgrep ci --config auto.
This method is used in the Jenkins CI code snippet.
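The two steps above could be combined into a single script step, sketched below. The function name install_and_scan and the return code 2 for a failed installation are assumptions for illustration; how the script is attached to a job depends on your CI provider.

```shell
# Hypothetical helper: install Semgrep, then scan, inside one CI step.
install_and_scan() {
    # Step 1: install Semgrep with pip3, as described above.
    if ! pip3 install semgrep; then
        echo "install_and_scan: could not install semgrep" >&2
        return 2  # arbitrary code chosen for this sketch
    fi
    # Step 2: run the scan; its exit status becomes the step's status.
    semgrep ci --config auto
}
```

Stopping before the scan when the installation fails keeps a broken environment from being mistaken for a scan result.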
Configuring the CI job
The following sections describe methods to customize your CI job.
Passing or failing the CI job
By default, a Semgrep CI job exits with exit code 1 if the scan returns any findings. This causes the job to fail.
Semgrep provides fail-open options. These options let you choose whether findings or internal errors block your pipeline:
- semgrep ci: Fails on blocking findings, but passes on internal errors. This is the default behavior.
- semgrep ci --no-suppress-errors: Fails on blocking findings and on internal errors.
- semgrep ci || true: Passes on blocking findings and on internal errors.
Refer to Semgrep exit codes to understand various internal issues that cause Semgrep to fail.
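One way to keep the three behaviors above selectable from a single script is sketched below. The function name semgrep_gate, the mode names default, strict, and advisory, and the return code 64 for an unknown mode are all assumptions made for this example; only the three semgrep commands come from this guide.

```shell
# Hypothetical helper: pick one of the three pass/fail behaviors by name.
semgrep_gate() {
    case "$1" in
        default)
            # Fails on blocking findings, passes on internal errors.
            semgrep ci ;;
        strict)
            # Fails on blocking findings and on internal errors.
            semgrep ci --no-suppress-errors ;;
        advisory)
            # Always passes; findings still appear in the job log.
            semgrep ci || true ;;
        *)
            echo "semgrep_gate: unknown mode: $1" >&2
            return 64 ;;  # arbitrary code chosen for this sketch
    esac
}
```

For example, semgrep_gate advisory could be used while a team is first reviewing findings, and semgrep_gate default once the job should block on them.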
Diff-aware scanning
- Diff-aware scanning is automatically configured for GitHub Actions and GitLab CI/CD when the user runs Semgrep on PR or MR events. Do not set SEMGREP_BASELINE_REF for GitHub Actions or GitLab CI/CD.
- For other CI providers, Semgrep Cloud Platform provides a full scan configuration by default. You can set up both diff-aware scanning and full scans through either of the following:
  - Create separate jobs for diff-aware scans and full scans.
  - If your CI provider supports conditional statements, use an if/then statement that detects the presence of SEMGREP_BASELINE_REF.
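The if/then approach might look like the following sketch, assuming the job runs in a POSIX shell. The function names scan_scope and run_scan are hypothetical. An empty SEMGREP_BASELINE_REF is treated here the same as an unset one, which is a choice made for this example.

```shell
# Hypothetical helper: report which kind of scan the job is about to run.
scan_scope() {
    if [ -n "${SEMGREP_BASELINE_REF:-}" ]; then
        echo "diff-aware scan against ${SEMGREP_BASELINE_REF}"
    else
        echo "full scan"
    fi
}

# Log the scope, then scan. semgrep ci reads SEMGREP_BASELINE_REF from the
# environment, so the variable must be exported by the CI job.
run_scan() {
    echo "Starting $(scan_scope)" >&2
    semgrep ci
}
```

Logging the detected scope makes it easier to confirm from the CI log that a job ran the kind of scan you intended.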
Semgrep scans can be classified by scope. The scope of a scan refers to what lines of code are scanned in a codebase. When classifying scans by scope, there are two types of scans:
- Full scan
  A full scan runs on your entire codebase and reports every finding in the codebase. It is recommended to perform a full scan of your main branch at a regular cadence, such as every night or every week. This ensures that Semgrep Cloud Platform has a full list of all findings in your codebase, regardless of when they were introduced. To run a full scan, run semgrep ci without setting the SEMGREP_BASELINE_REF environment variable.
- Diff-aware scan
  A diff-aware scan runs on your code before and after some "baseline" and only reports findings that are newly introduced in the commits after that baseline.
  For example, imagine a hypothetical repository with 10 commits. You set commit number 8 as the baseline. Consequently, Semgrep only returns scan results introduced by changes in commits 9 and 10. This is how semgrep ci can run in pull requests and merge requests, since it reports only the findings that are created by those code changes. To run a diff-aware scan, use SEMGREP_BASELINE_REF=REF semgrep ci where REF can be a commit hash, branch name, or other Git reference.

- Do not perform diff-aware scans on your main branch. Semgrep Cloud Platform keeps track of which findings have been fixed on a given branch. If you configure diff-aware scans on your main branch, and compare the last commit to the penultimate commit, Semgrep wrongly considers all findings from before the penultimate commit to be fixed.
- Do not perform full scans on non-mainline or non-trunk branches. Performing full scans on every branch slows down your CI jobs, displays findings that developers did not introduce, and produces many duplicated findings in Semgrep Cloud Platform, which makes for a poorer experience.
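For CI providers other than GitHub Actions and GitLab CI/CD, the two recommendations above could be encoded in one script, as in this sketch. The function name scan_for_branch is hypothetical, and how you obtain the current branch name depends on your CI provider.

```shell
# Hypothetical helper: full scan on the mainline branch, diff-aware elsewhere.
# Usage: scan_for_branch BRANCH [MAINLINE]   (MAINLINE defaults to "main")
scan_for_branch() {
    branch="$1"
    mainline="${2:-main}"
    if [ "$branch" = "$mainline" ]; then
        # Mainline: full scan, so make sure no baseline is set.
        ( unset SEMGREP_BASELINE_REF; semgrep ci )
    else
        # Other branches: report only findings introduced since the mainline.
        ( SEMGREP_BASELINE_REF="$mainline"; export SEMGREP_BASELINE_REF; semgrep ci )
    fi
}
```

Each scan runs in a subshell so that setting or unsetting the variable does not leak into later steps of the same job.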
Examples of SEMGREP_BASELINE_REF
To only report findings newly added since branching off from your main branch, set the following:
SEMGREP_BASELINE_REF=main
To only report findings newly added after a specific commit, set the following:
SEMGREP_BASELINE_REF=INSERT_GIT_COMMIT_HASH
Setting a scan schedule
The following table is a summary of methods and resources to set up schedules for different CI providers.
CI provider | Where to set schedule | Resource |
---|---|---|
GitHub Actions | Within semgrep.yml file | Sample code snippet |
GitLab CI/CD | Within GitLab CI/CD interface | Official documentation |
Jenkins | Within Jenkins interface | Official documentation |
BitBucket Pipelines | Within BitBucket Pipelines interface | Official documentation |
CircleCI | Within CircleCI interface | Official documentation |
Buildkite | Within Buildkite interface | Official documentation |
Azure Pipelines | Within Pipelines interface (recommended) | Official documentation |
Customizing rules and rulesets
Adding rules to scan with semgrep ci
semgrep ci accepts a list of rules and rulesets to run on each scan. The rules and rulesets can come from the Semgrep Registry or from your own rules. The sources for rules to scan with are:
- A .semgrep folder located at the root of your repository.
- The value of the SEMGREP_RULES environment variable.
The SEMGREP_RULES environment variable accepts a list of local and remote rules and rulesets to run. The SEMGREP_RULES list is delimited by a space if the variable is exported from a shell command or script block. For example, see the following BitBucket Pipeline snippet:
# ...
script:
- export SEMGREP_RULES="p/nginx p/ci no-exec.yml"
- semgrep ci
# ...
The line defining SEMGREP_RULES defines three different sources, delimited by a space:
- export SEMGREP_RULES="p/nginx p/ci no-exec.yml"
The example references two rulesets from Semgrep Registry (p/nginx and p/ci) and a rule available in the repository (no-exec.yml).
If the SEMGREP_RULES environment variable is defined from a YAML block, the list of rules and rulesets to run is delimited by a newline. See the following example of a GitLab CI/CD snippet:
# ...
variables:
  SEMGREP_RULES: >-
    p/nginx
    p/ci
    no-exec.yml
# ...
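When the list of rule sources is assembled in a shell script, the space-delimited form can be built from separate arguments, as in this sketch. The function name scan_with_rules is hypothetical, and the sketch assumes the shell's default IFS.

```shell
# Hypothetical helper: run semgrep ci with the given rule sources.
# Usage: scan_with_rules p/nginx p/ci no-exec.yml
scan_with_rules() {
    # "$*" joins all arguments with the first character of IFS,
    # which is a space unless IFS has been changed.
    ( SEMGREP_RULES="$*"; export SEMGREP_RULES; semgrep ci )
}
```

Passing each source as its own argument avoids hand-editing one long quoted string when rulesets are added or removed.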
Writing your own rules
Write custom rules to enforce your team's coding standards and security practices. Rules can be forked from existing community-written rules.
See Writing rules to learn how to write custom rules.
Ignoring files
By default, semgrep ci skips files and directories such as tests/, node_modules/, and vendor/. It uses the default .semgrepignore file, which you can find in the Semgrep GitHub repository. This default is used when no explicit .semgrepignore file is found in the root of your repository.
Optional: Copy and commit the default .semgrepignore file to the root of your repository and extend it with your own entries, or write your .semgrepignore file from scratch. If Semgrep detects a .semgrepignore file within your repository, it does not append entries from the default .semgrepignore file.
For a complete example, see the .semgrepignore file in Semgrep’s source code.
.semgrepignore is only used by Semgrep. Integrations such as GitLab's Semgrep SAST Analyzer do not use it.
For information on ignoring individual findings in code, see the Ignoring findings page.
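Because a repository-level .semgrepignore replaces the default file rather than extending it, a custom file needs to repeat any default entries you still want. The sketch below writes such a file. The function name write_semgrepignore and the last two entries are hypothetical, and the first three entries are only the examples named in this guide, not the full default file.

```shell
# Hypothetical helper: write a starter .semgrepignore into a repository root.
# Usage: write_semgrepignore [REPO_DIR]   (REPO_DIR defaults to ".")
write_semgrepignore() {
    target_dir="${1:-.}"
    cat > "$target_dir/.semgrepignore" <<'EOF'
# Directories that this guide names as skipped by default.
tests/
node_modules/
vendor/

# Hypothetical project-specific additions.
docs/examples/
*.min.js
EOF
}
```

After running write_semgrepignore in your repository root, commit the resulting file so that CI scans pick it up.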
Saving or exporting findings to a file
To save or export findings, pass file format options and send the formatted findings to a file.
For example, to save to a JSON file:
semgrep ci --json > findings.json
You can also use the SARIF format:
semgrep ci --sarif > findings.sarif
Refer to the CLI reference for output formats.
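In a CI job, you may want to save the findings and still let the scan's exit status decide whether the job passes. The following sketch does both. The function name save_findings and the convention of picking the format from the file extension are assumptions for this example; --json and --sarif are the options shown above.

```shell
# Hypothetical helper: save findings to a file and keep semgrep's exit status.
# Usage: save_findings [FILE]   (FILE defaults to findings.json)
save_findings() {
    findings_file="${1:-findings.json}"
    # Choose the output option from the file extension.
    case "$findings_file" in
        *.sarif) format_flag="--sarif" ;;
        *)       format_flag="--json" ;;
    esac
    semgrep ci "$format_flag" > "$findings_file"
    scan_status=$?
    echo "semgrep ci exited with status $scan_status; output saved to $findings_file" >&2
    # Return the scan's own status so the CI job passes or fails as usual.
    return "$scan_status"
}
```

For example, save_findings findings.sarif writes SARIF output that a later step in the job can upload or archive.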
Migrating to Semgrep Cloud Platform from a stand-alone CI setup
Migrate to Semgrep Cloud Platform to:
- View and manage findings in a centralized location. False positives can be ignored through triage actions. These actions can be undertaken in bulk.
- Configure rules and actions to undertake when a finding is generated by the rule. You can undertake the following actions:
  - Audit the rule. This means that findings are kept within Semgrep's Findings page and are not surfaced to your team's SCM.
  - Show the finding to your team through the use of PR and MR comments.
  - Block the pull or merge request.
To migrate to Semgrep Cloud Platform:
- Create an account in Semgrep Cloud Platform.
- Click Projects > Scan New Project > Run scan in CI.
- Follow the steps in the setup to complete your migration.
- Optional: If you have previously set a custom SEMGREP_TIMEOUT environment variable, commit it to the CI configuration file created by Semgrep Cloud Platform. Do not copy SEMGREP_RULES.
- Optional: Remove the old CI job that does not use Semgrep Cloud Platform.