  • Team & Enterprise Tier

Running Semgrep in continuous integration (CI) without Semgrep Cloud Platform

Run Semgrep in your continuous integration (CI) pipeline to scan your repository for code vulnerabilities and other issues. This guide explains how to set up Semgrep in your pipeline without Semgrep Cloud Platform, a configuration also known as a stand-alone setup.

There are three general steps to setting up Semgrep in your CI pipeline:

  1. Set up the CI job or action to scan with Semgrep and receive an exit code.
  2. Test that the Semgrep job is scanning your repository and that you are able to view scan results in your CI provider's log.
  3. Configure the CI job's parameters. It is easier to troubleshoot any parameters after testing that the job runs successfully.

Figure 1. Steps to run Semgrep in CI without Semgrep Cloud Platform.

This guide defines a job or CI job as a script executed within a certain environment and managed by a CI provider.

Reasons for configuring additional CI job parameters

By configuring a job's parameters, you are able to achieve the following goals:

  • Run Semgrep on a schedule. Run full scans on mainline branches at the time that is least disruptive to developer teams.
  • Run Semgrep with custom rules. Apply rules specific to your organization's business goals and coding conventions.
  • Run Semgrep when an event triggers. Run Semgrep when a pull or merge request (PR or MR) is created. These event triggers or event hooks are dependent on your CI provider.
  • Run Semgrep on relevant files and blocks of code. Configure Semgrep to ignore files and folders such as test files, configuration files, and files from third-party vendors.
  • Configure a Semgrep CI job to pass even when findings are detected. By default, stand-alone configurations fail when any finding is detected.
  • Output, export, or save findings to a file. Semgrep can save to a number of file formats, including SARIF and JSON.

Limitations of Semgrep stand-alone CI scans

Running Semgrep in CI without Semgrep Cloud Platform has the following limitations compared to running Semgrep in CI with Semgrep Cloud Platform:

  • Stand-alone CI jobs cannot post [PR comments](/semgrep-cloud-platform/github-pr-comments) or MR comments. These comments describe each finding and help developers resolve vulnerabilities and other code issues.
  • Stand-alone CI jobs cannot block a pull or merge request based on specific findings. There are no user-defined rule modes to distinguish between rules, so any finding fails the job. By contrast, Semgrep Cloud Platform's Policies page lets users define actions per rule, such as blocking PRs or MRs only on findings generated by a reliable rule.
  • Findings are written only to the CI log, so there is no record of each finding's status, such as open, ignored, or fixed.

Setting up the CI job

semgrep ci is the command used to run Semgrep in a CI environment, and in most cases it is the recommended command for your CI job. It is a subset of the semgrep scan command. The semgrep ci command has the following characteristics:

  • semgrep ci is Git-aware. It can detect branches and Git state.
  • semgrep ci uses environment variables to configure its behavior.
  • semgrep ci performs a full scan by default. This is the recommended behavior. You can also configure Semgrep to perform a diff-aware scan.
  • semgrep ci can be run on a local Git repository at any time to test its behavior before running it in a CI environment.
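Because semgrep ci can run in any local clone, you can try a stand-alone scan before you commit any CI configuration. The following is a minimal sketch. It assumes rules are supplied through the SEMGREP_RULES environment variable, which is described in Customizing rules and rulesets. The function name local_semgrep_check and the p/ci fallback are illustrative choices, not part of Semgrep.

```shell
# Hypothetical helper: run a stand-alone `semgrep ci` scan in a local clone
# before adding the same command to a CI job.
local_semgrep_check() {
  local repo="${1:-.}"

  # semgrep ci is Git-aware, so run it from the root of a Git repository.
  # (-e rather than -d also accepts worktrees, where .git is a file.)
  if [ ! -e "$repo/.git" ]; then
    echo "error: $repo is not the root of a Git repository" >&2
    return 2
  fi

  # Without Semgrep Cloud Platform, rules come from SEMGREP_RULES, a .semgrep
  # folder, or --config. p/ci is only an example fallback.
  (
    cd "$repo" || exit 2
    SEMGREP_RULES="${SEMGREP_RULES:-p/ci}" semgrep ci
  )
}

# Usage: local_semgrep_check path/to/repo; echo "exit code: $?"
```

The exit code printed at the end is the value a CI provider would use to pass or fail the job.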

Running Semgrep with template or sample CI configuration files

To integrate one of the CI providers listed in Sample CI configuration, follow the instructions on that page.

Running Semgrep through other CI providers

Use either of the following methods to run Semgrep through other CI providers.

Direct Docker usage

Reference or add the returntocorp/semgrep Docker image directly. How you add the Docker image varies by CI provider. This method is used in the Bitbucket Pipelines code snippet.
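For providers that accept a plain shell step, a minimal sketch of direct Docker usage might look like the following. It assumes Docker is available on the runner and that the step runs from the root of the checkout. The function name is hypothetical, and the exact way to reference an image differs between CI providers.

```shell
# Hypothetical helper: run semgrep ci from the returntocorp/semgrep image
# against the repository in the current directory.
semgrep_ci_docker() {
  # -v mounts the checkout at /src and --workdir runs the scan from there.
  # -e SEMGREP_RULES forwards the host variable if it is exported.
  docker run --rm \
    -v "${PWD}:/src" \
    --workdir /src \
    -e SEMGREP_RULES \
    returntocorp/semgrep \
    semgrep ci "$@"
}

# Usage: semgrep_ci_docker --config auto
```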

Install Semgrep within your CI job

If you cannot use the Semgrep Docker image, install Semgrep as a step or command within your CI job:

  1. Add pip3 install semgrep to the configuration file as a step or command, depending on your CI provider's syntax.
  2. Run any valid semgrep ci command, such as semgrep ci --config auto.

This method is used in the Jenkins CI code snippet.
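A minimal sketch of this method as a single shell step, assuming Python 3 and pip3 are available on the runner. The function name is hypothetical.

```shell
# Hypothetical CI step: install Semgrep with pip, then run a stand-alone scan.
install_and_scan() {
  # Stop early, returning pip's exit code, if the installation fails.
  pip3 install semgrep || return
  # The step's result is the exit code of semgrep ci.
  semgrep ci --config auto
}

# Usage: install_and_scan
```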

Configuring the CI job

The following sections describe methods to customize your CI job.

Passing or failing the CI job

By default, a Semgrep CI job exits with exit code 1 if the scan returns any findings. This causes the job to fail.

Semgrep provides options that control whether blocking findings and internal errors fail your pipeline:

  • semgrep ci: Fails on blocking findings but passes on internal errors. This is the default behavior.
  • semgrep ci --no-suppress-errors: Fails on blocking findings and on internal errors.
  • semgrep ci || true: Passes on blocking findings and on internal errors.

Refer to Semgrep exit codes to understand various internal issues that cause Semgrep to fail.
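None of the commands above passes on findings while still failing on internal errors. If you need that combination, one approach is to inspect the exit code yourself. This sketch relies on the behavior described above, where exit code 1 means findings were reported. Check Semgrep exit codes for the meaning of other values in your Semgrep version. The function name is hypothetical.

```shell
# Hypothetical wrapper: pass the job when semgrep ci reports findings
# (exit code 1) but fail on any other non-zero exit code, such as the
# internal errors that --no-suppress-errors surfaces.
semgrep_ci_allow_findings() {
  local status=0
  semgrep ci --no-suppress-errors "$@" || status=$?

  if [ "$status" -eq 1 ]; then
    echo "Semgrep reported findings; not failing the job." >&2
    return 0
  fi
  return "$status"
}
```

With this wrapper, findings still appear in the CI log, but only errors fail the job.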

Diff-aware scanning

info
  • Diff-aware scanning is automatically configured for GitHub Actions and GitLab CI/CD when Semgrep runs on PR or MR events. Do not set SEMGREP_BASELINE_REF for GitHub Actions or GitLab CI/CD.
  • For other CI providers, Semgrep Cloud Platform provides a full scan configuration by default. You can set up both diff-aware scanning and full scans in either of the following ways:
    • Create separate jobs for diff-aware scans and full scans.
    • If your CI provider supports conditional statements, use an if/then statement that detects the presence of SEMGREP_BASELINE_REF.
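For providers without built-in diff-aware support, a single job can branch on SEMGREP_BASELINE_REF. The sketch below assumes your CI configuration sets SEMGREP_BASELINE_REF only for pull or merge request builds and leaves it unset for scheduled or mainline builds. How you set the variable depends on your provider. The function name is hypothetical.

```shell
# Hypothetical CI step: one job that runs either a diff-aware or a full scan.
run_semgrep_scan() {
  if [ -n "${SEMGREP_BASELINE_REF:-}" ]; then
    # PR or MR build: report only findings introduced after the baseline.
    # The prefix assignment exports the value to semgrep ci.
    echo "Diff-aware scan against $SEMGREP_BASELINE_REF" >&2
    SEMGREP_BASELINE_REF="$SEMGREP_BASELINE_REF" semgrep ci "$@"
  else
    # Mainline or scheduled build: full scan. Unset the variable in a
    # subshell so an empty value is never passed to semgrep ci.
    echo "Full scan" >&2
    (unset SEMGREP_BASELINE_REF; semgrep ci "$@")
  fi
}
```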

Semgrep scans can be classified by scope. The scope of a scan refers to which lines of code in a codebase are scanned. When classifying scans by scope, there are two types of scans:

Full scan

A full scan runs on your entire codebase and reports every finding in the codebase. We recommend running a full scan of your main branch at a regular cadence, such as nightly or weekly. This ensures that Semgrep Cloud Platform has a full list of all findings in your codebase, regardless of when they were introduced. To run a full scan, run semgrep ci without setting the SEMGREP_BASELINE_REF environment variable.

Diff-aware scan

A diff-aware scan compares your code at a "baseline" with your code after it, and only reports findings that are newly introduced in the commits after that baseline.

For example, imagine a hypothetical repository with 10 commits. You set commit number 8 as the baseline. Consequently, Semgrep only returns scan results introduced by changes in commits 9 and 10. This is how semgrep ci can run in pull requests and merge requests, since it reports only the findings that are created by those code changes. To run a diff-aware scan, use SEMGREP_BASELINE_REF=REF semgrep ci where REF can be a commit hash, branch name, or other Git reference.

Flow chart of Semgrep code scanning behavior based on environment variable

caution
  • Do not perform diff-aware scans on your main branch. Semgrep Cloud Platform keeps track of which findings have been fixed on a given branch. If you configure diff-aware scans on your main branch, and compare the last commit to the penultimate commit, Semgrep wrongly considers all findings from before the penultimate commit to be fixed.
  • Do not perform full scans on non-mainline or non-trunk branches. Performing full scans on every branch slows down your CI jobs, displays findings that developers did not introduce, and creates many duplicated findings in Semgrep Cloud Platform, which results in a poorer experience.

Examples of SEMGREP_BASELINE_REF

To only report findings newly added since branching off from your main branch, set the following:

SEMGREP_BASELINE_REF=main

To only report findings newly added after a specific commit, set the following:

SEMGREP_BASELINE_REF=INSERT_GIT_COMMIT_HASH
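Many CI providers check out a shallow clone or only the branch being built. In that case, a baseline such as main or an older commit hash may be missing from the job's Git history. A defensive sketch, assuming git is available in the job, checks that the baseline resolves to a commit before a diff-aware scan starts. The function name is hypothetical, and how you fetch the missing history depends on your CI provider.

```shell
# Hypothetical pre-flight check: confirm the baseline ref resolves to a commit
# in the local clone before running a diff-aware scan.
check_baseline_ref() {
  local ref="${1:-${SEMGREP_BASELINE_REF:-}}"

  if [ -z "$ref" ]; then
    echo "error: no baseline ref given" >&2
    return 2
  fi

  # ^{commit} makes rev-parse fail unless the ref points at a commit.
  if ! git rev-parse --verify --quiet "${ref}^{commit}" >/dev/null; then
    echo "error: baseline '$ref' is not in the local history; fetch it before scanning" >&2
    return 1
  fi
}

# Usage: check_baseline_ref main && SEMGREP_BASELINE_REF=main semgrep ci
```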

Setting a scan schedule

The following table is a summary of methods and resources to set up schedules for different CI providers.

| CI provider | Where to set schedule | Resource |
| --- | --- | --- |
| GitHub Actions | Within the semgrep.yml file | Sample code snippet |
| GitLab CI/CD | Within the GitLab CI/CD interface | Official documentation |
| Jenkins | Within the Jenkins interface | Official documentation |
| Bitbucket Pipelines | Within the Bitbucket Pipelines interface | Official documentation |
| CircleCI | Within the CircleCI interface | Official documentation |
| Buildkite | Within the Buildkite interface | Official documentation |
| Azure Pipelines | Within the Pipelines interface (recommended) | Official documentation |

Customizing rules and rulesets

Adding rules to scan with semgrep ci

semgrep ci accepts a list of rules and rulesets to run on each scan. The rules and rulesets can come from the Semgrep Registry or from your own rules. semgrep ci reads rules from the following sources:

  • A .semgrep folder located at the root of your repository.
  • The value of the SEMGREP_RULES environment variable.

The SEMGREP_RULES environment variable accepts a list of local and remote rules and rulesets to run. When the variable is exported from a shell command or script block, the list is delimited by spaces. For example, see the following Bitbucket Pipelines snippet:

# ...
script:
- export SEMGREP_RULES="p/nginx p/ci no-exec.yml"
- semgrep ci
# ...

The line that sets SEMGREP_RULES specifies three sources, delimited by spaces:

- export SEMGREP_RULES="p/nginx p/ci no-exec.yml" 

The example references two rulesets from Semgrep Registry (p/nginx and p/ci) and a rule available in the repository (no-exec.yml).
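Because the list is split on spaces, each entry is either a registry ruleset such as p/nginx or a path in the repository such as no-exec.yml. The following pre-flight check is hypothetical and not part of Semgrep. It splits SEMGREP_RULES the same way and reports any local rule files that do not exist, so a typo fails fast. It assumes that entries starting with p/, r/, or a URL scheme are remote and that everything else is a local path.

```shell
# Hypothetical pre-flight check for SEMGREP_RULES (not part of Semgrep).
# The function body runs in a subshell so `set -f` does not leak out.
check_semgrep_rules() (
  # Disable globbing so an entry such as *.yml is not expanded by the shell.
  set -f
  missing=0
  # Unquoted expansion splits the space-delimited list into entries.
  for entry in ${SEMGREP_RULES:-}; do
    case "$entry" in
      p/* | r/* | http://* | https://*)
        # Assumed to be remote (Semgrep Registry or a URL).
        echo "remote: $entry"
        ;;
      *)
        if [ -e "$entry" ]; then
          echo "local: $entry"
        else
          echo "missing: $entry" >&2
          missing=1
        fi
        ;;
    esac
  done
  exit "$missing"
)
```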

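If you define the SEMGREP_RULES environment variable in a YAML block, you can write each rule or ruleset on its own line. With the >- folded block scalar, YAML joins those lines with single spaces, so Semgrep receives the same space-delimited list. See the following example of a GitLab CI/CD snippet: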

# ...
variables:
  SEMGREP_RULES: >-
    p/nginx
    p/ci
    no-exec.yml
# ...

Writing your own rules

Write custom rules to enforce your team's coding standards and security practices. Rules can be forked from existing community-written rules.

See Writing rules to learn how to write custom rules.

Ignoring files

By default, semgrep ci skips files and directories such as tests/, node_modules/, and vendor/. It uses the default .semgrepignore file, which you can find in the Semgrep GitHub repository. This default is used when no explicit .semgrepignore file is found in the root of your repository.

Optional: Copy and commit the default .semgrepignore file to the root of your repository and extend it with your own entries, or write your own .semgrepignore file from scratch. If Semgrep detects a .semgrepignore file within your repository, it does not append entries from the default .semgrepignore file.

For a complete example, see the .semgrepignore file in Semgrep’s source code.
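As a sketch, a repository-specific .semgrepignore written from scratch might look like the following. The entries are examples only. Because your file replaces the default, re-add any default entries you still want. The :include directive, which pulls in the patterns from another ignore file such as .gitignore, is assumed to be supported by your Semgrep version, so confirm this before relying on it.

```shell
# Write an example .semgrepignore at the repository root.
# Patterns use .gitignore-style syntax; these entries are illustrative.
cat > .semgrepignore <<'EOF'
# Dependencies and vendored code
node_modules/
vendor/

# Tests and fixtures
tests/

# Generated or minified files
*.min.js

# Also ignore everything listed in .gitignore
:include .gitignore
EOF
```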

caution

.semgrepignore is only used by Semgrep. Integrations such as GitLab's Semgrep SAST Analyzer do not use it.

For information on ignoring individual findings in code, see the Ignoring findings page.

Saving or exporting findings to a file

To save or export findings, pass an output format option and redirect the formatted findings to a file.

For example, to save to a JSON file:

semgrep ci --json > findings.json

You can also use the SARIF format:

semgrep ci --sarif > findings.sarif

Refer to the CLI reference for output formats.
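semgrep ci exits with code 1 when it reports findings. As a result, a CI step that saves findings can stop the job before a later step uploads the file. The following minimal sketch assumes python3 is available on the runner. It saves the JSON report and prints the number of entries in the report's top-level results array. It then returns Semgrep's original exit code, so you can still decide whether to fail the job. The function name is hypothetical.

```shell
# Hypothetical CI step: save findings as JSON, print a count, keep the exit code.
save_semgrep_findings() {
  local out="${1:-findings.json}"
  local status=0
  # Capture the exit code instead of letting a findings exit (1) stop the step.
  semgrep ci --json > "$out" || status=$?

  # Semgrep's JSON report lists findings in its top-level "results" array.
  python3 - "$out" <<'PY' || echo "warning: could not read results from $out" >&2
import json
import sys

with open(sys.argv[1]) as f:
    report = json.load(f)
print(f"{len(report.get('results', []))} finding(s) saved to {sys.argv[1]}")
PY

  return "$status"
}

# Usage: save_semgrep_findings findings.json || echo "Semgrep exited with $?"
```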

Migrating to Semgrep Cloud Platform from a stand-alone CI setup

Migrate to Semgrep Cloud Platform to:

  • View and manage findings in a centralized location. You can ignore false positives through triage actions, which can be applied in bulk.
  • Configure rules and the action to take when a rule generates a finding. You can choose from the following actions:
    • Audit the rule. This means that findings are kept within Semgrep's Findings page and are not surfaced to your team's SCM.
    • Show the finding to your team through the use of PR and MR comments.
    • Block the pull or merge request.

To migrate to Semgrep Cloud Platform:

  1. Create an account in Semgrep Cloud Platform.
  2. Click Projects > Scan New Project > Run scan in CI.
  3. Follow the steps in the setup to complete your migration.
  4. Optional: If you have previously set a custom SEMGREP_TIMEOUT environment variable, commit it to the CI configuration file created by Semgrep Cloud Platform. Do not copy SEMGREP_RULES.
  5. Optional: Remove the old CI job that does not use Semgrep Cloud Platform.
