
Running Semgrep in continuous integration (CI) with Semgrep Cloud Platform

Run Semgrep in your continuous integration (CI) pipeline to scan your repository for code vulnerabilities and other issues. Connect your CI pipeline with Semgrep Cloud Platform to:

  • Block pull or merge requests (PRs or MRs) based on the rule that generated the finding.
  • Scan many repositories and manage their findings in bulk.
  • Ignore false-positive findings from noisy rules.
  • Fork existing rules to create custom rules and add them to Semgrep Cloud Platform for scanning.

This guide explains how to connect your repository to Semgrep Cloud Platform (SCP) to scan continuously.

info
  • This guide's configuration and feature support are specific to SCP-connected CI jobs. Refer to Running Semgrep in CI without Semgrep Cloud Platform for stand-alone CI jobs.
  • Semgrep 0.98.0 introduced changes to how certain CI providers fetch environment variables. Refer to the appendix at the end of this document for more information.
  • Semgrep Cloud Platform creates a SAST (Static Application Security Testing) job by default. To run dependency scans exclusively, refer to Sample CI configurations.

The following video walks you through setting up Semgrep in your CI pipeline through Semgrep Cloud Platform.

Semgrep Cloud Platform feature support

Support for certain features of Semgrep Cloud Platform may depend on your CI provider, source code management tool (SCM), or both. The following table breaks down the features and their availability:

Integrations with source code providers, dependent on CI provider:

Feature | GitHub with GitHub Actions | GitLab with GitLab CI/CD | GitHub, GitLab, or Bitbucket with other CI providers
Diff-aware scanning | ✅ | ✅ | ✅ (May need additional setup)
Hyperlinks | ✅ | ✅ | ✅ (May need additional setup)
PR or MR comments | ✅ | ✅ | ✅ (May need additional setup)
SCM security dashboard | ✅ GitHub Advanced Security Dashboard | ✅ GitLab Security Dashboard | ❌ No

For example, if you use CircleCI as your CI provider on a GitHub repository, SCP does not have any support for GitHub Advanced Security Dashboard.

The following list defines the above features.

Diff-aware scanning
Semgrep Cloud Platform can scan only changes in files when running on a pull or merge request (PR or MR). This keeps the scan fast and reduces finding duplication.
Hyperlinks to code
Semgrep Cloud Platform collects findings in a Findings page. On this page, you can click a finding to return to your SCM (GitHub, GitLab, or Bitbucket) and view the lines of code in your repository that generated the finding.
Receiving results (findings) as PR or MR comments
This feature enables you to receive PR or MR comments from Semgrep Cloud Platform on the lines of code that generated a finding.
SCM security dashboard
Send Semgrep findings to your SCM's security dashboard.
note
  • Your code does not leave your environment and is not sent to Semgrep Cloud Platform servers.
  • Semgrep Cloud Platform collects findings data, which includes the line number of the code match but not the code itself. It is hashed using a one-way hashing function. Findings data is used to generate hyperlinks and support other Semgrep functions.

Setting up the CI job and Semgrep Cloud Platform connection

Figure 1. Steps to run Semgrep in CI with Semgrep Cloud Platform.

The next sections provide guidance for specific CI providers.

CI providers listed in Semgrep Cloud Platform

This section applies to the following providers:

  • GitHub Actions
  • GitLab CI/CD
  • Jenkins
  • Bitbucket Pipelines
  • CircleCI
  • Buildkite
  • Azure Pipelines

In-app providers are explicitly listed in Semgrep Cloud Platform, and Semgrep Cloud Platform can generate CI configuration files to commit in your repository.

Screenshot of Projects page CI provider modal list

note

GitHub, GitLab, and Bitbucket SCMs are compatible with the CI providers listed above, but steps and feature enablement may vary for on-premises, self-hosted, or virtual private cloud (VPC) deployments, such as GitHub Enterprise Server.

To set up the CI job and connect with Semgrep Cloud Platform:

  1. Sign in to Semgrep Cloud Platform. See Signing in to Semgrep Cloud Platform for details on requested repository permissions and access.
  2. Click Projects > Scan New Project > Run Scan in CI.
  3. Select your CI provider from the menu.
  4. Optional: If your SCM tool uses organizations, some providers ask you to select your organization.
  5. Follow the steps outlined in the page:
    1. Optional: Additional permissions may be requested for Semgrep Cloud Platform to perform certain actions in your SCM tool, such as GitHub. If you prefer not to grant these permissions, Semgrep Cloud Platform provides alternative instructions in the Don't want to install the app? section within the page itself.
    2. Click Create new API token. This is your SEMGREP_APP_TOKEN environment variable.
    3. Click Copy snippet, then paste and commit the snippet into your configuration file (the filename is indicated in the page).
    4. Click Check connection. Semgrep Cloud Platform starts the scan.
  6. After verifying that Semgrep Cloud Platform is able to scan the repository, you can customize the CI job or Semgrep Cloud Platform configuration.

Sample CI configuration snippets

Refer to the following table for links to sample CI configuration snippets:

In-app CI provider | Sample CI configuration snippet
GitHub Actions | semgrep.yml
GitLab CI/CD | .gitlab-ci.yml
Jenkins | Jenkinsfile
Bitbucket Pipelines | bitbucket-pipelines.yml
CircleCI | config.yml
Buildkite | pipelines.yml
Azure Pipelines | azure-pipelines.yml

Setting up security dashboards for GitHub and GitLab

Refer to the following sample configurations to set up security dashboards for GitHub and GitLab.

GitHub: Sample semgrep.yml configuration file
# Name of this GitHub Actions workflow.
name: Semgrep

on:
  # Scan changed files in PRs (diff-aware scanning):
  pull_request: {}

jobs:
  semgrep:
    # User definable name of this GitHub Actions job:
    name: Scan
    # If you are self-hosting, change the following `runs-on` value:
    runs-on: ubuntu-latest

    container:
      # A Docker image with Semgrep installed. Do not change this.
      image: returntocorp/semgrep

    # To skip any PR created by dependabot to avoid permission issues:
    if: (github.actor != 'dependabot[bot]')

    steps:
      - uses: actions/checkout@v3
      - run: semgrep scan --sarif --output=semgrep.sarif --config=policy
        env:
          SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
      - name: Upload SARIF file for GitHub Advanced Security Dashboard
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: semgrep.sarif
        if: always()

GitLab: Sample .gitlab-ci.yml configuration snippet
semgrep:
  # A Docker image with Semgrep installed.
  image: returntocorp/semgrep

  rules:
    # Scan changed files in MRs (diff-aware scanning):
    - if: $CI_MERGE_REQUEST_IID
    # Scan all files on the default branch and report any findings:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

  variables:
    # Add the rules that Semgrep uses by setting the SEMGREP_RULES environment variable.
    SEMGREP_RULES: p/default # See more rules at semgrep.dev/explore.
    # Uncomment SEMGREP_TIMEOUT to set this job's timeout (in seconds):
    # Default timeout is 1800 seconds (30 minutes).
    # Set to 0 to disable the timeout.
    # SEMGREP_TIMEOUT: 300
    # Upload findings to GitLab SAST Dashboard
    SEMGREP_GITLAB_JSON: "1"
  script: semgrep ci --gitlab-sast > gl-sast-report.json || true
  artifacts:
    reports:
      sast: gl-sast-report.json

Other CI providers (environment variables setup)

Other CI providers, such as Drone CI and AppVeyor, can run Semgrep continuously and connect to Semgrep Cloud Platform through the use of environment variables provided in this document. The general steps are:

  1. Create a SEMGREP_APP_TOKEN.
  2. Add this token as a credential, secret, or token into your CI provider and CI configuration file.
  3. For GitHub repositories: Grant permissions for Semgrep Cloud Platform.
  4. Create a CI job running Semgrep and commit the updated configuration file.
  5. The CI job starts automatically depending on your configuration and CI provider. If the job does not start, run the job by committing code or creating a pull request (PR) or merge request (MR).
  6. Semgrep detects the SEMGREP_APP_TOKEN, sends it to Semgrep Cloud Platform for verification, and if verified, findings are sent to Semgrep Cloud Platform.
  7. Define additional environment variables to enable other Semgrep Cloud Platform features. This is done last because it is easier to set up and troubleshoot CI jobs after ensuring that the CI job runs correctly.

The next sections go over these steps in detail.

Creating a SEMGREP_APP_TOKEN

To create a SEMGREP_APP_TOKEN, follow these steps:

  1. Sign in to Semgrep Cloud Platform.
  2. Click Settings > Tokens.
  3. Click Create new token.
  4. Copy the name and value, then click Update.
  5. Store the token value into your CI provider. Tokens can also be referred to as secrets, credentials, or secure variables. The steps to do this vary depending on your CI provider.
  6. Add the SEMGREP_APP_TOKEN environment variable into your Semgrep CI job. Refer to your CI provider's documentation for the correct syntax. You can also see the examples in Create a CI job.

Granting permissions for Semgrep Cloud Platform (GitHub repositories only)

tip

Perform these steps before committing your CI job configuration to ensure that Semgrep Cloud Platform has the necessary permissions to scan your code.

Follow these steps for GitHub permissions access:

  1. Go to the Semgrep application within GitHub Marketplace.
  2. Click on Install it for free. Follow the instructions to begin the installation.
  3. Once semgrep-app is installed, select which repositories semgrep-app can access: All repositories or Only select repositories.
     Screenshot of GitHub authorization page for Semgrep App
  4. Click Install & Authorize to finalize your installation.

Creating a CI job running Semgrep

  1. Add Semgrep to your CI pipeline. Do either of the following:
    1. Reference or add the Semgrep Docker image. This is the recommended method.
    2. Add pip install semgrep into your configuration file as a step or command, depending on your CI provider's syntax.
  2. Add semgrep ci as a step or command.
  3. Set the SEMGREP_APP_TOKEN environment variable within your configuration file.

The following example is a bitbucket-pipelines.yml file that adds Semgrep through the Docker image:

Add Semgrep through the Docker image.
image: atlassian/default-image:latest

pipelines:
  default:
    - parallel:
        - step:
            name: 'Run Semgrep scan with current branch'
            deployment: dev
            # Reference the Semgrep Docker image:
            image: returntocorp/semgrep
            script:
              # You need to set the token as an environment variable
              # (see the Creating a `SEMGREP_APP_TOKEN` section).
              - export SEMGREP_APP_TOKEN=$SEMGREP_APP_TOKEN
              # Run semgrep ci:
              - semgrep ci

The next example is a Jenkinsfile configuration that adds Semgrep by installing it:

Add Semgrep by installing it.
pipeline {
  agent any
  stages {
    stage('Semgrep-Scan') {
      environment {
        // You need to set the token as an environment variable
        // (see the Creating a `SEMGREP_APP_TOKEN` section).
        SEMGREP_APP_TOKEN = credentials('SEMGREP_APP_TOKEN')
      }
      steps {
        // Install and run Semgrep:
        sh 'pip3 install semgrep'
        sh 'semgrep ci'
      }
    }
  }
}
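
For a CI provider without a dedicated example, the same steps can be approximated with a plain shell script. The following is a minimal sketch rather than an official configuration: the helper name `run_semgrep_ci` and the `SEMGREP_CMD` override (included only to allow a dry run) are assumptions made for illustration, while `semgrep ci` and `SEMGREP_APP_TOKEN` come from the steps above.

```shell
#!/bin/sh
# Sketch of a generic CI step. `run_semgrep_ci` and `SEMGREP_CMD` are
# hypothetical names used for illustration only.

run_semgrep_ci() {
    # Fail early with a clear message when the token is missing.
    if [ -z "${SEMGREP_APP_TOKEN:-}" ]; then
        echo "SEMGREP_APP_TOKEN is not set; add it as a CI secret." >&2
        return 2
    fi
    # Make the token visible to the Semgrep process.
    export SEMGREP_APP_TOKEN
    # Run `semgrep ci` unless SEMGREP_CMD overrides it (for a dry run).
    ${SEMGREP_CMD:-semgrep ci}
}
```

In a pipeline, make Semgrep available first (through the Docker image or `pip3 install semgrep`), then call `run_semgrep_ci` as the scan step.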

Running the job

Depending on your CI provider and configuration, the job runs automatically. Otherwise, trigger the job by committing code or opening a PR or MR.

Verifying the connection between your CI job and Semgrep Cloud Platform

To verify that your Semgrep CI job is connected to Semgrep Cloud Platform:

  1. Go to your Semgrep Cloud Platform Projects page.
  2. Verify that your repository is listed on the Projects page and that Semgrep Cloud Platform is running a scan.

Refer to the following section to set up additional environment variables.

Configuring the Semgrep Cloud Platform CI job

Diff-aware scanning

info
  • Diff-aware scanning is automatically configured for GitHub Actions and GitLab CI/CD when you run Semgrep on PR or MR events. Do not set SEMGREP_BASELINE_REF for GitHub Actions or GitLab CI/CD.
  • For other CI providers, Semgrep Cloud Platform provides a full scan configuration by default. You can set up both diff-aware scanning and full scans through either of the following:
    • Create separate jobs for diff-aware scans and full scans.
    • If your CI provider supports conditional statements, use an if/then statement that detects the presence of SEMGREP_BASELINE_REF.
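
The if/then approach mentioned above could look like the following sketch. The function name `scan_mode` is hypothetical, and the function only reports which kind of scan the current environment would produce; `semgrep ci` itself reads `SEMGREP_BASELINE_REF`, so a check like this is mainly useful for logging or for branching into different job steps.

```shell
#!/bin/sh
# `scan_mode` is a hypothetical helper name used for illustration.
scan_mode() {
    if [ -n "${SEMGREP_BASELINE_REF:-}" ]; then
        # A baseline is present: Semgrep reports only new findings.
        echo "diff-aware scan against ${SEMGREP_BASELINE_REF}"
    else
        # No baseline: Semgrep scans the entire codebase.
        echo "full scan"
    fi
}
```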

Semgrep scans can be classified by scope. The scope of a scan refers to what lines of code are scanned in a codebase. When classifying scans by scope, there are two types of scans:

Full scan

A full scan runs on your entire codebase and reports every finding in the codebase. It is recommended to perform a full scan of your main branch at a regular cadence, such as every night or every week. This ensures that Semgrep Cloud Platform has a full list of all findings in your code base, regardless of when they were introduced. To run a full scan, run semgrep ci without setting the SEMGREP_BASELINE_REF environment variable.

Diff-aware scan

A diff-aware scan runs on your code before and after some "baseline" and only reports findings that are newly introduced in the commits after that baseline.

For example, imagine a hypothetical repository with 10 commits. You set commit number 8 as the baseline. Consequently, Semgrep only returns scan results introduced by changes in commits 9 and 10. This is how semgrep ci can run in pull requests and merge requests, since it reports only the findings that are created by those code changes. To run a diff-aware scan, use SEMGREP_BASELINE_REF=REF semgrep ci where REF can be a commit hash, branch name, or other Git reference.

Flow chart of Semgrep code scanning behavior based on environment variable

caution
  • Do not perform diff-aware scans on your main branch. Semgrep Cloud Platform keeps track of which findings have been fixed on a given branch. If you configure diff-aware scans on your main branch, and compare the last commit to the penultimate commit, Semgrep wrongly considers all findings from before the penultimate commit to be fixed.
  • Do not perform full scans on non-mainline or non-trunk branches. Performing full scans on every branch slows down your CI jobs, displays findings that developers did not introduce, and results in many duplicated findings in Semgrep Cloud Platform, resulting in a poorer experience.
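
One way to follow both cautions in a single job is to choose the baseline from the branch being built. This is a sketch under assumptions: `baseline_for_branch` is a hypothetical helper, and how you obtain the current branch name depends on your CI provider.

```shell
#!/bin/sh
# `baseline_for_branch` is a hypothetical helper. Pass the branch being
# built and the name of the main branch.
baseline_for_branch() {
    current_branch=$1
    main_branch=$2
    if [ "$current_branch" = "$main_branch" ]; then
        # Main branch: no baseline, so Semgrep runs a full scan.
        echo ""
    else
        # Any other branch: compare against the main branch.
        echo "$main_branch"
    fi
}

# Example use in a CI script (not executed here):
#   baseline=$(baseline_for_branch "$BRANCH" "main")
#   if [ -n "$baseline" ]; then
#       SEMGREP_BASELINE_REF=$baseline semgrep ci
#   else
#       semgrep ci
#   fi
```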

Examples of SEMGREP_BASELINE_REF

To only report findings newly added since branching off from your main branch, set the following:

SEMGREP_BASELINE_REF=main

To only report findings newly added after a specific commit, set the following:

SEMGREP_BASELINE_REF=INSERT_GIT_COMMIT_HASH

Hyperlinks to code

tip

Hyperlinks are automatically enabled for all CI providers listed in Semgrep Cloud Platform.

Hyperlinks enable you to view the code that generated the finding from within your repository.

Screenshot of findings page snippet with no hyperlinks Figure 2. Partial screenshot of findings page with no hyperlinks.

Screenshot of findings page snippet with hyperlinks Figure 3. Partial screenshot of findings page with hyperlinks.

Other CI providers may need additional environment variables in the CI configuration file to enable hyperlinks. The following example shows sample values that these environment variables accept. You can substitute these values with variables that follow your CI provider's syntax.

SEMGREP_REPO_NAME="foo/bar"
SEMGREP_REPO_URL="https://github.com/foo/bar"
SEMGREP_BRANCH="feature/add-new-bugs"
SEMGREP_JOB_URL="https://ci-server.com/jobs/1234"
SEMGREP_COMMIT="a52bc1ef"
SEMGREP_PR_ID="44"
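
When a CI provider exposes only the repository URL, SEMGREP_REPO_NAME can be derived from it. The sketch below assumes a two-level `owner/repository` URL like the sample value above; `repo_name_from_url` is a hypothetical helper, and nested paths such as GitLab subgroups would need different handling.

```shell
#!/bin/sh
# `repo_name_from_url` is a hypothetical helper used for illustration.
repo_name_from_url() {
    url=${1%.git}        # drop a trailing ".git", if present
    name=${url##*/}      # last path component: the repository
    rest=${url%/*}       # everything before the last component
    owner=${rest##*/}    # second-to-last component: the owner
    echo "${owner}/${name}"
}

SEMGREP_REPO_URL="https://github.com/foo/bar"
SEMGREP_REPO_NAME=$(repo_name_from_url "$SEMGREP_REPO_URL")
export SEMGREP_REPO_URL SEMGREP_REPO_NAME
echo "$SEMGREP_REPO_NAME"   # prints foo/bar
```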

Receiving PR or MR comments

To receive PR or MR comments in your repository, follow the steps to enable hyperlinks. Verify that comments are sent by adding rules to your Rule Board's Comment or Block columns that can match code to generate a finding. To configure PR or MR comments, review Alerts and notifications documentation.

info

Only rules in the Comment and Block columns of your Rule Board create PR or MR comments. Rules from the Block column also block the PR or MR pipeline. To unblock the pipeline, fix the detected code.

Setting a custom timeout

By default, Semgrep times out after 30 minutes. To set a custom timeout for the Semgrep job, set the SEMGREP_TIMEOUT environment variable in seconds. For example:

SEMGREP_TIMEOUT="300"
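
Because the value is in seconds, it can help to compute it from minutes so that the intent stays readable. A small sketch, where `timeout_seconds` is a hypothetical helper:

```shell
#!/bin/sh
# `timeout_seconds` is a hypothetical helper: minutes in, seconds out.
timeout_seconds() {
    echo $(( $1 * 60 ))
}

# Five minutes, matching the sample value above.
SEMGREP_TIMEOUT=$(timeout_seconds 5)
export SEMGREP_TIMEOUT
echo "$SEMGREP_TIMEOUT"   # prints 300
```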

Customizing rules through the Rule Board

Semgrep Cloud Platform's Rule Board displays all rules and rulesets that are used to scan repositories. Semgrep applies these rules based on the repository's programming language and framework as well as additional Semgrep parameters, such as ignored files.

For example, given five repositories that each use a different programming language, Semgrep scans each repository using only the Rule Board rules and rulesets that apply to that repository's language.

Semgrep's speed is not affected by having multiple rules for different languages in the Rule Board.

You may select rules and rulesets from your own rules, your organization's rules, or rules from the Registry.

Screenshot of Rule board

The Rule Board is composed of three columns:

Monitor
Rules here show findings only on Semgrep Cloud Platform.
Comment
Rules here show findings to developers through PRs or MRs.
Block
Rules here block merges and commits, in addition to showing findings in Semgrep Cloud Platform and PRs or MRs.

To add rules and rulesets to your Rule Board:

  1. Click Rule Board on the left sidebar.
  2. Click Add Rules. A right-side drawer appears.
  3. Type in a search term relevant to your codebase's framework or programming language.
  4. Drag a card from the search results to the appropriate column.
  5. Select Save changes.

For more information on operations such as filtering and deleting, as well as Rule Board management, see Rule Board.

Setting a scan schedule

The following table is a summary of methods and resources to set up schedules for different CI providers.

CI provider | Where to set schedule | Resource
GitHub Actions | Within semgrep.yml file | Sample code snippet
GitLab CI/CD | Within GitLab CI/CD interface | Official documentation
Jenkins | Within Jenkins interface | Official documentation
Bitbucket Pipelines | Within Bitbucket Pipelines interface | Official documentation
CircleCI | Within CircleCI interface | Official documentation
Buildkite | Within Buildkite interface | Official documentation
Azure Pipelines | Within Pipelines interface (recommended) | Official documentation

Ignoring files

By default, semgrep ci skips files and directories such as tests/, node_modules/, and vendor/. It uses the default .semgrepignore file, which you can find in the Semgrep GitHub repository. This default is used when no explicit .semgrepignore file is found in the root of your repository.

Optional: Copy and commit the default .semgrepignore file to the root of your repository and extend it with your own entries or write your .semgrepignore file from scratch. If Semgrep detects a .semgrepignore file within your repository, it does not append entries from the default .semgrepignore file.
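
Because a custom .semgrepignore replaces the default entries instead of extending them, a custom file needs to repeat any defaults you still want. The sketch below writes such a file. `write_semgrepignore` is a hypothetical helper, `generated/` is a made-up project-specific entry, and the three default entries are only the ones named above, not the full default file.

```shell
#!/bin/sh
# `write_semgrepignore` is a hypothetical helper. Pass the repository root.
write_semgrepignore() {
    cat > "$1/.semgrepignore" <<'EOF'
# Entries that the default file also skips:
tests/
node_modules/
vendor/

# Hypothetical project-specific entry:
generated/
EOF
}
```

Call it as `write_semgrepignore .` from the root of the repository, then commit the resulting file.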

For a complete example, see the .semgrepignore file in Semgrep’s source code.

caution

.semgrepignore is only used by Semgrep. Integrations such as GitLab's Semgrep SAST Analyzer do not use it.

For information on ignoring individual findings in code, see the Ignoring findings page.

Appendix

Compatibility of environment variables

Starting from Semgrep 0.98.0, Semgrep Cloud Platform can fetch values of environment variables for CI providers listed in Semgrep Cloud Platform. Therefore, not all CI providers need the same environment variables.

To help troubleshoot the features in this guide, ensure that you have updated your Semgrep installation.

Environment variable | Function | Affected CI providers
SEMGREP_APP_TOKEN | Establishes a connection to Semgrep Cloud Platform. | Required to enable Semgrep Cloud Platform for all CI providers.
SEMGREP_BASELINE_REF | Enables diff-aware scanning. | Required to enable diff-aware scanning for CI providers except GitHub Actions or GitLab CI/CD.
SEMGREP_TIMEOUT | Sets the Semgrep job's timeout. | Optional for all CI providers.
SEMGREP_REPO_NAME, SEMGREP_REPO_URL, SEMGREP_BRANCH, SEMGREP_JOB_URL, SEMGREP_COMMIT | Enables hyperlinks to your codebase from Semgrep Cloud Platform and the creation of PR or MR comments. | Set these environment variables as needed to troubleshoot broken links for any CI provider except GitHub Actions and GitLab CI/CD.
SEMGREP_PR_ID | Enables hyperlinks to your codebase from Semgrep Cloud Platform and the creation of PR or MR comments. | Required to enable hyperlinks and PR or MR comments for Azure Pipelines.
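
When troubleshooting broken links, it can help to check which of the hyperlink variables in the table are unset inside the CI job. The following sketch uses a hypothetical helper, `missing_hyperlink_vars`, that prints the names of the variables with no value. SEMGREP_PR_ID is normally only meaningful on PR or MR builds, so seeing it reported on other builds is expected.

```shell
#!/bin/sh
# `missing_hyperlink_vars` is a hypothetical helper used for illustration.
missing_hyperlink_vars() {
    missing=""
    for name in SEMGREP_REPO_NAME SEMGREP_REPO_URL SEMGREP_BRANCH \
                SEMGREP_JOB_URL SEMGREP_COMMIT SEMGREP_PR_ID; do
        # Look up the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    # Print the collected names without the leading space.
    echo "${missing# }"
}
```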

Examples of other CI providers not listed in Semgrep Cloud Platform

The following CI providers have been tested by the community to run with Semgrep Cloud Platform:

  • AppVeyor
  • Bamboo
  • Bitrise
  • Buildbot
  • Codeship
  • Codefresh
  • Drone CI
  • Nomad
  • TeamCity CI
  • Travis CI
