Running Semgrep in continuous integration (CI) with Semgrep Cloud Platform
Run Semgrep in your continuous integration (CI) pipeline to scan your repository for code vulnerabilities and other issues. Connect your CI pipeline with Semgrep Cloud Platform to:
- Block pull or merge requests (PRs or MRs) based on the rule that generated the finding.
- Scan many repositories and manage their findings in bulk.
- Ignore false-positive findings from noisy rules.
- Fork existing rules to create custom rules and add them to Semgrep Cloud Platform for scanning.
This guide explains how to connect your repository to Semgrep Cloud Platform (SCP) to scan continuously.
- This guide's configuration and feature support are specific to SCP-connected CI jobs. Refer to Running Semgrep in CI without Semgrep Cloud Platform for stand-alone CI jobs.
- Semgrep 0.98.0 introduced changes to how certain CI providers fetch environment variables. Refer to the appendix at the end of this document for more information.
- Semgrep Cloud Platform creates a SAST (Static Application Security Testing) job by default. To run dependency scans exclusively, refer to Sample CI configurations.
Semgrep Cloud Platform feature support
Support for certain features of Semgrep Cloud Platform may depend on your CI provider, source code management tool (SCM), or both. The following table breaks down the features and their availability:
Integrations with source code providers, dependent on CI provider:
Feature | GitHub with GitHub Actions | GitLab with GitLab CI/CD | GitHub, GitLab, or Bitbucket with other CI providers |
---|---|---|---|
Diff-aware scanning | ✅ | ✅ | ✅ (May need additional setup) |
Hyperlinks | ✅ | ✅ | ✅ (May need additional setup) |
PR or MR comments | ✅ | ✅ | ✅ (May need additional setup) |
SCM security dashboard | ✅ GitHub Advanced Security Dashboard | ✅ GitLab Security Dashboard | ❌ No |
For example, if you use CircleCI as your CI provider on a GitHub repository, SCP does not have any support for GitHub Advanced Security Dashboard.
The following list defines the above features.
- Diff-aware scanning
- Semgrep Cloud Platform can scan only changes in files when running on a pull or merge request (PR or MR). This keeps the scan fast and reduces finding duplication.
- Hyperlinks to code
- Semgrep Cloud Platform collects findings in a Findings page. On this page, you can click a finding to return to your SCM (GitHub, GitLab, or Bitbucket) and view the lines of code in your repository that generated the finding.
- Receiving results (findings) as PR or MR comments
- This feature enables you to receive PR or MR comments from Semgrep Cloud Platform on the lines of code that generated a finding.
- SCM security dashboard
- Send Semgrep findings to your SCM's security dashboard.
- Your code does not leave your environment and is not sent to Semgrep Cloud Platform servers.
- Semgrep Cloud Platform collects findings data, which includes the line number of the code match, not the code. It is hashed using a one-way hashing function. Findings data is used to generate hyperlinks and support other Semgrep functions.
Setting up the CI job and Semgrep Cloud Platform connection
Figure 1. Steps to run Semgrep in CI with Semgrep Cloud Platform.
The next sections provide guidance for specific CI providers.
CI providers listed in Semgrep Cloud Platform
This section applies to the following providers:
- GitHub Actions
- GitLab CI/CD
- Jenkins
- Bitbucket Pipelines
- CircleCI
- Buildkite
- Azure Pipelines
In-app providers are explicitly listed in Semgrep Cloud Platform, and Semgrep Cloud Platform can generate CI configuration files to commit in your repository.
GitHub, GitLab, and Bitbucket SCMs are compatible with the CI providers listed above, but steps and feature enablement may vary for on-premise, self-hosted, or virtual private cloud (VPC) deployments, such as GitHub Enterprise Server.
To set up the CI job and connect with Semgrep Cloud Platform:
- Sign in to Semgrep Cloud Platform. See Signing in to Semgrep Cloud Platform for details on requested repository permissions and access.
- Click Projects > Scan New Project > Run Scan in CI.
- Select your CI provider from the menu.
- Optional: Some providers can ask you to select your organization if applicable to your SCM tool.
- Follow the steps outlined in the page:
- Optional: Additional permissions may be requested for Semgrep Cloud Platform to perform certain actions in your SCM tool, such as GitHub. If you prefer not to grant these permissions, Semgrep Cloud Platform provides alternative instructions in the Don't want to install the app? section within the page itself.
- Click Create new API token. This is your SEMGREP_APP_TOKEN environment variable.
- Click Copy snippet, then paste and commit the snippet into your configuration file (the filename is indicated in the page).
- Click Check connection. Semgrep Cloud Platform starts the scan.
- After verifying that Semgrep Cloud Platform is able to scan the repository, you can customize the CI job or Semgrep Cloud Platform configuration.
Sample CI configuration snippets
Refer to the following table for links to sample CI configuration snippets:
In-app CI provider | Sample CI configuration snippet |
---|---|
GitHub Actions | semgrep.yml |
GitLab CI/CD | .gitlab-ci.yml |
Jenkins | Jenkinsfile |
Bitbucket Pipelines | bitbucket-pipelines.yml |
CircleCI | config.yml |
Buildkite | pipelines.yml |
Azure Pipelines | azure-pipelines.yml |
Setting up security dashboards for GitHub and GitLab
Refer to the following sample configurations to set up security dashboards for GitHub and GitLab.
GitHub: Sample semgrep.yml configuration file
# Name of this GitHub Actions workflow.
name: Semgrep
on:
  # Scan changed files in PRs (diff-aware scanning):
  pull_request: {}
jobs:
  semgrep:
    # User definable name of this GitHub Actions job:
    name: Scan
    # If you are self-hosting, change the following `runs-on` value:
    runs-on: ubuntu-latest
    container:
      # A Docker image with Semgrep installed. Do not change this.
      image: returntocorp/semgrep
    # To skip any PR created by dependabot to avoid permission issues:
    if: (github.actor != 'dependabot[bot]')
    steps:
      - uses: actions/checkout@v3
      - run: semgrep scan --sarif --config=policy > semgrep.sarif
        env:
          SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
      - name: Upload SARIF file for GitHub Advanced Security Dashboard
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: semgrep.sarif
        if: always()
GitLab: Sample .gitlab-ci.yml configuration snippet
semgrep:
  # A Docker image with Semgrep installed.
  image: returntocorp/semgrep
  rules:
    # Scan changed files in MRs (diff-aware scanning):
    - if: $CI_MERGE_REQUEST_IID
    # Scan all files on the default branch and report any findings:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  variables:
    # Add the rules that Semgrep uses by setting the SEMGREP_RULES environment variable.
    SEMGREP_RULES: p/default # See more rules at semgrep.dev/explore.
    # Uncomment SEMGREP_TIMEOUT to set this job's timeout (in seconds):
    # Default timeout is 1800 seconds (30 minutes).
    # Set to 0 to disable the timeout.
    # SEMGREP_TIMEOUT: 300
    # Upload findings to GitLab SAST Dashboard
    SEMGREP_GITLAB_JSON: "1"
  script: semgrep ci --gitlab-sast > gl-sast-report.json || true
  artifacts:
    reports:
      sast: gl-sast-report.json
Other CI providers (environment variables setup)
Other CI providers, such as Drone CI and AppVeyor, can run Semgrep continuously and connect to Semgrep Cloud Platform through the use of environment variables provided in this document. The general steps are:
- Create a SEMGREP_APP_TOKEN.
- Add this token as a credential, secret, or token into your CI provider and CI configuration file.
- For GitHub repositories: Grant permissions for Semgrep Cloud Platform.
- Create a CI job running Semgrep and commit the updated configuration file.
- The CI job starts automatically depending on your configuration and CI provider. If the job does not start, run the job by committing code or creating a pull request (PR) or merge request (MR).
- Semgrep detects the SEMGREP_APP_TOKEN, sends it to Semgrep Cloud Platform for verification, and if verified, findings are sent to Semgrep Cloud Platform.
- Define additional environment variables to enable other Semgrep Cloud Platform features. This is done last because it is easier to set up and troubleshoot CI jobs after ensuring that the CI job runs correctly.
The next sections go over these steps in detail.
Creating a SEMGREP_APP_TOKEN
To create a SEMGREP_APP_TOKEN, follow these steps:
- Sign in to Semgrep Cloud Platform.
- Click Settings > Tokens.
- Click Create new token.
- Copy the name and value, then click Update.
- Store the token value into your CI provider. Tokens can also be referred to as secrets, credentials, or secure variables. The steps to do this vary depending on your CI provider.
- Add the SEMGREP_APP_TOKEN environment variable into your Semgrep CI job. Refer to your CI provider's documentation for the correct syntax. You can also see the examples in Create a CI job.
Granting permissions for Semgrep Cloud Platform (GitHub repositories only)
Perform these steps before committing your CI job configuration to ensure that Semgrep Cloud Platform has the necessary permissions to scan your code.
Follow these steps for GitHub permissions access:
- Go to the Semgrep application within GitHub Marketplace.
- Click on Install it for free. Follow the instructions to begin the installation.
- Once semgrep-app is installed, select what repositories semgrep-app can access. Select All repositories or Only select repositories.
- Click Install & Authorize to finalize your installation.
Creating a CI job running Semgrep
- Add Semgrep to your CI pipeline. Do either of the following:
- Reference or add the Semgrep Docker image. This is the recommended method.
- Add pip install semgrep into your configuration file as a step or command, depending on your CI provider's syntax.
- Add semgrep ci as a step or command.
- Set the SEMGREP_APP_TOKEN environment variable within your configuration file.
The following example is a bitbucket-pipelines.yml file that adds Semgrep through the Docker image:
image: atlassian/default-image:latest
pipelines:
  default:
    - parallel:
        - step:
            name: 'Run Semgrep scan with current branch'
            deployment: dev
            # Reference the Semgrep Docker image:
            image: returntocorp/semgrep
            script:
              # You need to set the token as an environment variable
              # (see Create a `SEMGREP_APP_TOKEN` section).
              - export SEMGREP_APP_TOKEN=$SEMGREP_APP_TOKEN
              # Run semgrep ci:
              - semgrep ci
The next example is a Jenkinsfile configuration that adds Semgrep by installing it:
pipeline {
  agent any
  stages {
    stage('Semgrep-Scan') {
      environment {
        // You need to set the token as an environment variable
        // (see Create a `SEMGREP_APP_TOKEN` section).
        SEMGREP_APP_TOKEN = credentials('SEMGREP_APP_TOKEN')
      }
      steps {
        // Install and run Semgrep:
        sh 'pip3 install semgrep'
        sh 'semgrep ci'
      }
    }
  }
}
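For a CI provider without a dedicated example, the same steps (install Semgrep, expose the token, run semgrep ci) can be sketched as a plain shell script. This is a minimal sketch rather than an official configuration: the require_token and run_semgrep_job function names are hypothetical, and it assumes your CI provider already injects SEMGREP_APP_TOKEN into the job environment as a secret.

```shell
#!/bin/sh
# Hypothetical helper (not part of Semgrep): fail early with a clear
# message when the token is missing from the job environment.
require_token() {
  if [ -z "${SEMGREP_APP_TOKEN:-}" ]; then
    echo "SEMGREP_APP_TOKEN is not set; add it as a CI secret" >&2
    return 1
  fi
}

# Hypothetical entry point: install Semgrep with pip, then run the scan.
# Call this function from the script step of your CI job.
run_semgrep_job() {
  require_token || return 1
  pip3 install semgrep
  semgrep ci
}
```

Calling run_semgrep_job as the last line of the script stops the job before installation if the token is absent, which tends to make a misconfigured secret easier to spot than a failed connection later in the run.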
Running the job
Depending on your CI provider and configuration, the job runs automatically. Otherwise, trigger the job by committing code or opening a PR or MR.
Verifying the connection between your CI job and Semgrep Cloud Platform
To verify that your Semgrep CI job is connected to Semgrep Cloud Platform:
- Go to your Semgrep Cloud Platform Projects page.
- Verify that your repository is listed on the Projects page and that Semgrep Cloud Platform is running a scan.
Refer to the following section to set up additional environment variables.
Configuring the Semgrep Cloud Platform CI job
Diff-aware scanning
- Diff-aware scanning is automatically configured for GitHub Actions and GitLab CI/CD when the user runs Semgrep on PR or MR events. Do not set SEMGREP_BASELINE_REF for GitHub Actions or GitLab CI/CD.
- For other CI providers, Semgrep Cloud Platform provides a full scan configuration by default. You can set up both diff-aware scanning and full scans through either of the following:
- Create separate jobs for diff-aware scans and full scans.
- If your CI provider supports conditional statements, use an if/then statement that detects the presence of SEMGREP_BASELINE_REF.
Semgrep scans can be classified by scope. The scope of a scan refers to what lines of code are scanned in a codebase. When classifying scans by scope, there are two types of scans:
- Full scan
A full scan runs on your entire codebase and reports every finding in the codebase. It is recommended to perform a full scan of your main branch at a regular cadence, such as every night or every week. This ensures that Semgrep Cloud Platform has a full list of all findings in your codebase, regardless of when they were introduced. To run a full scan, run semgrep ci without setting the SEMGREP_BASELINE_REF environment variable.
- Diff-aware scan
A diff-aware scan runs on your code before and after some "baseline" and only reports findings that are newly introduced in the commits after that baseline.
For example, imagine a hypothetical repository with 10 commits. You set commit number 8 as the baseline. Consequently, Semgrep only returns scan results introduced by changes in commits 9 and 10. This is how semgrep ci can run in pull requests and merge requests, since it reports only the findings that are created by those code changes. To run a diff-aware scan, use SEMGREP_BASELINE_REF=REF semgrep ci where REF can be a commit hash, branch name, or other Git reference.
- Do not perform diff-aware scans on your main branch. Semgrep Cloud Platform keeps track of which findings have been fixed on a given branch. If you configure diff-aware scans on your main branch, and compare the last commit to the penultimate commit, Semgrep wrongly considers all findings from before the penultimate commit to be fixed.
- Do not perform full scans on non-mainline or non-trunk branches. Performing full scans on every branch slows down your CI jobs, displays findings that developers did not introduce, and creates many duplicated findings in Semgrep Cloud Platform, resulting in a poorer experience.
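The two rules above (full scans on the main branch, diff-aware scans everywhere else) can be sketched as a small shell wrapper for CI providers other than GitHub Actions and GitLab CI/CD, which configure diff-aware scanning automatically. This is an illustrative sketch under stated assumptions: the baseline_for_branch and run_scan function names are hypothetical, the current branch name is passed in by your CI job, and the main branch is assumed to be called main unless you pass another name.

```shell
#!/bin/sh
# Hypothetical helper: print the baseline ref to use for a branch.
# Prints nothing for the main branch (full scan) and the main branch
# name for every other branch (diff-aware scan).
baseline_for_branch() {
  current_branch=$1
  main_branch=${2:-main}
  if [ "$current_branch" != "$main_branch" ]; then
    printf '%s\n' "$main_branch"
  fi
}

# Hypothetical entry point: run the matching kind of scan.
# Usage: run_scan CURRENT_BRANCH [MAIN_BRANCH]
run_scan() {
  baseline=$(baseline_for_branch "$1" "${2:-main}")
  if [ -n "$baseline" ]; then
    # Diff-aware scan: report only findings introduced after the baseline.
    SEMGREP_BASELINE_REF=$baseline semgrep ci
  else
    # Full scan: make sure no baseline is inherited from the environment.
    (unset SEMGREP_BASELINE_REF; semgrep ci)
  fi
}
```

Keeping the branch decision in one function means the same script can back both a scheduled full scan on the main branch and a per-PR diff-aware scan.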
Examples of SEMGREP_BASELINE_REF
To only report findings newly added since branching off from your main branch, set the following:
SEMGREP_BASELINE_REF=main
To only report findings newly added after a specific commit, set the following:
SEMGREP_BASELINE_REF=INSERT_GIT_COMMIT_HASH
Enabling hyperlinks to code
Hyperlinks are automatically enabled for all CI providers listed in Semgrep Cloud Platform.
Hyperlinks enable you to view the code that generated the finding from within your repository.
Figure 2. Partial screenshot of findings page with no hyperlinks.
Figure 3. Partial screenshot of findings page with hyperlinks.
To enable hyperlinks for other CI providers, or to troubleshoot broken links, add the following environment variables to your CI configuration file. The following example provides sample values that the environment variables accept. You can substitute these values with variables following your CI provider's syntax.
SEMGREP_REPO_NAME="foo/bar"
SEMGREP_REPO_URL="https://github.com/foo/bar"
SEMGREP_BRANCH="feature/add-new-bugs"
SEMGREP_JOB_URL="https://ci-server.com/jobs/1234"
SEMGREP_COMMIT="a52bc1ef"
SEMGREP_PR_ID="44"
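Rather than hard-coding these values, a CI job can derive some of them at run time. The following is a minimal sketch, assuming an HTTPS repository URL of the form shown above. The repo_name_from_url and export_semgrep_link_vars function names are hypothetical, and the URL, branch, and commit are expected to come from whatever variables your CI provider exposes. SEMGREP_JOB_URL and SEMGREP_PR_ID are left out because their source is provider-specific.

```shell
#!/bin/sh
# Hypothetical helper: derive "owner/repo" from an HTTPS repository URL.
repo_name_from_url() {
  url=${1%.git}                  # drop a trailing ".git", if present
  printf '%s\n' "${url#*://*/}"  # drop the leading "scheme://host/"
}

# Hypothetical helper: export the hyperlink variables.
# Usage: export_semgrep_link_vars REPO_URL BRANCH COMMIT
export_semgrep_link_vars() {
  SEMGREP_REPO_URL=${1%.git}
  SEMGREP_REPO_NAME=$(repo_name_from_url "$1")
  SEMGREP_BRANCH=$2
  SEMGREP_COMMIT=$3
  export SEMGREP_REPO_URL SEMGREP_REPO_NAME SEMGREP_BRANCH SEMGREP_COMMIT
}
```

For example, calling export_semgrep_link_vars with https://github.com/foo/bar, feature/add-new-bugs, and a52bc1ef before semgrep ci sets the same values as the first, second, third, and fifth lines of the sample above.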
Receiving PR or MR comments
To receive PR or MR comments in your repository, follow the steps to enable hyperlinks.
Test that comments are sent by adding rules to your Policy's Comment or Block modes. These rules must match some code in your codebase to generate a finding.
To configure PR or MR comments, review Alerts and notifications documentation.
Only rules in the Comment and Block modes of your Policies create the PR or MR comments. Rules from the Block column also block the PR or MR pipeline. To unblock the pipeline, the detected code needs to be fixed.
Setting a custom timeout
By default, Semgrep times out after 30 minutes. To set a custom timeout for the Semgrep job, set the SEMGREP_TIMEOUT environment variable in seconds. For example:
SEMGREP_TIMEOUT="300"
Customizing rules through Policies
Semgrep Cloud Platform's Policies page displays all rules and rulesets that are used to scan repositories. These rules are scanned based on the repository's programming language and framework as well as additional Semgrep parameters, such as ignored files.
For example, given five repositories each with different programming languages, Semgrep only scans using rules and rulesets for that repository's language that are in the Policies page.
Semgrep's speed is not affected by having multiple rules for different languages in the Policies page.
You may select rules and rulesets from your own rules, your organization's rules, or rules from the Registry.
The Policies page uses rule modes to determine what actions to undertake when a finding is generated by the rule. Users are able to select the following rule modes:
- Monitor
- Rules set to Monitor mode show findings only on Semgrep Cloud Platform, without notifying developers.
- Comment
- Rules set to Comment mode show findings to developers through PR or MR comments.
- Block
- Rules set to Block mode prevent merges and commits, in addition to showing findings in Semgrep Cloud Platform and PRs or MRs.
To add rules and rulesets to your Policies page:
- Click Policies on the left sidebar.
- Click Add Rules. You are taken to Semgrep Registry.
- Enter a search term in the Registry search bar or browse to find rulesets and rules.
- When you have found a rule to add, click on the rule's card.
- Click Add to Policy.
- Select what rule mode to set the rule to.
For more information on operations such as filtering and deleting as well as Policies management, see Policies.
Setting a scan schedule
The following table is a summary of methods and resources to set up schedules for different CI providers.
CI provider | Where to set schedule | Resource |
---|---|---|
GitHub Actions | Within semgrep.yml file | Sample code snippet |
GitLab CI/CD | Within GitLab CI/CD interface | Official documentation |
Jenkins | Within Jenkins interface | Official documentation |
Bitbucket Pipelines | Within Bitbucket Pipelines interface | Official documentation |
CircleCI | Within CircleCI interface | Official documentation |
Buildkite | Within Buildkite interface | Official documentation |
Azure Pipelines | Within Pipelines interface (recommended) | Official documentation |
Ignoring files
By default, semgrep ci skips files and directories such as tests/, node_modules/, and vendor/. It uses the default .semgrepignore file, which you can find in the Semgrep GitHub repository. This default is used when no explicit .semgrepignore file is found in the root of your repository.
Optional: Copy and commit the default .semgrepignore file to the root of your repository and extend it with your own entries, or write your .semgrepignore file from scratch. If Semgrep detects a .semgrepignore file within your repository, it does not append entries from the default .semgrepignore file.
For a complete example, see the .semgrepignore file in Semgrep’s source code.
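As a smaller starting point than the full default file, the following sketch writes a custom .semgrepignore into a repository root. It is illustrative only: the write_semgrepignore function name and the generated/ entry are hypothetical. Because Semgrep does not append the default entries once a custom file exists, the sketch repeats the three directories named above.

```shell
#!/bin/sh
# Hypothetical helper: write a starter .semgrepignore file.
# Usage: write_semgrepignore [REPO_ROOT]   (defaults to the current directory)
write_semgrepignore() {
  repo_root=${1:-.}
  cat > "$repo_root/.semgrepignore" <<'EOF'
# Directories that the default .semgrepignore also skips:
tests/
node_modules/
vendor/
# Hypothetical project-specific entry:
generated/
EOF
}
```

Run the function once from the repository root, review the resulting file, and commit it alongside your CI configuration.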
.semgrepignore is only used by Semgrep. Integrations such as GitLab's Semgrep SAST Analyzer do not use it.
For information on ignoring individual findings in code, see the Ignoring findings page.
Appendix
Compatibility of environment variables
Starting from Semgrep 0.98.0, Semgrep Cloud Platform can fetch values of environment variables for CI providers listed in Semgrep Cloud Platform. Therefore, not all CI providers need the same environment variables.
To help troubleshoot the features in this guide, ensure that you have updated your Semgrep installation.
Environment variable | Function | Affected CI providers |
---|---|---|
SEMGREP_APP_TOKEN | Establishes a connection to Semgrep Cloud Platform. | Required to enable Semgrep Cloud Platform for all CI providers. |
SEMGREP_BASELINE_REF | Enables diff-aware scanning. | Required to enable diff-aware scanning for CI providers except GitHub Actions or GitLab CI/CD. |
SEMGREP_TIMEOUT | Sets the Semgrep job's timeout. | Optional for all CI providers. |
SEMGREP_REPO_NAME, SEMGREP_REPO_URL, SEMGREP_BRANCH, SEMGREP_JOB_URL, SEMGREP_COMMIT | Enable hyperlinks to your codebase from Semgrep Cloud Platform and the creation of PR or MR comments. | Set these environment variables as needed to troubleshoot broken links for any CI provider except GitHub Actions and GitLab CI/CD. |
SEMGREP_PR_ID | Enables hyperlinks to your codebase from Semgrep Cloud Platform and the creation of PR or MR comments. | Required to enable hyperlinks and PR or MR comments for Azure Pipelines. |
Examples of other CI providers not listed in Semgrep Cloud Platform
The following CI providers have been tested by the community to run with Semgrep Cloud Platform:
- AppVeyor
- Bamboo
- Bitrise
- Buildbot
- Codeship
- Codefresh
- Drone CI
- Nomad
- TeamCity CI
- Travis CI