
Troubleshooting Semgrep issues in CI

This document outlines troubleshooting steps for issues related to Semgrep scans in a CI environment. Refer to the following sections if you're seeing results reported on files that have not changed since the last scan, frequent timeouts, or other issues.

For issues on deployment or CI configuration, such as adding repositories, see the knowledge base articles in Semgrep in CI.

Reproducing the issue locally

To aid in debugging, you can reproduce some aspects of your Semgrep CI job locally. This enables you to inspect the logs and behavior through your terminal rather than in your CI provider's interface. Perform the following steps:

  1. Run the following command in your terminal:
    semgrep login
  2. After logging in, return to the CLI and run the following command:
    SEMGREP_REPO_NAME=your-organization/repository-name semgrep ci
    For example, if the GitHub repository is vulncorp/juice-shop, the full command is:
    SEMGREP_REPO_NAME=vulncorp/juice-shop semgrep ci

When running semgrep ci, Semgrep fetches rules and any other configurations specific to your CI environment. Setting SEMGREP_REPO_NAME is optional, but ensures that:

  • Results are sent to the same project (repository) in Semgrep AppSec Platform.
  • Any project-specific configurations, such as file ignores, are also respected.

Troubleshooting GitHub

The first piece of information that the Semgrep team uses is the GitHub Actions log.

To retrieve a log, perform the following steps:

  1. Navigate to the main page of the GitHub repository you are troubleshooting or scanning.
  2. Click the Actions tab.
  3. In the Actions page, click the Semgrep workflow run that you want to retrieve logs for. The name depends on your configuration. By default, it is named Semgrep.
    Tip: Your repository may have other workflow runs, such as linters. To view only Semgrep runs, click the name of your workflow (typically Semgrep) under Actions in the navigation bar.

  4. Click the job name, typically semgrep/ci.
  5. You are taken to the specific job page. Click the gear icon > Download log archive.

You have successfully downloaded a GitHub Actions log. You can send this as part of your ticket to Semgrep Support.
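
The workflow and job names mentioned in steps 3 and 4 come from the workflow file in your repository. The following is a minimal sketch of such a file, assuming a setup close to Semgrep's sample configuration; the file path, trigger, container image, and secret name are assumptions and may differ in your repository.

```yaml
# .github/workflows/semgrep.yml (assumed path)
name: Semgrep            # workflow name shown under Actions (step 3)
on:
  pull_request: {}
jobs:
  semgrep:
    name: semgrep/ci     # job name shown on the workflow run page (step 4)
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
      - run: semgrep ci
        env:
          # Assumes the token is stored as a repository secret with this name.
          SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
```

If your workflow or job uses different `name` values, look for those names in the Actions tab instead.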

Troubleshooting GitLab SAST

GitLab SAST includes and maintains a Semgrep integration called semgrep-sast for vulnerability finding.

Tip: Visit GitLab’s SAST troubleshooting guide for help with general GitLab SAST issues.

The semgrep-sast CI job is slow

The semgrep-sast job should take less than a minute to scan a large project with 50k lines of Python and TypeScript code. If you see worse performance, please reach out to the Semgrep maintainers for help with tracking down the cause. Long runtimes are typically caused by just one rule or source code file taking too long. You can also try these solutions:

Solution #1: Review global CI job configuration

You might be creating large files or directories in your GitLab CI config's before_script:, cache:, or similar sections. The semgrep-sast job scans all files available to it, not just the source code committed to git, so if for example you have a cache configuration of

cache:
  paths:
    - node_modules/

you should prevent those files from being scanned by disabling caching for the semgrep-sast job like this:

semgrep-sast:
  cache: {}

Solution #2: Exclude large paths

If you know which large files might be taking too long to scan, you can use GitLab SAST's path exclusion feature to skip files or directories matching given patterns.

  • SAST_EXCLUDED_PATHS: "*.py" will ignore the paths at: foo.py, src/foo.py, foo.py/bar.sh.
  • SAST_EXCLUDED_PATHS: "tests" will ignore tests/foo.py as well as a/b/tests/c/foo.py.

You can use a comma-separated list to ignore multiple patterns: SAST_EXCLUDED_PATHS: "*.py, tests" ignores all of the above paths.
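
As a sketch of where this setting goes, the variable can be declared in the `variables:` section of your `.gitlab-ci.yml`. This assumes GitLab SAST is already enabled for the project; the patterns are the illustrative ones from the examples above, not recommendations.

```yaml
variables:
  # Comma-separated patterns: skip Python files and any path containing a "tests" component.
  SAST_EXCLUDED_PATHS: "*.py, tests"
```

Setting the variable replaces any default value that GitLab applies, so check GitLab's SAST documentation if you want to keep default exclusions alongside your own.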

semgrep-sast reports false positives or false negatives

If you're not getting results where you should, or you get too many results, the problem might be with the patterns Semgrep scans for. Semgrep search patterns look just like the source code they're meant to find, so they are easy to learn and update.

You can review the search patterns in the rules directory of the semgrep-sast analyzer and report issues to the GitLab team. Refer to the Semgrep rule writing tutorial to better understand these rule files. You can also refer to the Semgrep Registry, which is a collection of 2,000+ Semgrep rules curated by Semgrep, Inc.

semgrep-sast crashes, fails, or is otherwise broken

When Semgrep crashes, it prints an error message that explains what went wrong and, often, what to do to fix it.

The output of Semgrep is hidden by default, but GitLab provides a way to see it by setting an environment variable:

variables:
  SECURE_LOG_LEVEL: "debug"
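
If you prefer not to raise the log level for every job in the pipeline, one possible variant is to set the variable on the semgrep-sast job only. This is a sketch that assumes the job is defined by the included GitLab SAST template, in the same way as the cache override shown earlier.

```yaml
semgrep-sast:
  variables:
    # Applies only to the semgrep-sast job; other jobs keep their default log level.
    SECURE_LOG_LEVEL: "debug"
```

Remove the override once you have collected the logs, since debug output can be verbose.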

How to get help

If you’re a GitLab customer and suspect there’s an issue with GitLab, please contact GitLab support and open a support ticket. Users of GitLab’s free plans should open a thread in the GitLab Community Forum.

