Troubleshooting CI scans

This document outlines troubleshooting steps for issues related to Semgrep scans in a CI environment. Refer to the following sections if you're seeing results reported on files that have not changed since the last scan, frequent timeouts, or other issues.

For issues on deployment or CI configuration, such as adding repositories, see the knowledge base articles in Semgrep in CI.

Reproducing the issue locally

To aid in debugging, you can reproduce some aspects of your Semgrep CI job locally. This enables you to inspect the logs and behavior through your terminal rather than in your CI provider's interface. Perform the following steps:

  1. Run the following command in your terminal:
    semgrep login
  2. After logging in, return to the CLI and enter the following:
    SEMGREP_REPO_NAME=your-organization/repository-name semgrep ci
    For example, given a GitHub repository vulncorp/juice-shop, the full command would be:
    SEMGREP_REPO_NAME=vulncorp/juice-shop semgrep ci

When running semgrep ci, Semgrep fetches rules and any other configurations specific to your CI environment. Setting SEMGREP_REPO_NAME is optional, but ensures that:

  • Results are sent to the same project (repository) in Semgrep AppSec Platform.
  • Any project-specific configurations, such as file ignores, are also respected.
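When reproducing a CI run locally, it can help to keep a copy of the output to attach to a support ticket. The following is a minimal sketch, assuming you have already run semgrep login. The log file name semgrep-ci.log is a hypothetical choice, and vulncorp/juice-shop is the example repository used above.

```shell
#!/bin/sh
# Example repository name from this document; replace it with your own
# organization/repository.
SEMGREP_REPO_NAME="vulncorp/juice-shop"
export SEMGREP_REPO_NAME

# Hypothetical log file name, chosen for illustration.
LOG_FILE="semgrep-ci.log"

if command -v semgrep >/dev/null 2>&1; then
    # Show the scan output in the terminal and save a copy to the log file.
    semgrep ci 2>&1 | tee "$LOG_FILE"
else
    echo "semgrep is not installed or not on PATH" >&2
fi
```

Because the output passes through tee, you can read it in the terminal as the scan runs and still have the full log on disk afterwards.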

Troubleshooting GitHub

The first piece of information that the team at Semgrep uses is the GitHub Actions log.

To retrieve a log, perform the following steps:

  1. Navigate to the main page of the GitHub repository you are troubleshooting or scanning.
  2. Click the Actions tab.
  3. In the Actions page, click the Semgrep workflow run that you want to retrieve logs for. The name depends on your configuration. By default, it is named Semgrep.
    Tip: Your repository may have other workflow runs, such as linters. To browse only Semgrep runs, click the name of your workflow (typically Semgrep) under Actions in the navigation bar.

  4. Click the job name, typically semgrep/ci.
  5. You are taken to the specific job page. Click the gear icon > Download log archive.

You have successfully downloaded a GitHub Actions log. You can send this as part of your ticket to Semgrep Support.

Troubleshooting GitLab SAST

GitLab SAST includes and maintains a Semgrep integration called semgrep-sast for vulnerability finding.

Tip: Visit GitLab’s SAST troubleshooting guide for help with general GitLab SAST issues.

The semgrep-sast CI job is slow

The semgrep-sast job should take less than a minute to scan a large project with 50k lines of Python and TypeScript code. If you see worse performance, please reach out to the Semgrep maintainers for help with tracking down the cause. Long runtimes are typically caused by just one rule or source code file taking too long. You can also try these solutions:

Review global CI job configuration

You might be creating large files or directories in your GitLab CI config's before_script:, cache:, or similar sections. The semgrep-sast job scans all files available to it, not just the source code committed to Git. For example, if you have the following cache configuration:

cache:
  paths:
    - node_modules/

you should prevent those files from being scanned by disabling caching for the semgrep-sast job:

semgrep-sast:
  cache: {}

Exclude large paths

If you know which large files might be taking too long to scan, you can use GitLab SAST's path exclusion feature to skip files or directories matching given patterns.

  • SAST_EXCLUDED_PATHS: "*.py" ignores the following paths: foo.py, src/foo.py, foo.py/bar.sh.
  • SAST_EXCLUDED_PATHS: "tests" ignores tests/foo.py as well as a/b/tests/c/foo.py.

You can use a comma-separated list to ignore multiple patterns: SAST_EXCLUDED_PATHS: "*.py, tests" ignores all of the preceding paths.

semgrep-sast reports false positives or false negatives

If you're not getting results where you should, or you get too many results, the problem might be with the patterns Semgrep scans for.

You can review the search patterns in the rules directory of the semgrep-sast analyzer and report issues to the GitLab team. The Semgrep rule writing tutorial can help you understand these rule files. You can also refer to the Semgrep Registry, a collection of 2,000+ Semgrep rules curated by Semgrep, Inc.

semgrep-sast crashes, fails, or is otherwise broken

When Semgrep crashes, it prints an error message that explains what went wrong and, often, what to do to fix it.

The output of Semgrep is hidden by default, but GitLab provides a way to see it by setting an environment variable:

variables:
  SECURE_LOG_LEVEL: "debug"

How to get GitLab assistance

If you’re a GitLab customer and suspect there’s an issue with GitLab, please contact GitLab support and open a support ticket. Users of GitLab’s free plans should open a thread in the GitLab Community Forum.

Project-specific issues

A project is any repository you have added to Semgrep AppSec Platform for scanning. Refer to the following sections for issues on the Semgrep AppSec Platform > Projects page.

If a project reports the last scan "Never started"

This status means that your CI job never authenticated to Semgrep AppSec Platform.

Check your CI provider (such as GitHub Actions) for the latest Semgrep job execution.

If you can’t find a Semgrep CI job

The issue is likely with the CI configuration.

  • Make sure that the branch you committed a CI job to is included in the list of branches the job is triggered on.
  • Make sure that the CI configuration file has valid syntax. Most providers have a tool for checking the syntax of configuration files.

If a Semgrep CI job exists

Check the log output for any hints about what the issue is.

  • If the logs mention a missing token or an authentication failure, you can get a new token from the Settings page of Semgrep AppSec Platform, and set it as SEMGREP_APP_TOKEN in your CI provider's secret management UI.
  • Alternatively, if this is the first scan after adding a new GitHub repository, and the repository is a fork, check your Actions tab to see if workflows are enabled:
    • Enable workflows by clicking "I understand my workflows, go ahead and enable them" so that Semgrep can scan.
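A missing token can also be caught before the scan starts. The following is a minimal sketch of a pre-scan check, assuming your CI provider exposes the secret to the job as the SEMGREP_APP_TOKEN environment variable. The function name check_semgrep_token is hypothetical and is not part of Semgrep.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if SEMGREP_APP_TOKEN is set and non-empty.
check_semgrep_token() {
    if [ -z "${SEMGREP_APP_TOKEN:-}" ]; then
        echo "SEMGREP_APP_TOKEN is not set; add it as a secret in your CI provider." >&2
        return 1
    fi
    # Never print the token itself; only confirm that it is present.
    echo "SEMGREP_APP_TOKEN is set."
}
```

In a job script, running check_semgrep_token && semgrep ci would stop early with a clear message instead of failing later with an authentication error. This check only confirms that a value is present; it cannot tell whether the token is still valid.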

If a project reports the last scan "Never finished"

This status means that your CI jobs start and authenticate correctly, but fail before completion.

Check your CI provider (such as GitHub Actions) for the log output of the latest Semgrep job execution. In most cases you will see an error message with detailed instructions on what to do.

If the job is aborted due to taking too long

Many CI providers have a time limit for how long a job can run. Semgrep CI also aborts itself if it runs for too long. If your CI scans regularly take too long and fail to complete:

  • Please reach out to the Semgrep team for help with tracking down the cause. Semgrep scans most large projects with hundreds of rules within a few minutes, and long run times are typically caused by just one rule or source code file taking too long.
  • To drastically cut run times, you can use Semgrep's diff-aware scanning to skip scanning unchanged files. For more details, see Semgrep's behavior.
  • You can skip scanning large and complex source code files (such as minified JS or generated code) if you know their path by adding a .semgrepignore file. See how to ignore files & directories in Semgrep CI.
  • You can increase Semgrep's own run time limit by passing the --timeout [SECONDS] flag to semgrep ci, or by setting the SEMGREP_TIMEOUT=[SECONDS] environment variable.
    • To fully disable the time limit, set this value to 0.
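Two of the options above can be sketched together as follows. This is an illustration only: the ignore patterns and the 3600-second limit are hypothetical values, and the sketch assumes that .semgrepignore sits in the repository root and accepts .gitignore-style patterns. Check how a custom .semgrepignore interacts with Semgrep's default ignore patterns before relying on it.

```shell
#!/bin/sh
# Append ignore patterns to .semgrepignore in the current directory
# (assumed to be the repository root).
# These paths are hypothetical; replace them with the files that are slow to scan.
cat >> .semgrepignore <<'EOF'
# Minified and generated code (hypothetical paths)
*.min.js
generated/
EOF

# Raise Semgrep's run time limit, in seconds (hypothetical value).
SEMGREP_TIMEOUT=3600
export SEMGREP_TIMEOUT
```

Commit the .semgrepignore file so that CI runs pick it up. In CI, set SEMGREP_TIMEOUT through your provider's environment variable settings rather than in a local shell.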

Not finding what you need in this doc? Ask questions in our Community Slack group, or see Support for other ways to get help.