Troubleshooting Semgrep in CI

If you're seeing results reported on files that have not changed since the last scan, frequent timeouts, or other issues related to running Semgrep in CI, follow the instructions in the sections below.

GitHub

The first piece of information that the team at Semgrep uses is the GitHub Actions logs. To send them to Semgrep, click the settings button next to "Search logs", then select "Download log archive".

If this does not have the information you need, save the logs that Semgrep CI produces. On each run, Semgrep CI creates a .semgrep_logs folder with the following information:

  • The debug logs.
  • The output collected from Semgrep (including the timing data described below).
  • The flat list of rules that were run, if the run used a Semgrep configuration.

To collect these logs, you need to upload them as an artifact. Modify your workflow to match the following:

semgrep:
  name: semgrep with managed policy
  runs-on: ubuntu-20.04
  env:
    SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
  container:
    image: returntocorp/semgrep
  steps:
    - uses: actions/checkout@v3
    - run: semgrep ci
    - name: package-logs
      if: always()
      run: tar czf logs.tgz ~/.semgrep/last.log
    - name: upload-logs
      if: always()
      uses: actions/upload-artifact@v3
      with:
        name: logs.tgz
        path: logs.tgz
        retention-days: 1

Retrieving Semgrep CI logs

When you run semgrep ci --config p/ci, logs are saved in ~/.semgrep/last.log.
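
If uploading an artifact is more than you need, one option is to print the end of that log directly in the job output. The step below is only a sketch for the GitHub Actions workflow shown earlier. The step name and the 200-line limit are illustrative choices, not Semgrep defaults:

```yaml
    - name: print-semgrep-log
      if: always()
      # Show the last 200 lines of the log; "|| true" keeps a missing file from failing the job
      run: tail -n 200 ~/.semgrep/last.log || true
```

Placed after the semgrep ci step, this prints the log tail even when the scan itself fails, because of if: always().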

Reproducing the run locally

To aid in debugging, you can reproduce some parts of a Semgrep CI run locally with the following steps:

  1. Go to the API token page and create a new API token.
  2. Run the following command, and then paste in your API token when prompted:
    semgrep login
  3. Run the following command:
    SEMGREP_REPO_NAME=your-organization/repository-name semgrep ci
    For example, use SEMGREP_REPO_NAME=returntocorp/semgrep semgrep ci for the GitHub repository returntocorp/semgrep. Semgrep then fetches the rules configured on all Semgrep Cloud Platform policies for this repository and runs a local Semgrep scan using those rules.

Troubleshooting GitLab SAST

GitLab SAST includes and maintains a Semgrep integration called semgrep-sast for vulnerability finding.

Tip: Please visit GitLab’s SAST troubleshooting guide for help with general GitLab SAST issues.

The semgrep-sast CI job is slow

The semgrep-sast job should take less than a minute to scan a large project with 50k lines of Python and TypeScript code. If you see worse performance, please reach out to the Semgrep maintainers for help with tracking down the cause. Long runtimes are typically caused by just one rule or source code file taking too long. You can also try these solutions:

Solution #1: Review global CI job configuration

You might be creating large files or directories in your GitLab CI config's before_script:, cache:, or similar sections. The semgrep-sast job will scan all files available to it, not just the source code committed to git, so if for example you have a cache configuration of

cache:
  paths:
    - node_modules/

you should prevent those files from being scanned by disabling caching for the semgrep-sast job like this:

semgrep-sast:
  cache: {}

Solution #2: Exclude large paths

If you know which large files might be taking too long to scan, you can use GitLab SAST's path exclusion feature to skip files or directories matching given patterns.

  • SAST_EXCLUDED_PATHS: "*.py" ignores foo.py, src/foo.py, and foo.py/bar.sh.
  • SAST_EXCLUDED_PATHS: "tests" ignores tests/foo.py as well as a/b/tests/c/foo.py.

You can use a comma-separated list to ignore multiple patterns: SAST_EXCLUDED_PATHS: "*.py, tests" ignores all of the above paths.
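
As a minimal sketch of where this variable goes, the fragment below sets the exclusion list in .gitlab-ci.yml. It assumes your pipeline enables SAST through GitLab's SAST template; the include line is an assumption about your setup, so keep whatever include you already have:

```yaml
# Assumption: SAST is enabled through GitLab's SAST template
include:
  - template: Security/SAST.gitlab-ci.yml

variables:
  # Skip Python files and any directory named "tests"
  SAST_EXCLUDED_PATHS: "*.py, tests"
```

Setting the variable at the top level applies it to every SAST job in the pipeline. To limit it to Semgrep, the same variables: block can instead go under the semgrep-sast job.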

Solution #3: Upgrade to Semgrep CI

To improve performance by 10x on a typical project, you can use our own CI agent, Semgrep CI, directly by adding the job definition shown on the GitLab + Semgrep page.

Semgrep CI skips scanning unchanged files in merge requests but still lets you keep your GitLab SAST workflow.

semgrep-sast reports false positives or false negatives

If you're not getting results where you should, or you get too many results, the problem might be with the patterns Semgrep scans for. Semgrep search patterns look just like the source code they're meant to find, so they are easy to learn and update.

You can review the search patterns in the rules directory of the semgrep-sast analyzer and report issues to the GitLab team. The Semgrep rule writing tutorial can help you better understand these rule files. You can also refer to the Semgrep Registry, a collection of 2,000+ Semgrep rules curated by Semgrep, Inc.

semgrep-sast crashes, fails, or is otherwise broken

When Semgrep crashes, it prints an error message that explains what went wrong and, often, what to do to fix it.

The output of Semgrep is hidden by default, but GitLab provides a way to see it by setting an environment variable:

variables:
  SECURE_LOG_LEVEL: "debug"

How to get help

If you’re a GitLab customer and suspect there’s an issue with GitLab, please contact GitLab support and open a support ticket. Users of GitLab’s free plans should open a thread in the GitLab Community Forum.

If you suspect the issue is with Semgrep, please check the Semgrep Support page to get help from the Semgrep maintainers & community via Slack, email, or phone.

