Troubleshooting Semgrep in CI
If you're seeing results reported on files that have not changed since the last scan, frequent timeouts, or other issues related to running Semgrep in CI, see the instructions in the sections below.
GitHub
The first piece of information the Semgrep team uses is the GitHub Actions log. To share it with Semgrep, download it by clicking the settings (gear) icon next to "Search logs" and then clicking "Download log archive".
If the GitHub Actions log doesn't contain the information you need, save the logs that Semgrep CI produces. On each run, Semgrep CI creates a `.semgrep_logs` folder with the following information:
- The debug logs.
- The output collected from Semgrep, including timing data.
- The flat list of rules that were run, if the run used a Semgrep configuration.
To collect these logs, you need to upload them as an artifact. Modify your workflow to match the following:
```yaml
# Job definition: place it under the `jobs:` key of your workflow file.
semgrep:
  name: semgrep with managed policy
  runs-on: ubuntu-20.04
  env:
    SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
  container:
    image: returntocorp/semgrep
  steps:
    - uses: actions/checkout@v3
    - run: semgrep ci
    # Archive the Semgrep log even if the scan step fails.
    - name: package-logs
      if: always()
      run: tar czf logs.tgz ~/.semgrep/last.log
    # Upload the archive as a short-lived workflow artifact.
    - name: upload-logs
      if: always()
      uses: actions/upload-artifact@v3
      with:
        name: logs.tgz
        path: logs.tgz
        retention-days: 1
```
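When you download the artifact from the run's summary page, GitHub delivers it as a zip file that contains `logs.tgz`. The `package-logs` step archives `~/.semgrep/last.log`, which expands to an absolute path inside the CI container, so the path stored in the archive can vary. The following minimal sketch uses `tar`'s `-O` option to print the archived file straight to standard output, so you don't need to know that path. `show_semgrep_log` is a hypothetical helper name, not part of Semgrep:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the Semgrep log packaged by the
# "package-logs" step. -O writes extracted files to stdout, so this
# does not depend on which absolute path ~/.semgrep/last.log expanded
# to inside the CI container.
show_semgrep_log() {
  local archive="${1:-logs.tgz}"
  tar -xzOf "$archive"
}
```

For example, after unzipping the downloaded artifact and sourcing the function, `show_semgrep_log logs.tgz | less` pages through the log.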
Retrieving Semgrep CI logs
When you run `semgrep ci --config p/ci`, logs are saved in `~/.semgrep/last.log`.
Reproducing the run locally
You can reproduce some parts of Semgrep CI locally to aid debugging:
- Go to the API token page and create a new API token.
- Run the following command, and then paste in your API token when prompted:
  ```bash
  semgrep login
  ```
- Run the following command, replacing the value with your repository's name:
  ```bash
  SEMGREP_REPO_NAME=your-organization/repository-name semgrep ci
  ```
  For example, for the GitHub repository `returntocorp/semgrep`, you would run:
  ```bash
  SEMGREP_REPO_NAME=returntocorp/semgrep semgrep ci
  ```
As a result, Semgrep fetches the rules configured in all Semgrep Cloud Platform policies for this repository and runs a local Semgrep scan using those rules.
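You can derive the repository name from the clone's Git remote instead of typing it by hand. The following sketch assumes a GitHub-style remote URL over HTTPS or SSH. `repo_name_from_url` is a hypothetical helper, not part of Semgrep:

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a Git remote URL into the
# "organization/repository" form used by SEMGREP_REPO_NAME.
# Handles forms such as:
#   https://github.com/returntocorp/semgrep.git
#   git@github.com:returntocorp/semgrep.git
#   ssh://git@github.com/returntocorp/semgrep.git
# URLs with an explicit port (host:22/...) are not handled.
repo_name_from_url() {
  local url="$1"
  url="${url%.git}"    # drop a trailing ".git"
  url="${url#*://}"    # drop the scheme, if any (https://, ssh://)
  url="${url#*@}"      # drop a user or credentials prefix, if any (git@)
  url="${url#*[:/]}"   # drop the host and the ":" or "/" after it
  printf '%s\n' "$url"
}
```

Assuming the remote is named `origin`, you could then run `SEMGREP_REPO_NAME="$(repo_name_from_url "$(git remote get-url origin)")" semgrep ci` from inside the clone.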
Troubleshooting GitLab SAST
GitLab SAST includes and maintains a Semgrep integration called `semgrep-sast` for finding vulnerabilities.
Please visit GitLab’s SAST troubleshooting guide for help with general GitLab SAST issues.
The `semgrep-sast` CI job is slow
The `semgrep-sast` job should take less than a minute to scan a large project with 50k lines of Python and TypeScript code. If you see worse performance, reach out to the Semgrep maintainers for help tracking down the cause. Long runtimes are typically caused by a single rule or source file that takes too long to scan. You can also try the following solutions:
Solution #1: Review global CI job configuration
You might be creating large files or directories in the `before_script:`, `cache:`, or similar sections of your GitLab CI config. The `semgrep-sast` job scans all files available to it, not just the source code committed to git. For example, if you have a cache configuration of

```yaml
cache:
  paths:
    - node_modules/
```

you should prevent those files from being scanned by disabling caching for the `semgrep-sast` job like this:

```yaml
semgrep-sast:
  cache: {}
```
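To see which files the job can reach that are not tracked by git, you could list untracked files in the job's working directory, for example from the `script:` of a temporary debug job. The following sketch groups the output of `git ls-files --others` by top-level directory. It assumes the job runs inside a Git checkout. `untracked_by_top_dir` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: count untracked files per top-level directory,
# largest first. No --exclude-standard is passed, so git-ignored files
# (such as a restored node_modules/ cache) are included in the count.
untracked_by_top_dir() {
  git ls-files --others \
    | cut -d/ -f1 \
    | sort \
    | uniq -c \
    | sort -rn
}
```

Directories near the top of this list are good candidates for the `cache: {}` override shown above or for path exclusion.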
Solution #2: Exclude large paths
If you know which large files might be taking too long to scan, you can use GitLab SAST's path exclusion feature to skip files or directories matching given patterns.
- `SAST_EXCLUDED_PATHS: "*.py"` ignores the paths `foo.py`, `src/foo.py`, and `foo.py/bar.sh`.
- `SAST_EXCLUDED_PATHS: "tests"` ignores `tests/foo.py` as well as `a/b/tests/c/foo.py`.

You can use a comma-separated list to ignore multiple patterns: `SAST_EXCLUDED_PATHS: "*.py, tests"` would ignore all of the above paths.
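The examples above suggest that a pattern excludes a path when it matches either the whole path or any single directory or file name within it. The following bash function approximates that behavior. It is inferred from those examples only and is not GitLab's or Semgrep's matching code, so confirm edge cases in a real pipeline. `is_excluded` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Approximation only, inferred from the documented examples: a path is
# treated as excluded if any comma-separated pattern matches either the
# whole path or one of its "/"-separated components.
is_excluded() {
  local path="$1" patterns="$2" pattern component
  local -a pats comps
  IFS=',' read -r -a pats <<< "$patterns"
  IFS='/' read -r -a comps <<< "$path"
  for pattern in "${pats[@]}"; do
    # Trim surrounding whitespace so "*.py, tests" works.
    pattern="${pattern#"${pattern%%[![:space:]]*}"}"
    pattern="${pattern%"${pattern##*[![:space:]]}"}"
    [[ -z "$pattern" ]] && continue
    # Unquoted right-hand side means glob matching; inside [[ ]],
    # "*" also matches "/".
    [[ "$path" == $pattern ]] && return 0
    for component in "${comps[@]}"; do
      [[ "$component" == $pattern ]] && return 0
    done
  done
  return 1
}
```

For example, `is_excluded a/b/tests/c/foo.py "*.py, tests"` returns success (exit status 0), while `is_excluded src/main.go "*.py, tests"` returns failure.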
Solution #3: Upgrade to Semgrep CI
To improve performance by 10x on a typical project, you can run Semgrep's own CI agent, Semgrep CI, directly by adding the job definition shown on the GitLab + Semgrep page.
Semgrep CI skips scanning unchanged files in merge requests while still letting you keep your GitLab SAST workflow.
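The following sketch makes "unchanged files" concrete. It lists the files added, copied, modified, or renamed on the current branch since it diverged from a base commit. It illustrates the idea only and is not how Semgrep CI is implemented. `changed_files` is a hypothetical helper. In a GitLab merge request pipeline, you might pass the predefined `CI_MERGE_REQUEST_DIFF_BASE_SHA` variable as the base, provided the clone is deep enough to contain that commit:

```bash
#!/usr/bin/env bash
# Hypothetical helper: list files added (A), copied (C), modified (M),
# or renamed (R) on HEAD since it diverged from the given base commit.
# The three-dot form diffs against the merge base of the two commits.
# Deleted files are skipped because there is nothing left to scan.
changed_files() {
  local base="$1"
  git diff --name-only --diff-filter=ACMR "$base"...HEAD
}
```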
`semgrep-sast` reports false positives or false negatives
If you're not getting results where you should, or you get too many results, the problem might be with the patterns Semgrep scans for. Semgrep search patterns look just like the source code they're meant to find, so they are easy to learn and update.
You can review the search patterns in the rules directory of the `semgrep-sast` analyzer and report issues to the GitLab team. Our Semgrep rule writing tutorial can help you understand these rule files. You can also refer to the Semgrep Registry, a collection of 2,000+ Semgrep rules curated by Semgrep, Inc.
`semgrep-sast` crashes, fails, or is otherwise broken
When Semgrep crashes, it prints an error message that explains what went wrong and often what to do to fix it.
The output of Semgrep is hidden by default, but GitLab provides a way to see it by setting an environment variable:
```yaml
variables:
  SECURE_LOG_LEVEL: "debug"
```
Help us to guide Semgrep development
Semgrep is made by a small team, and you can directly guide our work by answering a single question on the form page.
How to get help
If you’re a GitLab customer and suspect there’s an issue with GitLab, please contact GitLab support and open a support ticket. Users of GitLab’s free plans should open a thread in the GitLab Community Forum.
If you suspect the issue is with Semgrep, please check the Semgrep Support page to get help from the Semgrep maintainers & community via Slack, email, or phone.