Semgrepignore v2 reference

This document covers the Semgrepignore v2 target filtering system, which is currently available with the --experimental option of the semgrep command. It differs from the legacy implementation, referred to as "v1".

The target filtering process

A semgrep scan command takes one or more scan roots as arguments. The default scan root is the current folder (.). Scan roots are folders, individual files, or named pipes that are expanded into a list of regular files to be analyzed. Symbolic links are allowed as scan roots.

Expanding a folder consists of listing its contents recursively with the following exceptions:

Symbolic links other than the original scan roots are ignored.
In Git projects, Git submodules are ignored.
Paths excluded via Semgrepignore patterns are ignored. Semgrepignore patterns come from several sources, which are detailed in the section "Sources of Semgrepignore patterns".

The files obtained by expanding the scan roots are called target files. To obtain target files, Semgrep applies a number of fixed rules and some configurable filters.
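The fixed part of the expansion can be sketched as follows. This is a minimal illustration rather than Semgrep's actual implementation: the function name expand_scan_root is hypothetical, and named pipes, Git submodules, and Semgrepignore patterns are deliberately left out.

```python
import os


def expand_scan_root(scan_root):
    """Return the regular files under scan_root, skipping nested symlinks.

    Hypothetical helper: it only models the symlink rule described above.
    """
    # A scan root may itself be a symbolic link, so resolve it first.
    root = os.path.realpath(scan_root)
    if os.path.isfile(root):
        return [root]
    targets = []
    # os.walk does not descend into symlinked folders by default.
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            # Skip symlinks to files and anything that is not a regular file.
            if not os.path.islink(path) and os.path.isfile(path):
                targets.append(path)
    return sorted(targets)
```

Calling expand_scan_root(".") returns absolute paths, which makes it easy to compare the result against the output of the listing options described below.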

For each scan root, Semgrep infers a project root (v2 only). The project root determines the location of applicable .semgrepignore files as well as .gitignore files in Git projects. In v1, where there is no notion of a project root, the .semgrepignore file is unique and looked up in the current folder.

Semgrep determines the project root for each scan root by first obtaining the real path (physical path) to the scan root. Then, Semgrep searches up the file hierarchy for a .git folder or similar used by one of the popular file version control systems (Git, Mercurial, etc.) indicating a project root. If no project root is found this way, it defaults to the scan root itself if it is a folder or to its containing folder if it is a regular file.
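The lookup described above can be sketched as follows. This is an illustration under stated assumptions, not Semgrep's code: the function name find_project_root is hypothetical, and the marker list VCS_MARKERS is an assumed stand-in for "a .git folder or similar".

```python
import os

# Assumption: these marker names are illustrative only. The text above
# mentions Git and Mercurial without listing the exact names Semgrep checks.
VCS_MARKERS = (".git", ".hg")


def find_project_root(scan_root):
    """Infer a project root for scan_root (hypothetical helper)."""
    # Step 1: obtain the real (physical) path of the scan root.
    real = os.path.realpath(scan_root)
    # Fallback: the scan root itself if it is a folder,
    # otherwise its containing folder.
    start = real if os.path.isdir(real) else os.path.dirname(real)
    # Step 2: search up the file hierarchy for a version control marker.
    folder = start
    while True:
        if any(os.path.exists(os.path.join(folder, m)) for m in VCS_MARKERS):
            return folder
        parent = os.path.dirname(folder)
        if parent == folder:
            # Reached the filesystem root without finding a marker.
            return start
        folder = parent
```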

caution

As an experimental debugging aid, Semgrep provides the --x-ls option to list the target files. The --x-ls-long option additionally prints excluded files and a brief justification for each exclusion. Beware that these two options are likely to be renamed or to change their behavior in the future. Meanwhile, their typical usage is:

semgrep --x-ls

semgrep --x-ls --experimental

Sources of Semgrepignore patterns

A Semgrepignore pattern is a glob pattern that is matched by Semgrep against file paths to determine whether these paths should be allowed or disallowed as target files.

Semgrep looks up Semgrepignore patterns in the following places:

command-line --exclude and --include filters;
the .semgrepignore file in the current folder (v1 only);
all the .semgrepignore files in the project (v2 only);
all the .gitignore files in the project in Git projects (v2 only);
default Semgrepignore patterns.

These sources of filters are grouped into precedence levels. Within a precedence level, a path can be deselected and reselected any number of times. After applying all the filters within a precedence level, only the selected paths make it to the next level. There are two precedence levels:

command-line --exclude and --include filters;
default Semgrepignore patterns, .gitignore files, .semgrepignore files.

For example, consider this .semgrepignore file:

*.c
!hello.c

In the absence of --exclude or --include filters, hello.c will be first deselected by *.c and then reselected by the negated pattern !hello.c.

However, if we move the *.c exclusion pattern to the command line by invoking semgrep --exclude '*.c' (quoted so that the shell does not expand the glob), the file hello.c is deselected and ignored even if the .semgrepignore file contains !hello.c.
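The two outcomes above can be reproduced with a small model of precedence levels. This is a simplified sketch, not Semgrep's matcher: the function names are hypothetical, --include filters are omitted, and Python's fnmatch is used as an assumed stand-in for real Gitignore matching.

```python
from fnmatch import fnmatch


def apply_level(path, patterns):
    """Return True if path is still selected after one precedence level."""
    selected = True
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatch(path, glob):
            # Within a level, a later matching pattern overrides earlier ones:
            # a plain pattern deselects, a negated pattern reselects.
            selected = negated
    return selected


def is_target(path, cli_excludes, ignore_file_patterns):
    """A path must survive every precedence level to become a target."""
    # Paths deselected at one level cannot be reselected at another level.
    return apply_level(path, cli_excludes) and apply_level(path, ignore_file_patterns)


# Both patterns in .semgrepignore: hello.c is deselected, then reselected.
print(is_target("hello.c", [], ["*.c", "!hello.c"]))
# *.c moved to the command line: !hello.c can no longer rescue hello.c.
print(is_target("hello.c", ["*.c"], ["!hello.c"]))
```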

In a Git project under Semgrepignore v2, .gitignore and .semgrepignore files are consulted in the same order as in the Gitignore specification. In a folder containing both a .gitignore and a .semgrepignore file, the .gitignore file is read before the .semgrepignore file.

Default Semgrepignore patterns apply in projects that lack a main .semgrepignore file. In v1, the main .semgrepignore file is expected in the current folder. In v2, it is expected at the project root. These default patterns are:

# Common large paths
node_modules/
build/
dist/
vendor/
.env/
.venv/
.tox/
*.min.js
.npm/
.yarn/

# Common test paths
test/
tests/
testsuite/
*_test.go

# Semgrep rules folder
.semgrep

# Semgrep-action log folder
.semgrep_logs/

Semgrepignore pattern syntax

In Semgrepignore v2, the pattern syntax conforms to the Gitignore pattern syntax. Patterns are glob patterns that support * and ** with their usual meanings. For example, the pattern **/tmp/*.js matches the paths tmp/foo.js and src/tmp/bar.js. Note that the Gitignore specification contains subtleties associated with determining whether a pattern is anchored (relative to the folder containing the pattern) or floating (relative to the folder containing the pattern or any of its subfolders). For example, /a and a/b are anchored patterns, but a/ is not. Please consult the Gitignore documentation for details.
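The anchoring rule from the Gitignore documentation (a pattern is anchored when it has a slash at its beginning or in its middle, and a trailing slash does not count) can be sketched as follows. The function name is_anchored is hypothetical, and negation and escaping are not handled in this simplified version.

```python
def is_anchored(pattern):
    """Return True if a Gitignore-style pattern is anchored (simplified)."""
    # A trailing slash only restricts the pattern to folders;
    # it does not anchor the pattern.
    body = pattern[:-1] if pattern.endswith("/") else pattern
    # A slash at the beginning or in the middle anchors the pattern
    # to the folder containing the ignore file.
    return "/" in body


for example in ("/a", "a/b", "a/", "*.js"):
    print(example, is_anchored(example))
```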

As a deviation from the Gitignore syntax, Semgrepignore syntax supports :include directives. A line consisting of :include followed by an unquoted file path inserts the patterns from that file. The path is relative to the folder containing the source .semgrepignore file (the current folder in v1). A common use case is to insert the line :include .gitignore at the beginning of a .semgrepignore file so as to avoid duplicating the Gitignore patterns. Included files may not contain :include directives.
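The expansion of :include directives can be sketched as follows. This is an illustration only: the function name load_patterns is hypothetical, and raising an error for a nested directive is an assumption, since the text above does not say how Semgrep reacts in that case.

```python
import os

INCLUDE_PREFIX = ":include "


def load_patterns(ignore_file, _included=False):
    """Read patterns from ignore_file, expanding :include lines in place."""
    folder = os.path.dirname(os.path.abspath(ignore_file))
    patterns = []
    with open(ignore_file, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.rstrip("\n")
            if line.startswith(INCLUDE_PREFIX):
                if _included:
                    # Assumption: nested directives are reported as an error.
                    raise ValueError("included files may not contain :include")
                # The path is relative to the folder of the including file.
                target = os.path.join(folder, line[len(INCLUDE_PREFIX):].strip())
                patterns.extend(load_patterns(target, _included=True))
            elif line.strip() and not line.startswith("#"):
                # Blank lines and comments carry no pattern.
                patterns.append(line)
    return patterns
```

With a .semgrepignore file that starts with :include .gitignore, the returned list contains the Gitignore patterns first, followed by the patterns written in the .semgrepignore file itself.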

Legacy Semgrepignore v1

In Semgrepignore v1, the following exceptions to the v2 specification apply:

unsupported: pattern negation with !
unsupported: character ranges such as [a-z]
only one .semgrepignore file is supported and it must be in the current folder
