Semgrepignore v2 reference
This document covers the Semgrepignore v2 target filtering system that is
currently available with the --experimental
option of the semgrep
command. It differs from the legacy v1 implementation.
referred to as "v1".
The target filtering process
A semgrep scan
command takes one or more scan roots as
arguments. The default scan root is the current folder, .
.
Scan roots are folders, individual files, or named pipes that should be
expanded into a list of regular files to be analyzed. Symbolic links are
allowed as scan roots.
Expanding a folder consists of listing its contents recursively with the following exceptions:
- Symbolic links other than the original scan roots are ignored.
- In Git projects, Git submodules are ignored.
- Paths excluded via Semgrepignore patterns are ignored. Semgrepignore patterns can be of different sources which are detailed in the upcoming section.
The list of files obtained by expanding the scan roots are called target files. To obtain target files, Semgrep follows a number of fixed rules and some configurable filters.
For each scan root, Semgrep infers a project root (v2 only). The
project root determines the location of applicable .semgrepignore
files as well as .gitignore
files in Git projects. In v1 where is no
notion of a project root, the .semgrepignore
file is unique and
looked up in the current folder.
Semgrep determines the project root for each scan root by first
obtaining the real path (physical path) to the scan root. Then,
Semgrep searches up the file hierarchy for a .git
folder or
similar used by one of the popular file version control systems
(Git, Mercurial, etc.) indicating a project root.
If no project root is found this way, it
defaults to the scan root itself if it is a folder or to its containing
folder if it is a regular file.
As an experimental debugging aid, Semgrep provides the --x-ls
option
to list the target files. --x-ls-long
additionally prints excluded
files and a brief justification. Beware that these two options are
likely to be renamed or change their behavior in the
future. Meanwhile, its typical usage is:
semgrep --x-ls
or
semgrep --x-ls --experimental
Sources of Semgrepignore patterns
A Semgrepignore pattern is a glob pattern that is matched by Semgrep against file paths to determine whether these paths should be allowed or disallowed as target files.
Semgrep looks up Semgrepignore patterns in the following places:
- command-line
--exclude
and--include
filters; - the
.semgrepignore
file in the current folder (v1 only); - all the
.semgrepignore
files in the project (v2 only); - all the
.gitignore
files in the project in Git projects (v2 only); - default Semgrepignore patterns.
These sources of filters are grouped into precedence levels. Within a precedence level, a path can be deselected and reselected any number of times. After applying all the filters within a precedence level, only the selected paths make it to the next level. There are two precedence levels:
- command-line
--exclude
and--include
filters; - default Semgrepignore patterns,
.gitignore
files,.semgrepignore
files.
For example, consider this .semgrepignore
file:
*.c
!hello.c
In the absence of --exclude
or --include
filters,
hello.c
will be first deselected by *.c
and then
reselected by the negated pattern !hello.c
.
However, if we move the *.c
exclusion pattern to the command line by
invoking semgrep --exclude *.c
,
the file hello.c
is deselected and ignored even if
the .semgrepignore
file contains !hello.c
.
In a Git project under Semgrepignore v2, .gitignore
and
.semgrepignore
files are consulted in the same order as in the
Gitignore specification. In a folder containing both a .gitignore
and a .semgrepignore
file, the .gitignore
file is read before the
.semgrepignore
file.
Default Semgrepignore patterns apply in projects that lack a main
.semgrepignore
file. In v1, the main .semgrepignore
file is
expected in the current folder. In v2, it is expected at the project
root. These default patterns are:
# Common large paths
node_modules/
build/
dist/
vendor/
.env/
.venv/
.tox/
*.min.js
.npm/
.yarn/
# Common test paths
test/
tests/
testsuite/
*_test.go
# Semgrep rules folder
.semgrep
# Semgrep-action log folder
.semgrep_logs/
Semgrepignore pattern syntax
In Semgrepignore v2, the pattern syntax conforms to the
Gitignore pattern
syntax.
They are glob patterns which support *
and **
with their usual
meanings. For example, pattern **/tmp/*.js
matches paths tmp/foo.js
and
src/tmp/bar.js
.
Note that the Gitignore specification contains subtleties associated
with determining whether a pattern is anchored (relative to the folder
containing the pattern) or floating (relative to the folder containing
the pattern or any of its subfolders). For
example, /a
and a/b
are anchored patterns but not a/
. Please
consult the Gitignore documentation for details.
As a deviation from the Gitignore syntax, Semgrepignore syntax supports
:include
directives. :include
followed by an unquoted file path
relative to the path of folder of the source .semgrepignore
file
(the current folder in v1) inserts patterns from that file.
A common use case is to insert the line :include .gitignore
at the
beginning of a .semgrepignore
file so as to avoid duplicating the
Gitignore patterns. Included files may not contain include
directives.
Legacy Semgrepignore v1
In Semgrepignore v1, the following exceptions to the v2 specification apply:
- unsupported: pattern negation with
!
- unsupported: character ranges such as
[a-z]
- only one
.semgrepignore
file is supported and it must be in the current folder
Not finding what you need in this doc? Ask questions in our Community Slack group, or see Support for other ways to get help.