Getting started with rule writing? Try the Semgrep Tutorial 🎓
This document describes Semgrep’s YAML rule syntax.
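All required fields must be present at the top level of a rule, immediately underneath the `rules` key.
|Field|Description|
|---|---|
|`id`|Unique, descriptive identifier|
|`message`|Message highlighting why this rule fired and how to remediate the issue|
|`severity`|Severity level of the finding, such as `ERROR`|
|`languages`|See supported languages|
|`pattern`|Find code matching this expression|
|`patterns`|Logical AND of multiple patterns|
|`pattern-either`|Logical OR of multiple patterns|
|`pattern-regex`|Search files for Python `re` compatible expressions|
Only one of `pattern`, `patterns`, `pattern-either`, or `pattern-regex` is required.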
The following top-level fields are optional:
|Field|Description|
|---|---|
|`options`|Options object to enable/disable certain matching features|
|`fix`|Simple search-and-replace autofix functionality|
|`metadata`|Arbitrary user-provided data; attach data to rules without affecting Semgrep’s behavior|
|`paths`|Paths to include or exclude when running this rule|
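The optional fields below must reside underneath a `patterns` or `pattern-either` field.
|Field|Description|
|---|---|
|`metavariable-regex`|Search metavariables for Python `re` compatible expressions|
|`metavariable-pattern`|Match metavariables with a pattern formula|
|`metavariable-comparison`|Compare metavariables against basic Python expressions|
|`pattern-not`|Logical NOT: remove findings matching this expression|
|`pattern-inside`|Keep findings that lie inside this pattern|
|`pattern-not-inside`|Keep findings that do not lie inside this pattern|
|`pattern-not-regex`|Filter results using Python `re` compatible expressions|
|`pattern-where-python`|Remove findings matching this Python expression|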
The `pattern` operator looks for code matching its expression. This can be a basic expression like `$X == $X` or an unwanted function call.
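As an illustration, a minimal rule using only the `pattern` operator might look like the sketch below. The rule id and message are hypothetical; the pattern itself is the `$X == $X` expression mentioned above.

```yaml
rules:
  - id: self-comparison        # hypothetical identifier
    pattern: $X == $X          # $X must bind to the same expression on both sides
    message: "`$X` is compared with itself, which is likely a mistake"
    languages: [python]
    severity: ERROR
```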
The `patterns` operator performs a logical AND operation on one or more child patterns. This is useful for chaining multiple patterns together that all must be true.
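A minimal sketch of a `patterns` rule, assuming a hypothetical rule id and message: a finding is reported only when every child holds, so a self-comparison is reported unless it is the literal `1 == 1`.

```yaml
rules:
  - id: self-comparison-except-literal   # hypothetical identifier
    patterns:
      - pattern: $X == $X                # must match
      - pattern-not: 1 == 1              # and must not be this literal comparison
    message: "`$X` is compared with itself"
    languages: [python]
    severity: ERROR
```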
The `pattern-either` operator performs a logical OR operation on one or more child patterns. This is useful for chaining multiple patterns together where any may be true.
A typical rule of this kind looks for usage of Python standard library hashing functions such as `hashlib.sha1`. Depending on their usage, these hashing functions are considered insecure.
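A sketch of such a rule is shown below. Only `hashlib.sha1` is named in the text; `hashlib.md5`, the rule id, and the message are assumptions added for illustration.

```yaml
rules:
  - id: insecure-hash-function        # hypothetical identifier
    pattern-either:
      - pattern: hashlib.md5(...)     # assumed second function, not named in the text
      - pattern: hashlib.sha1(...)
    message: "Insecure hashing function in use"
    languages: [python]
    severity: ERROR
```

Either child pattern is enough to produce a finding.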
The `pattern-regex` operator searches files for a Python `re` compatible expression. This is useful for migrating existing regular expression code search functionality to Semgrep.
The `pattern-regex` operator can be combined with other pattern operators.
It can also be used as a standalone, top-level operator.
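Both usages can be sketched as follows. The rule ids, messages, and the specific regular expressions are illustrative assumptions rather than rules taken from this document.

```yaml
rules:
  # Combined with another operator: the regex must match inside the call.
  - id: client-with-hardcoded-ip      # hypothetical identifier
    patterns:
      - pattern-inside: boto3.client(host="...")
      - pattern-regex: '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
    message: "Client created with a hardcoded IP address"
    languages: [python]
    severity: ERROR

  # Standalone, top-level usage: a plain regex search over the file.
  - id: eval-call-by-regex            # hypothetical identifier
    pattern-regex: 'eval\('
    message: "Call to eval() found by regular expression"
    languages: [python]
    severity: ERROR
```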
Single (`'`) and double (`"`) quotes behave differently in YAML syntax. Single quotes are typically preferred when using backslashes (`\`) in a regular expression.
Note that if the regex uses groups, the metavariables `$1`, `$2`, etc. will be bound to the content of the captured groups.
The `pattern-not-regex` operator filters results using a Python `re` regular expression. This is most useful when combined with regular-expression-only rules, providing an easy way to filter findings without having to use negative lookaheads. `pattern-not-regex` also works with regular `pattern` clauses.
The syntax for this operator is the same as for `pattern-regex`.
This operator filters out findings that have any overlap with the supplied regular expression. For example, if you use `pattern-regex` to detect `Foo==1.1.1` and it also detects `Bar-Foo==3.0.8`, you can use `pattern-not-regex` to filter the unwanted findings.
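The `Foo` example above could be sketched like this. The rule id, message, and exact expressions are assumptions, as is the use of the `regex` language for a regex-only rule.

```yaml
rules:
  - id: detect-foo-requirement              # hypothetical identifier
    patterns:
      - pattern-regex: 'Foo==\d+\.\d+\.\d+' # also matches inside "Bar-Foo==3.0.8"
      - pattern-not-regex: 'Bar-Foo'        # drops findings overlapping "Bar-Foo"
    message: "Found a pinned Foo requirement"
    languages: [regex]                      # assumed language for regex-only rules
    severity: ERROR
```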
The `metavariable-regex` operator searches metavariables for a Python `re` compatible expression. This is useful for filtering results based on a metavariable’s value. It requires the `metavariable` and `regex` keys and can be combined with other pattern operators.
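A minimal sketch, assuming a hypothetical `module` whose insecure methods share a name prefix: the pattern binds `$METHOD`, and `metavariable-regex` keeps only the findings whose method name matches the expression.

```yaml
rules:
  - id: insecure-method-call          # hypothetical identifier
    patterns:
      - pattern: module.$METHOD(...)  # `module` is a hypothetical library
      - metavariable-regex:
          metavariable: $METHOD
          regex: 'insecure.*'         # assumed naming convention
    message: "`module.$METHOD` is considered insecure"
    languages: [python]
    severity: ERROR
```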
The `metavariable-pattern` operator matches metavariables with a pattern formula. This is useful for filtering results based on a metavariable’s value. It requires the `metavariable` key, and exactly one key of `pattern`, `patterns`, `pattern-either`, or `pattern-regex`. This operator can be nested as well as combined with other operators.
For example, it can be used to filter out matches that do not match certain criteria. In this case it is possible to start a `patterns` AND operation with a `pattern-not`, because there is an implicit `pattern: ...` that matches the content of the metavariable.
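The following sketch shows a nested `patterns` list that starts with `pattern-not`. The rule id, message, and the choice of `open` as the target call are assumptions made for illustration.

```yaml
rules:
  - id: open-with-computed-path       # hypothetical identifier
    patterns:
      - pattern: open($PATH, ...)
      - metavariable-pattern:
          metavariable: $PATH
          patterns:
            # No positive pattern is listed here: the implicit `pattern: ...`
            # already matches the content of $PATH.
            - pattern-not: '"..."'    # discard paths that are string literals
    message: "open() called with a path that is not a string literal"
    languages: [python]
    severity: ERROR
```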
It is also useful in combination with other operators, and `metavariable-pattern` operators can be nested inside one another.
The metavariable should be bound to an expression, a statement, or a list of statements, for this test to be meaningful. A metavariable bound to a list of function arguments, a type, or a pattern, will always evaluate to false.
If the metavariable's content is a string, then it is possible to use `metavariable-pattern` to match this string as code by specifying the target language via the `language` key.
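A sketch under stated assumptions: `run_script` is a hypothetical Python function that receives JavaScript source as a string, and the rule id and message are invented. The string content bound to `$SCRIPT` is matched as JavaScript code.

```yaml
rules:
  - id: eval-in-embedded-script           # hypothetical identifier
    patterns:
      - pattern: run_script("$SCRIPT")    # run_script is a hypothetical function
      - metavariable-pattern:
          metavariable: $SCRIPT
          language: javascript            # parse the string content as JavaScript
          pattern: eval(...)
    message: "Embedded script calls eval()"
    languages: [python]
    severity: ERROR
```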
This feature can also be used to filter regex matches.
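The `metavariable-comparison` operator is a mapping which requires the `metavariable` and `comparison` keys. It can be combined with other pattern operators. For example, a rule that compares the argument of `set_port(...)` against a numeric bound will catch code like `set_port(80)` or `set_port(443)`, but not calls whose argument falls outside that bound.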
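A sketch of such a rule follows. The threshold of 1024, the rule id, and the message are assumptions; only `set_port` and the example port values come from the text.

```yaml
rules:
  - id: superuser-port                # hypothetical identifier
    patterns:
      - pattern: set_port($ARG)
      - metavariable-comparison:
          metavariable: $ARG
          comparison: $ARG < 1024     # assumed bound; 80 and 443 both satisfy it
    message: "Module is setting a superuser port"
    languages: [python]
    severity: ERROR
```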
Comparison expressions support simple arithmetic as well as composition with boolean operators to allow for more complex matching. This is particularly useful for checking that metavariables are divisible by particular values, such as enforcing that a particular value is even or odd. Building on the previous example, adding an evenness check will still catch code like `set_port(80)` but will no longer catch `set_port(443)`.
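The combined comparison might be written as in the sketch below, again assuming a bound of 1024 and a hypothetical rule id and message.

```yaml
rules:
  - id: even-superuser-port           # hypothetical identifier
    patterns:
      - pattern: set_port($ARG)
      - metavariable-comparison:
          metavariable: $ARG
          # 80 is below the assumed bound and even; 443 is odd and is dropped.
          comparison: $ARG < 1024 and $ARG % 2 == 0
    message: "Module is setting an even-numbered superuser port"
    languages: [python]
    severity: ERROR
```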
The `metavariable-comparison` operator also takes optional `base: int` and `strip: bool` keys. These keys set the integer base the metavariable value should be interpreted as and remove quotes from the metavariable value, respectively.
With `base: 8`, metavariable values found in code are interpreted as octal, so a comparison against a threshold that lies between them will detect `0700` but not `0400`.
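Both optional keys can be sketched as follows. The functions `set_permissions` and `to_integer`, the thresholds, the rule ids, and the messages are all assumptions chosen to fit the example values in the surrounding text.

```yaml
rules:
  # base: interpret the matched value as an octal integer before comparing.
  - id: excessive-permissions             # hypothetical identifier
    patterns:
      - pattern: set_permissions($ARG)    # hypothetical function
      - metavariable-comparison:
          metavariable: $ARG
          comparison: $ARG > 0o600        # assumed threshold
          base: 8
    message: "Permissions are broader than expected"
    languages: [python]
    severity: ERROR

  # strip: remove surrounding quotes so a string value can be compared as a number.
  - id: integer-out-of-range              # hypothetical identifier
    patterns:
      - pattern: to_integer($ARG)         # hypothetical function
      - metavariable-comparison:
          metavariable: $ARG
          comparison: $ARG > 2147483647   # assumed threshold
          strip: true
    message: "Value does not fit in a signed 32-bit integer"
    languages: [python]
    severity: ERROR
```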
With `strip: true`, quotes (', ", and `) are removed from both ends of the metavariable content, so a comparison against a threshold that lies between the two values will detect `"2147483648"` but not `"2147483646"`. This is useful when you expect strings to contain integer or float data.
The `pattern-not` operator is the opposite of the `pattern` operator. It finds code that does not match its expression. This is useful for eliminating common false positives.
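A minimal sketch, assuming a hypothetical `db_query` function with a `verify` keyword argument: every call is a candidate finding, and `pattern-not` removes the calls that already pass `verify=True`.

```yaml
rules:
  - id: unverified-db-query                          # hypothetical identifier
    patterns:
      - pattern: db_query(...)                       # db_query is a hypothetical function
      - pattern-not: db_query(..., verify=True, ...)
    message: "Found an unverified database query"
    languages: [python]
    severity: ERROR
```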
The `pattern-inside` operator keeps matched findings that reside within its expression. This is useful for finding code inside other pieces of code like functions or if blocks.
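As an illustration, the sketch below reports a `return` with a value only when it sits inside an `__init__` method. The rule id and message are hypothetical.

```yaml
rules:
  - id: return-value-in-init          # hypothetical identifier
    patterns:
      - pattern: return ...
      - pattern-inside: |
          def __init__(...):
              ...
    message: "__init__ should not return a value"
    languages: [python]
    severity: ERROR
```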
The `pattern-not-inside` operator keeps matched findings that do not reside within its expression. It is the opposite of `pattern-inside`. This is useful for finding code that’s missing a corresponding cleanup action like disconnect, close, or shutdown. It’s also useful for finding problematic code that isn't inside code that mitigates the issue.
For example, a rule can look for files that are opened but never closed, possibly leading to resource exhaustion. It looks for the `open(...)` pattern and not a following `close` call. The `$F` metavariable ensures that the same variable name is used in the `open` and `close` calls. The ellipsis operator allows for any arguments to be passed to `open` and any sequence of code statements in between the `open` and `close` calls. The rule ignores how `open` is called or what happens up to a `close` call; it only needs to make sure `close` is called.
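The rule described above might be written as in the following sketch. The rule id and message are assumptions; the patterns follow the description of `open(...)`, `$F`, and the ellipsis operator.

```yaml
rules:
  - id: open-never-closed             # hypothetical identifier
    patterns:
      - pattern: $F = open(...)
      - pattern-not-inside: |
          $F = open(...)
          ...
          $F.close()
    message: "File object opened without a corresponding close"
    languages: [python]
    severity: ERROR
```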
The `pattern-where-python` operator is the most flexible operator. It allows for writing custom Python logic to filter findings. This is useful when none of the other operators provide the functionality needed to create a rule.
Use caution with this operator. It allows for arbitrary Python code execution.
As a defensive measure, the `--dangerously-allow-arbitrary-code-execution-from-rules` flag must be passed to use rules containing `pattern-where-python`.
```yaml
rules:
  - id: use-decimalfield-for-money
    patterns:
      - pattern: $FIELD = django.db.models.FloatField(...)
      - pattern-inside: |
          class $CLASS(...):
              ...
      - pattern-where-python: "'price' in vars['$FIELD'] or 'salary' in vars['$FIELD']"
    message: "use DecimalField for currency fields to avoid float-rounding errors"
    languages: [python]
    severity: ERROR
```
The above rule looks for use of Django’s `FloatField` model field when storing currency information. `FloatField` can lead to rounding errors and should be avoided in favor of `DecimalField` when dealing with currency. Here the `pattern-where-python` operator allows us to use the Python `in` operator to filter findings that look like currency.
Metavariable matching operates differently for logical AND (`patterns`) and logical OR (`pattern-either`) parent operators. Behavior is consistent across all child operators.
Metavariable values must be identical across sub-patterns when performing logical AND operations with the `patterns` operator.
```yaml
rules:
  - id: function-args-to-open
    patterns:
      - pattern-inside: |
          def $F($X):
              ...
      - pattern: open($X)
    message: "Function argument passed to open() builtin"
    languages: [python]
    severity: ERROR
```
This rule matches the following code:
```python
def foo(path):
    open(path)
```
The example rule doesn’t match this code:
```python
def foo(path):
    open(something_else)
```
Metavariable matching does not affect the matching of logical OR operations with the `pattern-either` operator.
```yaml
rules:
  - id: insecure-function-call
    pattern-either:
      - pattern: insecure_func1($X)
      - pattern: insecure_func2($X)
    message: "Insecure function use"
    languages: [python]
    severity: ERROR
```
The above rule matches calls to either function, regardless of whether the value bound to `$X` is the same in both.
Metavariable matching still affects subsequent logical ORs if the parent is a logical AND.
```yaml
patterns:
  - pattern-inside: |
      def $F($X):
          ...
  - pattern-either:
      - pattern: bar($X)
      - pattern: baz($X)
```
The above rule matches both examples below:
```python
def foo(something):
    bar(something)
```
```python
def foo(something):
    baz(something)
```
The example rule doesn’t match this code:
```python
def foo(something):
    bar(something_else)
```
Enable/disable the following matching features:
|Option|Description|
|---|---|
|`constant_propagation`|Constant propagation, including intra-procedural flow-sensitive constant propagation.|
|`ac_matching`|Matching modulo associativity and commutativity; Boolean AND/OR are treated as associative, and bitwise AND/OR/XOR as both associative and commutative.|
|`commutative_boolop`|Treat Boolean AND/OR as commutative even if not semantically accurate.|
The full list of available options can be consulted here. Note that options not included in the table above are considered experimental, and they may change or be removed without notice.
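As a sketch of how the options object is attached to a rule, the example below turns off constant propagation for a single rule. The option name `constant_propagation` is taken from Semgrep's options list; the rule id, message, and the `set_password` function are hypothetical.

```yaml
rules:
  - id: hardcoded-password-literal    # hypothetical identifier
    options:
      constant_propagation: false     # only match literals written at the call site
    pattern: set_password("...")      # set_password is a hypothetical function
    message: "Password passed as a string literal"
    languages: [python]
    severity: ERROR
```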
The `fix` top-level key allows for simple autofixing of a pattern by suggesting an autofix for each match. Run `semgrep` with `--autofix` to apply the changes to the files.
```yaml
rules:
  - id: use-dict-get
    patterns:
      - pattern: $DICT[$KEY]
    fix: $DICT.get($KEY)
    message: "Use `.get()` method to avoid a KeyError"
    languages: [python]
    severity: ERROR
```
To note extra information on a rule, such as a related CVE or the name of the security engineer who wrote the rule, use the `metadata:` key.
```yaml
rules:
  - id: eqeq-is-bad
    patterns:
      - [...]
    message: "useless comparison operation `$X == $X` or `$X != $X`"
    metadata:
      cve: CVE-2077-1234
      discovered-by: Ikwa L'equale
```
The metadata will also be shown in Semgrep’s output if you’re running it with `--json`.
To ignore a specific rule on specific files, set the
paths: key with one or more filters.
```yaml
rules:
  - id: eqeq-is-bad
    pattern: $X == $X
    paths:
      exclude:
        - "*.jinja2"
        - "*_test.go"
        - "project/tests"
        - project/static/*.js
```
When invoked with `semgrep -f rule.yaml project/`, the above rule will run on files inside `project/`, but no results will be returned for:
- any file with a `.jinja2` file extension
- any file whose name ends in `_test.go`
- any file inside `project/tests` or its subdirectories
- any file matching the `project/static/*.js` glob pattern
The glob syntax is from Python's
pathlib and is used to match against the given file and all its parent directories.
Conversely, to run a rule only on specific files, set a
paths: key with one or more of these filters:
```yaml
rules:
  - id: eqeq-is-bad
    pattern: $X == $X
    paths:
      include:
        - "*_test.go"
        - "project/server"
        - "project/schemata"
        - "project/static/*.js"
```
When invoked with `semgrep -f rule.yaml project/`, this rule will run on files inside `project/`, but results will be returned only for:
- files whose name ends in `_test.go`
- files inside `project/server`, `project/schemata`, or their subdirectories
- files matching the `project/static/*.js` glob pattern
When mixing inclusion and exclusion filters, the exclusion ones take precedence.
```yaml
paths:
  include:
    - "project/schemata"
  exclude:
    - "*_internal.py"
```
The above rule returns results from `project/schemata/scan.py` but not from files in that directory whose names end in `_internal.py`.
This section contains more complex rules that perform advanced code searching.
```yaml
rules:
  - id: eqeq-is-bad
    patterns:
      - pattern-not-inside: |
          def __eq__(...):
              ...
      - pattern-not-inside: assert(...)
      - pattern-not-inside: assertTrue(...)
      - pattern-not-inside: assertFalse(...)
      - pattern-either:
          - pattern: $X == $X
          - pattern: $X != $X
          - patterns:
              - pattern-inside: |
                  def __init__(...):
                      ...
              - pattern: self.$X == self.$X
      - pattern-not: 1 == 1
    message: "useless comparison operation `$X == $X` or `$X != $X`"
```
The above rule makes use of many operators. It uses `pattern-either`, `patterns`, and `pattern-inside` to carefully consider different cases, and uses `pattern-not-inside` and `pattern-not` to exclude certain useless comparisons.