Rule syntax
Getting started with rule writing? Try the Semgrep Tutorial.
This document describes the YAML rule syntax of Semgrep.
Schema
Required
All required fields must be present at the top-level of a rule, immediately under the rules key.
Field | Type | Description |
---|---|---|
id | string | Unique, descriptive identifier, for example: no-unused-variable |
message | string | Message that includes why Semgrep matched this pattern and how to remediate it. See also Rule messages. |
severity | string | One of the following values: Low, Medium, High, Critical. The severity key specifies how critical the issues are that a rule potentially detects. Note: Semgrep Supply Chain differs, as its rules use CVE assignments for severity. For more information, see the Filters section in the Semgrep Supply Chain documentation. |
languages | array | See language extensions and tags |
pattern * | string | Find code matching this expression |
patterns * | array | Logical AND of multiple patterns |
pattern-either * | array | Logical OR of multiple patterns |
pattern-regex * | string | Find code matching this PCRE2-compatible pattern in multiline mode |
Only one of the following is required: pattern, patterns, pattern-either, pattern-regex.
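The required-field rules above can be sketched as a small check over a rule that has already been loaded into a Python dictionary. This is an illustration only, not Semgrep's own validation: the function name `check_required_fields` is hypothetical, and the sketch reads "only one of the following is required" as "exactly one".

```python
# Hypothetical helper: checks a rule (already parsed into a dict) against the
# required fields listed in the table above. Not part of Semgrep itself.
REQUIRED_FIELDS = ("id", "message", "severity", "languages")
PATTERN_FIELDS = ("pattern", "patterns", "pattern-either", "pattern-regex")


def check_required_fields(rule):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in rule
    ]
    # Assumption: exactly one of the starred pattern fields must be present.
    present = [name for name in PATTERN_FIELDS if name in rule]
    if len(present) != 1:
        problems.append(
            f"expected exactly one of {', '.join(PATTERN_FIELDS)}; found {len(present)}"
        )
    return problems


rule = {
    "id": "md5-usage",
    "message": "Found md5 usage",
    "severity": "ERROR",
    "languages": ["python"],
    "pattern": "hashlib.md5(...)",
}
print(check_required_fields(rule))
```

For the rule shown, the returned list is empty. Removing the `pattern` key, or adding a second pattern field next to it, produces one problem entry.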
Language extensions and languages key values
The following table includes languages supported by Semgrep, accepted file extensions for test files that accompany rules, and valid values that Semgrep rules require in the languages key.
Language | Extensions | languages key values |
---|---|---|
Apex (only in Semgrep Pro Engine) | .cls | apex |
Bash | .bash, .sh | bash, sh |
C | .c | c |
Cairo | .cairo | cairo |
Clojure | .clj, .cljs, .cljc, .edn | clojure |
C++ | .cc, .cpp | cpp, c++ |
C# | .cs | csharp, c# |
Dart | .dart | dart |
Dockerfile | .dockerfile, .Dockerfile | dockerfile, docker |
Elixir | .ex, .exs | ex, elixir |
Generic | | generic |
Go | .go | go, golang |
HTML | .htm, .html | html |
Java | .java | java |
JavaScript | .js, .jsx | js, javascript |
JSON | .json, .ipynb | json |
Jsonnet | .jsonnet, .libsonnet | jsonnet |
JSX | .js, .jsx | js, javascript |
Julia | .jl | julia |
Kotlin | .kt, .kts, .ktm | kt, kotlin |
Lisp | .lisp, .cl, .el | lisp |
Lua | .lua | lua |
OCaml | .ml, .mli | ocaml |
PHP | .php, .tpl | php |
Python | .py, .pyi | python, python2, python3, py |
R | .r, .R | r |
Ruby | .rb | ruby |
Rust | .rs | rust |
Scala | .scala | scala |
Scheme | .scm, .ss | scheme |
Solidity | .sol | solidity, sol |
Swift | .swift | swift |
Terraform | .tf, .hcl | tf, hcl, terraform |
TypeScript | .ts, .tsx | ts, typescript |
YAML | .yml, .yaml | yaml |
XML | .xml | xml |
To see the maturity level of each supported language, see the following references:
Optional
Field | Type | Description |
---|---|---|
options | object | Options object to enable/disable certain matching features |
fix | object | Simple search-and-replace autofix functionality |
metadata | object | Arbitrary user-provided data; attach data to rules without affecting Semgrep behavior |
min-version | string | Minimum Semgrep version compatible with this rule |
max-version | string | Maximum Semgrep version compatible with this rule |
paths | object | Paths to include or exclude when running this rule |
The following optional fields must reside under a patterns or pattern-either field.
Field | Type | Description |
---|---|---|
pattern-inside | string | Keep findings that lie inside this pattern |
The following optional fields must reside under a patterns field.
Field | Type | Description |
---|---|---|
metavariable-regex | map | Search metavariables for Python re compatible expressions; regex matching is left anchored |
metavariable-pattern | map | Matches metavariables with a pattern formula |
metavariable-comparison | map | Compare metavariables against basic Python expressions |
metavariable-name | map | Matches metavariables against constraints on what they name |
pattern-not | string | Logical NOT - remove findings matching this expression |
pattern-not-inside | string | Keep findings that do not lie inside this pattern |
pattern-not-regex | string | Filter results using a PCRE2-compatible pattern in multiline mode |
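The metavariable-regex row notes that matching is left anchored. In terms of Python's `re` module, that corresponds to the behavior of `re.match` rather than `re.search`. The sketch below uses only the standard library; the helper name `left_anchored_match` and the sample values are made up for illustration.

```python
import re


def left_anchored_match(regex, value):
    """Mimic left-anchored matching: the regex must match at the start of value."""
    # re.match only succeeds at position 0; re.search would scan the whole string.
    return re.match(regex, value) is not None


print(left_anchored_match(r"md5", "md5_digest"))  # regex matches at the start
print(left_anchored_match(r"md5", "use_md5"))     # "md5" appears only later
print(left_anchored_match(r".*md5", "use_md5"))   # a leading .* lifts the anchoring
```

Under this reading, a regex meant to match text in the middle of a metavariable's value needs a leading `.*`.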
Operators
pattern
The pattern operator looks for code matching its expression. This can be basic expressions like $X == $X or unwanted function calls like hashlib.md5(...).
rules:
  - id: md5-usage
    languages:
      - python
    message: Found md5 usage
    pattern: hashlib.md5(...)
    severity: ERROR
The pattern immediately above matches the following:
import hashlib
# ruleid: md5-usage
digest = hashlib.md5(b"test")
# ok: md5-usage
digest = hashlib.sha256(b"test")
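The `$X == $X` expression mentioned earlier uses the same metavariable on both sides, so both operands must be the same code. A rough analogue of that idea can be written with Python's `ast` module. This is a sketch of the concept, not of how Semgrep matches code; `find_self_comparisons` is a hypothetical name.

```python
import ast


def find_self_comparisons(source):
    """Return line numbers of `left == right` where both sides are the same code."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Compare)
            and len(node.ops) == 1
            and isinstance(node.ops[0], ast.Eq)
            # Structural equality of the two operands, ignoring source positions.
            and ast.dump(node.left) == ast.dump(node.comparators[0])
        ):
            hits.append(node.lineno)
    return sorted(hits)


sample = "if a == a:\n    pass\nif a == b:\n    pass\nok = f(x) == f(x)\n"
print(find_self_comparisons(sample))  # [1, 5]
```

Lines 1 and 5 of the sample compare an expression with itself; line 3 compares two different names and is not reported.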
patterns
The patterns operator performs a logical AND operation on one or more child patterns. This is useful for chaining multiple patterns together that all must be true.
rules:
  - id: unverified-db-query
    patterns:
      - pattern: db_query(...)
      - pattern-not: db_query(..., verify=True, ...)
    message: Found unverified db query
    severity: ERROR
    languages:
      - python
The pattern immediately above matches the following:
# ruleid: unverified-db-query
db_query("SELECT * FROM ...")
# ok: unverified-db-query
db_query("SELECT * FROM ...", verify=True, env="prod")
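To make the AND-with-exclusion logic of this rule concrete, here is an approximate stand-in written with Python's `ast` module: report calls to `db_query` unless they pass `verify=True`. It is a sketch of the rule's intent under simplifying assumptions (only bare `db_query(...)` calls, only a literal `True`), not a reimplementation of Semgrep; `unverified_db_queries` is a hypothetical name.

```python
import ast


def unverified_db_queries(source):
    """Return line numbers of db_query(...) calls that lack verify=True."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Positive condition: a call to the bare name db_query.
        is_db_query = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "db_query"
        )
        if not is_db_query:
            continue
        # Negative condition: the call carries the keyword argument verify=True.
        verified = any(
            kw.arg == "verify"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        )
        if not verified:
            hits.append(node.lineno)
    return sorted(hits)


code = (
    'db_query("SELECT * FROM ...")\n'
    'db_query("SELECT * FROM ...", verify=True, env="prod")\n'
)
print(unverified_db_queries(code))  # [1]
```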
patterns operator evaluation strategy
Note that the order in which the child patterns are declared in a patterns operator has no effect on the final result. A patterns operator is always evaluated in the same way:
1. Semgrep evaluates all positive patterns, that is, pattern-inside, pattern, pattern-regex, and pattern-either operators. Each range matched by each of these patterns is intersected with the ranges matched by the other operators. The result is a set of positive ranges. The positive ranges carry metavariable bindings. For example, in one range $X can be bound to the function call foo(), and in another range $X can be bound to the expression a + b.
2. Semgrep evaluates all negative patterns, that is, pattern-not-inside, pattern-not, and pattern-not-regex operators. This gives a set of negative ranges, which are used to filter the positive ranges. The result is a subset of the positive ranges computed in the previous step.
3. Semgrep evaluates all conditionals, that is, metavariable-regex, metavariable-pattern, and metavariable-comparison operators. These conditional operators can only examine the metavariables bound in the positive ranges in step 1 that passed through the filter of negative patterns in step 2. Note that metavariables bound by negative patterns are not available here.
4. Semgrep applies all focus-metavariable operators by computing the intersection of each positive range with the range of the metavariable to focus on. Again, the only metavariables available to focus on are those bound by positive patterns.
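The four steps above can be sketched as a toy model in Python. Everything here is an assumption made for illustration: matches are plain dictionaries with `start`, `end`, `bindings`, and `spans` keys, intersection and filtering compare ranges for exact equality, and the names `span` and `evaluate_patterns` are hypothetical. Semgrep's real range handling is more involved.

```python
def span(match):
    """Range of a match as a (start, end) tuple."""
    return (match["start"], match["end"])


def evaluate_patterns(positives, negatives, conditions, focus=None):
    """Toy model of the four evaluation steps (needs at least one positive pattern)."""
    # Step 1: keep only ranges reported by every positive pattern.
    common = set.intersection(*[{span(m) for m in matches} for matches in positives])
    kept = [m for m in positives[0] if span(m) in common]

    # Step 2: drop ranges that any negative pattern also matched.
    negative_spans = {span(m) for matches in negatives for m in matches}
    kept = [m for m in kept if span(m) not in negative_spans]

    # Step 3: conditionals only see bindings of the surviving positive ranges.
    kept = [m for m in kept if all(check(m["bindings"]) for check in conditions)]

    # Step 4: narrow each surviving range to the focused metavariable's range.
    if focus is not None:
        kept = [
            dict(m, start=m["spans"][focus][0], end=m["spans"][focus][1])
            for m in kept
        ]
    return kept


call = {"start": 0, "end": 20, "bindings": {"$X": "foo()"}, "spans": {"$X": (4, 9)}}
expr = {"start": 30, "end": 50, "bindings": {"$X": "a + b"}, "spans": {"$X": (34, 39)}}
other = {"start": 60, "end": 80, "bindings": {"$X": "bar()"}, "spans": {"$X": (64, 69)}}

positives = [[call, expr, other]]
negatives = [[{"start": 60, "end": 80, "bindings": {}, "spans": {}}]]
conditions = [lambda bindings: bindings["$X"].endswith("()")]

result = evaluate_patterns(positives, negatives, conditions, focus="$X")
print([span(m) for m in result])  # [(4, 9)]
```

In this run the negative pattern removes the range starting at 60, the conditional rejects the range where `$X` is bound to `a + b`, and the focus step narrows the remaining range to the location of `$X`.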
pattern-either
The pattern-either operator performs a logical OR operation on one or more child patterns. This is useful for chaining multiple patterns together where any may be true.
rules:
  - id: insecure-crypto-usage
    pattern-either:
      - pattern: hashlib.sha1(...)
      - pattern: hashlib.md5(...)
    message: Found insecure crypto usage
    languages:
      - python
    severity: ERROR
The pattern immediately above matches the following:
import hashlib
# ruleid: insecure-crypto-usage
digest = hashlib.md5(b"test")
# ruleid: insecure-crypto-usage
digest = hashlib.sha1(b"test")
# ok: insecure-crypto-usage
digest = hashlib.sha256(b"test")
This rule looks for usage of the Python standard library functions hashlib.md5 or hashlib.sha1. Depending on their usage, these hashing functions are considered insecure.
pattern-regex
The pattern-regex operator searches files for substrings matching the given PCRE2 pattern. This is useful for migrating existing regular expression code search functionality to Semgrep. Perl-Compatible Regular Expressions (PCRE) is a full-featured regex library that is widely compatible with Perl, but also with the respective regex libraries of Python, JavaScript, Go, Ruby, and Java. Patterns are compiled in multiline mode; that is, ^ and $ match at the beginning and end of each line, respectively, in addition to the beginning and end of input.
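The effect of multiline mode can be seen with Python's `re` module, whose `re.MULTILINE` flag changes `^` and `$` in the same way. Python's `re` is not PCRE2, so treat this as an analogy for the anchor behavior only; the sample text is made up.

```python
import re

text = "first line\nsecond line\n"

# Without the flag, ^ matches only at the very start of the input.
starts_default = re.findall(r"^\w+", text)

# With MULTILINE, ^ also matches right after each newline.
starts_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)

# Likewise, $ matches before each newline, not only at the end of input.
ends_multiline = re.findall(r"\w+$", text, flags=re.MULTILINE)

print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second']
print(ends_multiline)    # ['line', 'line']
```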
PCRE2 supports some Unicode character properties, but not some Perl properties. For example, \p{Egyptian_Hieroglyphs} is supported but \p{InMusicalSymbols} isn't.
Example: pattern-regex combined with other pattern operators
rules:
  - id: boto-client-ip
    patterns:
      - pattern-inside: boto3.client(host="...")
      - pattern-regex: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
    message: boto client using IP address
    languages:
      - python
    severity: ERROR
The pattern immediately above matches the following:
import boto3
# ruleid: boto-client-ip
client = boto3.client(host="192.168.1.200")
# ok: boto-client-ip
client = boto3.client(host="dev.internal.example.com")
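The regular expression in this rule can be tried on its own with Python's `re` module, which accepts the same syntax for this particular pattern. The sketch covers only the regex half of the rule (the pattern-inside constraint is not modeled), and `looks_like_ip` is a hypothetical helper name.

```python
import re

# Same expression as in the rule's pattern-regex entry.
IP_LIKE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def looks_like_ip(text):
    """True if text contains four dot-separated groups of one to three digits."""
    return IP_LIKE.search(text) is not None


print(looks_like_ip('client = boto3.client(host="192.168.1.200")'))
print(looks_like_ip('client = boto3.client(host="dev.internal.example.com")'))
# The expression does not range-check octets, so this is also reported.
print(looks_like_ip('client = boto3.client(host="999.999.999.999")'))
```

The first and third calls return `True` and the second returns `False`. If out-of-range octets matter, the expression would need to be tightened.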
Example: pattern-regex used as a standalone, top-level operator
rules:
  - id: legacy-eval-search
    pattern-regex: eval\(
    message: Insecure code execution
    languages:
      - javascript
    severity: ERROR
The pattern immediately above matches the following:
// ruleid: legacy-eval-search
eval('var a = 5')
Single (') and double (") quotes behave differently in YAML syntax. Single quotes are typically preferred when using backslashes (\) with pattern-regex.
Note that you may bind a section of a regular expression to a metavariable, by using named capturing groups. In this case, the name of the capturing group must be a valid metavariable name.
rules:
  - id: my_pattern_id-copy
    patterns:
      - pattern-regex: a(?P<FIRST>.*)b(?P<SECOND>.*)
    message: Semgrep found a match, with $FIRST and $SECOND
    languages:
      - regex
    severity: WARNING
The pattern immediately above matches the following:
acbd
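Python's `re` module accepts the same `(?P<NAME>...)` syntax for named groups, so the captures for this input can be checked with the standard library. The substitution of `$FIRST` and `$SECOND` into the message is simulated by hand here and is only an assumption about how the rendered message would read.

```python
import re

match = re.search(r"a(?P<FIRST>.*)b(?P<SECOND>.*)", "acbd")
captures = match.groupdict()
print(captures)  # {'FIRST': 'c', 'SECOND': 'd'}

# Hand-rolled substitution of the captured values into the rule's message text.
message = "Semgrep found a match, with $FIRST and $SECOND"
for name, value in captures.items():
    message = message.replace("$" + name, value)
print(message)  # Semgrep found a match, with c and d
```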