Pattern syntax (experimental)

Patterns are the expressions Semgrep uses to match code when it scans for vulnerabilities. This article describes the new syntax for Semgrep pattern operators. See Pattern syntax for information on the existing pattern syntax.

There is often a one-to-one translation from the existing syntax to the experimental syntax. However, some changes are quite different.

danger
  • These patterns are experimental and subject to change.
  • You can't mix and match existing pattern syntax with the experimental syntax.

pattern

The pattern operator looks for code matching its expression in the existing syntax. However, pattern is no longer required when using the experimental syntax: you can write the pattern string "..." directly wherever pattern: "..." would appear. For example, you can omit pattern and write the following:

any:
- "badthing1"
- "badthing2"
- "badthing3"

Or, for multi-line patterns:

any:
- |
  manylines(
    badthinghere($A)
  )
- |
  orshort()

You don't need double quotes for a single-line pattern when omitting the pattern key, but note that this can cause YAML parsing issues.

As an example, the following YAML parses:

any:
- "def foo(): ..."

This, however, causes problems since : is also used to denote a YAML dictionary:

any:
- def foo(): ...
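
If you prefer to avoid double quotes, standard YAML offers two other ways to keep the colon from being read as a mapping separator. The sketch below relies only on general YAML quoting rules, not on anything specific to Semgrep: single quotes, or a block scalar introduced with |.

```yaml
any:
# Single quotes keep the whole pattern as one string.
- 'def foo(): ...'
# A block scalar (|) treats the indented text as a literal string.
- |
  def foo(): ...
```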

any

Replaces pattern-either. Matches any of the patterns specified.

any:
- <pat1>
- <pat2>
...
- <patn>

all

Replaces patterns. Matches all of the patterns specified.

all:
- <pat1>
- <pat2>
...
- <patn>

inside

Replaces pattern-inside. Restricts matches to code that lies inside a match of the given pattern. The examples below accept code inside either <pat1> or <pat2>.

inside:
  any:
  - <pat1>
  - <pat2>

Alternatively:

any:
- inside: <pat1>
- inside: <pat2>

not

Replaces pattern-not. Accepts any pattern and excludes code that matches it.

not:
  any:
  - <pat1>
  - <pat2>

Alternatively:

all:
- not: <pat1>
- not: <pat2>

regex

Replaces pattern-regex. Matches based on the regex provided.

regex: "(.*)"

Metavariables

Metavariables are an abstraction to match code when you don't know the value or contents beforehand. They're similar to capture groups in regular expressions and can track values across a specific code scope. This includes variables, functions, arguments, classes, object methods, imports, exceptions, and more.

Metavariables begin with a $ and can only contain uppercase characters, _, or digits. Names like $x or $some_value are invalid. Examples of valid metavariables include $X, $WIDGET, or $USERS_2.

where

Unlike Semgrep's existing pattern syntax, the following operators no longer occur under pattern or all:

  • metavariable-pattern
  • metavariable-regex
  • metavariable-comparison
  • metavariable-analysis
  • focus-metavariable

These operators must occur within a where clause.

A where clause is required in a pattern where you're using metavariable operators. It indicates that Semgrep should match based on the pattern if all the conditions are true.

For example:

all:
- inside: |
    def $FUNC(...):
      ...
- |
  eval($X)
where:
- <condition>

Because the where clause is on the same indentation level as all, Semgrep understands that everything under where must be paired with the entire all pattern. As such, the ranges matched by the all pattern are filtered or adjusted by the conditions under where, and the output is the final set of matched ranges.
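
As a sketch of what a concrete condition might look like, the fragment below fills in <condition> with the metavariable operator described in the next section. The regex ^handle_ and the idea of restricting matches to functions named that way are hypothetical choices made only for illustration.

```yaml
all:
- inside: |
    def $FUNC(...):
      ...
- |
  eval($X)
where:
# Hypothetical condition: keep only matches whose enclosing
# function name starts with "handle_".
- metavariable: $FUNC
  regex: "^handle_"
```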

metavariable

Replaces metavariable-pattern, metavariable-regex, and metavariable-analysis.

This operator looks inside the metavariable for a match.

...
where:
- metavariable: $A
  regex: "(.*)"
- metavariable: $B
  patterns: |
    - "foo($C)"
- metavariable: $D
  analyzer: entropy

comparison

Replaces metavariable-comparison. Compares metavariables against a basic Python comparison expression.

...
where:
- comparison: $A == $B
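
A comparison can also test a metavariable against a literal. The following is a minimal sketch under stated assumptions: set_port is a hypothetical function name and 1024 is an arbitrary threshold, both chosen only for illustration.

```yaml
all:
# Hypothetical call to match; $ARG binds to its argument.
- "set_port($ARG)"
where:
# Keep only matches where the bound argument is below 1024.
- comparison: $ARG < 1024
```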

focus

Replaces focus-metavariable. Puts focus on the code region matched by a single metavariable or a list of metavariables.

...
where:
- focus: $A
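
As an illustration, the fragment below sketches how focus might narrow a finding from a whole function definition down to one parameter. The pattern and the metavariable names $FUNC and $ARG are assumptions made for this example.

```yaml
all:
- |
  def $FUNC(..., $ARG, ...):
    ...
where:
# Report only the region bound to $ARG instead of the
# entire function definition matched above.
- focus: $ARG
```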

Syntax search mode

New syntax search mode rules must be nested underneath a top-level match key. For example:

rules:
- id: find-bad-stuff
  severity: ERROR
  languages: [python]
  message: |
    Don't put bad stuff!
  match:
    any:
    - |
      eval(input())
    - all:
      - inside: |
          def $FUNC(..., $X, ...):
            ...
      - |
        eval($X)
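
For comparison, the same rule might look as follows in the existing syntax, translating any to pattern-either, all to patterns, and inside to pattern-inside as described above. This is a hand-written sketch of the correspondence, not the output of any conversion tool.

```yaml
rules:
- id: find-bad-stuff
  severity: ERROR
  languages: [python]
  message: |
    Don't put bad stuff!
  # Existing syntax: no top-level match key; operators sit on the rule.
  pattern-either:
  - pattern: eval(input())
  - patterns:
    - pattern-inside: |
        def $FUNC(..., $X, ...):
          ...
    - pattern: eval($X)
```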

Taint mode

The new syntax supports taint mode, and such rules no longer require mode: taint in the rule. Instead, everything must be nested under a top-level taint key.

rules:
- id: find-bad-stuff
  severity: ERROR
  languages: [python]
  message: |
    Don't put bad stuff!
  taint:
    sources:
    - input()
    sinks:
    - eval(...)
    propagators:
    - pattern: |
        $X = $Y
      from: $Y
      to: $X
    sanitizers:
    - magiccleanfunction(...)

Taint mode key names

The key names for the new syntax taint rules are as follows:

  • pattern-sources --> sources
  • pattern-sinks --> sinks
  • pattern-propagators --> propagators
  • pattern-sanitizers --> sanitizers
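
To make the renaming concrete, here is a sketch of how the taint rule shown earlier might be written in the existing syntax, using the left-hand key names from the list above together with mode: taint. It is an illustrative reconstruction, not an official reference rule.

```yaml
rules:
- id: find-bad-stuff
  # Existing syntax requires the explicit mode key.
  mode: taint
  severity: ERROR
  languages: [python]
  message: |
    Don't put bad stuff!
  pattern-sources:
  - pattern: input()
  pattern-sinks:
  - pattern: eval(...)
  pattern-propagators:
  - pattern: |
      $X = $Y
    from: $Y
    to: $X
  pattern-sanitizers:
  - pattern: magiccleanfunction(...)
```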