Join mode overview
Join mode runs several Semgrep rules at once and only returns results if certain conditions on the results are met. Semgrep Community Edition (CE) is good for finding code patterns with an easy syntax, but its search is typically limited to single files. Join mode is an experimental mode that lets you cross file boundaries, allowing you to write rules for whole code bases instead of individual files. As the name implies, this was inspired by join clauses in SQL queries.
Think of join mode like this: distinct Semgrep rules are used to gather information about a code base. Then, the conditions you define are used to select specific results from these rules, and the selected results are reported by Semgrep. You can join results on metavariable contents or on the result's file path.
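The idea can be sketched in plain Python. This is only a conceptual model, not Semgrep's implementation: the result sets, their field names, and the `join_on_equal` helper are all hypothetical.

```python
# Conceptual sketch only: each rule's findings are modeled as a list of dicts
# holding the file path and the captured metavariable contents.
rule_a_results = [
    {"path": "app.py", "$X": "name"},
    {"path": "app.py", "$X": "age"},
]
rule_b_results = [
    {"path": "views.py", "$Y": "name"},
    {"path": "views.py", "$Y": "email"},
]


def join_on_equal(left, left_key, right, right_key):
    """Return every (left, right) pair whose chosen fields have equal content."""
    return [
        (l, r)
        for l in left
        for r in right
        if l[left_key] == r[right_key]
    ]


# Analogous to a condition such as 'rule-a.$X == rule-b.$Y'.
pairs = join_on_equal(rule_a_results, "$X", rule_b_results, "$Y")
for l, r in pairs:
    print(l["path"], r["path"], l["$X"])
```

Only the pair that agrees on the joined field survives, which is the same selection an SQL inner join would make.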
You can also use cross-file (interfile) analysis. For more information, see Perform cross-file analysis.
Example
Here’s an example join mode rule that detects a cross-site scripting (XSS) vulnerability with high precision.
rules:
  - id: flask-likely-xss
    mode: join
    join:
      refs:
        - rule: flask-user-input.yaml
          as: user-input
        - rule: unescaped-template-extension.yaml
          as: unescaped-extensions
        - rule: any-template-var.yaml
          renames:
            - from: '$...EXPR'
              to: '$VAR'
          as: template-vars
      on:
        - 'user-input.$VAR == unescaped-extensions.$VALUE'
        - 'unescaped-extensions.$VAR == template-vars.$VAR'
        - 'unescaped-extensions.$PATH > template-vars.path'
    message: |
      Detected a XSS vulnerability: '$VAR' is rendered
      unsafely in '$PATH'.
    severity: ERROR
Let's explore how this works. First, some background on the vulnerability. Second, we'll walk through the join mode rule.
Vulnerability background
In Flask, templates are automatically HTML-escaped only if the template file name ends with certain extensions, such as `.html`. Therefore, finding both of the following conditions in a Flask application is a strong indicator of an XSS vulnerability:
- User input directly enters a template without the `.html` extension
- The user input is directly rendered in the template
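To make the risk concrete, the following sketch uses Python's standard `html.escape` as a stand-in for template autoescaping. It is not Flask or Jinja code: `render_greeting` and its extension check are hypothetical and only mimic the behavior described above in simplified form.

```python
from html import escape


def render_greeting(template_name, user_value):
    """Hypothetical renderer: escape only when the name ends in .html."""
    if template_name.endswith(".html"):
        user_value = escape(user_value)
    return "<b>" + user_value + "</b>"


payload = "<script>alert(1)</script>"

# Escaped: the payload is rendered as inert text.
safe = render_greeting("launch.html", payload)

# Not escaped: the payload reaches the page as live markup.
unsafe = render_greeting("launch.htm.j2", payload)

print(safe)
print(unsafe)
```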
Join mode rule explanation
Now, let's turn these conditions into the join mode rule. We need to find three code patterns:
- User input
- Templates without the `.html` extension
- Variables rendered in a template
We can write individual Semgrep rules for each of these code patterns.
rules:
  - id: flask-user-input
    languages: [python]
    severity: INFO
    message: $VAR
    pattern: '$VAR = flask.request.$SOMETHING.get(...)'

rules:
  - id: unescaped-template-extension
    message: |
      Flask does not automatically escape Jinja templates unless they have
      .html as an extension. This could lead to XSS attacks.
    patterns:
      - pattern: flask.render_template("$PATH", ..., $VAR=$VALUE, ...)
      - metavariable-pattern:
          metavariable: $PATH
          language: generic
          patterns:
            - pattern-not-regex: .*\.html$
    languages: [python]
    severity: WARNING

rules:
  - id: any-template-var
    languages: [generic]
    severity: INFO
    message: '$...EXPR'
    pattern: '{{ $...EXPR }}'
Finally, we want to "join" the results from these together. Below are the join conditions, in plain language.
- The variable `$VAR` from `flask-user-input` has the same content as the value `$VALUE` from `unescaped-template-extension`
- The keyword argument `$VAR` from `unescaped-template-extension` has the same content as `$...EXPR` from `any-template-var`
- The template file name `$PATH` from `unescaped-template-extension` is a substring of the file path of a result from `any-template-var`
We can translate these roughly into the following condition statements.
- 'user-input.$VAR == unescaped-extensions.$VALUE'
- 'unescaped-extensions.$VAR == template-vars.$VAR'
- 'unescaped-extensions.$PATH > template-vars.path'
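As a rough illustration of how these three conditions narrow the results, the sketch below applies them to hand-written result sets. The data and the `select` function are hypothetical, and the sketch follows the plain-language conditions above rather than Semgrep's actual evaluation.

```python
# Hypothetical findings for each result set. The template findings use $VAR
# because the join rule renames $...EXPR to $VAR.
user_input = [{"path": "app.py", "$VAR": "person_name_full"}]
unescaped_extensions = [
    {"path": "app.py", "$PATH": "launch.htm.j2",
     "$VAR": "person_name_full", "$VALUE": "person_name_full"},
    {"path": "app.py", "$PATH": "launch.htm.j2",
     "$VAR": "title", "$VALUE": "page_title"},
]
template_vars = [
    {"path": "./templates/launch.htm.j2", "$VAR": "person_name_full"},
    {"path": "./templates/launch.htm.j2", "$VAR": "title"},
    {"path": "./templates/index.html", "$VAR": "person_name_full"},
]


def select(user_input, unescaped_extensions, template_vars):
    """Keep template findings for which all three conditions hold."""
    selected = []
    for u in user_input:
        for e in unescaped_extensions:
            for t in template_vars:
                if (
                    u["$VAR"] == e["$VALUE"]      # condition 1: same content
                    and e["$VAR"] == t["$VAR"]    # condition 2: same content
                    and e["$PATH"] in t["path"]   # condition 3: substring
                ):
                    selected.append(t)
    return selected


findings = select(user_input, unescaped_extensions, template_vars)
for f in findings:
    print(f["path"], f["$VAR"])
```

In this made-up data, the `title` variable is dropped because its value does not come from user input, and the `index.html` finding is dropped because its path does not contain the unescaped template's name.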
Combining the three code pattern Semgrep rules and the three conditions gives us the join rule at the top of this section. This rule matches the code displayed below.
> semgrep -f flask-likely-xss.yaml
running 1 rules...
running 3 rules...
ran 3 rules on 16 files: 14 findings
matching...
matching done.
./templates/launch.htm.j2
severity:error rule:flask-likely-xss: Detected a XSS vulnerability: '$VAR' is rendered unsafely in '$PATH'.
9: <li>person_name_full is <b>{{ person_name_full }}</b></li>
Helpers
For convenience, when writing a join mode rule, you can use the `renames` and `as` keys.
The `renames` key lets you rename metavariables from one rule to something else in your conditions. This is necessary for named expressions, e.g., `$...EXPR`.
The `as` key behaves similarly to `AS` clauses in SQL. This lets you rename the result set for use in the conditions. If the `as` key is not specified, the result set uses the rule ID.
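The effect of both helpers can be modeled in a few lines of Python. This is a sketch under assumptions, not Semgrep's code: `load_result_set` is a hypothetical helper, its `renames` argument mirrors the list of from/to pairs, and its `alias` argument plays the role of the `as` key.

```python
def load_result_set(rule_id, results, renames=None, alias=None):
    """Return (name, results) with metavariables renamed and the set named."""
    mapping = {r["from"]: r["to"] for r in (renames or [])}
    renamed = [
        {mapping.get(key, key): value for key, value in result.items()}
        for result in results
    ]
    # Without an alias, the result set is named after the rule ID.
    name = alias if alias is not None else rule_id
    return name, renamed


raw = [{"path": "templates/launch.htm.j2", "$...EXPR": "person_name_full"}]

name, results = load_result_set(
    "any-template-var",
    raw,
    renames=[{"from": "$...EXPR", "to": "$VAR"}],
    alias="template-vars",
)
default_name, _ = load_result_set("any-template-var", raw)

print(name, results)
print(default_name)
```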
Syntax
join
The `join` key is required when in join mode. This is just a top-level key that groups the join rule parts together.
Inline rule example
The following rule attempts to detect cross-site scripting in a Flask application by checking whether a template variable is rendered unsafely through Python code.
rules:
  - id: flask-likely-xss
    mode: join
    join:
      rules:
        - id: user-input
          pattern: |
            $VAR = flask.request.$SOMETHING.get(...)
          languages: [python]
        - id: unescaped-extensions
          languages: [python]
          patterns:
            - pattern: |
                flask.render_template("$TEMPLATE", ..., $KWARG=$VAR, ...)
            - metavariable-pattern:
                metavariable: $TEMPLATE
                language: generic
                patterns:
                  - pattern-not-regex: .*\.html$
        - id: template-vars
          languages: [generic]
          pattern: |
            {{ $VAR }}
      on:
        - 'user-input.$VAR == unescaped-extensions.$VAR'
        - 'unescaped-extensions.$KWARG == template-vars.$VAR'
        - 'unescaped-extensions.$TEMPLATE < template-vars.path'
    message: |
      Detected a XSS vulnerability: '$VAR' is rendered
      unsafely in '$TEMPLATE'.
    severity: ERROR
The required fields under the `rules` key are the following:
- `id`
- `languages`
- A set of `pattern` clauses.
The optional fields under the `rules` key are the following:
- `message`
- `severity`
In the `on` conditions, refer to the metavariables captured by an inline rule through that rule's `id`. For inline rules, aliases do not work.
refs
Short for references, `refs` is a list of external rules that make up your code patterns. Each entry in `refs` is an object with the required key `rule` and optional keys `renames` and `as`.
rule
Used with `refs`, `rule` points to an external rule location to use in this join rule. Even though Semgrep rule files can typically contain multiple rules under the `rules` key, join mode only uses the first rule in the provided file.
Anything that works with `semgrep --config <here>` also works as the value for `rule`.
renames
An optional key for an object in `refs`, `renames` renames the metavariables from the associated `rule`. The value of `renames` is a list of objects whose keys are `from` and `to`. The `from` key specifies the metavariable to rename, and the `to` key specifies the new name of the metavariable.
Renaming is necessary for named expressions, e.g., `$...EXPR`.