Testing rules
Semgrep provides a convenient testing mechanism for your rules. You can simply write code and provide a few annotations to let Semgrep know where you are or aren't expecting findings. Semgrep provides the following annotations:
- ruleid: <rule-id> for protecting against false negatives
- ok: <rule-id> for protecting against false positives
- todoruleid: <rule-id> for future "positive" rule improvements
- todook: <rule-id> for future "negative" rule improvements
Other than annotations, there are three things to remember when creating tests:
- The --test flag tells Semgrep to run tests in the specified directory.
- Annotations are specified as a comment above the offending line.
- Semgrep looks for tests based on the rule filename and the languages specified in the rule. In other words, path/to/rule.yaml will look for path/to/rule.py, path/to/rule.js, etc., based on the languages specified in the rule.
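The filename-based lookup described above can be pictured with a small sketch. This is not Semgrep's implementation: the function name `candidate_test_files` and the two-entry extension table are assumptions made for illustration only.

```python
from pathlib import PurePosixPath

# Assumed, illustrative mapping from rule language to file extension.
# Semgrep's real table covers many more languages and extensions.
EXTENSIONS = {"python": ".py", "javascript": ".js"}


def candidate_test_files(rule_path: str, languages: list[str]) -> list[str]:
    """Return the test file names that would sit next to a rule file.

    The rule's own suffix (.yaml) is swapped for the extension of each
    language listed in the rule.
    """
    rule = PurePosixPath(rule_path)
    return [
        str(rule.with_suffix(EXTENSIONS[lang]))
        for lang in languages
        if lang in EXTENSIONS
    ]


print(candidate_test_files("path/to/rule.yaml", ["python", "javascript"]))
```

For a rule that lists only python, the same call would yield just path/to/rule.py.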
Info: The .test.yaml file extension can also be used for test files. This is necessary when testing YAML language rules.
Example
Consider the following rule:
rules:
  - id: insecure-eval-use
    patterns:
      - pattern: eval(...)
      - pattern-not: eval("...")
    message: Calling 'eval' with user input
    languages: [python]
    severity: WARNING
Given the above is named rules/detect-eval.yaml, you can create rules/detect-eval.py:
from lib import get_user_input, safe_get_user_input

user_input = get_user_input()
# ruleid: insecure-eval-use
eval(user_input)

# ok: insecure-eval-use
eval('print("Hardcoded eval")')

totally_safe_eval = eval
# todoruleid: insecure-eval-use
totally_safe_eval(user_input)

# todook: insecure-eval-use
eval(safe_get_user_input())
Run the tests with the following:
python -m semgrep --quiet --test rules/
This will produce the following output:
1 yaml files tested
check id scoring:
--------------------------------------------------------------------------------
(TODO: 2) rules/detect-eval.yaml
✖ insecure-eval-use TP: 1 TN: 2 FP: 1 FN: 1
test: rules/detect-eval.py, expected lines: [5, 12], reported lines: [5, 15]
--------------------------------------------------------------------------------
final confusion matrix: TP: 1 TN: 2 FP: 1 FN: 1
--------------------------------------------------------------------------------
- True positives (TP) correspond to ruleid
- True negatives (TN) correspond to ok
- False positives (FP) correspond to todook
- False negatives (FN) correspond to todoruleid
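To see where the expected and reported line numbers in the test output come from, here is a rough sketch in plain Python. It is not how Semgrep works internally. `expected_lines` assumes, based on the output shown above, that both ruleid and todoruleid annotations count as expected findings. `reported_lines` uses Python's ast module as a stand-in for the rule's two patterns. Both function names are hypothetical, and the sample is the example test file with blank lines between its groups.

```python
import ast
import re

# The example test file, with blank lines between groups so that the line
# numbers line up with the ones in the test output.
SAMPLE = """\
from lib import get_user_input, safe_get_user_input

user_input = get_user_input()
# ruleid: insecure-eval-use
eval(user_input)

# ok: insecure-eval-use
eval('print("Hardcoded eval")')

totally_safe_eval = eval
# todoruleid: insecure-eval-use
totally_safe_eval(user_input)

# todook: insecure-eval-use
eval(safe_get_user_input())
"""

# Matches annotation comments and captures the annotation kind and rule id.
ANNOTATION = re.compile(r"#\s*(todoruleid|todook|ruleid|ok):\s*(\S+)")


def expected_lines(source: str, rule_id: str) -> list[int]:
    """Lines where a finding is expected: the line below each
    ruleid or todoruleid comment for the given rule."""
    lines = []
    for number, text in enumerate(source.splitlines(), start=1):
        match = ANNOTATION.search(text)
        if (
            match
            and match.group(2) == rule_id
            and match.group(1) in ("ruleid", "todoruleid")
        ):
            lines.append(number + 1)  # the annotation sits above the code
    return lines


def reported_lines(source: str) -> list[int]:
    """Approximation of the rule: direct eval(...) calls, excluding calls
    whose only argument is a string literal (the pattern-not clause)."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        is_eval_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
        )
        if not is_eval_call:
            continue
        only_string_literal = (
            len(node.args) == 1
            and not node.keywords
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        )
        if not only_string_literal:
            lines.append(node.lineno)
    return sorted(lines)


print(expected_lines(SAMPLE, "insecure-eval-use"))  # [5, 12]
print(reported_lines(SAMPLE))                       # [5, 15]
```

Line 12 is expected but not reported because the call goes through an alias of eval, which is what the todoruleid marks. Line 15 is reported but not expected because the rule cannot tell that safe_get_user_input() is safe, which is what the todook marks.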
To avoid failing on TODOs, you can specify --test-ignore-todo:
python -m semgrep --quiet --test --test-ignore-todo rules/
This will produce the following output:
1 yaml files tested
check id scoring:
--------------------------------------------------------------------------------
(TODO: 2) rules/detect-eval.yaml
✔ insecure-eval-use TP: 1 TN: 1 FP: 0 FN: 0
--------------------------------------------------------------------------------
final confusion matrix: TP: 1 TN: 1 FP: 0 FN: 0
--------------------------------------------------------------------------------
To store rules and test targets in different directories, you can specify --config:
tree tests
will produce the following output:
tests
├── rules
│   └── python
│       └── test.yaml
└── targets
    └── python
        └── test.py

4 directories, 2 files
python -m semgrep --quiet --test --config /tmp/tests/rules/ /tmp/tests/targets/
will produce the following output:
1 yaml files tested
check id scoring:
--------------------------------------------------------------------------------
(TODO: 0) /tmp/tests/rules/python/test.yaml
✔ eqeq-is-bad TP: 1 TN: 0 FP: 0 FN: 0
--------------------------------------------------------------------------------
final confusion matrix: TP: 1 TN: 0 FP: 0 FN: 0
--------------------------------------------------------------------------------
The subdirectory structure of these two directories must be the same for Semgrep to correctly find the associated files.
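The mirrored-directory requirement can be illustrated with a short sketch. It only shows the path arithmetic and does not reflect Semgrep's actual code. The helper name `mirrored_target` and its parameters are assumptions for illustration.

```python
from pathlib import PurePosixPath


def mirrored_target(rule_path: str, rules_root: str, targets_root: str,
                    extension: str) -> str:
    """Map a rule file to the test target at the same relative location.

    The rule's path relative to rules_root is re-rooted under targets_root,
    and its suffix is replaced with the target language's extension.
    """
    relative = PurePosixPath(rule_path).relative_to(rules_root)
    return str(PurePosixPath(targets_root) / relative.with_suffix(extension))


print(mirrored_target(
    "/tmp/tests/rules/python/test.yaml",
    "/tmp/tests/rules",
    "/tmp/tests/targets",
    ".py",
))
```

If the target lived at a different relative path, for example targets/py/test.py, this mapping would not find it, which is why the two trees must have the same shape.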
Validating rules
At r2c, we believe in checking the code we write, and that includes rules.
You can run semgrep --validate --config [file] to check the given config. This will run a combination of Semgrep rules and OCaml checks against your rules to search for things like duplicate patterns and missing fields. All rules submitted to the semgrep-rules repository are validated.
The Semgrep rules are pulled from p/semgrep-rule-lints.
This feature is still experimental and under active development. Feedback is welcome!