This document describes experimental features and how to try them. Have fun, file bugs, tweak the code, and most importantly share your thoughts!
Hands down the best way to enforce a code standard is to just automatically fix it. Semgrep's rule format supports a
fix: key that supports metavariable replacement, much like message fields. This allows for value capture and rewriting.
The autofix can be applied directly to the file using the
--autofix flag, or you can use both the
--dryrun flags to test the autofix.
Example autofix (see in Playground here):
rules:- id: use-sys-exit languages: - python message: | Use `sys.exit` over the python shell `exit` built-in. `exit` is a helper for the interactive shell and is not be available on all Python implementations. https://stackoverflow.com/a/6501134 pattern: exit($X) fix: sys.exit($X) severity: WARNING
A variant on the experimental
fix key is
fix-regex, which applies regular expression replacements (think
sed) to matches found by Semgrep.
fix-regex has two required fields:
regexspecifies the regular expression to replace within the match found by Semgrep
replacementspecifies what to replace the regular expression with.
fix-regex also takes an optional
count field, which specifies how many occurrences of
regex to replace with
replacement, from left-to-right and top-to-bottom. By default,
fix-regex will replace all occurrences of
regex does not match anything, no replacements are made.
The replacement behavior is identical to the
re.sub function in Python. See these Python docs for more information.
An example rule with
fix-regex is shown below.
regex uses a capture group to greedily capture everything up to the final parenthesis in the match found by Semgrep.
replacement replaces this with everything in the capture group (
\1), a comma,
timeout=30, and a closing parenthesis. Effectively, this adds
timeout=30 to the end of every match.
rules:- id: python.requests.best-practice.use-timeout.use-timeout patterns: - pattern-not: requests.$W(..., timeout=$N, ...) - pattern-not: requests.$W(..., **$KWARGS) - pattern-either: - pattern: requests.request(...) - pattern: requests.get(...) - pattern: requests.post(...) - pattern: requests.put(...) - pattern: requests.delete(...) - pattern: requests.head(...) - pattern: requests.patch(...) fix-regex: regex: '(.*)\)' replacement: '\1, timeout=30)' message: | 'requests' calls default to waiting until the connection is closed. This means a 'requests' call without a timeout will hang the program if a response is never received. Consider setting a timeout for all 'requests'. languages: [python] severity: WARNING
Equivalences enable defining equivalent code patterns (i.e. a commutative property:
$X + $Y <==> $Y + $X). Equivalence rules use the
equivalences top-level key and one
equivalence key for each equivalence.
Semgrep can perform intra-procedural flow-sensitive analyses. The data-flow engine still has several limitations, therefore expect both false positives and false negatives. False positives could be removed by using pattern-not.
A non-exhaustive list of current limitations:
- The analyses are not aware of aliasing.
- The analyses do not track individual elements in data structures, although there is limited support for record fields.
switchstatements are not properly handled yet.
try-catch-finallyis only partially supported, not all possible execution paths are considered.
Semgrep supports intra-file taint tracking. Taint tracking rules must specify
mode: taint. Additionally, the following operators are enabled:
Semgrep supports intra-procedural constant propagation. This tracks whether a variable must carry a constant value at each point in the program.