tl;dr: Symbolic propagation is a new experimental feature, a generalization of constant propagation, that enables Semgrep to perform matching modulo variable assignments. You can then write simple patterns like $OBJ.foo().bar()
and they will match equivalent code even if there are intermediate variables. For example, you can do this!
Let's say that you want to find all places where a function returns with code 42
. With Semgrep's pattern language, you would simply write the pattern return 42
, isn't that simple? But, would such a simple pattern match the following code?
C = 42
def test():
return C
Yes it does! Thanks to constant propagation our simple pattern can match code that is equivalent to return 42
. Wonderful.
Now, let me propose another exercise, let’s write a rule that matches a chain of method calls, first foo()
then bar()
, on some arbitrary object. Most of us would start with a pattern such as $OBJ.foo().bar()
. But, would such a simple rule also match the following code?
def test(obj):
x = obj.foo()
# ruleid: test
x.bar()
Unfortunately, this time, the rule does not match. If you have some experience writing Semgrep rules, you may have found yourself in this situation before. Intermediate assignments can make writing Semgrep rules somewhat tricky. It is not really difficult, but it is a bit cumbersome:
pattern-either:
- pattern: $OBJ.foo().bar()
- pattern: |
$VAR = $OBJ.foo();
...
$VAR.bar()
We need to enumerate the different shapes of the target code depending on the use of intermediate variables. If our base rule were more complex, then doing this can result in rather complex rules like r/python.django.security.injection.open-redirect.open-redirect, with large pattern-either
s that look like this:
- pattern: |
$DATA = request.$W.get(...)
...
$INTERM = $DATA
...
django.shortcuts.redirect(..., $INTERM, ...)
Wouldn't it be wonderful if our simple pattern $OBJ.foo().bar()
could match equivalent code, regardless of any intermediate assignments, just the same way as we do with constants? Why cannot Semgrep "just know" that x
is equals to obj.foo()
? I have good news for you, since version 0.78.0, Semgrep can actually do this! Simply set symbolic_propagation: true
using rule options:
et voilà: