best practices

Keep your rules simple with symbolic propagation

Symbolic propagation is an experimental feature that enables Semgrep to perform matching modulo variable assignments, so you can keep rules simple but powerful.
Iago Abal
Iago Abal
February 07, 2022

tl;dr: Symbolic propagation is a new experimental feature, a generalization of constant propagation, that enables Semgrep to perform matching modulo variable assignments. You can then write simple patterns like $ and they will match equivalent code even if there are intermediate variables. For example, you can do this!

Let's say that you want to find all places where a function returns with code 42. With Semgrep's pattern language, you would simply write the pattern return 42, isn't that simple? But, would such a simple pattern match the following code?

1C = 42
3def test():
4  return C

Yes it does! Thanks to constant propagation our simple pattern can match code that is equivalent to return 42. Wonderful.

Now, let me propose another exercise, let’s write a rule that matches a chain of method calls, first foo() then bar(), on some arbitrary object. Most of us would start with a pattern such as $ But, would such a simple rule also match the following code?

1def test(obj):
2    x =
3    # ruleid: test

Unfortunately, this time, the rule does not match. If you have some experience writing Semgrep rules, you may have found yourself in this situation before. Intermediate assignments can make writing Semgrep rules somewhat tricky. It is not really difficult, but it is a bit cumbersome:

2  - pattern: $
3  - pattern: |
4      $VAR = $;
5      ...
6      $

We need to enumerate the different shapes of the target code depending on the use of intermediate variables. If our base rule were more complex, then doing this can result in rather complex rules like r/, with large pattern-eithers that look like this:

1- pattern: |
2    $DATA = request.$W.get(...)
3    ...
4    $INTERM = $DATA
5    ...
6    django.shortcuts.redirect(..., $INTERM, ...)

Wouldn't it be wonderful if our simple pattern $ could match equivalent code, regardless of any intermediate assignments, just the same way as we do with constants? Why cannot Semgrep "just know" that x is equals to I have good news for you, since version 0.78.0, Semgrep can actually do this! Simply set symbolic_propagation: true using rule options: et voilà:

Isn't that beautiful?

This feature, that we coined symbolic propagation, is a generalization of constant propagation. It tracks simple assignments and enables Semgrep’s matching engine to automatically unfold a definition when needed.

Please give it a try and let us know what you think. We hope that you will find this new feature useful for writing simple yet powerful Semgrep rules. For example, what would it take to write a Semgrep rule that does this without symbolic propagation? See also how succinctly you can now write rule r/ by combining metavariable-pattern and symbolic propagation:!

For now, this is still an experimental feature, but we are very excited about the power it provides, and we want to promote it to be the default as soon as possible.



Semgrep Logo

Semgrep is a fast, open-source, code scanning tool for finding bugs, detecting dependency vulnerabilities, and enforcing code standards.

Code scanning at ludicrous speed

Find bugs and enforce code standards