Semgrep Spring 2022 meetup recap

A look into the information shared at Semgrep's Spring 2022 meetup

Emily Fortuna
April 6th, 2022

We were thrilled to meet so many folks, from as far away as Germany and Sweden to as near as *ahem* San Francisco, at our Spring 2022 Meetup! Check out the recording below:

If you prefer to read a summary, you’re in the right place.

If you’re new to Semgrep, welcome!

Semgrep is a lightweight tool that helps security engineers and developers catch security (and other!) bugs before their code ships. Scans are lightning-fast. It’s like grep, but it does much more than regular expression matching: Semgrep understands the structure and control flow of your code, so it can catch far more subtle bugs. And it supports over 25 languages/frameworks and counting!

Security trends with Clint Gibler

Clint Gibler gave a survey of trends in how security and engineering teams are organized and how companies approach security. As companies have moved to shorter release cycles, writing code that’s secure by default has become ever more important. An overly strict security team can quickly become the development team’s enemy. Companies have recently found more success by building teams of code-savvy security engineers who build systems for writing secure code into the development process itself.

Shifts in tech and security

Rather than chasing one-off vulnerabilities, skilled security engineers target entire classes of bugs that can be caught automatically with the right tools. This is where the final pillar of modern security that Clint described comes in: a good developer experience. The secure path should offer an even better developer experience than the insecure one. As Avi Douglen points out,

“Security at the expense of usability comes at the expense of security.”

Clint regularly shares his perspective on scaling security, secure defaults, and the latest in security research in a weekly security newsletter: tldrsec.com.

Semgrep feature deep dives

Autofix and Developer Feedback

One way into a developer’s heart is to write their code for them! While we’re not quite there yet, Semgrep’s autofix feature lets you write rules that catch potential security vulnerabilities and then fix them with the click of a button.
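Under the hood, an autofix is a rule whose match comes with a replacement. As a rough illustration only, the Python sketch below hand-codes one such rule with the standard ast module: it rewrites calls of the form yaml.load(data) to yaml.safe_load(data). The rule itself, the SafeLoadFixer class, and the autofix helper are hypothetical examples, not part of Semgrep; in Semgrep you would declare the fix in the rule itself rather than writing code.

```python
import ast


class SafeLoadFixer(ast.NodeTransformer):
    """Toy autofix: rewrite yaml.load(x) to yaml.safe_load(x).

    This rule is a hypothetical example chosen for illustration.
    """

    def __init__(self):
        self.fixed_lines = []

    def visit_Call(self, node):
        self.generic_visit(node)  # fix any nested calls first
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "yaml"
            # Only fix the one-argument form: yaml.safe_load takes no Loader.
            and len(node.args) == 1
            and not node.keywords
        ):
            func.attr = "safe_load"
            self.fixed_lines.append(node.lineno)
        return node


def autofix(source):
    """Return (fixed_source, line numbers that were fixed)."""
    fixer = SafeLoadFixer()
    tree = fixer.visit(ast.parse(source))
    # ast.unparse (Python 3.9+) regenerates code but drops comments and formatting.
    return ast.unparse(tree), fixer.fixed_lines


fixed, lines = autofix("import yaml\nconfig = yaml.load(data)\n")
print(lines)  # [2]
print(fixed)
```

Because ast.unparse regenerates the whole file from the syntax tree, this sketch loses comments and original formatting; a production autofix would edit only the matched span of text.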

Semgrep autofix in PRs

Raghav Jain demoed this whole workflow, as well as the new Developer Feedback feature, which lets developers indicate whether a rule is helpful or not. In a first for Static Application Security Testing (SAST) tools, this feature lets developers communicate their feedback on findings directly to security teams. Security teams can then iteratively fine-tune their configurations to ensure a positive developer experience, and rule authors can improve rule quality based on real-world feedback. This feedback loop also creates powerful network effects for the Semgrep Community as a whole: better rules help everyone using Semgrep!
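The feedback loop can be pictured with a small sketch. Assuming each vote is stored as a (rule ID, helpful?) record, which is a hypothetical shape and not Semgrep App’s actual data model, a security team could flag rules whose findings developers mostly mark as unhelpful. The rule IDs and thresholds below (at least 5 votes, more than half unhelpful) are arbitrary placeholders.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """One developer's vote on one finding (hypothetical record shape)."""
    rule_id: str
    helpful: bool


def rules_needing_review(votes, min_votes=5, max_unhelpful_ratio=0.5):
    """Return rule IDs whose findings developers mostly marked unhelpful.

    Rules with fewer than `min_votes` votes are skipped so that a single
    unhappy developer does not flag a rule on their own.
    """
    total = Counter(v.rule_id for v in votes)
    unhelpful = Counter(v.rule_id for v in votes if not v.helpful)
    return sorted(
        rule_id
        for rule_id, n in total.items()
        if n >= min_votes and unhelpful[rule_id] / n > max_unhelpful_ratio
    )


# Hypothetical votes for two hypothetical rules.
votes = [Feedback("noisy-rule", False)] * 5 + [Feedback("noisy-rule", True)]
votes += [Feedback("useful-rule", True)] * 6
print(rules_needing_review(votes))  # ['noisy-rule']
```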

Developer feedback on Semgrep rules

Dataflow analysis deep dive: symbolic propagation and taint mode

Iago Abal explained two powerful ways that Semgrep uses data-flow analysis to help write simple rules that catch complex security vulnerabilities: symbolic propagation and taint mode.

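Symbolic propagation works by feeding Semgrep's matching engine semantic information about what variables hold. To see why that matters, start with its simpler cousin, constant propagation. Take the following example: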

The rule return 42 will successfully find a match in the code snippet:

1    x = 42
2    def test():
3        return x
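To make constant propagation concrete, here is a rough Python sketch of the idea (not how Semgrep implements it). Using the standard ast module, it records module-level variables that are assigned a literal exactly once, then substitutes those values when checking each return statement against the pattern return 42. The function name find_return_42 is hypothetical and exists only for this illustration.

```python
import ast
from collections import Counter

SNIPPET = "x = 42\ndef test():\n    return x\n"


def find_return_42(source):
    """Toy matcher for the pattern `return 42` with constant propagation."""
    tree = ast.parse(source)
    # Only trust names assigned exactly once anywhere in the module;
    # a real analysis tracks reassignments much more carefully.
    stores = Counter(
        n.id for n in ast.walk(tree)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
    )
    constants = {}
    for stmt in tree.body:
        if (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
            and isinstance(stmt.value, ast.Constant)
            and stores[stmt.targets[0].id] == 1
        ):
            constants[stmt.targets[0].id] = stmt.value.value

    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Return) or node.value is None:
            continue
        value = node.value
        if isinstance(value, ast.Name) and value.id in constants:
            value = ast.Constant(constants[value.id])  # substitute the known value
        if isinstance(value, ast.Constant) and value.value == 42:
            hits.append(node.lineno)
    return hits


print(find_return_42(SNIPPET))  # [3]
```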

Thanks to constant propagation, Semgrep’s matching engine can determine that x in line 3 has the value 42. But, perhaps unexpectedly for many, the rule $X.foo().bar() will NOT find a match in the following code snippet using constant propagation alone:

def test(obj):
    x = obj.foo()
    x.bar()

This is surprising, because the intent of the rule is clearly to catch the second line of the function body: x holds the result of obj.foo(), so x.bar() is really obj.foo().bar(). Semgrep’s symbolic propagation, a new experimental feature released in version 0.78.0 (mid-January), generalizes constant propagation and lets you catch exactly these (and potentially much more complex) scenarios. Rewriting a rule to use symbolic propagation can sometimes provide as much as a 10x reduction in the number of lines in your rule patterns (*cough* looking at you, r/python.django.security.injection.open-redirect.open-redirect)! For more info on how to use symbolic propagation, check out our blog post on the topic.
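A toy Python sketch can show the difference symbolic propagation makes. In the hypothetical find_foo_bar below, each local variable assigned exactly once is mapped to the whole expression it holds, not just to a literal value, so x in x.bar() can be replaced by obj.foo() before checking the $X.foo().bar() shape. This is an illustration built on Python’s ast module, not Semgrep’s engine: it gives up on any variable assigned more than once and does not reason about branches.

```python
import ast
from collections import Counter

SNIPPET = "def test(obj):\n    x = obj.foo()\n    x.bar()\n"


def is_foo_call(node):
    """True if node has the shape $X.foo() for any receiver $X."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "foo"
    )


def find_foo_bar(source, symbolic=True):
    """Toy matcher for the pattern $X.foo().bar()."""
    hits = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        # Map each local assigned exactly once to the expression it holds.
        stores = Counter(
            n.id for n in ast.walk(func)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
        )
        symbols = {
            stmt.targets[0].id: stmt.value
            for stmt in ast.walk(func)
            if isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
            and stores[stmt.targets[0].id] == 1
        }
        for node in ast.walk(func):
            if not (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "bar"
            ):
                continue
            receiver = node.func.value
            if symbolic and isinstance(receiver, ast.Name):
                # Symbolic propagation: replace x by the expression x holds.
                receiver = symbols.get(receiver.id, receiver)
            if is_foo_call(receiver):
                hits.append(node.lineno)
    return hits


print(find_foo_bar(SNIPPET, symbolic=False))  # []
print(find_foo_bar(SNIPPET, symbolic=True))   # [3]
```

With symbolic=False the matcher only sees the literal receiver x, which is not a .foo() call, so nothing matches; with substitution enabled it finds line 3.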

Another way Semgrep uses data-flow analysis is taint mode. Taint mode rules track the flow of data from a source to a sink through all potential execution paths. Suppose we have the potentially unsafe input request.cookies.get('user') and we want to ensure that it never reaches pickle.loads(obj). Consider the following function (written non-Pythonically to make the data flow between variables explicit):

1    def foo():
2        user_input = request.cookies.get('user')
3        decoded = b64decode(user_input)
4        obj = "b'" + decoded + "'"
5        return "Hey there! {}".format(pickle.loads(obj))

It’s easy to write a rule using Semgrep’s metavariables and ellipses that catches small variations of this very specific example. But suppose line 2 is replaced with the following lines:

if (True):
    user_input = request.cookies.get('user')
else:
    user_input = 'hello'

To handle this case as well, we have to add pattern-either to our rule, and suddenly the rule has nearly doubled in size. The if/else branch is just one example; in a complex codebase there may be many different control-flow paths by which the unsafe input can reach pickle.loads. Before long, our Semgrep rule starts looking a bit like the Epic of Gilgamesh.

That’s where taint mode comes to the rescue. You simply define a source (request.cookies.get(...)) and a sink (pickle.loads(...)), and Semgrep does the rest!
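The core idea of taint tracking fits in a short sketch. The Python below is a toy, not Semgrep’s implementation: it walks a function’s statements in order, marks a variable tainted when it is assigned from a source call or from another tainted variable, analyzes both arms of an if/else and keeps the union of their results, and reports any sink call that receives tainted data. The helper names (dotted, is_tainted, analyze, taint_findings) are hypothetical, and the analyzed snippet calls placeholder functions such as cond() and b64decode; that is fine because the code is only parsed, never executed.

```python
import ast

SOURCES = {"request.cookies.get"}  # calls that produce untrusted data
SINKS = {"pickle.loads"}           # calls that must not receive it


def dotted(node):
    """Return 'a.b.c' for an attribute chain rooted at a name, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def is_tainted(expr, tainted):
    """An expression is tainted if it reads a tainted variable or calls a source."""
    for node in ast.walk(expr):
        if isinstance(node, ast.Name) and node.id in tainted:
            return True
        if isinstance(node, ast.Call) and dotted(node.func) in SOURCES:
            return True
    return False


def analyze(stmts, tainted, findings):
    """Scan statements in order; return the variables that may be tainted after them."""
    tainted = set(tainted)
    for stmt in stmts:
        if isinstance(stmt, ast.If):
            # A variable is tainted afterwards if it may be tainted at the
            # end of either branch (a "may" analysis).
            tainted = analyze(stmt.body, tainted, findings) | analyze(
                stmt.orelse, tainted, findings
            )
            continue
        for node in ast.walk(stmt):
            if (
                isinstance(node, ast.Call)
                and dotted(node.func) in SINKS
                and any(is_tainted(arg, tainted) for arg in node.args)
            ):
                findings.append(node.lineno)
        if isinstance(stmt, ast.Assign):
            names = {t.id for t in stmt.targets if isinstance(t, ast.Name)}
            if is_tainted(stmt.value, tainted):
                tainted |= names
            else:
                tainted -= names  # overwritten with clean data
    return tainted


def taint_findings(source):
    """Return line numbers of sink calls reached by tainted data."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            analyze(node.body, set(), findings)
    return findings


CODE = '''\
def foo():
    if cond():
        user_input = request.cookies.get("user")
    else:
        user_input = "hello"
    decoded = b64decode(user_input)
    obj = "b'" + decoded + "'"
    return "Hey there! {}".format(pickle.loads(obj))
'''

print(taint_findings(CODE))  # [8]
```

Note that the rule author only lists sources and sinks; the branch handling lives in the analysis, which is why the rule does not grow as the control flow gets more complicated.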

Taint mode in Semgrep App

Milan Williams walked through the process of using the new editor to fork a rule, edit that rule, test it, and then add it to your rule board.

Taint mode in Semgrep Editor

Milan also demonstrated running taint mode rules in the editor.

Semgrep community presentations

Two community members shared some neat tools they’ve built for and with Semgrep.

Lewis Ardern demoed a handy VS Code extension he wrote to reduce typing while writing Semgrep rules. The extension provides Semgrep rule snippets that pre-populate the rule template, save keystrokes, prevent typos, and fill in pattern logic syntax that might not be top-of-mind for the rule writer. In addition, the extension creates syntactically correct templates in the target programming language, where you can write test cases for your rules.

Natan Yellin demoed WhyProfiler, a Semgrep-powered CPU profiler for Python and Jupyter notebooks that targets slow notebook code and automatically fixes it to be more performant! WhyProfiler combines the static analysis power of Semgrep with a dynamic profiler to prioritize the spots in your code that contribute most to performance issues, along with suggested fixes. Creative developers take note: combined with tools such as a dynamic profiler, static analysis with Semgrep holds a lot of untapped potential for cool new development tools!

Semgrep roadmap

Isaac Evans highlighted a few additional new Semgrep features and talked about r2c’s plans for the future. We here at r2c want Semgrep to support every major programming language! Scala and Kotlin are headed to general availability (GA), and support for Dockerfiles is now in Beta. Swift, SQL, and C/C++ are also in the queue for promotion from experimental to Beta. Isaac then discussed project-depends-on, an early-stage initiative to use Semgrep to detect vulnerable third-party dependencies.

Performance & developer-first

Another key focus is the developer experience of using Semgrep for AppSec. Semgrep 0.82 scans 2-3x faster than 0.71, and startup time for a single file has dropped from over 500 ms to 100 ms.

Semgrep scan time on Python projects

Future improvements include an offline local cache of rules and support for popular IDEs.

r2c strives to make it simple to write your first Semgrep rule, while still giving you the power to catch subtle security bugs with minimal false positives. Isaac highlighted the release of recursive join mode, metavariable-pattern (scanning code of one language embedded in another, such as JS in HTML), and metachecker (auto-suggesting more precise and performant rules). Upcoming features include focus-metavariable (improving debugging around why a finding appeared), composable rules with keys like patterns-from, and deeper rule registry integration from the command line.

Semgrep App roadmap

Recent improvements include the new semgrep --config=auto option, which recommends rules based on the code and frameworks you’re using. We’ve also built a workflow that lets AppSec engineers introduce warnings gradually: from “audit,” to “comment,” and then “block.”
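One way to picture this gradual rollout is as a per-rule mode that decides what happens to a finding in CI. The sketch below is hypothetical: the Mode names mirror the three stages above, but the ci_outcome function and the assumption that “block” findings are also posted as PR comments are illustrations, not Semgrep App’s actual behavior or API.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical per-rule rollout stages."""
    AUDIT = "audit"      # findings are recorded for the security team only
    COMMENT = "comment"  # findings are also posted as PR comments
    BLOCK = "block"      # assumed: findings are posted AND fail the CI check


def ci_outcome(findings):
    """Given (rule_id, mode) pairs, return (rules to comment on, whether CI fails)."""
    comment_on = [rule for rule, mode in findings if mode in (Mode.COMMENT, Mode.BLOCK)]
    fail = any(mode is Mode.BLOCK for _, mode in findings)
    return comment_on, fail


# Hypothetical findings: a brand-new rule stays silent, a trusted one comments.
findings = [("new-rule", Mode.AUDIT), ("trusted-rule", Mode.COMMENT)]
print(ci_outcome(findings))  # (['trusted-rule'], False)
```

Promoting a rule is then just a matter of moving it to the next mode once its findings have proven accurate, without touching any other rule.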

Rule board columns

We plan to make the first-run experience even better! We’re also working on surfacing more information for debugging rules, including context traces for taint mode.

The event was hosted by yours truly, Emily Fortuna. I just joined r2c as a Developer Advocate, and I very much look forward to building out r2c’s educational content, clarifying the ins and outs of rule writing, and making your code more secure. Prior to r2c I worked at Google for many years on several teams, including as a software engineer building a compiler and the programming language Dart. As a Developer Advocate at Google I worked on the Flutter team. I’m comparatively new to the security world but I’ve been pleasantly surprised at the amount of overlap between static analysis of code for security and specialized type inference <slides nerd glasses up bridge of nose>. Developer Advocacy is a two-way street, so if you’re running into challenges working with Semgrep and don’t know where to turn, I want to hear about them. Looking forward to seeing you at our next community meetup!

About

Semgrep lets security teams partner with developers and shift left organically, without introducing friction. Semgrep gives security teams confidence that they are only surfacing true, actionable issues to developers, and makes it easy for developers to fix these issues in their existing environments.