Semgrep is a lightweight static analysis tool for many languages. It can find bug variants with patterns that look like source code.
As you think about contributing to the Semgrep CLI, consider these design principles that have guided Semgrep's development so far:
“If a developer has to convince their manager to spend a few million dollars on advanced security tools each time they change jobs, the future is bleak.” — see our introductory blog post for more. It’s important to us (and the community) that r2c is able to develop a sustainable business around Semgrep to support its development, but we strongly believe the tooling itself must always be free.
Semgrep is licensed under the LGPL and powered not just by r2c but also by a community of brilliant external contributors. We welcome feedback and contributions and strive to be a welcoming community for new developers.
Semgrep aims for high scanning speed (source lines of code per second) and low startup cost. We’ll never be as fast as ripgrep, but we want to get as close as we can.
Code never leaves your machine
Semgrep by default runs entirely locally (unless you set it up yourself in a server/client mode). Code never leaves your machine to be analyzed.
Support every programming language
“If grep supports it, we will too!” This even includes languages that aren’t usually thought of as programming languages, like Bash or Docker.
Semgrep is small (<100MB), has minimal runtime dependencies, and should be easily installable via your programming language or operating system package manager.
You shouldn’t need a PhD in program analysis, or even to understand what an AST is, in order to be effective with Semgrep. A novice programmer should be able to write their first Semgrep rule in 60 seconds.
Rules should look like code and be easy to read and reason about — hopefully easier than if they were written in grep or a native linter.
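To illustrate why a code-shaped pattern can be easier to reason about than a text pattern, here is a minimal Python sketch. It is not how Semgrep is implemented: it uses Python's standard `ast` module as a stand-in for syntax-aware matching, and the helper names `find_calls_regex` and `find_calls_syntax` are hypothetical.

```python
import ast
import re

# Sample input: line 2 contains a real call, line 3 only mentions eval( in a string.
SOURCE = '''
result = eval(user_input)
note = "never call eval(x) here"
'''


def find_calls_regex(source: str, name: str) -> list[int]:
    """Grep-style search: line numbers where the text `name(` appears anywhere."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\(")
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line)
    ]


def find_calls_syntax(source: str, name: str) -> list[int]:
    """Syntax-aware search: line numbers of actual calls to `name`."""
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    )


print(find_calls_regex(SOURCE, "eval"))   # text match also hits the string literal
print(find_calls_syntax(SOURCE, "eval"))  # syntax match reports only the real call
```

The text search reports both lines, while the syntax-aware search reports only the real call, which is the kind of precision a pattern that "looks like code" is meant to give without the author having to think about escaping or string contents.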
Self-contained rule files
You shouldn’t need an additional plugin, dependency, or internet access to run a YAML rule. It should just work.
Deterministic (implies reproducible, idempotent)
Given the same input, Semgrep gives the same output.
Semgrep can run without internet access so developers can write code from airplanes or beaches.
Rules are safe to run no matter where they came from
Rules shouldn’t have the capability to run arbitrary code on your system, only to act as a function that produces a deterministic output message. We may let the user explicitly violate this trust boundary through flags like --dangerously-run-rules.
To stay fast and limit complexity, we draw a line at crossing file boundaries during analysis. We lose the ability to detect certain complex interprocedural issues, but that’s an explicit tradeoff we make.
Our goal is to catch what a senior engineer would catch in code review. Semgrep isn’t designed to find an obscure issue that spans 300 calls from start to finish and has evaded the team for 20 years. Instead, it’s designed for enforcing best practices and automating the code review tasks that an excellent senior engineer would be capable of.
As a corollary: if you design your codebase so that code in a file is safe today, it's still safe after a colleague makes a change twenty function calls away in another file.
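The consequence of staying within file boundaries can be sketched in a few lines of Python. This is an illustrative toy, not Semgrep's API: `scan_file` and `scan_project` are hypothetical names, and the check (flagging bare `except:` handlers) was chosen only because it is easy to express with the standard `ast` module.

```python
import ast

Finding = tuple[int, str]


def scan_file(source: str) -> list[Finding]:
    """Hypothetical file-local check: the result depends only on this file's text."""
    findings = [
        (node.lineno, "bare except")
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]
    return sorted(findings)


def scan_project(files: dict[str, str]) -> dict[str, list[Finding]]:
    """Scan each file independently; no information flows between files."""
    return {path: scan_file(src) for path, src in sorted(files.items())}


HANDLER = "try:\n    run()\nexcept:\n    pass\n"

# Same a.py in both snapshots; only b.py changes.
before = {"a.py": HANDLER, "b.py": "y = 1\n"}
after = {"a.py": HANDLER, "b.py": "try:\n    go()\nexcept:\n    pass\n"}

# The findings for a.py are identical in both snapshots.
print(scan_project(before)["a.py"] == scan_project(after)["a.py"])
```

Because each file's findings are a pure function of that file's contents, a change in `b.py` cannot alter what is reported for `a.py`, and files can be scanned in any order or in parallel.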
Designed to run while code is being written
Semgrep is optimized for running in the IDE, in git commit hooks, or in CI, not at the tail end of a release process.
A platform for program analysis
We will expose stable internals so that researchers and engineers can build novel program analysis work on top of APIs like Semgrep’s generic AST.