Building a scalable, production-ready security program that detects issues in an application’s code can be extremely challenging. As Razorpay’s excellent blog post highlights, the challenges include supporting every programming language an organization uses, running scans quickly enough that developers are not blocked, and producing accurate results.
Semgrep (the Semgrep OSS Engine) started as an open-source project with the goal of building a lightweight static analysis tool that developers would love. Its guiding design principles include, among others, supporting many languages and being beginner-friendly. Development accelerated in earnest in 2020, and Semgrep was quickly and widely adopted in the security community. Examples of enterprise adoption include GitLab SAST (which uses Semgrep as an analyzer for many languages), HashiCorp (which uses Semgrep rules for the Terraform AWS provider), and Datadog (which uses Semgrep to detect malicious PyPI packages).
Over the years, the engineers at r2c have put significant effort into improving the speed of the Semgrep OSS Engine. These efforts range from using native binaries instead of bytecode (which made Semgrep roughly 2x faster) to major architectural changes, such as implementing all pattern-composition logic in OCaml and passing all rules to the engine at once, which enabled optimizations in how rules are evaluated.
The Semgrep OSS Engine supports 25+ languages, covering almost all languages used in production, and makes rules simple to write, so developers can write their own rules to catch vulnerabilities. Together with its other features, this lays the foundation for securing application code at scale. Semgrep App, r2c’s orchestration layer built on the Semgrep OSS Engine, makes deploying Semgrep at scale possible: it provides easy rule management, triage of findings, and alerts on relevant findings, and it supports enterprise features such as SSO and RBAC.
The rest of this post describes how some organizations use Semgrep at scale in production.
(As a security-conscious company, we have intentionally omitted the names of these organizations. For more information, please contact us.)
Code-scanning metrics in organizations
A production-ready code scanning solution should be able to scan, and manage security findings from, hundreds or thousands of code repositories. Features like bulk onboarding for GitHub and simple, highly configurable CI steps make setting up Semgrep fast and straightforward.
To build a scalable security program, the entire codebase should be scanned. The number of scans per week shows whether newly pushed code is being scanned consistently, so that vulnerabilities are caught as they are introduced. Fast scans matter too: the sooner results arrive, the less developers have to wait to see vulnerabilities in their code, and the faster they can fix them.
Figure 1: Semgrep’s scan time (in seconds) for different repositories
In most organizations, developer teams grow faster than their security counterparts, making it hard for security teams to track and fix every security issue. Semgrep eases this imbalance by being developer-first, making it faster and easier for developers to find and fix vulnerabilities themselves. Features like triage via PR comments bring Semgrep into developers’ existing workflow, so they never have to leave their preferred development environment.
Figure 2: Triage by PR comment
Examples of metrics at scale
One of our customers, a Global 2000 financial services company, scans 1,300+ code repositories using Semgrep. More than 400 developers contribute to the codebase, and scans run more than 3,000 times per week. Moreover, the average scan time is under a minute, a significant advantage for companies looking to shift left: developers see vulnerabilities in their code and can address them immediately, saving a great deal of time.
Another organization, an online insurance marketplace, has tightly integrated Semgrep into 250+ code repositories, where 60+ developers get results in their pull request (PR) comments in under a minute.
These are just two examples of organizations using Semgrep at scale. Semgrep also maintains uptime of more than 99.9%, one of the reasons organizations can rely on it in production. Talk to us if you’d like to learn more about deploying Semgrep at scale in your organization.
Ease of writing custom rules
Tailoring security rules to a company’s environment and context is a much more efficient and cost-effective way of protecting software than relying on a black-box solution that cannot be customized. Semgrep supports this through custom rules.
Custom rules, whether forked from an existing rule or written from scratch, adapt detection to a company’s own codebase. Whether the goal is finding a function used in a particular way or detecting a vulnerability pattern unique to internal code, custom rules can express a wide range of organization-specific checks.
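To make the idea concrete, here is a rough sketch of the kind of organization-specific check a custom rule might encode. Real Semgrep rules are written in YAML with code-like patterns; to keep this example in plain Python, it instead walks the syntax tree with the standard-library `ast` module, so treat it as an analogy for what a custom rule expresses, not as how Semgrep works internally. The internal helper `run_sql` is hypothetical: the check flags calls where its first argument is an f-string or a `+` concatenation, a common sign of SQL built from untrusted input.

```python
import ast
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical internal helper that should only ever receive
# parameterized queries (an assumption made for this sketch).
TARGET_FUNC = "run_sql"


@dataclass
class Finding:
    line: int
    message: str


def _call_name(call: ast.Call) -> Optional[str]:
    """Return the called name for `run_sql(...)` or `db.run_sql(...)`."""
    if isinstance(call.func, ast.Name):
        return call.func.id
    if isinstance(call.func, ast.Attribute):
        return call.func.attr
    return None


def _is_built_string(node: ast.AST) -> bool:
    """True for f-strings and `+` concatenations, which hint at string-built SQL."""
    if isinstance(node, ast.JoinedStr):
        return True
    return isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add)


def check_source(source: str) -> List[Finding]:
    """Flag calls to TARGET_FUNC whose first positional argument is a built string."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and _call_name(node) == TARGET_FUNC
            and node.args
            and _is_built_string(node.args[0])
        ):
            findings.append(
                Finding(node.lineno, f"{TARGET_FUNC} called with a dynamically built query")
            )
    # ast.walk is breadth-first, so sort to report findings in source order.
    return sorted(findings, key=lambda f: f.line)


if __name__ == "__main__":
    sample = (
        'db.run_sql("SELECT * FROM users WHERE id = %s", (user_id,))\n'
        'db.run_sql(f"SELECT * FROM users WHERE id = {user_id}")\n'
        'run_sql("SELECT * FROM users WHERE name = " + name)\n'
    )
    for finding in check_source(sample):
        print(finding.line, finding.message)
```

Run on the sample, the parameterized call on line 1 passes while lines 2 and 3 are flagged. A Semgrep rule would express the same intent more compactly as a pattern, with Semgrep handling the parsing for each supported language; the point is that a custom rule captures knowledge, like "never pass string-built queries to `run_sql`", that a generic rule set has no way of knowing.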
Our customers have created over 400 custom rules (on top of the 2,400+ open-source rules publicly available in the Semgrep Registry), enabling them to find vulnerabilities specific to their own environment and requirements. In this way, Semgrep also helps security teams build a scalable security program tailored to their organization.
Semgrep is now the de facto static analysis solution in many organizations, where security teams use it to catch issues specific to their environment. With hundreds of thousands of lines of code scanned with Semgrep daily, organizations can confidently deploy it as a scalable static analysis solution in production.
Join the Semgrep Community Slack to say “hi” or ask questions — there’s a friendly and active community ready to help!
Semgrep is a fast, open-source code scanning tool for finding bugs, detecting dependency vulnerabilities, and enforcing code standards.