Frequently asked questions

General

How are Semgrep and its rules licensed?

Semgrep OSS Engine

The Semgrep Engine is open source, licensed under LGPL 2.1. You can use it at work, on private and proprietary code, no problem!

Semgrep offers three paid products:

  • Semgrep Code, a static application security testing (SAST) tool that can perform taint, cross-file, and cross-function analysis.
  • Semgrep Supply Chain, which performs dependency scanning.
  • Semgrep Secrets, which can detect and validate leaked secrets in code.

Semgrep Registry

The Semgrep Registry contains rules from different contributors.

Semgrep Community rules written by the Semgrep team are licensed under the Commons Clause. The source for these Registry rules is available at semgrep/semgrep-rules.

Rules licensed under the Commons Clause cannot be resold without permission from Semgrep, Inc. (“Semgrep”). Since Semgrep offers a paid, hosted application, this restriction is important so that other companies, like major cloud providers, cannot resell Semgrep’s rules as a competing service.

In addition to Semgrep Community rules, Semgrep Code includes Pro rules which are proprietary and only available to paying customers.

Is it OK to run Semgrep or Semgrep, Inc. rules on my work projects?

Yes! Semgrep is safe to run on your private code. The Semgrep Registry license’s commercial restrictions only come into effect if you are selling a product using rules provided in the Semgrep Registry. If that’s the case, contact partners@semgrep.com for a license.

I’m a security professional. Do I have to pay for Semgrep?

If you are a security consultant and you want to use Semgrep OSS Engine with the Semgrep Community Rules as part of your assessments, that’s great and you don’t have to pay. Feel free to refer your clients to our Semgrep product suite.

If your service delivers code scanning, meaning a service that includes static application security testing (SAST), software composition analysis (SCA), or secrets scanning, and you want to charge for scanning that includes rules in the semgrep-rules repository, you must purchase a license.

If you want to use Semgrep Code, including its proprietary cross-file (interfile) analysis, Semgrep Supply Chain (SCA), or Semgrep Secrets rules as part of your consulting services, you need a license. Please contact us at sales@semgrep.com.

Can I ship my own code analysis software that uses Semgrep?

Yes, you can use the Semgrep OSS Engine in your own code analysis software, subject to the terms of the LGPL 2.1 license (among other things, you must open source any modification you make to it). If you are writing your own, original rules for your scanner, there are no further restrictions. But your rules cannot be derived from Semgrep Community Rules or Semgrep Pro Rules (see below).

The Semgrep Community Rules are licensed under the Commons Clause. You can use the Semgrep Community Rules as long as you are shipping a free and open source software (FOSS) product. You have to open source any modifications you make to the rules.

You cannot ship the Semgrep Community or Pro rules in a commercial product without a license from Semgrep, Inc. For more information, please contact partners@semgrep.com.

Contacting Semgrep support

All users can contact Semgrep support. Whether you are on the free tier or a paid tier, you can reach our support through the Semgrep Community Slack. Paying Semgrep Team tier customers receive 8x5 email and Slack support with committed SLAs. See Support for more details.

Embedding the Playground in my website or blog post

Embed a special version of Semgrep Playground with an iframe. The source is https://semgrep.dev/embed/editor?snippet=<snippet-id> where the snippet-id is either the short identifier generated when you share a Playground link (this usually looks like DzKv) or the named identifier from a saved rule (this usually looks like username:rule-name).

<iframe title="Semgrep example no prints" src="https://semgrep.dev/embed/editor?snippet=KPzL" width="100%" height="432" frameborder="0"></iframe>
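If you embed many snippets, you may prefer to generate the tag rather than write it by hand. The following is a minimal Python sketch, not an official Semgrep helper: the function name `playground_iframe` and its parameters are hypothetical, and leaving the `:` of named identifiers unencoded is an assumption.

```python
from html import escape
from urllib.parse import quote


def playground_iframe(snippet_id: str, title: str, height: int = 432) -> str:
    """Return an <iframe> tag that embeds a Semgrep Playground snippet."""
    # Percent-encode the identifier; ":" is kept (an assumption) so that named
    # identifiers such as "username:rule-name" stay readable in the URL.
    src = "https://semgrep.dev/embed/editor?snippet=" + quote(snippet_id, safe=":")
    return (
        f'<iframe title="{escape(title, quote=True)}" '
        f'src="{escape(src, quote=True)}" '
        f'width="100%" height="{height}" frameborder="0"></iframe>'
    )


print(playground_iframe("KPzL", "Semgrep example no prints"))
```

For the short identifier KPzL, this prints the same tag as the example above.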

How does Semgrep go "beyond regex"?

Semgrep is semantic grep for code: it understands the structure of code and builds a syntax tree to search for matches. Where grep "2" only matches the exact string 2, Semgrep matches other equivalent forms, such as x = 1; y = x + 1 when searching for 2. Semgrep's pattern syntax provides specific mechanisms to fine-tune matches, such as the ellipsis operator and metavariables.
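To make the difference between text search and tree search concrete, here is a small Python sketch. It uses Python's standard `ast` module as a stand-in for a syntax tree and is not how the Semgrep engine is implemented; the sample source and the helper `find_calls` are made up for illustration.

```python
import ast
import re

SOURCE = '''
# eval(user_input) is dangerous
result = eval(
    user_input
)
note = "eval(user_input)"
'''

# Text search: matches the comment and the string literal, but misses the
# real call because it is spread over several lines.
text_hits = re.findall(r"eval\(user_input\)", SOURCE)


def find_calls(source: str, func_name: str) -> list[int]:
    """Return the line numbers of actual calls to `func_name`."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == func_name
    ]


print(len(text_hits))              # 2 (the comment and the string literal)
print(find_calls(SOURCE, "eval"))  # [3] (the one real call)
```

The tree-based search ignores comments and string contents and is indifferent to whitespace, which is the kind of structural awareness the paragraph above describes.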

See the following rule for a more complex example illustrating Semgrep features:

  • It uses typed metavariables so it can specify the type http.Request.
  • In the sink, the rule tracks imports down to function usage.
  • In the sanitizer, it removes type aware Booleans and a string convert function.
  • It leverages regex only to reduce how many patterns to write for finding dangerous functions.

Does Semgrep support all versions of a language?

See Support for all versions of a programming language.

Comparisons

How is Semgrep different from $OTHER_TOOL or $GENERIC_SAST?

Semgrep is an open source tool with a simple syntax for writing rules: if you can write code, you can write a Semgrep rule. No program analysis PhD required!

To the Semgrep team's knowledge, the only other tool with the explicit goal of allowing custom rules is GitHub’s proprietary tool, CodeQL. CodeQL has a domain-specific language that is extremely powerful but is designed for those with significant program analysis expertise, whereas Semgrep is designed for the security engineer or developer who wants to automate code review. Our goal is to make writing a Semgrep rule as easy as copying the code you want to find—and letting the Semgrep engine make the rule and autofix high-quality enough to run in CI or your text editor or IDE.

Semgrep AppSec Platform provides a Team tier that is free for up to 10 contributors on private repositories. It offers a hosted CI integration with a quick setup so you can start running Semgrep right away.

Semgrep's diff-awareness lets you scan new code and doesn’t force you to fix all the existing issues when you first start. For users running inside organizations with many repositories, the hosted offering also offers a policy and notification system that makes it easy to tune Semgrep so that it only reports issues or suggests fixes that get applied.

Our goal is a 99% fix rate for what Semgrep reports.

Besides the ease of writing new rules, what else is different about Semgrep?

Speedy and offline: Semgrep runs offline on every keystroke

If you are shipping code daily, a code analysis tool that takes a week to run is not helpful. We think modern static analysis tools should run on every keystroke in the editor, without needing network access. Semgrep runs at approximately 20K-100K loc/sec per rule, but our goal is to be even faster.

Semantic: Semgrep is smart

Semgrep automatically handles the nuance of “there’s more than one way to do it”: you write your query and all equivalent variations of that code are automatically matched.

As Semgrep evolves, queries similar to foo("password") become smarter. In the original version of Semgrep, this query would only match the code foo("password"). But a few months after release, Semgrep would also match const x = "password"; foo(x).

Today Semgrep can do even more with intraprocedural dataflow analysis, and we’re working on adding more of these semantic features with every release.
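The step from matching only foo("password") to also matching a variable that holds the same constant can be illustrated with a toy constant-propagation pass. This is a sketch using Python's `ast` module, limited to straight-line, top-level code; it is not Semgrep's algorithm, and the names `_resolve` and `calls_with_constant` are hypothetical.

```python
import ast


def _resolve(arg: ast.expr, constants: dict[str, object]) -> object:
    """Return the literal value of `arg` if it is known, else None."""
    if isinstance(arg, ast.Constant):
        return arg.value
    if isinstance(arg, ast.Name):
        return constants.get(arg.id)
    return None


def calls_with_constant(source: str, func_name: str, value: str) -> list[int]:
    """Line numbers where `func_name` receives `value`, directly or via a variable."""
    constants: dict[str, object] = {}
    hits: list[int] = []
    for stmt in ast.parse(source).body:
        # Check calls first, so `x = foo(x)` sees the old value of x.
        for node in ast.walk(stmt):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == func_name
                and any(_resolve(a, constants) == value for a in node.args)
            ):
                hits.append(node.lineno)
        # Then track simple top-level `name = <literal>` assignments.
        if (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
        ):
            name = stmt.targets[0].id
            if isinstance(stmt.value, ast.Constant):
                constants[name] = stmt.value.value
            else:
                constants.pop(name, None)  # value is no longer a known literal


    return hits


SOURCE = '''
foo("password")
x = "password"
foo(x)
y = "hunter2"
foo(y)
'''
print(calls_with_constant(SOURCE, "foo", "password"))  # [2, 4]
```

A purely textual search for foo("password") would report only line 2; tracking the value assigned to x is what lets line 4 match as well.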

Integrated: Semgrep understands Git

It’s easy to write a new Semgrep rule and have it only apply going forward. You can ignore findings of course, but we have built-in support for this with Semgrep AppSec Platform and various repository integrations.
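One simple way to picture "only apply going forward" is to keep just the findings that fall on lines touched by a change. The sketch below illustrates that idea under stated assumptions; it does not describe how Semgrep AppSec Platform implements diff-awareness, and the `Finding` type, `new_findings` function, and sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    path: str
    line: int
    rule_id: str


def new_findings(
    findings: list[Finding], changed_lines: dict[str, set[int]]
) -> list[Finding]:
    """Keep only the findings located on lines touched by the change."""
    return [f for f in findings if f.line in changed_lines.get(f.path, set())]


findings = [
    Finding("app/views.py", 10, "no-print"),   # pre-existing code
    Finding("app/views.py", 42, "no-print"),   # line added in this change
    Finding("app/models.py", 7, "no-print"),   # file not touched at all
]
changed = {"app/views.py": {41, 42, 43}}

for finding in new_findings(findings, changed):
    print(finding.path, finding.line, finding.rule_id)  # app/views.py 42 no-print
```

Existing issues on untouched lines stay out of the report, so a newly added rule does not block work on code that predates it.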

Portable: If you write a Semgrep rule, it runs anywhere

Many other tools require a buildable environment or can only be run in a VM. Semgrep runs “on the metal” and has minimal dependencies around a statically linked core; our parsers are declaratively generated C libraries (we contribute to and use tree-sitter).

See the Semgrep philosophy for further reading.

Comparing Semgrep to linters

Similar to a linter, Semgrep can be run in your developer's IDE. Semgrep has three IDE extensions.

Linters use static analysis but typically have a narrower scope for analysis (most rules operate on a single line). Some linters also cover stylistic decisions (for example, the use of tabs versus spaces), but Semgrep doesn’t care about whitespace or formatting.

Semgrep’s registry includes rulesets inspired by the rules of many popular linters and checkers, including ESLint, RuboCop, Bandit, and FindSecBugs. But Semgrep also allows you to enable multiple rulesets at the same time without adding linter-specific artifacts or installation to your code repository.

Some popular linter tools may use tools like Semgrep as an internal engine, and we encourage this! For instance, the popular scanner NodeJSScan was re-written to use Semgrep as the core.

Lastly, while many linters are extensible, you need to learn specific abstract syntax tree (AST) based patterns for writing custom rules. Semgrep works across languages and you learn its syntax once; you don't have to mess with MemberExpressions, node visitors, and all that. Before Semgrep, many of us on the maintainer team were writing AST-based rules as well: one of us wrote an article comparing writing linter rules to Semgrep expressions.

Comparing Semgrep to CodeQL

Both Semgrep and CodeQL use static analysis to find bugs, but there are a few differences:

  • Semgrep operates directly on source code, whereas CodeQL requires a buildable environment.
  • Semgrep provides both proprietary and open source options that can be run anywhere; CodeQL is not open source and you must pay to run it on any non-open-source code.
  • Semgrep focuses on speed and ease of use, and doesn’t require compiled code.
    • Semgrep OSS engine provides intraprocedural dataflow. Semgrep Code's cross-file and cross-function analysis has capabilities similar to CodeQL's in terms of cross-function dataflow analysis for a subset of supported languages.
  • Both have publicly available rules.
  • Semgrep rules look like the source code you’re writing; CodeQL has a separate domain-specific-language for writing queries.
  • Semgrep has an online, hosted free plan for up to ten contributors to private repositories; both have a hosted paid plan.

See the Semgrep development philosophy for more about what makes Semgrep different.

Comparing Semgrep to Endor Labs

Prioritization

Both Endor Labs and Semgrep support the prioritization of findings so that AppSec teams focus on the most impactful findings. Both companies offer findings filters based on criteria like reachability and EPSS scores. Beyond the basic reachability statuses of reachable and not reachable, Semgrep also supports statuses such as always reachable and conditionally reachable.

Furthermore, Semgrep Assistant uses AI to help organization admins receive information on top backlog tasks, allowing them to prioritize findings from all products, including the SAST and SCA products, not just those resulting from dependency vulnerability scans.

Reachability for transitive dependencies

Reachability has been a fundamental part of Semgrep Supply Chain from the beginning. Supply Chain offers advanced reachability analysis for direct dependencies in the form of dataflow reachability, offering accuracy beyond that offered by Endor Labs. This coverage is offered for seven languages and counting.

Vulnerable functions

Semgrep doesn't just identify a vulnerability as reachable when a vulnerable function is called; it also takes into account how the vulnerable function is called and what data flows into that function. This is achieved through Semgrep's rule syntax: when a rule is written, all possible permutations of the vulnerability are encapsulated in the rule. This functionality is something that Endor Labs doesn't have.

Semgrep's security research team doesn't just focus on analyzing a vulnerable function when writing rules. The team extends the scope of analysis to all the third-party callers of the vulnerable functions, not just the reported third-party function that's vulnerable. This extends the set of vulnerable functions greatly. The following rule demonstrates this functionality:

---
rules:
  - id: ssc-a462c702-1797-4f92-a577-2232cc25ab08
    message: Affected versions of paddlepaddle are vulnerable to Improper Limitation
      Of A Pathname To A Restricted Directory ('Path Traversal') in the
      `download` and `_check_exists_and_download` of `paddle.dataset.common`.
    severity: ERROR
    metadata:
      confidence: HIGH
      category: security
      cve: CVE-2024-0818
      cwe:
        - "CWE-22: Improper Limitation of a Pathname to a Restricted Directory
          ('Path Traversal')"
      ghsa: GHSA-2rp8-hff9-c5wr
      owasp:
        - A01:2021 - Broken Access Control
        - A05:2017 - Broken Access Control
        - A06:2021 - Vulnerable and Outdated Components
      publish-date: 2024-03-07T15:30:38Z
      references:
        - https://github.com/advisories/GHSA-2rp8-hff9-c5wr
        - https://nvd.nist.gov/vuln/detail/CVE-2024-0818
      sca-fix-versions: []
      sca-kind: reachable
      sca-schema: 20230302
      sca-severity: CRITICAL
      sca-vuln-database-identifier: CVE-2024-0818
      technology:
        - python
    r2c-internal-project-depends-on:
      depends-on-either:
        - namespace: pypi
          package: paddlepaddle
          version: <=2.6.0
    languages:
      - python
    patterns:
      - pattern-either:
          - pattern: paddle.dataset.common.download(...)
          - pattern: paddle.dataset.common._check_exists_and_download(...)

The vulnerable function is download, as shown by the fix commit. The function _check_exists_and_download calls download, which you can see in the source code. Thus, both functions are flagged in the rule in the final three lines.
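The idea of extending a vulnerable function to its callers can be sketched as a walk over a call graph. The following Python example is only an illustration of that reasoning, not the research team's tooling: the `affected_functions` helper, the call-graph dictionary, and the entries `hypothetical_helper` and `unrelated_function` are invented for the example.

```python
from collections import deque


def affected_functions(call_graph: dict[str, set[str]], vulnerable: str) -> set[str]:
    """Return `vulnerable` plus every function that reaches it through calls."""
    # Invert the caller -> callees mapping into callee -> callers.
    callers: dict[str, set[str]] = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Breadth-first walk upward from the vulnerable function.
    affected = {vulnerable}
    queue = deque([vulnerable])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected


# Caller -> callees. Only the first edge comes from the text above;
# the other entries are hypothetical.
CALL_GRAPH = {
    "_check_exists_and_download": {"download"},
    "download": {"hypothetical_helper"},
    "unrelated_function": set(),
}

print(sorted(affected_functions(CALL_GRAPH, "download")))
# ['_check_exists_and_download', 'download']
```

Each function in the resulting set would then get its own pattern in the rule, as the two `pattern` entries do for `download` and `_check_exists_and_download`.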

Learn more about how the security research team writes rules in A day in the life: Supply Chain Security Researcher.

Policies and flexibility

Semgrep Supply Chain results in a failed CI job only when there are critical or high-severity findings. However, Semgrep supports notifications and integration with Jira to create tickets for all Supply Chain findings. For license detection, it can either leave comments on PRs only or block the change.

The policies for Semgrep's other products, Semgrep Code and Semgrep Secrets, provide extensive flexibility, especially with respect to a developer's workflow, by allowing results to appear:

  • Only in the AppSec team’s view (monitor mode)
  • In the AppSec team's view and in the developer’s workflow, while not failing the CI job (comment mode)
  • In the AppSec team's view and in the developer’s workflow, while also failing the CI job (block mode)
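The three modes above, together with the Supply Chain behavior described earlier in this section, can be summarized as a small decision table. This Python sketch only restates what the text says; the `Mode` enum and both functions are hypothetical names, not part of any Semgrep API.

```python
from enum import Enum


class Mode(Enum):
    MONITOR = "monitor"
    COMMENT = "comment"
    BLOCK = "block"


def outcome(mode: Mode) -> dict[str, bool]:
    """Where a Semgrep Code or Semgrep Secrets finding shows up for a given mode."""
    return {
        "appsec_view": True,  # every mode reports to the AppSec team
        "developer_workflow": mode in (Mode.COMMENT, Mode.BLOCK),
        "fails_ci": mode is Mode.BLOCK,
    }


def supply_chain_fails_ci(severity: str) -> bool:
    """Supply Chain fails the CI job only for critical or high-severity findings."""
    return severity.lower() in {"critical", "high"}


for mode in Mode:
    print(mode.value, outcome(mode))
print(supply_chain_fails_ci("medium"))  # False
```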

Dependency lifecycle management

To help you manage your findings, Semgrep provides information, including EPSS probabilities, severity levels, transitivity information, and multiple levels of dataflow reachability.

Accuracy of results

Semgrep has reachability analysis for over 80% of critical CVEs dating back to 2017 and 100% of critical and high severity CVEs dating back to May 2022. Endor Labs' reachability data, however, dates back to 2018.

Comparing Semgrep to Snyk

SAST

Both Semgrep and Snyk offer out-of-the-box SAST solutions. Semgrep makes it easier to customize the rules that run against your code. Because these rules are visible and customizable, you can analyze your results to see if the relevant vulnerabilities were caught.

In addition to selecting your rules, Semgrep allows you to write custom rules to capture use cases driven by your organization's goals. To help you write rules, Semgrep Editor provides a structure mode to guide you through the process, allows you to test your in-progress rules, and adds them to your organization’s Policies page. Semgrep offers rule-writing capabilities to all users, while Snyk limits it to Enterprise users.

Both Semgrep and Snyk offer remediation advice for findings identified during scans. Snyk displays its recommendations in its web app, in supported IDEs, and CLI, while Semgrep displays remediation advice and guidance in its web app, CLI, supported IDEs, and in the form of PR or MR comments.

Snyk and Semgrep both display prioritization metrics to help you decide which findings you should work on first. For SAST, Snyk encapsulates this information into a priority score, which provides you with information on the impact and actionability related to the finding. Semgrep, on the other hand, provides severity information, confidence in the rule to detect findings that are true positives, and likelihood that an attacker can exploit the issues found.

Additionally, Semgrep provides action recommendations through Assistant, which offers AI-powered security recommendations to help you review, triage, and remediate your Semgrep findings.

Snyk offers autofix capability for its SCA product, but not its SAST product. Semgrep offers autofix suggestions for SAST and SCA, where its rules contain suggested fixes to resolve findings. In the event of a true positive where the rule doesn't have a human-written autofix, Assistant can generate an autofix.

SCA

Snyk offers reachability analysis for Java, JavaScript, and TypeScript, while Semgrep offers reachability analysis for multiple languages, including Java, JavaScript, and Ruby.

Snyk can detect whether dependencies are direct or transitive. However, this information is only available with Enterprise plans, and the information is limited to projects using Maven or Node.js, specifically npm and Yarn packages. Semgrep Supply Chain offers advanced reachability analysis for direct dependencies in the form of dataflow reachability. Semgrep offers this coverage for seven languages and counting.

Semgrep and Snyk both offer license compliance features, ensuring that the dependencies that your developers use meet the requirements set by your organization.

To help you manage your findings, Semgrep provides you with the findings' EPSS probabilities, severity levels and transitivity information. Snyk assesses impact and likelihood and encapsulates this information into a risk score.

Policies and rules management

The policies management feature for Semgrep Code and Semgrep Secrets provides extensive flexibility, especially with respect to a developer's workflow, by allowing results to appear:

  • Only in the AppSec team’s view (monitor mode)
  • In the AppSec team's view and in the developer’s workflow, while not failing the CI job (comment mode)
  • In the AppSec team's view and in the developer’s workflow, while also failing the CI job (block mode)

Semgrep Supply Chain results in a failed CI job only when there are critical or high-severity findings.

Secrets detection

Semgrep Secrets leverages semantic analysis, entropy analysis, and validation to accurately detect and fix secrets. Snyk maintains a business partnership with GitGuardian to offer secrets scanning as part of Snyk Code.

Comparing Semgrep to SonarQube

Both Semgrep and SonarQube use static analysis to find bugs, but there are a few differences:

  • Extending Semgrep with custom rules is simple since Semgrep rules look like the source code you’re writing. Writing custom rules with SonarQube is restricted to a handful of languages and requires familiarity with Java and abstract syntax trees (ASTs).
  • Semgrep supports user-defined autofixes; SonarQube does not.
  • Semgrep focuses on speed and ease-of-use, making analysis possible at up to 20K-100K loc/sec per rule. SonarQube authors report approximately 0.4K loc/sec for rulesets in production.
  • Semgrep supports scanning only changed files (differential analysis); SonarQube does not.
  • Both have publicly available rules.
  • Semgrep has an online, hosted free plan for up to ten contributors to private repositories; both have a hosted paid plan.

See the Semgrep development philosophy for more about what makes Semgrep different.

Privacy and Security

Where do you store data?

Semgrep, Inc. uses Amazon Web Services (US region) for storing customer data.

How is data secured, including data-at-rest and data-in-transit?

All customer data is located in AWS (US region). Amazon RDS encrypted database instances use industry-standard AES-256 encryption and TLS 1.2 or higher is used for all data-in-transit.

Is private source code shared with Semgrep, Inc?

By default, Semgrep configurations run fully in your CI pipeline and your source code never leaves your environment. Only metadata related to Semgrep runs (see the following question) is sent to Semgrep's service.

If you choose to enable it, Semgrep Assistant requires code access. See the Privacy and legal considerations section to understand how your code is stored and retained.

What data is stored?

Semgrep sends data to Semgrep AppSec Platform in accordance with the metrics policy.

These types of data include scan data and findings data.

  • Scan data includes project name, CI environment, and scan metadata.
  • Findings data are used to provide human-readable content for notifications and integrations, as well as tracking results as new, fixed, or duplicate.

For more information and a detailed description of each data field, refer to the relevant section in metrics.md.

What network requests are made?

Semgrep makes network requests in accordance with the data storage previously mentioned.

Semgrep makes the following network requests:

  • When running without --disable-version-check, Semgrep makes a network request to check for updates.
  • When providing a URL to --output, Semgrep performs an HTTP POST of the results to the specified URL.
  • When providing a registry ID like p/ci to --config, Semgrep requests the configuration from the Registry and may send metrics in accordance with the metrics policy.
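Read in reverse, the list above suggests how to keep a scan local: skip the version check, use a rules file on disk instead of a registry ID, and do not pass a URL to --output. The sketch below builds such a command in Python without running it. The helper name and the file paths are hypothetical, and the `--metrics=off` flag is not mentioned in this FAQ; it is assumed here from the metrics policy, so check it against your installed Semgrep version.

```python
import shlex


def local_scan_command(rules_file: str, target: str = ".") -> list[str]:
    """Build a Semgrep command line that avoids the network requests listed above."""
    return [
        "semgrep",
        "scan",
        "--config", rules_file,        # local rules file, so no Registry request
        "--disable-version-check",     # no update check
        "--metrics=off",               # assumption: disables metrics reporting
        target,
    ]


command = local_scan_command("rules/no-print.yml", "src")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=False)` once Semgrep is installed.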

Configuration

How do I configure Semgrep for different projects?

Semgrep AppSec Platform provides centralized policy management. See the Policies documentation for more details.

What is a policy?

A policy is a simple collection of rules and a definition of what to do with rule results: fail the Semgrep CI run and/or send non-blocking notifications to third-party services like Slack. Please see the Policies documentation for more details.

Monitoring

Do you have a visualization UI?

Semgrep Team users can create custom dashboards and visualizations. Semgrep also supports posting results through webhooks to any JSON endpoint, so you can easily integrate it with your favorite visualization tool.

