Developer-focused results and improved coverage with Semgrep Pro rules

Semgrep Pro rules are high-confidence SAST rules that leverage the latest Semgrep features and are designed to produce actionable results that can be surfaced directly to developers for vulnerability remediation.

Claudio Merloni
February 24th, 2023

Background

Our open-source community has been instrumental in the impressive growth of Semgrep. With the contributions of the community, we now offer more than 2500 community rules in the Semgrep Registry, which are a valuable resource for security auditors. However, security teams that want to develop a robust SAST program need rules with higher coverage and higher confidence to deliver accurate findings to developers.

Last week we announced Semgrep Code, our code security product that combines the new Semgrep Pro Engine with Pro rules to provide developers with highly actionable security findings.


Getting started with Pro rules

Semgrep Pro rules are proprietary rules created by r2c’s Security Research team. The goal is to provide a set of supported “high confidence” rules with improved coverage (across languages and vulnerability types), leverage the latest Semgrep features, and focus on developers as the target audience, by providing highly accurate findings.

Using Semgrep Pro rules is as simple as using any other rules, as long as you have access to the Team Tier of Semgrep Code: just browse the Semgrep Registry and add them to your Rule Board!

Pro rules are identified by a 💎 (diamond) icon, just like the rulesets that contain them. You can also use the “Visibility” filter to browse Pro rules only, by selecting “Team tier”.


Motivation for Pro rules

Semgrep offers powerful analysis features and rule syntax to detect code patterns that could lead to security vulnerabilities. However, rule authors often write for different audiences: security auditors who scour the codebase for anything suspicious, AppSec teams trying to enforce secure defaults, and developers who want only the most high-confidence results. So far, most of the Community rules fall into the "cast a wide net" category.


Developer-focused high-confidence rules

The term "confidence" is used by r2c's Security Research team to indicate the likelihood that a rule will report actual security issues and whether a developer should address the finding.

Confidence can have one of three values: high, medium, or low. High-confidence rules typically use features such as taint tracking analysis, with sets of sources, sinks, propagators, and sanitizers curated by r2c’s Security Research team. The team also goes the extra mile to craft patterns that match vulnerable source code as accurately as possible. Over time, we will keep translating our research work into new rules and patterns to increase coverage of languages, frameworks, and classes of vulnerabilities.

At the other end of the spectrum, low confidence rules, while still important to a security auditor or simply to have a comprehensive view of potential security issues, will match the source code more loosely and typically produce a larger number of false positives.

By focusing mostly on high confidence, Pro rules allow organizations to enhance their CI/CD pipelines with actionable security findings and avoid lengthy triage sessions.
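In rule files, a confidence value like the ones described above is typically recorded in the rule's metadata block. The fragment below is a minimal sketch of what that could look like. The rule id, message, and pattern are hypothetical placeholders, and the field name and uppercase values are assumed from the convention used by rules in the Semgrep Registry.

```yaml
rules:
  - id: example-high-confidence-rule   # hypothetical id, for illustration only
    languages: [go]
    severity: ERROR
    message: Illustrative message for a high-confidence finding.
    pattern: dangerous($X)             # placeholder pattern, not a real rule
    metadata:
      category: security
      confidence: HIGH                 # assumed values: HIGH, MEDIUM, or LOW
```

Keeping the value in metadata leaves the matching logic untouched: it describes how much a finding can be trusted and does not change what the rule matches.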


From Community to Pro rules

Let's look at an example of how advanced program analysis features in the Semgrep engine can create a more accurate rule.

Among the Community rules, we can find the following rule, which aims to find SQL injection vulnerabilities in Go programs: go.lang.security.audit.sqli.pgx-sqli.pgx-sqli

As you can see by looking at the rule in the Semgrep Registry, it’s made of a series of patterns that try to match different properties of a potentially vulnerable piece of source code.

A section of the rule focuses on matching a database connection:

- pattern-either:
  - pattern-inside: |
      $DB, ... = pgx.Connect(...)
      ...
  - pattern-inside: |
      $DB, ... = pgx.NewConnPool(...)
      ...
  - pattern-inside: |
      $DB, ... = pgx.ConnectConfig(...)
      ...
  - pattern-inside: |
      func $FUNCNAME(..., $DB *pgx.Conn, ...) {
        ...
      }

Another section of the rule enumerates various ways a dangerous SQL query could be created and used:

- pattern-either:
  - patterns:
      - pattern: $DB.$METHOD(...,$QUERY,...)
      - pattern-either:
          - pattern-inside: |
              $QUERY = $X + $Y
              ...
          - pattern-inside: |
              $QUERY += $X
              ...
          - pattern-inside: |
              $QUERY = fmt.Sprintf("...", $PARAM1, ...)
              ...
      - pattern-not-inside: |
          $QUERY += "..."
          ...
      - pattern-not-inside: |
          $QUERY = "..." + "..."
          ...
  - pattern: $DB.$METHOD(..., $X + $Y, ...)
  - pattern: $DB.$METHOD(..., fmt.Sprintf("...", $PARAM1, ...), ...)

And so on.

This makes the rule quite generic: it matches a very large number of potentially vulnerable pieces of code, which a security auditor performing a code review would be happy to triage. However, the rule doesn’t take into account whether the SQL query is built from user-controlled input. As a result, it will likely flag some queries that are not (yet) vulnerable to SQL injection, making it less suitable for a developer looking for actionable findings.

Pro rules leverage Semgrep features such as typed metavariables, dataflow analysis and taint labels. This allows r2c’s Security Research team to write very accurate rules, by drilling down on specific combinations of patterns that make a vulnerability possible.

To improve over the “audit style” rule described above, we can for instance identify certain sources of user input (simplified for the purpose of this article):

- patterns:
  - pattern-inside: |
      import "net/http"
      ...
  - pattern-either:
      - pattern: |
          ($REQ : http.Request).$FIELD
      - pattern: |
          ($REQ : *http.Request).$FIELD
  - metavariable-regex:
      metavariable: $FIELD
      regex: ^(Body|Cookie|Form)$
  label: USERINPUT

Then reason about the database library we want to target:

- patterns:
    - pattern-inside: |
        import "$IMPORT"
        ...
    - metavariable-regex:
        metavariable: $IMPORT
        regex: (.*jackc\/pgx\/v(4|5).*)
  label: IMPORTPGX

And finally put everything together in the dataflow analysis sinks:

- requires: IMPORTPGX and USERINPUT
  patterns:
    - pattern: |
        ($DB : $CONNTYPE). ... .$METHOD($CTX, $QUERY, ...)
    - metavariable-regex:
        metavariable: $CONNTYPE
        regex: ^((\*)?pgx.(Conn))$
    - metavariable-regex:
        metavariable: $METHOD
        regex: ^(Exec|Query|QueryFunc|QueryRow)$
    - focus-metavariable: $QUERY

An example of a Semgrep Pro rule that builds on these concepts, and takes them even further, is go.net.sql.pgx-sqli-taint.pgx-sqli-taint (access to Semgrep Code’s Team tier is required to view the rule’s full content).


Available Pro rules

The set of Semgrep Pro rules available today covers a wide range of vulnerabilities:

  • Hard-coded secrets: leveraging Semgrep’s advanced features like taint mode, which enable great accuracy, r2c’s Security Research team has created more than 110 rules to find hard-coded secrets in Java, JavaScript, TypeScript, Python, C#, Swift, and Ruby. The target of these rules at the moment is primarily database and network libraries, and they support more than 40 database and network APIs across the above-mentioned languages. This is in addition to the Community rules geared toward more generic secret scanning. For more details, please check out the secrets ruleset in the Registry.

  • XXE: r2c’s Security Researchers dove deep into XML External Entity vulnerabilities in Java, providing a novel insight into how developers can detect them and fix their code. For more details, please read our blog post on our research. We support the most common Java libraries and classes and can help identify the many different ways they can be insecurely configured and used.

  • Deserialization: Pro rules include almost 70 rules with a focus on Python and Java, supporting 14 Python libraries/frameworks and 3 commonly used Java libraries, either standalone or in combination with Java Servlets and the Spring Framework. This complements the already available Community rules.

  • Injection vulnerabilities, such as SQLi, XSS, and many more across languages such as Java, JavaScript, Go, and PHP.

  • DOM-based XSS detection for the Angular, React, and Next.js frameworks.

  • Extensive support for frameworks and technologies such as Java Servlets, Spring, Express.js, Laravel, and Go net/http.

Please see the Rule updates for an overview of updates and improvements released for Semgrep’s rules, including Pro rules.

For more information, please see the documentation or contact us to get started with Semgrep Code!
