
Contributing rules

Publish rules in the open-source Semgrep Registry and share them with the Semgrep community to help others benefit from your rule-writing efforts and contribute to the field of software security. There are two ways in which you can contribute rules to the Semgrep Registry:

For users of Semgrep App
Contribute rules to the Semgrep Registry through Semgrep App. This is the recommended workflow: it creates the necessary pull request for you and streamlines the whole process. See Contributing through Semgrep App (recommended).
For contributors to the repository through GitHub
Contribute rules to the Semgrep Registry through a pull request. See the Contributing through GitHub section for detailed information.

Contributing through Semgrep App (recommended)

To contribute and publish rules to the Semgrep Registry through Semgrep App, follow these steps:

  1. Go to Playground.
  2. Click Create New Rule.
  3. Choose one of the following:
    • Create a new rule and test code by clicking the plus icon, and then click Save. Note: The test file must contain at least one true positive and one true negative test case to be approved. See the Tests section of this document for more information.
    • In the Library panel, select a rule from a category in the Semgrep Registry. Click Fork, modify the rule or test code, and then click Save.
  4. Click Share.
  5. Click Publish to Registry.
  6. Fill in the required and optional fields.
  7. Click Continue, and then click Create PR.

This workflow automatically creates a pull request in the Semgrep Registry GitHub repository. Learn more about the Semgrep Registry in the Writing a rule for Semgrep Registry and Tests sections.

You can also publish rules as private rules outside of Semgrep Registry. These rules are not included in the Semgrep Registry, but they are accessible to your Semgrep organisation. See the Private rules documentation for more information.

Contributing through GitHub

Fork our repository and make a pull request to the Semgrep Registry. Before r2c can accept your contributions, sign our Contributor License Agreement (CLA) on GitHub. The pull request must contain two files:

  1. The Semgrep rule (as a YAML file).
  2. The test file (with the file extension of the language or framework). The test file must contain at least one true positive and one true negative test case to be approved. See the Tests section of this document for more information.

Pull requests require approval from at least one maintainer and passing CI jobs.

Learn more about the Semgrep Registry in the Writing a rule for Semgrep Registry and Tests sections.

Writing a rule for Semgrep Registry

The following sections describe the fields required in Semgrep Registry rule files, rule messages, test files, the rule quality checker, and the additional fields required by rules in the security category.

General rule requirements

All rules, whether they are intended only for local use or for the Semgrep Registry, have the same basic requirements. The following table is also included in the Rule Syntax article.

All required fields must be present at the top level of a rule, immediately under the rules key.

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique, descriptive identifier, for example: no-unused-variable |
| message | string | Message that includes why Semgrep matched this pattern and how to remediate it. See also Rule messages. |
| severity | string | One of: INFO, WARNING, or ERROR |
| languages | array | See language extensions and tags |
| pattern* | string | Find code matching this expression |
| patterns* | array | Logical AND of multiple patterns |
| pattern-either* | array | Logical OR of multiple patterns |
| pattern-regex* | string | Find code matching this PCRE-compatible pattern in multiline mode |

Note: Only one of the fields marked with * is required: pattern, patterns, pattern-either, or pattern-regex.

Every rule also requires a test file in the language that the rule is targeting. See Tests for more details.
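A minimal sketch of a complete rule file that uses only the general required fields might look like the following. The rule ID, message, and choice of JavaScript are illustrative assumptions, not a rule from the Semgrep Registry; the pattern is the one used in the Tests example.

```yaml
rules:
  # Hypothetical rule: flags string literals assigned with `var`.
  - id: no-var-string-assignment
    message: >-
      A string literal is assigned to a variable declared with var. var is
      function-scoped and hoisted, which can cause surprising bugs. Declare the
      variable with const or let instead.
    severity: WARNING
    languages:
      - javascript
    # One of pattern, patterns, pattern-either, or pattern-regex is required;
    # this sketch uses pattern.
    pattern: var $X = "...";
```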

Semgrep Registry rule requirements

Rules submitted to the Semgrep Registry require the following fields in addition to those mentioned above:

metadata
  • Description: All rules require technology, category, and references. Rules with category: security have more requirements. See Including fields required by security category.
  • Possible values:
    • Required by all Semgrep Registry rules: references, category, technology
    • Additionally required by category: security: cwe, confidence, likelihood, impact, subcategory
  • Example:

      metadata:
        cwe:
          - "CWE-94: (...)"
        category: security
        technology:
          - unicode
        references:
          - https://trojansource.codes/

technology
  • Description: Nested under the metadata field. Additional information about the technology. This helps to specify rulesets in the Semgrep Registry.
  • Possible values: django, docker, express, kubernetes, nginx, react, terraform, --no-technology--
  • Example:

      metadata:
        technology:
          - react

category
  • Description: Nested under the metadata field. If you use category: security, include additional metadata. See Including fields required by security category.
  • Possible values: best-practice, correctness, maintainability, performance, portability, security
  • Example:

      category: security

references
  • Description: Additional information that gives more context to the user of the rule. This helps developers understand the issue and how to fix it.
  • Possible values: No fixed set of values. Any additional information that gives more context.
  • Example:

      references:
        - https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html
Note: If you use category: security, include additional metadata. See Including fields required by security category.
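Putting the general fields and the Registry metadata together, a non-security rule might look like the sketch below. The rule itself is hypothetical; using python as the technology value for a rule that is not tied to a framework is an assumption, and the reference points to the PEP 8 guidance the rule is based on.

```yaml
rules:
  # Hypothetical rule: flags equality comparisons with None.
  - id: compare-to-none-with-is
    message: >-
      $X is compared to None with ==. The == operator can be overridden by
      __eq__, so the result may not reflect whether $X is None. Use
      "$X is None" instead.
    severity: WARNING
    languages:
      - python
    metadata:
      # Required for every Semgrep Registry rule.
      category: best-practice
      technology:
        - python  # assumption: language name for a framework-independent rule
      references:
        - https://peps.python.org/pep-0008/#programming-recommendations
    pattern: $X == None
```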

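Understanding rule namespacing

The namespacing format for contributing rules in the Semgrep Registry is <language>/<framework>/<category>/$MORE. If the rule does not belong to a particular framework, add it to the language directory, which uses the word lang in place of <framework>: <language>/lang. For example, a Django security rule belongs in python/django/security/, and a Python security rule that is not tied to a framework belongs in python/lang/security/.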

Tests

Include a test file in the language that your rule is targeting. A test file includes the following:

  • At least one test where the rule detects a finding. This is called a true positive.
  • At least one test where the rule does not detect a finding. This is called a true negative.

Test file names must match the rule file name, except for the file extension. For example, if a JavaScript rule is in my-rule.yaml, name the test file my-rule.js. Use any valid extension for the target language.

Requirements of test files
  • In the test file, include examples that mark:
    • What is expected to be a finding.
    • What is not a finding.
  • The test file name must match the rule file name, except for the file extension.

See the examples of the rule and test file below:

Rule file:

rules:
  - id: my-rule
    pattern: var $X = "...";
    …

In the test file, mark each expected finding with a comment that contains ruleid: followed by the ID of your rule, on the line before the expected finding. Mark code that is expected not to be a finding with a comment that contains ok: followed by the rule ID. See the example below:

// ruleid: my-rule
var strdata = "hello";
// ok: my-rule
var numdata = 1;

For more information, visit Testing rules.

Rule messages

Include a rule message that describes the matched pattern and explains how to mitigate any related issues. Provide the following information in a rule message:

  1. Description of the pattern. For example: missing parameter, dangerous flag, out-of-order function calls.
  2. Description of why this pattern was detected. For example: logic bug, introduces a security vulnerability, bad practice.
  3. An alternative that resolves the issue. For example: use another function, validate data first, or discard the dangerous flag.

Use the YAML folded block scalar indicator >- when rule messages span multiple lines. It joins the lines with spaces and strips the trailing newline, so the rule message displays well on the command line without having to worry about line wrapping, escaping quotes, or using backslashes.

For an example of a good rule message, see this rule for Django's mark_safe.

Rule message example

mark_safe() is used to mark a string as safe for HTML output. This disables escaping and may expose the content to XSS attacks. Instead, use django.utils.html.format_html() to build HTML for rendering.
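The mark_safe message above could be written with the >- folded block scalar as in the following sketch. Only the message text comes from this page; the rule ID, severity, and pattern are assumptions for illustration and may differ from the actual Registry rule.

```yaml
rules:
  - id: avoid-mark-safe  # hypothetical ID for this sketch
    # >- folds the indented lines below into one line joined by spaces and
    # strips the final newline, so no quotes or backslashes are needed.
    message: >-
      mark_safe() is used to mark a string as safe for HTML output. This
      disables escaping and may expose the content to XSS attacks. Instead,
      use django.utils.html.format_html() to build HTML for rendering.
    severity: WARNING
    languages:
      - python
    # Assumed pattern: any call to Django's mark_safe.
    pattern: django.utils.safestring.mark_safe(...)
```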

Rule quality checker

When you contribute rules to the Semgrep Registry, our quality checkers (linters) evaluate whether the rule conforms to r2c standards. The semgrep-rule-lints job runs linters on a new rule to check for mistakes, performance problems, and adherence to best practices for submitting to the Semgrep Registry. To improve your rule writing, use Semgrep itself to scan semgrep-rules.

Including fields required by security category

Rules in the security category of the Semgrep Registry require specific metadata fields that ensure consistency across the ecosystem in both Semgrep App and Semgrep CLI. Nest these fields under the metadata field.

If your rule has category: security, the following metadata fields are required:

| Required metadata field | Values | Example use |
| --- | --- | --- |
| cwe | A Common Weakness Enumeration (CWE) entry | cwe: "CWE-502: Deserialization of Untrusted Data" |
| confidence | HIGH, MEDIUM, LOW | confidence: MEDIUM |
| likelihood | HIGH, MEDIUM, LOW | likelihood: MEDIUM |
| impact | HIGH, MEDIUM, LOW | impact: HIGH |
| subcategory | vuln, audit, guardrail | subcategory: [vuln] |

These fields help users find rules in different categories, such as:

  • High confidence security rules for CI pipelines.
  • OWASP Top 10 or CWE Top 25 rulesets.
  • Technology, for example react, which makes it easy to find React rulesets.
  • Lower-confidence audit rules intended for code auditors.
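A sketch of a security rule with the full set of required metadata follows. The rule is hypothetical: it flags every call to Python's pickle.loads(), so under the definitions in the subsections below it is an audit rule with LOW confidence and LOW likelihood, while the impact of unsafe deserialization is HIGH. All values, including python as the technology, are illustrative choices rather than values taken from an existing Registry rule.

```yaml
rules:
  # Hypothetical audit rule: flags all uses of pickle.loads().
  - id: audit-pickle-loads
    message: >-
      pickle.loads() can execute arbitrary code while deserializing data. If
      the data comes from an untrusted source, an attacker may be able to run
      code on the server. Use a data-only format such as JSON for untrusted
      input.
    severity: WARNING
    languages:
      - python
    metadata:
      # Required for every Semgrep Registry rule.
      category: security
      technology:
        - python
      references:
        - https://cheatsheetseries.owasp.org/cheatsheets/Deserialization_Cheat_Sheet.html
      # Additionally required because category is security.
      cwe:
        - "CWE-502: Deserialization of Untrusted Data"
      confidence: LOW
      likelihood: LOW
      impact: HIGH
      subcategory:
        - audit
    pattern: pickle.loads(...)
```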

Examples of rules with a full list of required metadata:

Note: Each field mentioned above is described, with examples, in the subsections below.

CWE

Include the appropriate Common Weakness Enumeration (CWE) entry. The CWE identifies the class of vulnerability that your rule is trying to find. Examples:

If you write an SQL Injection rule, use the following:

cwe:
  - "CWE-89: Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')"

If you write an XSS rule, use the following:

cwe:
  - "CWE-79: Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')"

Confidence

Indicate how confident you are that the rule detects true positives. The possible values are:

  • HIGH - Security concern with a high rate of true positives. Useful in CI/CD pipelines.
  • MEDIUM - Security concern, but with some false positives. Useful in CI/CD pipelines.
  • LOW - Expect a fair number of false positives, similar to audit-style rules.

HIGH

HIGH confidence rules can use advanced Semgrep features, such as metavariable-comparison or taint mode, to detect true positives. Example:

confidence: HIGH

MEDIUM

MEDIUM confidence rules can also use advanced Semgrep features, such as metavariable-comparison or taint mode, but produce some false positives. Example:

confidence: MEDIUM

LOW

LOW confidence rules generally find code that appears to be dangerous, but they report many false positives. Example:

confidence: LOW

Likelihood

Specify how likely it is that an attacker can exploit the issue that has been found. The possible values are LOW, MEDIUM, and HIGH.

HIGH

HIGH likelihood rules indicate a very high concern that the vulnerability can be exploited. Example:

likelihood: HIGH

MEDIUM

MEDIUM likelihood rules detect a vulnerability in most circumstances, although it can be hard for an attacker to exploit it. These rules can also detect part of a problem, but not the whole issue. Example:

likelihood: MEDIUM

LOW

LOW likelihood rules tend to find something dangerous, but do not evaluate whether the code is truly vulnerable. Example:

likelihood: LOW

Impact

Indicate how much damage a vulnerability can cause. The possible values are LOW, MEDIUM, and HIGH.

HIGH

HIGH impact rules can detect extremely damaging vulnerabilities, such as injection vulnerabilities. Example:

impact: HIGH

MEDIUM

MEDIUM impact rules detect issues that are less likely to lead to full system compromise but are still fairly damaging. Example:

impact: MEDIUM

LOW

LOW impact rules detect a security issue whose impact on the application is limited if it is exploited. Example:

impact: LOW

Subcategory

Include a subcategory to indicate the type of the rule. See the subsections below for more details.

vuln

A vulnerability rule detects an issue that developers almost certainly want to resolve, for example, an SQL injection rule that uses taint mode. Example:

subcategory:
- vuln

audit

An audit rule is useful for code auditors, for example, an SQL rule that finds all uses of database.exec(...), which can be problematic. Example:

subcategory:
- audit

guardrail

A guardrail rule is useful for companies that write custom rules, for example, a rule that finds all uses of non-standard XML parsing libraries within the company. The rule message can then tell developers to use only a company-approved library.

subcategory:
- guardrail

Technology

The technology field helps define specific rulesets for languages, libraries, and frameworks in the Semgrep Registry. For example, rules with the technology express are included in the p/express rulepack.

technology:
- express

References

References give developers more context about what the issue is and how to remediate the vulnerability.
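For example, a hypothetical DOM-based XSS rule could point to the OWASP cheat sheet already shown in the Registry requirements above and to the MITRE page for the matching CWE. Both URLs are suggestions for this sketch, not references copied from an existing rule. As with the other Registry fields, references is nested under metadata.

```yaml
metadata:
  references:
    # Explains the vulnerability class and how to prevent it.
    - https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html
    # Canonical description of the weakness listed in the cwe field (CWE-79).
    - https://cwe.mitre.org/data/definitions/79.html
```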

