    Contributing rules

    Publish rules in the open source Semgrep Registry and share them with the Semgrep community to help others benefit from your rule-writing efforts and contribute to the field of software security. There are two ways in which you can contribute rules to the Semgrep Registry:

    For users of Semgrep AppSec Platform
    Contribute rules to the Semgrep Registry through Semgrep AppSec Platform. This is the recommended workflow: it creates the necessary pull request for you and streamlines the whole process. See Contributing through Semgrep AppSec Platform (recommended).
    For contributors to the repository through GitHub
    Contribute rules to the Semgrep Registry through a pull request. See the Contributing through GitHub section for detailed information.

    Contributing through Semgrep AppSec Platform (recommended)

    To contribute and publish rules to the Semgrep Registry through Semgrep AppSec Platform, follow these steps:

    1. Go to Playground.
    2. Click Create New Rule.
    3. Choose one of the following:
      • Create a new rule and test code by clicking the plus icon, selecting New rule, and then clicking Save. Note: The test file must contain at least one true positive and one true negative test case to be approved. See the Tests section of this document for more information.
      • In the Library panel, select a rule from a category in Semgrep Registry. Click Fork, modify the rule or test code, and then click Save.
    4. Click Share.
    5. Click Publish to Registry.
    6. Fill in the required and optional fields.
    7. Click Continue, and then click Create PR.

    This workflow automatically creates a pull request in the Semgrep Registry repository on GitHub. Find out more about the Semgrep Registry by reading the Rule writing and Tests sections.

    You can also publish rules as private rules outside of Semgrep Registry. These rules are not included in the Semgrep Registry, but they are accessible to your Semgrep organization. See the Private rules documentation for more information.

    Contributing through GitHub

    Fork the semgrep-rules repository and open a pull request. You must sign the Contributor License Agreement (CLA) on GitHub before Semgrep, Inc. can accept your contributions. The pull request to the Semgrep Registry must contain two files:

    1. The Semgrep rule (as a YAML file).
    2. The test file (with the file extension of the language or framework). The test file must contain at least one true positive and one true negative test case to be approved. See the Tests section of this document for more information.

    Pull requests require the approval of at least one maintainer and passing CI jobs.

    Find out more about the Semgrep Registry by reading the Rule writing and Tests sections.

    Writing a rule for Semgrep Registry

    The following sections document necessary fields in rule files of Semgrep Registry, provide information about rule messages, inform about test files, mention rule quality checkers, and describe additional fields required by rules in the security category.

    General rule requirements

    All rules, whether they are intended only as local rules or for Semgrep Registry, share the same initial requirements. The following table is also included in the Rule Syntax article.

    All required fields must be present at the top level of a rule, immediately under the rules key.

    | Field | Type | Description |
    | --- | --- | --- |
    | id | string | Unique, descriptive identifier, for example: no-unused-variable |
    | message | string | Message that includes why Semgrep matched this pattern and how to remediate it. See also Rule messages. |
    | severity | string | One of the following values: INFO (Low severity), WARNING (Medium severity), or ERROR (High severity). The severity key specifies how critical the issues are that a rule potentially detects. Note: Semgrep Supply Chain differs, as its rules use CVE assignments for severity. For more information, see the Filters section in the Semgrep Supply Chain documentation. |
    | languages | array | See language extensions and tags |
    | pattern* | string | Find code matching this expression |
    | patterns* | array | Logical AND of multiple patterns |
    | pattern-either* | array | Logical OR of multiple patterns |
    | pattern-regex* | string | Find code matching this PCRE2-compatible pattern in multiline mode |

    * Only one of the following is required: pattern, patterns, pattern-either, pattern-regex

    Every rule also requires a test file in the language that the rule is targeting. See Tests for more details.
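
    As a minimal sketch of how the required top-level fields fit together, the following rule uses pattern-either as its single pattern key. The rule id, the message text, and the two patterns are illustrative assumptions and are not taken from the Semgrep Registry.

    ```yaml
    rules:
      - id: example-no-dynamic-code          # hypothetical identifier
        message: >-
          eval() and new Function() execute a string as code. Avoid them,
          or make sure the string never contains user-controlled data.
        severity: WARNING                    # INFO, WARNING, or ERROR
        languages:
          - javascript
        # Exactly one of pattern, patterns, pattern-either, or pattern-regex
        pattern-either:                      # logical OR of the patterns below
          - pattern: eval(...)
          - pattern: new Function(...)
    ```

    A rule intended for the Semgrep Registry would also need the metadata described in the next section.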

    Semgrep Registry rule requirements

    Rules submitted to Semgrep Registry must include the fields mentioned above and the following additional fields:

    metadata

    Description: All rules require technology, category, and references. Rules with category: security have more requirements. See Including fields required by security category.

    Possible values:

    Required by all Semgrep Registry rules:

    • references
    • category
    • technology

    Additionally required by category: security:

    • cwe
    • confidence
    • likelihood
    • impact
    • subcategory

    Example:

    metadata:
      cwe:
        - "CWE-94: (...)"
      category: security
      technology:
        - unicode
      references:
        - https://trojansource.codes/

    technology

    Description: Nested under the metadata field. Additional information about the technology. This helps to specify rulesets in Semgrep Registry.

    Possible values:

    • django
    • docker
    • express
    • kubernetes
    • nginx
    • react
    • terraform
    • --no-technology--

    Example:

    metadata:
      technology:
        - react

    category

    Description: Nested under the metadata field. If you use category security, include additional metadata. See Including fields required by security category.

    Possible values:

    • best-practice
    • correctness
    • maintainability
    • performance
    • portability
    • security

    Example:

    category: security

    references

    Description: Additional information that gives more context to the user of the rule. This helps developers understand the issue and how to fix it.

    Possible values: No finite value. Any additional information that gives more context.

    Example:

    references:
      - OWASP DOM based XSS Prevention Cheat Sheet

    Understanding rule namespacing

    The namespacing format for contributing rules in the Semgrep Registry is <language>/<framework>/<category>/$MORE. If the rule does not belong to a particular framework, add it to the language directory, which uses the literal word lang in place of <framework>: <language>/lang/<category>/$MORE. For example, python/lang/security.

    Tests

    Include a test file in the language that your rule is targeting. A test file includes the following:

    • At least one test where the rule detects a finding. This is called a true positive finding.
    • At least one test where the rule does not detect a finding. This is called a true negative finding.

    Test file names must match the rule filename, except for the file extension. For example, if the rule is in my-rule.yaml, the test filename must be my-rule.js. Use any valid extension for the target language.

    Requirements of test files
    • In the test file, include examples that mark:
      • What is expected to be a finding.
      • What is not a finding.
    • The test filename must match the rule filename, except for the file extension.

    See the examples of the rule and test file below:

    Rule file:

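    rules:
      - id: my-rule
        pattern: var $X = "...";
        message: Found a string literal assigned to a variable declared with var.
        languages: [javascript]
        severity: WARNING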

    In the test file, mark each expected finding with a comment of the form ruleid: <rule-id> on the line immediately before the code that must match. Mark code that must not produce a finding with a comment of the form ok: <rule-id>. See the example below:

    // ruleid: my-rule
    var strdata = "hello";
    // ok: my-rule
    var numdata = 1;

    For more information, visit Testing rules.

    Rule messages

    Include a rule message that provides details about the matched pattern and informs about how to mitigate any related issues. Provide the following information in a rule message:

    1. Description of the pattern. For example: missing parameter, dangerous flag, out-of-order function calls.
    2. Description of why this pattern was detected. For example: logic bug, introduces a security vulnerability, bad practice.
    3. An alternative that resolves the issue. For example: Use another function, validate data first, and discard the dangerous flag.

    Use the YAML multiline string operator >- when rule messages span multiple lines. It folds the lines into a single string, so the message displays cleanly on the command line without manual line wrapping, quote escaping, or backslashes.

    For an example of a good rule message, see: this rule for Django's mark_safe.

    Rule message example

    mark_safe() is used to mark a string as safe for HTML output. This disables escaping and may expose the content to XSS attacks. Instead, use django.utils.html.format_html() to build HTML for rendering.
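
    As a sketch, the message above could be written with the >- operator as follows. The rule id and the single pattern are simplified assumptions for illustration; the actual Registry rule for mark_safe may be structured differently.

    ```yaml
    rules:
      - id: example-avoid-mark-safe          # hypothetical identifier
        languages: [python]
        severity: WARNING
        # >- folds the indented lines into one string and drops the final newline
        message: >-
          mark_safe() is used to mark a string as safe for HTML output.
          This disables escaping and may expose the content to XSS attacks.
          Instead, use django.utils.html.format_html() to build HTML for
          rendering.
        pattern: django.utils.safestring.mark_safe(...)
    ```

    The message covers the three points listed earlier: what was matched, why it is a problem, and what to use instead.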

    Rule quality checker

    When you contribute rules to the Semgrep Registry, our quality checkers (linters) evaluate if the rule conforms to Semgrep, Inc. standards. The semgrep-rule-lints job runs linters on a new rule to check for mistakes, performance problems, and best practices for submitting to the Semgrep Registry. To improve your rule writing, use Semgrep itself to scan semgrep-rules.

    Including fields required by security category

    Rules in category security in the Semgrep Registry require specific metadata fields that ensure consistency across the ecosystem in both Semgrep AppSec Platform and Semgrep CLI. Nest these metadata under the metadata field.

    If your rule has a category: security, the following metadata are required:

    | Required metadata field | Values | Example use |
    | --- | --- | --- |
    | cwe | A Common Weakness Enumeration (CWE). | cwe: "CWE-502: Deserialization of Untrusted Data" |
    | confidence | HIGH, MEDIUM, LOW | confidence: MEDIUM |
    | likelihood | HIGH, MEDIUM, LOW | likelihood: MEDIUM |
    | impact | HIGH, MEDIUM, LOW | impact: HIGH |
    | subcategory | vuln, audit, secure default | subcategory: [vuln] |

    These fields help you to find rules by different criteria, such as:

    • High confidence security rules for CI pipelines.
    • OWASP Top 10 or CWE Top 25 rulesets.
    • Technology rulesets. For example, the react technology makes it easy to find React rulesets.
    • Audit rules with lower confidence that are intended for code auditors.

    Examples of rules with a full list of required metadata:
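
    As a hypothetical sketch (not one of the Registry rules), a rule that carries every metadata field required by the security category might look like the following. The rule id, pattern, message, and the chosen confidence, likelihood, and impact values are assumptions for illustration only.

    ```yaml
    rules:
      - id: example-unsafe-pickle-loads      # hypothetical identifier
        languages: [python]
        severity: ERROR
        message: >-
          pickle.loads() deserializes arbitrary objects and can execute code
          when the input is attacker-controlled. Use a safer format such as
          JSON for untrusted data.
        pattern: pickle.loads(...)
        metadata:
          # Required by all Semgrep Registry rules
          category: security
          technology:
            - python
          references:
            - https://docs.python.org/3/library/pickle.html
          # Additionally required by category: security
          cwe:
            - "CWE-502: Deserialization of Untrusted Data"
          confidence: LOW                    # matches every call, so expect false positives
          likelihood: LOW
          impact: HIGH
          subcategory:
            - audit
    ```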

    Note: Details of each field mentioned above are provided in the subsections below with examples.

    CWE

    Include the appropriate Common Weakness Enumeration (CWE). The CWE identifies the class of vulnerability that your rule is trying to find. Examples:

    If you write an SQL Injection rule, use the following:

    cwe:
    - "CWE-89: Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')"

    If you write an XSS rule, use the following:

    cwe:
    - "CWE-79: Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')"

    Confidence

    Indicate how confident you are that the rule detects true positives. The possible values are:

    • HIGH - Security concern with a high true positive rate. Useful in CI/CD pipelines.
    • MEDIUM - Security concern, but with some false positives. Useful in CI/CD pipelines.
    • LOW - Expect a fair number of false positives, similar to audit-style rules.

    HIGH

    HIGH confidence rules can use advanced Semgrep features, such as metavariable-comparison or taint mode, to detect true positives. See examples below:

    confidence: HIGH

    MEDIUM

    MEDIUM confidence rules can also use advanced Semgrep features, such as metavariable-comparison or taint mode, but they report some false positives. See examples below:

    confidence: MEDIUM

    LOW

    LOW confidence rules generally find something that appears to be dangerous, but they report many false positives. See examples below:

    confidence: LOW
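
    To illustrate the two advanced features named above, here is a minimal sketch of a taint mode rule and a metavariable-comparison rule. Both rule ids, the Flask source, the execute() sink, and the set_port() function are assumptions chosen for illustration. Whether such rules really deserve confidence: HIGH depends on how they perform against real code.

    ```yaml
    rules:
      # Taint mode: report only when request data reaches the query argument.
      - id: example-tainted-sql-query        # hypothetical identifier
        mode: taint
        languages: [python]
        severity: ERROR
        message: >-
          User input from the request flows into a SQL query string. Use a
          parameterized query instead of building the query from user input.
        pattern-sources:
          - pattern: flask.request.args.get(...)
        pattern-sinks:
          - patterns:
              - pattern: $CURSOR.execute($QUERY, ...)
              - focus-metavariable: $QUERY   # only the query argument is a sink
        metadata:
          confidence: HIGH                   # other required metadata omitted for brevity

      # metavariable-comparison: match only when the argument is below 1024.
      - id: example-privileged-port          # hypothetical identifier
        languages: [python]
        severity: WARNING
        message: >-
          set_port() is called with a port below 1024, which usually requires
          elevated privileges. Use a port of 1024 or higher.
        patterns:
          - pattern: set_port($ARG)          # set_port is a hypothetical function
          - metavariable-comparison:
              metavariable: $ARG
              comparison: $ARG < 1024
        metadata:
          confidence: HIGH                   # other required metadata omitted for brevity
    ```

    Narrowing the sink with focus-metavariable is one way to reduce false positives, because values passed as separate query parameters are not treated as reaching the sink.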

    Likelihood

    Specify how likely it is that an attacker can exploit the issue that has been found. The possible values are LOW, MEDIUM, and HIGH.

    HIGH

    HIGH likelihood rules indicate a very high concern that the vulnerability can be exploited. Examples:

    likelihood: HIGH

    MEDIUM

    MEDIUM likelihood rules detect a vulnerability in most circumstances, although it can be hard for an attacker to exploit. These rules may also detect only part of a problem rather than the whole issue. Examples:

    likelihood: MEDIUM

    LOW

    LOW likelihood rules tend to find something dangerous, but they do not evaluate whether the code is truly vulnerable. For example:

    likelihood: LOW

    Impact

    Indicate how much damage a vulnerability can cause. Use LOW, MEDIUM, or HIGH.

    HIGH

    HIGH impact rules can detect extremely damaging vulnerabilities, such as injection vulnerabilities. Examples:

    impact: HIGH

    MEDIUM

    MEDIUM impact rules detect issues that are less likely to lead to full system compromise but are still fairly damaging. Examples:

    impact: MEDIUM

    LOW

    LOW impact rules detect a security issue whose effect on the application is limited, even if the issue is exploited.

    impact: LOW

    Subcategory

    Include a subcategory to explain the type of the rule. See the subsections below for more details.

    vuln

    A vulnerability rule detects something that developers certainly want to resolve. For example, an SQL Injection rule that uses taint mode. Example:

    subcategory:
    - vuln

    audit

    An audit rule is useful for code auditors. For example, an SQL rule that finds all uses of database.exec(...), any of which can be problematic. Example:

    subcategory:
    - audit

    secure default

    A secure default rule makes use of inherently secure libraries, frameworks, configurations, or settings. These rules enforce the mitigation of common security concerns, such as preventing cross-site request forgery (CSRF) by properly verifying inbound requests in Django or Flask applications.

    A secure default rule must contain remediation that suggests applying a one-time setting that ensures security throughout the codebase without the need for repeated application by developers. For example, configuring a global security setting in a web application framework that applies to all routes and inputs.

    subcategory:
    - secure default
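
    For comparison with the vuln subcategory, the audit case described above (flagging every database.exec(...) call) could be sketched as follows. The rule id, target language, and message are assumptions for illustration, and database is the placeholder object name used in the text.

    ```yaml
    rules:
      - id: example-database-exec-audit      # hypothetical identifier
        languages: [javascript]
        severity: INFO
        message: >-
          database.exec() runs a raw query. Review this call to confirm that
          the query is not built from untrusted input, and prefer
          parameterized queries.
        pattern: database.exec(...)          # matches every call, safe or not
        metadata:
          category: security
          subcategory:
            - audit
          confidence: LOW                    # other required metadata omitted for brevity
    ```

    Because the pattern matches every call regardless of its arguments, the rule suits manual review rather than blocking CI pipelines.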

    Technology

    Technology helps to define specific rulesets for languages, libraries, and frameworks that are available in Semgrep Registry. For example, a rule with the express technology is included in the p/express ruleset.

    technology:
    - express

    References

    References give developers more context about what the issue is and how to remediate the vulnerability.
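
    A references list might look like the following sketch. The first URL appears in the metadata example earlier in this document. The second is assumed to be the address of the OWASP DOM based XSS Prevention Cheat Sheet named earlier.

    ```yaml
    metadata:
      references:
        - https://trojansource.codes/
        - https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html
    ```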

    Updating existing open source rules in Semgrep Registry

    To update an existing open source rule, follow these steps:

    1. Find a rule you want to update in the semgrep-rules repository.
    2. Submit a PR to the repository with your new update.
    3. Follow the same instructions and recommendations as in the rest of this document. For example, the security category has specific metadata requirements.
    4. Leave a message in the PR that explains why you are making the changes and what motivates the update.

    See a PR example.

    The repository's pipeline may post messages about specific details of your rule. Ensure that your rule fulfills all of the requirements. Sometimes the pipeline running in the semgrep-rules repository fails for reasons unrelated to your rule. In such a case, wait for a Semgrep reviewer's help.

