Write custom validators

Semgrep Secrets uses proprietary validators to determine whether a secret is actively being used. Validators are included in the rules that Semgrep Secrets uses.

This article walks you through the syntax required to write your own custom validators.

note
  • The syntax for Semgrep Secrets validators is experimental and subject to change.
  • Semgrep currently supports validation using HTTP and HTTPS.

Sample validator

validators:
- http:
    request:
      headers:
        Authorization: Bearer $REGEX
        Host: api.semgrep.dev
        User-Agent: Semgrep
      method: GET
      url: https://api.semgrep.dev/user
    response:
    - match:
      - status-code: 200
      result:
        validity: valid
    - match:
      - status-code: 401
      result:
        validity: invalid
See a validator in the context of a full rule.
rules:
- id: exampleCo_example
  message: >-
    This is an example rule that performs validation against semgrep.dev
  severity: WARNING
  metadata:
    product: secrets
    secret_type: exampleCo
  languages:
  - regex
  validators:
  - http:
      request:
        headers:
          Authorization: Bearer $REGEX
          Host: api.semgrep.dev
          User-Agent: Semgrep
        method: GET
        url: https://api.semgrep.dev/user
      response:
      - match:
        - status-code: 200
        result:
          validity: valid
      - match:
        - status-code: 401
        result:
          validity: invalid
  patterns:
  - patterns:
    - pattern-regex: (?<REGEX>\b(someprefix_someRegex[0-9A-Z]{32})\b)
    - focus-metavariable: $REGEX
    - metavariable-analysis:
        analyzer: entropy
        metavariable: $REGEX

Syntax

validators

Key | Required | Description
validators | Yes | Defines the list of validators within a Semgrep rule.

type

Key | Required | Description
http | Yes | Indicates that the request type is http.
note

Semgrep only supports web services with HTTP(S).

request

Key | Required | Description
request | Yes | Describes the request that Semgrep sends, including the URL that receives it.
method | Yes | The HTTP method Semgrep uses to make the call. Accepted values: GET, POST, PUT, DELETE, OPTIONS, PATCH.
url | Yes | The URL to which the call is made.
headers | Yes | The headers to include with the call.
body | No | The body used with POST, PUT, and PATCH requests.

Subkeys for headers

The following keys are for use with headers:

Key | Required | Description
Host | No | The host to which the call is made. Only the url field is required, but you can override the host if needed.
Other-values | No | Any other request header. Accepts all values, including Authorization, Content-Type, User-Agent, and so on.

Example

request:
  headers:
    Authorization: Bearer $REGEX
    Host: api.semgrep.dev
    User-Agent: Semgrep
  method: GET
  url: https://api.semgrep.dev/user

response

The response key is used to determine the validation state. It accepts a list of objects with the subkeys match and result.

Key | Required | Description
match | Yes | Defines the list of match conditions.
result | Yes | Defines the outcome when the match conditions are met. Its validity subkey accepts valid or invalid.

Subkeys for match

The match key accepts a list of objects. No specific key is required, but at least one key must be present.

Key | Description
status-code | The HTTP status code that Semgrep Secrets expects in the response for the condition to match.
content | The response body. You can inspect it for a specific value to determine validity, which is useful when valid and invalid responses return the same status code.
headers | Accepts a list of objects with the keys name and value. Both must be exact values.
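
The headers key is described only in the table, so the following is a minimal sketch rather than a confirmed sample. It assumes that headers sits alongside status-code in the same match entry and that each list item takes one name and one value. The Content-Type header and its value are illustrative and do not come from a real service.

```yaml
response:
- match:
  # Both conditions are listed in the same match entry.
  - status-code: 200
    headers:
    # Assumed shape: one object per header, compared as exact values.
    - name: Content-Type
      value: application/json
  result:
    validity: valid
```

Because header values must be exact, a response that sends application/json; charset=utf-8 would presumably not satisfy this condition.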

Subkeys for result

Key | Required | Description
validity | Yes | Sets the validity reported when the match conditions are met. Accepted values: valid and invalid.
message | No | Overrides the rule message based on the secret's validity state.
metadata | No | Overrides existing metadata fields or adds new metadata fields based on the secret's validity state.
severity | No | Overrides the existing rule severity based on the validity state.
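
As a sketch of how the optional subkeys combine, the fragment below overrides the severity, message, and metadata for a secret that the service rejects. It assumes that INFO is an accepted severity here, as it is for Semgrep rules in general. The message text and the revoked metadata field are hypothetical.

```yaml
response:
- match:
  - status-code: 401
  result:
    validity: invalid
    # Optional overrides, applied only when this entry matches.
    severity: INFO
    message: >-
      The service rejected this token, so it appears to be inactive.
    metadata:
      # Hypothetical metadata field added for illustration.
      revoked: true
```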

Subkeys for content

Key | Required | Description
language | Yes | Indicates the pattern language to use; this must be regex or generic.
pattern-regex | Yes | Defines the regex used to search the response body. Alternatively, you can use the patterns key and define patterns as you would for rules.

Example

response:
- match:
  - status-code: 200
  - content:
      language: regex
      pattern-regex: (\"ok\":true)
    status-code: 200
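
The content key matters most when a service answers with the same status code for both active and inactive secrets. The sketch below assumes a hypothetical service that always returns 200 and reports the outcome in the body as "ok":true or "ok":false. It also assumes that content and status-code can share one match entry.

```yaml
response:
# Status 200 with "ok":true in the body: the secret is active.
- match:
  - status-code: 200
    content:
      language: regex
      pattern-regex: (\"ok\":true)
  result:
    validity: valid
# Status 200 with "ok":false in the body: the secret was rejected.
- match:
  - status-code: 200
    content:
      language: regex
      pattern-regex: (\"ok\":false)
  result:
    validity: invalid
```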

Sample rules with validators

Sample POST request
rules:
- id: exampleCo_example
  message: >-
    This is an example rule that performs validation against semgrep.dev
  severity: WARNING
  metadata:
    product: secrets
    secret_type: exampleCo
  languages:
  - regex
  validators:
  - http:
      request:
        headers:
          Host: api.semgrep.dev
          User-Agent: Semgrep
        method: POST
        body: |
          {"key": "$REGEX"}
        url: https://api.semgrep.dev/user
      response:
      - match:
        - status-code: 200
        result:
          validity: valid
      - match:
        - status-code: 401
        result:
          validity: invalid
  patterns:
  - patterns:
    - pattern-regex: (?<REGEX>\b(someprefix_someRegex[0-9A-Z]{32})\b)
    - focus-metavariable: $REGEX
    - metavariable-analysis:
        analyzer: entropy
        metavariable: $REGEX
All fields
rules:
- id: exampleCo_example
  message: >-
    This is an example rule that performs validation against semgrep.dev
  severity: WARNING
  metadata:
    product: secrets
    secret_type: exampleCo
  languages:
  - regex
  validators:
  - http:
      request:
        headers:
          Host: api.semgrep.dev
          User-Agent: Semgrep
        method: POST
        body: |
          {"key": "$REGEX"}
        url: https://api.semgrep.dev/user
      response:
      - match:
        - status-code: 200
        - content:
            language: regex
            pattern-regex: (\"role\":admin)
        result:
          validity: valid
          severity: ERROR
          message: >-
            The token exposed is for an admin user, and this should be fixed immediately!
            See https://howtorotate.com/docs/introduction/key-rotation-101/ on how to
            rotate secrets and https://blog.gitguardian.com/what-to-do-if-you-expose-a-secret/
            on how to look for suspicious activity.
          metadata:
            context:
            - admin: true
      - match:
        - status-code: 200
        result:
          validity: invalid
  patterns:
  - patterns:
    - pattern-regex: (?<REGEX>\b(someprefix_someRegex[0-9A-Z]{32})\b)
    - focus-metavariable: $REGEX
    - metavariable-analysis:
        analyzer: entropy
        metavariable: $REGEX

Base64 encoding

You can Base64-encode a value with the __semgrep_internal_encode_64(...) utility. Base64 encoding can be applied to the following fields:

  • url
  • body
  • header values
note

The Base64 encoding of fields is experimental and can change at any time.

Sample Semgrep rule with validator using Base64 encoding
rules:
- id: exampleCo_example
  message: >-
    This is an example rule that performs validation against semgrep.dev
  severity: WARNING
  metadata:
    product: secrets
    secret_type: exampleCo
  languages:
  - regex
  validators:
  - http:
      request:
        headers:
          Authorization: Basic __semgrep_internal_encode_64($REGEX:)
          Host: api.semgrep.dev
          User-Agent: Semgrep
        method: GET
        url: https://api.semgrep.dev/user
      response:
      - match:
        - status-code: 200
        result:
          validity: valid
      - match:
        - status-code: 401
        result:
          validity: invalid
  patterns:
  - patterns:
    - pattern-regex: (?<REGEX>\b(someprefix_someRegex[0-9A-Z]{32})\b)
    - focus-metavariable: $REGEX
    - metavariable-analysis:
        analyzer: entropy
        metavariable: $REGEX
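
To make the encoding step concrete: the Authorization header in the sample above follows the HTTP Basic authentication scheme, in which the Base64-encoded credentials take the form user:password. The trailing colon inside the parentheses leaves the password empty. The annotated fragment below is a sketch that assumes Semgrep substitutes $REGEX before encoding. It uses a hypothetical matched value of abc for brevity; a real match would follow the rule's pattern.

```yaml
request:
  headers:
    # Hypothetical match: $REGEX = abc
    # Text passed to the encoder: "abc:" (secret as the user name, empty password)
    # Base64 of "abc:" is "YWJjOg==", so the header sent would be:
    #   Authorization: Basic YWJjOg==
    Authorization: Basic __semgrep_internal_encode_64($REGEX:)
```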