
Semgrep with self-hosted Ubuntu runners in Azure Pipelines

Semgrep provides a sample configuration for Azure-hosted runners. If you use self-hosted Ubuntu Linux runners, you have significantly more control over their configuration, but as a result, they require additional preparation and configuration to run Semgrep.

This guide describes two approaches to configuring self-hosted runners that use Ubuntu (the default self-hosted option for Azure DevOps Linux runners): using pipx, and using pip with a virtual environment.

Using pipx

While the sample configuration uses pip, this approach uses pipx, which installs Semgrep into its own isolated environment and so avoids conflicts between system-managed Python and user-installed Python.

Prepare your runner

Access the runner and execute the following commands:

$ sudo apt update
$ sudo apt install pipx
$ pipx ensurepath

After completing the commands:

  1. Start a new shell session, so that the PATH changes made by pipx ensurepath take effect.
  2. Ensure the Azure DevOps agent is set up and running.
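One way to confirm that the new shell session picked up the change is to check whether the pipx application directory appears on PATH. The following is a minimal sketch, not part of the official setup: the helper name path_contains is hypothetical, and the sketch assumes pipx uses its default application directory of ~/.local/bin unless PIPX_BIN_DIR is set.

```shell
# Return 0 if directory $1 appears as an entry in the PATH-style string $2.
# (path_contains is a hypothetical helper name used for illustration.)
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: pipx places application shims in ~/.local/bin by default,
# unless PIPX_BIN_DIR points somewhere else.
pipx_bin_dir="${PIPX_BIN_DIR:-$HOME/.local/bin}"

if path_contains "$pipx_bin_dir" "$PATH"; then
  echo "pipx bin directory is on PATH: $pipx_bin_dir"
else
  echo "pipx bin directory is missing from PATH: $pipx_bin_dir"
fi
```

Run the check as the same user that runs the Azure DevOps agent, since PATH can differ between users.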

Create your configuration

  1. Follow the steps provided in the sample configuration for Azure-hosted runners.
  2. Add the following snippet to the azure-pipelines.yml for the repository.

variables:
- group: Semgrep_Variables

pool:
  name: Default

steps:
- checkout: self
  clean: true
  fetchDepth: 20
  persistCredentials: true
- script: |
    pipx install semgrep
    if [ $(Build.SourceBranchName) = "master" ]; then
        echo "Semgrep full scan"
        semgrep ci
    elif [ $(System.PullRequest.PullRequestId) -ge 0 ]; then
        echo "Semgrep diff scan"
        git fetch origin master:origin/master
        export SEMGREP_PR_ID=$(System.PullRequest.PullRequestId)
        export SEMGREP_BASELINE_REF='origin/master'
        semgrep ci
    fi
  env:
    SEMGREP_APP_TOKEN: $(SEMGREP_APP_TOKEN)

Customizing the configuration
  • If your self-hosted runner agent pool has a different name, update the name key under pool to match the desired agent pool.
  • If your default branch is not called master, update the references to master to match the name of your default branch.
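When changing the default branch name, it can help to see the branching logic of the script step in isolation. The sketch below restates that logic as a small function that takes the default branch as a parameter. The function name semgrep_scan_mode and its "full", "diff", and "skip" outputs are hypothetical, chosen for illustration; the sketch also treats an empty or non-numeric pull request ID as "no pull request", which is an assumption about how you want non-PR builds on other branches handled.

```shell
# Decide which scan to run from the branch name and the pull request ID.
# Prints "full", "diff", or "skip". (Hypothetical helper for illustration.)
#   $1: source branch name
#   $2: pull request ID (may be empty)
#   $3: default branch name (optional, falls back to master)
semgrep_scan_mode() {
  branch="$1"
  pr_id="$2"
  default_branch="${3:-master}"

  # Builds of the default branch get a full scan, as in the config.
  if [ "$branch" = "$default_branch" ]; then
    echo "full"
    return 0
  fi

  # Anything that is empty or contains a non-digit is not a usable PR ID.
  case "$pr_id" in
    ''|*[!0-9]*) echo "skip" ;;
    *) echo "diff" ;;
  esac
}
```

Inside the script step, this could be called as semgrep_scan_mode "$(Build.SourceBranchName)" "${SYSTEM_PULLREQUEST_PULLREQUESTID:-}" main, relying on Azure Pipelines exposing predefined variables to scripts as upper-case environment variables with dots replaced by underscores.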

Set environment variables in Azure Pipelines

Semgrep minimally requires the variable SEMGREP_APP_TOKEN in order to report results to the platform, and other variables may be helpful as well. To set these variables in Azure Pipelines:

  1. Set up a variable group called Semgrep_Variables.
  2. Set SEMGREP_APP_TOKEN in the variable group, following the steps for secret variables. The variable is mapped into the env in the provided config.
  3. Optional: Add the following environment variables to the group if you aren't seeing hyperlinks to the code that generated a finding, or if you are not receiving PR or MR comments. Review the use of these variables at Environment variables for creating hyperlinks in Semgrep AppSec Platform. These variables are not sensitive and do not need to be secret variables.
    • SEMGREP_REPO_NAME
    • SEMGREP_REPO_URL
    • SEMGREP_BRANCH
    • SEMGREP_COMMIT
    • SEMGREP_JOB_URL
  4. Set variables for diff-aware scanning. The provided config sets SEMGREP_PR_ID to the system variable System.PullRequest.PullRequestId and SEMGREP_BASELINE_REF to origin/master within the script section of the config. The value of SEMGREP_BASELINE_REF is typically your trunk or default branch, such as main or master, so if you use a branch other than master, update the name accordingly.
    • If you prefer not to implement diff-aware scanning, you can skip setting these variables and remove the elif section of the script step.
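A common failure mode is a token that never reaches the script, for example because the variable group is not linked or the variable name is misspelled. In that case Azure Pipelines leaves the macro text $(SEMGREP_APP_TOKEN) unreplaced rather than substituting an empty value. The following sketch shows one way to detect both cases before running a scan; the function name token_is_set is hypothetical, and the check only looks at the shape of the value, not whether the token is valid.

```shell
# Return 0 if the value looks like a real token, 1 if it is empty or is
# still an unexpanded Azure Pipelines macro such as $(SEMGREP_APP_TOKEN).
# (token_is_set is a hypothetical helper name used for illustration.)
token_is_set() {
  case "$1" in
    '') return 1 ;;          # not set at all
    '$('*')') return 1 ;;    # macro text was left unreplaced
    *) return 0 ;;
  esac
}

if token_is_set "${SEMGREP_APP_TOKEN:-}"; then
  echo "SEMGREP_APP_TOKEN is available to the script"
else
  echo "SEMGREP_APP_TOKEN is missing; check the Semgrep_Variables group" >&2
fi
```

Placed before semgrep ci in the script step, a check like this can turn a confusing authentication error into a clearer message.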

Using pip with a virtual environment

Prepare your runner

This approach uses built-in Azure DevOps tasks, including UsePythonVersion and Bash, and installs Semgrep with pip inside a virtual environment, another approach that prevents issues with system-managed Python vs user-installed Python.

  1. Ensure you have a pre-installed and configured compatible version of Python 3, following the instructions for UsePythonVersion for self-hosted runners.
  2. Ensure the Azure DevOps agent is set up and running.
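Before wiring up the pipeline, you may want to check on the runner that a suitable Python is present and can create virtual environments. The sketch below is one possible check, not an official prerequisite test. The function name version_at_least is hypothetical, the 3.12 threshold simply mirrors the versionSpec used in the configuration in this section, and the ensurepip import is used as a rough proxy for whether python3 -m venv can bootstrap pip. Note that it inspects the python3 found on PATH, which may differ from the interpreter that UsePythonVersion selects from the agent's tool cache.

```shell
# Return 0 if version string $1 (for example "3.12.1") is at least
# major version $2 and minor version $3. Expects a "major.minor" prefix.
# (version_at_least is a hypothetical helper name used for illustration.)
version_at_least() {
  major="${1%%.*}"
  rest="${1#*.}"
  minor="${rest%%.*}"
  if [ "$major" -gt "$2" ]; then
    return 0
  fi
  [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]
}

if command -v python3 >/dev/null 2>&1; then
  found="$(python3 -c 'import platform; print(platform.python_version())')"
  # Assumption: a missing ensurepip module means venv cannot install pip.
  if version_at_least "$found" 3 12 && python3 -c 'import ensurepip' 2>/dev/null; then
    echo "python3 $found can create virtual environments"
  else
    echo "python3 $found is older than 3.12 or cannot bootstrap pip in a venv"
  fi
else
  echo "python3 was not found on PATH"
fi
```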

Create your configuration

Add the following snippet to the azure-pipelines.yml for the repository.

variables:
- group: Semgrep_Variables

pool:
  name: Default

steps:
- checkout: self
  clean: true
  persistCredentials: true
- task: UsePythonVersion@0
  displayName: 'Use Python 3.12'
  inputs:
    versionSpec: 3.12
- task: Bash@3
  env:
    SEMGREP_APP_TOKEN: $(SEMGREP_APP_TOKEN)
  inputs:
    targetType: 'inline'
    script: |
      python3 -m venv .venv
      source .venv/bin/activate
      python3 -m pip install --upgrade pip
      pip install semgrep

      if [ $(Build.SourceBranchName) = "master" ]; then
          export SEMGREP_BRANCH=$(Build.SourceBranchName)
          echo "Semgrep full scan of master"
          semgrep ci
      elif [ $(System.PullRequest.PullRequestId) -ge 0 ]; then
          echo "Semgrep diff scan"
          git fetch origin master:origin/master
          export SEMGREP_PR_ID=$(System.PullRequest.PullRequestId)
          export SEMGREP_BASELINE_REF='origin/master'
          semgrep ci
      fi

Customizing the configuration
  • If your self-hosted runner agent pool has a different name, update the name key under pool to match the desired agent pool.
  • If your default branch is not called master, update the references to master to match the name of your default branch.

Set environment variables in Azure Pipelines

Semgrep minimally requires the variable SEMGREP_APP_TOKEN in order to report results to the platform, and other variables may be helpful as well. To set these variables in Azure Pipelines:

  1. Set up a variable group called Semgrep_Variables.
  2. Set SEMGREP_APP_TOKEN in the variable group, following the steps for secret variables. The variable is mapped into the env in the provided config.
  3. Optional: Add the following environment variables to the group if you aren't seeing hyperlinks to the code that generated a finding, or if you are not receiving PR or MR comments. Review the use of these variables at Environment variables for creating hyperlinks in Semgrep AppSec Platform. These variables are not sensitive and do not need to be secret variables.
    • SEMGREP_REPO_NAME
    • SEMGREP_REPO_URL
    • SEMGREP_BRANCH
    • SEMGREP_COMMIT
    • SEMGREP_JOB_URL
  4. Set variables for diff-aware scanning. The provided config sets SEMGREP_PR_ID to the system variable System.PullRequest.PullRequestId and SEMGREP_BASELINE_REF to origin/master within the script section of the config. The value of SEMGREP_BASELINE_REF is typically your trunk or default branch, such as main or master, so if you use a branch other than master, update the name accordingly.
    • If you prefer not to implement diff-aware scanning, you can skip setting these variables and remove the elif section of the script step.
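As an alternative to entering the optional hyperlink variables by hand, they could be derived inside the script from Azure DevOps predefined variables, which are exposed to scripts as upper-case environment variables. The sketch below is an assumption-heavy illustration: the mapping from each predefined variable to each SEMGREP_ variable is a guess at a reasonable default rather than something this guide specifies, the azure_job_url helper is hypothetical, and the build results URL pattern is the one commonly seen in Azure DevOps. Check the values against the environment variable reference before relying on them.

```shell
# Build a link to the pipeline run from the collection URI, project name,
# and build ID. (azure_job_url is a hypothetical helper for illustration.)
azure_job_url() {
  collection_uri="${1%/}"   # strip one trailing slash, if present
  printf '%s/%s/_build/results?buildId=%s\n' "$collection_uri" "$2" "$3"
}

# Only derive values when running inside Azure Pipelines, and keep any
# value that was already provided, for example through the variable group.
if [ -n "${BUILD_BUILDID:-}" ]; then
  export SEMGREP_REPO_NAME="${SEMGREP_REPO_NAME:-${BUILD_REPOSITORY_NAME:-}}"
  export SEMGREP_REPO_URL="${SEMGREP_REPO_URL:-${BUILD_REPOSITORY_URI:-}}"
  export SEMGREP_BRANCH="${SEMGREP_BRANCH:-${BUILD_SOURCEBRANCHNAME:-}}"
  export SEMGREP_COMMIT="${SEMGREP_COMMIT:-${BUILD_SOURCEVERSION:-}}"
  if [ -z "${SEMGREP_JOB_URL:-}" ]; then
    SEMGREP_JOB_URL="$(azure_job_url \
      "${SYSTEM_TEAMFOUNDATIONCOLLECTIONURI:-}" \
      "${SYSTEM_TEAMPROJECT:-}" \
      "${BUILD_BUILDID:-}")"
  fi
  export SEMGREP_JOB_URL
fi
```

These lines would go before semgrep ci in the script step. Project names that contain spaces would need URL encoding, which this sketch does not attempt.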
