Semgrep with self-hosted Ubuntu runners in Azure Pipelines
Semgrep provides a sample configuration for Azure-hosted runners. If you use self-hosted Ubuntu Linux runners, you have significantly more control over their configuration, but as a result, they require additional preparation and configuration to run Semgrep.
This guide describes two approaches to configuring self-hosted runners that use Ubuntu (the default self-hosted option for Azure DevOps Linux runners): using pipx, and using pip with a virtual environment.
Using pipx
While the sample configuration uses pip, this approach uses pipx, which avoids issues with system-managed Python vs user-installed Python.
Prepare your runner
Access the runner and execute the following commands:
$ sudo apt update
$ sudo apt install pipx
$ pipx ensurepath
After completing the commands:
- Start a new shell session, so that the changes from pipx ensurepath are available.
- Ensure the Azure DevOps agent is set up and running.
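To confirm that the PATH change from pipx ensurepath took effect, a quick check along these lines may help. This is a minimal sketch, not part of the Semgrep sample: path_contains is a hypothetical helper, and the check assumes pipx uses its default binary directory, ~/.local/bin.

```shell
# Hypothetical helper: succeeds if directory $1 is an entry in the
# colon-separated list $2 (defaults to the current PATH).
path_contains() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: pipx installs executables into its default location, ~/.local/bin.
if path_contains "${HOME}/.local/bin"; then
    echo "pipx bin directory is on PATH"
else
    echo "pipx bin directory is not on PATH; start a new shell session"
fi
```

Run the check as the same user that runs the Azure DevOps agent, since pipx ensurepath changes the PATH for the current user only.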
Create your configuration
- Follow the steps provided in the sample configuration for Azure-hosted runners.
- Add the following snippet to the azure-pipelines.yml for the repository.
variables:
  # Variable group that provides SEMGREP_APP_TOKEN
  - group: Semgrep_Variables

pool:
  # Name of the self-hosted agent pool
  name: Default

steps:
  - checkout: self
    clean: true
    fetchDepth: 20
    persistCredentials: true
  - script: |
      pipx install semgrep
      # Full scan on the default branch
      if [ $(Build.SourceBranchName) = "master" ]; then
        echo "Semgrep full scan"
        semgrep ci
      # Diff-aware scan on pull requests, compared against the default branch
      elif [ $(System.PullRequest.PullRequestId) -ge 0 ]; then
        echo "Semgrep diff scan"
        git fetch origin master:origin/master
        export SEMGREP_PR_ID=$(System.PullRequest.PullRequestId)
        export SEMGREP_BASELINE_REF='origin/master'
        semgrep ci
      fi
    env:
      SEMGREP_APP_TOKEN: $(SEMGREP_APP_TOKEN)
- If your self-hosted runner agent pool has a different name, update the name key under pool to match the desired agent pool.
- If your default branch is not called master, update the references to master to match the name of your default branch.
Set environment variables in Azure Pipelines
At minimum, Semgrep requires the SEMGREP_APP_TOKEN variable in order to report results to the platform; other variables may be helpful as well. To set these variables in Azure Pipelines:
- Set up a variable group called Semgrep_Variables.
- Set SEMGREP_APP_TOKEN in the variable group, following the steps for secret variables. The variable is mapped into the env in the provided config.
- Optional: Add the following environment variables to the group if you aren't seeing hyperlinks to the code that generated a finding, or if you are not receiving PR or MR comments. Review the use of these variables at Environment variables for creating hyperlinks in Semgrep AppSec Platform. These variables are not sensitive and do not need to be secret variables.
SEMGREP_REPO_NAME
SEMGREP_REPO_URL
SEMGREP_BRANCH
SEMGREP_COMMIT
SEMGREP_JOB_URL
- Set variables for diff-aware scanning. The provided config sets SEMGREP_PR_ID to the system variable System.PullRequest.PullRequestId and SEMGREP_BASELINE_REF to origin/master within the script section of the config. The value of SEMGREP_BASELINE_REF is typically your trunk or default branch, such as main or master, so if you use a different branch than master, update the name accordingly.
  - If you prefer not to implement diff-aware scanning, you can skip setting these variables and remove the elif section of the script step.
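As an alternative to storing static values for the optional variables in the variable group, they could be derived inside the script step from the environment variables that Azure Pipelines exposes to scripts (predefined variables are upper-cased, with dots replaced by underscores). The following is a minimal sketch under stated assumptions: export_semgrep_link_vars is a hypothetical helper, the mapping is not taken from the Semgrep sample, and the job URL format is an assumption that may need adjusting for your organization. Note that for pull request builds, the source branch name may not be the name of the PR's source branch.

```shell
# Hypothetical helper: export the optional Semgrep variables from the
# environment variables that Azure Pipelines sets for script steps.
# Assumption: this mapping and the job URL layout are illustrative only.
export_semgrep_link_vars() {
    export SEMGREP_REPO_NAME="${BUILD_REPOSITORY_NAME:-}"
    export SEMGREP_REPO_URL="${BUILD_REPOSITORY_URI:-}"
    export SEMGREP_BRANCH="${BUILD_SOURCEBRANCHNAME:-}"
    export SEMGREP_COMMIT="${BUILD_SOURCEVERSION:-}"
    # The collection URI is expected to end with a trailing slash.
    export SEMGREP_JOB_URL="${SYSTEM_TEAMFOUNDATIONCOLLECTIONURI:-}${SYSTEM_TEAMPROJECT:-}/_build/results?buildId=${BUILD_BUILDID:-}"
}

# Usage inside the script step, before running the scan:
#   export_semgrep_link_vars
#   semgrep ci
```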
Using pip with a virtual environment
Prepare your runner
This approach uses built-in Azure DevOps tasks, including UsePythonVersion and Bash, and installs Semgrep with pip inside a virtual environment, another approach that prevents issues with system-managed Python vs user-installed Python.
- Ensure you have a pre-installed and configured compatible version of Python 3, following the instructions for UsePythonVersion for self-hosted runners.
- Ensure the Azure DevOps agent is set up and running.
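To check which Python 3 version is available, a comparison along these lines may help. This is a minimal sketch: version_at_least is a hypothetical helper, and the MIN_PYTHON value is a placeholder assumption, so check Semgrep's current requirements for the actual minimum. Run directly on the runner, it reports the python3 found on the PATH, which may differ from the version that UsePythonVersion selects during a pipeline run.

```shell
# Assumption: placeholder minimum version, not taken from the Semgrep docs.
MIN_PYTHON="3.9"

# Hypothetical helper: succeeds if version $1 (for example "3.12.4") is at
# least the major.minor version $2 (for example "3.9").
version_at_least() {
    have_major="${1%%.*}"
    have_rest="${1#*.}"
    have_minor="${have_rest%%.*}"
    want_major="${2%%.*}"
    want_rest="${2#*.}"
    want_minor="${want_rest%%.*}"
    if [ "$have_major" -ne "$want_major" ]; then
        [ "$have_major" -gt "$want_major" ]
    else
        [ "$have_minor" -ge "$want_minor" ]
    fi
}

if command -v python3 >/dev/null 2>&1; then
    found="$(python3 -c 'import platform; print(platform.python_version())')"
    if version_at_least "$found" "$MIN_PYTHON"; then
        echo "python3 $found meets the assumed minimum $MIN_PYTHON"
    else
        echo "python3 $found is older than the assumed minimum $MIN_PYTHON"
    fi
else
    echo "python3 was not found on PATH"
fi
```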
Create your configuration
Add the following snippet to the azure-pipelines.yml for the repository.
variables:
  # Variable group that provides SEMGREP_APP_TOKEN
  - group: Semgrep_Variables

pool:
  # Name of the self-hosted agent pool
  name: Default

steps:
  - checkout: self
    clean: true
    persistCredentials: true
  - task: UsePythonVersion@0
    displayName: 'Use Python 3.12'
    inputs:
      # Quoted so that YAML reads the version as a string, not a number
      versionSpec: '3.12'
  - task: Bash@3
    env:
      SEMGREP_APP_TOKEN: $(SEMGREP_APP_TOKEN)
    inputs:
      targetType: 'inline'
      script: |
        # Install Semgrep into a virtual environment
        python3 -m venv .venv
        source .venv/bin/activate
        python3 -m pip install --upgrade pip
        pip install semgrep
        # Full scan on the default branch
        if [ $(Build.SourceBranchName) = "master" ]; then
          export SEMGREP_BRANCH=$(Build.SourceBranchName)
          echo "Semgrep full scan of master"
          semgrep ci
        # Diff-aware scan on pull requests, compared against the default branch
        elif [ $(System.PullRequest.PullRequestId) -ge 0 ]; then
          echo "Semgrep diff scan"
          git fetch origin master:origin/master
          export SEMGREP_PR_ID=$(System.PullRequest.PullRequestId)
          export SEMGREP_BASELINE_REF='origin/master'
          semgrep ci
        fi
- If your self-hosted runner agent pool has a different name, update the name key under pool to match the desired agent pool.
- If your default branch is not called master, update the references to master to match the name of your default branch.
Set environment variables in Azure Pipelines
At minimum, Semgrep requires the SEMGREP_APP_TOKEN variable in order to report results to the platform; other variables may be helpful as well. To set these variables in Azure Pipelines:
- Set up a variable group called Semgrep_Variables.
- Set SEMGREP_APP_TOKEN in the variable group, following the steps for secret variables. The variable is mapped into the env in the provided config.
- Optional: Add the following environment variables to the group if you aren't seeing hyperlinks to the code that generated a finding, or if you are not receiving PR or MR comments. Review the use of these variables at Environment variables for creating hyperlinks in Semgrep AppSec Platform. These variables are not sensitive and do not need to be secret variables.
SEMGREP_REPO_NAME
SEMGREP_REPO_URL
SEMGREP_BRANCH
SEMGREP_COMMIT
SEMGREP_JOB_URL
- Set variables for diff-aware scanning. The provided config sets SEMGREP_PR_ID to the system variable System.PullRequest.PullRequestId and SEMGREP_BASELINE_REF to origin/master within the script section of the config. The value of SEMGREP_BASELINE_REF is typically your trunk or default branch, such as main or master, so if you use a different branch than master, update the name accordingly.
  - If you prefer not to implement diff-aware scanning, you can skip setting these variables and remove the elif section of the script step.