Generating Python lockfiles for Semgrep Supply Chain scans
To correctly scan all dependencies in a project, Semgrep Supply Chain requires a Python lockfile. This article describes methods to generate the following Python lockfiles:
- requirements.txt, including those in a requirements folder, such as **/requirements/*.txt
- requirements.pip
- requirement.txt, including those in a requirement folder, such as **/requirement/*.txt
- requirement.pip
- Pipfile.lock
- poetry.lock
You can use any of these lockfiles to get a successful Semgrep Supply Chain scan. Your lockfiles must have one of these names to be scanned.
Generating requirements.txt
Using requirements.in
Prerequisites:
- A requirements.in file that lists your project's direct Python dependencies. Do not include transitive dependencies in requirements.in.
- pip-tools must be installed on your machine. See the pip-tools GitHub repository for installation instructions.
To generate a requirements.txt file from requirements.in, enter the following command in the root of your project directory:
pip-compile -o requirements.txt
You have now generated a requirements.txt file with direct and transitive dependencies that Semgrep Supply Chain can scan.
Example of requirements.txt generated from requirements.in
Consider the Binder examples project. Its requirements.in file contains the following direct dependencies:
numpy
matplotlib==3.*
seaborn==0.10.1
pandas
Running pip-compile -o requirements.txt generates the following requirements.txt:
#
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
#    pip-compile --output-file=requirements.txt
#
contourpy==1.0.7
    # via matplotlib
cycler==0.11.0
    # via matplotlib
fonttools==4.39.4
    # via matplotlib
kiwisolver==1.4.4
    # via matplotlib
matplotlib==3.7.1
    # via
    #   -r requirements.in
    #   seaborn
numpy==1.24.3
    # via
    #   -r requirements.in
    #   contourpy
    #   matplotlib
    #   pandas
    #   scipy
    #   seaborn
packaging==23.1
    # via matplotlib
pandas==2.0.2
    # via
    #   -r requirements.in
    #   seaborn
pillow==9.5.0
    # via matplotlib
pyparsing==3.0.9
    # via matplotlib
python-dateutil==2.8.2
    # via
    #   matplotlib
    #   pandas
pytz==2023.3
    # via pandas
scipy==1.10.1
    # via seaborn
seaborn==0.10.1
    # via -r requirements.in
six==1.16.0
    # via python-dateutil
tzdata==2023.3
    # via pandas
This file has all direct and transitive dependencies of the example project and can be used by Semgrep as an entry point for the supply chain scan.
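pip-compile writes "# via" annotations under each pinned package. Those annotations are what separate direct dependencies from transitive ones. The following Python sketch parses such a file and splits its packages into two groups. Packages pulled in by "-r requirements.in" count as direct, and everything else counts as transitive. The sketch assumes the annotation layout shown above. The names parse_pip_compile and split_direct_transitive are hypothetical helpers for illustration. They are not part of pip-tools or Semgrep.

```python
import re
from typing import Dict, List, Tuple

# A pinned requirement line such as "numpy==1.24.3" (markers or "\" after the version are ignored).
PIN_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([^\s;\\]+)")


def parse_pip_compile(text: str) -> Dict[str, dict]:
    """Map each pinned package to its version and the sources listed under "# via"."""
    packages: Dict[str, dict] = {}
    current = None   # package whose "# via" comments are being read
    in_via = False   # True inside a multi-line "# via" block
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = PIN_RE.match(line)
        if match:
            current = match.group(1).lower()
            packages[current] = {"version": match.group(2), "via": []}
            in_via = False
            continue
        # Ignore header comments before the first package and any non-comment lines.
        if current is None or not line.startswith("#"):
            continue
        comment = line.lstrip("#").strip()
        if comment == "via":
            in_via = True  # sources follow on the next comment lines
        elif comment.startswith("via "):
            packages[current]["via"].append(comment[4:].strip())
            in_via = False
        elif in_via and comment:
            packages[current]["via"].append(comment)
    return packages


def split_direct_transitive(packages: Dict[str, dict]) -> Tuple[List[str], List[str]]:
    """Direct packages are those pulled in via "-r <file>"; everything else is transitive."""
    direct = sorted(
        name for name, info in packages.items()
        if any(src.startswith("-r ") for src in info["via"])
    )
    transitive = sorted(name for name in packages if name not in direct)
    return direct, transitive
```

Applied to the Binder example above, split_direct_transitive reports matplotlib, numpy, pandas, and seaborn as direct. These are exactly the four entries in requirements.in. Every other pinned package is reported as transitive.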
Using pip freeze
Prerequisites:
- The pip freeze utility generates requirements.txt from the packages already installed in your current environment, so you must run it in an isolated or virtual environment.
- An existing setup.py file.
To generate requirements.txt through pip freeze, enter the following commands:
pip3 install .
pip3 freeze --all | tee requirements.txt
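pip freeze reports everything installed for the active interpreter. Run outside a virtual environment, it therefore captures unrelated system packages. The sketch below adds a guard against that. It is a hypothetical helper, not part of pip or Semgrep.

- It refuses to freeze unless sys.prefix differs from sys.base_prefix. That is the usual check for environments created with the venv module. It does not detect Conda environments.
- It runs pip through sys.executable, so the output reflects the same interpreter that runs the script.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def in_virtualenv() -> bool:
    """True inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


def freeze_requirements(output: Path, allow_global: bool = False) -> List[str]:
    """Write `pip freeze --all` for the current interpreter to `output` and return the lines."""
    if not in_virtualenv() and not allow_global:
        raise RuntimeError(
            "Not in a virtual environment; pip freeze would capture system packages."
        )
    # Run pip as a module of this interpreter so the freeze matches the active environment.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze", "--all"],
        capture_output=True,
        text=True,
        check=True,
    )
    lines = [line for line in result.stdout.splitlines() if line.strip()]
    output.write_text("\n".join(lines) + "\n")
    return lines
```

Call freeze_requirements(Path("requirements.txt")) after pip3 install . has finished in the activated environment. The allow_global flag exists only for cases where a global freeze is intended.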
Example CI configuration
The following GitHub Actions workflow is an example of generating requirements.txt in a CI environment with pip-compile, as described in Using requirements.in.
The workflow has two jobs:
- my_first_job: generates requirements.txt and uploads it as an artifact.
- my_second_job: downloads the artifact and scans it with Semgrep.
on:
  pull_request: {}
  workflow_dispatch: {}
  push:
    branches:
      - master
    paths:
      - .github/workflows/semgrep.yml
  schedule:
    - cron: '0 1 * * 0'
name: Semgrep
jobs:
  my_first_job:
    name: requirementsGeneration
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.x'
      - name: Generate requirements txt
        run: |
          pip3 install pip-tools
          pip-compile -o requirements.txt
      - name: Upload Requirements File as Artifact
        uses: actions/upload-artifact@v4
        with:
          name: requirementstxt
          path: requirements.txt
  my_second_job:
    needs: my_first_job
    name: Scan
    runs-on: ubuntu-latest
    env:
      SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
      - name: Download artifact from previous job
        uses: actions/download-artifact@v4
        with:
          name: requirementstxt
      - run: semgrep ci --supply-chain
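Supply chain analysis generally relies on the exact versions recorded in a lockfile. A requirements.txt that still contains loose specifiers, such as pandas>=2.0 or a bare package name, gives the scan less to match against. To make my_first_job fail fast on such a file, you could run a small check like the following after pip-compile. This is a hypothetical helper, not something Semgrep or pip-tools provides. It flags any requirement line without an exact name==version pin, and it skips comments and pip option lines such as -r or --hash.

```python
import re
import sys
from pathlib import Path
from typing import List

# Name, optional [extras], then an exact "==" pin; markers or "\" after the version are ignored.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[^\s;\\]+")


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that are not pinned to an exact version."""
    problems: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments (including "# via"), and pip options such as -r or --hash.
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


def main(path: str = "requirements.txt") -> int:
    """Print unpinned lines to stderr; return 1 if any were found, else 0."""
    problems = unpinned_requirements(Path(path).read_text())
    for line in problems:
        print(f"not pinned to an exact version: {line}", file=sys.stderr)
    return 1 if problems else 0
```

You could save this as check_pins.py, a hypothetical file name. A run step in my_first_job could then call it with: python3 -c "import sys, check_pins; sys.exit(check_pins.main())". Output from pip freeze needs extra care. After pip3 install ., pip freeze lists the project itself in name @ file://... form, and this check would flag that line.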
Generating Pipfile.lock
Prerequisites:
- An existing Pipfile. Depending on your development environment, a Pipfile may already have been generated for you.
Example of Pipfile
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"
[packages]
flasgger = "==0.9.5"
flask = "==2.2.2"
flask-cors = "==3.0.10"
marshmallow = "==3.18.0"
requests = "==2.25.1"
sqlalchemy = "==1.4.41"
waitress = "==2.1.2"
psycopg2 = "==2.9.5"
defusedxml = "==0.7.1"
[dev-packages]
[requires]
python_version = "3.9"
Generating a Pipfile.lock
Generate a Pipfile.lock with the following commands:
pip install pipenv --user
pipenv lock
The newly generated Pipfile.lock is a JSON file that lists all Python dependencies, direct and transitive, along with their SHA-256 hashes. The beginning of the file may look something like this:
{
    "_meta": {
        "hash": {
            "sha256": "af0d5c3f87bd23f340a214b12ad766ca83aead0c462aa08dbc4f012ac2796708"
        },
        "pipfile-spec": 6,
        "requires": {
            "python_version": "3.9"
        },
        "sources": [
            {
                "name": "pypi",
                "url": "https://pypi.org/simple",
                "verify_ssl": true
            }
        ]
    },
    "default": {
        "attrs": {
            "hashes": [
                "sha256:1f28b4522cdc2fb4256ac1a020c78acf9cba2c6b461ccd2c126f3aa8e8335d04",
                "sha256:6279836d581513a26f1bf235f9acd333bc9115683f14f7e8fae46c98fc50e015"
            ],
            "markers": "python_version >= '3.7'",
            "version": "==23.1.0"
        },
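Because Pipfile.lock is plain JSON, you can inspect its pins with the standard library. The sketch below is a hypothetical helper.

- It reads the default section, which holds [packages].
- It optionally also reads develop, where Pipenv records [dev-packages].
- It strips the leading == from each version string.
- Entries without a version key, such as some VCS or path dependencies, come back as an empty string.

```python
import json
from typing import Dict


def locked_versions(lock_text: str, include_dev: bool = False) -> Dict[str, str]:
    """Return {package: version} from the contents of a Pipfile.lock."""
    data = json.loads(lock_text)
    # "default" holds [packages]; "develop" holds [dev-packages].
    sections = ["default", "develop"] if include_dev else ["default"]
    versions: Dict[str, str] = {}
    for section in sections:
        for name, entry in data.get(section, {}).items():
            version = entry.get("version", "")
            # Pins are stored as "==X.Y.Z"; drop the operator to keep just the version.
            versions[name] = version[2:] if version.startswith("==") else version
    return versions
```

Passing it the contents of the Pipfile.lock shown above would yield an entry mapping attrs to 23.1.0, alongside the other locked packages.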
Generating poetry.lock
Poetry is a tool for dependency management and packaging in Python.
Prerequisites:
- A pyproject.toml file.
Example pyproject.toml
[build-system]
requires = ["poetry-core>=1.1.0"]
build-backend = "poetry.core.masonry.api"
[tool.poetry]
name = "example-project"
version = "1.0.0"
description = "An example project"
authors = ["Your Name <yourname@example.com>"]
[tool.poetry.dependencies]
python = "^3.9"
requests = "^2.25.1"
numpy = "^1.21.0"
[tool.poetry.dev-dependencies]
pytest = "^6.2.4"
flake8 = "^3.9.2"
Generating a poetry.lock
Generate a poetry.lock file with the following command:
poetry lock
The generated poetry.lock file contains all direct and transitive dependencies that the project uses.
Selecting a single lockfile among many
A repository may already contain a lockfile, such as a Pipfile.lock. You may still want to generate a new one, such as a requirements.txt, to make sure it reflects the latest dependencies.
When scanning with Semgrep Supply Chain, you can use the --include flag to scan only a single lockfile. The lockfile must still have one of the supported names.
semgrep ci --supply-chain --include=requirements.txt
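Before adding --include, it can help to know which candidate lockfiles a repository actually contains. The following Python sketch walks a directory tree and reports files whose names match the list at the top of this article. It assumes Poetry's default lockfile name, poetry.lock. It is an illustrative approximation of that list, not Semgrep's own file-matching logic. Skipping .git and common virtual-environment folders is a choice of this sketch.

```python
import os
from pathlib import Path
from typing import List

# File names listed in this article as Python lockfiles that Semgrep Supply Chain scans.
LOCKFILE_NAMES = {
    "requirements.txt", "requirements.pip",
    "requirement.txt", "requirement.pip",
    "Pipfile.lock", "poetry.lock",
}
# Folders whose *.txt files also count, e.g. **/requirements/*.txt.
REQUIREMENTS_DIRS = {"requirements", "requirement"}
# Directories this sketch does not descend into.
SKIP_DIRS = {".git", ".venv", "venv", "node_modules"}


def is_python_lockfile(path: Path) -> bool:
    """True if the file name, or its .txt-in-requirements-folder location, matches the list."""
    if path.name in LOCKFILE_NAMES:
        return True
    return path.suffix == ".txt" and path.parent.name in REQUIREMENTS_DIRS


def find_python_lockfiles(root: str) -> List[str]:
    """Return sorted POSIX-style paths, relative to root, of candidate lockfiles."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not enter them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = Path(dirpath, name)
            if is_python_lockfile(path):
                found.append(path.relative_to(root).as_posix())
    return sorted(found)
```

Running find_python_lockfiles(".") at the repository root returns relative paths, such as requirements.txt or requirements/dev.txt, that you could pass to --include.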
Conclusions
There are several ways to generate lockfiles for Python dependencies; choose the one that fits your workflow. Generate the lockfile before the Semgrep scan, inside the project's own environment, ideally a virtual one. This ensures that you scan only your project's dependencies, not every Python package installed on your system.