Generating Python lockfiles for Semgrep Supply Chain scans

To correctly scan all dependencies in a project, Semgrep Supply Chain requires a Python lockfile. This article describes methods to generate the following Python lockfiles:

  • requirements.txt
  • Pipfile.lock
  • poetry.lock

You can use any of these three lockfiles to get a successful Semgrep Supply Chain scan.

Generating requirements.txt

Using requirements.in

Prerequisites
  • A requirements.in file that lists only the direct dependencies of your project. Do not include transitive dependencies in requirements.in.
  • pip-tools must be installed on your machine. See the pip-tools GitHub repository for installation instructions.

To generate a requirements.txt file from requirements.in, enter the following command in the root of your project directory:

pip-compile -o requirements.txt

Now, you have successfully generated a requirements.txt file with direct and transitive dependencies that Semgrep Supply Chain can scan.

Example of requirements.txt generated from requirements.in

In the Binder examples project, for example, the requirements.in file contains the following direct dependencies:

numpy
matplotlib==3.*
seaborn==0.10.1
pandas

Running the command pip-compile -o requirements.txt generates the following requirements.txt:

#
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
#    pip-compile --output-file=requirements.txt
#
contourpy==1.0.7
    # via matplotlib
cycler==0.11.0
    # via matplotlib
fonttools==4.39.4
    # via matplotlib
kiwisolver==1.4.4
    # via matplotlib
matplotlib==3.7.1
    # via
    #   -r requirements.in
    #   seaborn
numpy==1.24.3
    # via
    #   -r requirements.in
    #   contourpy
    #   matplotlib
    #   pandas
    #   scipy
    #   seaborn
packaging==23.1
    # via matplotlib
pandas==2.0.2
    # via
    #   -r requirements.in
    #   seaborn
pillow==9.5.0
    # via matplotlib
pyparsing==3.0.9
    # via matplotlib
python-dateutil==2.8.2
    # via
    #   matplotlib
    #   pandas
pytz==2023.3
    # via pandas
scipy==1.10.1
    # via seaborn
seaborn==0.10.1
    # via -r requirements.in
six==1.16.0
    # via python-dateutil
tzdata==2023.3
    # via pandas

This file has all direct and transitive dependencies of the example project and can be used by Semgrep as an entry point for the supply chain scan.
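
A lockfile like the one above pins every entry to an exact version with ==. As a quick sanity check before scanning, the following minimal sketch flags lines that are not pinned. The helper name unpinned_requirements is hypothetical (it is not part of pip-tools or Semgrep), and the regular expression is a simplification that does not cover every requirement form, such as URL-based requirements.

```python
import re

# Matches a fully pinned requirement such as "numpy==1.24.3".
# Extras ("pkg[extra]==1.0") are tolerated; wildcards ("==3.*") are rejected.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^\s;*]+(?=\s|;|$)"
)


def unpinned_requirements(text: str) -> list[str]:
    """Return the requirement lines that are not pinned to an exact version."""
    problems = []
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # skip blanks, comments, and options such as "-r" or "--hash"
        if not PINNED.match(line):
            problems.append(line)
    return problems
```

To check a real file, pass its contents, for example unpinned_requirements(Path("requirements.txt").read_text()) with pathlib.Path imported. An empty list suggests that every entry is pinned.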

Using pip freeze

Prerequisites
  • An isolated or virtual environment. The pip freeze utility generates requirements.txt from the packages installed in your current environment, so any unrelated package installed there ends up in the lockfile.
  • An existing setup.py file.

To generate requirements.txt through pip freeze, enter the following commands:

pip3 install .
pip freeze --all | tee requirements.txt
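
Because pip freeze lists whatever is installed in the current environment, it can help to confirm that you are inside a virtual environment first. The following is a minimal sketch of such a check. The function name in_virtualenv is hypothetical, and the check assumes an environment created with venv or a recent virtualenv; it does not detect conda environments.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv or virtualenv."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


# Report which interpreter pip freeze would inspect.
kind = "virtual environment" if in_virtualenv() else "system interpreter"
print(f"{kind}: {sys.prefix}")
```

If the script reports the system interpreter, create and activate a virtual environment before running pip3 install . and pip freeze.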

Example CI configuration

The following GitHub Actions workflow shows how to generate requirements.txt in a CI environment with one of the preceding methods.

In the following example there are two jobs:

  • my_first_job: Generating requirements.txt and uploading it as an artifact
  • my_second_job: Downloading the artifact and scanning it with Semgrep

on:
  pull_request: {}
  workflow_dispatch: {}
  push:
    branches:
      - master
    paths:
      - .github/workflows/semgrep.yml
  schedule:
    - cron: '0 1 * * 0'
name: Semgrep
jobs:
  my_first_job:
    name: requirementsGeneration
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Generate requirements txt
        run: |
          pip3 install pip-tools
          pip-compile -o requirements.txt
      - name: Upload Requirements File as Artifact
        uses: actions/upload-artifact@v3
        with:
          name: requirementstxt
          path: requirements.txt
  my_second_job:
    needs: my_first_job
    name: Scan
    runs-on: ubuntu-20.04
    env:
      SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
      - name: Download artifact from previous job
        uses: actions/download-artifact@v3
        with:
          name: requirementstxt
      - run: semgrep ci --supply-chain

Generating Pipfile.lock

Prerequisite

An existing Pipfile. Depending on your development environment, a Pipfile may already be automatically generated for you.

Example of Pipfile

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
flasgger = "==0.9.5"
flask = "==2.2.2"
flask-cors = "==3.0.10"
marshmallow = "==3.18.0"
requests = "==2.25.1"
sqlalchemy = "==1.4.41"
waitress = "==2.1.2"
psycopg2 = "==2.9.5"
defusedxml = "==0.7.1"

[dev-packages]

[requires]
python_version = "3.9"

Generating a Pipfile.lock

Generate a Pipfile.lock with the following commands:

pip install pipenv --user
pipenv lock

The newly generated Pipfile.lock is a JSON file that lists all Python dependencies (direct and transitive) together with their SHA-256 hashes.

The beginning of the file may look something like this:

{
    "_meta": {
        "hash": {
            "sha256": "af0d5c3f87bd23f340a214b12ad766ca83aead0c462aa08dbc4f012ac2796708"
        },
        "pipfile-spec": 6,
        "requires": {
            "python_version": "3.9"
        },
        "sources": [
            {
                "name": "pypi",
                "url": "https://pypi.org/simple",
                "verify_ssl": true
            }
        ]
    },
    "default": {
        "attrs": {
            "hashes": [
                "sha256:1f28b4522cdc2fb4256ac1a020c78acf9cba2c6b461ccd2c126f3aa8e8335d04",
                "sha256:6279836d581513a26f1bf235f9acd333bc9115683f14f7e8fae46c98fc50e015"
            ],
            "markers": "python_version >= '3.7'",
            "version": "==23.1.0"
        },
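
Since Pipfile.lock is plain JSON, you can inspect which packages it pins before running a scan. The following is a minimal sketch that maps each package to its pinned version. The function name locked_packages is hypothetical, and the sketch assumes the usual "default" and "develop" sections of the file.

```python
import json


def locked_packages(lock_text: str) -> dict[str, str]:
    """Map each package name to its pinned version in a Pipfile.lock."""
    lock = json.loads(lock_text)
    packages = {}
    for section in ("default", "develop"):
        for name, info in lock.get(section, {}).items():
            # Entries installed from VCS or a local path may lack "version".
            version = info.get("version", "<no version>")
            packages[name] = version.lstrip("=")  # "==23.1.0" -> "23.1.0"
    return packages
```

To use it on a real file, pass the file contents, for example locked_packages(Path("Pipfile.lock").read_text()) with pathlib.Path imported.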

Generating poetry.lock

Poetry is a tool for dependency management and packaging in Python.

Prerequisite

A pyproject.toml file.

Example pyproject.toml

[build-system]
requires = ["poetry-core>=1.1.0"]
build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "example-project"
version = "1.0.0"
description = "An example project"
authors = ["Your Name <yourname@example.com>"]

[tool.poetry.dependencies]
python = "^3.9"
requests = "^2.25.1"
numpy = "^1.21.0"

[tool.poetry.dev-dependencies]
pytest = "^6.2.4"
flake8 = "^3.9.2"

Generating a poetry.lock

Generate a poetry.lock file with the following command:

poetry lock

The generated poetry.lock file contains all direct and transitive dependencies that the project uses.

Selecting a single lockfile among many

While there may already be a lockfile in the repository, such as a Pipfile.lock, you may want to generate a new one, for example a requirements.txt, to be sure it has the latest dependencies.

In Semgrep, you can use the flag --include to specify only one lockfile:

semgrep --supply-chain --include=requirements.txt

Alternatively, your repository may already have a requirements.txt file, but you want to generate a fresh and updated version.

However, generating a new requirements.txt file and running the previous command may result in the following error:

[ERROR] Found pending changes in tracked files. Baseline scans runs require a clean git state.

This happens because the newly generated requirements.txt differs from the previously committed requirements.txt, so Git reports pending changes in a tracked file.

One solution is to generate the new requirements.txt file in a different folder and then specifically include it in the Semgrep scan:

semgrep --supply-chain --include=ssc/requirements.txt
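
The two steps of this workaround can be sketched as a small script that builds both commands. This is an illustration under assumptions: the function name scan_commands is hypothetical, the ssc folder is assumed to exist already, the lockfile is assumed to come from pip-compile with requirements.in as its input, and the Semgrep command simply mirrors the one shown above.

```python
import shlex
from pathlib import Path


def scan_commands(out_dir: str = "ssc") -> list[list[str]]:
    """Build the commands that write a fresh lockfile to out_dir and scan it."""
    lockfile = (Path(out_dir) / "requirements.txt").as_posix()
    return [
        # Write the lockfile outside the tracked location to keep Git clean.
        ["pip-compile", "requirements.in", "-o", lockfile],
        # Restrict the Supply Chain scan to the freshly generated lockfile.
        ["semgrep", "--supply-chain", f"--include={lockfile}"],
    ]


# Print the commands; they are not executed here.
for command in scan_commands():
    print(shlex.join(command))
```

To execute the commands instead of printing them, pass each list to subprocess.run(command, check=True).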

Conclusions

There are several ways to generate lockfiles for Python dependencies, and you can choose the one that best fits your tooling. Generate the lockfile before the Semgrep scan and within an isolated environment. This ensures that you scan only the dependencies of your project and not all the Python packages installed on your system.