
The best free, open-source supply-chain security tool? The lockfile

Lockfiles: the best investment you can make for supply chain security


tl;dr: Lockfiles often protect you from malicious new versions of dependencies. When something bad happens, they let you determine exactly which systems were affected and when, which is critical during incident response. This post discusses "why lockfiles" and the details of setting them up properly across ~9 different package managers.

It's wonderful to write a few lines of code and then shortcut the next million lines by depending on code written by thousands of other developers.

But there is a cost: trusting thousands of other developers. Sometimes this goes wrong. The security implications of this trust are generally known as "supply chain security." And since the median seniority of a developer is dropping as the number of new developers grows, future developers will be reusing more code written by increasingly junior developers.

A prerequisite to getting a handle on supply chain problems is the lockfile, which reduces the surface area of dependency code by specifying exact dependency versions and content.


What is a lockfile?

Dependency manifest

Before explaining lockfiles, let's look at the dependency manifest. Most package managers have a manifest file that specifies dependencies, usually as a tuple of (package, version or range). Often it allows specifying a range of acceptable versions, typically using a SemVer expression. You've probably seen this file (package.json, requirements.txt, pom.xml) before, but here's an example snippet from a Python Pipfile manifest:

    [packages]
    click = "~=8.0.1"

It includes a package (click) and a version range that is considered acceptable (any 8.0.x release at or above 8.0.1).
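
To make the range concrete, here is a minimal sketch of how a "~=" specifier could be evaluated. It only handles plain numeric versions, and the function names are illustrative rather than part of pipenv or any other tool:

```python
def parse_version(text):
    """Turn '8.0.3' into the tuple (8, 0, 3). Only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def matches_compatible_release(candidate, spec):
    """Return True if `candidate` satisfies a '~=X.Y.Z' style specifier."""
    if not spec.startswith("~="):
        raise ValueError("only '~=' specifiers are handled in this sketch")
    base = parse_version(spec[2:])
    if len(base) < 2:
        raise ValueError("'~=' needs at least two version components")
    version = parse_version(candidate)
    # Everything except the last component must match, and the candidate
    # must not be older than the version named in the specifier.
    same_prefix = version[: len(base) - 1] == base[: len(base) - 1]
    return same_prefix and version >= base


for candidate in ["8.0.0", "8.0.1", "8.0.3", "8.1.0"]:
    print(candidate, matches_compatible_release(candidate, "~=8.0.1"))
```

Only 8.0.1 and 8.0.3 match here. Any of the matching versions could be what actually gets installed, which is exactly the ambiguity a lockfile removes.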

Lockfile

The lockfile is a "compiled" version of a dependency manifest. It specifies the exact version of every dependency installed. A good lockfile format recursively specifies all dependencies of dependencies. Some lockfiles also specify the set of allowed SHA hashes for the dependency binary or source (see later in the post for which lockfiles support this extra level of specificity).

For example, in Python, the corresponding lockfile entry in Pipfile.lock might look like:

    "click": {
        "hashes": ["sha256:353f466495adaeb40b6b5f592f9f91cb22372351c84caeb068132442a4518ef3", "sha256:410e932b050f5eed773c4cda94de75971c89cdb3155a72a0831139a79e5ecb5b"],
        "index": "pypi",
        "version": "==8.0.3"
    },
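
Because the lockfile is plain JSON, it doubles as a machine-readable inventory. The sketch below assumes the usual Pipfile.lock layout with "default" and "develop" sections. The embedded lockfile text is an illustrative stand-in with placeholder hashes, and pinned_versions is a hypothetical helper, not a pipenv API:

```python
import json

# Illustrative lockfile contents. The hash values are shortened placeholders,
# not real digests.
LOCKFILE_TEXT = """
{
  "default": {
    "click": {
      "hashes": ["sha256:aaaa", "sha256:bbbb"],
      "index": "pypi",
      "version": "==8.0.3"
    }
  },
  "develop": {}
}
"""


def pinned_versions(lockfile_text):
    """Return {package: exact_version} for every locked package with a version."""
    data = json.loads(lockfile_text)
    pins = {}
    for section in ("default", "develop"):
        for name, entry in data.get(section, {}).items():
            version = entry.get("version")
            if version is None:
                # Entries locked some other way (no "version" key) are skipped here.
                continue
            # Strip the leading "==" so only the bare version remains.
            pins[name] = version[2:] if version.startswith("==") else version
    return pins


print(pinned_versions(LOCKFILE_TEXT))
```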

Why are lockfiles critical for supply chain security?

The most fundamental question for a supply chain is: what's in it? If you can't answer the question of what code you depend on, you can’t reason about risk inside it.

Without a lockfile, you don't know:

  • Which versions of a dependency were actually installed

  • Where they were installed

  • At what time a dependency version or content changed

Will knowing these prevent you from getting hacked via a dependency? The preventative angle is limited, but there is one small benefit: trust on first use (TOFU). Once a version and its hash are locked, you are protected from a silent change to the package's contents at the same version.

The bigger benefit is responding to an event in the supply chain. With a content-hashed lockfile, you can reason about whether you were impacted by a vulnerability/malicious package as opposed to having to guess based on a version or range specifier, because the build environment is deterministic and reproducible.

Imagine you're trying to tell whether you were affected by a vulnerability in lodash 4.17.2. You open package.json which says lodash is version "*". You have no way of knowing whether you were ever exposed, when you were exposed, or on which systems.
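
With a lockfile in source control, the same question has a mechanical answer. A minimal sketch, assuming you have already collected the locked lodash version at each commit that changed the lockfile. The LOCK_HISTORY data, its dates, and the function name are hypothetical:

```python
from datetime import date

# Hypothetical history: the lodash version recorded in the lockfile at each
# commit that changed it, for example gathered from source control history.
LOCK_HISTORY = [
    (date(2021, 3, 1), "4.17.1"),
    (date(2021, 6, 15), "4.17.2"),
    (date(2021, 9, 30), "4.17.21"),
]


def exposure_windows(history, bad_version, today):
    """Return (start, end) date pairs during which bad_version was locked."""
    windows = []
    ordered = sorted(history)
    for index, (start, version) in enumerate(ordered):
        if version != bad_version:
            continue
        # The window ends when the next lockfile change landed, or today
        # if the bad version is still the locked one.
        end = ordered[index + 1][0] if index + 1 < len(ordered) else today
        windows.append((start, end))
    return windows


print(exposure_windows(LOCK_HISTORY, "4.17.2", date(2022, 1, 1)))
```

With this made-up history, the affected version was locked from 2021-06-15 until 2021-09-30. A manifest that says "*" cannot give you that answer.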

A lockfile guarantees which version of the dependency you install. But not all package registries are immutable: what if a developer changed the package that was listed as "1.0"? Well-designed lockfiles specify not just the exact version but also the precise content of the package, by listing an acceptable hash for the package file. So you are guaranteed to get the same code regardless of which machine you use or when you install it.
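
The content check itself is simple. Here is a rough sketch of the idea, not how any particular package manager implements it: hash the downloaded file and refuse it unless the digest is in the lockfile's allowed list. The byte strings stand in for package files, and the function names are made up for illustration:

```python
import hashlib


def sha256_label(content):
    """Format a digest the way hash-locking lockfiles commonly do: 'sha256:<hex>'."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def verify_artifact(content, allowed_hashes):
    """Return True only if the content's digest is one of the locked hashes."""
    return sha256_label(content) in allowed_hashes


# Stand-ins for a package file at lock time and a silently changed re-upload.
original = b"print('hello from version 1.0')\n"
tampered = original + b"import os  # something unexpected\n"

# What the lockfile would have recorded when the dependency was first locked.
allowed = [sha256_label(original)]

print(verify_artifact(original, allowed))  # True
print(verify_artifact(tampered, allowed))  # False
```

The version string never changes in this example, which is why locking the version alone would not catch the swap.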

Of course much more goes into supply chain security: knowing whether the package has vulnerabilities, knowing who the authors are, etc. But the lockfile provides a powerful determinism guarantee for free!

* Note that for this to work, you have to check your lockfile into source control and actually enforce it at install time; see later in this post for details.

What are the arguments against lockfiles?

  • If you use lockfiles, you'll be stuck on old versions!

    The lockfile does create friction; you are no longer running with possibly wildly different versions of your dependency across every developer's machine and CI job.

    But modern package managers provide a command that lets you update the lockfile in a single step (see below for the specific commands). I think this complaint reflects the lack of a thoughtful upgrade process rather than a fault of the determinism one gets from lockfiles.

  • You always just want to run the latest version anyways; that's the most secure, so you don't need a lockfile

    If you really do want to always run the latest version, a lockfile is still a good idea. Imagine you pin to a wildcard somepackage=* which always picks the latest version at install time. You'll still need a separate process to re-run package installation every time the package updates. Otherwise your local machine will still have whatever was latest when you installed a month ago, while a new build server will pull whatever is latest when it fetches dependencies. If you need an update step anyways, just use a lockfile as part of that update process and enjoy the benefits.

  • Lockfiles don't have any benefit if you aren't reading the original source code or trusting the developer

    Dependency locking is orthogonal to dependency verification, and in fact they are complementary. Check out the OWASP guidelines (written for npm, but they apply to most package managers as well).

What languages/package managers support lockfiles?

Unfortunately not every package manager supports lockfiles. And of those that do, some of them are missing crucial features like content-hash locking. Probably in 10 years this will become part of some boring compliance standard, but for now it's not great. In the table below I linked to tickets for the package managers that are missing those capabilities.

As of January 2022, here's what I was able to find:

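Language | Package manager | Lock transitive dep versions | Lock versions | Lock hashes | Lock local or source-only deps by hash
Go | Go modules | | | |
Python | pipenv | | | | ?
JavaScript | npm | | | | ?
JavaScript | yarn | | | | ?
C# | nuget | | | | ?
PHP | composer | | | yes | ?
Java | gradle | | | | no
OCaml | opam | | | no | no
Ruby | bundler | | optional | no | no
Swift | SPM | | no? | no | no
C/C++ | ha! | no | no | no | no
C++ | conan | | unclear | no | no
Perl | Cpan | ? | no | no | no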

Notes: There are some open-source tools that can help find all dependencies even if you don't have a lockfile, through dynamic or static analysis (for example, It-Depends from Trail of Bits). But it's much better for the package manager to support it natively.

And please be polite if you comment on the issue trackers! Remember: these are OSS, community projects.

I'm convinced! How do I get started using lockfiles?

  • Create the lockfile

  • Check the lockfile into source control: see Yarn's explanation for why you should definitely check in lockfiles

  • Make sure your installs enforce the lockfile at install time: many base commands will update the lockfile from the manifest and then install, rather than relying only on the lockfile. I wrote a semgrep rule for pip, yarn, and npm that you can use as a simplistic grep to find scripts, Dockerfiles, etc. where install commands may be run without enforcing the lockfile.

Language | Package manager | Create the lockfile | Check in the lockfile | Install using the lockfile
Go | Go modules | (built in) | git add go.sum | (built in)
Python | pipenv | pipenv lock | git add Pipfile.lock | pipenv install --ignore-pipfile
JavaScript | npm | npm i --package-lock-only | git add package-lock.json | npm ci
JavaScript | yarn | yarn lock | git add yarn.lock | yarn install --frozen-lockfile
C# | nuget | Set RestorePackagesWithLockFile property in project file | git add packages.lock.json | nuget restore -LockedMode, or dotnet.exe restore --locked-mode
PHP | composer | composer update | git add composer.lock | composer install
Java | gradle | edit build.gradle, then: gradle dependencies --write-locks; gradle --write-verification-metadata sha256 | find . -name 'gradle.lockfile' -exec git add {} +; git add verification-metadata.xml | (built in)
Ruby | bundler | bundle install [no way to only make lockfile] | git add Gemfile.lock | bundle install
OCaml | opam | opam lock | git add opam.locked | opam install opam.locked
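
To spot install commands that skip the lockfile, even a crude text scan helps. The sketch below is a rough Python stand-in for that idea, not the semgrep rule mentioned above. It covers only the npm, yarn, and pipenv rows of the table, and the CHECKS list and function name are assumptions made for illustration:

```python
import re

# Each entry: a pattern for an install command, plus the substrings that make
# the command acceptable (it enforces or only writes the lockfile).
CHECKS = [
    (re.compile(r"\bnpm\s+(install|i)\b"), ("--package-lock-only",)),  # prefer `npm ci`
    (re.compile(r"\byarn\s+install\b"), ("--frozen-lockfile",)),
    (re.compile(r"\bpipenv\s+install\b"), ("--ignore-pipfile",)),
]


def unenforced_installs(script_text):
    """Return (line_number, line) pairs for installs that may ignore the lockfile."""
    findings = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        for pattern, ok_markers in CHECKS:
            if pattern.search(line) and not any(marker in line for marker in ok_markers):
                findings.append((number, line.strip()))
                break
    return findings


# A made-up Dockerfile fragment to scan.
SAMPLE = """\
RUN npm install
RUN npm ci
RUN yarn install --frozen-lockfile
RUN pipenv install
RUN pipenv install --ignore-pipfile
"""

for number, line in unenforced_installs(SAMPLE):
    print(f"line {number}: {line}")
```

On the sample it flags lines 1 and 4. A plain substring scan like this will produce false positives (comments, documentation), so treat hits as leads to review rather than verdicts.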

Thanks to Margaret Fero (Latacora), Alex Useche (Trail of Bits), Greg Guthe (Figma), Dev Akhawe (Figma), and r2c reviewers Pablo Estrada, Clint Gibler, Emma Jin, and Bence Nagy for reviewing drafts of this post!

References & notes

  • It's interesting to observe that Go is far ahead of the rest in terms of maturity of these features. A great talk from Eric Brewer explains some of the philosophy around managing 3rd-party dependency risk at Google.

  • Here is a much deeper dive than you probably wanted about dependencies and reproducibility from the perspective of designing a package manager.
