tl;dr: Lockfiles often protect you from malicious new versions of dependencies. When something bad happens, they let you know exactly which systems were affected and when, which is critical during incident response. This post discusses "why lockfiles" and the details of setting them up properly across ~9 different package managers.
It's wonderful to write a few lines of code and then shortcut the next million lines by depending on code written by thousands of other developers.
But there is a cost: trusting thousands of other developers. Sometimes this goes wrong. The security implications of this trust are generally known as "supply chain security." And since the median seniority of a developer is dropping as the number of new developers grows, future developers will be reusing more code written by increasingly junior developers.
Lockfiles are a prerequisite to getting a handle on supply chain problems: they reduce the surface area of dependency code by specifying exact dependency versions and content.
A quick outline of this post:
What is a lockfile?
Why are lockfiles critical for supply chain security?
What are the arguments against lockfiles?
What languages/package managers support lockfiles?
I'm convinced! How do I get started using lockfiles?
What is a lockfile?
Dependency manifest
Before explaining lockfiles, let's look at the dependency manifest. Most package managers have a manifest file that lists dependencies, usually as tuples of (package, version or range). The version is often a range of acceptable versions, typically written as a SemVer expression. You've probably seen such a file (package.json, requirements.txt, pom.xml) before; here's an example snippet from a Python Pipfile manifest:
[packages]
click = "~=8.0.1"
It includes a package (click) and a version range that is considered acceptable (any 8.0.x release at or above 8.0.1).
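To make that range concrete, here is a minimal sketch of how a `~=` ("compatible release") specifier can be evaluated, assuming plain numeric versions only. Real PEP 440 versions can also carry pre-release, post-release, and local segments that this sketch does not handle, and the function name `compatible_release` is hypothetical.

```python
def _parse(version: str) -> tuple[int, ...]:
    """Split a plain numeric version like "8.0.3" into (8, 0, 3)."""
    return tuple(int(part) for part in version.split("."))


def _pad(parts: tuple[int, ...], length: int) -> tuple[int, ...]:
    """Right-pad with zeros so that 8.0 compares equal to 8.0.0."""
    return parts + (0,) * (length - len(parts))


def compatible_release(spec: str, candidate: str) -> bool:
    """Return True if `candidate` satisfies `~=spec` (numeric versions only).

    ~=8.0.1 means ">=8.0.1 and ==8.0.*": the last segment may grow,
    everything before it must stay the same.
    """
    base = _parse(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two release segments, e.g. ~=8.0")
    cand = _parse(candidate)
    width = max(len(base), len(cand))
    prefix = base[:-1]
    cand_padded = _pad(cand, width)
    return cand_padded >= _pad(base, width) and cand_padded[: len(prefix)] == prefix


for version in ["8.0.0", "8.0.1", "8.0.3", "8.1.0", "9.0.0"]:
    print(version, compatible_release("8.0.1", version))
```

Only 8.0.1 and 8.0.3 print True here. Every version that satisfies the range is a legal install for this manifest, which is why the manifest alone can't tell you what was actually installed.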
Lockfile
The lockfile is a "compiled" version of a dependency manifest. It specifies the exact version of every dependency installed. A good lockfile format recursively specifies all dependencies of dependencies. Some lockfiles also specify the set of allowed SHA hashes for the dependency binary or source (see later in the post for which lockfiles support this extra level of specificity).
For example, in Python, the corresponding lockfile entry in Pipfile.lock might look like:
"click": {
    "hashes": [
        "sha256:353f466495adaeb40b6b5f592f9f91cb22372351c84caeb068132442a4518ef3",
        "sha256:410e932b050f5eed773c4cda94de75971c89cdb3155a72a0831139a79e5ecb5b"
    ],
    "index": "pypi",
    "version": "==8.0.3"
},
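Because Pipfile.lock is plain JSON, the question "what exactly is installed?" reduces to reading it. The sketch below assumes the `default`/`develop` section layout that pipenv uses; `lock_inventory` is a hypothetical helper, and the hashes in the example are truncated placeholders rather than real digests.

```python
import json


def lock_inventory(lock_text: str) -> list[tuple[str, str, str, int]]:
    """Return (section, package, pinned version, number of accepted hashes)
    for every entry in a Pipfile.lock's "default" and "develop" sections."""
    lock = json.loads(lock_text)
    rows = []
    for section in ("default", "develop"):
        for name, entry in sorted(lock.get(section, {}).items()):
            # Some entries (e.g. VCS or local-path dependencies) may have no "version".
            rows.append((section, name, entry.get("version", "?"), len(entry.get("hashes", []))))
    return rows


example = json.dumps({
    "_meta": {},
    "default": {
        "click": {
            "hashes": ["sha256:353f4664...", "sha256:410e932b..."],  # placeholders, truncated
            "index": "pypi",
            "version": "==8.0.3",
        }
    },
    "develop": {},
})

for row in lock_inventory(example):
    print(row)  # ('default', 'click', '==8.0.3', 2)
```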
Why are lockfiles critical for supply chain security?
The most fundamental question for a supply chain is: what's in it? If you can't answer the question of what code you depend on, you can’t reason about risk inside it.
Without a lockfile, you don't know:
Which versions of a dependency were actually installed
Where they were installed
At what time a dependency version or content changed
Will knowing these prevent you from getting hacked via a dependency? The preventative angle is limited, but there is one small benefit: trust on first use (TOFU). Once the lockfile records a package's content hash, you are protected from a silent update of that package under the same version number.
The bigger benefit is responding to an event in the supply chain. With a content-hashed lockfile, you can determine whether you were impacted by a vulnerable or malicious package instead of guessing from a version or range specifier, because the set of installed dependencies is deterministic and reproducible.
Imagine you're trying to tell whether you were affected by a vulnerability in lodash 4.17.2. You open package.json, which says lodash's version is "*". You have no way of knowing whether you were ever exposed, when you were exposed, or on which systems.
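With a lockfile checked into git, the same question becomes a lookup over history. The sketch below assumes npm's package-lock.json in lockfileVersion 2 or 3 format, where installs are listed under "packages" (older version-1 lockfiles nest entries under "dependencies" instead and are not handled). The helper names and revision IDs are hypothetical; in a real repository each snapshot could come from `git show <rev>:package-lock.json`.

```python
import json


def installed_versions(lock_text: str, package: str) -> set[str]:
    """Every version of `package` recorded in a package-lock.json.

    Assumes lockfileVersion 2 or 3, with keys such as "node_modules/lodash"
    or "node_modules/foo/node_modules/lodash" (a nested copy).
    """
    lock = json.loads(lock_text)
    versions = set()
    for path, meta in lock.get("packages", {}).items():
        if path.split("node_modules/")[-1] == package and "version" in meta:
            versions.add(meta["version"])
    return versions


def exposed_revisions(history: dict[str, str], package: str, bad_version: str) -> list[str]:
    """Revisions whose lockfile installed `bad_version` of `package` anywhere in the tree."""
    return [rev for rev, text in history.items() if bad_version in installed_versions(text, package)]


def snapshot(packages: dict) -> str:
    """Build a tiny lockfileVersion 3 document for the demo."""
    return json.dumps({"lockfileVersion": 3, "packages": {"": {"name": "app"}, **packages}})


# Hypothetical lockfile contents at three revisions.
history = {
    "rev1": snapshot({"node_modules/lodash": {"version": "4.17.1"}}),
    "rev2": snapshot({"node_modules/lodash": {"version": "4.17.2"}}),
    "rev3": snapshot({
        "node_modules/lodash": {"version": "4.17.21"},
        "node_modules/foo/node_modules/lodash": {"version": "4.17.2"},
    }),
}
print(exposed_revisions(history, "lodash", "4.17.2"))  # ['rev2', 'rev3']
```

Note that rev3 is flagged even though its top-level lodash is patched: a nested copy pulled in by another dependency still pins the vulnerable version, which a glance at package.json would never reveal.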
A lockfile guarantees which version of each dependency you install. But not all package registries are immutable: what if a developer changed the contents of the package already published as "1.0"? Well-designed lockfiles specify not just the exact version but also the precise content of the package, by recording an acceptable hash for the package file. So you are guaranteed to get the same code regardless of which machine you use or when you install it.
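The hash check itself is simple. Below is a minimal sketch of what a hash-enforcing installer does with a Pipfile.lock entry, assuming the artifact has already been downloaded as bytes. `verify_artifact` is hypothetical; real installers (for example pip's hash-checking mode, `--require-hashes`) perform this check for you.

```python
import hashlib
import json


def sha256_tag(data: bytes) -> str:
    """Format a digest the way Pipfile.lock records it: "sha256:<hex digest>"."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify_artifact(lock_text: str, package: str, artifact: bytes) -> bool:
    """Accept the downloaded artifact only if its hash is locked for `package`."""
    lock = json.loads(lock_text)
    entry = lock.get("default", {}).get(package) or lock.get("develop", {}).get(package)
    if entry is None:
        return False  # not in the lockfile at all: refuse rather than trust it
    return sha256_tag(artifact) in entry.get("hashes", [])


# Stand-in bytes for a real wheel; the lock entry is built from them for the demo.
original = b"click-8.0.3 wheel contents (stand-in)"
lock_text = json.dumps({
    "default": {"click": {"hashes": [sha256_tag(original)], "version": "==8.0.3"}},
    "develop": {},
})

print(verify_artifact(lock_text, "click", original))                   # True
print(verify_artifact(lock_text, "click", original + b" + backdoor"))  # False
```

The version string plays no part in the decision: a republished "8.0.3" with different bytes fails the check, which is exactly the protection a version-only lock cannot give.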
Of course, much more goes into supply chain security: knowing whether a package has vulnerabilities, knowing who its authors are, etc. But the lockfile provides a powerful determinism guarantee for free!
Note: for this to work, you have to check your lockfile into source control and actually enforce it at install time; see the end of this post for details.
What are the arguments against lockfiles?
If you use lockfiles, you'll be stuck on old versions!
The lockfile does create friction: versions no longer change on their own, so upgrading takes a deliberate step. That is also the point: you are no longer running possibly wildly different versions of a dependency across every developer's machine and CI job.
But modern package managers provide a command that updates the lockfile in a single step (see below for the specific commands). I think this complaint reflects the lack of a thoughtful upgrade process rather than a fault of the determinism that lockfiles provide.
You always just want to run the latest version anyway; that's the most secure, so you don't need a lockfile
If you really do want to always run the latest version, a lockfile is still a good idea. Imagine you pin to a wildcard, somepackage=*, which picks the latest version at install time. You'll still need a separate process to re-run package installation every time the package updates. Otherwise, your local machine keeps whatever was latest when you installed a month ago, while a new build server gets whatever is latest when it fetches dependencies. If you need an update step anyway, just use a lockfile as part of that update process and enjoy the benefits.
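The drift described above can be shown with a toy simulation. It assumes a hypothetical registry modeled as a list of (publish day, version) pairs; `resolve_wildcard` and `install` are illustrative names, not any real package manager's API.

```python
from typing import Optional

# Hypothetical publish history for "somepackage": (day published, version).
REGISTRY = [(0, "1.0.0"), (10, "1.1.0"), (40, "2.0.0")]


def _key(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def resolve_wildcard(day: int) -> str:
    """What "somepackage=*" resolves to on a given day: the newest version published so far."""
    available = [version for published, version in REGISTRY if published <= day]
    return max(available, key=_key)


def install(day: int, lockfile: Optional[dict] = None) -> str:
    """Install from the lockfile if there is one, otherwise resolve the wildcard."""
    if lockfile is not None:
        return lockfile["somepackage"]
    return resolve_wildcard(day)


laptop = install(day=15)        # "1.1.0"
build_server = install(day=45)  # "2.0.0": same manifest, different code
lock = {"somepackage": resolve_wildcard(15)}
print(laptop, build_server, install(day=45, lockfile=lock))  # 1.1.0 2.0.0 1.1.0
```

With the lock, upgrading becomes an explicit step: regenerate it (here, `resolve_wildcard(45)`), review the change, and commit it.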
Lockfiles don't have any benefit if you aren't reading the original source code or trusting the developer
Dependency locking is orthogonal to dependency verification; in fact, the two are complementary. Check out the OWASP guidelines (written for npm, but they apply to most package managers as well).
What languages/package managers support lockfiles?
Unfortunately, not every package manager supports lockfiles, and some of those that do are missing crucial features like content-hash locking. Probably in 10 years this will become part of some boring compliance standard, but for now the situation is not great. In the table below I linked to tickets for the package managers that are missing those capabilities.
As of January 2022, here's what I was able to find:
Language | Package manager | Lock transitive dep versions | Lock versions | Lock hashes | Lock local or source-only deps by hash |
---|---|---|---|---|---|
Go | Go modules | ✅ | ✅ | ✅ | ✅ |
Python | pipenv | ✅ | ✅ | ✅ | ? |
JavaScript | npm | ✅ | ✅ | ✅ | ? |
JavaScript | yarn | ✅ | ✅ | ✅ | ? |
C# | nuget | ✅ | ✅ | ✅ | ? |
PHP | composer | ✅ | ✅ | ✅ | yes? |
Java | gradle | ✅ | ✅ | ✅ | no |
OCaml | opam | ✅ | ✅ | no | no |
Ruby | bundler | ✅ | optional | no | no |
Swift | SPM | no | ? | no | no |
C/C++ | ha! | no | no | no | no |
C++ | conan | unclear | ✅ | no | no |
Perl | Cpan ? | no | ✅ | no | no |
Notes: There are some open-source tools that can help find all dependencies even if you don't have a lockfile, through dynamic or static analysis (for example, It-Depends from Trail of Bits). But it's much better for the package manager to support it natively.
And please be polite if you comment on the issue trackers! Remember: these are OSS, community projects.
I'm convinced! How do I get started using lockfiles?
Create the lockfile
Check the lockfile into source control (see Yarn's explanation of why you should definitely check in lockfiles)
Make sure your installs enforce the lockfile: many base install commands update the lockfile from the manifest and then install, rather than relying only on the lockfile. I wrote a semgrep rule for pip, yarn, and npm that you can use as a simplistic grep to find scripts, Dockerfiles, etc. where install commands may run without using a lockfile.
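The semgrep rule itself isn't reproduced here. As a rough stand-in, here is a hypothetical line-based checker in the same spirit: each regex flags an install command that may re-resolve dependencies instead of installing exactly what the lockfile says. The rules and advice strings are illustrative assumptions, not the semgrep rule's actual logic.

```python
import re

# Hypothetical, deliberately simplistic rules: a matching line runs an install
# that may re-resolve dependencies instead of installing from the lockfile.
RULES = [
    (re.compile(r"\bnpm\s+(?:install|i)\b(?!.*--package-lock-only)"),
     "npm install may rewrite package-lock.json; use npm ci"),
    (re.compile(r"\byarn\s+install\b(?!.*--frozen-lockfile)"),
     "add --frozen-lockfile"),
    (re.compile(r"\bpipenv\s+install\b(?!.*--ignore-pipfile)"),
     "add --ignore-pipfile"),
    (re.compile(r"\bpip3?\s+install\b(?!.*--require-hashes)"),
     "add --require-hashes and hash-pinned requirements"),
]


def find_unlocked_installs(text: str) -> list[tuple[int, str]]:
    """Return (line number, advice) for each suspicious install command in `text`."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, advice in RULES:
            if pattern.search(line):
                findings.append((lineno, advice))
    return findings


dockerfile = """\
RUN npm install
RUN npm ci
RUN yarn install --frozen-lockfile
RUN yarn install
RUN pipenv install --ignore-pipfile
RUN pipenv install
RUN pip install --require-hashes -r requirements.txt
RUN pip install flask
"""
for lineno, advice in find_unlocked_installs(dockerfile):
    print(f"line {lineno}: {advice}")
```

Like any grep, this has blind spots: it misses a bare `yarn` (which also installs) and commands split across continuation lines, so treat its output as a list of places to look rather than a verdict.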
Language | Package manager | Create the lockfile | Check in the lockfile | Install from the lockfile |
---|---|---|---|---|
Go | Go modules | (built in) | git add go.sum | (built in) |
Python | pipenv | pipenv lock | git add Pipfile.lock | pipenv install --ignore-pipfile |
JavaScript | npm | npm i --package-lock-only | git add package-lock.json | npm ci |
JavaScript | yarn | yarn lock | git add yarn.lock | yarn install --frozen-lockfile |
C# | nuget | Set the RestorePackagesWithLockFile property to true in the project file | git add packages.lock.json | nuget restore -LockedMode, or dotnet restore --locked-mode |
PHP | composer | composer update | git add composer.lock | composer install |
Java | gradle | edit build.gradle, then: | find . -name 'gradle.lockfile' \| xargs git add, then git add verification-metadata.xml | (built in) |
Ruby | bundler | bundle lock (or bundle install, which also installs) | git add Gemfile.lock | bundle install |
OCaml | opam | opam lock | git add opam.locked | opam install opam.locked |
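For the "create the lockfile" and "check it in" steps, a quick audit can catch manifests that have no lockfile next to them. This is a minimal sketch assuming the manifest-to-lockfile pairs from the table above; `missing_lockfiles` is a hypothetical helper. It only checks that a file exists on disk, not that it is committed (`git ls-files` could extend it for that), and it will misreport npm/yarn workspaces, which keep a single lockfile at the workspace root, and Go modules with no dependencies, which have no go.sum.

```python
from pathlib import Path

# Manifest -> lockfile(s) that should sit next to it, per the table above.
EXPECTED_LOCKFILES = {
    "package.json": ("package-lock.json", "yarn.lock"),
    "Pipfile": ("Pipfile.lock",),
    "composer.json": ("composer.lock",),
    "Gemfile": ("Gemfile.lock",),
    "go.mod": ("go.sum",),
}


def missing_lockfiles(root: Path) -> list[Path]:
    """Manifests under `root` with none of their expected lockfiles alongside them."""
    missing = []
    for manifest_name, lock_names in EXPECTED_LOCKFILES.items():
        for manifest in root.rglob(manifest_name):
            if "node_modules" in manifest.relative_to(root).parts:
                continue  # installed dependencies ship their own manifests; skip them
            if not any((manifest.parent / lock).exists() for lock in lock_names):
                missing.append(manifest)
    return sorted(missing)


if __name__ == "__main__":
    for path in missing_lockfiles(Path(".")):
        print(f"no lockfile next to {path}")
```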
Thanks to Margaret Fero (Latacora), Alex Useche (Trail of Bits), Greg Guthe (Figma), Dev Akhawe (Figma), and r2c reviewers Pablo Estrada, Clint Gibler, Emma Jin, and Bence Nagy for reviewing drafts of this post!
References & notes
It's interesting to observe that Go is far ahead of the rest in terms of maturity of these features. A great talk from Eric Brewer explains some of the philosophy around managing 3rd-party dependency risk at Google.
Here is a much deeper dive than you probably wanted about dependencies and reproducibility from the perspective of designing a package manager.