Reproducible Builds With Gitian (1/2)
Justin Moon
Posted on December 27, 2020
In the last post we explored the SHA256SUMS.asc file that Bitcoin Core distributes alongside each release to help users verify their downloads. But we had a lingering question: Are we sure the hashes contained in that SHA256SUMS.asc correspond to the Bitcoin Core source code?
To facilitate this independent verification, Bitcoin Core uses a tool called Gitian, which allows anyone to build releases of Bitcoin Core and verify that they get the same output. Before exploring how Bitcoin Core uses Gitian to achieve these "reproducible builds", I want to show you a quick example of a trivial C++ project that has reproducibility problems, and then create a reproducible build environment for it using Gitian.
Example Project
Here's a simple C++ program which prints "Hello, <time program was built>!" If you build this project at different times, different timestamps will be embedded in the executable, producing different executables. As we'll see later when we inspect Bitcoin Core's reproducible build system, much of the challenge involves preventing stuff like timestamps from sneaking into the build. Save it as hello.cpp:
#include <iostream>
int main()
{
    std::cout << "Hello, " << __TIME__ << "!" << std::endl;
    return 0;
}
Let's build and run it twice to demonstrate a non-reproducible build (you may need to install GCC):
$ g++ hello.cpp -o hello
$ ./hello
Hello, 21:20:39!
$ sha256sum hello
481b7cf190e4b3bcf20af857c48f3fb33da7a199d17fd6560f34608a96c1a425 hello
$ g++ hello.cpp -o hello
$ ./hello
Hello, 21:20:47!
$ sha256sum hello
6d978a670cf26a39634b0f284a3ad05266d1ad8a5ee8b83d9469f79985fe77eb hello
Voila! Build this code at different times and you get different outputs.
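If you're curious how much of the file actually changes, `cmp -l` lists every byte position where two files differ. Here's a minimal sketch of that idea. The two files below are small stand-ins I made up for the two builds (not real executables), and `count_diff_bytes` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: count the byte positions at which two equally sized files differ.

# count_diff_bytes A B: `cmp -l` prints one line per differing byte,
# so counting its output lines gives the number of differing bytes.
count_diff_bytes() {
    cmp -l "$1" "$2" | wc -l | tr -d ' '
}

# Stand-ins for the two builds: identical except for the embedded time string.
demo=$(mktemp -d)
printf 'Hello, 21:20:39!' > "$demo/first"
printf 'Hello, 21:20:47!' > "$demo/second"

count_diff_bytes "$demo/first" "$demo/second"   # prints 2
```

Run against two real builds of hello, the count may come out higher, since a toolchain can embed other data derived from the file contents (a build ID, for instance).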
For the purposes of this extremely contrived example, let's pretend this timestamp doesn't have any meaning for our program, but that we can't just remove __TIME__ from the code. How, then, could we make the build reproducible? Well, GCC allows you to override the __TIME__ macro by setting a SOURCE_DATE_EPOCH environment variable. Let's set it to Unix time 0 (midnight UTC on 1/1/1970):
$ SOURCE_DATE_EPOCH=0 g++ hello.cpp -o hello
$ ./hello
Hello, 00:00:00!
Now you can build this as many times as you like and you'll get exactly the same output.
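Rather than eyeballing hashes, you can script the check. This is a rough sketch, not a polished tool: `same_hash` and `build_twice` are hypothetical helper names, and the demo at the bottom only runs if g++ is installed and hello.cpp is in the current directory. Both builds write to a file called hello in separate scratch directories, so the output name is the same each time.

```shell
#!/usr/bin/env bash
# Sketch: compile the same source twice, one second apart, and compare hashes.

# same_hash A B: succeeds only if both files have the same SHA-256 digest.
same_hash() {
    [ "$(sha256sum < "$1")" = "$(sha256sum < "$2")" ]
}

# build_twice SRC: builds SRC into two scratch directories and reports
# whether the results match. Any SOURCE_DATE_EPOCH already exported in the
# environment is inherited by g++.
build_twice() {
    local src=$1 dir
    dir=$(mktemp -d)
    mkdir "$dir/first" "$dir/second"
    g++ "$src" -o "$dir/first/hello" || return 2
    sleep 1   # make sure the wall clock has moved on
    g++ "$src" -o "$dir/second/hello" || return 2
    if same_hash "$dir/first/hello" "$dir/second/hello"; then
        echo "reproducible"
    else
        echo "NOT reproducible"
    fi
    rm -rf "$dir"
}

# Only run the demo when a compiler and the source file are available.
if command -v g++ >/dev/null 2>&1 && [ -f hello.cpp ]; then
    build_twice hello.cpp                                # plain build
    ( export SOURCE_DATE_EPOCH=0; build_twice hello.cpp )  # pinned timestamp
fi
```

With GCC, the first call should report a mismatch and the second a match, which is the same result as the manual experiment above.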
One problem here is that different toolchains have different ways of keeping timestamps out of a build. GCC uses SOURCE_DATE_EPOCH, Apple's Clang-based toolchain has ZERO_AR_DATE and MSVC has a /Brepro linker flag. Things are already getting complicated in our stupid example. Now imagine how it looks for Bitcoin Core, a big project with a handful of dependencies like Qt, Boost, BerkeleyDB, LevelDB and SQLite where this kind of stuff might sneak in.
And even if we figured that out, different compilers produce different outputs. When I compile hello.cpp without __TIME__ using GCC and Clang (clang++ simple_hello.cpp -o simple_hello), I get different file hashes. And the same compiler on a different operating system might give a different output. So it seems best to just use one compiler and one operating system, and eliminate the reproducibility leaks for that combination (learn more here). This is what Gitian does.
Gitian
Quoting from the Gitian website,
Gitian is a secure source-control oriented software distribution method. This means you can download trusted binaries that are verified by multiple builders.
Gitian uses a deterministic build process to allow multiple builders to create identical binaries. This allows multiple parties to sign the resulting binaries, guaranteeing that the binaries and tool chain were not tampered with and that the same source was used. It removes the build and distribution process as a single point of failure.
You'll notice that almost all the top committers for Gitian are Bitcoin Core devs. For a few years it was also used by Tor.
The first step with Gitian is to run the bin/make-base-vm Ruby script, which builds an Ubuntu VM / container (honestly I still don't understand the difference!) using either LXC, KVM, Vagrant or Docker. This VM is independent of the project you're building.
Once you build a VM, you run bin/gbuild <path>, where <path> is a path to a YAML "gitian descriptor" which specifies what to build and how to build it. For example, here is Bitcoin Core's Gitian descriptor for Linux builds. This command outputs the final builds that are distributed to users.
Once the build completes, you can run the bin/gsign script, which creates a .assert file describing the hashes of all dependencies and outputs from the build, as well as a .assert.sig containing your PGP signature of that .assert file. For example, here are Luke Jr's .assert and .assert.sig files for the 0.10 Linux release. Together these two files are a cryptographic proof that a certain output is produced when building a release of your project. He's claiming "The hash of bitcoin-0.10.0-linux64.tar.gz is 4be12ac4e1a2e1a27135009eb3dc003529f9e56c11df97e59c5b0415f79ed4ec".
Lastly, there is a bin/gverify command which can PGP-verify pairs of .assert and .assert.sig files. Using this tool you are able to verify that a group of independent developers all agree what the build should output.
In Bitcoin's case this process unfolds in the bitcoin-core/gitian.sigs Git repository. When it's time for a new release, developers upload .assert and .assert.sig files for it. Once a handful of developers assert the same hashes, SHA256SUMS.asc is created. In fact, you can skip SHA256SUMS.asc altogether when installing Bitcoin Core and directly verify the .assert and .assert.sig files found in this repository. Luke Jr published detailed instructions.
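To make "a handful of developers assert the same hashes" concrete, here's a rough sketch of the comparison. It is not how gverify works and it does not check any PGP signatures, so treat it as a complement to signature verification, not a replacement. It assumes a release directory laid out as <release>/<signer>/<something>.assert, and assumes each output appears in the .assert file on a line of the form "<hash>  <filename>". The function names are mine.

```shell
#!/usr/bin/env bash
# Sketch: do all signers of a release record the same hash for one output file?

# distinct_hashes RELEASE_DIR FILENAME: prints each distinct hash that any
# signer's .assert file records for FILENAME, one per line.
# Assumes manifest lines look like "<hash>  <filename>" (leading spaces are fine).
distinct_hashes() {
    local release_dir=$1 name=$2
    cat "$release_dir"/*/*.assert |
        awk -v name="$name" '$2 == name { print $1 }' |
        sort -u
}

# signers_agree RELEASE_DIR FILENAME: succeeds only if exactly one distinct
# hash was recorded across all signers.
signers_agree() {
    [ "$(distinct_hashes "$1" "$2" | wc -l | tr -d ' ')" = "1" ]
}

# Hypothetical usage, from a checkout of a signatures repository:
#   signers_agree path/to/release-dir bitcoin-0.10.0-linux64.tar.gz && echo "all signers agree"
```

If `distinct_hashes` prints more than one line, at least one signer built something different and it's worth finding out why.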
Building Example Project Inside Gitian
Now that we've covered the concepts, let's build our example project inside Gitian. In order to get started make sure you have Docker, Ruby and Git installed.
For simplicity's sake, let's clone gitian-builder alongside our hello.cpp file:
$ git clone https://github.com/devrandom/gitian-builder.git
make-base-vm
Now let's construct a build environment inside Docker:
# no docker images exist
$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
$ gitian-builder/bin/make-base-vm --suite bionic --arch amd64 --docker
...
# docker image created
$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
base-bionic-amd64 latest 45d3b1e23d01 9 seconds ago 1.01GB
gbuild
In order to run bin/gbuild we need a "gitian descriptor". Save the following as gitian-linux.yml:
---
name: "hello-linux"
enable_cache: true
distro: "ubuntu"
suites:
- "bionic"
architectures:
- "amd64"
packages:
- "gcc"
remotes:
- "url": "https://github.com/mooniversity/gitian-example.git"
  "dir": "gitian-example"
  "commit": "1875b8dcdeae3f755ae9d9bf779142c42e792589"
files: []
script: |
  cd gitian-example
  export SOURCE_DATE_EPOCH=0
  g++ hello.cpp -o hello
  mv hello $OUTDIR
Notes:
- enable_cache will save some state to our filesystem to speed up builds.
- architectures and suites say we only build on the amd64 architecture using Ubuntu 18.04 (Bionic Beaver).
- packages will be apt-get installed inside the container.
- remotes will check out and build this Git repo at this commit. Cloning a Git repo at a specific commit is how you know exactly what code we're building.
- script contains commands to build the code. This is where we drop our SOURCE_DATE_EPOCH hack.
- mv hello $OUTDIR will fish the final build out from Docker and onto our filesystem.
Let's run it. From the gitian-builder directory we set a USE_DOCKER=1 environment variable to tell Gitian we want to use Docker and run bin/gbuild with a path to our descriptor file:
$ cd gitian-builder
$ USE_DOCKER=1 bin/gbuild ../gitian-linux.yml
$ ./build/out/hello
Hello, 00:00:00!
Notes:
- Our output shows up at build/out/hello.
- A build cache is saved to cache/ because of the enable_cache setting in the descriptor. It's actually empty for our tiny project!
- Installation and build logs can be found in var/install.log and var/build.log. Helpful for debugging!
gsign
Now let's PGP-sign the build results (this requires a PGP key, instructions to create one here):
$ bin/gsign --signer <your-github-username> --release 0.1-linux --destination ../sigs ../gitian-linux.yml
$ ls ../sigs/0.1-linux/<your-github-username>/
hello-linux-build.assert hello-linux-build.assert.sig
Notes:
- gsign requires a username (gitian.sigs uses GitHub handles), a release name, a destination to save .assert and .assert.sig files plus a path to our Gitian descriptor.
- The assert files are saved in <destination>/<release>/<your-github-username>. This is the file structure of the gitian.sigs Bitcoin Core repo mentioned earlier. This is why you need to specify a username -- your assert files are saved to their own directory.
Here is my .assert file. out_manifest contains hashes for each output file -- just hello in our case. in_manifest contains a hash of our descriptor and the commit hash for our Git repository. base_manifests contains hashes for all dependencies on the OS / architecture pair our example is built on: Ubuntu 18.04 (Bionic Beaver) and amd64.
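As a sanity check, you can confirm that the hash recorded in an assert file really is the hash of the binary sitting on your disk. This is a minimal sketch under one assumption: that each output is listed on a line of the form "<hash>  <filename>". The helper names `recorded_hash` and `matches_assert` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare a local build output against the hash in an .assert file.

# recorded_hash ASSERT_FILE NAME: prints the first hash the assert file
# lists for NAME, assuming lines of the form "<hash>  <filename>".
recorded_hash() {
    awk -v name="$2" '$2 == name { print $1; exit }' "$1"
}

# matches_assert ASSERT_FILE OUTPUT_FILE: succeeds only if the SHA-256 of
# OUTPUT_FILE equals the hash recorded under its base name.
matches_assert() {
    local actual
    actual=$(sha256sum "$2" | cut -d' ' -f1)
    [ -n "$actual" ] && [ "$actual" = "$(recorded_hash "$1" "$(basename "$2")")" ]
}

# Usage from the gitian-builder directory, with your own username filled in:
#   matches_assert ../sigs/0.1-linux/<your-github-username>/hello-linux-build.assert build/out/hello \
#       && echo "local build matches the assert file"
```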
gverify
Now use gverify to verify your signature:
$ bin/gverify -v --release 0.1-linux --destination ../sigs ../gitian-linux.yml
gpg: Signature made Sun 27 Dec 2020 07:46:20 PM CST
gpg: using RSA key 2066733A74BD6A6C0489E2BE5E35A85919DD0D22
gpg: issuer "mail@justinmoon.com"
gpg: Good signature from "Justin Moon <mail@justinmoon.com>" [ultimate]
justinmoon: OK
I admit this is a little anticlimactic. If you can't PGP verify your own signature you have big problems! But it will get more exciting in the next post when we run it in gitian.sigs to verify that an army of independent developers reproducibly build Bitcoin Core.