Reproducible Builds With Gitian (1/2)

Justin Moon

Posted on December 27, 2020

In the last post we explored the SHA256SUMS.asc file that Bitcoin Core distributes alongside each release to help users verify their downloads. But we had a lingering question: Are we sure the hashes contained in that SHA256SUMS.asc correspond to the Bitcoin Core source code?

To make this kind of independent verification possible, Bitcoin Core uses a tool called Gitian, which allows anyone to build a release of Bitcoin Core and verify that they get the same output as everyone else. Before exploring how Bitcoin Core uses Gitian to achieve these "reproducible builds", I want to show you a quick example of a trivial C++ project that has reproducibility problems, and then create a reproducible build environment for it using Gitian.

Example Project

Here's a simple C++ program which prints "Hello, <time program was built>!" If you build this project at different times, different timestamps will be embedded in the executable, producing different executables. As we'll see later when we inspect Bitcoin Core's reproducible build system, much of the challenge involves preventing stuff like timestamps from sneaking into the build. Save it as hello.cpp:

#include <iostream>

int main()
{
    std::cout << "Hello, " << __TIME__ << "!" << std::endl;
    return 0;
}

Let's build and run it twice to demonstrate a non-reproducible build (you may need to install GCC):

$ g++ hello.cpp -o hello
$ ./hello
Hello, 21:20:39!
$ sha256sum hello
481b7cf190e4b3bcf20af857c48f3fb33da7a199d17fd6560f34608a96c1a425  hello

$ g++ hello.cpp -o hello
$ ./hello
Hello, 21:20:47!
$ sha256sum hello
6d978a670cf26a39634b0f284a3ad05266d1ad8a5ee8b83d9469f79985fe77eb  hello

Voila! Build this code at different times and you get different outputs.

For the purposes of this extremely contrived example, let's pretend this timestamp doesn't have any meaning for our program, but that we can't just remove __TIME__ from the code. How, then, could we make the build reproducible? Well, GCC allows you to override the __TIME__ macro by setting a SOURCE_DATE_EPOCH environment variable. Let's set it to Unix time 0 (midnight UTC on January 1, 1970):

$ SOURCE_DATE_EPOCH=0 g++ hello.cpp -o hello
$ ./hello
Hello, 00:00:00!

Now you can build this as many times as you like and you'll get exactly the same output.
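A quick way to check that claim is to run the build twice and compare hashes. The helper below is only a sketch: check_reproducible is a hypothetical name of mine, not part of GCC or Gitian, and it assumes a POSIX-style shell with GNU sha256sum available.

```shell
# Run a build command twice and compare the SHA-256 of the artifact it produces.
# Usage: check_reproducible <artifact> <build command...>
check_reproducible() {
    local artifact="$1"; shift
    local first second

    # First build: record the artifact's hash.
    "$@" || return 2
    first=$(sha256sum "$artifact" | cut -d' ' -f1)

    # Wait a second so any embedded timestamp has a chance to change.
    sleep 1

    # Second build: same command, same artifact path.
    "$@" || return 2
    second=$(sha256sum "$artifact" | cut -d' ' -f1)

    if [ "$first" = "$second" ]; then
        echo "reproducible: $first"
    else
        echo "NOT reproducible: $first != $second"
        return 1
    fi
}
```

Running check_reproducible hello g++ hello.cpp -o hello should report a mismatch, while check_reproducible hello env SOURCE_DATE_EPOCH=0 g++ hello.cpp -o hello should report a single hash.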

One problem here is that different toolchains have different ways of keeping timestamps out of their output. GCC uses SOURCE_DATE_EPOCH, Apple's archive tools have ZERO_AR_DATE and MSVC has a /Brepro linker flag. Things are already getting complicated in our stupid example. Now imagine how it looks for Bitcoin Core, a big project with a handful of dependencies like Qt, Boost, Berkeley DB, LevelDB and SQLite where this kind of stuff might sneak in.

And even if we figured that out, different compilers produce different outputs. When I compile a version of hello.cpp without __TIME__ using GCC and Clang (clang++ simple_hello.cpp -o simple_hello), I get different file hashes. And the same compiler on a different operating system might give a different output. So it seems best to just use one compiler and one operating system, and eliminate the reproducibility leaks for that combination (learn more here). This is what Gitian does.

Gitian

Quoting from the Gitian website,

Gitian is a secure source-control oriented software distribution method. This means you can download trusted binaries that are verified by multiple builders.

Gitian uses a deterministic build process to allow multiple builders to create identical binaries. This allows multiple parties to sign the resulting binaries, guaranteeing that the binaries and tool chain were not tampered with and that the same source was used. It removes the build and distribution process as a single point of failure.

You'll notice that almost all the top committers for Gitian are Bitcoin Core devs. For a few years it was also used by Tor.

The first step with Gitian is to run the bin/make-base-vm Ruby script, which builds an Ubuntu VM / container (honestly I still don't understand the difference!) using either LXC, KVM, Vagrant or Docker. This VM is independent of the project you're building.

Once you build a VM, you run bin/gbuild <path> to initiate a build inside the VM. <path> is a path to a YAML "gitian descriptor" which specifies what to build and how to build it. For example, here is Bitcoin Core's Gitian descriptor for Linux builds. This command should output the final builds that are distributed to users.

Once the build completes, you can run the bin/gsign script, which creates a .assert file describing the hashes of all dependencies and outputs from the build, as well as a .assert.sig file containing your PGP signature of that .assert file. For example, here are Luke Jr's .assert and .assert.sig files for the 0.10 Linux release. Together these two files are a cryptographically signed claim that a certain output is produced when building a release of your project. He's claiming "The hash of bitcoin-0.10.0-linux64.tar.gz is 4be12ac4e1a2e1a27135009eb3dc003529f9e56c11df97e59c5b0415f79ed4ec".
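Checking a claim like that against a file you downloaded or built yourself takes one line of shell. Here's a minimal sketch, assuming GNU sha256sum; matches_asserted_hash is a hypothetical helper name, not a Gitian command.

```shell
# Compare a file's SHA-256 against a hash someone has asserted for it.
# Usage: matches_asserted_hash <file> <expected sha256 hex>
matches_asserted_hash() {
    local file="$1" expected="$2"
    # sha256sum --check reads "<hash>  <filename>" lines (two spaces) from stdin.
    # --status suppresses output, so the exit code alone says whether they match.
    echo "$expected  $file" | sha256sum --check --status
}
```

For example, matches_asserted_hash bitcoin-0.10.0-linux64.tar.gz 4be12ac4e1a2e1a27135009eb3dc003529f9e56c11df97e59c5b0415f79ed4ec && echo OK would print OK only if your copy hashes to the value asserted above. Note that this only compares hashes. It says nothing about who made the claim, which is what the PGP signature is for.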

Lastly, there is a bin/gverify command which can PGP-verify pairs of .assert and .assert.sig files. Using this tool you are able to verify that a group of independent developers all agree on what the build should output.

In Bitcoin Core's case this process unfolds in the bitcoin-core/gitian.sigs Git repository. When it's time for a new release, developers upload .assert and .assert.sig files for it. Once a handful of developers assert the same hashes, SHA256SUMS.asc is created. In fact, you can skip SHA256SUMS.asc altogether when installing Bitcoin Core and directly verify the .assert and .assert.sig files found in this repository. Luke Jr published detailed instructions.

Building Example Project Inside Gitian

Now that we've covered the concepts, let's build our example project inside Gitian. In order to get started make sure you have Docker, Ruby and Git installed.

For simplicity's sake, let's clone gitian-builder alongside our hello.cpp file:

$ git clone https://github.com/devrandom/gitian-builder.git

make-base-vm

Now let's construct a build environment inside Docker:

# no docker images exist
$ docker image ls
REPOSITORY    TAG       IMAGE ID       CREATED         SIZE

$ gitian-builder/bin/make-base-vm --suite bionic --arch amd64 --docker
...

# docker image created
$ docker image ls
REPOSITORY          TAG       IMAGE ID       CREATED         SIZE
base-bionic-amd64   latest    45d3b1e23d01   9 seconds ago   1.01GB

gbuild

In order to run bin/gbuild we need a "gitian descriptor". Save the following as gitian-linux.yml:

---
name: "hello-linux"
enable_cache: true
distro: "ubuntu"
suites:
- "bionic"
architectures:
- "amd64"
packages:
- "gcc"
remotes:
- "url": "https://github.com/mooniversity/gitian-example.git"
  "dir": "gitian-example"
  "commit": "1875b8dcdeae3f755ae9d9bf779142c42e792589"
files: []
script: |
  cd gitian-example
  export SOURCE_DATE_EPOCH=0
  g++ hello.cpp -o hello
  mv hello $OUTDIR

Notes:

  • enable_cache will save some state to our filesystem to speed up builds.
  • architectures and suites say we only build on the amd64 architecture using Ubuntu 18.04 (Bionic Beaver).
  • packages will be apt-get installed inside the container.
  • remotes will check out and build this Git repo at this commit. Pinning a Git repo to a specific commit is how you know exactly what code we're building.
  • script contains commands to build the code. This is where we drop our SOURCE_DATE_EPOCH hack. mv hello $OUTDIR will fish the final build out from Docker and onto our filesystem.

Let's run it. From the gitian-builder directory we set a USE_DOCKER=1 environment variable to tell Gitian we want to use Docker and run bin/gbuild with a path to our descriptor file:

$ cd gitian-builder
$ USE_DOCKER=1 bin/gbuild ../gitian-linux.yml
$ ./build/out/hello
Hello, 00:00:00!

Notes:

  • Our output shows up at build/out/hello.
  • A build cache is saved to cache/ because of the enable_cache setting in the descriptor. It's actually empty for our tiny project!
  • Installation and build logs can be found in var/install.log and var/build.log. Helpful for debugging!

gsign

Now PGP-sign the build results (this requires a PGP key, instructions to create one here):

$ bin/gsign --signer <your-github-username> --release 0.1-linux --destination ../sigs ../gitian-linux.yml
$ ls ../sigs/0.1-linux/<your-github-username>/
hello-linux-build.assert  hello-linux-build.assert.sig

Notes:

  • gsign requires a username (gitian.sigs uses GitHub handles), a release name, a destination to save .assert and .assert.sig files plus a path to our Gitian descriptor.
  • The assert files are saved in <destination>/<release>/<your-github-username>. This is the file structure of the gitian.sigs Bitcoin Core repo mentioned earlier. This is why you need to specify a username -- your assert files are saved to their own directory.

Here is my .assert file. out_manifest contains hashes for each output file -- just hello in our case. in_manifest contains a hash of our descriptor and the commit hash for our Git repository. base_manifests contains hashes for all dependencies on the OS / architecture pair our example is built on: Ubuntu 18.04 (Bionic Beaver) and amd64.
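Once several people have signed the same release, the interesting question is whether their out_manifest entries agree. The sketch below counts how many different hashes were asserted for one output file. It rests on two assumptions: the <destination>/<release>/<signer> layout described above, and that each output appears in the .assert file as a line shaped like "<sha256>  <filename>". distinct_hashes is a hypothetical helper, not part of Gitian.

```shell
# Count how many different hashes the signers of one release assert for an output file.
# Usage: distinct_hashes <destination>/<release> <output filename>
distinct_hashes() {
    local release_dir="$1" output="$2"
    # Keep lines whose first field is 64 characters long (a SHA-256 in hex)
    # and whose second field is the output filename, then count unique hashes
    # across every signer's .assert file.
    awk -v name="$output" 'length($1) == 64 && $2 == name { print $1 }' \
        "$release_dir"/*/*.assert | sort -u | wc -l | tr -d ' '
}
```

With only my own signature in place, distinct_hashes ../sigs/0.1-linux hello should print 1. Anything greater than 1 would mean at least two signers disagree about the build output. This only compares the asserted hashes. It does not check any signatures, so it complements gverify rather than replacing it.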

gverify

Now use gverify to verify your signature:

$ bin/gverify -v --release 0.1-linux --destination ../sigs ../gitian-linux.yml
gpg: Signature made Sun 27 Dec 2020 07:46:20 PM CST
gpg:                using RSA key 2066733A74BD6A6C0489E2BE5E35A85919DD0D22
gpg:                issuer "mail@justinmoon.com"
gpg: Good signature from "Justin Moon <mail@justinmoon.com>" [ultimate]
justinmoon: OK

I admit this is a little anticlimactic. If you can't PGP verify your own signature you have big problems! But it will get more exciting in the next post when we run it in gitian.sigs to verify that an army of independent developers reproducibly build Bitcoin Core.
