Continuous delivery for the TON blockchain

Current development practices suggest having a staging environment, deploying code on push, and A/B testing. Each of these practices increases productivity, but comes with an upfront setup cost.

This can lead to a dynamic where rapid product growth ties up engineering resources in feature work and lets infrastructure stagnate, which makes velocity net-negative in the medium term.

Read on to learn how Multicore supplemented TON Foundation’s blockchain expertise with DevOps capabilities to accelerate development, improve deploy confidence, and solve packaging woes.

Design goals

The TON blockchain moves fast, merging tens to hundreds of PRs per month. It is also complicated, being essentially a peer-to-peer virtual machine. Collecting real-world feedback is essential to discover bugs in design and implementation.

Thus, our primary goal for this project was to set up nightly builds that both end users and automated test harnesses could easily consume. TON’s users span all major platforms and CPU architectures, so we ship packages for all of them.

The Foundation’s leadership was concerned about adding overhead to the dev process. Thus, it was important to minimize the maintenance the CI/CD setup would require.

Next, our experience suggested it’s imperative for the CI environment to be reproducible locally. Building both CI and local development environments from a single source of truth reduces the maintenance footprint and makes it easy to reproduce issues. Further, new developers can use it to onboard quickly.

Finally, we aimed for reproducible builds: our release binaries match, bit for bit, those that users could build themselves. This allows most users to rely on our pre-built binaries, while a few “validators” do independent builds to check that the release binary indeed corresponds to the source on GitHub.
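A validator’s check can be as small as comparing digests. The sketch below is an illustration under our own assumptions, not the Foundation’s tooling: same_digest is a hypothetical helper, and the paths in the usage comment are placeholders.

```shell
# Hypothetical helper: succeed only if two files have the same SHA-256 digest.
# Returns 0 on match, 1 on mismatch, 2 if either file cannot be hashed.
same_digest() {
  local a b
  a=$(sha256sum "$1") || return 2
  b=$(sha256sum "$2") || return 2
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  [ "${a%% *}" = "${b%% *}" ]
}

# Usage (placeholder paths):
#   same_digest ./result/bin/some-binary ./downloaded/some-binary \
#     && echo "release binary matches local build"
```

Comparing whole-file digests is deliberately strict: any difference, even in embedded timestamps, counts as a mismatch, which is exactly the property a reproducible build has to satisfy.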


So, design goals:

  • Nightly builds for all platforms (accelerate real-world testing)
  • Minimal maintenance (reduce operational expenses)
  • Build environment reproducible locally (debugging, onboarding speed)
  • Reproducible builds (security)

Getting to work

The TON codebase is some 250 KLOC of C++. We have to ship TON binaries for x86 and ARM devices (including Apple M1) running Windows, Linux, and macOS.

Each platform has its own packaging tools, package hosts, and package formats that we’ll interface with later: Chocolatey on Windows; DEB, RPM, or AUR on most Linux distros; and Brew on macOS.

First, we’ll build the binaries — and upload them to GitHub Releases.

And to build the binaries, we have to build the build environment. Below, we’ll discuss this for Linux and macOS; for Windows, we piggybacked on prior work by the Foundation.

Build environment

We have two requirements for the environment:

  • To meet security goals, it must be reproducible from source
  • For build issues to be debuggable and to ease dev onboarding, it must work on both build hosts and dev machines

Our choices here are essentially Vagrant, Docker, and Nix.

TON’s developers don’t currently use VMs for development, so we don’t need Vagrant’s VM capability; then, Docker is the leaner choice if we’re using containers anyway.

Docker vs. Nix is a tricky choice: Nix’s reproducible builds capabilities fit our requirements better, but it is a less mature technology.

Upon consultation with TON Foundation, we went with Nix for this project: this choice allowed us to maintain a leaner bespoke code footprint.

Here’s the build env definition we’re starting with:

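# Dependencies are passed in as arguments (for example via callPackage).
{ stdenv, cmake, ninja, pkg-config, git, openssl, zlib, libmicrohttpd }:

stdenv.mkDerivation {
  pname = "ton";
  version = "dev";

  src = ./.;

  # Tools that run at build time
  nativeBuildInputs =
    [ cmake ninja pkg-config git ];
  # Libraries TON links against
  buildInputs =
    [ openssl zlib libmicrohttpd ];
}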

A developer calls it like this:

$ nix develop # spawns a subshell with deps
dev$ cmake -S . -B build -GNinja
dev$ cmake --build build

This manifest also doubles as a build instruction, which essentially runs cmake as above.

nix build # outputs binaries to ./result/

At this point, we have reproducible binaries and development environments for Linux and macOS.


Static build

Most users will only use TON executables. To ease deployment, and given we aim for rolling-deps nightly builds, we’d like to build those statically.

Fortunately, that’s as easy as nix-build -A pkgsStatic.ton (and a few CMake flags in the .nix file, and a few patches to TON to allow pure-dynamic and pure-static builds, and a special-case build env for aarch64-linux).
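One way to confirm that a Linux binary really came out static is to inspect what the file utility reports about it. The helper below is a sketch of that check; linkage_of is a hypothetical name, it only understands ELF descriptions, and the path in the usage comment is a placeholder.

```shell
# Hypothetical helper: classify the description printed by `file -b`
# for an ELF binary as static, dynamic, or unknown.
linkage_of() {
  case "$1" in
    *"statically linked"*|*"static-pie linked"*) echo static ;;
    *"dynamically linked"*) echo dynamic ;;
    *) echo unknown ;;
  esac
}

# Usage (placeholder path; -L follows the ./result symlink):
#   linkage_of "$(file -bL ./result/bin/some-binary)"
```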

Of course, it couldn’t be that easy.

Old glibc build

Some users will use TON’s shared libraries to interface with the TON daemon: for instance, one might write Python scripts to query the node’s API.

So we do have to ship dynamically-linked artifacts. How?

Well, we’ll rebuild the world with the build environment from NixOS 19.09 (which ships glibc 2.27, the same version as Ubuntu 18.04) and call it a day.

(We’ll also plug in Cachix to cache the rebuilt world.)

What we’re doing here is rebuilding our dependencies against an old glibc, linking them statically, and linking dynamically only against glibc, which is present in reasonably modern Linux distros: Ubuntu 18.04+ and CentOS 8+. We can do that since glibc is backward-compatible: a library linked against an older glibc also loads on any newer one.

For a package, we’ll use binaries from the static build and shared libraries from the old glibc build.
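To check which glibc a shared library actually demands, one can look at the versioned symbols it imports. The following sketch assumes GNU grep and sort; max_glibc_version is a hypothetical helper, and the library path in the usage comment is a placeholder.

```shell
# Hypothetical helper: read `objdump -T` output on stdin and print the
# highest GLIBC_x.y symbol version it references.
max_glibc_version() {
  # GLIBC_PRIVATE is skipped because the pattern requires a digit.
  grep -o 'GLIBC_[0-9][0-9.]*' | sort -u -V | tail -n 1
}

# Usage (placeholder path):
#   objdump -T ./result/lib/some-library.so | max_glibc_version
```

For a library from the old glibc build, the printed version should be no higher than GLIBC_2.27; anything newer would mean the library will not load on Ubuntu 18.04.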


Now, we can ship static binaries to users and widely-compatible dynamic libraries to developers.


Nightly CI builds

We are a YAML away from nightly builds on GitHub Actions:

name: "Nightly Linux packages"
on:
  schedule:
    - cron: '0 0 * * *' # every day at 00:00 UTC
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: cachix/install-nix-action@v18
      - uses: cachix/cachix-action@v11
      # Quoted so the shell does not treat "?" as a glob character
      - run: nix build '.?submodules=1#packages.x86_64-linux.ton-oldglibc_staticbinaries'

We can upload releases with a further

tar --create --gzip --file result.tar.gz -C result .
gh release create "nightly-linux-$(uname -m)" result.tar.gz

ARM builds

At the time of writing, GitHub Actions cloud runners only support x86 builds. We work around this with QEMU + binfmt hook on Linux, and reuse the Foundation’s M1 hosts for macOS ARM builds.

sudo apt install -q -y qemu-system-aarch64 qemu-efi binfmt-support qemu-user-static # binfmt
nix build --system aarch64-linux
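Before starting an emulated build, it can help to confirm that the kernel has a handler registered for the foreign architecture. The check below is a sketch: binfmt_enabled is a hypothetical helper, and the entry name qemu-aarch64 is an assumption about how the qemu-user-static package registers itself.

```shell
# Hypothetical helper: succeed if the named binfmt_misc entry exists and is
# enabled. The second argument overrides the directory (useful for testing).
binfmt_enabled() {
  local entry="${2:-/proc/sys/fs/binfmt_misc}/$1"
  # The first line of an entry file reads "enabled" or "disabled".
  [ -r "$entry" ] && [ "$(head -n 1 "$entry")" = "enabled" ]
}

# Usage (entry name is an assumption):
#   binfmt_enabled qemu-aarch64 || echo "aarch64 emulation is not registered"
```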

We now have binaries for {Linux, macOS} × {x86_64, aarch64} and for Windows on x86_64.


Let’s get packaging

It is high time to wrap those binaries into packages, so instead of going

cd /opt
curl -L https://github.com/ton/ton/releases/download/nightly-linux-x86_64/ton-x86_64.tar.gz | tar zx

to install a nightly, users could just call

apt-get install ton

We target five package managers, or package formats:

  • DEB (apt)
  • RPM (yum, dnf)
  • AUR (makepkg helpers)
  • Chocolatey (Windows)
  • Brew (macOS)

We use a variety of distribution strategies.

Our DEB, RPM, and Brew repositories are entirely self-hosted, while AUR and Choco packages are hosted at centralized repositories.

DEB and RPM bundle the binaries into a package we host, while AUR, Choco, and Brew package scripts just fetch the latest GitHub release.
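To make the bundling case concrete, here is a sketch of generating the metadata for a DEB package. It is not the project’s actual packaging script: deb_arch and deb_control are hypothetical helpers, and the maintainer and description fields are placeholders.

```shell
# Map a `uname -m` machine name to the Debian architecture name.
deb_arch() {
  case "$1" in
    x86_64) echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported machine: $1" >&2; return 1 ;;
  esac
}

# Print a minimal DEBIAN/control file. Arguments: version, machine name.
deb_control() {
  local arch
  arch=$(deb_arch "$2") || return 1
  cat <<EOF
Package: ton
Version: $1
Architecture: $arch
Maintainer: Example Maintainer <maintainer@example.com>
Description: TON nightly build (illustrative placeholder)
EOF
}

# Usage (placeholder layout): write the control file, then build the package.
#   mkdir -p pkg/DEBIAN
#   deb_control "$(date +%Y.%m.%d)" "$(uname -m)" > pkg/DEBIAN/control
#   dpkg-deb --build pkg
```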


Status: packages for all the things!


Rolling dependencies builds

A little futureproofing side feature:

Our builds to date pinned dependencies to a known-working version. However, we’d like to get some notice when things break due to updates upstream.

To catch this, we’ll add a CI job building with the latest nixpkgs on push or PR:

nix develop \
  --override-input nixpkgs https://channels.nixos.org/nixpkgs-unstable/nixexprs.tar.xz \
  --command ctest

Recap

This was the story of how Multicore collaborated with the TON Foundation to deliver reproducible development environments, builds, and packages for major platforms and architectures.

While the pipeline we ended up with might look large and scary, reflecting the breadth of the task (recall the diagram this post started with), our development is quite lean, totaling a little over 400 LOC:

YAML             220 # github actions
Nix              122 # reproducible binaries and envs
Bourne Shell     47  # linux + macos packages
XML + PowerShell 40  # windows packages

This is a testament to the power of the abstractions we used. We are particularly happy with Nix: delivering a reproducible musl + old glibc build and developer environments that work across Linux and macOS in roughly 120 lines of code is a nice feat.

Our packaging infrastructure is hosted on plain GitHub with minor exceptions (Cachix for the build cache, AUR for Arch repos, the official Choco host for Windows repos), minimizing the dependency footprint and addressing the Foundation’s maintenance concerns.

Finally, our builds are reproducible, increasing trust and addressing security concerns.

We are happy to have worked with the TON Foundation to improve TON’s user and developer experience.


Note: our code snippets in this post are abbreviated to highlight the essential detail. You can check the full code out by following the links.