Herbie 2.2 Release Notes

The Herbie developers are excited to announce Herbie 2.2! This release focuses extensible compilation targets using platforms.

What is Herbie? Herbie compiles mathematical expressions to fast and accurate floating point programs, avoiding the bugs, errors, and surprises of floating point arithmetic. Visit the main page to learn more.

Releasing the Platforms API

(define-operation (cosd [x ]) 
          #:spec (cos (* x (/ (PI) 180)))
          #:impl (from-rival)
          #:cost 12.5)

Herbie 2.2's platforms allow you to define custom compilation targets, including novel hardware (CPUs, GPUs, FPGAs, TPUs), programming languages (Julia, Matlab, Fortran), or software libraries (Numpy, Eigen, cuBLAS). Platforms can define new number formats, new floating-point operations on those formats, and new cost models for those operations. Herbie will automatically optimize its results for your platform.

Last year, Herbie 2.1 included the beginnings of platforms, Herbie's API for multiple backends. We've simplified and improved this API, and are now ready to release it.

In short, platforms allow you define the functions Herbie is allowed to use when compiling floating-point programs. A platform can expose Python's fsum, Julia's cosd, or AVX's rcpps to Herbie, which can then use those functions to produce even faster and more accurate output. Our recent publications demonstrate dramatic accuracy and performance improvements using platforms.

Platforms are a large change, and we invite users to try it out. The platforms documentation explains how to select different built-in platforms, or even how to write your own.

Besides the new feature, platforms allow us to deprecate a variety of now-superseded features:

The default platform now uses a realistic, auto-tuned cost model which should lead to faster code in practice.
The Herbie 2.0 and 2.1 cost models are replaced by the herbie20 platform.
The --no-pareto flag is replaced by the herbie10 platform.
Plugins are now replaced by ordinary Racket libraries imported directly by plugins.
Preprocessing is now replaced by standard platform-provided functions such as fmin, fmax, fabs, and copysign.

Restructuring the Main Loop

The new Herbie main loop.
The simplify step and patch table have been removed,
while the compute phase has been added. — The Herbie main loop is dramatically simplified, with no separate simplify or localize phase and a new compute phase that aid with hard-to-compute constants.

We've overhauled the Herbie main loop, which is the high-level algorithm that invokes different Herbie sub-systems to generate and filter different candidate programs. This overhaul both introduces new systems, which should make Herbie more capable, and reduce duplicative work, dramatically speeding up Herbie. Specifically:

A new compute phase was added, which replaces variable-free subexpressions with constants. The constants are computed using high-precision arithmetic and are guaranteed to be exact.
The simplification phase was removed. This phase mostly duplicated work already done by the rewrite phase, so removing it had very little impact on performance and accuracy of generated code, but reduced Herbie runtime significantly.
The localize phase was also removed. This phase existed to reduce duplicate work in other phases; batches deduplicate the work automatically so localize is no longer needed. Removing this phase also significantly sped up Herbie.
Minor phases like initial and final simplify were removed as well. Most importantly, this allowed replacing the existing, complex "accelerator" API (never fully released) and replace it with the much simpler "platform" API.
Herbie more correctly tracks Taylor approximations inside Herbie, and will now avoid making "approximations to approximations" leading to runaway error in rare cases.

Flattening Expressions to Batches

%0 = x
%1 = (literal 1)
%2 = (+ %0 %1)
%3 = (* %2 %2)
%4 = (- %3 %1)
%5 = (neg %2)
%6 = (* %5 %5)
%7 = (- %6 %1))

An example of a batch, containing two expressions (and their subexpressions): (x + 1)² - 1 and (-(x + 1))² - 1. Shared subexpressions are only represented once in the batch, and are only processed once by Herbie.

We have also changed Herbie's main data structure for representing programs. Until now, Herbie represented programs via their abstract syntax tree. While simple and effective, this representation lead to processing identical subexpressions repeatedly. This was especially problematic for larger programs, which can sometimes have the same subexpression appear exponentially many times.

The batch data structure instead uses a flat array with back-references to represent programs, which means that duplicate subexpressions appear and are processed just once. This lead to significant speedups throughout Herbie:

Batches automatically deduplicates identical candidate programs, which occur when multiple Herbie phases produce the same output. There were a lot more of these than we thought; in some cases as much as 25× deduplication occurs.
The series phase, now uses batches internally, allowing Herbie to consider many more series expansions.
Herbie's FFI layer for interacting with the egg library now leverages batches to reduce time spent copying data in and out of egg.
In some cases, one batch is shared across multiple phases (like evaluation and pruning), reducing duplication even further.
The derivations phase was dramatically sped up by caching proof objects across multiple candidate programs. Additionally, a debugging pass called "soundiness" was removed entirely, since the problems it was built to debug no longer occur. Thanks to both changes, derivations take almost no time at all any more.

Sister Projects

The Odyssey numerics workbench is releasing version 1.2 today, featuring a new "local error" component and various design tweaks and improvements. Odyssey can be tried from its online demo or installed locally from the VS Code Marketplace.

The Rival real-arithmetic package is releasing version 2.2 today, including various tweaks, optimizations, and improvements. Most significantly, a new "hints" feature significantly speeds up sampling expressions that use fmin, fmax, and if.

Other improvements

Herbie now offers a basic HTTP API, including both synchronous and asynchronous jobs and access to various internal Herbie features. This HTTP API is a work in progress and is primarily built to support Odyssey. However, our work on the API has also enabled the standard Herbie web interface to support threads, which should make Herbie faster in a lot of common use cases.
Small quality-of-life changes have been made to the Herbie reports, including hiding duplicate alternatives, updating various dependencies, and generating the reports more quickly.
Optional, off-by-default support for the egglog rewriting engine has been added. We are excited about growth and competition among rewrite engines and hope to continue driving forward this research area.
Fixing a variety of small bugs, such as nondeterminism due to an incorrect type signature for fma and new rewrite rules to unlock more accurate results.
Many general-purpose cleanups, including removing a lot of now-unnecessary nightly infrastructure, adoption of a new formatting guideline, reorganization of the source code, and new debug infrastructure.

Try it out!

We want Herbie to be more useful to scientists, engineers, and programmers around the world. We've got a lot of features we're excited to work on in the coming months. Please report bugs or contribute.

If you find Herbie useful, let us know!