This page describes papers related to the Herbie project, either because they describe Herbie and its internals or because they were inspired by our work on Herbie. In total, this page lists over a dozen publications, including papers in ★ major venues and which received 🏆 awards.
Scientific and engineering applications depend on floating point arithmetic to approximate real arithmetic. This approximation introduces rounding error, which can accumulate to produce unacceptable results. While the numerical methods literature provides techniques to mitigate rounding error, applying these techniques requires manually rearranging expressions and understanding the finer details of floating point arithmetic.
We introduce Herbie, a tool which automatically discovers the rewrites experts perform to improve accuracy. Herbie's heuristic search estimates and localizes rounding error using sampled points (rather than static error analysis), applies a database of rules to generate improvements, takes series expansions, and combines improvements for different input regions. We evaluated Herbie on examples from a classic numerical methods textbook, and found that Herbie was able to improve accuracy on each example, some by up to 60 bits, while imposing a median performance overhead of 40%. Colleagues in machine learning have used Herbie to significantly improve the results of a clustering algorithm, and a mathematical library has accepted two patches generated using Herbie.
The paper, video abstract, conference talk, and conference talk slides are available.
We introduce FPBench, a standard benchmark format for validation and optimization of numerical accuracy in floating-point computations. FPBench is a first step toward addressing an increasing need in our community for comparisons and combinations of tools from different application domains. To this end, FPBench provides a basic floating-point benchmark format and accuracy measures for comparing different tools. The FPBench format and measures allow comparing and composing different floating-point tools. We describe the FPBench format and measures and show that FPBench expresses benchmarks from recent papers in the literature, by building an initial benchmark suite drawn from these papers. We intend for FPBench to grow into a standard benchmark suite for the members of the floating-point tools research community.
The paper, conference talk, and conference talk slides are available. The standards presented in the paper have grown to become the FPBench project.
Floating-point arithmetic plays a central role in science, engineering, and finance by enabling developers to approximate real arithmetic. To address numerical issues in large floating-point applications, developers must identify root causes, which is difficult because floating-point errors are generally non-local, non-compositional, and non-uniform.
This paper presents Herbgrind, a tool to help developers identify and address root causes in numerical code written in low-level languages like C/C++ and Fortran. Herbgrind dynamically tracks dependencies between operations and program outputs to avoid false positives and abstracts erroneous computations to simplified program fragments whose improvement can reduce output error. We perform several case studies applying Herbgrind to large, expert-crafted numerical programs and show that it scales to applications spanning hundreds of thousands of lines, correctly handling the low-level details of modern floating point hardware and mathematical libraries and tracking error across function boundaries and through the heap.
The paper, conference talk, and conference talk slides are available. Herbgrind is available on its website.
Recent renewed interest in optimizing and analyzing floating-point programs has lead to a diverse array of new tools for numerical programs. These tools are often complementary, each focusing on a distinct aspect of numerical programming. Building reliable floating point applications typically requires addressing several of these aspects, which makes easy composition essential. This paper describes the composition of two recent floating-point tools: Herbie, which performs accuracy optimization, and Daisy, which performs accuracy verification. We find that the combination provides numerous benefits to users, such as being able to use Daisy to check whether Herbie's unsound optimizations improved the worst-case roundoff error, as well as benefits to tool authors, including uncovering a number of bugs in both tools. The combination also allowed us to compare the different program rewriting techniques implemented by these tools for the first time. The paper lays out a road map for combining other floating-point tools and for surmounting common challenges.
The paper is available.
Recent research has provided new, domain-specific number systems that accelerate modern workloads. Using these number systems effectively requires analyzing subtle multi-format, multi-precision (MPMF) code. Ideally, recent programming tools that automate numerical analysis tasks could help make MPMF programs both accurate and fast. However, three key challenges must be addressed: existing automated tools are difficult to compose due to subtle incompatibilities; there is no "gold standard" for correct MPMF execution; and no methodology exists for generalizing existing, IEEE-754-specialized tools to support MPMF. In this paper we report on recent work towards mitigating these related challenges. First, we extend the FPBench standard to support multi-precision, multi-format (MPMF) applications. Second, we present Titanic, a tool which provides reference results for arbitrary MPMF computations. Third, we describe our experience adapting an existing numerical tool to support MPMF programs.
The paper is available.
The last few years have seen an explosion of work on tools that address numerical error in scientific, mathematical, and engineering software. The resulting tools can provide essential guidance to expert non-experts: scientists, mathematicians, and engineers for whom mathematical computation is essential but who may have little formal training in numerical methods. It is now time for these tools to move into practice.
Practitioners need a "numerical workbench" that not only succeeds as a research artifact but as a daily tool. We describe our experience adapting Herbie, a tool for numerical error repair, from a research prototype to a reliable workhorse for daily use. In particular, we focus on how we worked to increase user trust and use internal measurement to polish the tool. Looking more broadly, we show that community development and an investment in the generality of our tools, such as through the FPBench project, will better support users and strengthen our research community.
The conference talk and conference talk slides are available.
egg:
    Fast and extensible equality saturation was published at
    ★ PLDI 2021,
    where it won the 🏆 Distinguished Paper Award.
    The paper describes a new library for equality saturation used by Herbie.
  An e-graph efficiently represents a congruence relation over many expressions. Although they were originally developed in the late 1970s for use in automated theorem provers, a more recent technique known as equality saturation repurposes e-graphs to implement state-of-the-art, rewrite-driven compiler optimizations and program synthesizers. However, e-graphs remain unspecialized for this newer use case. Equality saturation workloads exhibit distinct characteristics and often require ad-hoc e-graph extensions to incorporate transformations beyond purely syntactic rewrites.
This work contributes two techniques that make e-graphs fast and extensible, specializing them to equality saturation. A new amortized invariant restoration technique called rebuilding takes advantage of equality saturation's distinct workload, providing asymptotic speedups over current techniques in practice. A general mechanism called e-class analyses integrates domain-specific analyses into the e-graph, reducing the need for ad hoc manipulation.
We implemented these techniques in a new open-source library called egg. Our case studies on three previously published applications of equality saturation highlight how egg's performance and flexibility enable state-of-the-art results across diverse domains.
The paper, video abstract, and conference talk are available.
Precision tuning and rewriting can improve both the accuracy and speed of floating point expressions, yet these techniques are typically applied separately. This paper explores how finer-grained interleaving of precision tuning and rewriting can help automatically generate a richer set of Pareto-optimal accuracy versus speed trade-offs.
We introduce Pherbie (Pareto Herbie), a tool providing both precision tuning and rewriting, and evaluate interleaving these two strategies at different granularities. Our results demonstrate that finer-grained interleavings improve both the Pareto curve of candidate implementations and overall optimization time. On a popular set of tests from the FPBench suite, Pherbie finds both implementations that are significantly more accurate for a given cost and significantly faster for a given accuracy bound compared to baselines using precision tuning and rewriting alone or in sequence.
The paper, conference talk, and conference talk slides are available.
Many compilers, synthesizers, and theorem provers rely on rewrite rules to simplify expressions or prove equivalences. Developing rewrite rules can be difficult: rules may be subtly incorrect, profitable rules are easy to miss, and rulesets must be rechecked or extended whenever semantics are tweaked. Large rulesets can also be challenging to apply: redundant rules slow down rule-based search and frustrate debugging.
This paper explores how equality saturation, a promising technique that uses e-graphs to apply rewrite rules, can also be used to infer rewrite rules. E-graphs can compactly represent the exponentially large sets of enumerated terms and potential rewrite rules. We show that equality saturation efficiently shrinks both sets, leading to faster synthesis of smaller, more general rulesets.
We prototyped these strategies in a tool dubbed Ruler. Compared to a similar tool built on CVC4, Ruler synthesizes 5.8× smaller rulesets 25× faster without compromising on proving power. In an end-to-end case study, we show Ruler-synthesized rules which perform as well as those crafted by domain experts, and addressed a longstanding issue in a popular open source tool.
The paper and conference talk are available.
New heterogeneous computing platforms—especially GPUs and other accelerators—are being introduced at a brisk pace, motivated by the goals of exploiting parallelism and reducing data movement. Unfortunately, their sheer variety as well as the optimization options supported by them have been observed to alter the computed numerical results to the extent that reproducible results are no longer possible to obtain without extra effort. Our main contribution in this paper is to document the scope and magnitude of this problem which we classify under the heading of numerics. We propose a taxonomy to classify specific problems to be addressed by the community, a few immediately actionable topics as the next steps, and also forums within which to continue discussions.
The paper is available.
Identities compactly describe properties of a mathematical expression and can be leveraged into faster and more accurate function implementations. However, identities must currently be discovered manually, which requires a lot of expertise. We propose a two-phase synthesis and deduplication pipeline that discovers these identities automatically. In the synthesis step, a set of rewrite rules is composed, using an e-graph, to discover candidate identities. However, most of these candidates are duplicates, which a secondary de-duplication step discards using integer linear programming and another e-graph. Applied to a set of 61 benchmarks, the synthesis phase generates 7 215 candidate identities which the de-duplication phase then reduces down to 125 core identities.
The paper is available.
sin can be combined to achieve better
    combinations of accuracy and precision.
  Standard implementations of functions like
sinandexpoptimize for accuracy, not speed, because they are intended for general-purpose use. But just like many applications tolerate inaccuracy from cancellation, rounding error, and singularities, many application could also tolerate less-accurate function implementations. This raises an intriguing possibility: speeding up numerical code by using different function implementations.This paper thus introduces OpTuner, an automated tool for selecting the best implementation for each mathematical function call site. OpTuner uses error Taylor series and integer linear programming to compute optimal assignments of 297 function implementations to call sites and presents the user with a speed-accuracy Pareto curve. In a case study on the POV-Ray ray tracer, OpTuner speeds up a critical computation by 2.48×, leading to a whole program speedup of 1.09× with no change in the program output; human efforts result in slower code and lower-quality output. On a broader study of 36 standard benchmarks, OpTuner demonstrates speedups of 2.05× for negligible decreases in accuracy and of up to 5.37× for error-tolerant applications.
The conference talk is available.
egg computes auditable proofs of
    the rewrites it performs, which are used in Herbie as a debugging
    tool.
  Satisfiability Modulo Theory (SMT) solvers and equality saturation engines must generate proof certificates from e-graph-based congruence closure procedures to enable verification and conflict clause generation. Smaller proof certificates speed up these activities. Though the problem of generating proofs of minimal size is known to be NP-complete, existing proof minimization algorithms for congruence closure generate unnecessarily large proofs and introduce asymptotic overhead over the core congruence closure procedure. In this paper, we introduce an
O(n5)time algorithm which generates optimal proofs under a new relaxed “proof tree size” metric that directly bounds proof size. We then relax this approach further to a practicalO(n log(n))greedy algorithm which generates small proofs with no asymptotic overhead. We implemented our techniques in the egg equality saturation toolkit, yielding the first certifying equality saturation engine. We show that our greedy approach in egg quickly generates substantially smaller proofs than the state-of-the-art Z3 SMT solver on a corpus of 3 760 benchmarks.
The paper and conference talk are available.
In recent years, researchers have proposed a number of automated tools to identify and improve floating-point rounding error in mathematical expressions. However, users struggle to effectively apply these tools. In this paper, we work with novices, experts, and tool developers to investigate user needs during the expression rewriting process. We find that users follow an iterative design process. They want to compare expressions on multiple input ranges, integrate and guide various rewriting tools, and understand where errors come from. We organize this investigation's results into a three-stage workflow and implement that workflow in a new, extensible workbench dubbed Odyssey. Odyssey enables users to: (1) diagnose problems in an expression, (2) generate solutions automatically or by hand, and (3) tune their results. Odyssey tracks a working set of expressions and turns a state-of-the-art automated tool "inside out," giving the user access to internal heuristics, algorithms, and functionality. In a user study, Odyssey enabled five expert numerical analysts to solve challenging rewriting problems where state-of-the-art automated tools fail. In particular, the experts unanimously praised Odyssey’s novel support for interactive range modification and local error visualization.
The paper and application are available.
Herbie is a tool for automatically improving floating-point accuracy in programs. Egglog is a new language for performing rewriting over equality, supporting robust analysis. In this tutorial, we show how we improved Herbie in two ways. First, we will show how we leverage egglog to perform sound rewriting in the presence of division for Herbie. Second, we show how to use egglog's powerful rules to extract more accurate programs from the database.
In theory, interval arithmetic at high precision can compute mathematical expressions to any required accuracy. An arbitrary precision library like MPFR can thus be used to evaluate real-valued expressions. In practice, however, MPFR's maximum representable exponent means some inputs cannot be evaluated to the required accuracy. This paper introduces movability flags, which soundly detect the majority of these inputs. Movability flags are set when overflow occurs and track whether recomputing at higher precision could possibly result in narrow intervals. In our tests on 481 expressions, movability flags detect 81.0% of unsamplable inputs. Compared to Mathematica, our movability-flag-enhanced interval arithmetic library resolves 60.3% more challenging inputs, returns 7.6× fewer completely indeterminate results, and avoids 64 cases of fatal error.
A long preprint, documentation, and source code is available.
sin and log.
  Achieving speed and accuracy for math library functions like exp, sin, and log is difficult. This is because low-level implementation languages like C do not help math library developers catch mathematical errors, build implementations incrementally, or separate high-level and low-level decision making. This ultimately puts development of such functions out of reach for all but the most experienced experts. To address this, we introduce MegaLibm, a domain-specific language for implementing, testing, and tuning math library implementations. MegaLibm is safe, modular, and tunable. Implementations in MegaLibm can automatically detect mathematical mistakes like sign flips via semantic well-formedness checks, and components like range reductions can be implemented in a modular, composable way, simplifying implementations. Once the high-level algorithm is done, tuning parameters like working precisions and evaluation schemes can be adjusted through orthogonal tuning parameters to achieve the desired speed and accuracy. MegaLibm also enables math library developers to work interactively, compiling, testing, and tuning their implementations and invoking tools like Sollya and type-directed synthesis to complete components and synthesize entire implementations. MegaLibm can express 8 state-of-the-art math library implementations with comparable speed and accuracy to the original C code, and can synthesize 5 variations and 3 from-scratch implementations with minimal guidance.
A preprint, talk, source code are available.
Evaluating a real-valued expression to high precision is a key building block in computational mathematics, physics, and numerics. A typical implementation uses a uniform precision for each operation, and doubles that precision until the real result can be bounded to some sufficiently narrow interval. However, this is wasteful: usually only a few operations really need to be performed at high precision, and the bulk of the expression could use much lower precision. Uniform precision can also waste iterations discovering the necessary precision and then still overestimate by up to a factor of two. We propose to instead use mixed-precision interval arithmetic to evaluate real-valued expressions. A key challenge is deriving the mixed-precision assignment both soundly and quickly. To do so, we introduce a sound variation of error Taylor series and condition numbers, specialized to interval arithmetic, that can be evaluated with minimal overhead thanks to an "exponent trick". Our implementation, Reval, achieves a speed-up of 1.25x compared to the state-of-the-art Sollya tool, with the speed-up increasing to 2.99x on the most difficult input points. An examination of the precisions used with and without precision tuning shows that the speed-up results come from quickly assigning lower precisions for the majority of operations.
A paper, documentation, and source code are available.
Recent advances have made numeric debugging tools much faster by using double-double oracles, and numeric analysis tools much more accurate by using condition numbers. But these techniques have downsides: double-double oracles have correlated error so miss floating-point errors while condition numbers cannot cleanly handle over- and under- flow. We combine both techniques to avoid these downsides. Our combination, ExplaniFloat, computes condition numbers using double-double arithmetic, which avoids correlated errors. To handle over- and under- flow, it introduces a separate logarithmic oracle. As a result, ExplaniFloat achieves a precision of 80.0% and a recall of 96.1% on a collection of 546 difficult numeric benchmarks: more accurate than double-double oracles yet dramatically faster than arbitrary-precision condition number computations.
A paper, and source code are available.