The Herbie developers are excited to announce
Herbie 2.1! This release focuses on performance, both in the generated
code and in the Herbie kernel itself.
What is Herbie? Herbie automatically improves the accuracy
of floating-point expressions. This avoids the bugs, errors, and
surprises that so often occur with floating-point arithmetic.
Visit the main page to learn more about Herbie.
OOPSLA, ASPLOS, and POPL Reviewers, please
do not read further, because some of the work described below is
submitted for publication to these venues.
Faster Generated Code
Last year, Herbie 2.0
introduced Pareto mode, in which Herbie generates multiple
expressions with different speeds and accuracies. Herbie 2.1 now
makes Herbie's generated code, especially at the
highest-performance/lowest-accuracy level, dramatically better.
Some features that contribute to these improvements:
With type-based extraction (#875, #883, and #887) Herbie
considers performance optimizations like precision tuning at the
same time as it considers rewrites, instead of considering each
separately.
A new cost-opportunity heuristic
(#746)
allows Herbie to focus on parts of the program that can be sped up,
generating faster low-accuracy code.
Preprocessing for odd functions
(#645)
generates range reductions for odd functions, which can mean more
accurate generated code.
Polynomials are now evaluated in Horner form
(#727).
Together with some bug fixes
(#660),
this means faster and more accurate polynomial approximations.
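Two of the items above, Horner-form evaluation and odd-function preprocessing, can be illustrated with a minimal Python sketch. This is not Herbie's implementation (Herbie rewrites expressions rather than interpreting coefficient lists); the function names `horner` and `eval_odd` are hypothetical and chosen only for this example.

```python
import math

def horner(coeffs, x):
    """Evaluate c0 + c1*x + ... + cn*x**n, with coeffs ordered low degree first.

    Horner form needs one multiply and one add per coefficient,
    instead of computing each power of x separately.
    """
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def eval_odd(f, x):
    """Evaluate an odd function f (f(-x) == -f(x)) using only non-negative inputs.

    The sign of x is split off and multiplied back in afterwards, so f
    itself only ever sees |x|.
    """
    sign = math.copysign(1.0, x)
    return sign * f(abs(x))

# 1 + 2x + 3x^2 at x = 2 is evaluated as 1 + x*(2 + x*3).
value = horner([1.0, 2.0, 3.0], 2.0)

# sin is odd, so sin(-4) can be computed from sin(4).
odd_value = eval_odd(math.sin, -4.0)
```

Restricting an odd function to non-negative inputs halves the input range that any later approximation has to cover, which is one way such preprocessing can help accuracy.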
While Herbie's generated results are much better, these changes
alone would make Herbie more than twice as slow. This leads to the
second category of changes.
Faster Herbie Kernel
            Iteration 0            Iteration 1            Iteration 2
Operation   Precision  Time (µs)   Precision  Time (µs)   Precision  Time (µs)
Tuning                                        22.9                   21.0
cos         78         75.9        592        98.9        1695       173.1
add         83         11.0        2107       10.0        2698       11.0
cos         78         8.1         593        99.1        1695       176.0
sub         73         10.0        73         10.0        73         11.0
Total                  105.0                  241.0                  392.1
A precision-tuned execution of cos(x) - cos(x + ɛ)
when x = 10^300 and ɛ = 10^-300. Each row of the table represents one
mathematical operation (or the time spent precision-tuning), and
each pair of columns gives the precision and execution time of that
operation in one iteration. Each operation's precision is
chosen independently, so the precision column is not uniform.
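To see why the add in the table above needs thousands of bits while the sub needs only 73, the following sketch uses Python's standard `decimal` module to find a working precision at which x + ɛ is distinguishable from x. It is only an illustration of the underlying arithmetic, not Rival's precision tuning algorithm; the helper names and the simple doubling strategy are assumptions made for this example.

```python
from decimal import Decimal, localcontext

X = Decimal("1e300")
EPS = Decimal("1e-300")

def add_at_precision(x, eps, digits):
    """Return x + eps rounded to `digits` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        return x + eps

def digits_needed(x, eps, start=16):
    """Double the working precision until adding eps visibly changes x.

    This naive doubling loop is an assumption of the sketch; it is not
    how Rival chooses precisions.
    """
    if eps == 0:
        raise ValueError("eps must be nonzero")
    digits = start
    while add_at_precision(x, eps, digits) == x:
        digits *= 2
    return digits

# At 28 digits (Python's default) eps is absorbed completely.
absorbed = add_at_precision(X, EPS, 28) == X

# The exact sum has 601 significant decimal digits, roughly 2000 bits.
exact_is_distinct = add_at_precision(X, EPS, 601) != X
```

Any lower precision for the add would make x + ɛ equal to x, and the final subtraction would return zero, which is why only that one operation needs to run at such high precision.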
Nearly every part of Herbie has been sped up, often significantly,
meaning that Herbie 2.1 overall, despite generating much faster
code, is only about 20% slower than Herbie 2.0.
The most challenging improvement is a complete rewrite of Herbie's
real evaluation system, Rival. Herbie 2.1 uses precision tuning to
reduce the time and memory costs by approximately 40%, with the
biggest impacts to the largest and slowest expressions. Moreover,
Rival has
been packaged
for use in other projects.
Other optimizations to Herbie include:
Regimes saw a series of improvements, including to
data layout (#696),
sharing (#706),
types (#748),
and algorithms (#772),
which together lead to a 2–3× speedup in regimes.
The floating-point program interpreter was rewritten and sped up
(#766).
Batching sped up derivation generation significantly
(#736).
An accidentally-quadratic lookup in pruning was found and fixed
(#781).
Analysis capped an exponential blow-up that occurred for some preconditions
(#762).
Random number generation saw a small speedup
(#792).
New Features: Platforms and Explanations
Two new features are in development and available in an
undocumented alpha state in this release: platforms and explanations.
Platforms allow Herbie to generate code specific
to a given programming language, library, or hardware platform.
Herbie can use platform-specific operators, cost models,
and compilation styles, which leads to faster and more accurate
code. We hope to clean up the platforms code and release it for real
in Herbie 2.2.
Explanations describe what floating-point errors Herbie found and
what inputs they occur for. This should make Herbie easier to
understand and a more valuable tool for learning about
floating-point error.
Sister Projects
The Odyssey numerics
workbench is releasing version 1.1 today, featuring FPTaylor
support and expression export. Odyssey and supporting tools like
Herbie and FPTaylor can be installed and run locally through the
Odyssey VSCode extension. New features include:
Support for using FPTaylor to compute sound error bounds in
Odyssey. Select "FPTaylor Analysis" from the tool dropdown for an
expression. This is a part of a larger effort to combine different
floating-point tools as parts of an analysis.
Odyssey now supports exporting expressions to different
languages using the new "Expression Export" tool.
Herbie has been updated with an HTTP API endpoint to support
Odyssey's expression export. Herbie's HTTP API endpoints are
documented here.
Like the Herbie demo, Odyssey now shows the percent accuracy
of expressions, rather than bits of error.
The layout of the Odyssey interface has been updated and will
continue to see rolling updates.
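As a rough guide to how the two ways of reporting accuracy mentioned above relate, the sketch below converts bits of error into a percentage. It assumes the percentage is one minus the ratio of error bits to the bit width of the number format (64 for double precision); this formula and the function name `percent_accuracy` are assumptions of the example, not taken from the Odyssey or Herbie sources.

```python
def percent_accuracy(bits_of_error, total_bits=64):
    """Convert average bits of error into a percent-accuracy figure.

    Assumes accuracy = 1 - bits_of_error / total_bits, so 0 bits of
    error is 100% accurate and total_bits of error is 0% accurate.
    """
    if not 0 <= bits_of_error <= total_bits:
        raise ValueError("bits_of_error must lie between 0 and total_bits")
    return 100.0 * (1.0 - bits_of_error / total_bits)

# Half of a 64-bit result being wrong corresponds to 50% accuracy.
half = percent_accuracy(32)
```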
The Rival
real-arithmetic package is releasing version 2.0 today,
featuring the correct-rounding code from Herbie
(#804),
including the new precision tuning algorithm and a newly built
profiling system.
Development Improvements
Herbie now supports the FPCore :alt field,
including multiple alternative expressions
(#764,
#783,
and #805).
Many of Herbie's oldest benchmarks have gained new preconditions
and human-written target programs
(#693,
#697),
which will drive Herbie development in the future.
Herbie's report page has been totally rewritten, and now uses
JavaScript. This has allowed us to add sorting, filtering, and
diffing capabilities, which have made development much easier
(#641,
#651,
#687).
However, this does mean that you need to run a local server to view
report pages saved on your local disk—this is a browser security policy
that we can't avoid. You'll see an error message explaining how
(#863).
Friends at Intel contributed benchmarks from the DirectX specification
(#656).
Caching has sped up Herbie's continuous integration
(#663).
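For the report pages mentioned above, which now need a local server when viewed from disk, one common option is Python's standard-library HTTP server. The sketch below is a minimal version of that approach; the function names, the directory name in the usage note, and port 8000 are placeholders, and the method Herbie's own error message suggests may differ.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def report_handler(directory):
    """Build a request handler that serves static files from `directory`."""
    return functools.partial(SimpleHTTPRequestHandler, directory=str(directory))

def serve_report(directory, port=8000):
    """Serve a saved report directory on localhost until interrupted."""
    address = ("127.0.0.1", port)
    with ThreadingHTTPServer(address, report_handler(directory)) as server:
        print(f"Serving {directory} at http://127.0.0.1:{port}/")
        server.serve_forever()
```

Calling `serve_report("path/to/report")` and opening the printed URL in a browser loads the report over HTTP instead of from a `file://` path. Binding to 127.0.0.1 keeps the server reachable only from the local machine.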
Other improvements
We now build and publish macOS Arm64 packages
(#787,
#862).
Thank you to GitHub for hosting our build infrastructure.
We fixed a memory leak
(#636)
and segfault
(#665)
in Herbie's supporting Rust libraries. Eventually the egg folks
tracked
down the
root cause, so this won't be a problem any more.
Some sources of non-determinism were tracked down and fixed
(#661).
Herbie's internals now distinguish between real and
floating-point expressions
(#676,
#702,
#732),
which had previously been an ad-hoc distinction.
Herbie's API endpoints now use an internal job abstraction
(#845),
which should eventually allow them to be threaded and asynchronous.
Try it out!
We want Herbie to be more useful to scientists, engineers, and
programmers around the world. We've got a lot of features we're
excited to work on in the coming months. Please
report bugs
or contribute.