Herbie helps you identify and correct floating point error in your numeric programs. But what is floating point error, and how does Herbie measure it?
When Herbie reports a "percentage accuracy" number like 92.3%, it's usually best to think of that as the percentage of floating-point inputs where the expression is reasonably accurate.
The impact of this error on your application will depend a lot on which 7.7% of inputs are inaccurate, and what kind of error that is. You can find this out using the accuracy graph in Herbie reports. You can also use preconditions to restrict the inputs Herbie is considering.
In mathematics, we work with real numbers, but on a computer, we typically use floating-point numbers. Because there are infinitely many real numbers, but only finitely many floating-point numbers, some real numbers can't be accurately represented. This means that every time you do an operation, the true result typically has to be rounded to a representable one.
Take an extreme example: the code 1e100 + 1, which increments a huge number in IEEE 754 double-precision floating-point. There's an exact real-number answer, a one followed by 99 zeros and then another 1, but the closest floating-point value is the same as 1e100.
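You can see this rounding directly in any language with IEEE 754 doubles; for example, in a Python session, adding 1 doesn't change the value at all:

    >>> 1e100 + 1 == 1e100
    True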
Errors like this can cause problems. In the example above, the answers differ by one part per googol, which is pretty small. But the error can grow. For example, since 1e100 + 1 rounds to the same value as 1e100, the larger computation (1e100 + 1) - 1e100 returns 0 instead of the correct answer, 1. Now the difference is pretty stark, and can grow even bigger through later operations.
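Continuing the same Python session, the subtraction loses the 1 entirely:

    >>> (1e100 + 1) - 1e100
    0.0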
There are lots of ways to measure how much rounding error there is in a computation. Most people find the notions of absolute and relative error most intuitive, but Herbie internally uses a more complex notion called bits of error.
The bits of error metric imagines you have a list of all of the possible floating-point numbers, from largest to smallest. In that list, compare the floating-point value you computed to the one closest to the true answer. If they are the same, that's called 0 bits of error; if they are one apart, that's called one bit of error; three apart is two bits of error; and so on.
In general, if the two floating-point values are n items apart, Herbie says they have log2(n + 1) bits of error.
Values like 0 and -0 are treated as having 0 bits of error, and NaN is considered to have the maximum number of bits of error against non-NaN values. While there are all sorts of theoretical justifications, Herbie mainly uses this error metric because we've found it to give the best results.
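As a rough sketch of this computation in Python (simplifying Herbie's handling of NaN and other special values; the function names here are just for illustration):

    import math, struct

    def ordinal(x):
        # Position of a double in the ordered list of all doubles;
        # negative numbers are mapped so that the ordering is preserved.
        bits = struct.unpack("<q", struct.pack("<d", x))[0]
        return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)

    def bits_of_error(computed, correct):
        # `correct` is the double closest to the true real-number answer.
        n = abs(ordinal(computed) - ordinal(correct))
        return math.log2(n + 1)

    bits_of_error(1e100 + 1, 1e100)            # 0.0: both round to the same double
    bits_of_error((1e100 + 1) - 1e100, 1.0)    # ~62: almost every bit is wrong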
On a single input, the best way to interpret the "bits of error" metric is that it tells you roughly how many bits of the answer, starting at the end, are useless. With zero bits of error, you have the best answer possible. With four bits, that's still pretty good because it's four out of 64. But with 40 or 50 bits of error, you're getting less accuracy out of the computation than even a single-precision floating-point value. And it's even possible to have something like 58 or 60 bits of error, in which case even the sign and exponent bits (which in double-precision floating-point occupy the top 12 bits of the number) are incorrect.
Because different number representations have different numbers of bits, Herbie shows the percentage of bits that are accurate instead of the bits of error. With double-precision floats, for example, 75% accurate means 16 bits of error.
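Roughly, that conversion looks like this (a sketch; the exact scaling Herbie uses may differ in its details):

    def percentage_accuracy(bits_of_error, total_bits=64):
        # 16 bits of error in a 64-bit double is 75% accurate
        return 100 * (1 - bits_of_error / total_bits)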
Bits of error measures the error of a computation for some specific input. But usually you're interested in the error of a computation across many possible inputs. Herbie therefore averages the accuracy percentage across many randomly-sampled valid inputs.
Typically, input points are either very accurate or not accurate at all. So when computing percentage accuracy, Herbie is averaging a lot of points with near-100% accuracy and a lot of points with near-0% accuracy. In that sense, you can think of percentage accuracy as mostly measuring what percentage of inputs are accurate. If Herbie says your computation is 75% accurate, what it's really saying is that about a quarter of inputs produce unusable outputs.
Since percentage accuracy averages over many points, it depends on what points it is averaging over, and how it samples among them. Herbie samples among valid inputs, uniformly distributed.
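Putting the pieces together, the averaging step looks roughly like this, reusing the sketches above; run_program and true_value are hypothetical stand-ins for evaluating the floating-point program and computing its correctly-rounded true answer:

    def average_accuracy(valid_inputs):
        # Average the per-input accuracy over many sampled valid inputs.
        # run_program and true_value are hypothetical stand-ins, not Herbie's API.
        per_point = [percentage_accuracy(bits_of_error(run_program(x), true_value(x)))
                     for x in valid_inputs]
        return sum(per_point) / len(per_point)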
Herbie considers an input valid if it is a floating-point value in the appropriate precision and its true, real-number output 1) exists; 2) satisfies the user's precondition; and 3) can be computed. Let's dive into each requirement.
For example, if the expression is (sqrt x), the input x = -1 is invalid, because the square root of -1 isn't a real number. If the precondition is (< 1 x 2), then the input x = 3 is invalid. And the expression (/ (exp 1e9) (exp 1e9)), which divides two identical but gargantuan numbers, does have an exact real-number answer (one), but Herbie can't compute that exact answer because the intermediate steps are too large. Inputs like this are also invalid, but you'll get a warning if this happens.
Herbie's percentage accuracy metric only averages over valid points. This means that when you change your precondition, you change which points are valid, and that can change the percentage accuracy reported by Herbie. This is useful: if you know there's a rare error, but Herbie thinks error is low, you can change your precondition to focus on the points of interest.
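In sketch form, the validity check amounts to something like this (helper names are hypothetical, not Herbie's actual implementation):

    def is_valid(x):
        # precondition is the user's precondition, e.g. (< 1 x 2);
        # exact_real_output tries to compute the true real-number answer,
        # returning None if it doesn't exist or can't be computed.
        return precondition(x) and exact_real_output(x) is not None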
Some inputs are valid in surprising ways. For example, consider the expression (exp x) for x = 1e100. The true real-number result is a very large but finite number; but that real number, whatever it is, rounds to infinity in double-precision floating point. Herbie does consider this input valid, but infinite values are a little odd in floating-point, so you might find this surprising. For this reason, Herbie issues a warning if too many outputs are infinite.
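You can reproduce this rounding-to-infinity with NumPy's exp, which follows IEEE 754 semantics (Python's own math.exp raises an OverflowError instead of returning the infinity):

    import numpy as np
    np.exp(np.float64(1e100))   # inf, after a RuntimeWarning about overflow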
When randomly sampling inputs, Herbie considers each valid input equally likely. Importantly, this does not mean that it uses a uniform distribution, because floating-point values themselves are not uniformly distributed.
For example, there are as many floating-point values between 1 and 2 as there are between one half and one, because floating-point values use an exponential encoding. But that means that if you're looking at a computation where the input comes from the interval [0.5, 2], Herbie will draw half of its samples from the first third of that range (between one half and one) and half from the remaining two thirds (between one and two).
This can produce unintuitive results, especially for intervals that include 0. For example, in the interval [0, 1], the second half of that interval (from one half to one) has a tiny proportion of the weight (in double-precision floating-point, about 0.1%). If Herbie can improve accuracy by a little bit between zero and one half, while dramatically reducing accuracy between one half and one, it will think that that's an accuracy improvement. You might disagree.
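One way to check these counts is to look at each double's position in the ordered list of doubles, as in this small Python sketch:

    import struct

    def ordinal(x):
        # Position of a non-negative double in the ordered list of doubles.
        return struct.unpack("<q", struct.pack("<d", x))[0]

    def floats_between(lo, hi):
        return ordinal(hi) - ordinal(lo)

    floats_between(0.5, 1.0)                               # 2**52
    floats_between(1.0, 2.0)                               # 2**52, the same count
    floats_between(0.5, 1.0) / floats_between(0.0, 1.0)    # ~0.001, i.e. about 0.1%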
Unfortunately, there's no way for Herbie to intuit exactly what you mean, so understanding this error distribution is essential to understanding Herbie's outputs. For example, if Herbie reports that accuracy improved from 75% to 76%, it's essential to know: is the improvement happening on inputs between one half and one, or between 1e-100 and 2e-100? To answer this question, it's important to look over the reports and graphs generated by Herbie.