Herbie helps you identify and correct floating point error in your numeric programs. But what is floating point error, and how does Herbie measure it?
When Herbie reports a "percentage accuracy" number like 92.3%, it's usually best to think of that as the percentage of floating-point inputs where the expression is reasonably accurate.
The impact of this error on your application will depend a lot on which 7.7% of inputs are inaccurate, and what kind of error that is. You can find this out using the accuracy graph in Herbie reports. You can also use preconditions to restrict the inputs Herbie is considering.
In mathematics, we work with real numbers, but on a computer, we typically use floating-point numbers. Because there are infinitely many real numbers, but only finitely many floating-point numbers, some real numbers can't be accurately represented. This means that every time you do an operation, the true result will be rounded to a representable one.
Take an extreme example: the code 1e100 + 1, which increments a huge number in IEEE 754 double-precision floating-point. There's an exact real-number answer, a one followed by 99 zeros and then another 1, but the closest floating-point value is the same as 1e100.
Errors like this can cause problems. In the example above, the answers differ by one part per googol, which is pretty small. But the error can grow. For example, since 1e100 + 1 rounds to the same value as 1e100, the larger computation (1e100 + 1) - 1e100 returns 0 instead of the correct answer, 1. Now the difference is pretty stark, and can grow even bigger through later operations.
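You can check this behavior in Python, whose floats are IEEE 754 double-precision:

    # 1e100 + 1 rounds to the nearest representable double: 1e100 itself.
    assert 1e100 + 1 == 1e100
    # The rounding then produces a starkly wrong answer in a later operation:
    assert (1e100 + 1) - 1e100 == 0.0   # the true answer is 1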
There are lots of ways to measure how much rounding error there is in a computation. Most people find the notions of absolute and relative error most intuitive, but Herbie internally uses a more complex notion called bits of error.
The bits of error metric imagines you have a list of all of the possible floating-point numbers, from largest to smallest. In that list, compare the floating-point value you computed to the one closest to the true answer. If they are the same, that's called 0 bits of error; if they are one apart, that's called one bit of error; three apart is two bits of error; seven apart is three bits; and so on.
In general, if the two floating-point values are n items apart, Herbie says they have log2(n + 1) bits of error.
The values 0 and -0 are treated as having 0 bits of error between them, and NaN is considered to have the maximum number of bits of error against non-NaN values. While there are all sorts of theoretical justifications, Herbie mainly uses this error metric because we've found it to give the best results.
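To make the definition concrete, here is a small Python sketch of the metric for double-precision values. The helper names are ours, not Herbie's; the ordinal encoding is a standard bit-twiddling trick, and we assume 64 bits as the maximum error, matching the width of a double:

    import math
    import struct

    def float_to_ordinal(x):
        # Reinterpret the double's bits as a signed integer so that
        # consecutive floats map to consecutive integers; negative floats
        # are flipped to match numeric order. Note that -0.0 maps to 0,
        # so 0 and -0 come out 0 bits apart, as described above.
        bits = struct.unpack('<q', struct.pack('<d', x))[0]
        return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)

    def bits_of_error(computed, true_rounded):
        # log2(n + 1), where n is how many floats apart the computed
        # value is from the correctly rounded true answer.
        if math.isnan(computed) != math.isnan(true_rounded):
            return 64.0  # NaN vs. non-NaN: maximum error (assumed 64 bits)
        if math.isnan(computed):
            return 0.0
        n = abs(float_to_ordinal(computed) - float_to_ordinal(true_rounded))
        return math.log2(n + 1)

On the example from earlier, bits_of_error((1e100 + 1) - 1e100, 1.0) comes out to about 62 bits: nearly the entire answer is wrong.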
On a single input, the best way to interpret the "bits of error" metric is that it tells you roughly how many bits of the answer, starting at the end, are useless. With zero bits of error, you have the best answer possible. With four bits, that's still pretty good, because it's four out of 64. But with 40 or 50 bits of error, you're getting less accuracy out of the computation than even a single-precision floating-point value. And it's even possible to have something like 58 or 60 bits of error, in which case even the sign and exponent bits (which in double-precision floating-point are the most significant 12 bits) are incorrect.
Because different number representations have different numbers of bits, Herbie shows the percentage of bits that are accurate instead of the bits of error. With double-precision floats, for example, 75% accurate means 16 bits of error.
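As a sketch, that conversion for 64-bit doubles is simple arithmetic (our own helper, not Herbie's API):

    def percentage_accuracy(bits_err, total_bits=64):
        # 16 bits of error out of 64 gives 75% accuracy, as in the text.
        return 100.0 * (1.0 - bits_err / total_bits)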
Bits of error measures the error of a computation for some specific input. But usually you're interested in the error of a computation across many possible inputs. Herbie therefore averages the accuracy percentage across many randomly-sampled valid inputs.
Typically, input points are either very accurate or not accurate at all. So when computing percentage accuracy, Herbie is averaging a lot of points with near-100% accuracy and a lot of points with near-0% accuracy. In that sense, you can think of percentage accuracy as measuring mostly what percentage of inputs are accurate. If Herbie says your computation is 75% accurate, what it's really saying is that about a quarter of inputs produce unusable outputs.
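A minimal sketch of that averaging, using the helpers above and assuming you supply both the sampled inputs and a ground-truth oracle (Herbie computes true outputs itself, using high-precision arithmetic):

    def average_accuracy(f, oracle, inputs):
        # f is the floating-point implementation under test; oracle(x)
        # returns the true answer, already rounded to the nearest double.
        scores = [percentage_accuracy(bits_of_error(f(x), oracle(x)))
                  for x in inputs]
        return sum(scores) / len(scores)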
When Herbie computes this average, it computes it over valid input points, sampled so that each valid input is equally likely.
Herbie considers an input valid if it is a floating-point value in the appropriate precision, and if 1) its true, real-number output exists; 2) it satisfies the user's precondition; and 3) its true output can be computed. Let's dive into each requirement.
First, the true output has to exist. For example, for the expression (sqrt x), the input x = -1 is invalid, because the square root of a negative number is not a real number.

Second, the input has to satisfy the precondition. If the precondition is (< 1 x 2), then the input x = 3 is invalid.

Third, Herbie has to be able to compute the true output. The expression (/ (exp 1e9) (exp 1e9)), which divides two identical but gargantuan numbers, does have an exact real-number answer (one), but Herbie can't compute that exact answer because the intermediate values are too large. This input is also invalid, but you'll get a warning if this happens.

Herbie's percentage accuracy metric only averages over valid points. This means that when you change your precondition, you change which points are valid, and that can change the percentage accuracy reported by Herbie. This is useful: if you've observed a floating-point error, you can tailor your precondition to focus on inputs near the one you've observed.
Infinite outputs can be valid. For example, consider (exp x) for x = 1e100. The true real-number result is some very large finite number. That real number, whatever it is, rounds to infinity in double-precision floating-point. Herbie thus considers this input valid. Since you might find this surprising, Herbie issues a warning if too many outputs are infinite.
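Python's math library illustrates the same point, though it signals the overflow with an exception instead of returning infinity:

    import math

    # exp(1e100) is a perfectly good real number, but it is far beyond the
    # largest double (about 1.8e308), so the correctly rounded result is +inf.
    try:
        result = math.exp(1e100)
    except OverflowError:
        result = math.inf  # the correctly rounded double-precision answer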
When randomly sampling inputs, Herbie considers each valid input equally likely. Importantly, this does not mean the inputs are uniformly distributed over the real number line, because floating-point values themselves are not evenly spaced. For example, there are as many floating-point values between 1 and 2 as there are between one half and one, because floating-point values use an exponential encoding. But that means that, in the interval [0.5, 2], Herbie will sample from the first third of that range just as often as from the other two thirds.
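You can confirm this with the float_to_ordinal helper from the sketch above:

    # Each power-of-two interval ("binade") holds exactly 2**52 doubles, so
    # [0.5, 1] and [1, 2] contain the same number of floats even though the
    # second interval is twice as long on the real number line.
    assert float_to_ordinal(1.0) - float_to_ordinal(0.5) == 2**52
    assert float_to_ordinal(2.0) - float_to_ordinal(1.0) == 2**52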
This can produce unintuitive results, especially for intervals that cross 0. For example, in the interval [0, 1], the second half of that interval (from one half to one) has a tiny proportion of the weight (in double-precision floating-point, about 0.1%). If Herbie can improve accuracy by a little bit between zero and one half, while dramatically reducing accuracy between one half and one, it will consider that an accuracy improvement. For this reason, Herbie prompts you to add a minimum absolute value for ranges that cross zero. Even a trivial minimum absolute value, like 1e-10, can dramatically improve Herbie's results.
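The same ordinal trick shows how little sampling weight the top half of [0, 1] carries:

    # Fraction of the doubles in [0, 1] that fall in [0.5, 1]:
    fraction = (float_to_ordinal(1.0) - float_to_ordinal(0.5)) / float_to_ordinal(1.0)
    # ~0.00098, i.e. roughly 0.1% of the sampling weight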
Unfortunately, there's no way for Herbie to intuit exactly what you mean, so understanding this error distribution is essential to understanding Herbie's outputs. For example, if Herbie reports that accuracy improved from 75% to 76%, it's essential to know: is the improvement happening on inputs between one half and one, or between 1e-100 and 2e-100? To answer this question, it's important to look over the reports and graphs generated by Herbie.