Herbie helps you identify and correct floating point error in your numeric programs. But what is floating point error, and how does Herbie measure it?

When Herbie reports a "percentage accuracy" number like 92.3%, it's usually best to think of that as the percentage of floating-point inputs where the expression is reasonably accurate.

The impact of this error on your application will depend a lot
on *which* 7.7% of inputs are inaccurate, and what kind of
error that is. You can find this out using
the accuracy graph in Herbie reports. You
can also use preconditions to restrict the inputs Herbie is
considering.

In mathematics, we work with real numbers, but on a computer, we
typically use floating-point numbers. Because there are infinitely
many real numbers, but only finitely many floating-point numbers, some
real numbers can't be accurately represented. This means that every
time you do an operation, the true result typically has to
be *rounded* to a representable one.

Take an extreme example: the code `1e100 + 1`

, which
increments a huge number in IEEE 754 double-precision floating-point.
There's an exact real-number answer, a one followed by 99 zeros and
then another 1, but the closest *floating-point* value is the
same as `1e100`

.

Errors like this can cause problems. In the example above, the
answers differ by one part per googol, which is pretty small. But the
error can grow. For example, since `1e100 + 1`

rounds to
the same value as `1e100`

, the larger computation

(1e100 + 1) - 1e100returns

`0`

instead of the correct answer, `1`

.
Now the difference is pretty stark, and can grow even bigger through
later operations.
There are lots of ways to *measure* how much rounding error
there is in a computation. Most people find the notions of absolute
and relative error most intuitive, but Herbie internally uses a more
complex notion called *bits of error*.

The bits of error metric imagines you have a list of all of the possible floating-point numbers, from largest to smallest. In that list, compare the floating-point value you computed to the one closest to the true answer. If they are the same, that's called 0 bits of error; if they are one apart, that's called one bit of error; three apart is two bits of error; and so on.

In general, if the two floating-point values are `n` items
apart, Herbie says they have `log2(n + 1)`

bits of error.
Values like `0`

and `-0`

are treated as having 0
bits of error, and `NaN`

is considered to have the maximum
number of bits of error against non-`NaN`

values. While
there's all sorts of theoretical justifications, Herbie mainly uses
this error metric because we've found it to give the best
results.

On a single input, the best way to interpret the "bits of error" metric is that it tells you roughly how many bits of the answer, starting at the end, are useless. With zero bits of error, you have the best answer possible. With four bits, that's still pretty good because it's four out of 64. But with 40 or 50 bits of error, you're getting less accuracy out of the computation than even a single-precision floating-point value. And it's even possible to have something like 58 or 60 bits of error, in which case even the sign and exponent bits (which in double-precision floating-point occupy the top 12 bits of the number) are incorrect.

Because different number representations have different numbers of bits, Herbie shows the percentage of bits that are accurate instead of the bits of error. With double-precision floats, for example, 75% accurate means 16 bits of error.

Bits of error measures the error of a computation for some specific input. But usually you're interested in the error of a computation across many possible inputs. Herbie therefore averages the accuracy percentage across many randomly-sampled valid inputs.

Typically, inputs points are either very accurate or not accurate
at all. So when computing percentage accuracy, Herbie's averaging a
lot of points with near-100% accuracy and a lot of points with near-0%
accuracy. In that sense, you can think of percentage accuracy as
measuring mostly what percentage *of inputs* are accurate. If
Herbie says your computation is 75% accurate what it's really saying
is that about a quarter of inputs produce usable outputs.

Since percentage accuracy averages over many points, it depends on
what points it is averaging over, and how is samples among them.
Herbie samples among *valid* inputs, uniformly distributed.

Herbie considers an input valid if it is a floating-point value in the appropriate precision and its true, real-number output 1) exists; 2) satisfies the user's precondition; and 3) can be computed. Let's dive into each requirement.

- An output can fail to exist for an input if that input does something like divide by zero or take the square root of a negative number. Then that input is invalid.
- An input can fail to satisfy the user's precondition.
Preconditions can be basically anything, but most often they
specify a range of inputs. For example, if the precondition
is
`(< 1 x 2)`

, then the input`x = 3`

is invalid. - Finally, and most rarely, Herbie can fail to compute the output
for a particular input. For example, the computation
`(/ (exp 1e9) (exp 1e9))`

, which divides two identical but gargantuan numbers, does have an exact real-number answer (one), but Herbie can't compute that exact answer because the intermediate steps are too large. This input is also invalid, but you'll get a warning if this happens.

Herbie's percentage accuracy metric only averages over valid points. This means that when you change your precondition, you change which points are valid, and that can change the percentage accuracy reported by Herbie. This is useful: if you know there's a rare error, but Herbie thinks error is low, you can change your precondition to focus on the points of interest.

Some inputs are valid in surprising ways. For example, consider the
expression `(exp x)`

for `x = 1e100`

. The true
real-number result is a very large but finite number; but that real
number, whatever it is, rounds to infinity in double-precision
floating point. Herbie does consider this input valid, but infinite
values are a little odd in floating-point so you might find this
surprising. For this reason, Herbie
issues a warning if too many outputs
are infinite.

When randomly sampling inputs, Herbie considers each valid input equally likely. Importantly, this does not mean that it uses a uniform distribution, because floating-point values themselves are not uniformly distributed.

For example, there are as many floating-point values between 1 and
2 as there are between one and one half, because floating-point values
use an exponential encoding. But that means that if you're looking at
a computation where the input comes from the interval `[0.5,
2]`, Herbie will weigh the first third of that range twice as
high as the other two thirds.

This can produce unintuitive results, especially for intervals that
cross 0. For example, in the interval `[0, 1]`, the second
half of that interval (from one half to one) has a tiny proportion of
the weight (in double-precision floating-point, about 0.1%). If Herbie
can improve accuracy by a little bit between zero and one half, while
dramatically reducing accuracy between one half and one, it will think
that that's an accuracy improvement. You might disagree.

Unfortunately, there's no way for Herbie to intuit exactly what you
mean, so understanding this error distribution is essential to
understanding Herbie's outputs. For example, if Herbie reports that
accuracy improved from 75% to 76%, it's essential to know: is the
improvement happening on inputs between one half and one, or
between `1e-100`

and `2e-100`

? To answer this
question, it's important to look over
the reports and graphs generated by
Herbie.