What is Error?

Herbie helps you identify and correct floating-point error in your numeric programs. But what is floating-point error, and how does Herbie measure it?

The summary

When Herbie reports a "percentage accuracy" number like 92.3%, it's usually best to think of that as the percentage of floating-point inputs where the expression is reasonably accurate.

The impact of this error on your application will depend a lot on which 7.7% of inputs are inaccurate, and what kind of error that is. You can find this out using the accuracy graph in Herbie reports. You can also use preconditions to restrict the inputs Herbie is considering.

Why rounding matters

In mathematics, we work with real numbers, but on a computer, we typically use floating-point numbers. Because there are infinitely many real numbers, but only finitely many floating-point numbers, some real numbers can't be accurately represented. This means that every time you do an operation, the true result typically has to be rounded to a representable one.

Take an extreme example: the code 1e100 + 1, which increments a huge number in IEEE 754 double-precision floating-point. There's an exact real-number answer, a one followed by 99 zeros and then another 1, but the closest floating-point value to that answer is just 1e100 itself.

Errors like this can cause problems. In the example above, the answers differ by one part per googol, which is pretty small. But the error can grow. For example, since 1e100 + 1 rounds to the same value as 1e100, the larger computation

(1e100 + 1) - 1e100
returns 0 instead of the correct answer, 1. Now the difference is pretty stark, and can grow even bigger through later operations.
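
You can watch this happen directly in any language with IEEE 754 doubles; here is a quick check in Python:

    x = 1e100
    print(x + 1 == x)    # True: the + 1 is rounded away entirely
    print((x + 1) - x)   # 0.0, not the correct answer 1.0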

Bits of error

There are lots of ways to measure how much rounding error there is in a computation. Most people find the notions of absolute and relative error most intuitive, but Herbie internally uses a more complex notion called bits of error.

The bits of error metric imagines you have a list of all of the possible floating-point numbers, sorted from smallest to largest. In that list, compare the floating-point value you computed to the floating-point value closest to the true answer. If they are the same, that's zero bits of error; if they are one apart, that's one bit of error; three apart is two bits of error; and so on.

In general, if the two floating-point values are n items apart, Herbie says they have log2(n + 1) bits of error. Values like 0 and -0 are treated as having 0 bits of error, and NaN is considered to have the maximum number of bits of error against non-NaN values. While there are all sorts of theoretical justifications, Herbie mainly uses this error metric because we've found it to give the best results.
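
To make the metric concrete, here is a minimal Python sketch of the idea. It is not Herbie's actual implementation (Herbie is written in Racket and treats NaN and other special values more carefully); it just reinterprets each double's bits to find its position in the sorted list of floats:

    import math
    import struct

    def ordinal(x):
        """Index of the double x in the sorted list of all doubles.
        Negative floats get negative indices so ordering is preserved;
        0.0 and -0.0 both map to index 0."""
        bits = struct.unpack("<q", struct.pack("<d", x))[0]
        return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)

    def bits_of_error(computed, true_value):
        """log2(n + 1), where n is how many floats apart the values are."""
        n = abs(ordinal(computed) - ordinal(true_value))
        return math.log2(n + 1)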

On a single input, the best way to interpret the "bits of error" metric is that it tells you roughly how many bits of the answer, starting at the end, are useless. With zero bits of error, you have the best answer possible. With four bits, that's still pretty good because it's four out of 64. But with 40 or 50 bits of error, you're getting less accuracy out of the computation than even a single-precision floating-point value. And it's even possible to have something like 58 or 60 bits of error, in which case even the sign and exponent bits (which in double-precision floating-point occupy the top 12 bits of the number) are incorrect.
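
Using the sketch above, you can see both ends of the scale (the exact numbers assume double precision):

    print(bits_of_error(1.0, 1.0))                   # 0.0: the best answer possible
    print(bits_of_error(math.sqrt(2.0) ** 2, 2.0))   # 1.0: one float away from 2.0
    print(bits_of_error((1e100 + 1) - 1e100, 1.0))   # ~62.0: 0.0 is nowhere near 1.0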

Percentage accuracy

Because different number representations have different numbers of bits, Herbie shows the percentage of bits that are accurate instead of the bits of error. With double-precision floats, for example, 75% accurate means 16 bits of error.

Bits of error measures the error of a computation for some specific input. But usually you're interested in the error of a computation across many possible inputs. Herbie therefore averages the accuracy percentage across many randomly-sampled valid inputs.

Typically, input points are either very accurate or not accurate at all. So when computing percentage accuracy, Herbie is averaging a lot of points with near-100% accuracy and a lot of points with near-0% accuracy. In that sense, you can think of percentage accuracy as mostly measuring what percentage of inputs are accurate. If Herbie says your computation is 75% accurate, what it's really saying is that about a quarter of inputs produce unusable outputs.
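
As a toy illustration of that averaging, here is a sketch that reuses bits_of_error from above. It is not Herbie's sampler, and the true_fn argument, standing in for a correctly-rounded true answer, is an assumption for illustration:

    def percentage_accuracy(inputs, computed_fn, true_fn, total_bits=64):
        """Mean per-input accuracy, as a percentage of total_bits."""
        total = 0.0
        for x in inputs:
            err = bits_of_error(computed_fn(x), true_fn(x))
            total += 100 * (1 - err / total_bits)
        return total / len(inputs)

    # (x + 1) - x is exactly 1.0 for small x but 0.0 for huge x, so the
    # average mixes near-100% points with near-0% points.
    xs = [1.0, 2.5, 1e3, 1e16, 1e100, 1e200]
    print(percentage_accuracy(xs, lambda x: (x + 1) - x, lambda x: 1.0))  # ~51.6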

Valid inputs

Since percentage accuracy averages over many points, it depends on which points it averages over and how it samples among them. Herbie samples uniformly from the set of valid inputs.

Herbie considers an input valid if it is a floating-point value in the appropriate precision, and if 1) its true, real-number output exists; 2) the input satisfies the user's precondition; and 3) that true output can actually be computed. Let's dive into each requirement; a short sketch after the list shows how the three checks fit together.

  1. An output can fail to exist for an input if that input does something like divide by zero or take the square root of a negative number. Then that input is invalid.
  2. An input can fail to satisfy the user's precondition. Preconditions can be basically anything, but most often they specify a range of inputs. For example, if the precondition is (< 1 x 2), then the input x = 3 is invalid.
  3. Finally, and most rarely, Herbie can fail to compute the output for a particular input. For example, the computation (/ (exp 1e9) (exp 1e9)), which divides two identical but gargantuan numbers, does have an exact real-number answer (one), but Herbie can't compute that exact answer because the intermediate steps are too large. This input is also invalid.
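
Putting the three requirements together, here is a hypothetical sketch of the validity check in Python. Herbie itself evaluates the true answer with high-precision arithmetic; the precondition and real_fn arguments here are stand-ins for illustration:

    import math

    def is_valid(x, precondition, real_fn):
        if not precondition(x):      # (2) the input must satisfy the precondition
            return False
        try:
            real_fn(x)               # (1) the true output must exist...
        except (ValueError, ZeroDivisionError):
            return False             # e.g. sqrt of a negative, or divide by zero
        except OverflowError:
            return False             # (3) ...and must be computable
        return True

    print(is_valid(3.0, lambda x: 1 < x < 2, math.sqrt))                       # False: fails precondition
    print(is_valid(1e9, lambda x: True, lambda x: math.exp(x) / math.exp(x)))  # False: intermediates overflow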

Herbie's percentage accuracy metric only averages over valid points. This means that when you change your precondition, you change which points are valid, and that can change the percentage accuracy reported by Herbie. This is useful: if you know there's a rare error, but Herbie thinks error is low, you can change your precondition to focus on the points of interest.

Sampling inputs

When randomly sampling inputs, Herbie considers each valid input equally likely. Importantly, this does not mean the inputs are uniformly distributed along the real number line, because floating-point values themselves are not evenly spaced.

For example, there are as many floating-point values between one half and one as there are between 1 and 2, because floating-point values use an exponential encoding. That means that if you're looking at a computation whose input comes from the interval [0.5, 2], Herbie will put as much weight on the first third of that range as on the other two thirds combined.
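
You can check the equal-count claim with the ordinal mapping from the bits-of-error sketch above:

    # Both intervals contain exactly 2**52 doubles.
    print(ordinal(2.0) - ordinal(1.0))   # 4503599627370496
    print(ordinal(1.0) - ordinal(0.5))   # 4503599627370496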

This can produce unintuitive results, especially for intervals that cross 0. For example, in the interval [0, 1], the second half of that interval (from one half to one) has a tiny proportion of the weight (in double-precision floating-point, about 0.1%). If Herbie can improve accuracy a little bit between zero and one half while dramatically reducing accuracy between one half and one, it will count that as an accuracy improvement. You might disagree.
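
The same ordinal arithmetic shows just how lopsided [0, 1] is:

    # Fraction of the doubles in [0, 1] that are greater than one half:
    weight = (ordinal(1.0) - ordinal(0.5)) / ordinal(1.0)
    print(f"{weight:.4%}")   # 0.0978%: about one float in a thousand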

Unfortunately, there's no way for Herbie to intuit exactly what you mean, so understanding this input distribution is essential to understanding Herbie's outputs. For example, if Herbie reports that accuracy improved from 75% to 76%, you need to know: is the improvement happening on inputs between one half and one, or between 1e-100 and 2e-100? To answer that question, look over the reports and graphs that Herbie generates.