By user1006274

2012-02-03 23:26:14 8 Comments

I want to determine (in C++) whether one float is the multiplicative inverse of another float. The problem is that I have to use a third variable to do it. For instance, this code:

float x=5,y=0.2;
if(x==(1/y)) cout<<"They are the multiplicative inverse of eachother"<<endl;
else cout<<"They are NOT the multiplicative inverse of eachother"<<endl;

will output: "they are not..." which is wrong and this code:

float x=5,y=0.2,z;
z=1/y;
if(x==z) cout<<"They are the multiplicative inverse of eachother"<<endl;
else cout<<"They are NOT the multiplicative inverse of eachother"<<endl;

will output: "they are..." which is right.
Why is this happening?


@Alexey Frunze 2012-02-16 15:14:41

The discussions in other replies are great and so I won't repeat any of them, but there's no code. Here's a little bit of code to actually check if a pair of floats gives exactly 1.0 when multiplied.

The code makes a few assumptions/assertions (which are normally met on the x86 platform):
- floats are 32-bit binary (AKA single precision) IEEE-754
- either ints or longs are 32-bit (I decided not to rely on the availability of uint32_t)
- memcpy() copies floats to ints/longs such that 8873283.0f becomes 0x4B076543 (i.e. a certain "endianness" is expected)

One extra assumption is this:
- it receives the actual floats that * would multiply (i.e. multiplication of floats wouldn't use higher precision values that the math hardware/library can use internally)

#include <stdio.h>
#include <string.h>
#include <limits.h>
#include <assert.h>

#define C_ASSERT(expr) extern char CAssertExtern[(expr)?1:-1]

#if UINT_MAX >= 0xFFFFFFFF
typedef unsigned int uint32;
#else
typedef unsigned long uint32;
#endif
typedef unsigned long long uint64;

C_ASSERT(sizeof(uint32) == 4);
C_ASSERT(sizeof(float) == 4);

int ProductIsOne(float f1, float f2)
{
  uint32 m1, m2;
  int e1, e2, s1, s2;
  int e;
  uint64 m;

  // Make sure floats are 32-bit IEEE-754 and
  // reinterpreted as integers as we expect
  {
    static const float testf = 8873283.0f;
    uint32 testi;
    memcpy(&testi, &testf, sizeof(testf));
    assert(testi == 0x4B076543);
  }

  // Split f1 into sign, biased exponent and mantissa
  memcpy(&m1, &f1, sizeof(f1));
  s1 = m1 >= 0x80000000;
  m1 &= 0x7FFFFFFF;
  e1 = m1 >> 23;
  m1 &= 0x7FFFFF;
  if (e1 > 0) m1 |= 0x800000; // normalized: restore the implicit leading 1

  // Same for f2
  memcpy(&m2, &f2, sizeof(f2));
  s2 = m2 >= 0x80000000;
  m2 &= 0x7FFFFFFF;
  e2 = m2 >> 23;
  m2 &= 0x7FFFFF;
  if (e2 > 0) m2 |= 0x800000;

  if (e1 == 0xFF || e2 == 0xFF || s1 != s2) // Inf, NaN, different signs
    return 0;

  // Exact 48-bit product of the two 24-bit mantissas
  m = (uint64)m1 * m2;

  if (!m || (m & (m - 1))) // not a power of 2
    return 0;

  // Sum of the unbiased exponents (denormals use exponent 1), plus log2(m)
  e = e1 + !e1 - 0x7F - 23 + e2 + !e2 - 0x7F - 23;
  while (m > 1) m >>= 1, e++;

  return e == 0;
}

const float testData[][2] =
{
  { .1f, 10.0f },
  { 0.5f, 2.0f },
  { 0.25f, 2.0f },
  { 4.0f, 0.25f },
  { 0.33333333f, 3.0f },
  { 0.00000762939453125f, 131072.0f }, // 2^-17 * 2^17
  { 1.26765060022822940E30f, 7.88860905221011805E-31f }, // 2^100 * 2^-100
  { 5.87747175411143754E-39f, 1.70141183460469232E38f }, // 2^-127 (denormalized) * 2^127
};

int main(void)
{
  int i;
  for (i = 0; i < sizeof(testData) / sizeof(testData[0]); i++)
    printf("%g * %g %c= 1\n",
           testData[i][0], testData[i][1],
           "!="[ProductIsOne(testData[i][0], testData[i][1])]);
  return 0;
}

Output:

0.1 * 10 != 1
0.5 * 2 == 1
0.25 * 2 != 1
4 * 0.25 == 1
0.333333 * 3 != 1
7.62939e-06 * 131072 == 1
1.26765e+30 * 7.88861e-31 == 1
5.87747e-39 * 1.70141e+38 == 1

@Gangnus 2012-02-16 16:28:56

+1. So binary fractions are exact there. Have you tried 2^(-100) * 2^(+100)?

@Alexey Frunze 2012-02-16 17:06:16

@Gangnus: Sure, if it's binary, powers of 2 are exact. See the updated code on ideone. We don't even need all significant digits of 2^100 or 2^-100 in decimal.

@Gangnus 2012-02-16 20:11:04

I meant that above some power it becomes a problem to fit the power of 2 into the second part of the float.

@Alexey Frunze 2012-02-16 20:35:16

@Gangnus: Beyond the maximum exponent there's only infinity (the code returns 0 on Inf's and NaN's). Below the minimum exponent there are denormalized values (the code handles them too). See another update on ideone demonstrating a denormalized case.

@Gangnus 2012-02-16 20:40:59

Yes, I see. Thank you. I was wondering how the operations would behave near that border, toward NaN or toward 0.

@Gangnus 2012-02-03 23:33:39

The Float Precision Problem

    You have two problems here, but both come from the same root

You can't compare floats precisely. You can't subtract or divide them precisely. You can't compute anything with them precisely. Any operation on them could (and almost always does) bring some error into the result. Even a=0.2f is not a precise operation. The deeper reasons for that are very well explained by the authors of the other answers here. (My thanks and votes to them for that.)

Here is your first and simpler error: you should never, never, never, never, NEVER use == or its equivalent on them, in any language.

Instead of a==b, use Abs(a-b)<HighestPossibleError.

    But this is not the sole problem in your task.

Abs(1/y-x)<HighestPossibleError won't work, either. At least, it won't work often enough. Why?

Let's take the pair x=1000 and y=0.001, and let the "starting" relative error of y be 10^-6.

(Relative error = error/value.)

Relative errors add up under multiplication and division.

1/y is about 1000. Its relative error is the same 10^-6 (the "1" has no error).

That makes the absolute error 1000*10^-6 = 0.001. When you subtract x later, that error is all that remains. (Absolute errors add up under addition and subtraction, and the error of x is negligibly small.) Surely you are not counting on errors that large: HighestPossibleError would be set lower, and your program would reject a good pair x, y.

So, the next two rules for float operations: try not to divide a greater value by a lesser one, and God save you from subtracting close values after that.

There are two simple ways to escape this problem.

  • Find which of x, y has the greater absolute value, divide 1 by the greater one, and only then subtract the lesser one.

  • If you want to compare 1/y against x, then while you are still working with letters, not values, and your operations introduce no errors, multiply both sides of the comparison by y, and you have 1 against x*y. (Usually you should check signs in that operation, but here we use absolute values, so it is clean.) The resulting comparison has no division at all.

In a shorter way:

1/y V x   <=>   y*(1/y) V x*y   <=>   1 V x*y 

We already know that a comparison such as 1 against x*y should be done like this:

const float HighestPossibleError=1e-10;
if(Abs(x*y-1)<HighestPossibleError){...}

That is all.

P.S. If you really need it all on one line, use:

if(Abs(x*y-1)<1e-10){...}

But it is bad style. I wouldn't advise it.

P.P.S. In your second example the compiler optimizes the code so that it sets z to 5 before running any code. So checking 5 against 5 works even for floats.

@Yves Daoust 2012-02-14 11:49:07

What is striking is that whatever the rounding rule is, you expect the outcome of the two versions to be the same (either twice wrong or twice right)!

Most probably, in the first case a promotion to higher accuracy in the FPU registers takes place when evaluating x==1/y, whereas z= 1/y really stores the single-precision result.

Other contributors have explained why 5==1/0.2 can fail; I needn't repeat that.

@hammar 2012-02-03 23:52:13

The problem is that 0.2 cannot be represented exactly in binary, because its binary expansion has an infinite number of digits:

 1/5: 0.0011001100110011001100110011001100110011...

This is similar to how 1/3 cannot be represented exactly in decimal. Since x is stored in a float which has a finite number of bits, these digits will get cut off at some point, for example:

   x: 0.0011001100110011001100110011001

The problem arises because CPUs often use a higher precision internally, so when you've just calculated 1/y, the result will have more digits, and when you load x to compare them, x will get extended to match the internal precision of the CPU.

 1/y: 0.0011001100110011001100110011001100110011001100110011
   x: 0.0011001100110011001100110011001000000000000000000000

So when you do a direct bit-by-bit comparison, they are different.

In your second example, however, storing the result into a variable means it gets truncated before doing the comparison, so comparing them at this precision, they're equal:

   x: 0.0011001100110011001100110011001
   z: 0.0011001100110011001100110011001

Many compilers have switches that force intermediate values to be truncated at every step for consistency. However, the usual advice is to avoid direct comparisons between floating-point values and instead check whether they differ by less than some epsilon value, which is what Gangnus is suggesting.

@David Schwartz 2012-02-03 23:42:14

You will have to precisely define what it means for two approximations to be multiplicative inverses. Otherwise, you won't know what it is you're supposed to be testing.

0.2 has no exact binary representation. If you store numbers that have no exact representation with limited precision, you won't get answers that are exactly correct.

The same thing happens in decimal. For example, 1/3 has no exact decimal representation. You can store it as .333333. But then you have a problem. Are 3 and .333333 multiplicative inverses? If you multiply them, you get .999999. If you want the answer to be "yes", you'll have to create a test for multiplicative inverses that isn't as simple as multiplying and testing for equality to 1.

The same thing happens with binary.
