By plasmacel


2013-07-22 14:18:08 8 Comments

The code below performs a fast inverse square root operation by some bit hacks. The algorithm was probably developed by Silicon Graphics in early 1990's and it's appeared in Quake 3 too. more info

However I get the following warning from GCC C++ compiler: dereferencing type-punned pointer will break strict-aliasing rules

Should I use static_cast, reinterpret_cast or dynamic_cast instead in such situations?

float InverseSquareRoot(float x)
{
    float xhalf = 0.5f*x;
    int32_t i = *(int32_t*)&x;
    i = 0x5f3759df - (i>>1);
    x = *(float*)&i;
    x = x*(1.5f - xhalf*x*x);
    return x;
}

7 comments

@Howard Hinnant 2013-07-23 00:04:45

Update

I no longer believe this answer is correct, due to feedback I've gotten from the committee. But I want to leave it up for informational purposes. And I am purposefully hopeful that this answer can be made correct by the committee (if it chooses to do so). I.e. there's nothing about the underlying hardware that makes this answer incorrect, it is just the judgement of a committee that makes it so, or not so.


I'm adding an answer not to refute the accepted answer, but to augment it. I believe the accepted answer is both correct and efficient (and I've just upvoted it). However I wanted to demonstrate another technique that is just as correct and efficient:

float InverseSquareRoot(float x)
{
    union
    {
        float as_float;
        int32_t as_int;
    };
    float xhalf = 0.5f*x;
    as_float = x;
    as_int = 0x5f3759df - (as_int>>1);
    as_float = as_float*(1.5f - xhalf*as_float*as_float);
    return as_float;
}

Using clang++ with optimization at -O3, I compiled plasmacel's code, R. Martinho Fernandes code, and this code, and compared the assembly line by line. All three were identical. This is due to the compiler's choice to compile it like this. It had been equally valid for the compiler to produce different, broken code.

@jogojapan 2013-07-23 00:47:24

First of all, the proposed solution is the same as in doron's answer, although you spell it out in code. But secondly, does this mean that the way we interpret the rules for unions these days means they can be used for type punning after all? There used to be a rule that you cannot set one member and then read another. See here as well: stackoverflow.com/questions/11373203/…

@Howard Hinnant 2013-07-23 03:11:53

@jogojapan: Good comment. I thought it important to show code. English can be so vague and ambiguous. On your second point, I'm glad you brought it up. I think the C++ committee needs to address this issue and recommend safe techniques for type punning. There have been discussions in the Library Working Group specifically requesting direction from the Core or Evolution Working Group on how this should best be accomplished. These questions, to the best of my knowledge, have so far gone unanswered. However, my belief at this time is that compiler writers will not violate the C99 contract.

@jogojapan 2013-07-23 03:16:49

Alright then... +1 for the time being :) (I'd suggest incorporating the comment in the answer, though.)

@bames53 2013-07-23 16:04:33

As far as I'm aware memcpy is the recommended approach for type punning in C++. C++ compilers may maintain the C99 union guarantee, however it's still undefined behavior in C++.

@Johannes Schaub - litb 2013-07-23 19:32:07

@jogojapan I have summarized how I understand the current rules here: stackoverflow.com/a/17819398/34509

@Johannes Schaub - litb 2013-07-23 20:05:35

For up to C++11, you can use my above union trick to be free of aliasing issues: union A { float x; int32_t y; }; int32_t value = A{3.14f}.y; (I don't really think that this is anymore safer than doing it without the temporary :D). The reason this "works" is that the initializer is a prvalue and hence is free of the aliasing rule restrictions. This however will change in C++14 because the initializer will be an xvalue :)

@bames53 2013-07-23 21:29:22

@JohannesSchaub-litb I don't believe the strict aliasing rules are the only restriction on type punning with unions in C++. For example I believe your example runs afoul of 8.5 [dcl.init]/16 where it says "the initial value of the object being initialized is the (possibly converted) value of the initializer expression." since the object designated by A{3.14f}.y does not have any value. C++ omits specification of behavior in this situation and so the behavior is undefined.

@Johannes Schaub - litb 2013-07-23 21:52:15

@bames53 why is the object designated not the float object previously initialized?

@Johannes Schaub - litb 2013-07-23 22:02:49

@bames53 I agree that the rules need clarification though. I suspect the way to read this as being UB is that the access using another member does not alias the lvalue with the object currently being alive, but with a distinct nonalive object associated with the other member. That needs the concept of a "nonclass type object of type T that is not alive but existing". From previous runs through the spec, I couldn't make sense of that. Perhaps i failed. Or the solution is that it simply refers to no object at all.

@bames53 2013-07-24 00:59:10

@JohannesSchaub-litb As I read it, the int32_t member is designated because it's the one named, that object is distinct from the float member object, and as far as I can tell none of the cases where using one object's name ends up referring to a different object (e.g., 3.8/7) apply so the name y doesn't refer to the x object currently using y's storage. The spec definitely leaves something to be desired in defining the lifetime of union members.

@bames53 2013-07-24 01:04:13

@JohannesSchaub-litb And by 'C99 guarantee' I mean the guarantee that reading a non-active member will reinterpret the stored representation of the active member. In C11 they added an explicit note to that effect, but I haven't looked at the C spec enough to know if the normative language of the C99 spec truly does imply what the C11 note claims.

@Ben Voigt 2013-07-24 01:07:37

Howard, I notice that 9.5 says about unions "Each non-static data member is allocated as if it were the sole member of a struct." but "A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. ]" Should the second rule read standard-layout class and not standard-layout struct?

@Ben Voigt 2013-07-24 01:07:57

The second quote is 9.2/19

@jogojapan 2013-07-24 01:45:36

@JohannesSchaub-litb Just to clarify: When I said "rule" in my comment above, I didn't mean an explicit statement in the Standard. I meant a general rule that C++ programmers are expected to comply with, on the basis of an interpretation of the explicit rules of the Standard. My assumption that there is a rule of "don't set one member and then read another" for unions was based (among other things) on the interpretation given by ecatmur in the question you link to.

@davmac 2015-09-22 13:45:04

@barnes53 "In C11 they added an explicit note to that effect" - it's also in C99 as of Technical Corregendum 3 (published in 2007).

@Antonio 2016-10-24 09:06:45

Why this answer is wrong is explained very clearly here.

@plasmacel 2018-07-12 09:13:44

Based on the answers here I made a modern "pseudo-cast" function for ease of application.

C99 version (while most compilers support it, theoretically could be undefined behavior in some)

template <typename T, typename U>
inline T pseudo_cast(const U &x)
{
    static_assert(std::is_trivially_copyable<T>::value && std::is_trivially_copyable<U>::value, "pseudo_cast can't handle types which are not trivially copyable");

    union { U from; T to; } __x = {x};
    return __x.to;
}

Universal versions (based on the accepted answer)

Cast types with the same size:

#include <cstring>

template <typename T, typename U>
inline T pseudo_cast(const U &x)
{
    static_assert(std::is_trivially_copyable<T>::value && std::is_trivially_copyable<U>::value, "pseudo_cast can't handle types which are not trivially copyable");
    static_assert(sizeof(T) == sizeof(U), "pseudo_cast can't handle types with different size");

    T to;
    std::memcpy(&to, &x, sizeof(T));
    return to;
}

Cast types with any sizes:

#include <cstring>

template <typename T, typename U>
inline T pseudo_cast(const U &x)
{
    static_assert(std::is_trivially_copyable<T>::value && std::is_trivially_copyable<U>::value, "pseudo_cast can't handle types which are not trivially copyable");

    T to = T(0);
    std::memcpy(&to, &x, (sizeof(T) < sizeof(U)) ? sizeof(T) : sizeof(U));
    return to;
}

Use it like:

float f = 3.14f;
uint32_t u = pseudo_cast<uint32_t>(f);

@Jan Hudec 2013-07-22 14:26:39

The cast invokes undefined behaviour. No matter what form of cast you use, it will still be undefined behaviour. It is undefined no matter what type of cast you use.

Most compilers will do what you expect, but gcc likes being mean and is probably going to assume you didn't assign the pointers despite all indication you did and reorder the operation so they give some strange result.

Casting a pointer to incompatible type and dereferencing it is an undefined behaviour. The only exception is casting it to or from char, so the only workaround is using std::memcpy (as per R. Martinho Fernandes' answer). (I am not sure how much it is defined using unions; It does stand a better chance of working though).

That said, you should not use C-style cast in C++. In this case, static_cast would not compile, nor would dynamic_cast, forcing you to use reinterpret_cast and reinterpret_cast is a strong suggestion you might be violating strict aliasing rules.

@James Kanze 2013-07-22 14:30:02

There are different reasons for undefined behavior. In this case, of course, there's no way the standards committee could define it, particularly in the face of possible trapping representations for float (usual) or int (rare, but can exist). On the other hand, the intend is clearly that it should have the behavior which someone familiar with the architecture would expect. A compiler which breaks this, when the cast is immediately visible, is simply broken.

@James Kanze 2013-07-22 14:31:30

@R.MartinhoFernandes He's casting pointers, not float nor int, and static_cast won't convert between arbitrary pointers.

@Jan Hudec 2013-07-22 14:32:58

@JamesKanze: The reason for the undefined behaviour is that the specification explicitely undefines it.

@Jan Hudec 2013-07-22 14:35:55

@JamesKanze: Compiler that breaks this when the cast is immediately visible is conforming to the specification. Gcc does just that (in fact I think it won't break it as written, because the int pointer is cast back, but it would almost certainly read the original value back from the original variable, because it deleted the dependency and would consider the value left in floating pointer register valid.)

@James Kanze 2013-07-22 14:36:27

@JanHudec That's circular reasoning. The standards committee didn't decide to make it undefined because the standard says it is undefined; they had specific motives. And other language in the same section makes it clear that the intent was for the behavior to be "unsurprising" for someone familiar with the architecture of the machine.

@James Kanze 2013-07-22 14:38:17

@JanHudec A compiler which ignores aliases due to a reinterpret_cast in the immediate block is broken. It may be conform to the letter of the standard, but it clearly violates the intent.

@Zan Lynx 2013-07-22 14:52:08

@JamesKanze: Better to break right away in ALL situations rather than work sometimes and not others. If you move the cast out of the function and it silently breaks would you be happy?

@Casey 2013-07-22 14:54:41

...all of which makes this a case where the compiler should "do the right thing" with the code when it can, and throw up red flags all over the place to warn you that you are teetering on the edge. Remember that the standard's requirement that the compiler diagnose nonconforming code doesn't preclude the compiler from producing a working program.

@James Kanze 2013-07-22 16:41:01

@Casey Good point. A warning when you pass a cast pointer to a function would definitely be in order.

@James Kanze 2013-07-22 16:44:49

@ZanLynx In other words, you're saying that the compiler should break aliasing with union as well. The situation is basically the same: historically, both casting and unions were used for type punning. For whatever reasons, the C committee decided to bless casting (although only via intent, because you cannot define the behavior), but a good compiler will support both, in the same situations. (And of course, the technique with memcpy is just as much undefined behavior as the others, for the same reasons.)

@Jan Hudec 2013-07-23 06:02:37

@JamesKanze: No, the memcpy technique is defined. The exact values are implementation defined, but they must be the value represented by the same sequence of bytes in memory. On the other hand type punning is undefined and the compiler may do anything it pleases. For union I don't remember the specification exactly, but the rules are basically similar.

@James Kanze 2013-07-23 07:58:09

@JanHudec The behavior using memcpy is only defined if you copy back into the original type. The same concerns as for reinterpret_cast are involved: the standard cannot define the results of copying the bit pattern of an int into a float (even if the two types have the same size), since it does not want to constrain the implementation. On the machines I know, for example, copying an int into a float, then accessing the float, may cause the process to abort (if the bits in the int correspond to a trapping NaN, for example).

@Jan Hudec 2013-07-23 08:03:56

@JamesKanze: No. You are mixing two very different undefineds. The behaviour of reinterpret cast is "Undefined". That means the compiler is free to do anything ranging from what you expect to formatting your harddrive and making daemons flying from your nose. But the memcpy behaviour is "Implementation Defined". The specification says the implementation must copy the actual bytes and only does not say what the value represented by the same bits shall mean. Yes, it may cause fault if the value is a signalling NaN, but it may not involve any daemons. And must be deterministic.

@James Kanze 2013-07-23 08:30:25

@JanHudec Behavior of the reinterpret_cast is implementation defined. (In C++11, it's fully defined if the types are standard layout types, as they are here.) The issue is accessing something as a float, when it hasn't been initialized as a float.

@Jan Hudec 2013-07-23 09:19:40

@JamesKanze: Accessing something as float when it hasn't been initialized as float is also implementation defined. The thing that is undefined is accessing the same memory location as two incompatible types neither of which is char (so called aliasing rules). The char exception allows you to initialize something not as float (via memcpy or other standard library functions taking buffer) and access it as, say, float and still be (implementation) defined.

@James Kanze 2013-07-23 10:34:46

@JanHudec Accessing something as a float which wasn't initialized as a float is undefined behavior. (Of course, an implementation may choose to define it, but it isn't required to.)

@R. Martinho Fernandes 2013-07-22 14:23:02

Forget casts. Use memcpy.

float xhalf = 0.5f*x;
uint32_t i;
assert(sizeof(x) == sizeof(i));
std::memcpy(&i, &x, sizeof(i));
i = 0x5f375a86 - (i>>1);
std::memcpy(&x, &i, sizeof(i));
x = x*(1.5f - xhalf*x*x);
return x;

The original code tries to initialize the int32_t by first accessing the float object through an int32_t pointer, which is where the rules are broken. The C-style cast is equivalent to a reinterpret_cast, so changing it to reinterpret_cast would not make much difference.

The important difference when using memcpy is that the bytes are copied from the float into the int32_t, but the float object is never accessed through an int32_t lvalue, because memcpy takes pointers to void and its insides are "magical" and don't break the aliasing rules.

@plasmacel 2013-07-22 14:25:30

Yeah, is it faster too, or just because of the clarity?

@Jan Hudec 2013-07-22 14:25:57

@plasmacel: No. It's obviously slower. But it is actually defined.

@Konrad Rudolph 2013-07-22 14:27:55

@Jan Why is it “obviously” slower? I don’t find that obvious at all.

@R. Martinho Fernandes 2013-07-22 14:31:34

@plasmacel Mostly because it works. I wouldn't expect any discernible difference in performance from a reasonable compiler. (Yes, I am sure MSVC will prove me wrong...)

@Casey 2013-07-22 14:31:47

It's only slower if the compiler does a horrible job of optimizing it. Here's proof that gcc 4.8 compiles the cast and memcpy version to the same assembly. And unknown version of clang.

@James Kanze 2013-07-22 14:34:18

@Casey For some definition of "horrible". It requires some special handling in the compiler to optimize this, and a C++ compiler is probably in the right to consider that calls to memcpy are so rare that they don't matter, and to not bother. (Of course, most C++ compilers share some parts with C, where it does matter.)

@plasmacel 2013-07-22 14:34:31

The only question that the original code was right in C? Or why is it implemented (by Silicon Graphics, Quake 3, etc...) in such way?

@Casey 2013-07-22 14:36:35

@JamesKanze Yes, "horrible" is hyperbole. Nevertheless, I'd expect a current mainstream compiler to optimize the memcpy to a register copy.

@R. Martinho Fernandes 2013-07-22 14:37:39

@Casey just for completeness, the clang running on Coliru is a 3.4 trunk build (coliru.stacked-crooked.com/…)

@Casey 2013-07-22 14:38:27

@plasmacel The original code has been illegal in C for the same reason since C99.

@Mike Seymour 2013-07-22 14:42:37

@plasmacel: If memory serves, that code would have been valid C (and probably C++ too) when it was written in the early 1990s. I don't think the aliasing rules existed before C++98 and C99.

@Casey 2013-07-22 14:49:08

I fear that my "proof that it compiles to the same assembly" comment above will be construed by someone to indicate that it's reasonable to write code that breaks the aliasing rules. Realize that this is relying on undefined behavior, and just as in any other case of undefined behavior it may produce different results with different compilers, or different compiler versions, or on Tuesday instead of Monday, .... etc. The most dangerous possible outcome of undefined behavior is to do what you expect during testing.

@Jeffrey Yasskin 2013-07-25 15:10:19

@MikeSeymour, C89 said, "An object shall have its stored value accessed only by an lvalue that has one of the following types: the declared type of the object, a qualified version of the declared type of the object, [signed or unsigned variants], an aggregate or union type that includes one of the aforementioned types among its members [], or a character type" in 6.3 Expressions. That's basically the same as the C99 wording and bans the original code.

@Jeffrey Yasskin 2013-07-25 15:30:47

You should use uint32_t for this, although I think a signed overflow is impossible with this code.

@rustyx 2018-07-02 08:19:43

How is memcpy "magical"? Where is it specified? It must be implemented somehow, right?

@R. Martinho Fernandes 2018-07-02 13:34:52

@rustyx it's magical in the sense that it is a primitive that doesn't behave in the same way a user implementation would. By treating it as a primitive, the compiler can do all sorts of magical things with it as e.g. in Casey's code above. It doesn't really have to be implemented; in fact in many compilers it is a compiler intrinsic; attempts at implementing it yourself are likely to end up turned into the intrinsic itself, e.g. gcc.gnu.org/bugzilla/show_bug.cgi?id=56888

@supercat 2018-07-13 18:32:36

@R.MartinhoFernandes: Unfortunately, if the destination of memcpy doesn't have a declared type, compilers are allowed to "magically" fail to update objects whose effective type doesn't match the source, and which will next be used as that other type.

@Stephen Canon 2013-08-11 01:46:53

There are a few good answers here that address the type-punning issue.

I want to address the "fast inverse square-root" part. Don't use this "trick" on modern processors. Every mainstream vector ISA has a dedicated hardware instruction to give you a fast inverse square-root. Every one of them is both faster and more accurate than this oft-copied little hack.

These instructions are all available via intrinsics, so they are relatively easy to use. In SSE, you want to use rsqrtss (intrinsic: _mm_rsqrt_ss( )); in NEON you want to use vrsqrte (intrinsic: vrsqrte_f32( )); and in AltiVec you want to use frsqrte. Most GPU ISAs have similar instructions. These estimates can be refined using the same Newton iteration, and NEON even has the vrsqrts instruction to do part of the refinement in a single instruction without needing to load constants.

@doron 2013-07-22 14:39:53

Take a look at this for more information on type punning and strict aliasing.

The only safe cast of a type into an array is into a char array. If you want one data address to be switchable to different types you will need to use a union

@jogojapan 2013-07-22 14:40:55

Using a union for type punning doesn't conform with the Standard either (although it typically works).

@doron 2013-07-22 14:50:10

@jogojapan - strictly speaking type-punning unions do violate the strict aliasing rule and are not defined but they are a common enough idiom for GCC to support them (and no generated the warning)

@jogojapan 2013-07-22 14:53:08

The strict aliasing rule doesn't mention them (at least not explicitly). But the rules for unions state that you must only read from the member of the union that was last set. For type punning you have to write one member and then read another.

@Casey 2013-07-22 15:03:43

@jogojapan 3.10/10 does explicitly mention unions in bullet 6. Accessing the value stored in an object through a glvalue whose type is a union containing the dynamic type of the object as a member is defined behavior. You may violate the "last member set" rule, but it does not violate aliasing. </nitpick>

@jogojapan 2013-07-23 00:34:32

@Casey So it mentions unions after all. But that still supports what I said.

@jogojapan 2013-07-23 03:15:43

@Casey and All, my claim about the rules for unions might have to be revised in light of Howard Hinnant's answer and comment. +1 for bringing up unions first.

@Jan Hudec 2013-07-23 08:21:18

The linked article says that using union not only is also undefined behaviour, but that it actually is getting optimized to nonsense by at least one compiler.

@Johannes Schaub - litb 2013-07-23 19:09:30

@jogojapan there is no such rule that forbids reading another member, for unions.

@Howard Hinnant 2013-07-23 19:59:13

I've opened a query about this on the Core Working Group mailing list.

@Johannes Schaub - litb 2013-07-23 20:59:53

@Casey that rule is intended for this case Object *o = ...; int *p = &o->i; f(o, p);, such that when the function f changes param1->i = 10;, it cannot cache *param2 and must rearead from memory. I'm pretty sure to apply it to union or struct members like you did (do you wanna read a struct A { float y; int x; } by an int lvalue?) is a misapplication.

@Casey 2013-07-23 21:21:19

@JohannesSchaub-litb No, I want to read a float or int through a union {float f; int i;} lvalue, which 3.10/10 ("the aliasing rule") does not describe as undefined behavior. Hence my statement that such an access does not violate the aliasing rule.

@Casey 2013-07-23 22:00:08

@HowardHinnant It would be great to clear up the language in the standard about how unions/aliasing/object lifetimes interact. Failing that, an explicit endorsement in the standard that "implementations must support type punning using this technique: ..." would make it possible to answer questions like this one without having reams and reams of discussion.

@Johannes Schaub - litb 2013-07-24 07:26:15

@casey i don't think that direction is ok either, based on my read of the C rationale document, because it gives a completely different rationale for that (and note that you not only have the union lvalue, but still the other member lvalue involved. So you still have the float vs int conflict).

@Antonio 2016-10-24 09:04:15

In C++ unions for type punning lead to undefined behaviour. Considering that memcpy provides a safe and efficient alternative, I see only danger in promoting the idea that unions are a good practical way for type punning.

@James Kanze 2013-07-22 14:27:15

The only cast that will work here is reinterpret_cast. (And even then, at least one compiler will go out of its way to ensure that it won't work.)

But what are you actually trying to do? There's certainly a better solution, that doesn't involve type punning. There are very, very few cases where type punning is appropriate, and they all are in very, very low level code, things like serialization, or implementing the C standard library (e.g. functions like modf). Otherwise (and maybe even in serialization), functions like ldexp and modf will probably work better, and certainly be more readable.

Related Questions

Sponsored Content

31 Answered Questions

46 Answered Questions

[SOLVED] What's the best way to trim std::string?

  • 2008-10-19 19:23:07
  • Milan Babuškov
  • 716007 View
  • 832 Score
  • 46 Answer
  • Tags:   c++ trim stdstring

24 Answered Questions

[SOLVED] What does the C++ standard state the size of int, long type to be?

  • 2009-02-26 07:59:23
  • Jérôme
  • 1238465 View
  • 706 Score
  • 24 Answer
  • Tags:   c++ c++-faq

28 Answered Questions

[SOLVED] Easiest way to convert int to string in C++

4 Answered Questions

[SOLVED] Interpreting lower part of an uint64_t as float

  • 2020-05-15 19:35:55
  • millow
  • 106 View
  • 0 Score
  • 4 Answer
  • Tags:   c++ c++11

2 Answered Questions

[SOLVED] Is this type punning well-defined?

5 Answered Questions

[SOLVED] Unions and type-punning

2 Answered Questions

4 Answered Questions

[SOLVED] Does this pointer casting break strict aliasing rule?

Sponsored Content