By Luchian Grigore

2012-07-07 07:37:00 8 Comments

I was under the impression that accessing a union member other than the last one set is UB, but I can't seem to find a solid reference (other than answers claiming it's UB but without any support from the standard).

So, is it undefined behavior?


@Bo Persson 2012-07-07 07:48:43

The C++11 standard says it this way

9.5 Unions

In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.

If only one value is stored, how can you read another? It just isn't there.

The gcc documentation lists this under Implementation defined behavior

  • A member of a union object is accessed using a member of a different type (C90

The relevant bytes of the representation of the object are treated as an object of the type used for the access. See Type-punning. This may be a trap representation.

indicating that this is not required by the C standard.

2016-01-05: Through the comments I was linked to C99 Defect Report #283 which adds a similar text as a footnote to the C standard document:

78a) If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

Not sure if it clarifies much though, considering that a footnote is not normative for the standard.
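To make the "at most one active member" rule concrete, here is a minimal sketch (not from the answer above) that records which member was stored last and refuses to read any other. The type `IntOrFloat` and the helper names are hypothetical, chosen only for illustration.

```cpp
#include <cassert>

// Hypothetical illustration type: an anonymous union plus a tag that
// remembers which member was written last (the "active" member).
struct IntOrFloat {
    enum class Active { Int, Float };
    Active active;
    union {
        int i;
        float f;
    };
};

// Storing a value makes that member the active one.
void store_int(IntOrFloat& v, int x) {
    v.i = x;
    v.active = IntOrFloat::Active::Int;
}

void store_float(IntOrFloat& v, float x) {
    v.f = x;
    v.active = IntOrFloat::Active::Float;
}

// Reads the int member only when it is the active member; otherwise
// reports failure instead of reading a member that "isn't there".
bool load_int(const IntOrFloat& v, int& out) {
    if (v.active != IntOrFloat::Active::Int) {
        return false;
    }
    out = v.i;
    return true;
}
```

With this pattern the question of what an inactive read means never arises, because the code only ever reads the member it last wrote.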

@Luchian Grigore 2012-07-07 07:49:58

I stumbled upon this, but it doesn't really say accessing another value is UB. Also, some value is there since the memory is common to all members.

@ybungalobill 2012-07-07 07:52:19

@LuchianGrigore: UB isn't what standard says is UB, instead it's what the standard doesn't describe how it should work. This is exactly such case. Does the standard describe what happens? Does it say that it's implementation defined? No and no. So it's UB. Moreover, regarding the "members share the same memory address" argument, you'll have to refer to the aliasing rules, which will bring you to UB again.

@Luchian Grigore 2012-07-07 07:53:26

@ybungalobill there are actually loads of places where the standard says the behavior is undefined. Also, it's not clear what "active" means.

@Benjamin Lindley 2012-07-07 07:55:16

@Luchian: It's quite clear what active means, "that is, the value of at most one of the non-static data members can be stored in a union at any time."

@ybungalobill 2012-07-07 07:55:26

@LuchianGrigore: Yes there are. There is infinite amount of cases that the standard does not (and cannot) address. (C++ is a Turing complete VM so it's incomplete.) So what? It does explain what "active" mean, refer to the above quote, after "that is".

@Bo Persson 2012-07-07 07:56:07

It is undefined behaviour to try to read an object that isn't there (like using a dangling pointer). The union only contains one value, the one last written to.

@jxh 2012-07-07 07:59:19

@LuchianGrigore: Omission of explicit definition of behavior is also considered undefined behavior, according to the definitions section.

@Claudiu 2012-07-07 15:52:49

@Bo: Is it in the standard that all members of a union are stored at the same memory address? If so, then it would be standard-defined behavior that when you store the value into one of the union fields, the other fields values change as well. They aren't "not there" - you've just written to a memory location where you know something resides. And it would be entirely predictable behavior if you know the endianness and size of all the members of the union, as well as any padding rules. Why is it not clear that this is well-defined behavior?

@Claudiu 2012-07-07 15:53:54

@BoPersson: e.g. char hello[4]; int *p1 = (int *)hello; *p1 = 10; is it now undefined behavior to access hello?

@Mysticial 2012-07-07 16:27:38

@Claudiu That's UB for a different reason - it violates strict aliasing.

@Bo Persson 2012-07-07 17:18:25

@Claudiu - The standard does say that the last written value is there, and nothing else. Many compilers will allow you to try to read the value as another type, but that would be implementation specific.

@Eitan T 2012-08-14 09:27:11

@Bo Persson, from what I understand from the first quote, it simply states that you cannot store two values at once, which is understandable since all members share the same memory. Why is it undefined behaviour? The union is doing exactly what it is supposed to do. It is the "casting" by accessing different members (of different types) that may trigger undefined behavior, no?

@Bo Persson 2012-08-14 09:39:41

@EitanT - Yes, the undefined behavior is trying to read a member that isn't there (and that's not unique to unions :-). A complication is that gcc promises to do its best if you try this type punning, and other compilers want to be gcc compatible, so they allow it too. So it often works in practice, except when it doesn't and you run into things like "This may be a trap representation".

@curiousguy 2015-08-18 16:01:45

@ybungalobill "Does the standard describe what happens?" Yes: you are simply using a lvalue

@curiousguy 2015-08-18 16:02:27

@Mysticial "it violates strict aliasing" how?

@Mysticial 2015-08-18 16:39:10

@curiousguy You can't dereference through an incompatible type pointer. In this case dereferencing as int* is not compatible with char*. There is an exception that allows char* to alias with anything, but not the other way around.

@curiousguy 2015-08-18 16:49:20

@Mysticial Pointers don't alias, only lvalues do.

@underscore_d 2015-12-31 13:59:28

Is there a way to stop GCC from letting us do this, or at least provide stern warnings? The lack thereof made me think all type punning was OK up until now. As mentioned on Jerry's excellent answer, I'm just hoping that the proviso for "structs that share a common initial sequence" is going to save my (projects') ass(es) in this situation.

@underscore_d 2016-01-05 10:50:52

What you quoted for GCC is actually taken from the C standard, not implementation-defined. See:… This does not hold for the C++ standard, however, and I'm not sure whether g++ makes the same guarantee for C++.

@Bo Persson 2016-01-05 11:20:33

@underscore - Ok, I haven't read the C standard that closely, so I might have missed some footnotes. :-) I was directed to the GCC documentation by the compiler, when it didn't like some of my code and suggested union type punning instead. To be compatible, the other major compilers will allow this too, so it is kind of a de facto standard on Windows and Linux (where a trap representation isn't present either). Complicated, this...

@underscore_d 2016-01-05 11:35:55

Could you edit to reflect that GCC is following the C standard there? What is implementation defined is that g++ inherits the same rule: . So, yes, it certainly is complicated! It's lucky implementations have de facto standards, especially if you're me and coding C++ where - if my understanding of the Standard is correct - I'm depending on g++ applying the C rules. I hate having to rely on implementation-defined behaviour, but at least it's not undefined and won't delete my code...

@supercat 2016-08-08 16:40:06

@underscore_d: De-facto standards are great when they are respected. The authors of the Standard focused on things which they felt should be required even on platforms where they would be somewhat impractical (e.g. requiring that unsigned types behave as a ring with a power-of-two modulus even on platforms whose arithmetic instructions would wrap mod some other value) but saw no need to mandate behaviors which would be commonplace, practical, and useful on 99% of machines. If the lack of a mandate didn't stop compilers from supporting such behaviors before 1989, there was no reason to...

@supercat 2016-08-08 16:40:28

...think it would discourage them from continuing to support such behaviors in the 21st century.

@elyashiv 2012-08-17 07:00:47

I will explain this with an example.
Assume we have the following union:

union A {
   int x;
   short y[2];
};

I will assume that sizeof(int) gives 4, and that sizeof(short) gives 2.
When you write union A a = {10}, that creates a new variable of type A and puts the value 10 in it.

Your memory should look like this (remember that all of the union members share the same location):

       |                   x                   |
       |        y[0]       |       y[1]        |
   a-> |0000 0000|0000 0000|0000 0000|0000 1010|

As you can see, the value of a.x is 10, the value of a.y[1] is 10, and the value of a.y[0] is 0.

Now, what will happen if I do this?

a.y[0] = 37;

Our memory will look like this:

       |                   x                   |
       |        y[0]       |       y[1]        |
   a-> |0000 0000|0010 0101|0000 0000|0000 1010|

This will turn the value of a.x into 2424842 (in decimal).

Now, if your union has a float or double, your memory map will be more of a mess, because of the way floating-point numbers are stored.
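The arithmetic behind 2424842 can be checked without touching a union at all. The sketch below assumes the layout drawn in the diagram, with y[0] in the high-order half of x; on a little-endian machine the halves would be the other way round. The helper name `combine` is made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Combine two 16-bit halves into one 32-bit value, high half first.
// This mirrors the diagram, where y[0] is drawn over the high-order
// bits of x and y[1] over the low-order bits (an assumption about
// layout, not something the standard guarantees).
constexpr std::uint32_t combine(std::uint16_t high, std::uint16_t low) {
    return (static_cast<std::uint32_t>(high) << 16) | low;
}

// Before the write: y[0] == 0, y[1] == 10.
static_assert(combine(0, 10) == 10u, "x starts out as 10");

// After a.y[0] = 37: 37 * 65536 + 10.
static_assert(combine(37, 10) == 2424842u, "x becomes 2424842");
```

This only reproduces the bit arithmetic; whether reading a.x after writing a.y[0] is permitted is the separate question this thread is about.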

@Luchian Grigore 2012-08-17 07:08:17

:) This is not what I asked. I know what happens internally. I know it works. I asked whether it's in the standard.

@ecatmur 2012-08-16 23:41:10

The confusion is that C explicitly permits type-punning through a union, whereas C++ has no such permission. From the C standard:

Structure and union members

95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.

The situation with C++:

9.5 Unions [class.union]

In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.

C++ later has language permitting the use of unions containing structs with common initial sequences; this doesn't however permit type-punning.

To determine whether union type-punning is allowed in C++, we have to search further. Recall that C99 is a normative reference for C++11 (and C99 has similar language to C11 permitting union type-punning):

3.9 Types [basic.types]

4 - The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that hold the value of type T. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values. 42
42) The intent is that the memory model of C++ is compatible with that of ISO/IEC 9899 Programming Language C.

It gets particularly interesting when we read

3.8 Object lifetime []

The lifetime of an object of type T begins when: — storage with the proper alignment and size for type T is obtained, and — if the object has non-trivial initialization, its initialization is complete.

So for a primitive type (which ipso facto has trivial initialization) contained in a union, the lifetime of the object encompasses at least the lifetime of the union itself. This allows us to invoke

3.9.2 Compound types [basic.compound]

If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.

Assuming that the operation we are interested in is type-punning i.e. taking the value of a non-active union member, and given per the above that we have a valid reference to the object referred to by that member, that operation is lvalue-to-rvalue conversion:

4.1 Lvalue-to-rvalue conversion [conv.lval]

A glvalue of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.

The question then is whether an object that is a non-active union member is initialized by storage to the active union member. As far as I can tell, this is not the case. Access to a union by a non-active member is defined, and is defined to follow the object and value representation, only if:

  • a union is copied into char array storage and back (3.9:2), or
  • a union is bytewise copied to another union of the same type (3.9:3), or
  • a union is accessed across language boundaries by a program element conforming to ISO/IEC 9899 (so far as that is defined) (3.9:4 note 42).

Access without one of the above interpositions is undefined behaviour. This has implications for the optimisations allowed to be performed on such a program, as the implementation may of course assume that undefined behaviour does not occur.

That is, although we can legitimately form an lvalue to a non-active union member (which is why assigning to a non-active member without construction is ok) it is considered to be uninitialized.
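As an illustration of going through the object representation rather than through an inactive member, here is a minimal sketch of the byte-copy approach often recommended in C++. It is not part of the answer above; the function names are hypothetical, and it assumes a 32-bit IEEE-754 float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumptions of this sketch, checked at compile time.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559,
              "sketch assumes IEEE-754 float");

// Copy the bytes of a float into an integer of the same size.
// Both types are trivially copyable, so std::memcpy transfers the
// object representation without reading through a mismatched lvalue.
std::uint32_t float_to_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// The reverse direction: rebuild a float from its stored bytes.
float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Compilers typically reduce such a fixed-size memcpy to a plain register move, so this usually costs nothing compared with the union version.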

@mpu 2012-08-17 09:55:02

If I am not mistaken (I only have a draft version of the C99 standard), this explicit paragraph about type punning was not in C99. Though, maybe we can infer it from other information in the standard as you did it for C++. Nevertheless, this addition seems to reveal that it was not clear in previous versions of the standard.

@ecatmur 2012-08-17 10:06:22

@mpu it should be present; look for, footnote 82.

@bames53 2012-10-18 19:35:34

3.8/1 says an object's lifetime ends when its storage is reused. That indicates to me that a non-active member of a union's lifetime has ended because its storage has been reused for the active member. That would mean you're limited in how you use the member (3.8/6).

@ecatmur 2012-10-19 08:05:11

@bames53 good point, but if it has trivial initialization then its lifetime starts again immediately or when the non-active member is accessed (storage with the proper alignment and size for type T is obtained).

@bames53 2012-10-19 14:08:33

Under that interpretation then every bit of memory simultaneously contains objects of all types that are trivially initializable and have appropriate alignment... So then does the lifetime of any non-trivially initializable type immediately end as its storage is reused for all these other types (and not restart because they're not trivially initializable)?

@ecatmur 2012-10-19 14:28:26

@bames53 I don't think that would count as "reuse"; that would require at the least using the object as an lvalue.

@bames53 2012-10-19 15:06:39

I guess 'use' and 'reuse' aren't explicitly defined. Nor is 'obtained'. Is storage obtained every time a non-active member is accessed, or is it only obtained once during the original allocation? Anyway, this answer is a great summary of the issues.

@Ben Voigt 2013-07-24 01:00:10

I've put in a larger excerpt of the rule on (g)lvalue-to-rvalue conversion, since it seems the other part of it could be relevant as well (the object to which the glvalue refers, does that have the type of the active member, and not the type of the glvalue undergoing attempted conversion?)

@Shafik Yaghmour 2014-06-25 12:26:15

You may find some of the references I link in this answer on type-punning interesting. Especially the quote by Pascal Cuoq in my footnote. Also a side question, since you invoke C99 being a normative reference for C++11 do you have a position on Can we apply content not explicitly cited from the normative references to the C++ standard??

@user743382 2014-09-14 10:04:31

The wording 4.1 is completely and utterly broken and has since been rewritten. It disallowed all sorts of perfectly valid things: it disallowed custom memcpy implementations (accessing objects using unsigned char lvalues), it disallowed accesses to *p after int *p = 0; const int *const *pp = &p; (even though the implicit conversion from int** to const int*const* is valid), it disallowed even accessing c after struct S s; const S &c = s;. CWG issue 616. Does the new wording allow it? There's also [basic.lval].

@ecatmur 2014-09-15 07:54:45

@hvd undefined behavior resulting from evaluation of expressions producing indeterminate values has now (cf. n3936) moved to [dcl.init]/12. This resolves the memcpy issue (it's now written in terms of narrow character types).

@user743382 2014-09-15 08:10:19

@ecatmur That issue does say it's about indeterminate values, but it includes all other issues related to that paragraph. The issue with memcpy isn't about indeterminate values: using memcpy to copy an already initialised value doesn't read any indeterminate values.

@curiousguy 2015-08-18 15:33:35

Allowing type punning via unions is a crazy idea of the C committee that cannot be made to work.

@curiousguy 2015-08-18 15:53:21

@hvd "doesn't read any indeterminate values" Even when you read padding bytes?

@user743382 2015-08-18 21:51:22

@curiousguy IIRC I was thinking of simple types (and didn't clarify properly). Fair point, for other types, using memcpy to copy already initialised values can cause reads of indeterminate values.

@supercat 2017-01-20 23:00:16

@curiousguy: Allowing type punning via unions is an idea that works just fine if the lifetime of any object that's contained within another is the lifetime of the container, and object accesses behave as ways of accessing the underlying storage. Neither principle works very well in C++, but both work just fine in the language invented by Dennis Ritchie.

@Omnifarious 2017-03-20 02:20:29

The standard should be changed to explicitly allow type punning via unions given some stringent restrictions on what exactly is in the union. Anything that has a non-trivial constructor or destructor would be UB if it was one of the punned types. I'm not even sure if such things are allowed to be union members at all.

@supercat 2017-04-23 18:30:11

@Omnifarious: That would make sense, though it would also need to clarify (and the C Standard also needs to clarify, btw) what the unary & operator means when applied to a union member. I would think the resulting pointer should be usable to access the member at least until the next direct or indirect use of any other member lvalue, but in gcc the pointer isn't usable even that long, which raises a question of what the & operator is supposed to mean.

@MikeMB 2017-09-15 12:11:22

One question regarding "Recall that c99 is a normative reference for C++11" Isn't that only relevant, where the c++ standard explicitly refers to the C standard (e.g. for the c library functions)?

@ecatmur 2017-09-15 14:19:09

@MikeMB yes, but [basic.types]/4 footnote 42 says "The intent is that the memory model of C++ is compatible with that of ISO/IEC 9899 Programming Language C.". It's a bit tenuous, admittedly.

@Demi 2019-01-21 01:25:34

Do all major compilers allow this as an extension?

@mpu 2012-08-16 22:00:52

Something that is not yet mentioned by available answers is the footnote 37 in the paragraph 21 of the section 6.2.5:

Note that aggregate type does not include union type because an object with union type can only contain one member at a time.

This requirement seems to clearly imply that you must not write in a member and read in another one. In this case it might be undefined behavior by lack of specification.

@Luchian Grigore 2012-08-17 03:42:24

That's a good point.

@supercat 2016-09-20 21:28:59

Many implementations document their storage formats and layout rules. Such a specification would in many cases imply what the effect of reading storage of one type and writing as another would be in the absence of rules saying compilers don't have to actually use their defined storage format except when things are read and written using pointers of a character type.

@Jerry Coffin 2012-08-10 18:06:46

I think the closest the standard comes to saying it's undefined behavior is where it defines the behavior for a union containing a common initial sequence (C99, §

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

C++11 gives similar requirements/permission at §9.2/19:

If a standard-layout union contains two or more standard-layout structs that share a common initial sequence, and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted to inspect the common initial part of any of them. Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.

Though neither states it directly, these both carry a strong implication that "inspecting" (reading) a member is "permitted" only if 1) it is (part of) the member most recently written, or 2) is part of a common initial sequence.

That's not a direct statement that doing otherwise is undefined behavior, but it's the closest of which I'm aware.
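A minimal sketch of the common-initial-sequence permission quoted above, in C++11. The types `Circle`, `Square`, and `Shape` and the helper functions are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <type_traits>

// Two standard-layout structs that share a common initial sequence:
// both start with an int `kind` (followed by a double).
struct Circle { int kind; double radius; };
struct Square { int kind; double side; };

union Shape {
    Circle circle;
    Square square;
};

static_assert(std::is_standard_layout<Shape>::value,
              "the permission applies to standard-layout unions");

const int kCircle = 0;
const int kSquare = 1;

// Makes `square` the member the union currently contains.
Shape make_square(double side) {
    Shape s;
    s.square = Square{kSquare, side};
    return s;
}

// Inspects `kind` through `circle` even when `square` is the member
// currently stored. This is the case the quoted paragraph permits,
// because `kind` is part of the common initial sequence.
int kind_of(const Shape& s) {
    return s.circle.kind;
}
```

Reading `s.circle.radius` while `square` is stored would also fall inside the common initial sequence here, since both structs continue with a double; a union of `int` and `float` members, by contrast, gets no help from this rule.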

@Michael Anderson 2012-08-15 08:32:08

To make this complete, you need to know what "layout-compatible types" are for C++, or "compatible types" are for C.

@Jerry Coffin 2012-08-15 15:43:18

@MichaelAnderson: Yes and no. You need to deal with those when/if you want to be certain whether something falls within this exception -- but the real question here is whether something that clearly falls outside the exception truly gives UB. I think that's strongly enough implied here to make the intent clear, but I don't think it's ever directly stated.

@underscore_d 2015-12-31 13:55:22

This "common initial sequence" thing might just have saved 2 or 3 of my projects from the Rewrite Bin. I was livid when I first read about most punning uses of unions being undefined, since I'd been given the impression by a particular blog that this was OK, and built several large structures and projects around it. Now I think I might be OK after all, since my unions do contain classes having the same types at the front

@underscore_d 2015-12-31 14:04:08

@JerryCoffin, I think you were hinting at the same question as me: what if our union contains e.g. a uint8_t and a class Something { uint8_t myByte; [...] }; - I would assume this proviso would also apply here, but it's worded very deliberately to only allow for structs. Luckily I'm already using those instead of raw primitives :O

@Jerry Coffin 2015-12-31 16:00:31

@underscore_d: The C standard at least sort of covers that question: "A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa."

@underscore_d 2015-12-31 16:26:27

Thanks, Jerry. 2 things I wonder from that: I assume it's in a C version/section that's normative for C++? and do rules for pointers (aliasing) implicitly apply to unions (type punning) too? In the course of trying to wrap my head around this, I frequently see these two things discussed as though they're equivalent, but I don't know whether that's correct.
