By Shafik Yaghmour


2014-05-01 20:04:20 8 Comments

As covered in "Does initialization entail lvalue-to-rvalue conversion? Is int x = x; UB?", the C++ standard has a surprising example in section 3.3.2 Point of declaration, in which an int is initialized with its own indeterminate value:

int x = 12;
{ int x = x; }

Here the second x is initialized with its own (indeterminate) value. — end example ]

Johannes' answer to that question indicates that this is undefined behavior, since it requires an lvalue-to-rvalue conversion.
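The point-of-declaration rule behind this example can be observed without reading any indeterminate value. The following is a minimal sketch, not taken from the standard; the function name `which_x_is_named` is made up for illustration. It only takes the address of the name inside its own initializer, which does not require an lvalue-to-rvalue conversion on the new object:

```cpp
#include <cassert>

// Hypothetical helper, for illustration only.
// Returns 1 if `x` inside the inner initializer names the outer object,
// and 0 if it names the object that is being declared.
int which_x_is_named() {
    int x = 12;
    int* outer = &x;  // address of the outer x
    {
        // The inner x is already in scope inside its own initializer, so
        // &x here is the address of the inner object. Taking the address
        // does not read the (still indeterminate) value.
        int x = (&x == outer) ? 1 : 0;
        assert(x == 0);  // the initializer saw the inner x, not the outer one
        return x;
    }
}
```

Calling `which_x_is_named()` returns 0, because the inner and outer `x` are distinct objects with distinct addresses.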

In the latest C++14 draft standard, N3936, the example in that section has changed to:

unsigned char x = 12;
{ unsigned char x = x; }

Here the second x is initialized with its own (indeterminate) value. — end example ]

Has something changed in C++14 with respect to indeterminate values and undefined behavior that has driven this change in the example?

1 comments

@Shafik Yaghmour 2014-05-01 20:04:20

Yes, this change was driven by changes in the language that make it undefined behavior if an indeterminate value is produced by an evaluation, with some exceptions for unsigned narrow character types.

Defect report 1787, whose proposed text can be found in N3914 (see footnote 1), was accepted in 2014 and is incorporated in the latest working draft, N3936.

The most interesting change with respect to indeterminate values would be to section 8.5 paragraph 12 which goes from:

If no initializer is specified for an object, the object is default-initialized; if no initialization is performed, an object with automatic or dynamic storage duration has indeterminate value. [ Note: Objects with static or thread storage duration are zero-initialized, see 3.6.2. — end note ]

to (emphasis mine):

If no initializer is specified for an object, the object is default-initialized. When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced (5.17 [expr.ass]). [Note: Objects with static or thread storage duration are zero-initialized, see 3.6.2 [basic.start.init]. —end note] If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:

  • If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of:

    • the second or third operand of a conditional expression (5.16 [expr.cond]),

    • the right operand of a comma (5.18 [expr.comma]),

    • the operand of a cast or conversion to an unsigned narrow character type (4.7 [conv.integral], 5.2.3 [expr.type.conv], 5.2.9 [expr.static.cast], 5.4 [expr.cast]), or

    • a discarded-value expression (Clause 5 [expr]),

    then the result of the operation is an indeterminate value.

  • If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of the right operand of a simple assignment operator (5.17 [expr.ass]) whose first operand is an lvalue of unsigned narrow character type, an indeterminate value replaces the value of the object referred to by the left operand.

  • If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of the initialization expression when initializing an object of unsigned narrow character type, that object is initialized to an indeterminate value.

and included the following example:

[ Example:

int f(bool b) {
  unsigned char c;
  unsigned char d = c; // OK, d has an indeterminate value
  int e = d;           // undefined behavior
  return b ? d : 0;    // undefined behavior if b is true
}

end example ]
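As a rough sketch of how the exceptions quoted above fit together (this is not part of the standard's text, and the function name `copy_indeterminate_bytes` is hypothetical), the following function only moves an indeterminate unsigned char value through contexts that the new wording permits, and replaces it before anything depends on it. Compilers may still warn about the uninitialized reads, and debug instrumentation or sanitizers may flag them.

```cpp
#include <cassert>

// Hypothetical illustration of the wording quoted above (N3936, 8.5/12).
unsigned char copy_indeterminate_bytes(bool b) {
    unsigned char c;      // no initializer: c has an indeterminate value
    unsigned char d = c;  // initializing an unsigned char: permitted
    unsigned char e;
    e = d;                // simple assignment to an unsigned char lvalue: permitted

    // Operand of a cast to an unsigned narrow character type: permitted.
    unsigned char f = static_cast<unsigned char>(e);

    // Second or third operand of a conditional expression, used to
    // initialize another unsigned char: permitted, g is still indeterminate.
    unsigned char g = b ? c : f;

    // int bad = g;       // would be undefined behavior: g is indeterminate

    g = 42;               // the indeterminate value is replaced
    int ok = g;           // fine now: g holds a determinate value
    assert(ok == 42);
    return g;
}
```

The function returns 42 for either value of `b`; the commented-out conversion to `int` is the kind of use that the standard's own example marks as undefined behavior.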

This text appears in N3936, the current working draft, and in N3937, the C++14 DIS.

Prior to C++1y

It is interesting to note that, prior to this draft, C++ used the term indeterminate value without even defining it (assuming we cannot borrow the definition from C99), unlike C, which has always had a well-specified notion of which uses of indeterminate values are undefined; see also defect report 616. We had to rely on the underspecified lvalue-to-rvalue conversion, which the draft C++11 standard covers in section 4.1 Lvalue-to-rvalue conversion, paragraph 1:

[...]if the object is uninitialized, a program that necessitates this conversion has undefined behavior.[...]


Footnotes:

  1. Defect report 1787 is a revision of defect report 616; this is noted in N3903.

@user2864740 2014-05-01 20:08:54

But why does the example (in the question) change int to unsigned char for both variables?

@Shafik Yaghmour 2014-05-01 20:10:36

@user2864740 because in the int case it would now be undefined behavior. It is now only well defined in the case of unsigned narrow character types.

@dyp 2014-05-01 20:27:03

IMO it's easier to do proper formatting with code and quotations by just writing it w/o spaces and > and then selecting the text and using the buttons or the shortcuts (CTRL-K for Kode, CTRL-Q for Quotations).

@Casey 2014-05-01 20:27:56

@user2864740 To be specific, fundamental types may have trap representations (e.g., signaling NaN) that do terrible things to the running program. In C and C++, this is represented as a type of undefined behavior. unsigned char is forbidden to have a trap representation, so the new examples have defined behavior.

@Shafik Yaghmour 2014-05-01 20:27:58

@dyp awesome tip, thank you for the formatting changes, I always struggle with formatting but I try to learn from everyone's fixes.

@Ben Voigt 2014-05-01 20:36:10

@Casey: While that's true, it isn't the rationale for this rule. In particular, all integral unsigned types are indirectly (by the modulo arithmetic rules) forbidden to have trap representations. But only the unsigned narrow character type(s) fall into this special exemption.

@M.M 2014-05-02 04:07:10

I'm glad to see this DR, it was annoying how C++ always was unclear on the issue of accessing indeterminate ints.

@supercat 2015-04-09 20:00:35

@BenVoigt: Any operations on legitimate values of unsigned integral types are required to yield legitimate values of unsigned integral types, but at least in C it has always been legal for the number of usable bits in an unsigned type larger than unsigned char to be less than the number of bits in the chars that it occupies. Are unsigned types in C++ not allowed to have corrupt or trap representations (which could be created only by operations on the underlying storage or on corrupt values of those types, and not by operations on legitimate values of the types themselves)?

@Ben Voigt 2015-04-09 20:14:12

@supercat: Both rules use essentially the same verbiage: "For unsigned narrow character types, each possible bit pattern of the value representation represents a distinct number." vs "Unsigned integers shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer." I guess you're saying character types are unique because of the extra rule "For narrow character types, all bits of the object representation participate in the value representation."

@Ben Voigt 2015-04-09 20:16:59

@supercat: But the language makes evaluation of an indeterminate value of all types other than unsigned char into undefined behavior, regardless of whether all bits of the object representation participate in the value representation.

@supercat 2015-04-10 13:34:06

@BenVoigt: I don't know whether the intention was to avoid extra verbiage "unless all the bits of the ...", though with the evolution of UB in compilers perhaps hyper-modern compiler authors are merely seeking excuses for wacky behavior.

@supercat 2015-04-10 14:26:41

@BenVoigt: Upon further consideration, I think one at-least-theoretical issue may be that compilers are allowed to use things like CPU registers for local or cached variables, and that even types which wouldn't have any padding if stored in main memory might have padding or trap bits in other legitimate representations. Still, I wish a standards committee would formalize a definition of "implementation-constrained behavior" which would be a cross between UB and implementation-defined behavior: implementations would be required to document what the consequences of something could be, and...

@supercat 2015-04-10 14:33:02

...would be allowed to have those consequences include UB if the documentation expressly stated that, but would be encouraged to state consequences as narrowly as practical. For example, a useful definition of the consequences of reading an uninitialized auto-variable would be to say that it may trap in debug builds and will otherwise yield an arbitrary value which will poison any variable into which it is stored, such that the variable's value may forevermore appear to change at any time for any reason. Pretty severe consequences, but not nearly as severe as having a compiler...

@supercat 2015-04-10 14:35:46

...make reverse-causal inferences that would cause it to ignore any conditions which might cause the variable to be read without having been initialized, especially if the only "use" of the value was to return it down a call chain and ultimately discard it. Unfortunately, I am unaware of any plans to codify such things; trends seem to be going in the reverse direction.

@PSkocik 2019-02-04 07:35:42

What's the rationale for not keeping it undefined? How does it interact with the rule that says using variables that could have been declared register without initializing them results in UB?

@supercat 2019-05-22 20:52:52

@PSkocik: Consider volatile unsigned char x,y; unsigned char test(uint32_t p, uint32_t q) { unsigned char result; if (q & 1) result = x; if (q & 2) result = y; return result; } On some platforms like the ARM, a function that returns unsigned char must exit with a value 0-255 in the 32-bit register R0, but the most efficient conforming machine code for that function would result in R0 holding whatever 32-bit value was passed in p if neither if condition is satisfied.

@supercat 2019-05-22 20:57:23

If the calling code wouldn't care about whether the return value is outside the range 0-255 except in cases where it passes a q value of 1, 2, or 3, requiring that the programmer set the value of result in those cases would result in less efficient code than would be necessary if neither the programmer nor the compiler had any obligation to ensure that the value is 0-255 in such cases.
