By Kyle Strand


2015-08-21 04:53:14

EDIT: This question was not intended as a forum for discussion about the (de)merits of undefined behavior, but that's sort of what it became. In any case, this thread about a hypothetical C-compiler with no undefined behavior may be of additional interest to those who think this is an important topic.


The classic apocryphal example of "undefined behavior" is, of course, "nasal demons" — a physical impossibility, regardless of what the C and C++ standards permit.

Because the C and C++ communities tend to put such an emphasis on the unpredictability of undefined behavior and the idea that the compiler is allowed to cause the program to do literally anything when undefined behavior is encountered, I had assumed that the standard puts no restrictions whatsoever on the behavior of, well, undefined behavior.

But the relevant quote in the C++ standard seems to be:

[C++14: defns.undefined]: [..] Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). [..]

This actually specifies a small set of possible options:

  • Ignoring the situation -- Yes, the standard goes on to say that this will have "unpredictable results", but that's not the same as the compiler inserting code (which I assume would be a prerequisite for, you know, nasal demons).
  • Behaving in a documented manner characteristic of the environment -- this actually sounds relatively benign. (I certainly haven't heard of any documented cases of nasal demons.)
  • Terminating translation or execution -- with a diagnostic, no less. Would that all UB would behave so nicely.

I assume that in most cases, compilers choose to ignore the undefined behavior; for example, when reading uninitialized memory, it would presumably be an anti-optimization to insert any code to ensure consistent behavior. I suppose that the stranger types of undefined behavior (such as "time travel") would fall under the second category--but this requires that such behaviors be documented and "characteristic of the environment" (so I guess nasal demons are only produced by infernal computers?).

Am I misunderstanding the definition? Are these intended as mere examples of what could constitute undefined behavior, rather than a comprehensive list of options? Is the claim that "anything can happen" meant merely as an unexpected side-effect of ignoring the situation?

EDIT: Two minor points of clarification:

  • I thought it was clear from the original question, and I think to most people it was, but I'll spell it out anyway: I do realize that "nasal demons" is tongue-in-cheek.
  • Please do not write an(other) answer explaining that UB allows for platform-specific compiler optimizations, unless you also explain how it allows for optimizations that implementation-defined behavior wouldn't allow.


@Mehrdad 2015-08-21 04:56:57

Yes, it permits anything to happen. The note is just giving examples. The definition is pretty clear:

Undefined behavior: behavior for which this International Standard imposes no requirements.


Frequent point of confusion:

You should understand that "no requirement" also means the implementation is NOT required to leave the behavior undefined or do something bizarre/nondeterministic!

The implementation is perfectly allowed by the C++ standard to document some sane behavior and behave accordingly.1 So, if your compiler claims to wrap around on signed overflow, logic (sanity?) would dictate that you're welcome to rely on that behavior on that compiler. Just don't expect another compiler to behave the same way if it doesn't claim to.

1Heck, it's even allowed to document one thing and do another. That'd be stupid, and it'd probably make you toss it into the trash—why would you trust a compiler whose documentation lies to you?—but it's not against the C++ standard.
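
For instance, here is a minimal sketch of relying on such a documented guarantee. GCC and Clang document the -fwrapv option as making signed integer overflow wrap using two's-complement arithmetic, so under that option (or a similar documented promise from your compiler) you may rely on the wraparound below; without such a promise, the increment is plain UB:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    int i = INT_MAX;
    i = i + 1;              /* wraps to INT_MIN only because the compiler documents it */
    printf("%d\n", i);      /* prints INT_MIN under a documented-wraparound compiler */
    return 0;
}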

@Kyle Strand 2015-08-21 05:08:39

Gaaaaaaah, I missed that line and (obviously) interpreted the bit I quoted as the actual definition.

@supercat 2015-08-21 06:31:29

It's interesting, however, to compare the normative examples which presumably reflected the intended meaning of the phrase, with the behaviors of modern compilers. I've seen no evidence whatsoever that the authors of the Standard intended that compilers would use Undefined Behavior to determine what inputs a program would or would not receive.

@T.C. 2015-08-21 06:41:05

@supercat Examples and notes are not normative.

@R.. 2015-08-21 14:51:03

@supercat: It was quite obvious that the intent was essentially to "determine what inputs a program would not receive" - it's just that compilers were not so advanced at the time. For example, the whole point of x<<n being UB when n is equal to or exceeds the width of x's type is that the compiler can simply assume n doesn't and not have to implement complex and costly logic for what to do in that case. Conceptually there is no difference between making this optimization and performing other more advanced DCE based on UB.
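
A minimal sketch of that assumption (nothing here beyond what the comment describes): because the out-of-range case is UB, the compiler can translate the shift to a single instruction and simply assume n is in range, with no masking or branching for n >= the width of the type:

unsigned shift_left(unsigned x, unsigned n)
{
    /* Compiles to one shift instruction; the caller must keep n below
       the width of unsigned, since larger n is undefined behavior. */
    return x << n;
}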

@edmz 2015-08-21 17:50:38

To show that everything could happen, here UB triggers a forbidden system call on the host machine that would create a socket.

@supercat 2015-08-21 18:39:17

@R..: I would interpret the intention of UB with "<<" as "Programmers won't use x<<y with y values above the word size except when writing for a platform whose behavior for such an action meets their requirements." When writing code for a platform which guarantees that shifting a uint32_t by any value from 32 to 127 will yield zero, exploiting such behavior can allow certain kinds of code to be much more efficient than would be possible if it had to add conditional logic for such values. More notably, the fact that p<q with unrelated pointers p and q yields UB...

@supercat 2015-08-21 18:43:05

...was never intended to say that code which is written for platforms that define a non-overlapping ranking to all pointers should refrain from using expressions like p >= base && p < base+length to determine whether p identifies part of an object. On platforms which do not define a natural non-overlapping ranking for pointers, it's unclear what p<q should mean, but I am unaware of any evidence that K&R ever intended that such an expression shouldn't allow all pointers to be ranked on platforms where it can practically do so.

@Yakk - Adam Nevraumont 2015-08-21 19:14:59

@supercat except if we know base is the start of an object that is length long, it would be more efficient to replace p >= base && p < base+length with true than actually fiddle with bits and stuff, and it would be conforming. This is infinitely faster than actually doing the work. An infinite times slowdown sure seems impractical!

@supercat 2015-08-21 19:34:07

@Yakk: Are you implying that K&R intended that anyone needing to determine whether p identifies an address within the indicated object should use the portable for (i=0; i<length; i++) { if ((char*)base+i==p) return 1; } return 0; rather than return p>=base && p<base+length; even on platforms which could guarantee that the latter would achieve the same effect? I would think it more likely that K&R intended that some algorithms would be practical on machines which could yield the required semantics for the latter code, and that the existence of machines that couldn't run the algorithms...

@supercat 2015-08-21 19:34:32

...shouldn't preclude the use of the algorithms on machines that could run them.

@Yakk - Adam Nevraumont 2015-08-21 19:38:09

@supercat No, I'm saying that your use of "practical" is impractically vague. Sure, you'll know it when you see it. And compilers today are free to state that their pointers exist in a flat memory space. Some compilers choose not to make (many) guarantees beyond the standard, and exploit that freedom. Others compilers don't. Practical programmers either have to restrict their code to one version of one compiler using one standard, or code against the standard. Try to only dip into undefined behavior with lots of warnings and if the payoff is great, ideally asserting compiler versions.

@supercat 2015-08-21 20:07:37

@Yakk: Until recently, compiler writers recognized that the efficiencies that would be made possible by constraining the effects of various forms of UB-invoking actions greatly outweighed the benefits that could be obtained by making the effects of such actions unpredictable. Most programs are subject to the requirement that when fed invalid input they may produce a wide range of possible outputs, but must not launch nuclear missiles. Allowing overflow to have a constrained range of effects would allow a compiler to make useful overflow-related optimizations...

@supercat 2015-08-21 20:12:41

...while allowing source text to be focused on a program's main purpose. Allowing the compiler to do anything it likes with overflow will force a programmer subject to the above constraints to write code which avoids all possible overflows, doesn't allow any overflow-related optimizations, and is virtually guaranteed to be sub-optimal on at least some platforms. Consider int mulcomp(int a, int b, int c, int d) {return a*b > c*d;}, but with an added constraint that the function must not launch nuclear missiles. The function as written would meet constraints optimally on both...

@supercat 2015-08-21 20:18:31

...the majority of 16-bit compilers for the 8086, and on the TMS 32C050 DSP (with 16-bit int, but a 16x16+32->32 multiply-accumulate unit). Doing the multiplies on unsigned values and then converting to int would be much slower than computing (long)a*b-(long)c*d > 0, but on the 8086 performing the comparison as long would require the compiler to spend extra code saving the high word from the first multiply and then comparing it to the high word of the second. I would suggest that efficiency losses from such coding are apt to dwarf any gains that couldn't be achieved by other means.

@R.. 2015-08-21 21:49:04

@supercat: You're neglecting the biggest benefit of making the effects of UB undesirable: breaking non-portable code and forcing people to write portable code. The value of this cannot be overstated.

@supercat 2015-08-21 22:56:44

@R..: For many kinds of applications, it is possible to write readable code which when run through a variety of compilers for a variety of platforms will perform efficiently and correctly, but it is not possible to write strictly-compliant code which will be anywhere near as readable or efficient (or in some cases, meet requirements at all). In what way would portability be better served by requiring programs to use an anemic language, versus standardizing some small but important guarantees which many compilers have offered that make programming much more practical?

@R.. 2015-08-21 23:51:43

@supercat: I don't buy the argument that it's not possible to write portable code. It's certainly practical to write code without UB and that only depends on implementation-defined behavior in ways that are testable at compile-time, e.g. testing for CHAR_BIT==8, type_MIN!=-type_MAX (2s complement/full-range), defined(UINTn_MAX), etc. This is a much less precarious situation to be in than depending on undefined behavior to be endowed with untestable properties you expect it to have on real-world systems.
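
A small sketch of the compile-time tests being described (the exact macros checked are just the ones named above):

#include <limits.h>
#include <stdint.h>

#if CHAR_BIT != 8
#error "this code requires 8-bit bytes"
#endif
#if INT_MIN == -INT_MAX
#error "this code requires two's-complement, full-range int"
#endif
#ifndef UINT32_MAX
#error "this code requires a 32-bit unsigned type"
#endif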

@supercat 2015-08-22 01:32:50

@R..: Why not make the necessary properties testable? Let code specify what it needs, and not bother trying to support the platforms it's never going to run on anyway. Or, better yet, define compiler intrinsics to do things which are awkward in C but frequently easy in machine code (e.g. store a 32-bit word as four char values MSB-first). Even on x86, making an intrinsic for that yield optimal code (a byte-swap instruction followed by a word store) would be easier than having a compiler recognize all of the ways that programmers might try to code such a thing.

@R.. 2015-08-22 05:03:48

@supercat: Now you're getting off into a separate topic which is feature-creeping the language. There are as many feature requests for C as there are users, but if even 1% of them were entertained, it would be as big a mess as C++.. :-) In any case that's a separate topic from UB.

@Random832 2015-08-22 07:42:46

@supercat "a platform whose behavior for such an action they believe meets their requirements. "

@supercat 2015-08-22 17:23:52

@R..: If one examines the corpus of existing C programs, one will find certain target platform behaviors which are useful, widely supported, and sometimes relied upon. Rather than declaring programs using such behaviors illegitimate, it would be far more useful to catalog such behaviors and have standardized means by which programs could indicate which behaviors they require. This would improve robustness, portability, and efficiency. It's not necessary to catalog every behavior that any compiler has ever supported, since the biggest benefits would come from identifying those that are...

@supercat 2015-08-22 17:36:30

...most widely supported. On the other hand, since the only burden on a compiler writer from adding a precisely-delineated behavior to the catalog would generally be adding an extra line to a header file to identify the behavior, and having the compiler indicate whether it's supported or not, there shouldn't be much of a "burden of proof" in favor of cataloging any particular behavior. If many programmers end up indicating a behavioral requirement which a compiler can meet with optimizations off, but can't meet with optimizations on, and if slightly constraining the optimizer's behavior...

@supercat 2015-08-22 17:37:53

...would allow the compiler to meet that requirement, then the authors of that compiler would be able to see that they could improve the performance of a lot of code by offering an intermediate setting, rather than trying to guess at what optimizations would and would not be useful.

@supercat 2015-08-22 17:42:47

@Random832: True, but that's still nowhere near the claim that programmers assert that UB will not occur. Further, the way compilers treat the word "assume" seems rather different from the normal usage, and doesn't quite fit a causal-sequential universe. In a non-sequential universe of mathematical propositions, if the only way for P to be true is for Q to be true, then an invitation to assume P is an invitation to assume Q, and Q's falsehood would imply that an invitation to assume P would be an invitation to assume anything and everything. I don't think that logic works in a...

@supercat 2015-08-22 17:56:34

...causal-sequential universe, though. If someone is planning to pick up a parcel from an office, an invitation to assume the shipping clerk will have it ready would not imply permission to go on a murderous rampage if he doesn't. Further, even if the only way the shipping clerk could have the package ready would be if some other task had been performed, an invitation to assume the package will be ready would not imply permission to take actions which required the other task to have been performed. Instead, the assumption that the package would be ready would allow...

@supercat 2015-08-22 17:59:35

...the courier to be dispatched to pick up the package in advance of its readiness, and a willingness to accept the cost of a wasted visit by the courier should the package not be ready. In everyday life, the ability to grant license to make assumptions for a constrained purpose is very useful; I would posit that the ability would be just as useful in programming.

@supercat 2015-08-22 18:04:53

As a programming analogy to the latter case, consider a case where a program will be performing many small fixed-size memory-copy operations and about 0.1% of them would involve the same source and destination. If objects will never overlap outside of that situation, but a compiler wouldn't be able to know that, it's entirely possible that neither if (dest != src) memcpy(dest, src, 8); nor memmove(dest, src, 8); could be optimized to be as fast as would a version of memcpy() which was required to behave like memmove when source and destination pointers are equal [not quite a nop, since...

@supercat 2015-08-22 18:05:59

...it would be legal to write an object using one type of pointer, memmove it to itself, and then read it using a different type of pointer].

@supercat 2019-03-19 17:28:29

Another frequent point of confusion/contention is that it was historically very common for certain kinds of implementations to share certain behavioral traits in places where the Standard imposes no requirements. Even the authors of the Standard have indicated that the majority of current implementations used two's-complement silent-wraparound semantics on integer overflow and would process signed and unsigned math identically, regardless of overflow, except in certain specific constructs. Were the authors of the Standard confused, or has "modern C" diverged from their intention?

@alain 2015-08-21 13:34:58

First, it is important to note that it is not only the behaviour of the user program that is undefined; it is the behaviour of the compiler that is undefined as well. Similarly, UB is not encountered at runtime; it is a property of the source code.

To a compiler writer, "the behaviour is undefined" means, "you do not have to take this situation into account", or even "you can assume no source code will ever produce this situation". A compiler can do anything, intentionally or unintentionally, when presented with UB, and still be standard compliant, so yes, if you granted access to your nose...

Then, it is not always possible to know if a program has UB or not. Example:

int * ptr = calculateAddress();
int i = *ptr;

Knowing if this can ever be UB or not would require knowing all possible values returned by calculateAddress(), which is impossible in the general case (See "Halting Problem"). A compiler has two choices:

  • assume ptr will always have a valid address
  • insert runtime checks to guarantee a certain behaviour

The first option produces fast programs, and puts the burden of avoiding undesired effects on the programmer, while the second option produces safer but slower code.

The C and C++ standards leave this choice open, and most compilers choose the first, while Java for example mandates the second.
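
As a rough sketch of the second option, here is what an inserted runtime check might look like; checked_load is an invented helper name, and only the null case is caught, since (as noted below) arbitrary invalid non-null pointers are much harder to detect:

#include <stdio.h>
#include <stdlib.h>

static int checked_load(const int *ptr)
{
    if (ptr == NULL) {             /* runtime check inserted by the implementation */
        fprintf(stderr, "null pointer dereference\n");
        abort();                   /* defined, predictable failure instead of UB */
    }
    return *ptr;
}

/* The earlier example would then become:
       int i = checked_load(calculateAddress());                        */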


Why is the behaviour not implementation-defined, but undefined?

Implementation-defined means (N4296, 1.9§2):

Certain aspects and operations of the abstract machine are described in this International Standard as implementation-defined (for example, sizeof(int) ). These constitute the parameters of the abstract machine. Each implementation shall include documentation describing its characteristics and behavior in these respects. Such documentation shall define the instance of the abstract machine that corresponds to that implementation (referred to as the “corresponding instance” below).

Emphasis mine. In other words: A compiler-writer has to document exactly how the machine-code behaves, when the source code uses implementation-defined features.

Writing to a random non-null invalid pointer is one of the most unpredictable things you can do in a program, so documenting a specific behaviour for it would require performance-reducing runtime checks too.
Before we had MMUs, you could destroy hardware by writing to the wrong address, which comes very close to nasal demons ;-)

@Kyle Strand 2015-08-21 14:53:28

Skipping the checks is the same as "ignoring the situation." This could still be a valid optimization with "implementation-defined" behavior, not UB. Also, I understand the halting problem, but see Rust for an example of a low-level language that solved the problem by disallowing null pointers.

@alain 2015-08-21 15:04:00

It's not only null pointers; signed overflow and division by zero are other examples of things that are generally impossible to foresee at compile-time. Sorry, I didn't understand what you meant by the first two sentences?

@Kyle Strand 2015-08-21 15:08:41

Yes, I realize that Rust has not sidestepped the halting problem, but null pointer dereferencing is one of the most common types of errors, and it's the one you used as an example. My first two sentences are basically saying that your answer doesn't really address UB; yes, in C/C++ dereferencing a null is UB, but it could just as easily have been implementation-defined, which is different (and less permissive).

@alain 2015-08-21 15:14:06

Yes, IIRC Stroustrup regrets having introduced null pointers. This is a great article that explains the advantages of UB: blog.regehr.org/archives/213

@alain 2015-08-21 15:16:33

I'm not sure, but I think "implementation-defined" would not leave the freedom of completely ignoring the situation, which is what enables better performance. There are other invalid pointer values than null, which make checks quite complicated.

@Kyle Strand 2015-08-21 15:22:17

Stroustrup didn't invent them, but yes, the inventor called them a billion-dollar mistake.

@alain 2015-08-21 15:25:00

:-) Yes "million dollar mistake" was the term, now I remember too.

@supercat 2015-08-21 19:11:24

@KyleStrand: The primary "mistake" associated with null pointers was the failure of some compilers to trap on arithmetic operations which would turn a null pointer into a seemingly-valid pointer. If a compiler traps on any attempt to dereference a null-pointer, and on any pointer arithmetic which would attempt to turn a null pointer into something else, the only potential harmful consequence of null pointers would be to delay the point where a program terminates (on a system without a null-pointer concept, the act of reading an uninitialized pointer would yield an immediate trap, rather...

@supercat 2015-08-21 19:11:45

...than allow for the possibility that the code might notice that the pointer is null before trying to do anything with it).

@Kyle Strand 2015-08-21 19:14:09

@supercat When you say "if a compiler traps on...", are you talking about detecting possibly-null pointers at compile-time, or about inserting dynamic checks, or something else?

@supercat 2015-08-21 19:42:25

@KyleStrand: Inserting code to check whether a pointer is null before performing arithmetic on it, hopefully keeping track of which pointers have been checked so as to minimize the added overhead. Note that if p is non-null, a compiler can and should assume that a pointer formed by adding or subtracting an integer will likewise be non-null, so null checks can generally be hoisted out of loops.
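
A small sketch of the hoisting being described (hypothetical code, not from any particular compiler): once p has been checked, p + i can be assumed non-null as well, so a single inserted check covers the whole loop:

#include <stdlib.h>

void zero_all(int *p, int n)
{
    if (p == NULL)
        abort();                   /* single inserted null check */
    for (int i = 0; i < n; i++)
        p[i] = 0;                  /* no per-iteration check needed: p + i stays non-null */
}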

@alain 2015-08-21 21:25:16

@KyleStrand I tried to address implementation-defined vs. undefined in an edit.

@M.M 2015-08-22 01:45:11

The behaviour of the compiler isn't undefined. The compiler is not supposed to format your harddrive, or launch missiles, or crash. What's undefined is the behaviour of an executable (if any) which the compiler produces.

@M.M 2015-08-22 01:46:06

"UB is not encountered at runtime, it is a property of the source code." - it comes in both varieties. UB may be encountered at run-time, for example dividing by an integer input by the user without checking that they didn't input 0

@alain 2015-08-22 10:05:14

@MattMcNabb I just asked this question here: stackoverflow.com/questions/32154832/…

@alain 2015-08-22 11:38:24

@MattMcNabb What I meant was: The compiler may treat if(i != 0) as if(true) after the division statement, so this UB has an effect at compile time.

@Ray 2015-08-21 20:40:19

One of the reasons for leaving behavior undefined is to allow the compiler to make whatever assumptions it wants when optimizing.

If there exists some condition that must hold if an optimization is to be applied, and that condition is dependent on undefined behavior in the code, then the compiler may assume that it's met, since a conforming program can't depend on undefined behavior in any way. Importantly, the compiler does not need to be consistent in these assumptions (which is not the case for implementation-defined behavior).

So suppose your code contains an admittedly contrived example like the one below:

int bar = 0;
int foo = (undefined behavior of some kind);
if (foo) {
   f();
   bar = 1;
}
if (!foo) {
   g();
   bar = 1;
}
assert(1 == bar);

The compiler is free to assume that !foo is true in the first block and foo is true in the second, and thus optimize the entire chunk of code away. Now, logically either foo or !foo must be true, and so looking at the code, you would reasonably be able to assume that bar must equal 1 once you've run the code. But because the compiler optimized in that manner, bar never gets set to 1. And now that assertion becomes false and the program terminates, which is behavior that would not have happened if foo hadn't relied on undefined behavior.

Now, is it possible for the compiler to actually insert completely new code if it sees undefined behavior? If doing so will allow it to optimize more, absolutely. Is it likely to happen often? Probably not, but you can never guarantee it, so operating on the assumption that nasal demons are possible is the only safe approach.

@Kyle Strand 2015-08-21 20:43:48

Sigh. Did you read my edit, asking people not to post answers about optimization unless these answers clearly distinguish what makes UB better for optimization than "implementation-defined" behavior? Also, I was asking what the standard permits, not why it permits it, so this technically doesn't answer the question--although I do appreciate the defense of UB, since I am increasingly opposed to the idea of UB in general.

@Ray 2015-08-21 21:09:29

The ability to be inconsistent is one of the big differences. sizeof(int) is implementation-defined, but it's not going to change from 4 to 8 halfway through the program. If it were undefined, it could. Implementation-defined things also tend to have additional restrictions: e.g. sizeof(int) * CHAR_BIT must be at least 16, whereas if it were undefined, it could be or do anything at all.
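
A tiny sketch of that consistency point: because sizeof(int) is implementation-defined rather than undefined, it is one fixed value for the whole program, so the assertion below is guaranteed to hold; no comparable guarantee exists once behavior is undefined.

#include <assert.h>
#include <stddef.h>

void check_consistency(void)
{
    size_t before = sizeof(int);
    /* ... arbitrary intervening code ... */
    size_t after = sizeof(int);
    assert(before == after);       /* always true: the value cannot change mid-program */
}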

@Kyle Strand 2015-08-21 21:12:14

That sounds like a useful distinction to include in your answer.

@Kyle Strand 2015-08-21 21:12:54

...ah, I see that you've done so.

@Ray 2015-08-21 21:19:17

You might also want to look at stackoverflow.com/a/2397995/5196093. That answer includes the standard's definitions of undefined/implementation defined/unspecified. It doesn't say whether it's quoting the C or C++ standard, but I don't believe they differ on this.

@supercat 2015-08-22 18:25:21

Given int i=INT_MAX; long l1,l2; i+=function_returning_one(); l1=i; second_function(); l2=i; I would not consider it "surprising" for l1 to yield INT_MAX+1u and l2 to yield -INT_MAX-1; indeed, on many DSPs such behavior would be a likely result (the compiler would add 16-bit value i to a 32-bit accumulator, store the result in both i and l1, call second_function(), load i (16 bits), and store it to l2). Writing the code as i=(int)((unsigned)i+function_returning_one()); would cause l1 and l2 to yield -INT_MAX-1, but would make the code less readable and less...

@supercat 2015-08-22 18:27:17

...efficient. If the programmer would be perfectly happy with l1 and l2 holding any values congruent to 32768 mod 65536, and doesn't care if they match, I see little purpose to requiring the programmer to clutter the source code with language which makes the function harder to read (incidentally, even if i were type int16_t, I'm not sure if the Standard would guarantee that the latter statement would work correctly if invoked with negative values, since it would from what I can tell be legal [though odd] for a compiler to use two's-complement representations for negative numbers...

@supercat 2015-08-22 18:31:29

...but define unsigned-to-signed casts as yielding 32767 for all unsigned values 32768 and up).

@Allen 2015-08-21 19:56:21

Undefined behaviors allow compilers to generate faster code in some cases. Consider two different processor architectures that ADD differently: Processor A inherently discards the carry bit upon overflow, while processor B generates an error. (Of course, Processor C inherently generates Nasal Demons - it's just the easiest way to discharge that extra bit of energy in a snot-powered nanobot...)

If the standard required that an error be generated, then all code compiled for processor A would basically be forced to include additional instructions, to perform some sort of check for overflow, and if so, generate an error. This would result in slower code, even if the developer knew that they were only going to end up adding small numbers.
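
As a sketch of what that mandated check would cost on processor A, every addition would have to be wrapped in something like this rather than compiling to a single ADD (checked_add is an invented name for illustration, shown here for signed int, the classic UB case):

#include <limits.h>
#include <stdlib.h>

static int checked_add(int a, int b)
{
    /* Detect overflow without performing it: */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        abort();                   /* the "error" the hypothetical standard would require */
    return a + b;                  /* the one instruction the hardware wanted to emit */
}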

Undefined behavior sacrifices portability for speed. By allowing 'anything' to happen, the compiler can avoid writing safety-checks for situations that will never occur. (Or, you know... they might.)

Additionally, when a programmer knows exactly what an undefined behavior will actually cause in their given environment, they are free to exploit that knowledge to gain additional performance.

If you want to ensure that your code behaves exactly the same on all platforms, you need to ensure that no 'undefined behavior' ever occurs - however, this may not be your goal.

Edit: (In response to OP's edit) Implementation-defined behavior would require the consistent generation of nasal demons. Undefined behavior allows the sporadic generation of nasal demons.

That's where the advantage that undefined behavior has over implementation specific behavior appears. Consider that extra code may be needed to avoid inconsistent behavior on a particular system. In these cases, undefined behavior allows greater speed.

@Kyle Strand 2015-08-21 19:58:00

But (as mentioned in comments on various other answers already), the standard also defines "implementation-defined behavior"--which provides just as much flexibility for optimization.

@Allen 2015-08-21 20:19:59

A good point. Perhaps I should have said that Processor C occasionally generates nasal demons. Let's pretend that the energy discharged from a nasal nanobot only has a certain probability of creating a nasal demon, say 50%. Now, in this hypothetical universe, the "undefined behavior" allows a more speed-optimized processor to be manufactured while still remaining standards compliant. If implementation-defined behavior were required, this would not be the case, as "50% chance of demon" probably doesn't make the cut as "something consistent"

@Kyle Strand 2015-08-21 20:21:51

And yet it's still a far cry from "insert malicious code," which is also allowed by the existing definition of UB.

@Allen 2015-08-21 20:32:35

It was probably just easier to say "you can do whatever you want" as opposed to trying to list off all of the things that you can and can't do. Sure, on the PC platform you typically generate nasal demons from an external USB device... that probably won't happen by accident with an electronic computer... but it might accidentally happen on a Turing complete Ouija board. Not all computers will necessarily be electronic, so not all nasal demons must be from intentionally malicious code. Some could just be from unsafe code.

@Kyle Strand 2015-08-21 20:34:59

If you are actually interested in some of the historical reasons for and uses of UB, see supercat's answer and comments on other answers. Also note that my question is not actually asking for a rationale behind UB, although I do happen to think that it ought to be removed from the standard.

@Allen 2015-08-21 20:36:21

I was merely trying to answer why it would allow for something as absurd as nasal demons...

@Zan Lynx 2015-08-21 22:05:04

@KyleStrand: Write correct C code and nothing will go wrong. The standard shouldn't change. If you do want particular behavior, compilers have been growing options and intrinsics to do what you want explicitly. C is about fast code. I recommend Java, C#, Go, etc. for hand holding.

@Kyle Strand 2015-08-21 22:17:03

@ZanLynx "Just don't mess up" is unacceptable as advice. Developers are human beings, and human beings are imperfect. I don't think it's "hand-holding" for a developer's tools to help them avoid things like, say, accidentally launching nuclear missiles (an extremely contrived example of potential UB, but one given as an example in one of the links posted in the comments beneath my question) due to an honest mistake--and yes, the only times I have invoked UB have been honest mistakes.

@Zan Lynx 2015-08-21 22:18:12

@KyleStrand: As I said, if you want hand holding go to another language. C is barely higher than assembly with all the power and peril that implies.

@Kyle Strand 2015-08-21 22:20:12

@ZanLynx Sure, C is barely higher than assembly (though that does not imply that UB is necessary to get the same benefits in terms of optimization and flexibility!). But C++ is a supposedly "high-level" language. This could mean that it offers some protection from bad behavior via compile-time checks, and indeed this is sometimes what happens. But in practice, it also means that the pitfalls are much more subtle than they are in C, which I think is a bad thing.

@Kyle Strand 2015-08-21 22:22:35

@ZanLynx And sure, maybe you think it's reasonable to use a language that's so full of cat-killing and black holes (more examples stolen from comments), because you have a lot of experience, you're very careful, you have a deep understanding of the language, etc, etc. But can you guarantee that all of your co-workers are equally good at avoiding UB? Can you guarantee every library you use has no UB? Can you guarantee that if you ever have a rough night of sleep and wake up tired but you have some code that needs to get finished, that you'll be as good at avoiding UB as you usually are?

@Kyle Strand 2015-08-21 22:24:07

@ZanLynx Java, C#, Go, etc have runtime overhead that may be considered unacceptable. But what about a language like Rust, which offers the same flexibility and high-level compile-time abstractions as C++, but has much better safety guarantees? Is that too much "hand-holding"? What's wrong with hand-holding, anyway?

@supercat 2015-08-22 14:50:40

@ZanLynx: Assembly language is less error-prone than modern C. In assembly language, if a memory location which held a no-longer-valid pointer should hold null, one can safely test for that with something like ldr r1,[r0] / cmp r1,#0 / bne oops and know the assembler won't do anything weird. In a sensible C compiler for most platforms, assert(*q==null); should be safe. If q isn't null, either the assertion will fail, terminating the program, or the system will detect that q is an invalid pointer and terminate the program. Hyper-modern C, however, believes that if the compiler...

@supercat 2015-08-22 14:53:21

...determines that q can't be non-null without the comparison invoking UB, it should not only remove the comparison, but it should also remove other code which it recognizes as having no usefulness outside such cases, possibly causing behaviors even worse than those the assertion was designed to protect against.

@Kyle Strand 2015-08-22 22:36:22

@supercat I'm glad I asked this question if for no other reason than to indirectly inspire all your comments.

@Waters 2015-08-21 13:30:30

Undefined behavior is simply the result of a situation coming up that the writers of the specification did not foresee.

Take the idea of a traffic light. Red means stop, yellow means prepare for red, and green means go. In this example people driving cars are the implementation of the spec.

What happens if both green and red are on? Do you stop, then go? Do you wait until red turns off and it's just green? This is a case that the spec did not describe, and as a result, anything the drivers do is undefined behavior. Some people will do one thing, some another. Since there is no guarantee about what will happen you want to avoid this situation. The same applies to code.

@Muzer 2015-08-21 13:42:13

That's not necessarily the case in C/C++. In many cases, undefined behaviour was deliberately foreseen, and deliberately left undefined. In C/C++, undefined behaviour is something defined in the spec and explicitly given for a few examples. I have no reason to believe that everyone working on the first standard just didn't think about what should happen when a NULL pointer is dereferenced. Instead, they probably deliberately left it undefined so that the compiler didn't have to special-case it, slowing down code.

@Waters 2015-08-21 13:51:53

True, those fall under the "why would you do that?" category. Like if there was a wreck at the intersection but the light was green. You don't just drive.

@Kyle Strand 2015-08-21 14:56:29

See supercat's answer. Also, the range of possible things compilers are allowed to do when encountering UB is, frankly, insane; if the reason for UB were simply "we can't predict everything in advance," the only necessary directive would be "ignore the situation."

@chux 2015-08-21 17:46:04

If a traffic light appears malfunctioning, treat like a stop sign. If code is malfunctioning, treat it cautiously, but continue on as able.

@supercat 2015-08-21 19:49:21

@Muzer: I think a bigger reason for UB is to allow for the possibility of code taking advantage of platform features which would be useful in some situations but bothersome in others. On some machines, overflow-trapped integer arithmetic is the normal behavior and non-trapped arithmetic is expensive. On other machines, integer arithmetic that overflows generally wraps, and overflow trapping would be very expensive. For the Standard to mandate either trapping or non-trapping behavior would not only increase the cost of all arithmetic on one platform or the other, but to add insult...

@supercat 2015-08-21 19:54:52

...to injury, code which wanted to compute x+y using the disfavored behavior and was written for hardware implementing that behavior would have to add additional logic to achieve the required behavior, and all of the added logic would run extra-slowly because of the logic included by the compiler. Thus, something that should have translated as add r1,r2,r3 would instead end up as some huge monstrosity which could quite plausibly be less than 10% as fast as the optimal code that could have met requirements if overflow had been UB.

@Muzer 2015-08-23 19:00:26

@supercat but the point of C has always been portability. If you have code that does different things on different platforms, therefore, except where that's really necessary and what you want (eg things like inline assembly), your code is broken. You should therefore be coding to AVOID these situations. So compilers being able to turn this behaviour into anything at all, and mercilessly taking advantage of such a situation, is, in my mind, perfectly valid. People should NEVER have EVER relied on ANY behaviour that's potentially different between compilers/architectures.

@supercat 2015-08-23 20:54:06

@Muzer: There are two ways a language can be portable: (1) The language requires platforms to implement consistent behaviors independent of the platform upon which it is running; Java, for example, requires that int be 32 bits, long 64 bits, and that integer shifts be performed mod 32 and long shifts mod 64; (2) Features of the language (e.g. the size of int, overflow semantics, etc.) vary by platform, such that compilers for the language can avoid having to generate inefficient code to emulate behaviors which the hardware doesn't support well, and which programmers...

@supercat 2015-08-23 20:59:48

...may not want anyway. C was designed for the second kind of portability. It was never intended for the first. Nowadays, the market is totally dominated by hardware which support two's-complement 8, 16, 32, and 64-bit types without padding, where all pointers are ranked, where integer arithmetic naturally supports partially-indeterminate-value semantics on overflow, etc. and most programs will never need to run on platforms without those characteristics. Further, although C was designed in an era in which most programs were never expected to receive maliciously-crafted inputs,...

@supercat 2015-08-23 21:13:54

...the world in which today's programs run is very different environment. If the authors of C standards wish the language to remain useful, they should acknowledge these realities and define some normative standards for behavior so that programs whose two requirements are: (1) Generate correct output given valid input; (2) Generate arbitrary output, within broad constraints, when given invalid input, don't have to write extra code to constrain the compiler more tightly than they want or need to. If a programmer wants to compute x+y when it's representable as int, and...

@supercat 2015-08-23 21:28:06

...would be equally happy with it yielding any value congruent to the mathematical integer value of x+y mod 4294967296, I would posit that the most natural and readable way to express that when compiling for a hardware platform that can implement such semantics (are there any modern ones that can't?) would be to write it as x+y. Having overflow return partially-indeterminate values will facilitate optimizations which will be unavailable if the programmer has to write code that avoids overflows at all costs.

@Muzer 2015-08-24 09:17:08

@supercat But C provides sizeof which does allow the first kind of portability. I'm not necessarily defending the existence of undefined behaviour, I just don't agree with your claim that it was originally designed to allow programmers to be able to actually use that behaviour in real code, just that it would mean something different on different platforms.

@supercat 2015-08-24 15:10:01

@Muzer: Things like INT_MAX make it possible for programs to accommodate a range of platforms, but it's not generally possible without extreme awkwardness to write code which will run correctly on every possible standard-conforming compiler where it compiles, since most programs will rely upon behaviors for which no standard testing macros exist. For example, a lot of code assumes that given int16_t x; the expression (int16_t)(uint16_t)x will always equal x, and while that may be true of all production C compilers, I don't think anything in the Standard would forbid a compiler...

@Muzer 2015-08-24 15:19:00

@supercat but that behaviour (unsigned to signed int) is implementation-defined, not undefined. And I would generally hesitate about writing code that makes such assumptions, even though they might be completely safe today, except maybe in an embedded environment where porting the code to anything else would make zero sense.

@supercat 2015-08-24 15:19:22

...from specifying that (int16_t)32768 will normally yield -32768, but will yield 24601 if the program is launched with a command-line argument of "EASTEREGG" and the value was not used in a place where the Standard required a constant expression.

@Waters 2015-08-24 15:21:11

@Muzer: embedded code is routinely ported between different microprocessors, and even crosscompiled onto x86.

@Muzer 2015-08-24 15:23:04

@Waters By that I meant the sort of embedded code that inherently depends on the precise hardware configuration, as opposed to more general embedded code, ie I meant "in (an embedded environment where porting the code to anything else would make zero sense)" rather than "in an embedded environment (where porting the code to anything else would make zero sense)". I'm fully aware that plenty of embedded code is written to be portable.

@supercat 2015-08-24 15:25:35

@Muzer: The decision of whether to leave something undefined or implementation-defined seems to have been predicated upon whether on some platforms or for some kinds of applications it might be useful to have the compiler generate code that would trap or raise a signal in a fashion outside the Standard's jurisdiction. From a requirements perspective, there's no semantic difference between "Division by zero may yield an indeterminate value or cause a trap whose behavior an implementation should, but need not, document" versus simply saying it yields "Undefined Behavior".

@supercat 2015-08-24 16:38:41

@Muzer: Further, the way the Standard is defined makes it excessively difficult for embedded code to avoid UB when being ported from one platform to another. For example, given int16_t checksum; void updateChecksumMulti(uint16_t dat, uint16_t n) { checksum += dat*n; } will never yield Undefined Behavior if run on an 8-bit or 16-bit machine, but may yield Undefined Behavior on a 32-bit machine. While writing the expression as 1u*dat*n; would work, and hopefully an embedded compiler would be smart enough not to actually do the multiply, I'd suggest that makes the code less clear.

@Muzer 2015-08-25 08:45:19

@supercat Err, I don't believe that should produce undefined behaviour only for a machine that uses 32-bit ints. On the contrary, I believe it produces undefined behaviour for a machine that uses 16 bits or 8 bits. The C standard defines that for values not directly representable in a smaller type, the result is implementation defined, which is what will happen on 32-bit; on all platforms though, dat*n might cause a signed overflow of checksum which is undefined. The fix is obvious, make checksum unsigned.

@supercat 2015-08-25 13:20:23

@Muzer: I meant checksum to be uint16_t, but I typoed, making behavior Implementation-defined in machines with 16 bits rather than fully defined, but it's still better than the situation with 32-bit ints, where it would invoke Undefined Behavior if the product of dat and n exceeds 2147483647.
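
A sketch of the repaired fragment being discussed, with checksum as the uint16_t that was intended; multiplying by 1u forces the arithmetic into unsigned int, whose overflow wraps instead of being undefined on 32-bit-int platforms:

#include <stdint.h>

static uint16_t checksum;

void updateChecksumMulti(uint16_t dat, uint16_t n)
{
    checksum += 1u * dat * n;      /* unsigned multiply: wraparound is defined */
}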

@Muzer 2015-08-25 14:31:38

@supercat OK, very good point, that is pretty horrible. I'd missed that as I was distracted by the bigger issue! I think the main issue here is that C's integer promotion rules are all kinds of messed up. Ugh.

@supercat 2015-08-25 16:54:59

Although C99 integer types as defined had the advantage of being back-portable to C89 compilers, they suffer from extremely murky semantics. IMHO, the Standard should define unumN_t and uwrapN_t types, such that unumN_t would, if defined, behave essentially as uintN_t behaves on machines where int is larger than N bits, and uwrapN_t would behave essentially as uintN_t behaves on machines where int is smaller; machines with any int size would be allowed to define any uwrapN_t and uintN_t for any N as compiler intrinsics if they can achieve the proper behavior.

@supercat 2015-08-21 06:13:09

One of the historical purposes of Undefined Behavior was to allow for the possibility that certain actions may have different potentially-useful effects on different platforms. For example, in the early days of C, given

int i=INT_MAX;
i++;
printf("%d",i);

some compilers could guarantee that the code would print some particular value (for a two's-complement machine it would typically be INT_MIN), while others would guarantee that the program would terminate without reaching the printf. Depending upon the application requirements, either behavior could be useful. Leaving the behavior undefined meant that an application for which abnormal program termination was an acceptable consequence of overflow, but seemingly-valid-but-wrong output was not, could forgo overflow checking when run on a platform which would reliably trap overflow; likewise, an application for which abnormal termination on overflow was unacceptable, but arithmetically-incorrect output was tolerable, could forgo overflow checking when run on a platform where overflows weren't trapped.

Recently, however, some compiler authors seem to have gotten into a contest to see who can most efficiently eliminate any code whose existence would not be mandated by the standard. Given, for example...

#include <stdio.h>

int main(void)
{
  int ch = getchar();
  if (ch < 74)
    printf("Hey there!");
  else
    printf("%d",ch*ch*ch*ch*ch);
}

a hyper-modern compiler may conclude that if ch is 74 or greater, the computation of ch*ch*ch*ch*ch would yield Undefined Behavior, and as a consequence the program should print "Hey there!" unconditionally regardless of what character was typed.
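
As a sketch (not the literal output of any particular compiler), the transformation described above effectively turns the program into:

#include <stdio.h>

int main(void)
{
  int ch = getchar();
  (void)ch;                /* with ch >= 74 treated as impossible, the value no longer matters */
  printf("Hey there!");
}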

@Kyle Strand 2015-08-21 15:00:52

Wow. Any idea how we got from "potentially useful" to the current situation, in which much of the C++ community seems adamantly opposed to any attempt to determine the exact behavior of certain compilers upon encountering a situation allowing UB, with the explanation "it doesn't matter, your program has UB"?

@Bwmat 2015-08-21 16:19:20

It's all about the benchmarks

@Lightness Races in Orbit 2015-08-21 17:05:58

No, it's about portability. We live in an interconnected age now with software distributed faster than you can think. We're no longer writing programs for that one dusty supercomputer in the basement. At least, most of us aren't. It's effectively down to a decades-old paradigm shift in programming; there are now tangible practical benefits to coding rigorously to standards (which ideally we'd always have done), and the toolchain writers can take advantage of that to produce really fast and efficient compilers. Why not?!

@supercat 2015-08-21 18:00:03

@LightnessRacesinOrbit: Can you write a strictly-compliant function which behaves as int mulcomp(int a, int b, int c, int d) { return a*b > c*d;} when the values of a*b and c*d are representable as int, and is required to return 1, return 0, or terminate execution otherwise (arbitrary choice), without such a function being much harder to read than the original, and without the optimal code for the "portable" code being significantly slower than the code for the original on at least some platforms?

@supercat 2015-08-21 18:07:29

@LightnessRacesinOrbit: If the goal were to have a usable portable language, the Committee should recognize the existence of some distinct variations (e.g. dialects where p >= object.base && p < object.base+object.size can be used to test whether p is part of an object, but which can't be implemented on all platforms, versus those which do not allow such comparisons but which can be implemented on more platforms). It should also define some data types which, if supported, would be required to behave consistently on all platforms. As it is, C has two distinct 32-bit signed integer types...

@supercat 2015-08-21 18:10:22

...and two distinct unsigned 32-bit integer types. On platforms where all values of uint32_t are representable as int, subtraction of two uint32_t values will yield a signed result. On platforms where some values of uint32_t are not representable as int, subtraction yields a uint32_t result. Both types are called uint32_t, but their semantics are extremely different. Likewise, on platforms where int is larger than 32 bits, incrementing an int32_t will always have defined behavior. On platforms where int is exactly 32 bits, incrementing int32_t can cause UB.

@supercat 2015-08-21 18:17:44

@LightnessRacesinOrbit: Further, a portable language should define an efficient portable means of packing and unpacking a larger integer type into/from a sequence of smaller ones. Writing *dat++ = value & 255; *dat++ = (value >> 8) & 255; *dat++ = (value >> 16) & 255; *dat++ = (value >> 24) & 255; may be 100% portable (even for machines where CHAR_BIT > 8), but even on platforms where a single 32-bit store would have yielded correct behavior it would be hard for a compiler to determine that. Given __pack_i32_cle(&dat, value); any compiler could easily produce optimal code.

@giorgim 2015-08-21 19:51:11

I think it is difficult to justify the existence of UB; its drawbacks outweigh its benefits

@supercat 2015-08-21 20:27:02

@Giorgi: There is significant benefit to having a category of actions where different platforms may be able to offer a variety of different behavioral guarantees which weren't necessarily compatible (e.g. some may guarantee that adding 1 to INT_MAX will trap, some may guarantee that it will yield INT_MIN, some may guarantee that it will yield some number, not necessarily within the range of int, which is congruent to (INT_MAX+1) mod (INT_MAX+INT_MAX+2), etc.) What's counter-productive is the philosophy that someone who is told "You may assume that someone will clean up after you"...

@supercat 2015-08-21 20:29:04

...would be entitled to do absolutely anything whatsoever he pleases if it turns out that nobody in fact cleans up after him.

@M.M 2015-08-22 01:49:18

@Giorgi there are hundreds of languages that don't have UB; perhaps use one of those, instead of trying to make all languages the same

@supercat 2015-08-22 18:15:46

@MattMcNabb: There's a difference between saying that no actions should have undefined behavior, and saying that many of the actions which in C have totally-unconstrained behavior, shouldn't. Optimization opportunities are maximized when compilers offer behavioral guarantees which are as loose as possible without increasing the amount of code programmers have to write to meet requirements. Failure to offer such guarantees will compel programmers to write code which is more verbose, harder to read, slower, and less conducive to optimization than would otherwise have been possible.

@Kyle Strand 2015-09-18 16:42:25

@M.M But lots of people (such as myself) use C or C++ out of necessity; many more (in fact very likely anyone who uses a modern computer for any purpose) use programs written in C and C++, and I suspect that many of the mysterious bugs/errors/crashes/etc with which everyone is familiar are due to UB.

@user5250294 2015-08-21 05:11:58

Nitpicking: You have not quoted a standard.

These are the sources used to generate drafts of the C++ standard. These sources should not be considered an ISO publication, nor should documents generated from them unless officially adopted by the C++ working group (ISO/IEC JTC1/SC22/WG21).

Interpretation: Notes are not normative according to the ISO/IEC Directives Part 2.

Notes and examples integrated in the text of a document shall only be used for giving additional information intended to assist the understanding or use of the document. They shall not contain requirements ("shall"; see 3.3.1 and Table H.1) or any information considered indispensable for the use of the document e.g. instructions (imperative; see Table H.1), recommendations ("should"; see 3.3.2 and Table H.2) or permission ("may"; see Table H.3). Notes may be written as a statement of fact.

Emphasis mine. This alone rules out "comprehensive list of options". Giving examples, however, does count as "additional information intended to assist the understanding .. of the document".

Do keep in mind that the "nasal demon" meme is not meant to be taken literally, just as using a balloon to explain how universe expansion works holds no truth in physical reality. It's to illustrate that it's foolhardy to discuss what "undefined behavior" should do when it's permissible to do anything. Yes, this means that there isn't an actual rubber band in outer space.

@Kyle Strand 2015-08-21 05:27:40

Re: nitpick: I was inspired to go find that quote in the draft-standard by seeing it quoted from the 2003 standard in another answer. The wording looked very similar, so I don't think the wording has changed much for at least a decade, which is why I felt comfortable quoting from the draft (plus, it's free and online).

@too honest for this site 2016-02-05 01:12:24

The final versions of those standards are not freely available; they are behind quite a high paywall and thus cannot be linked. However, the final drafts are identical to the final versions in all relevant technical and linguistic aspects. Without those drafts, citations from and references to the standard are actually impossible. So what do you prefer: 1) citation from the final (and in that aspect identical) draft or 2) no citation at all, thus just stating with no foundation at all? (and how do you know there is no rubber band in space?)

@Muzer 2015-08-21 09:56:38

I thought I'd answer just one of your points, since the other answers answer the general question quite well, but have left this unaddressed.

"Ignoring the situation -- Yes, the standard goes on to say that this will have "unpredictable results", but that's not the same as the compiler inserting code (which I assume would be a prerequisite for, you know, nasal demons)."

A situation in which nasal demons could very reasonably be expected to occur with a sensible compiler, without the compiler inserting ANY code, would be the following:

if(!spawn_of_satan) {
    printf("Random debug value: %i\n", *x); // oops, null pointer dereference
    nasal_angels();
} else {
    nasal_demons();
}

A compiler, if it can prove that *x is a null pointer dereference, is perfectly entitled, as part of some optimisation, to say "OK, so I see that they've dereferenced a null pointer in this branch of the if. Therefore, as part of that branch I'm allowed to do anything. So I can therefore optimise to this:"

if(!spawn_of_satan)
{
    nasal_demons();
}
else
{
    nasal_demons();
}

"And from there, I can optimise to this:"

nasal_demons();

You can see how this sort of thing can, in the right circumstances, prove very useful for an optimising compiler, and yet cause disaster. I did see some examples a while back of cases where it actually IS important for optimisation to be able to handle this sort of case. I might try to dig them out later when I have more time.

EDIT: One example of such a case that just came back to me, where this is useful for optimisation, is when you very frequently check a pointer for being NULL (perhaps in inlined helper functions), even after having already dereferenced it and without having changed it since. The optimising compiler can see that you've dereferenced it and so optimise out all the "is NULL" checks, since if you've dereferenced it and it IS null, anything is allowed to happen, including just not running the "is NULL" checks. I believe that similar arguments apply to other undefined behaviour.
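A small self-contained sketch of that pattern in C (the function name and values are invented purely for illustration):

#include <stdio.h>

/* The pointer is dereferenced first and only checked afterwards.
   Because dereferencing a null pointer is undefined behaviour, the
   compiler may assume p is non-null by the time the check runs and
   remove the check (and its branch) entirely. */
static int first_byte_checked(const char *p)
{
    int value = *p;    /* dereference: the compiler may now assume p != NULL */

    if (p == NULL)     /* this "safety" check can legally be optimised away */
        return -1;

    return value;
}

int main(void)
{
    printf("%d\n", first_byte_checked("A"));  /* 65 on ASCII systems */
    return 0;
}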

@Muzer 2015-08-21 09:58:50

Err, sorry @supercat, I somehow missed the second half of your answer, which also explains this!

@Kyle Strand 2015-08-21 14:48:21

...yes, I realize that if the user asks for nasal demons in certain cases, then they might get summoned in unexpected cases if the program has UB. When I say that certain UB behaviors would require inserting code, I'm talking about completely unexpected behaviors that are not already explicitly written into your code.

@Muzer 2015-08-21 15:05:31

There must be some corner case where it's weirdly more efficient to generate completely new code that takes advantage of UB. I'll dig out some of the articles I read later.

@Kyle Strand 2015-08-21 15:11:02

I'd be interested to see that, but keep in mind the original question could be rephrased as "does the standard really allow arbitrary code insertion for UB", which has already been answered.

@Kyle Strand 2015-08-21 15:12:07

I.e., Mehrdad's answer shows that yes, insertion of code is permissible.

@Muzer 2015-08-21 15:17:38

Indeed, and I'm not disputing that. I was just attempting to answer one of the questions you implied as a tangent to your main question!

@Kyle Strand 2015-08-21 15:23:04

Just making sure!

@supercat 2015-08-21 19:17:32

@Muzer: I suspect that someone developed a static analyzer that could determine how much of a program's code would be necessary if it never had to handle any input which invoked Undefined Behavior, ran it on a bunch of popular software, and, when it revealed that a large portion of the code of such programs could be eliminated, concluded that this would be a useful form of optimization. That completely ignores the fact that in many cases the code was relying upon certain behavioral constraints which, while not required by the Standard, were nonetheless satisfied by pretty much every compiler.

@supercat 2015-08-21 19:22:54

@Muzer: The simple fact of the matter is that the set of behaviors defined by the C Standard is insufficient to perform many actions efficiently, but the vast majority of compilers have historically offered some extensions which allowed programs to meet their requirements much more efficiently than would otherwise be possible. For example, on some platforms, given int a,b,c,d; the implementation of a*b>c*d which would be most efficient when values are within range would compute (int)((unsigned)a*b)>(int)((unsigned)c*d), while on other platforms the most efficient function would...

@supercat 2015-08-21 19:27:49

...compute (long)a*b > (long)c*d. There are some platforms where casting to unsigned would be much faster than casting to long (even if they can optimize the int*int->long multiply), and there are others where casting to long would be faster (some DSPs, for example, have multiply-accumulate units that are longer than int). If a function that arbitrarily returns 0 or 1 in case of overflow would meet requirements, allowing a*b > c*d to represent such a function would allow optimal code on both kinds of platforms. Requiring a programmer to write one of the latter forms would prevent...

@supercat 2015-08-21 19:28:46

...compilers for the disfavored platform from generating optimal code, since they'd be compelled to precisely match corner-case behavior the programmer didn't care about.
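To make that trade-off concrete, here is a rough sketch in C of the three forms being compared (the function names are invented for illustration):

/* All three functions compare the products a*b and c*d; they differ
   only in what happens when a product overflows int. */

/* Plain form: signed overflow is undefined behaviour, which is what
   lets each compiler pick whichever strategy below is cheapest on
   its target. */
int cmp_plain(int a, int b, int c, int d)
{
    return a * b > c * d;
}

/* Wrap-around form: unsigned multiplication is defined to wrap, and
   the conversion back to int is implementation-defined rather than
   undefined, so overflow yields some arbitrary but harmless result. */
int cmp_unsigned(int a, int b, int c, int d)
{
    return (int)((unsigned)a * b) > (int)((unsigned)c * d);
}

/* Widening form: compute in long, assuming long is wider than int on
   the platform in question. */
int cmp_long(int a, int b, int c, int d)
{
    return (long)a * b > (long)c * d;
}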

@Joshua 2015-08-21 19:57:30

You just saw why sane compilers have a way to suppress the NULL clobber, though. When compiling kernel-mode code, for all the compiler knows, *NULL is a reasonable request.

@supercat 2017-10-28 19:38:50

@Joshua: It would be very helpful if compiler writers could recognize a distinction between optimizing based on the assumption that code won't do weird things when there's no evidence that it will (e.g. given extern char *foo, treating *foo++ = char1; *foo++ = char2; as foo[0]=char1; foo[1]=char2; foo+=2;), versus trying to assume that code won't do things that evidence would suggest that it does (e.g. assuming that void inc_float(float *f) { *(uint32_t*)f += 1; } won't affect anything of type float). Some compiler writers claim such things would require compilers...

@supercat 2017-10-28 19:41:38

...to be magically omniscient, but the question "does any evidence of weirdness exist" is pretty straightforward. In cases where blocking optimizations in the presence of weirdness would create a real performance problem, it may make sense to work out ways of allowing them when the evidence of weirdness is illusory, but I don't think that would be necessary very often.
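A runnable sketch of that inc_float example (the main function and printout are added here only to make it self-contained):

#include <stdint.h>
#include <stdio.h>

/* Writing to a float object through a uint32_t lvalue violates the
   strict-aliasing rule, so this is undefined behaviour.  An optimiser
   that exploits the rule may assume inc_float never modifies any
   object of type float, despite the evidence in the source that it
   does. */
static void inc_float(float *f)
{
    *(uint32_t *)f += 1;
}

int main(void)
{
    float x = 1.0f;
    inc_float(&x);
    /* A compiler relying on strict aliasing may still treat x as 1.0f
       here; one that honours the "evidence" will print the value whose
       bit pattern is one greater than that of 1.0f. */
    printf("%.9g\n", x);
    return 0;
}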

@Peter 2015-08-21 05:06:01

The definition of undefined behaviour, in every C and C++ standard, is essentially that the standard imposes no requirements on what happens.

Yes, that means any outcome is permitted. But there are no particular outcomes that are required to happen, nor any outcomes that are required to NOT happen. It does not matter if you have a compiler and library that consistently yields a particular behaviour in response to a particular instance of undefined behaviour - such a behaviour is not required, and may change even in a future bugfix release of your compiler - and the compiler will still be perfectly correct according to each version of the C and C++ standards.

If your host system has hardware support in the form of connection to probes that are inserted in your nostrils, it is within the realms of possibility that an occurrence of undefined behaviour will cause undesired nasal effects.

@supercat 2015-08-21 06:17:23

Historically, the fact that the Standard didn't define a behavior in no way implied that implementations shouldn't do so. Indeed, a number of things which trigger Undefined Behavior do so because, prior to the ratification of the C Standard, different implementations made two (or more) contradictory guarantees, both of which were relied upon by programs written for those implementations.

@Matthieu M. 2015-08-21 06:48:38

@supercat: Thanks for this! As usual I greatly appreciate your historical insights.

@Peter 2015-08-21 07:05:16

Very true, supercat. There are several reasons behind something being undefined. One of those is that a number of compiler/library vendors - and their customers - did not want to lose particular features that predated the standard. The only way to get consensus was to make such features undefined (or implementation defined, unspecified, etc) and permit implementation freedom.

@supercat 2015-08-21 07:18:46

@Peter: The issue isn't just one of getting people to agree to a Standard. One of the reasons C has thrived is that compilers for various platforms could offer different trade-offs between performance, usability, and robustness, which were tailored to the needs of those platforms' users.

@Peter 2015-08-21 07:33:46

None of that would be possible in standard C, supercat, without specific provisions to permit implementation freedom.

@MSalters 2015-08-21 08:14:50

A good example was dereferencing the null pointer. On SPARC, reading that location gave you the value 0, and writing to it silently discarded the value. On MS-DOS, that location held the interrupt table. Try reconciling that.

@Muzer 2015-08-21 13:47:38

@supercat But I believe the standard separately defines "implementation defined" behaviour, which DOES match with what you said. For example, what >> does on signed values is implementation defined (which means something consistent and defined in compiler documentation must happen), whereas what << does on signed values is undefined (which means anything can happen and nobody has to define it). Don't blame compiler writers; it's clear that modern writers of the standard are perfectly fine with what is going on, else they'd just make all the currently undefined behaviour implementation defined!
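A small illustration of that distinction (the value -8 is arbitrary):

#include <stdio.h>

int main(void)
{
    int n = -8;

    /* Right-shifting a negative signed value gives an
       implementation-defined result (commonly an arithmetic shift,
       i.e. -4). */
    printf("%d\n", n >> 1);

    /* Left-shifting a negative signed value is undefined behaviour,
       so it is deliberately not evaluated here:
           printf("%d\n", n << 1);
    */

    return 0;
}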

@supercat 2015-08-21 18:54:25

@Muzer: In order for the Standard to state that an action invokes Implementation-Defined behavior, it must be practical for every implementation to make that action behave consistently. If overflow invoked Implementation-Defined behavior, for example, a platform where ADD instructions would trap on overflow but INC instructions would not, would not be allowed to use INC instructions on signed types unless it added its own overflow checking (thus likely negating the purpose of using those instructions in the first place) or else documented the precise circumstances where it would use each...

@supercat 2015-08-21 18:58:07

...instruction (which would likely be impractical, given that such issues may be affected by register allocation, which may in turn be affected by many other factors). I would suggest that there are places where the Standard expressly forbids programs from doing certain things (generally at the syntactic or structural level), and that if the Standard intended to forbid certain things it could have done so.
