2009-06-04 09:17:52 8 Comments
#include <stdio.h>
int main(void)
{
    int i = 0;
    i = i++ + ++i;
    printf("%d\n", i); // 3
    i = 1;
    i = (i++);
    printf("%d\n", i); // 2 Should be 1, no ?
    volatile int u = 0;
    u = u++ + ++u;
    printf("%d\n", u); // 1
    u = 1;
    u = (u++);
    printf("%d\n", u); // 2 Should also be one, no ?
    register int v = 0;
    v = v++ + ++v;
    printf("%d\n", v); // 3 (Should be the same as u ?)
    int w = 0;
    printf("%d %d\n", ++w, w); // shouldn't this print 1 1
    int x[2] = { 5, 8 }, y = 0;
    x[y] = y ++;
    printf("%d %d\n", x[0], x[1]); // shouldn't this print 0 8? or 5 0?
}
14 comments
@Steve Summit 2015-06-18 11:55:45
Another way of answering this, rather than getting bogged down in arcane details of sequence points and undefined behavior, is simply to ask, what are they supposed to mean? What was the programmer trying to do?
The first fragment asked about, i = i++ + ++i, is pretty clearly insane in my book. No one would ever write it in a real program, it's not obvious what it does, and there's no conceivable algorithm someone could have been trying to code that would have resulted in this particular contrived sequence of operations. And since it's not obvious to you and me what it's supposed to do, it's fine in my book if the compiler can't figure out what it's supposed to do, either.
The second fragment, i = i++, is a little easier to understand. Someone is clearly trying to increment i, and assign the result back to i. But there are a couple ways of doing this in C. The most basic way to add 1 to i, and assign the result back to i, is the same in almost any programming language:
i = i + 1
C, of course, has a handy shortcut:
i++
This means, "add 1 to i, and assign the result back to i". So if we construct a hodgepodge of the two, by writing
i = i++
what we're really saying is "add 1 to i, and assign the result back to i, and assign the result back to i". We're confused, so it doesn't bother me too much if the compiler gets confused, too.
Realistically, the only time these crazy expressions get written is when people are using them as artificial examples of how ++ is supposed to work. And of course it is important to understand how ++ works. But one practical rule for using ++ is, "If it's not obvious what an expression using ++ means, don't write it."
We used to spend countless hours on comp.lang.c discussing expressions like these and why they're undefined. Two of my longer answers, that try to really explain why, are archived on the web:
See also question 3.8 and the rest of the questions in section 3 of the C FAQ list.
@supercat 2015-06-30 16:14:39
A rather nasty gotcha with regard to Undefined Behavior is that while it used to be safe on 99.9% of compilers to use *p=(*q)++; to mean if (p!=q) *p=(*q)++; else *p= __ARBITRARY_VALUE; that is no longer the case. Hyper-modern C would require writing something like the latter formulation (though there's no standard way of indicating code doesn't care what's in *p) to achieve the level of efficiency compilers used to provide with the former (the else clause is necessary in order to let the compiler optimize out the if which some newer compilers would require).
@Steve Summit 2019-09-23 18:26:18
@supercat I now believe that any compiler that's "smart" enough to perform that sort of optimization must also be smart enough to peek at assert statements, so that the programmer can precede the line in question with a simple assert(p != q). (Of course, taking that course would also require rewriting <assert.h> to not delete assertions outright in non-debug versions, but rather, turn them into something like __builtin_assert_disabled() that the compiler proper can see, and then not emit code for.)
@P.P. 2015-12-30 20:26:30
Often this question is linked as a duplicate of questions related to code like
or
or similar variants.
While this is also undefined behaviour as stated already, there are subtle differences when printf() is involved when comparing to a statement such as:
In the following statement:
printf("%d %d\n", ++i, i++);
the order of evaluation of arguments in printf() is unspecified. That means the expressions i++ and ++i could be evaluated in any order. The C11 standard has some relevant descriptions on this:
Annex J, unspecified behaviours
3.4.4, unspecified behavior
The unspecified behaviour itself is NOT an issue. Consider this example:
printf("%d %d\n", ++x, y++);
This too has unspecified behaviour because the order of evaluation of ++x and y++ is unspecified. But it's a perfectly legal and valid statement. There's no undefined behaviour in this statement, because the modifications (++x and y++) are done to distinct objects.
What renders the following statement
printf("%d %d\n", ++i, i++);
as undefined behaviour is the fact that these two expressions modify the same object i without an intervening sequence point.
Another detail is that the comma involved in the printf() call is a separator, not the comma operator.
This is an important distinction because the comma operator does introduce a sequence point between the evaluation of its operands, which makes the following legal:
int i = 5;
int j;
j = (++i, i++);
The comma operator evaluates its operands left-to-right and yields only the value of the last operand. So in j = (++i, i++);, ++i increments i to 6 and i++ yields the old value of i (6), which is assigned to j. Then i becomes 7 due to the post-increment.
So if the comma in the function call were to be a comma operator then
printf("%d %d\n", ++i, i++);
will not be a problem. But it invokes undefined behaviour because the comma here is a separator.
Those who are new to undefined behaviour would benefit from reading What Every C Programmer Should Know About Undefined Behavior to understand the concept and many other variants of undefined behaviour in C.
This post: Undefined, unspecified and implementation-defined behavior is also relevant.
@kavadias 2018-10-17 20:20:57
This sequence int a = 10, b = 20, c = 30; printf("a=%d b=%d c=%d\n", (a = a + b + c), (b = b + b), (c = c + c)); appears to give stable behavior (right-to-left argument evaluation in gcc v7.3.0; result "a=110 b=40 c=60"). Is it because the assignments are considered as 'full-statements' and thus introduce a sequence point? Shouldn't that result in left-to-right argument/statement evaluation? Or, is it just manifestation of undefined behavior?
@P.P. 2018-10-18 08:40:02
@kavadias That printf statement involves undefined behaviour, for the same reason explained above. You are writing b and c in the 3rd & 4th arguments respectively and reading them in the 2nd argument. But there's no sequencing between these expressions (2nd, 3rd, & 4th args). gcc/clang has an option -Wsequence-point which can help find these, too.
@Antti Haapala 2017-03-26 14:58:07
While the syntax of expressions like a = a++ or a++ + a++ is legal, the behaviour of these constructs is undefined because a "shall" in the C standard is not obeyed. C99 6.5p2:
With footnote 73 further clarifying that
The various sequence points are listed in Annex C of C11 (and C99):
The wording of the same paragraph in C11 is:
You can detect such errors in a program by, for example, using a recent version of GCC with -Wall and -Werror, and then GCC will outright refuse to compile your program. The following is the output of gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005:
The important part is to know what a sequence point is -- and what is a sequence point and what isn't. For example the comma operator is a sequence point, so
j = (i++, ++i);
is well-defined, and will increment i by one, yielding the old value, discard that value; then at the comma operator, settle the side effects; and then increment i by one, and the resulting value becomes the value of the expression - i.e. this is just a contrived way to write j = (i += 2) which is yet again a "clever" way to write
i += 2;
j = i;
However, the , in function argument lists is not a comma operator, and there is no sequence point between evaluations of distinct arguments; instead their evaluations are unsequenced with regard to each other; so the function call
printf("%d %d\n", i++, ++i);
has undefined behaviour because there is no sequence point between the evaluations of i++ and ++i in function arguments, and the value of i is therefore modified twice, by both i++ and ++i, between the previous and the next sequence point.
@Steve Summit 2018-08-16 11:54:35
Your question was probably not, "Why are these constructs undefined behavior in C?". Your question was probably, "Why did this code (using ++) not give me the value I expected?", and someone marked your question as a duplicate, and sent you here.
This answer tries to answer that question: why did your code not give you the answer you expected, and how can you learn to recognize (and avoid) expressions that will not work as expected.
I assume you've heard the basic definition of C's ++ and -- operators by now, and how the prefix form ++x differs from the postfix form x++. But these operators are hard to think about, so to make sure you understood, perhaps you wrote a tiny little test program involving something like
printf("%d %d %d\n", x, ++x, x++);
But, to your surprise, this program did not help you understand -- it printed some strange, unexpected, inexplicable output, suggesting that maybe ++ does something completely different, not at all what you thought it did.
Or, perhaps you're looking at a hard-to-understand expression like
x = x++ + ++x;
Perhaps someone gave you that code as a puzzle. This code also makes no sense, especially if you run it -- and if you compile and run it under two different compilers, you're likely to get two different answers! What's up with that? Which answer is correct? (And the answer is that both of them are, or neither of them are.)
As you've heard by now, all of these expressions are undefined, which means that the C language makes no guarantee about what they'll do. This is a strange and surprising result, because you probably thought that any program you could write, as long as it compiled and ran, would generate a unique, well-defined output. But in the case of undefined behavior, that's not so.
What makes an expression undefined? Are expressions involving ++ and -- always undefined? Of course not: these are useful operators, and if you use them properly, they're perfectly well-defined.
For the expressions we're talking about, what makes them undefined is when there's too much going on at once, when we're not sure what order things will happen in, but when the order matters to the result we get.
Let's go back to the two examples I've used in this answer. When I wrote
printf("%d %d %d\n", x, ++x, x++);
the question is, before calling printf, does the compiler compute the value of x first, or x++, or maybe ++x? But it turns out we don't know. There's no rule in C which says that the arguments to a function get evaluated left-to-right, or right-to-left, or in some other order. So we can't say whether the compiler will do x first, then ++x, then x++, or x++ then ++x then x, or some other order. But the order clearly matters, because depending on which order the compiler uses, we'll clearly get different results printed by printf.
What about this crazy expression?
x = x++ + ++x;
The problem with this expression is that it contains three different attempts to modify the value of x: (1) the x++ part tries to add 1 to x, store the new value in x, and return the old value of x; (2) the ++x part tries to add 1 to x, store the new value in x, and return the new value of x; and (3) the x = part tries to assign the sum of the other two back to x. Which of those three attempted assignments will "win"? Which of the three values will actually get assigned to x? Again, and perhaps surprisingly, there's no rule in C to tell us.
You might imagine that precedence or associativity or left-to-right evaluation tells you what order things happen in, but they do not. You may not believe me, but please take my word for it, and I'll say it again: precedence and associativity do not determine every aspect of the evaluation order of an expression in C. In particular, if within one expression there are multiple different spots where we try to assign a new value to something like x, precedence and associativity do not tell us which of those attempts happens first, or last, or anything.
So with all that background and introduction out of the way, if you want to make sure that all your programs are well-defined, which expressions can you write, and which ones can you not write?
These expressions are all fine:
These expressions are all undefined:
And the last question is, how can you tell which expressions are well-defined, and which expressions are undefined?
As I said earlier, the undefined expressions are the ones where there's too much going on at once, where you can't be sure what order things happen in, and where the order matters:
As an example of #1, in the expression
x = x++ + ++x;
there are three attempts to modify x.
As an example of #2, in the expression
we both use the value of x, and modify it.
So that's the answer: make sure that in any expression you write, each variable is modified at most once, and if a variable is modified, you don't also attempt to use the value of that variable somewhere else.
@alinsoar 2017-10-13 13:58:04
A good explanation about what happens in this kind of computation is provided in the document n1188 from the ISO WG14 site.
I explain the ideas.
The main rule from the standard ISO 9899 that applies in this situation is 6.5p2.
The sequence points in an expression like i=i++ are before i= and after i++.
In the paper that I quoted above it is explained that you can picture the program as being formed of small boxes, each box containing the instructions between 2 consecutive sequence points. The sequence points are defined in annex C of the standard; in the case of i=i++ there are 2 sequence points that delimit a full-expression. Such an expression is syntactically equivalent to an entry of expression-statement in the Backus-Naur form of the grammar (a grammar is provided in annex A of the Standard).
So the instructions inside a box have no defined order.
i=i++
can be interpreted as
tmp = i
i=i+1
i = tmp
or as
tmp = i
i = tmp
i=i+1
Because both of these ways to interpret the code i=i++ are valid and because they generate different answers, the behavior is undefined.
So a sequence point can be seen at the beginning and the end of each box that composes the program [the boxes are atomic units in C] and inside a box the order of instructions is not defined in all cases. Changing that order can sometimes change the result.
EDIT:
Other good sources for explaining such ambiguities are the entries from the c-faq site (also published as a book), namely here and here and here.
@haccks 2017-11-24 07:00:15
What does this answer add to the existing answers? Also the explanation for i=i++ is very similar to this answer.
@alinsoar 2017-11-24 12:14:44
@haccks I did not read the other answers. I wanted to explain in my own language what I learned from the mentioned document from the official site of ISO 9899 open-std.org/jtc1/sc22/wg14/www/docs/n1188.pdf
@supercat 2012-12-05 18:30:27
While it is unlikely that any compilers and processors would actually do so, it would be legal, under the C standard, for the compiler to implement "i++" with the sequence:
While I don't think any processors support the hardware to allow such a thing to be done efficiently, one can easily imagine situations where such behavior would make multi-threaded code easier (e.g. it would guarantee that if two threads try to perform the above sequence simultaneously, i would get incremented by two) and it's not totally inconceivable that some future processor might provide a feature something like that.
If the compiler were to write i++ as indicated above (legal under the standard) and were to intersperse the above instructions throughout the evaluation of the overall expression (also legal), and if it didn't happen to notice that one of the other instructions happened to access i, it would be possible (and legal) for the compiler to generate a sequence of instructions that would deadlock. To be sure, a compiler would almost certainly detect the problem in the case where the same variable i is used in both places, but if a routine accepts references to two pointers p and q, and uses (*p) and (*q) in the above expression (rather than using i twice) the compiler would not be required to recognize or avoid the deadlock that would occur if the same object's address were passed for both p and q.
@Muhammad Annaqeeb 2017-06-10 22:56:26
.@Muhammad Annaqeeb 2017-06-10 22:56:26
The reason is that the program invokes undefined behavior. The problem lies in the evaluation order, because there are no sequence points required according to the C++98 standard (no operation is sequenced before or after another, in C++11 terminology).
However, if you stick to one compiler, you will find the behavior persistent, as long as you don't add function calls or pointers, which would make the behavior messier.
So first, GCC: using Nuwen MinGW 15 GCC 7.1 you will get:
How does GCC work? It evaluates subexpressions in left-to-right order for the right hand side (RHS), then assigns the value to the left hand side (LHS). This is exactly how Java and C# behave and define their standards. (Yes, the equivalent software in Java and C# has defined behavior.) It evaluates each subexpression one by one in the RHS in left-to-right order; for each subexpression, the ++c (pre-increment) is evaluated first, then the value c is used for the operation, then the post-increment c++.
according to GCC C++: Operators
the equivalent code in defined behavior C++ as GCC understands:
Then we go to Visual Studio. Visual Studio 2015, you get:
How does Visual Studio work? It takes another approach: it evaluates all pre-increment expressions in a first pass, then uses the variables' values in the operations in a second pass, assigns from RHS to LHS in a third pass, and in the last pass evaluates all the post-increment expressions.
So the equivalent in defined behavior C++ as Visual C++ understands:
as Visual Studio documentation states at Precedence and Order of Evaluation:
@Antti Haapala 2017-10-21 10:46:50
I've edited the question to add the UB in evaluation of function arguments, as this question is often used as a duplicate for that. (The last example)
@Antti Haapala 2017-10-21 10:47:24
Also the question is about c now, not C++
@haccks 2015-06-27 00:27:48
Most of the answers here quote the C standard, emphasizing that the behavior of these constructs is undefined. To understand why the behavior of these constructs is undefined, let's understand these terms first in the light of the C11 standard:
Sequenced: (5.1.2.3)
Unsequenced:
Evaluations can be one of two things:
Sequence Point:
Now coming to the question, for expressions like
the standard says that:
6.5 Expressions:
Therefore, the above expression invokes UB because two side effects on the same object i are unsequenced relative to each other. That means it is not sequenced whether the side effect by assignment to i will be done before or after the side effect by ++.
Depending on whether the assignment occurs before or after the increment, different results will be produced, and that is one of the cases of undefined behavior.
Let's rename the i at the left of the assignment to il and the one at the right of the assignment (in the expression i++) to ir, so that the expression becomes
il = ir++
An important point regarding the postfix ++ operator is that:
It means the expression il = ir++ could be evaluated either as
or
resulting in two different results 1 and 2, which depend on the sequence of side effects by assignment and ++, and hence invokes UB.
@unwind 2009-06-04 09:20:59
and hence invokes UB.@unwind 2009-06-04 09:20:59
C has the concept of undefined behavior, i.e. some language constructs are syntactically valid but you can't predict the behavior when the code is run.
As far as I know, the standard doesn't explicitly say why the concept of undefined behavior exists. In my mind, it's simply because the language designers wanted there to be some leeway in the semantics. Instead of, e.g., requiring that all implementations handle integer overflow in the exact same way, which would very likely impose serious performance costs, they just left the behavior undefined so that if you write code that causes integer overflow, anything can happen.
So, with that in mind, why are these "issues"? The language clearly says that certain things lead to undefined behavior. There is no problem, there is no "should" involved. If the undefined behavior changes when one of the involved variables is declared volatile, that doesn't prove or change anything. It is undefined; you cannot reason about the behavior.
Your most interesting-looking example, the one with
u = (u++);
is a text-book example of undefined behavior (see Wikipedia's entry on sequence points).
@PiX 2009-06-04 09:42:37
I knew it was undefined (the idea of seeing this code in production frightens me :)), but I tried to understand what the reason for these results was. Especially why u = u++ incremented u. In Java for example: u = u++ returns 0 as (my brain) expected :) Thanks for the sequence points links BTW.
@ChrisBD 2009-06-04 10:21:25
Obviously because of the brackets around the u++ the compiler has decided to increment u and then return it. As it is undefined behaviour in C this is legitimate. A different compiler, or even the same one on a different machine, may give a different answer. I do not know Java, but perhaps the behaviour is clearly defined.
@Richard 2009-06-04 10:57:18
@PiX: Things are undefined for a number of possible reasons. These include: there is no clear "right result", different machine architectures would strongly favour different results, existing practice is not consistent, or beyond the scope of the standard (e.g. what filenames are valid).
@Laurence Gonsalves 2012-07-30 16:19:24
@PiX Java goes out of its way to have defined behaviors for many things that are undefined in C.
@Pascal Cuoq 2012-11-17 19:01:37
@PaulManta, If you see this, editing answers is not intended for the addition of irrelevant information to already-accepted answers. This is a C question and the answer was fine as it was to describe the situation in C standards from C90 to C11. Editing is for fixing syntax and style.
@user3124504 2014-03-22 11:13:37
unwind you called it undefined behaviour, but is there any explanation for why it is so?
@unwind 2014-03-22 20:01:45
@rusty Not sure what you mean. The term "undefined behavior" is used in the C standard. It means that even though some constructs are syntactically valid and will typically compile, they lead to undefined behavior, i.e. they do not make sense and should be avoided since your program is broken if it has undefined behavior.
@M.M 2014-07-10 05:51:00
Just to confuse everyone, some such examples are now well-defined in C11, e.g. i = ++i + 1;.
@Shafik Yaghmour 2014-07-14 01:18:59
@MattMcNabb that is only well defined in C++11 not in C11.
@Antti Haapala 2017-10-21 10:46:00
I've edited the question to add the UB in evaluation of function arguments, as this question is often used as a duplicate for that. (The last example)
@supercat 2017-12-17 23:12:29
Reading the Standard and the published rationale, it's clear why the concept of UB exists. The Standard was never intended to fully describe everything a C implementation must do to be suitable for any particular purpose (see the discussion of the "One Program" rule), but instead relies upon implementors' judgment and desire to produce useful quality implementations. A quality implementation suitable for low-level systems programming will need to define the behavior of actions that wouldn't be needed in high-end number crunching applications. Rather than try to complicate the Standard...
@supercat 2017-12-17 23:15:59
...by getting into extreme detail about which corner cases are or are not defined, the authors of the Standard recognized that implementors should be better placed to judge which kinds of behaviors will be needed by the kinds of programs they're expected to support. Hyper-modernist compilers pretend that making certain actions UB was intended to imply that no quality program should need them, but the Standard and rationale are inconsistent with such a supposed intent.
@jrh 2018-01-03 17:56:24
@supercat Well put, I'd recommend adding that to your answer.
@supercat 2018-01-03 18:08:20
@jrh: I wrote that answer before I'd realized how out of hand the hyper-modernist philosophy had gotten. What irks me is the progression from "We don't need to officially recognize this behavior because the platforms where it's needed can support it anyway" to "We can remove this behavior without providing a usable replacement because it was never recognized and thus any code needing it was broken". Many behaviors should have been deprecated long ago in favor of replacements that were in every way better, but that would have required acknowledging their legitimacy.
@pqnet 2018-07-04 13:55:19
Undefined behavior basically allows the compiler to make more assumptions about conditions which can only be verified at runtime, e.g. assume that in the expression *ptr the pointer is valid, because if it is null the program is allowed to do anything and so it is not necessary to add code to the program to check for that condition and ensure a defined behavior.
@David R Tribble 2018-07-31 20:39:04
At the time that C was standardized (1989), many C compilers existed, and each one played by slightly different rules. The primary goal of the ANSI (and later ISO) committee was to codify existing practice. Thus in many cases where multiple compilers disagreed on the "correct" semantic behavior for obviously ambiguous cases (mostly having to do with the evaluation order of expression operators), the committee (wisely) chose to deem such cases as "undefined behavior" or "implementation defined behavior".
@iBug 2018-10-08 01:33:20
If I write b = (++a)+(++a)+(++a), is the value of a well-defined?
@stillanoob 2018-10-13 12:16:01
@unwind For u=1; u=u++;, is it true that what's undefined is the value of u after the second statement is executed? I mean, by the rules of sequencing of value evaluation (as opposed to the side-effect evaluation), the expression u=u++ must be guaranteed to evaluate to 1, right?
@Ilmari Karonen 2019-01-31 15:55:02
@stillanoob: No, because the behavior of any code containing that expression is undefined, meaning that it can do literally anything. It might always evaluate to 42, except on Sundays when the moon is waxing gibbous. It might get stuck in an infinite loop instead of evaluating to anything at all. It might jump to a random location in your code. It might crash the process. It might even make your computer catch fire and make demons fly out of your nose, and the C standard still wouldn't care.
@Rajesh 2019-04-01 00:54:25
I understand a variable cannot get updated more than once between sequence points and hence i=i++ is undefined. But why is b = i++ + i also undefined? Here i is not getting updated more than once and I get a compiler warning for this too.
@unwind 2019-04-01 08:42:19
@Rajesh Because the operator + is not a sequence point. Please read the Wikipedia page.
@TomOnTime 2015-04-08 03:20:31
In https://stackoverflow.com/questions/29505280/incrementing-array-index-in-c someone asked about a statement like:
which prints 7... the OP expected it to print 6.
The ++i increments aren't guaranteed to all complete before the rest of the calculations. In fact, different compilers will get different results here. In the example you provided, the first 2 ++i executed, then the values of k[] were read, then the last ++i then k[].
.Modern compilers will optimize this very well. In fact, possibly better than the code you originally wrote (assuming it had worked the way you had hoped).
@Nikhil Vidhani 2014-09-11 12:36:41
The C standard says that a variable should be modified at most once between two sequence points. The semicolon that ends an expression statement, for instance, marks a sequence point.
So every statement of the form:
and so on violates that rule. The standard also says that the behavior is undefined, not unspecified. Some compilers do detect these and produce some result, but this is not per the standard.
However, two different variables can be incremented between two sequence points.
The above is a common coding practice while copying/analysing strings.
@underscore_d 2016-07-19 18:55:58
Of course it doesn't apply to different variables within one expression. It would be a total design failure if it did! All you need in the 2nd example is for both to be incremented between the statement ending and the next one beginning, and that's guaranteed, precisely because of the concept of sequence points at the centre of all this.
@badp 2010-05-24 13:26:05
Just compile and disassemble your line of code, if you are so inclined to know how exactly it is you get what you are getting.
This is what I get on my machine, together with what I think is going on:
(I... suppose that the 0x00000014 instruction was some kind of compiler optimization?)
@bad_keypoints 2012-09-24 14:11:42
How do I get the machine code? I use Dev C++, and I played around with the 'Code Generation' option in compiler settings, but got no extra file output or any console output.
@badp 2012-09-24 18:20:05
@ronnieaka gcc evil.c -c -o evil.bin and gdb evil.bin → disassemble evil, or whatever the Windows equivalents of those are :)
@kchoi 2013-09-20 16:07:50
is -0x4(%ebp) = 4 at the end?
@Shafik Yaghmour 2014-07-01 14:00:21
This answer does not really address the question of "Why are these constructs undefined behavior?".
@badp 2014-07-01 16:27:17
@ShafikYaghmour I'm addressing the questions in the question body ("why am I not getting the results I am getting?"), see the comments in the code. Given that this is undefined behaviour, I can only show how to get the actual assembly he's compiled.
@Shafik Yaghmour 2014-07-01 18:12:59
Perhaps the answer is in there but I think most would not be able to figure it out without some elaboration. Just add some explanatory text and it becomes an answer.
@badp 2014-07-01 23:21:25
@ShafikYaghmour I must admit that the assembly is kinda baffling me; especially the instruction at +20. But why am I trying to make sense of it?
@Kat 2015-07-27 20:32:11
As an aside, it'll be easier to compile to assembly (with gcc -S evil.c), which is all that's needed here. Assembling then disassembling it is just a roundabout way of doing it.
@Steve Summit 2016-02-16 21:26:07
For the record, if for whatever reason you're wondering what a given construct does -- and especially if there's any suspicion that it might be undefined behavior -- the age-old advice of "just try it with your compiler and see" is potentially quite perilous. You will learn, at best, what it does under this version of your compiler, under these circumstances, today. You will not learn much if anything about what it's guaranteed to do. In general, "just try it with your compiler" leads to nonportable programs that work only with your compiler.
@Shafik Yaghmour 2013-08-15 19:25:21
The behavior can't really be explained because it invokes both unspecified behavior and undefined behavior, so we cannot make any general predictions about this code. If you read Olve Maudal's work, such as Deep C and Unspecified and Undefined, you can sometimes make good guesses in very specific cases with a specific compiler and environment, but please don't do that anywhere near production.
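So moving on to unspecified behavior, the draft C99 standard, section 6.5 paragraph 3, says:
"The grouping of operators and operands is indicated by the syntax. Except as specified later (for the function-call (), &&, ||, ?:, and comma operators), the order of evaluation of subexpressions and the order in which side effects take place are both unspecified."
So when we have a line like this:
i = i++ + ++i;
we do not know whether i++ or ++i will be evaluated first. This is mainly to give the compiler better options for optimization.
We also have undefined behavior here, since the program is modifying variables (i, u, etc.) more than once between sequence points. From draft standard section 6.5 paragraph 2:
"Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored."
It cites the following code examples as being undefined:
i = ++i + 1;
a[i++] = i;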
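In all these examples the code is attempting to modify an object more than once between two consecutive sequence points; the second of those is the ; that ends the statement in each one of these cases:
i = i++ + ++i;
i = (i++);
u = u++ + ++u;
u = (u++);
v = v++ + ++v;
Unspecified behavior is defined in the draft C99 standard in section 3.4.4 as:
"use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance"
and undefined behavior is defined in section 3.4.3 as:
"behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements"
and notes that:
"Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message)."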
@Christoph 2009-06-04 09:35:47
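I think the relevant parts of the C99 standard are 6.5 Expressions, §2:
"Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored."
and 6.5.16 Assignment operators, §4:
"The order of evaluation of the operands is unspecified. If an attempt is made to modify the result of an assignment operator or to access it after the next sequence point, the behavior is undefined."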
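To separate "unspecified order" from "undefined behavior", here is a small sketch of a program whose behavior is defined even though the order of evaluation is not. The trace buffer and the function names are made up for illustration. The two operands of + may be evaluated in either order, but each call completes before the other starts, so nothing is modified twice without an intervening sequence point.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical trace buffer recording the order in which calls happen. */
static char trace[3];
static size_t trace_len;

/* Record a tag, then hand back the value unchanged. */
static int note(char tag, int value)
{
    if (trace_len < sizeof trace - 1)
        trace[trace_len++] = tag;
    return value;
}

/* The compiler may call note('a', ...) or note('b', ...) first.
   The sum is the same either way; only the trace differs. */
int sum_with_trace(void)
{
    memset(trace, 0, sizeof trace);
    trace_len = 0;
    return note('a', 1) + note('b', 2);
}

/* Returns the order observed by the last call to sum_with_trace(). */
const char *last_trace(void)
{
    return trace;
}
```

A conforming compiler may leave either "ab" or "ba" in the trace, and a portable program must not rely on one of them. That is still a far weaker problem than the statements in the question, where the standard places no requirements on the result at all.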
@supercat 2011-11-20 21:41:25
Would the above imply that "i=i=5;" would be Undefined Behavior?
@dhein 2013-09-23 15:39:50
@supercat as far as I know, i=i=5 is also undefined behavior.
@supercat 2013-09-23 16:18:26
@Zaibis: The rationale I like to use for most places the rule applies is that, in theory, a multi-processor platform could implement something like A=B=5; as "Write-lock A; Write-lock B; Store 5 to A; Store 5 to B; Unlock B; Unlock A;", and a statement like C=A+B; as "Read-lock A; Read-lock B; Compute A+B; Unlock A and B; Write-lock C; Store result; Unlock C;". That would ensure that if one thread did A=B=5; while another did C=A+B; the latter thread would either see both writes as having taken place or neither. Potentially a useful guarantee. If one thread did I=I=5;, however, ...
@supercat 2013-09-23 16:19:57
... and the compiler didn't notice that both writes were to the same location (if one or both lvalues involve pointers, that may be hard to determine), the generated code could deadlock. I don't think any real-world implementations implement such locking as part of their normal behavior, but it would be permissible under the standard, and if hardware could implement such behaviors cheaply it might be useful. On today's hardware such behavior would be way too expensive to implement as a default, but that doesn't mean it would always be thus.
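The hazard described in this comment can be modelled in a few lines. This is only a toy, single-threaded sketch of the hypothetical scheme: the struct, the function names, and the use of a "try" lock that reports failure instead of blocking are all assumptions made so the example terminates. No real compiler is claimed to work this way.

```c
#include <stdbool.h>

/* Hypothetical model: an int guarded by a non-recursive write lock. */
struct guarded_int {
    int value;
    bool write_locked;
};

/* Try to take the write lock; fails if it is already held. */
bool try_write_lock(struct guarded_int *g)
{
    if (g->write_locked)
        return false;
    g->write_locked = true;
    return true;
}

void write_unlock(struct guarded_int *g)
{
    g->write_locked = false;
}

/* Models "A=B=5;" as: lock A, lock B, store to both, unlock both.
   Returns false in the case where a blocking lock would wait forever. */
bool locked_store_both(struct guarded_int *a, struct guarded_int *b, int v)
{
    if (!try_write_lock(a))
        return false;
    if (!try_write_lock(b)) {   /* when a == b, the lock is already held */
        write_unlock(a);
        return false;
    }
    b->value = v;
    a->value = v;
    write_unlock(b);
    write_unlock(a);
    return true;
}
```

With two distinct objects the store succeeds. Passing the same object for both parameters, which is the I=I=5; case, makes the second acquisition fail; with a blocking lock that would be the deadlock the comment describes.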
@dhein 2013-09-23 16:40:45
@supercat but wouldn't the sequence point access rule of c99 alone be enough to declare it as undefined behavior? So it doesn't matter what technically the hardware could implement?
@supercat 2013-09-23 16:48:53
@Zaibis: Rules which characterize actions as Undefined Behavior aren't supposed to exist merely to allow implementations to behave in hostile fashion. They're supposed to exist to allow implementers to either do something more efficiently or more usefully than would be possible in their absence. To understand why the specs characterize something as UB, it's helpful to identify something useful the rule would allow implementations to do which they otherwise could not.
@dhein 2013-09-23 16:56:34
@supercat I absolutely agree with what you say about the behavior of undefined behavior (^^). But this doesn't change the point that if something is listed in the standard as UB, you can't expect it to be well defined just because it would be easy to implement as a well-defined construct. If the standard says it is UB, then the answer to the question "is it UB?" is "Yes!", and not "It could... [...]".
@supercat 2013-09-23 17:49:17
@Zaibis: The answer to almost any question of the form "Why is X in language/framework Y Undefined Behavior" is "Because that's what the standard for Y says", but that's hardly enlightening. In most cases, however, what someone asking such a question really wants to know is "Why did the makers of the standard specify that". In most cases, things are specified as UB (rather than partially-specified behaviors) to allow for the possibility of an implementation which might do something unexpected. For example, the spec could have said that p1=malloc(4); p2=malloc(5); r=p1>p2; ...
@supercat 2013-09-23 17:55:26
...may result in r arbitrarily holding 1 or 0, with no guarantee that the value will relate in any way to future comparisons among the same or different operands. Such a spec (returning an arbitrary 0 or 1) would have allowed an efficient memmove to be written in portable fashion [if dest > src, apply a top-down copy, else bottom-up; if the regions don't overlap, either will work so the comparison result wouldn't matter]. I believe the standard says such comparison is UB, however; if every machine could easily--at worst--arbitrarily yield a 0 or 1, there'd be no reason not to say so.
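The copy-direction idea in this comment can be sketched as follows. The function name is hypothetical, and the comparison goes through uintptr_t as a workaround: that is an assumption, because the integer mapping of pointers is implementation-defined and uintptr_t is an optional type. It is not the portable guarantee the comment wishes the standard had given.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; a sketch of the direction choice only, not a
   replacement for the library memmove. */
void *sketch_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Assumption: comparing the converted integer values gives a
       usable ordering on this implementation. */
    if ((uintptr_t)d > (uintptr_t)s) {
        /* dest is above src: copy from the top down, so overlapping
           source bytes are read before they are overwritten. */
        while (n > 0) {
            n--;
            d[n] = s[n];
        }
    } else {
        /* dest is at or below src: copy from the bottom up. */
        for (size_t k = 0; k < n; k++)
            d[k] = s[k];
    }
    return dest;
}
```

When both regions lie inside the same array the direction matters and the comparison is meaningful. When the regions belong to unrelated objects they cannot overlap, so either branch copies correctly, which is exactly why an arbitrary 0 or 1 result would have been enough.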