By Ahmad AL-wazzan


2015-01-03 02:41:11 8 Comments

I was playing around with strcmp when I noticed this. Here is the code:

#include <string.h>
#include <stdio.h>

int main(void){

    //passing strings directly
    printf("%d\n", strcmp("ahmad", "fatema"));

    //passing strings as pointers
    const char *a = "ahmad";
    const char *b = "fatema";
    printf("%d\n", strcmp(a, b));

    return 0;

}

the output is:

-1
-5

Shouldn't strcmp work the same in both cases? Why do I get a different value when I pass the string as "ahmad" rather than as char* a = "ahmad"? When you pass values to a function, they are allocated on its stack, right?


@Shafik Yaghmour 2015-01-03 02:50:57

You are most likely seeing the result of a compiler optimization. If we test the code using gcc on godbolt at the -O0 optimization level, we can see that for the first case it does not call strcmp:

movl    $-1, %esi   #,
movl    $.LC0, %edi #,
movl    $0, %eax    #,
call    printf  #

Since you are using constants as arguments to strcmp, the compiler is able to perform constant folding: it evaluates a compiler intrinsic at compile time and generates the -1 then, instead of calling strcmp at run time. The run-time strcmp is implemented in the standard library and will have a different implementation than the likely simpler compile-time strcmp.

In the second case it does generate a call to strcmp:

call    strcmp  #
movl    %eax, %esi  # D.2047,
movl    $.LC0, %edi #,
movl    $0, %eax    #,
call    printf  #

This is consistent with the fact that gcc has a builtin for strcmp, which is what gcc will use during constant folding.

If we further test using the -O1 optimization level or greater, gcc is able to fold both cases and the result will be -1 for both:

movl    $-1, %esi   #,
movl    $.LC0, %edi #,
xorl    %eax, %eax  #
call    printf  #
movl    $-1, %esi   #,
movl    $.LC0, %edi #,
xorl    %eax, %eax  #
call    printf  #

With more optimization options turned on, the optimizer is able to determine that a and b also point to constants known at compile time, and it can compute the result of strcmp for this case at compile time as well.

We can confirm that gcc is using the builtin function by building with the -fno-builtin flag and observing that a call to strcmp is generated for all cases.

clang is slightly different in that it does not fold at all using -O0 but will fold at -O1 and above for both.

Note that any negative result is entirely conformant, as we can see from the draft C99 standard, section 7.21.4.2 The strcmp function, which says (emphasis mine):

int strcmp(const char *s1, const char *s2);

The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.

technosaurus points out that strcmp is specified to treat the strings as if they were composed of unsigned char; this is covered in C99 under 7.21.1, which says:

For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value).

@davmac 2015-01-03 02:53:15

While this is interesting, it is not really important to the answer. Even if strcmp were called in both cases, it would be perfectly valid for it to return different values for the same input string (so long as the return values were both the same sign etc).

@asimes 2015-01-03 02:56:01

@davmac, it seems like the OP wanted to know why the values are different (even though they are both negative)

@davmac 2015-01-03 03:01:02

@asimes my point is that this is only part of the explanation as to why they are different. Or, if you prefer, it is one explanation of why they could be different (remember that different compilers, platforms etc might give different results). But the question implies that the OP does not understand why they can be different, and that it would be wrong to assume that the numerical result must always be the same for the same input strings.

@glglgl 2015-01-03 10:50:34

@davmac It would be allowed for them to be different, but surprising and illogical. Except if you have something like the "Bastard Compiler From Hell"™...

@technosaurus 2015-01-03 21:25:56

I believe it also mentions that it should be an unsigned comparison, such that an extended (negative) ASCII value is greater than a standard ASCII value.

@davmac 2015-01-04 00:46:06

@glglgl it's surprising only if you make assumptions beyond what the spec says. If you use the method as it is intended (to return a value that you then immediately compare with 0), there's no issue. It's only if you start trying to do something else with the return value (eg print it out, as in OP's case) that the behavior becomes surprising. In other words: if you try and do something unusual, you might get surprising results; in which case, it's hardly fair to blame the compiler.

@Shafik Yaghmour 2015-01-04 04:38:26

@davmac I think most people find this unusual because the result is seemingly inconsistent. Unless you spend a lot of time looking at what optimizers do, most experienced developers would have to stop and think for a while to explain why calling what appears to be the same function with the same values gives a different result. No one is suggesting this result does not meet the specification, nor that anyone should rely on it, but it legitimately makes someone question their understanding, which is great; not everyone knows how to dig into these things, which is why SO is here.

@technosaurus 2015-01-04 07:14:06

You can always pass -fno-builtin-strcmp if you need your libc's implementation behavior.

@Shafik Yaghmour 2015-01-04 13:47:11

@technosaurus that is a good point. I actually used -fno-builtin to verify originally but did not add that detail; I probably will later when I get some more time. I recently wrote a self-answered question that deals with builtins and constant expressions, so I have been thinking about similar stuff a lot recently. You can see it here: Is it a conforming compiler extension to treat non-constexpr standard library functions as constexpr?

@davmac 2015-01-05 01:35:24

@ShafikYaghmour I understand why it surprises people. Again, my point is that the cause of this surprise is not a misunderstanding of strcmp as such but a more general misunderstanding, a belief that everything in C is simple and straightforward and that it is safe to make assumptions about how things will be compiled and how they will work - but it isn't (as I'm sure you're aware). Don't get me wrong - I like your answer (especially as amended) and I don't think it's incorrect.

@davmac 2015-01-03 02:50:20

I think you believe that the value returned by strcmp should somehow depend on the input strings passed to it in a way that is not defined by the function specification. This isn't correct. See for instance the POSIX definition:

http://pubs.opengroup.org/onlinepubs/009695399/functions/strcmp.html

Upon completion, strcmp() shall return an integer greater than, equal to, or less than 0, if the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2, respectively.

This is exactly what you are seeing. The implementation does not need to make any guarantee about the exact return value, only that it is less than zero, equal to zero, or greater than zero as appropriate.

@Iharob Al Asimi 2015-01-03 03:48:52

So it returns a random negative number, then? I think it should still be deterministic, and there should be an explanation for the observed behavior.

@Clifford 2015-01-03 03:56:19

@iharob: It is not "random". In the second case, it is the result of 'a' - 'f', but that is itself a property of the specific strcmp() implementation (though I doubt implementations differ in this respect). The first result is explained by Shafik's answer and is compiler (or compiler option) dependent. Either way, you cannot rely on any result other than the one guaranteed by the function's standard specification.

@davmac 2015-01-03 05:27:26

@iharob: returning a random negative number would indeed satisfy the spec, but that's not what's happening here. The "explanation for the observed behavior" is that the strcmp implementation(s) returned different values on different occasions - in other words, the observed behavior is dependent on the implementation of the function. We could poke into various reasons why an implementation might give different numerical results (see Shafik's answer) but I personally think the key takeaway is "don't make assumptions about behavior that aren't explicit in the spec".
