By goe


2009-10-25 16:56:38 8 Comments

I use this code:

while ( scanf("%s", buf) == 1 ){

What would be the best way to prevent possible buffer overflow so that it can be passed strings of random lengths?

I know I can limit the input string by calling for example:

while ( scanf("%20s", buf) == 1 ){

But I'd prefer to be able to process whatever the user inputs. Or can't this be done safely using scanf and I should use fgets?

6 comments

@Jonathan Leffler 2009-10-25 20:21:58

In their book The Practice of Programming (which is well worth reading), Kernighan and Pike discuss this problem, and they solve it by using snprintf() to create the string with the correct buffer size for passing to the scanf() family of functions. In effect:

int scanner(const char *data, char *buffer, size_t buflen)
{
    char format[32];
    if (buflen == 0)
        return 0;
    snprintf(format, sizeof(format), "%%%ds", (int)(buflen-1));
    return sscanf(data, format, buffer);
}

Note, this still limits the input to the size provided as 'buffer'. If you need more space, then you have to do memory allocation, or use a non-standard library function that does the memory allocation for you.


Note that the POSIX 2008 (2013) version of the scanf() family of functions supports a format modifier m (an assignment-allocation character) for string inputs (%s, %c, %[). Instead of taking a char * argument, it takes a char ** argument, and it allocates the necessary space for the value it reads:

char *buffer = 0;
if (sscanf(data, "%ms", &buffer) == 1)
{
    printf("String is: <<%s>>\n", buffer);
    free(buffer);
}

If the sscanf() function fails to satisfy all the conversion specifications, then all the memory it allocated for %ms-like conversions is freed before the function returns.

@Jonathan Leffler 2013-09-11 16:35:51

@Sam: Yes, it should be buflen-1 — Thank You. You then have to worry about unsigned underflow (wrapping to a rather large number), hence the if test. I'd be sorely tempted to replace that with an assert(), or back it up with an assert() before the if that fires during development if anyone is careless enough to pass 0 as the size. I've not carefully reviewed the documentation for what %0s means to sscanf() — the test might be better as if (buflen < 2).

@krb686 2015-04-26 14:18:32

So snprintf writes some data to a string buffer, and sscanf reads from that created string. Where exactly does this replace scanf in that it reads from stdin?

@krb686 2015-04-26 14:24:51

It's also quite confusing that you use the word "format" for your result string and thus pass in "format" as the first argument to snprintf yet it is not the actual format parameter.

@Jonathan Leffler 2015-04-26 15:28:30

@krb686: This code is written so that the data to be scanned is in the parameter data and hence sscanf() is appropriate. If you want to read from standard input instead, drop the data parameter and call scanf() instead. As to the choice of the name format for the variable that becomes the format string in the call to sscanf(), you are entitled to rename it if you wish, but its name is not inaccurate. I am not sure what alternative makes sense; would in_format make it any clearer? I am not planning to change it in this code; you may if you use this idea in your own code.

@Jonathan Leffler 2015-04-26 15:42:59

@krb686: Incidentally, the snprintf() takes the number in buflen (say 256), and creates a string in the variable format that is equivalent to "%255s". Then sscanf() reads from the string data using format as the format string (conversion specification) to control how the data is interpreted, with the result being written into the variable buffer. Thus the leading white space in data is ignored and the first 'word' is copied to buffer without risk of buffer overflow as long as the area pointed at by buffer has at least as much space as buflen says it has.

@mabraham 2015-06-24 08:47:56

Note also that MSVC at least up to 2013 does not conform to POSIX standards, so the format modifier for "%ms" is not available

@Jonathan Leffler 2017-06-06 19:36:41

@mabraham: It is still true under macOS Sierra 10.12.5 (up to 2017-06-06) — the scanf() on macOS is not documented as supporting %ms, useful though it would be.

@John Ledbetter 2009-10-25 18:50:46

If you are using gcc, you can use the GNU-extension a specifier to have scanf() allocate memory for you to hold the input:

int main()
{
  char *str = NULL;

  scanf ("%as", &str);
  if (str) {
      printf("\"%s\"\n", str);
      free(str);
  }
  return 0;
}

Edit: As Jonathan pointed out, you should consult the scanf man pages as the specifier might be different (%m) and you might need to enable certain defines when compiling.

@Jonathan Leffler 2009-10-25 20:24:45

That's more an issue of using glibc (the GNU C Library) than of using the GNU C Compiler.

@Jonathan Leffler 2014-02-22 21:15:28

And note that the POSIX 2008 standard provides the m modifier to do the same job. See scanf(). You'll need to check whether the systems you use do support this modifier.

@Jonathan Leffler 2014-02-22 21:16:21

GNU (as found on Ubuntu 13.10, at any rate) supports %ms. The notation %a is a synonym for %f (on output, it requests hexadecimal floating point data). The GNU man page for scanf() says: _ It is not available if the program is compiled with gcc -std=c99 or gcc -D_ISOC99_SOURCE (unless _GNU_SOURCE is also specified), in which case the a is interpreted as a specifier for floating-point numbers (see above)._

@user2247995 2013-04-05 07:11:12

It's not that much work to make a function that's allocating the needed memory for your string. That's a little c-function i wrote some time ago, i always use it to read in strings.

It will return the read string or if a memory error occurs NULL. But be aware that you have to free() your string and always check for it's return value.

#define BUFFER 32

char *readString()
{
    char *str = malloc(sizeof(char) * BUFFER), *err;
    int pos;
    for(pos = 0; str != NULL && (str[pos] = getchar()) != '\n'; pos++)
    {
        if(pos % BUFFER == BUFFER - 1)
        {
            if((err = realloc(str, sizeof(char) * (BUFFER + pos + 1))) == NULL)
                free(str);
            str = err;
        }
    }
    if(str != NULL)
        str[pos] = '\0';
    return str;
}

@RastaJedi 2016-09-14 17:27:31

sizeof (char) is by definition 1. You don't need it here.

@Michael Beer 2018-06-12 08:56:22

Usually it's good practise to keep pointer allocation/freeing at the same level, meaning that your function should not allocate memory on its own, since the caller then has to free it. Most of the standard library / posix functions adhere to this principle by either returning a static string (like strerror(3)) or expect a pre-allocated string passed in (like (strerror_r(3) - or scanf(3) ) ...

@dirkgently 2009-10-25 17:59:50

Most of the time a combination of fgets and sscanf does the job. The other thing would be to write your own parser, if the input is well formatted. Also note your second example needs a bit of modification to be used safely:

#define LENGTH          42
#define str(x)          # x
#define xstr(x)         str(x)

/* ... */ 
int nc = scanf("%"xstr(LENGTH)"[^\n]%*[^\n]", array); 

The above discards the input stream upto but not including the newline (\n) character. You will need to add a getchar() to consume this. Also do check if you reached the end-of-stream:

if (!feof(stdin)) { ...

and that's about it.

@Roland Illig 2016-10-24 21:52:12

Could you put the feof code into a bigger context? I'm asking since that function is often used wrong.

@jxh 2019-03-19 22:58:58

array needs to be char array[LENGTH+1];

@Mark Bessey 2009-10-25 17:11:18

Limiting the length of the input is definitely easier. You could accept an arbitrarily-long input by using a loop, reading in a bit at a time, re-allocating space for the string as necessary...

But that's a lot of work, so most C programmers just chop off the input at some arbitrary length. I suppose you know this already, but using fgets() isn't going to allow you to accept arbitrary amounts of text - you're still going to need to set a limit.

@goe 2009-10-25 17:17:18

So does anybody know how to do that with scanf then?

@bdonlan 2009-10-25 18:53:58

Using fgets in a loop can allow you to accept arbitrary amounts of text - just keep realloc()ing your buffer.

@DigitalRoss 2009-10-25 17:04:28

Directly using scanf(3) and its variants poses a number of problems. Typically, users and non-interactive use cases are defined in terms of lines of input. It's rare to see a case where, if enough objects are not found, more lines will solve the problem, yet that's the default mode for scanf. (If a user didn't know to enter a number on the first line, a second and third line will probably not help.)

At least if you fgets(3) you know how many input lines your program will need, and you won't have any buffer overflows...

Related Questions

Sponsored Content

29 Answered Questions

[SOLVED] How do you set, clear, and toggle a single bit?

1 Answered Questions

1 Answered Questions

[SOLVED] How to get rid of buffer overflow on taking input from user?

  • 2020-05-21 23:20:16
  • Bilal Khan
  • 39 View
  • 0 Score
  • 1 Answer
  • Tags:   c function fgets

10 Answered Questions

[SOLVED] Improve INSERT-per-second performance of SQLite

2 Answered Questions

[SOLVED] scanf in c and relation input buffer

  • 2015-10-22 12:47:53
  • knowledge
  • 638 View
  • 6 Score
  • 2 Answer
  • Tags:   c scanf

3 Answered Questions

[SOLVED] Preventing buffer overflow when using fscanf

2 Answered Questions

[SOLVED] Nongreedy fscanf and buffer overflow check in c

4 Answered Questions

[SOLVED] Overflow over scanf("%8s", string)?

Sponsored Content