By ZeD


2009-01-04 13:37:56 8 Comments

Is there a way to include an entire text file as a string in a C program at compile-time?

something like:

  • file.txt:

    This is
    a little
    text file
    
  • main.c:

    #include <stdio.h>
    int main(void) {
       #blackmagicinclude("file.txt", content)
       /*
       equiv: char content[] = "This is\na little\ntext file";
       */
       printf("%s", content);
    }
    

obtaining a little program that prints on stdout "This is a little text file"

At the moment I use a hackish Python script, but it's butt-ugly and limited to only one variable name. Can you tell me another way to do it?

16 comments

@Daemin 2009-01-04 13:45:10

What might work is if you do something like:

int main()
{
    const char* text = "
#include "file.txt"
";
    printf("%s", text);
    return 0;
}

Of course you'll have to be careful with what is actually in the file, making sure there are no double quotes, that all appropriate characters are escaped, etc.

Therefore it might be easier if you just load the text from a file at runtime, or embed the text directly into the code.

If you still wanted the text in another file you could have it in there, but it would have to be represented there as a string. You would use the code as above but without the double quotes in it. For example:

file.txt

"Something evil\n"\
"this way comes!"

main.cpp

#include <stdio.h>

int main()
{
    const char* text =
#include "file.txt"
;
    printf("%s", text);
    return 0;
}

So basically having a C or C++ style string in a text file that you include. It would make the code neater because there isn't this huge lot of text at the start of the file.

@Motti 2009-01-04 13:59:03

Nice idea, but it won't work: either you get an error because the literal includes a newline, or the #include line is read as part of the string and not executed. Damned if you do and damned if you don't...

@Jonathan Leffler 2009-01-04 16:39:14

@Motti: agreed - as written, syntactically invalid C. The idea is interesting - the C Pre-Processor is logically a separate phase - but the practice is that it doesn't get off the ground because each line in the included file would have to end with a backslash, etc.

@EvilTeach 2009-01-04 22:00:25

Hmm. Seems to me that you should not need the backslash, as most compilers will concatenate adjacent strings together.

@Mark Ch 2019-07-05 08:01:55

the thing with this answer is... if it was that simple, I don't think the OP would ever have asked the question! -1 because the presence of this answer is slightly encouraging people to waste their time trying something that doesn't work. I think we could remove the downvote if you changed "What might work" to "For reference, this does not work"

@Daemin 2019-07-26 08:25:32

@JonathanLeffler After the preprocessor runs it should be valid C or C++ depending on how file.txt is formatted.

@Daemin 2019-07-26 08:27:03

@MarkCh It is that simple, another answer from ilya above verified it and does the same thing as my second snippet of code.

@mattnewport 2019-03-04 01:39:37

If you're willing to resort to some dirty tricks you can get creative with raw string literals and #include for certain types of files.

For example, say I want to include some SQL scripts for SQLite in my project and I want to get syntax highlighting but don't want any special build infrastructure. I can have this file test.sql which is valid SQL for SQLite where -- starts a comment:

--x, R"(--
SELECT * from TestTable
WHERE field = 5
--)"

And then in my C++ code I can have:

#include <iostream>
using namespace std;

int main()
{
    auto x = 0;  // test.sql starts with "--x,", so x must exist
    const char* mysql = (
#include "test.sql"
    );

    cout << mysql << endl;
}

The output is:

--
SELECT * from TestTable
WHERE field = 5
--

Or to include some Python code from a file test.py which is a valid Python script (because # starts a comment in Python and pass is a no-op):

#define pass R"(
pass
def myfunc():
    print("Some Python code")

myfunc()
#undef pass
#define pass )"
pass

And then in the C++ code:

int main()
{
    const char* mypython = (
#include "test.py"
    );

    cout << mypython << endl;
}

Which will output:

pass
def myfunc():
    print("Some Python code")

myfunc()
#undef pass
#define pass

It should be possible to play similar tricks for various other types of code you might want to include as a string. Whether or not it is a good idea I'm not sure. It's kind of a neat hack but probably not something you'd want in real production code. Might be ok for a weekend hack project though.

@yano 2019-08-05 16:51:38

I have used this approach to put OpenGL Shaders in text files as well!

@user2394284 2017-10-10 14:46:10

I reimplemented xxd in python3, fixing all of xxd's annoyances:

  • Const correctness
  • string length datatype: int → size_t
  • Null termination (in case you might want that)
  • C string compatible: Drop unsigned on the array.
  • Smaller, readable output, as you would have written it: Printable ascii is output as-is; other bytes are hex-encoded.

Here is the script, filtered by itself, so you can see what it does:

pyxxd.c

#include <stddef.h>

extern const char pyxxd[];
extern const size_t pyxxd_len;

const char pyxxd[] =
"#!/usr/bin/env python3\n"
"\n"
"import sys\n"
"import re\n"
"\n"
"def is_printable_ascii(byte):\n"
"    return byte >= ord(' ') and byte <= ord('~')\n"
"\n"
"def needs_escaping(byte):\n"
"    return byte == ord('\\\"') or byte == ord('\\\\')\n"
"\n"
"def stringify_nibble(nibble):\n"
"    if nibble < 10:\n"
"        return chr(nibble + ord('0'))\n"
"    return chr(nibble - 10 + ord('a'))\n"
"\n"
"def write_byte(of, byte):\n"
"    if is_printable_ascii(byte):\n"
"        if needs_escaping(byte):\n"
"            of.write('\\\\')\n"
"        of.write(chr(byte))\n"
"    elif byte == ord('\\n'):\n"
"        of.write('\\\\n\"\\n\"')\n"
"    else:\n"
"        of.write('\\\\x')\n"
"        of.write(stringify_nibble(byte >> 4))\n"
"        of.write(stringify_nibble(byte & 0xf))\n"
"\n"
"def mk_valid_identifier(s):\n"
"    s = re.sub('^[^_a-z]', '_', s)\n"
"    s = re.sub('[^_a-z0-9]', '_', s)\n"
"    return s\n"
"\n"
"def main():\n"
"    # `xxd -i` compatibility\n"
"    if len(sys.argv) != 4 or sys.argv[1] != \"-i\":\n"
"        print(\"Usage: xxd -i infile outfile\")\n"
"        exit(2)\n"
"\n"
"    with open(sys.argv[2], \"rb\") as infile:\n"
"        with open(sys.argv[3], \"w\") as outfile:\n"
"\n"
"            identifier = mk_valid_identifier(sys.argv[2]);\n"
"            outfile.write('#include <stddef.h>\\n\\n');\n"
"            outfile.write('extern const char {}[];\\n'.format(identifier));\n"
"            outfile.write('extern const size_t {}_len;\\n\\n'.format(identifier));\n"
"            outfile.write('const char {}[] =\\n\"'.format(identifier));\n"
"\n"
"            while True:\n"
"                byte = infile.read(1)\n"
"                if byte == b\"\":\n"
"                    break\n"
"                write_byte(outfile, ord(byte))\n"
"\n"
"            outfile.write('\";\\n\\n');\n"
"            outfile.write('const size_t {}_len = sizeof({}) - 1;\\n'.format(identifier, identifier));\n"
"\n"
"if __name__ == '__main__':\n"
"    main()\n"
"";

const size_t pyxxd_len = sizeof(pyxxd) - 1;

Usage (this extracts the script):

#include <stdio.h>

extern const char pyxxd[];
extern const size_t pyxxd_len;

int main()
{
    fwrite(pyxxd, 1, pyxxd_len, stdout);
}

@Martin R. 2017-12-13 19:52:40

I like kayahr's answer. If you don't want to touch the input files, however, and you are using CMake, you can have the build add the delimiter character sequences to the file. The following CMake code, for instance, copies the input files and wraps their content accordingly:

function(make_includable input_file output_file)
    file(READ ${input_file} content)
    set(delim "for_c++_include")
    set(content "R\"${delim}(\n${content})${delim}\"")
    file(WRITE ${output_file} "${content}")
endfunction(make_includable)

# Use like
make_includable(external/shaders/cool.frag generated/cool.frag)

Then include it in C++ like this:

constexpr const char *test =
#include "generated/cool.frag"
;

@John Zwinck 2017-09-14 14:37:24

You can do this using objcopy:

objcopy --input binary --output elf64-x86-64 myfile.txt myfile.o

Now you have an object file you can link into your executable which contains symbols for the beginning, end, and size of the content from myfile.txt.

@Mark Ch 2019-07-05 08:58:31

are you able to tell us what the symbol names will be?

@John Zwinck 2019-07-05 10:59:14

@MarkCh: As per the docs, the symbol names are generated from the input filename.

@Mark Ch 2019-07-05 11:40:28

_binary_myfile_txt_start and _binary_myfile_txt_size, for anyone lazy...

@ThorSummoner 2019-10-05 06:39:49

I'm guessing this won't work on non-x86-64 machines, will it?

@volzotan 2017-03-30 22:41:42

I had similar issues, and for small files the aforementioned solution of Johannes Schaub worked like a charm for me.

However, for files that are a bit larger, it ran into issues with the character array limit of the compiler. Therefore, I wrote a small encoder application that converts file content into a 2D character array of equally sized chunks (and possibly padding zeros). It produces output textfiles with 2D array data like this:

const char main_js_file_data[8][4]= {
    {'\x69','\x73','\x20','\0'},
    {'\x69','\x73','\x20','\0'},
    {'\x61','\x20','\x74','\0'},
    {'\x65','\x73','\x74','\0'},
    {'\x20','\x66','\x6f','\0'},
    {'\x72','\x20','\x79','\0'},
    {'\x6f','\x75','\xd','\0'},
    {'\xa','\0','\0','\0'}};

where 4 is actually a variable MAX_CHARS_PER_ARRAY in the encoder. The file with the resulting C code, called, for example "main_js_file_data.h" can then easily be inlined into the C++ application, for example like this:

#include "main_js_file_data.h"

Here is the source code of the encoder:

#include <fstream>
#include <iterator>
#include <vector>
#include <algorithm>
#include <cmath>      // std::ceil


#define MAX_CHARS_PER_ARRAY 2048


int main(int argc, char * argv[])
{
    // three parameters: input filename, output filename, variable name
    if (argc < 4)
    {
        return 1;
    }

    // buffer data, packaged into chunks
    std::vector<char> bufferedData;

    // open input file, in binary mode
    {    
        std::ifstream fStr(argv[1], std::ios::binary);
        if (!fStr.is_open())
        {
            return 1;
        }

        bufferedData.assign(std::istreambuf_iterator<char>(fStr), 
                            std::istreambuf_iterator<char>()     );
    }

    // write output text file, containing a variable declaration,
    // which will be a fixed-size two-dimensional plain array
    {
        std::ofstream fStr(argv[2]);
        if (!fStr.is_open())
        {
            return 1;
        }
        const std::size_t numChunks = std::size_t(std::ceil(double(bufferedData.size()) / (MAX_CHARS_PER_ARRAY - 1)));
        fStr << "const char " << argv[3] << "[" << numChunks           << "]"    <<
                                            "[" << MAX_CHARS_PER_ARRAY << "]= {" << std::endl;
        std::size_t count = 0;
        fStr << std::hex;
        while (count < bufferedData.size())
        {
            std::size_t n = 0;
            fStr << "{";
            for (; n < MAX_CHARS_PER_ARRAY - 1 && count < bufferedData.size(); ++n)
            {
                // cast through unsigned char so bytes >= 0x80 are not sign-extended
                fStr << "'\\x" << int(static_cast<unsigned char>(bufferedData[count++])) << "',";
            }
            // fill missing part to reach fixed chunk size with zero entries
            for (std::size_t j = 0; j < (MAX_CHARS_PER_ARRAY - 1) - n; ++j)
            {
                fStr << "'\\0',";
            }
            fStr << "'\\0'}";
            if (count < bufferedData.size())
            {
                fStr << ",\n";
            }
        }
        fStr << "};\n";
    }

    return 0;
}

@kayahr 2014-07-29 17:32:52

The question was about C but in case someone tries to do it with C++11 then it can be done with only little changes to the included text file thanks to the new raw string literals:

In C++ do this:

const char *s =
#include "test.txt"
;

In the text file do this:

R"(Line 1
Line 2
Line 3
Line 4
Line 5
Line 6)"

So there must only be a prefix at the top of the file and a suffix at the end of it. Between it you can do what you want, no special escaping is necessary as long as you don't need the character sequence )". But even this can work if you specify your own custom delimiter:

R"=====(Line 1
Line 2
Line 3
Now you can use "( and )" in the text file, too.
Line 5
Line 6)====="

@YitzikC 2017-01-29 21:58:48

Thanks, I chose the method proposed here to embed long fragments of SQL into my C++11 code. This allows me to keep the SQL cleanly separated into its own files, and edit them with appropriate syntax checking, highlighting etc.

@TMS 2018-09-21 04:16:44

This is really close to what I want, especially the user-defined delimiter. Very useful. I do want to go a step further: is there a way to completely remove the prefix R"( and suffix )" from the file you want to include? I tried defining two files called bra.in and ket.in with the prefix and suffix in them, then including bra.in, file.txt and ket.in one by one. But the compiler evaluates the content of bra.in (which is just R"() before including the next file, so it complains. Please let me know if anyone knows how to get rid of the prefix and suffix in file.txt. Thanks.

@Brian Chrisman 2019-08-10 16:09:48

I'm guessing C++ wouldn't allow R"(<newline>#include...)" ? Would be nice to have the file being compile-time-ingested to not require any encoding whatsoever.... ie straight json or xml or csv or what not..

@TechDragon 2014-09-11 05:24:18

Why not link the text into the program and use it as a global variable? Here is an example. I'm considering using this to include OpenGL shader files within an executable, since GL shaders need to be compiled for the GPU at runtime.

@not-a-user 2014-04-15 13:59:29

I think it is not possible with the compiler and preprocessor alone. gcc allows this:

#define _STRGF(x) # x
#define STRGF(x) _STRGF(x)

    printk ( MODULE_NAME " built " __DATE__ " at " __TIME__ " on host "
            STRGF(
#               define hostname my_dear_hostname
                hostname
            )
            "\n" );

But unfortunately not this:

#define _STRGF(x) # x
#define STRGF(x) _STRGF(x)

    printk ( MODULE_NAME " built " __DATE__ " at " __TIME__ " on host "
            STRGF(
#               include "/etc/hostname"
            )
            "\n" );

The error is:

/etc/hostname: In function ‘init_module’:
/etc/hostname:1:0: error: unterminated argument list invoking macro "STRGF"

@Jonathan Leffler 2014-04-15 14:26:09

I've looked, as you bid me look. I don't see any new information in your answer (information that is not in other answers), beyond a reference to /etc/hostname as a way of embedding the name of the build machine in the string, which (even if it worked) would not be portable since Mac OS X does not have a file /etc/hostname. Note that using macro names that start with an underscore followed by a capital letter is using a name reserved to the implementation, which is A Bad Thing™.

@starseeker 2013-04-13 16:36:21

Hasturkun's answer using the xxd -i option is excellent. If you want to incorporate the conversion process (text -> hex include file) directly into your build the hexdump.c tool/library recently added a capability similar to xxd's -i option (it doesn't give you the full header - you need to provide the char array definition - but that has the advantage of letting you pick the name of the char array):

http://25thandclement.com/~william/projects/hexdump.c.html

Its license is a lot more "standard" than xxd's and is very liberal. An example of using it to embed an init file in a program can be seen in the CMakeLists.txt and scheme.c files here:

https://github.com/starseeker/tinyscheme-cmake

There are pros and cons both to including generated files in source trees and bundling utilities - how to handle it will depend on the specific goals and needs of your project. hexdump.c opens up the bundling option for this application.

@user735796 2012-12-10 03:48:42

You need my xtr utility, but otherwise you can do it with a bash script. This is a script I call bin2inc. The first parameter is the name of the resulting char[] variable. The second parameter is the name of the file. The output is a C include file with the file content encoded (in lowercase hex) under the given variable name. The char array is zero-terminated, and the length of the data is stored in $variableName_length.

#!/bin/bash

fileSize ()
{
    [ -e "$1" ]  && {
        set -- `ls -l "$1"`;
        echo $5;
    }
}

echo unsigned char $1'[] = {'
./xtr -fhex -p 0x -s ', ' < "$2";
echo '0x00'
echo '};';
echo '';
echo unsigned long int ${1}_length = $(fileSize "$2")';'

You can get xtr here. xtr (character eXTRapolator) is GPLv3.

@Ilya 2009-01-04 21:33:40

ok, inspired by Daemin's post i tested the following simple example :

a.data:

"this is test\n file\n"

test.c:

int main(void)
{
    char *test = 
#include "a.data"
    ;
    return 0;
}

gcc -E test.c output:

# 1 "test.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "test.c"

int main(void)
{
    char *test =
# 1 "a.data" 1
"this is test\n file\n"
# 6 "test.c" 2
    ;
    return 0;
}

So it works, but it requires the data to be surrounded by quotation marks.

@Daemin 2009-01-04 22:11:44

That's what I was alluding to in the last bit of my answer.

@bdonlan 2009-07-28 17:28:28

Data surrounded by commas?

@Ilya 2009-07-29 15:32:07

quotation, or whatever it's called, pardon my English

@Brian Chrisman 2019-08-10 16:08:11

This requires the data to be C-escaped. I don't think that's what the post is looking for. If this had some sort of include macro which C-escaped the contents of the file, that would be fine.

@EvilTeach 2009-01-04 21:58:26

in x.h

"this is a "
"buncha text"

in main.c

#include <stdio.h>
int main(void)
{
    char *textFileContents =
#include "x.h"
    ;

    printf("%s\n", textFileContents);

    return 0;
}

ought to do the job.

@Superfly Jon 2016-09-27 10:46:36

For multiple lines you need to add \n so: "line 1\n" "line 2\n"

@Mark Ch 2019-07-05 08:56:12

It's a bit misleading: obviously this requires some preparation of the text file to add quotes and \n characters, so it doesn't work in the general case.

@Johannes Schaub - litb 2009-01-04 13:57:55

You have two possibilities:

  1. Make use of compiler/linker extensions to convert a file into a binary file, with proper symbols pointing to the begin and end of the binary data. See this answer: Include binary file with GNU ld linker script.
  2. Convert your file into a sequence of character constants that can initialize an array. Note you can't just do "" and span multiple lines. You would need a line continuation character (\), escape " characters and others to make that work. Easier to just write a little program to convert the bytes into a sequence like '\xFF', '\xAB', ...., '\0' (or use the unix tool xxd described by another answer, if you have it available!):

Code:

#include <stdio.h>

int main(void) {
    int c;
    /* emit every input byte as a hex character constant */
    while((c = fgetc(stdin)) != EOF) {
        printf("'\\x%X',", (unsigned)c);
    }
    printf("'\\0'"); // put terminating zero
    return 0;
}

(not tested). Then do:

char my_file[] = {
#include "data.h"
};

Where data.h is generated by

cat file.bin | ./bin2c > data.h

@Hasturkun 2009-01-04 14:08:59

last line should probably read "cat file.bin | ./bin2c > data.h" or "./bin2c < file.bin > data.h"

@Someone Somewhere 2016-01-05 12:10:30

I used codeproject.com/Tips/845393/… to create a hex file (on Windows) from a binary and then used your suggestion of char my_file[] = { #include my_large_file.h }; Thanks !

@ThorSummoner 2019-10-05 06:32:39

bin2c is not the same bin2c as from debian's hxtools, beware

@ThorSummoner 2019-10-05 06:45:23

or if it is, the invocation is much weirder now: bin2c -H myoutput.h myinput1.txt myinputN.txt

@Hasturkun 2009-01-04 13:56:22

I'd suggest using the Unix utility xxd for this. You can use it like so:

$ echo hello world > a
$ xxd -i a

outputs:

unsigned char a[] = {
  0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64, 0x0a
};
unsigned int a_len = 12;

@ZeD 2009-01-04 16:10:34

Just a note: the char[] created by xxd isn't NUL-terminated! So I do

$ xxd -i < file.txt > file.xxd
$ echo ', 0' >> file.xxd

and in main.c:

char file_content[] = {
#include "file.xxd"
};

@anon 2009-01-06 00:29:22

I never knew about xxd. It's awesome!

@Lazer 2010-03-20 11:33:45

@Hasturkun: I understand how you generated the output using xxd. What I do not understand is how are you going to include xxd in your C code. After all xxd is a shell command. How are you going to use it from within a C program??

@Hasturkun 2010-03-21 18:27:59

@eSKay: you do not include xxd in your code, you include the output of xxd in your code. eg. you can run something like xxd -i inputfile outputfile.h and later #include "outputfile.h"

@Lazer 2010-03-21 18:42:14

@Hasturkun: How will you get the unsigned char a[] = { and } parts?

@Hasturkun 2010-03-21 23:42:03

@eSKay: that comes directly from the output of xxd, as the answer says. The name of the array is the input filename. If you're piping data in instead of using an input file, you'll get a list of hexadecimal values instead (without the array declaration or the len variable).

@linello 2016-01-18 11:11:15

That's extremely useful when embedding GLSL shaders.

@vleo 2016-02-26 17:36:55

Another way to add 0x00 termination to xxd produced C code: xxd -i file.txt | sed 's/\([0-9a-f]\)$/\0, 0x00/' > file.h

@Jack Wasey 2019-04-11 15:54:43

This works well in autoconf configure.ac, but if you echo square brackets, remember the first must be [[ for autoconf escaping.

@Daniel Paull 2009-01-04 13:54:40

Even if it can be done at compile time (I don't think it can in general), the text would likely be the preprocessed header rather than the file's contents verbatim. I expect you'll have to load the text from the file at runtime or do a nasty cut-n-paste job.
