By ChrisWue


2016-01-03 00:04:46 8 Comments

Inspired by this question, I thought I'd provide my own implementation. I tried to follow the spirit of the *nix tool chain: read from stdin and write to stdout. This has the added benefit of making buffering very easy (only the current and previous characters and the count need to be kept).

All kinds of reviews welcome (best practices, error handling, weird edge cases, potential bugs or other pitfalls).

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

void write_char(int c)
{
    if (EOF == putchar(c))
    {
        if (ferror(stdout)) 
        {
            perror("error writing char to stdout");
            exit(EXIT_FAILURE);
        }
    }
}

void write_count(uint64_t count)
{
    if (printf("%ull", count) < 0)
    {
        perror("error writing character count to stdout");
        exit(EXIT_FAILURE);
    }
}

int main(int argc, char** argv)
{
    int current_char = 0;
    int previous_char = 0;
    uint64_t current_char_count = 0;

    while (EOF != (current_char = getchar())
    {
        if (current_char_count == 0 || current_char_count == UINT64_MAX || previous_char != current_char)
        {
            if (current_char_count > 0)
            {
                write_count(current_char_count);
            }
            write_char(current_char);
            current_char_count = 1;
            previous_char = current_char;
        }
        else
        {
            current_char_count += 1;
        }
    }
}

2 comments

@user3629249 2016-01-04 06:04:17

When compiling, always enable all the warnings, then fix those warnings.

For gcc, at a minimum, use: -Wall -Wextra -pedantic (I also use -std=c99 -Wconversion)

The compiler outputs several 'problem' statements:

To start: the main() signature is int main(int argc, char** argv), which in this case should be int main(void), because neither parameter is used:

unused parameter `argc`
unused parameter `argv`

And, because of a missing #include <stdlib.h> statement:

implicit declaration of function: `exit()`
EXIT_FAILURE not declared

And this line:

while (EOF != (current_char = getchar())

has a syntax error (always check for matching numbers of open and close parens):

error: expected ')' before '{'

That error means the posted code was never compiled.
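For reference, here is a minimal sketch of how the loop might look once the points above are addressed. The function name rle_encode and the idea of passing the streams as parameters are assumptions made for illustration, not part of the posted code. The sketch keeps the posted output format (character followed by a decimal count), balances the parentheses, uses a matching format specifier for uint64_t, and also writes the count of the final run after the loop, which the posted loop never does.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not in the original post): run-length encodes `in`
   to `out` as <char><decimal count>. Returns 0 on success, -1 on error. */
int rle_encode(FILE *in, FILE *out)
{
    int current_char;
    int previous_char = EOF;
    uint64_t count = 0;

    while ((current_char = getc(in)) != EOF)
    {
        /* Same character and the counter still has room: extend the run. */
        if (count > 0 && count < UINT64_MAX && current_char == previous_char)
        {
            count += 1;
            continue;
        }
        /* A new run starts: write the previous run's count, then the new character. */
        if (count > 0 && fprintf(out, "%" PRIu64, count) < 0)
        {
            return -1;
        }
        if (putc(current_char, out) == EOF)
        {
            return -1;
        }
        previous_char = current_char;
        count = 1;
    }
    if (ferror(in))
    {
        return -1;
    }
    /* Write the count of the final run. */
    if (count > 0 && fprintf(out, "%" PRIu64, count) < 0)
    {
        return -1;
    }
    return 0;
}
```

A main declared as int main(void) could then return EXIT_SUCCESS or EXIT_FAILURE depending on the result of rle_encode(stdin, stdout), with <stdlib.h> included for those macros.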

@ChrisWue 2016-01-04 21:38:37

As I stated in a comment on SirPython's answer: I accidentally copied a broken version of the code.

@SirPython 2016-01-03 00:33:28

Compressor number or real

When you call write_count, you write the count to the output as ASCII digit characters. However, when you go to decompress this output, how are you going to differentiate between the actual content and the digits that mark the occurrences of a character?

A possible solution might be to write the count itself as a raw binary value rather than as ASCII digits. That way, when you encounter an ASCII digit, you can be almost sure that it is part of the content (that is, unless a character occurred so many times in a row that the counter rose into the '0'-'9' range).
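A rough sketch of the raw-count idea, under some assumptions of my own: the function name rle_encode_binary, the buffer-based interface, and the choice of a single count byte (runs longer than 255 are split) are all made up for illustration. If the output is always read back as fixed pairs of content byte and count byte, the position alone tells the decoder which byte is which, so a count that happens to land in the '0'-'9' range should not be a problem.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes src[0..len) as pairs of <byte><run length>,
   with the run length stored as one raw byte in the range 1..255.
   Returns the number of bytes written to dst, or SIZE_MAX if dst is too small. */
size_t rle_encode_binary(const unsigned char *src, size_t len,
                         unsigned char *dst, size_t cap)
{
    size_t written = 0;
    size_t i = 0;

    while (i < len)
    {
        unsigned char value = src[i];
        unsigned char run = 0;

        /* Count equal bytes; a run longer than 255 is split into several pairs. */
        while (i < len && src[i] == value && run < 255)
        {
            run++;
            i++;
        }
        if (cap - written < 2)
        {
            return SIZE_MAX;
        }
        dst[written++] = value;
        dst[written++] = run;
    }
    return written;
}
```

With this layout the input 12 becomes the four bytes '1', 0x01, '2', 0x01.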


Two ones or twelve?

This is kind of a continuation of the point above. Let's say your compressor compressed this file:

12

Now, I am ready to decompress it. Since your compressor writes a number to show occurrences of a character, the output would be this:

1121

How do I know if all of those numbers are part of the content?

The only fix I can think of, unfortunately, would be to follow the above tip and write 0x01 instead of an ASCII number.
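To make the ambiguity concrete, here is a small sketch. The helper rle_encode_ascii is hypothetical and works on strings rather than stdin, and it assumes the count of every run (including the last) is written as decimal digits. Under those assumptions, the two-character input 12 and an input of 121 consecutive 1 characters both encode to 1121, so a decompressor cannot tell them apart.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the <char><decimal count> encoding of the
   NUL-terminated string src into dst (capacity cap).
   Returns 0 on success, -1 if the result does not fit. */
int rle_encode_ascii(const char *src, char *dst, size_t cap)
{
    size_t len = strlen(src);
    size_t used = 0;
    size_t i = 0;

    if (cap == 0)
    {
        return -1;
    }
    dst[0] = '\0';

    while (i < len)
    {
        /* Measure the run of equal characters starting at position i. */
        size_t run = 1;
        while (i + run < len && src[i + run] == src[i])
        {
            run++;
        }
        int n = snprintf(dst + used, cap - used, "%c%zu", src[i], run);
        if (n < 0 || (size_t)n >= cap - used)
        {
            return -1;
        }
        used += (size_t)n;
        i += run;
    }
    return 0;
}
```

Encoding both inputs with this helper and comparing the results with strcmp shows that they are identical.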


Misc

while (EOF != (current_char = getchar())

You are missing a closing parenthesis here.


if (printf("%ull", count) < 0)

When compiling your code, I get this on this line:

warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 2 has type ‘uint64_t’ [-Wformat=]

This also reveals a second problem: two literal l characters are written after the count, because %u consumes the argument and the trailing ll is then printed as plain text.
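One way to get a matching specifier for uint64_t is the PRIu64 macro from <inttypes.h>. The sketch below uses a made-up helper name, format_count, and formats into a buffer only so the result is easy to inspect. The same format string works with printf. Casting the value to unsigned long long and using %llu would be another option.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats count as decimal digits into dst.
   Returns what snprintf returns: the number of characters the full
   result needs (excluding the terminating NUL), or a negative value on error. */
int format_count(uint64_t count, char *dst, size_t cap)
{
    /* PRIu64 expands to the correct conversion specifier for uint64_t. */
    return snprintf(dst, cap, "%" PRIu64, count);
}
```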

@ChrisWue 2016-01-03 04:56:49

Regarding your Misc bugs: I must have accidentally copied an older version of the code. Sorry about that, I should have checked more carefully. Good point about the number encoding; I've got some ideas about that.
