By Ruslan

2019-04-14 06:27:15 8 Comments

I have a directory with over 400 GiB of data in it. I wanted to check that all the files can be read without errors, so a simple way I thought of was to tar it into /dev/null. But instead I see the following behavior:

$ time tar cf /dev/null .

real    0m4.387s
user    0m3.462s
sys     0m0.185s
$ time tar cf - . > /dev/null

real    0m3.130s
user    0m3.091s
sys     0m0.035s
$ time tar cf - . | cat > /dev/null

real    10m32.985s
user    0m1.942s
sys     0m33.764s

I forcibly stopped the third command with Ctrl+C after it had already run for quite a long time. Moreover, while the first two commands were running, the activity indicator of the storage device containing . was nearly always idle. With the third command, the indicator is constantly lit, meaning the device is extremely busy.

So it seems that when tar is able to find out that its output file is /dev/null, i.e. when /dev/null is opened directly as the file handle that tar writes to, the file bodies appear to be skipped. (Adding the v option to tar does print all the files in the directory being tarred.)

So I wonder, why is this so? Is it some kind of optimization? If yes, then why would tar even want to do such a dubious optimization for such a special case?

I'm using GNU tar 1.26 with glibc 2.27 on Linux 4.14.105 amd64.


@Guntram Blohm 2019-04-14 09:51:59

This can happen with a variety of programs. For example, I once saw this behavior when just using cp file /dev/null: instead of giving me an estimate of my disk read speed, the command returned after a few milliseconds.

As far as I remember, that was on Solaris or AIX, but the principle applies to all kinds of unix-y systems.

In the old times, when a program copied a file to somewhere, it'd alternate between read calls that get some data from disk (or whatever the file descriptor is referring to) to memory (with a guarantee everything is there when read returns) and write calls (which take the chunk of memory and send the content to the destination).

However, there are at least two newer ways to achieve the same:

  • Linux has the system calls copy_file_range (not portable to other Unixes at all) and sendfile (somewhat portable; originally intended to send a file to the network, but it can now use any destination). They're intended to optimize transfers; if a program uses one of them, it's easily conceivable that the kernel recognizes that the target is /dev/null and turns the system call into a no-op.

  • Programs can use mmap instead of read to get the file contents. This basically means "make sure the data is there when I try to access that chunk of memory" instead of "make sure the data is there when the system call returns". So a program can mmap the source file, then call write on that chunk of mapped memory. However, as writing to /dev/null doesn't need to access the written data, the "make sure it's there" condition is never triggered, so the file is never read either.

I'm not sure whether GNU tar uses either of these two mechanisms when it detects that it's writing to /dev/null, but they're the reason why any program used to check read speeds should be run with | cat > /dev/null instead of > /dev/null, and why | cat > /dev/null should be avoided in all other cases.
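A minimal sketch of the pipe approach, assuming a POSIX shell and a tar that accepts -C and writes the archive to standard output with cf -. The function name count_archive_bytes is hypothetical, and wc -c stands in for cat so that a byte count is visible; on a pipe, wc -c has to read the data in order to count it.

```shell
# Hypothetical helper: archive directory $1 to stdout and count the bytes.
# tar only sees a pipe here, so it cannot detect a /dev/null destination
# and has to read every file body.
count_archive_bytes() {
    tar cf - -C "$1" . | wc -c
}
```

Called as count_archive_bytes /some/dir, it prints the size of the archive stream. The pipeline's exit status is that of wc, not tar, so read errors have to be spotted in tar's messages on standard error.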

@Peter Cordes 2019-04-14 16:22:28

I think the implication in the GNU tar info page (see other answer) is that it has a special mode for this, which presumably just stats files without opening them. In fact I just checked with tar cf /dev/null foo* on a couple files and yeah, just newfstatat(..., AT_SYMLINK_NOFOLLOW) system calls, not even an open() that might update the atime. But +1 for describing mechanisms where this can happen without having to specially detect it.

@Wayne Conrad 2019-04-14 21:07:50

Should the mmap explanation read "access the read data" instead of "access the written data?"

@Stéphane Chazelas 2019-04-15 08:16:29

See also splice(2) on Linux. Actually, replacing cat > /dev/null with pv -q > /dev/null (which uses splice() on Linux) would likely reduce the overhead. Or dd bs=65536 skip=9999999999 2> /dev/null, or wc -c > /dev/null or tail -c1 > /dev/null...

@muru 2019-04-14 06:45:11

It is a documented optimization:

When the archive is being created to /dev/null, GNU tar tries to minimize input and output operations. The Amanda backup system, when used with GNU tar, has an initial sizing pass which uses this feature.
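To illustrate how a sizing pass might use this, here is a hedged sketch; it is not Amanda's actual invocation, which the quoted documentation does not show. It assumes GNU tar, whose --totals option prints a "Total bytes written" line to standard error, and the function name estimate_archive_size is hypothetical.

```shell
# Hypothetical helper: estimate the archive size of directory $1 without
# writing an archive anywhere. Assumes GNU tar; LC_ALL=C keeps the
# "Total bytes written" message untranslated so sed can match it.
estimate_archive_size() {
    LC_ALL=C tar cf /dev/null --totals -C "$1" . 2>&1 |
        sed -n 's/^Total bytes written: \([0-9][0-9]*\).*/\1/p'
}
```

Called as estimate_archive_size /some/dir, it prints the number of bytes tar reports it would have written. With a tar that lacks --totals, the function prints nothing.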

@Ruslan 2019-04-14 07:00:05

Ah, this wasn't described in the man page I had installed. Should have tried info tar instead...

@Xen2050 2019-04-14 09:10:25

They should really keep the man & info pages in sync; it's practically a bug that they're not.

@Gilles 2019-04-14 09:47:44

@Ruslan With most GNU utilities, the man page only contains a brief summary, basically only good enough when you remember that it has an option to do something but don't remember the option's name. The complete documentation is in a format that doesn't translate well to man pages and is available with info or as HTML in a browser.

@Owen 2019-04-14 11:26:49
