By Bahram


2019-04-12 20:07:04 8 Comments

On my CentOS 7.6 system, I created a directory (called many_files) containing 3,000,000 files by running:

for i in {1..3000000}; do echo $i>$i; done;

I am using find to write information about the files in this directory to a file. This is surprisingly fast:

$ time find many_files -printf '%i %y %p\n'>info_file

real    0m6.970s
user    0m3.812s
sys     0m0.904s

Now if I add %M to get the permissions:

$ time find many_files -printf '%i %y %M %p\n'>info_file

real    2m30.677s
user    0m5.148s
sys     0m37.338s

The command takes much longer. This is very surprising to me, since in a C program we can use struct stat to get both the inode number and the permissions of a file, and in the kernel struct inode stores both pieces of information.

My Questions:

  1. What causes this behavior?
  2. Is there a faster way to get file permissions for so many files?

1 comment

@A.B 2019-04-12 21:30:54

The first version only needs to readdir(3)/getdents(2) the directory, provided the filesystem stores the file type in its directory entries (ext4: the filetype feature, shown by tune2fs -l /dev/xxx; xfs: ftype=1, shown by xfs_info /mount/point ...).

The second version additionally has to stat(2) each file, which requires an extra inode lookup and therefore more seeks on the filesystem and device. This can be much slower on a rotating disk when the inodes are not already cached. The stat is not needed when only the name, inode number and file type are requested, because the directory entry is enough:

  The linux_dirent structure is declared as follows:

       struct linux_dirent {
           unsigned long  d_ino;     /* Inode number */
           unsigned long  d_off;     /* Offset to next linux_dirent */
           unsigned short d_reclen;  /* Length of this linux_dirent */
           char           d_name[];  /* Filename (null-terminated) */
                             /* length is actually (d_reclen - 2 -
                                offsetof(struct linux_dirent, d_name)) */
           /*
           char           pad;       // Zero padding byte
           char           d_type;    // File type (only since Linux
                                     // 2.6.4); offset is (d_reclen - 1)
           */
       }

The same information is available through readdir(3):

struct dirent {
    ino_t          d_ino;       /* Inode number */
    off_t          d_off;       /* Not an offset; see below */
    unsigned short d_reclen;    /* Length of this record */
    unsigned char  d_type;      /* Type of file; not supported
                                   by all filesystem types */
    char           d_name[256]; /* Null-terminated filename */
};
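
As a rough illustration of why the first command needs no stat(2), here is a minimal C sketch that prints the inode number, type letter and name of each entry from readdir(3) alone. The function names type_letter and list_without_stat are made up for this example and are not taken from find's source. It assumes a Linux/glibc system where d_type is available, and reports how many entries came back as DT_UNKNOWN (those would still need a stat).

```c
#define _DEFAULT_SOURCE   /* exposes d_type and the DT_* constants on glibc */
#include <assert.h>
#include <dirent.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Map a dirent d_type value to the letter that find's %y prints.
   Returns '?' for DT_UNKNOWN or any unrecognised value, in which case
   a caller would have to fall back to lstat(2). */
char type_letter(unsigned char d_type)
{
    switch (d_type) {
    case DT_REG:  return 'f';
    case DT_DIR:  return 'd';
    case DT_LNK:  return 'l';
    case DT_CHR:  return 'c';
    case DT_BLK:  return 'b';
    case DT_FIFO: return 'p';
    case DT_SOCK: return 's';
    default:      return '?';
    }
}

/* Print "inode type name" for every entry of dir_path using only the
   directory entries; no stat call is made.
   Returns the number of entries with an unknown type, or -1 if the
   directory cannot be opened. */
long list_without_stat(const char *dir_path, FILE *out)
{
    assert(dir_path != NULL && out != NULL);

    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    long unknown = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Skip the "." and ".." entries, which find does not list. */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;

        char letter = type_letter(entry->d_type);
        if (letter == '?')
            unknown++;
        fprintf(out, "%ju %c %s\n", (uintmax_t)entry->d_ino, letter, entry->d_name);
    }
    closedir(dir);
    return unknown;
}
```

Calling list_without_stat("many_files", stdout) from a small main() would produce lines comparable to the first find command. A non-zero return value suggests the filesystem does not fill in d_type for at least some entries.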

I suspected the extra stat calls and confirmed them by comparing (on a smaller sample...) the two outputs of:

strace -o v1 find many_files -printf '%i %y %p\n'>info_file
strace -o v2 find many_files -printf '%i %y %M %p\n'>info_file

On my Linux amd64 kernel 5.0.x, the main difference is:

[...]

 getdents(4, /* 0 entries */, 32768)     = 0
 close(4)                                = 0
 fcntl(5, F_DUPFD_CLOEXEC, 0)            = 4
-write(1, "25499894 d many_files\n25502410 f"..., 4096) = 4096
-write(1, "iles/844\n25502253 f many_files/8"..., 4096) = 4096
-write(1, "096 f many_files/686\n25502095 f "..., 4096) = 4096
-write(1, "es/529\n25501938 f many_files/528"..., 4096) = 4096
-write(1, "1 f many_files/371\n25501780 f ma"..., 4096) = 4096
-write(1, "/214\n25497527 f many_files/213\n2"..., 4096) = 4096
-brk(0x55b29a933000)                     = 0x55b29a933000
+newfstatat(5, "1000", {st_mode=S_IFREG|0644, st_size=5, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "999", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "998", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "997", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "996", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "995", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "994", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "993", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "992", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "991", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "990", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0

[...]

+newfstatat(5, "891", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+write(1, "25499894 d drwxr-xr-x many_files"..., 4096) = 4096
+newfstatat(5, "890", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0

[...]
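
The newfstatat lines in the trace correspond to one fstatat(2) call per directory entry, relative to the open directory and with AT_SYMLINK_NOFOLLOW. A minimal C sketch of that pattern follows. The names mode_string and list_with_stat are invented for this illustration, and the permission string is a simplified version of what %M prints (setuid, setgid and sticky bits are ignored, and only regular files, directories and symlinks get a specific type character).

```c
#define _DEFAULT_SOURCE   /* exposes dirfd(), fstatat() and AT_SYMLINK_NOFOLLOW on glibc */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Render a mode as a 10-character string such as "-rw-r--r--".
   Simplified: special bits are not shown, other file types print '?'. */
void mode_string(mode_t mode, char out[11])
{
    static const char bits[] = "rwxrwxrwx";

    out[0] = S_ISDIR(mode) ? 'd' : S_ISLNK(mode) ? 'l' : S_ISREG(mode) ? '-' : '?';
    for (int i = 0; i < 9; i++)
        out[1 + i] = (mode & (0400 >> i)) ? bits[i] : '-';
    out[10] = '\0';
}

/* Print "inode permissions name" for every entry of dir_path.
   Unlike a readdir-only listing, this issues one fstatat(2) per entry,
   which is the extra per-file cost visible in the strace output.
   Returns the number of entries printed, or -1 if the directory
   cannot be opened. */
long list_with_stat(const char *dir_path, FILE *out)
{
    assert(dir_path != NULL && out != NULL);

    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    int dfd = dirfd(dir);
    long count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;

        struct stat st;
        /* One inode lookup per file; do not follow symlinks. */
        if (fstatat(dfd, entry->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue;   /* entry vanished or is inaccessible; skip it */

        char perms[11];
        mode_string(st.st_mode, perms);
        fprintf(out, "%ju %s %s\n", (uintmax_t)st.st_ino, perms, entry->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```

Timing list_with_stat("many_files", stdout) against a readdir-only loop on the same directory should show a gap of the same kind as the two find commands, though the actual numbers depend on the disk and on how much of the inode cache is warm.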

@mosvy 2019-04-12 21:49:01

Unfortunately, the d_type field of a dir entry is a non-standard feature, only present on Linux and BSD, as mentioned in the readdir(3) manpage. (Though on Linux it is implemented on most filesystems that matter).

@A.B 2019-04-12 21:49:29

@mosvy That's OK, the question is tagged CentOS. But yes, I understand that results may differ on other *nix systems.

@A.B 2019-04-12 21:55:43

Hmm, actually xfs (CentOS's default) support isn't quite clear...

@A.B 2019-04-12 22:05:41

Added how to check whether the filetype feature is present on xfs, in case xfs is in use.

@mosvy 2019-04-12 22:12:36

I think it's supported on xfs: when I was making a testcase for a glibc glob(3) bug that only triggered when the d_type field was absent, I had to use either minixfs or GLOB_ALTDIRFUNC.

@A.B 2019-04-12 22:18:03

Ah yes, the CentOS 7 mkfs.xfs man page says ftype=1 is the default.

@mosvy 2019-04-12 22:33:52

It really is supported on CentOS 7 + xfs. Just tested it.
