gsa*_*ras 3 c linux fork low-level systems-programming
In a terminal I can call ls -d */. Now I want a C program to do that for me, like this:
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>
int main( void )
{
    int status;
    char *args[] = { "/bin/ls", "-l", NULL };
    if ( fork() == 0 )
        execv( args[0], args );
    else
        wait( &status );
    return 0;
}
This runs ls -l on everything. However, when I try:
char *args[] = { "/bin/ls", "-d", "*/", NULL };
I get a runtime error:
ls: */: No such file or directory
Pet*_*des 10
The lowest-level way to do this is with the same Linux system calls that ls uses.
So look at the output of strace -efile,getdents ls:
execve("/bin/ls", ["ls"], [/* 72 vars */]) = 0
...
openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents(3, /* 23 entries */, 32768) = 840
getdents(3, /* 0 entries */, 32768) = 0
...
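getdents is a Linux-specific system call. Its man page says that it is used under the hood by libc's readdir(3) POSIX API function.
The lowest-level portable way (portable to POSIX systems) is to use the libc functions to open a directory and read its entries. Unlike for non-directory files, POSIX doesn't specify the exact system call interface.
These functions: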
DIR *opendir(const char *name);
struct dirent *readdir(DIR *dirp);
can be used like this:
// Print all directories, and symlinks to directories, in the CWD.
// Like sh -c 'ls -1UF -d */' (single-column output, no sorting, append a / to dir names),
// except that names starting with a dot (including . and ..) are printed too.
// Tested and works on Linux, with / without working d_type.
#define _GNU_SOURCE    // includes _BSD_SOURCE for DT_UNKNOWN etc.
#include <dirent.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    // POSIX doesn't require this to be a plain file descriptor.
    // Linux uses open(".", O_DIRECTORY); to implement this.
    DIR *dirhandle = opendir(".");
    if (!dirhandle) {
        perror("opendir");
        return EXIT_FAILURE;
    }

    struct dirent *de;
    while ((de = readdir(dirhandle)) != NULL) {    // NULL means end of directory (or an error)
        _Bool is_dir;
#ifdef _DIRENT_HAVE_D_TYPE
        if (de->d_type != DT_UNKNOWN && de->d_type != DT_LNK) {
            // don't have to stat if we have d_type info, unless it's a symlink (since we stat, not lstat)
            is_dir = (de->d_type == DT_DIR);
        } else
#endif
        {   // the only method if d_type isn't available,
            // otherwise this is a fallback for FSes where the kernel leaves it DT_UNKNOWN.
            struct stat stbuf;
            // stat follows symlinks, lstat doesn't.
            if (stat(de->d_name, &stbuf) != 0)
                continue;    // e.g. a dangling symlink: skip it rather than read an unset stbuf
            is_dir = S_ISDIR(stbuf.st_mode);
        }

        if (is_dir) {
            printf("%s/\n", de->d_name);
        }
    }

    closedir(dirhandle);
    return EXIT_SUCCESS;
}
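There is also a fully compilable example of reading directory entries and printing file info in the Linux stat(3posix) man page. (Not the Linux stat(2) man page; that one has a different example.)
The man page for readdir(3) says the Linux declaration of struct dirent is: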
struct dirent {
ino_t d_ino; /* inode number */
off_t d_off; /* not an offset; see NOTES */
unsigned short d_reclen; /* length of this record */
unsigned char d_type; /* type of file; not supported
by all filesystem types */
char d_name[256]; /* filename */
};
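d_type is either DT_UNKNOWN, in which case you need to stat to learn whether the directory entry is itself a directory, or it is DT_DIR or something else, in which case you can be sure it is or isn't a directory without having to stat it.
Some filesystems, like EXT4 I think, and very recent XFS (with the new metadata version), keep type info in the directory, so it can be returned without having to load the inode from disk. This is a huge speedup for find -name: it doesn't have to stat anything to recurse through subdirectories. But for filesystems that don't do this, d_type will always be DT_UNKNOWN, because filling it in would require reading all the inodes (which might not even be loaded from disk).
Sometimes you're just matching on filenames and don't need type info, so it would be bad if the kernel spent a lot of extra CPU time (or especially I/O time) filling in d_type when that isn't cheap. d_type is just a performance shortcut; you always need a fallback (except maybe when writing for an embedded system where you know what FS you're using and that it always fills in d_type, and you have some way to detect the breakage when someone later tries to use the code on another FS type).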
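Unfortunately, all solutions based on shell expansion are limited by the maximum command line length, which varies (run true | xargs --show-limits to find out); on my system, it is about two megabytes. Yes, many will argue that this is enough, just as Bill Gates once did about 640 kilobytes.
(When running some parallel simulations on non-shared filesystems, I occasionally have tens of thousands of files in the same directory during the collection phase. Yes, I could do that differently, but it happens to be the easiest and most robust way to collect the data. Very few POSIX utilities are actually silly enough to assume "X is sufficient for everybody".)
Fortunately, there are several solutions. One is to use find instead: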
system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d");
You can also format the output however you wish, independent of the locale:
system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d -printf '%p\n'");
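If you want the output sorted, use \0 as the separator (since filenames are allowed to contain newlines), and pass -z to sort so that it uses \0 as the separator, too. tr will convert them to newlines for you. Note that the backslashes must be doubled inside the C string literal; a bare \0 would terminate the string:
system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d -printf '%p\\0' | sort -z | tr -s '\\0' '\\n'");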
If you want the names in an array, use the glob() function instead.
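Finally, as I like to harp on every now and then, one can use the POSIX nftw() function to implement this internally (the FTW_ACTIONRETVAL flag and the FTW_SKIP_SUBTREE return value used below are GNU extensions, hence the _GNU_SOURCE):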
#define _GNU_SOURCE
#include <stdio.h>
#include <ftw.h>
#define NUM_FDS 17

int myfunc(const char *path,
           const struct stat *fileinfo,
           int typeflag,
           struct FTW *ftwinfo)
{
    const char *file = path + ftwinfo->base;
    const int depth = ftwinfo->level;

    /* We are only interested in first-level directories.
       Note that depth==0 is the directory itself specified as a parameter.
    */
    if (depth != 1 || (typeflag != FTW_D && typeflag != FTW_DNR))
        return 0;

    /* Don't list names starting with a . */
    if (file[0] != '.')
        printf("%s/\n", path);

    /* Do not recurse. */
    return FTW_SKIP_SUBTREE;
}
and the nftw() call to use the above is obviously something like
if (nftw(".", myfunc, NUM_FDS, FTW_ACTIONRETVAL)) {
/* An error occurred. */
}
The only "issue" in using nftw() is to choose a good number of file descriptors the function may use (NUM_FDS). POSIX says a process must always be able to have at least 20 open file descriptors. If we subtract the standard ones (input, output, and error), that leaves 17. The above is unlikely to use more than 3, though.
You can find the actual limit using sysconf(_SC_OPEN_MAX), and subtracting the number of descriptors your process may use at the same time. In current Linux systems, it is typically limited to 1024 per process.
The good thing is, as long as that number is at least 4 or 5 or so, it only affects the performance: it just determines how deep nftw() can go in the directory tree structure, before it has to use workarounds.
If you want to create a test directory with lots of subdirectories, use something like the following Bash:
mkdir lots-of-subdirs
cd lots-of-subdirs
for ((i=0; i<100000; i++)); do mkdir directory-$i-has-a-long-name-since-command-line-length-is-limited ; done
On my system, running
ls -d */
in that directory yields a bash: /bin/ls: Argument list too long error, while the find command and the nftw() based program both run just fine.
You also cannot remove the directories using rmdir directory-*/ for the same reason. Use
find . -name 'directory-*' -type d -print0 | xargs -r0 rmdir
instead. Or just remove the entire directory and subdirectories,
cd ..
rm -rf lots-of-subdirs