如果运行两次，为什么 [find] 运行得非常快？

Question

如果运行两次，为什么 [find] 运行得非常快？

在我的 Ubuntu/Linux 系统的 Dash 中有同一个程序的两个版本。

初始问题

要找到相应的.desktop文件所在的位置，我使用了

find / -type f -name 'Sublime Text.desktop' 2> /dev/null

Run Code Online (Sandbox Code Playgroud)

我的点击率为零，所以我做到了（成功）

find / -type f -name '[s,S]ublime*.desktop' 2> /dev/null

Run Code Online (Sandbox Code Playgroud)

我很惊讶，它在大约三秒钟后完成，因为搜索词应该比第一个大得多。由于它对我来说不是安静的犹太洁食，我再次运行了第一个命令，令我惊讶的是，现在它也只用了大约三秒钟就完成了。

为了验证行为，我启动了第二个 Linux 机器并再次运行第一个命令，但这次使用 time

time find -type f -name 'Sublime Text.desktop' 2> /dev/null

Run Code Online (Sandbox Code Playgroud)

find不仅加快了对相同搜索词的搜索，而且加快了所有搜索（在同一路径内？）。甚至对“未关联”字符串的搜索也不会减慢。

time find / -type f -name 'Emilbus Txet.Potksed' 2> /dev/null

Run Code Online (Sandbox Code Playgroud)

在使用 find 之前和之后分析 RAM

find 怎么做才能如此疯狂地加快搜索过程？

Answer 1

ben*_*ofs 7

第二次 find 更快的原因是 linux 做了文件缓存。每当第一次访问文件时，它都会将文件的内容保存在内存中（当然，只有当您有可用的可用 RAM 时才会这样做）。如果稍后再次读取该文件，则它只需从内存中获取内容，而无需再次读取该文件。因为内存访问比磁盘访问快得多，所以这会提高整体性能。

所以发生的情况是，首先find，大部分文件还没有在内存中，所以 linux 必须做很多磁盘操作。这很慢，所以需要一些时间。

find再次执行时，大部分文件和目录已经在内存中，而且速度要快得多。

如果您清除两次查找执行之间的缓存，您可以自己测试一下。那么第二个查找不会比第一个更快。这是它在我的系统上的外观：

# This clears the cache. Be careful through, you might loose some data (Although this shouldn't happen, it's better to be sure)
$ sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
3

$ time find /usr/lib -name "lib*"
find /usr/lib/ -name "lib*"  0,47s user 1,41s system 8% cpu 21,435 total

# Now the file names are in the cache. The next find is very fast:
$ time find /usr/lib -name "lib*"
find /usr/lib/ -name "lib*"  0,19s user 0,28s system 69% cpu 0,673 total

# If we clear the cache, the time goes back to the starting time again
$ sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
3
$ time find /usr/lib -name "lib*"
find /usr/lib/ -name "lib*"  0,39s user 1,45s system 10% cpu 16,866 total

Run Code Online (Sandbox Code Playgroud)

正确，`find` 不访问文件的 _contents_，只访问目录结构和文件名，所以它只缓存那部分。在一般实践中，每次通过相关内核调用读取文件时，Linux 将首先检查 RAM 以查看它是否存在，如果不存在：实际从磁盘读取它（通常将其存储在 RAM 中以备将来查找）。可以直接看到，如果从未找到缓存文件，这种额外的查找必定会带来一定的性能损失，但是当您遇到缓存命中时，可观的“利润”远远超过这一点。 (2认同)

归档时间：	12 年，2 月前
查看次数：	1492 次
最近记录：	12 年，2 月前