What is an anonymous process in Linux?

use*_*404 1 linux linux-kernel

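I'm trying to understand the difference between the mm and active_mm fields of task_struct, and came across a 20-year-old email from Linus Torvalds that mentions the concept of an "anonymous process":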


 - we have "real address spaces" and "anonymous address spaces". The
   difference is that an anonymous address space doesn't care about the
   user-level page tables at all, so when we do a context switch into an
   anonymous address space we just leave the previous address space
   active.

   [...]

 - "tsk->mm" points to the "real address space". For an **anonymous process**,
   tsk->mm will be NULL, for the logical reason that an **anonymous process**
   really doesn't _have_ a real address space at all.

 - however, we obviously need to keep track of which address space we
   "stole" for such an anonymous user. For that, we have "tsk->active_mm",
   which shows what the currently active address space is.

   The rule is that for a process with a real address space (ie tsk->mm is
   non-NULL) the active_mm obviously always has to be the same as the real
   one.

   For a **anonymous process**, tsk->mm == NULL, and tsk->active_mm is the
   "borrowed" mm while the **anonymous process** is running. When the
   **anonymous process** gets scheduled away, the borrowed address space is
   returned and cleared.

cg9*_*909 5

This is more or less explained in the part of the email you left out.

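The obvious use for an "anonymous address space" is any thread that doesn't need any user mappings. All kernel threads basically fall into this category, but even "real" threads can temporarily say that for some amount of time they are not going to be interested in user space, and that the scheduler might as well try to avoid wasting time on switching the VM state. Currently only the old-style bdflush sync does that.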

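Kernel threads only access kernel memory, so they don't care what is in user-space memory. The "anonymous process" is an optimization for these threads.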

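When the scheduler switches to a kernel thread, it can skip the relatively time-consuming setup of memory mappings and simply keep the previous process's address space active. The kernel part of the address space is mapped the same way in every process, so it makes no difference which mapping is in place while such a task runs.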
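The borrowing described above can be sketched as a small user-space model. This is only a toy under stated assumptions, not the kernel's actual code: `struct mm`, `struct task`, `context_switch`, `run_demo`, the `mm_switches` counter, and the task names are all hypothetical stand-ins that keep only the two fields discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-ins for the kernel structures; only the fields discussed above. */
struct mm { const char *name; };

struct task {
    const char *comm;
    struct mm *mm;        /* real address space; NULL for a kernel thread */
    struct mm *active_mm; /* address space loaded while the task runs */
};

static int mm_switches;   /* simulated switches of user mappings (the costly part) */

/* Simulated switch from prev to next; a model, not the kernel's real code. */
static void context_switch(struct task *prev, struct task *next)
{
    if (next->mm == NULL) {
        /* Anonymous task: borrow whatever prev left loaded, switch nothing. */
        next->active_mm = prev->active_mm;
    } else {
        if (prev->active_mm != next->mm)
            mm_switches++;            /* user mappings really change here */
        next->active_mm = next->mm;   /* rule: active_mm == mm when mm != NULL */
    }
    if (prev->mm == NULL)
        prev->active_mm = NULL;       /* scheduled away: return the borrowed mm */

    assert(next->active_mm != NULL);  /* the model always starts from a user task */
    printf("%s -> %s (active mm: %s)\n",
           prev->comm, next->comm, next->active_mm->name);
}

/* Hypothetical scenario: two user tasks and one kernel thread. */
static int run_demo(void)
{
    struct mm a = { "A" }, b = { "B" };
    struct task shell   = { "shell",   &a,   &a };
    struct task kworker = { "kworker", NULL, NULL };
    struct task editor  = { "editor",  &b,   &b };

    mm_switches = 0;
    context_switch(&shell, &kworker);  /* kworker borrows A */
    context_switch(&kworker, &shell);  /* A is still loaded: nothing to switch */
    context_switch(&shell, &kworker);  /* kworker borrows A again */
    context_switch(&kworker, &editor); /* A -> B: the only real switch */

    assert(kworker.active_mm == NULL); /* borrowed mm was returned and cleared */
    return mm_switches;
}
```

Calling `run_demo()` from a `main` performs four context switches but counts only one change of user mappings, which is the saving the optimization is after.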

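This optimization can also be applied temporarily to a user-space task while it runs kernel-space code, for example while it waits for the sync system call to complete, because the real address space only needs to be restored before returning to user-space code. This no longer seems to be done, at least since the bdflush mentioned in the email was replaced by the pdflush kernel threads.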