If we have hardware-assisted virtualization, what is the purpose of paravirtualization?

pan*_*nxl 9 virtualization virtual-machines qemu paravirtualization kvm-virtualization

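I am studying QEMU/KVM and Firecracker/KVM. As I understand it, both Firecracker and QEMU talk to KVM and ultimately benefit from hardware-assisted virtualization by switching the CPU between guest mode and host mode.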

  1. In the CPU's guest mode, the guest can even execute its privileged instructions directly, so why do we still need paravirtualization?

  2. In Firecracker, only 5 devices are emulated, for example

  • virtio-net,
  • virtio-block,
  • virtio-vsock, etc.

Even in this minimalist design, we still have to ship paravirtualized drivers. Can't we rely on hardware-assisted virtualization alone?

Aus*_*arn 11

Just consider the case of networking.

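To be truly useful in most cases, a VM needs to be able to communicate over a network. For that, the guest obviously has to see some kind of network interface. But VT-x, AMD-V, ARM VHE, and pretty much every other hardware virtualization implementation do not provide a NIC; they only give you a way to securely isolate and partition CPU resources. So hardware virtualization does not give you a network interface.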

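Now, you could pass a physical network interface through from the host system, but this has a number of problems:

  • It requires special handling by the host OS at boot to ensure that it does not actually bind a driver to that interface.
  • It requires special hardware support to be done safely (you need an IOMMU, and the NIC you want to pass through must support operating behind an IOMMU).
  • It effectively requires one available network interface per VM. That obviously means anyone on a laptop is out of luck, but it also means most large VM hosting providers are out of luck (they may be running dozens of VMs on the same host).
  • Without RDMA hardware and a lot of extra complexity, it makes live migration functionally impossible (just like any other direct device passthrough).
  • When VMs need to talk to each other, it inherently introduces an external point of failure (because they have to send traffic through a network switch outside of the system).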

So you obviously need to emulate a network interface somehow. The obvious choice would be to just pick a commonly used physical NIC and emulate that. But that has its own set of issues:

  • A lot of what a physical NIC does is rather computationally expensive to emulate. It's only efficient in a physical implementation because it's using physical logic and ASICs.
  • A majority of the same stuff that's expensive to emulate isn't even needed for a VM, but you can't avoid emulating it because the drivers will expect it to work.
  • A lot of additional complexity is needed in the guest drivers to support this stuff that isn't even giving any real benefit. For perspective, the Intel e1000 (a commonly emulated physical NIC) driver for Linux is about 17k lines of code, while the virtio-net driver is only 4.8k (7.3k if you include the virtio-pci components that it is probably using on x86 systems).

virtio-net solves those issues: it covers only what is actually needed to move network packets between the guest OS and the host networking layer, and nothing more. Solving those issues provides a huge performance improvement. I have not tested recently, but the last time I compared the two using QEMU, virtio-net provided more than twice the effective bandwidth of an emulated e1000 card and roughly 1/10th of the latency, all with lower CPU usage on the host side.
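To make the "only what is needed to move packets" idea more concrete, here is a minimal sketch of the shared-ring pattern that virtio queues are built around: the guest places buffers in memory shared with the host and then notifies the host once. This is a toy model, not the real virtio data layout; `ToyVirtqueue`, `send_batch`, and the `kicks` counter are names invented for this illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyVirtqueue:
    """Toy stand-in for a virtio queue (illustrative names, not the real layout)."""
    size: int = 8
    avail: deque = field(default_factory=deque)  # buffers offered by the guest
    used: deque = field(default_factory=deque)   # completions reported by the host
    kicks: int = 0  # guest -> host notifications (each would cost a VM exit)

    # ---- guest side ----
    def add_buffer(self, packet: bytes) -> bool:
        """Place a packet in shared memory; no trap into the host is needed."""
        if len(self.avail) >= self.size:
            return False  # ring is full until the host drains it
        self.avail.append(packet)
        return True

    def kick(self) -> None:
        """Tell the host that new buffers are available."""
        self.kicks += 1

    # ---- host side ----
    def host_drain(self) -> list:
        """Host backend consumes every pending buffer in one pass."""
        sent = []
        while self.avail:
            packet = self.avail.popleft()
            sent.append(packet)
            self.used.append(len(packet))  # report completion to the guest
        return sent


def send_batch(queue: ToyVirtqueue, packets: list) -> list:
    """Guest queues a batch (up to the ring size), then notifies the host once."""
    for packet in packets:
        if not queue.add_buffer(packet):
            break  # toy behaviour: stop when the ring is full
    queue.kick()
    return queue.host_drain()
```

For example, `send_batch(ToyVirtqueue(), [b"a", b"bb", b"ccc"])` hands all three packets to the host side while incrementing `kicks` only once. The point of the sketch is that the interface is just a queue plus a notification, with none of the register-level behaviour a physical NIC driver expects.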

The same logic applies for most other devices. Some stuff can be emulated relatively inexpensively or is not performance critical, and thus doesn't need to be efficient. This is why there is no VirtIO watchdog timer, for example: it's not performance critical, and it's trivial to emulate in most cases. But for most things that don't fit those criteria there is a paravirt option, because the performance difference is huge and the reduced complexity tends to make things more reliable.

And sometimes paravirtualization lets you do things you couldn't really do with 'regular' hardware. VirtIOFS and the VirtIO transport for 9P2000 are prime examples of this: they have no hardware analogues, but they provide a reasonably efficient way to share files between the host and guest without needing to emulate a network or a block device.

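  • I had never realized that hardware-assisted virtualization only helps at the CPU level and does nothing for device interfaces. Thanks for the clarification! (2 upvotes)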

sho*_*hok 6

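Short answer: when augmented with guest-side drivers, modern hardware virtualization uses some form of paravirtualization for the most performance-intensive operations. So modern hypervisors combine dedicated hardware support with paravirtualization.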

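Long version: the original x86 architecture was difficult to virtualize correctly while running unmodified operating systems. One solution was to dynamically translate the problematic code fragments, recompiling them on the fly. This let unmodified guests run, but the downside was the high overhead and complexity of the translator itself. In addition, each type of guest kernel needed special handling.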
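As a rough illustration of the dynamic translation idea, the sketch below rewrites a block of instructions so that sensitive ones are replaced by calls into the virtual machine monitor. It is a toy that works on mnemonic strings, not a real binary translator. The instructions in `SENSITIVE` are well-known examples of x86 instructions that behave differently outside ring 0 without trapping; the `emulate_*` targets and the function names are hypothetical.

```python
# A few x86 instructions that are sensitive but do not trap in user mode.
SENSITIVE = {"popf", "pushf", "sgdt", "sidt", "smsw"}

# Cache of already translated blocks, keyed by guest address.
_translation_cache = {}


def translate_block(block):
    """Return a copy of the block with sensitive instructions replaced."""
    translated = []
    for insn in block:
        mnemonic = insn.split()[0]
        if mnemonic in SENSITIVE:
            # Hypothetical helper inside the monitor that emulates the effect.
            translated.append(f"call emulate_{mnemonic}")
        else:
            translated.append(insn)  # harmless instructions run unchanged
    return translated


def fetch_translated(addr, block):
    """Translate a block the first time it is seen, then reuse the result."""
    if addr not in _translation_cache:
        _translation_cache[addr] = translate_block(block)
    return _translation_cache[addr]
```

Even in this toy form, the translator has to inspect every instruction before the guest may run it, which hints at where the overhead and complexity mentioned above come from.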

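So projects started modifying the Linux kernel to avoid the difficult cases, i.e.: when an instruction/call is not easy to virtualize, let's make a hypercall into the underlying hypervisor instead. Hypercalls are the basis of paravirtualization, and you can think of them as the equivalent of a userspace syscall. In other words, the underlying Linux kernel is the host OS, and another Linux kernel runs as a guest "userspace" application.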
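The syscall analogy can be sketched as a numbered dispatch table: the guest kernel does not touch privileged state itself, it asks the hypervisor to do so. Everything below is a toy model; the hypercall numbers, `ToyHypervisor`, and `ParavirtGuestKernel` are invented for illustration and do not correspond to any real hypervisor's ABI.

```python
# Hypercall numbers are made up for this sketch.
HC_CONSOLE_WRITE = 0
HC_UPDATE_PAGE_TABLE = 1


class ToyHypervisor:
    """Hypothetical hypervisor exposing a small hypercall table."""

    def __init__(self):
        self.console = []
        self.page_tables = {}
        self._table = {
            HC_CONSOLE_WRITE: self._console_write,
            HC_UPDATE_PAGE_TABLE: self._update_page_table,
        }

    def hypercall(self, number, *args):
        """Single entry point, dispatched by number like a syscall table."""
        handler = self._table.get(number)
        if handler is None:
            return -1  # unknown hypercall, similar in spirit to -ENOSYS
        return handler(*args)

    def _console_write(self, text):
        self.console.append(text)
        return 0

    def _update_page_table(self, vaddr, paddr):
        # The hypervisor validates and applies the mapping on the guest's behalf.
        self.page_tables[vaddr] = paddr
        return 0


class ParavirtGuestKernel:
    """Guest kernel modified to ask the hypervisor instead of acting directly."""

    def __init__(self, hypervisor):
        self.hv = hypervisor

    def map_page(self, vaddr, paddr):
        # An unmodified kernel would write the page tables itself here.
        return self.hv.hypercall(HC_UPDATE_PAGE_TABLE, vaddr, paddr)

    def log(self, text):
        return self.hv.hypercall(HC_CONSOLE_WRITE, text)
```

The modification to the guest is visible in `map_page`: the kernel source has to change so that the hard-to-virtualize operation becomes an explicit request.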

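This approach minimizes virtualization overhead, but it requires a modified guest kernel to work. Your standard Windows installation would not run. So full hardware virtualization was introduced, in which an additional privilege ring was added to the microprocessor. This allows the host OS to run at a dedicated privilege level (i.e.: -1, hypervisor space) while the guest OS runs unmodified in ring 0 (i.e.: 0, kernel space). However, virtual device emulation remained problematic and/or had high overhead, so custom guest drivers were created. These drivers reintroduced targeted paravirtualization for the most performance-sensitive devices (notably disk and network). That brings us to the current situation, where hardware-assisted hypervisors are augmented with targeted paravirtualized drivers.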