Subject: [PATCH 2/2] virt: vmgenid: add support for generation counter
Date: Wed, 3 Aug 2022 17:21:27 +0200
Message-ID: <20220803152127.48281-3-bchalios@amazon.es>
In-Reply-To: <20220803152127.48281-1-bchalios@amazon.es>
References: <20220803152127.48281-1-bchalios@amazon.es>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Babis Chalios

VM Generation ID provides a means of reseeding the kernel's RNG using a
128-bit UUID when a VM fork occurs, thus avoiding issues running multiple
VMs with the exact same RNG state. However, user-space applications, such
as user-space PRNGs and applications that maintain world-unique data, need
a mechanism to handle VM fork events as well.

To handle the user-space use case, this qemu patch extends Microsoft's
original vmgenid specification by adding an extra page that holds a single
32-bit generation counter, which increases every time a VM gets restored
from a snapshot.

This patch exposes the generation counter through a character device
(`/dev/vmgenid`) that provides a `read` and `mmap` interface for user-space
applications to consume. User-space applications should read this value
before starting a transaction involving cached random bits and ensure that
it has not changed while committing the transaction.

It can be used from qemu by passing the `-device vmgenid,guid=auto,genctr=42`
parameter to start a VM with a generation counter of 42. Reading 4 bytes
from `/dev/vmgenid` will return the value 42. Next, use
`savevm my_snapshot` in the monitor to snapshot the VM.
Now, start another VM using `-device vmgenid,guid=auto,genctr=43 -loadvm my_snapshot`.
Reading from `/dev/vmgenid` now returns 43.

Signed-off-by: Babis Chalios
Reported-by: kernel test robot
---
 Documentation/virt/vmgenid.rst | 120 +++++++++++++++++++++++++++++++++
 drivers/virt/vmgenid.c         | 103 +++++++++++++++++++++++++++-
 2 files changed, 221 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/virt/vmgenid.rst

diff --git a/Documentation/virt/vmgenid.rst b/Documentation/virt/vmgenid.rst
new file mode 100644
index 000000000..61c29e4a7
--- /dev/null
+++ b/Documentation/virt/vmgenid.rst
@@ -0,0 +1,120 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======
+VMGENID
+=======
+
+The VM Generation ID (VMGENID) is a feature from Microsoft
+(https://go.microsoft.com/fwlink/?LinkId=260709) supported by multiple
+hypervisor vendors.
+
+Its purpose is to help tackle issues caused by the duplication of the state
+of a Virtual Machine (VM) during events that cause a VM to "return back in
+time", like snapshot and restore. It exposes a generation ID inside the VM so
+that applications that rely on world-wide unique or random data can check if
+that value has changed before committing transactions.
+
+Problem Definition
+------------------
+
+Often in its lifetime, a VM will get snapshotted and later it will be restored
+to that previous state. Moreover, one or more new VMs can be spawned from this
+snapshot. Both scenarios result in one or more VMs running with the same RNG
+state, which makes early operations after restore that rely on randomness
+predictable, and thus insecure (TLS, for example).
+
+Userspace PRNGs, as well as code that caches streams of random bits to speed
+up latency-critical applications, suffer from similar issues.
+
+Apart from concerns related to cryptography, userspace applications operating
+with (what they consider to be) unique data, such as UUIDs, are affected by
+the spawning of multiple VMs from the same snapshot.
+
+VMGENID tackles the issue by providing a unique (not random) 128-bit
+identifier every time a VM is restored from a snapshot. The identifier is used
+to reseed the kernel's RNG, ensuring that different VMs spawned from the same
+snapshot will observe different streams of random data.
+
+Notice that VMGENID does not eliminate the problem, but it significantly reduces
+the window in which the system's RNG will produce identical data across
+different VMs.
+
+Reseeding the kernel's RNG tackles the issue of duplicated random values
+provided by the kernel; however, it does little to address the issue of
+userspace applications that use world-unique data. The UUID defined by the
+original VMGENID specification is used to reseed the RNG, so it cannot be
+exposed to userspace. Applications of this class need a separate API which
+they can consume in order to detect VM restore events and adapt accordingly.
+
+To that end, VMGENID has been extended to expose to userspace an additional
+32-bit generation counter, which acts as a notification mechanism for restore
+events. The value of the counter after a VM restore will be different from
+its value when the snapshot was taken, which signals to userspace that
+a VM restore has occurred.
+
+VMGENID in Linux
+----------------
+
+The Linux kernel uses the 128-bit UUID of VMGENID to reseed the RNG every time
+an ACPI notification arrives. Moreover, it exposes the 32-bit generation counter
+through a character device, ``/dev/vmgenid``. The device supports ``read()``
+and ``mmap()`` for userspace applications to monitor restore events:
+
+``read()``:
+A read always returns the first 4 bytes of the page, which hold the generation
+counter. Partial reads and reads at an offset other than 0 are not allowed and
+return ``EINVAL``.
+
+``mmap()``:
+It maps a single page into the address space of the userspace application. The
+driver supports ``PROT_READ`` and ``MAP_SHARED``. Mapping with ``PROT_WRITE``
+will result in ``EPERM``, whereas mapping past the first page will result in
+``EINVAL``.
+
+A userspace application that caches random bits from the kernel should ensure
+that, at the moment it actually wants to consume some of these bits, the value
+of the generation counter equals its value when the bits were initially cached.
+For example:
+
+.. code-block:: c
+
+    const volatile uint32_t *gen_cntr = mmaped_gen_counter();
+    /* Counter value observed when the cache was last (re)built. */
+    uint32_t cached_gen_cntr = *gen_cntr;
+    char *secret;
+
+    for (;;) {
+        secret = get_secret();
+
+        /* All good, no restore has happened. */
+        if (cached_gen_cntr == *gen_cntr)
+            break;
+
+        /* The generation counter has changed: rebuild caches and retry. */
+        cached_gen_cntr = *gen_cntr;
+        barrier();
+
+        /* Recreate the secrets' cache. */
+        rebuild_cache();
+    }
+
+    consume_secret(secret);
+
+
+The driver for VMGENID lives under ``drivers/virt/vmgenid.c``.
+
+Using VMGENID
+-------------
+
+https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/specs/vmgenid.txt;hb=refs/heads/master
+describes how the VMGENID device can be used. First, we start a VM passing the
+parameter ``-device vmgenid,guid=auto,genctr=42``. With this, the UUID value of
+VMGENID will be populated with a UUID created by qemu and a generation counter
+of 42. Next, we can save the VM state from the monitor using the ``savevm``
+command.
+
+Now, we can start another VM from the same snapshot using the ``-device
+vmgenid,guid=auto,genctr=43 -loadvm {snapshot}`` options. This will update the
+UUID with a new value generated by qemu and set the generation counter to 43 in
+memory before resuming the vcpus, and then send an appropriate ACPI notification
+to the guest.
diff --git a/drivers/virt/vmgenid.c b/drivers/virt/vmgenid.c
index 0cc2fe0f4..1cb0b3560 100644
--- a/drivers/virt/vmgenid.c
+++ b/drivers/virt/vmgenid.c
@@ -11,6 +11,10 @@
 #include
 #include
 #include
+#include <linux/container_of.h>
+#include
+#include
+#include
 
 ACPI_MODULE_NAME("vmgenid");
 
@@ -19,6 +23,69 @@ enum { VMGENID_SIZE = 16 };
 struct vmgenid_state {
 	u8 *next_id;
 	u8 this_id[VMGENID_SIZE];
+
+	phys_addr_t gen_cntr_addr;
+	u32 *next_counter;
+
+	int misc_enabled;
+	struct miscdevice misc;
+};
+
+static int vmgenid_mmap(struct file *filep, struct vm_area_struct *vma)
+{
+	struct vmgenid_state *state = filep->private_data;
+
+	if (vma->vm_pgoff || vma_pages(vma) > 1)
+		return -EINVAL;
+
+	if (vma->vm_flags & VM_WRITE)
+		return -EPERM;
+
+	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+	vma->vm_flags &= ~VM_MAYWRITE;
+
+	return vm_iomap_memory(vma, state->gen_cntr_addr, PAGE_SIZE);
+}
+
+static ssize_t vmgenid_read(struct file *filep, char __user *buff, size_t count,
+			    loff_t *offp)
+{
+	struct vmgenid_state *state = filep->private_data;
+
+	if (count == 0)
+		return 0;
+
+	/* We don't allow partial reads or reads at a non-zero offset */
+	if (count != sizeof(u32) || *offp)
+		return -EINVAL;
+
+	if (put_user(READ_ONCE(*state->next_counter), (u32 __user *)buff))
+		return -EFAULT;
+
+	return sizeof(u32);
+}
+
+static int vmgenid_open(struct inode *inode, struct file *filep)
+{
+	struct vmgenid_state *state =
+		container_of(filep->private_data, struct vmgenid_state, misc);
+
+	filep->private_data = state;
+	return 0;
+}
+
+static const struct file_operations fops = {
+	.owner = THIS_MODULE,
+	.open = vmgenid_open,
+	.read = vmgenid_read,
+	.mmap = vmgenid_mmap,
+	.llseek = noop_llseek,
+};
+
+static struct miscdevice vmgenid_misc = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "vmgenid",
+	.fops = &fops,
 };
 
 static int parse_vmgenid_address(struct acpi_device *device, acpi_string object_name,
@@ -57,7 +124,7 @@ static int vmgenid_add(struct acpi_device *device)
 	phys_addr_t phys_addr;
 	int ret;
 
-	state = devm_kmalloc(&device->dev, sizeof(*state), GFP_KERNEL);
+	state = devm_kzalloc(&device->dev, sizeof(*state), GFP_KERNEL);
 	if (!state)
 		return -ENOMEM;
 
@@ -74,6 +141,27 @@ static int vmgenid_add(struct acpi_device *device)
 
 	device->driver_data = state;
 
+	/* Backwards compatibility: if CTRA is not present, we just don't expose
+	 * the char device.
+	 */
+	ret = parse_vmgenid_address(device, "CTRA", &state->gen_cntr_addr);
+	if (ret)
+		return 0;
+
+	state->next_counter = devm_memremap(&device->dev, state->gen_cntr_addr,
+					    sizeof(u32), MEMREMAP_WB);
+	if (IS_ERR(state->next_counter))
+		return 0;
+
+	memcpy(&state->misc, &vmgenid_misc, sizeof(state->misc));
+	ret = misc_register(&state->misc);
+	if (ret) {
+		devm_memunmap(&device->dev, state->next_counter);
+		return 0;
+	}
+
+	state->misc_enabled = 1;
+
 	return 0;
 }
 
@@ -89,6 +177,16 @@ static void vmgenid_notify(struct acpi_device *device, u32 event)
 	add_vmfork_randomness(state->this_id, sizeof(state->this_id));
 }
 
+static int vmgenid_remove(struct acpi_device *device)
+{
+	struct vmgenid_state *state = device->driver_data;
+
+	if (state->misc_enabled)
+		misc_deregister(&state->misc);
+
+	return 0;
+}
+
 static const struct acpi_device_id vmgenid_ids[] = {
 	{ "VMGENCTR", 0 },
 	{ "VM_GEN_COUNTER", 0 },
@@ -101,7 +199,8 @@ static struct acpi_driver vmgenid_driver = {
 	.owner = THIS_MODULE,
 	.ops = {
 		.add = vmgenid_add,
-		.notify = vmgenid_notify
+		.notify = vmgenid_notify,
+		.remove = vmgenid_remove
 	}
 };
 
-- 
2.32.1 (Apple Git-133)