[PATCH v1 00/28] Introduce support for confidential guest reset

Ani Sinha posted 28 patches 1 month, 4 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20251212150359.548787-1-anisinha@redhat.com
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>, "Michael S. Tsirkin" <mst@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>, David Woodhouse <dwmw2@infradead.org>, Paul Durrant <paul@xen.org>, Bernhard Beschow <shentey@gmail.com>, Alex Williamson <alex@shazbot.org>, "Cédric Le Goater" <clg@redhat.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Peter Xu <peterx@redhat.com>, David Hildenbrand <david@kernel.org>, Peter Maydell <peter.maydell@linaro.org>, Marcelo Tosatti <mtosatti@redhat.com>, Zhao Liu <zhao1.liu@intel.com>, Song Gao <gaosong@loongson.cn>, Huacai Chen <chenhuacai@kernel.org>, Aurelien Jarno <aurelien@aurel32.net>, Jiaxun Yang <jiaxun.yang@flygoat.com>, Aleksandar Rikalo <arikalo@gmail.com>, Nicholas Piggin <npiggin@gmail.com>, Harsh Prateek Bora <harshpb@linux.ibm.com>, Chinmay Rath <rathc@linux.ibm.com>, Palmer Dabbelt <palmer@dabbelt.com>, Alistair Francis <alistair.francis@wdc.com>, Weiwei Li <liwei1518@gmail.com>, Daniel Henrique Barboza <dbarboza@ventanamicro.com>, Liu Zhiwei <zhiwei_liu@linux.alibaba.com>, Halil Pasic <pasic@linux.ibm.com>, Christian Borntraeger <borntraeger@linux.ibm.com>, Eric Farman <farman@linux.ibm.com>, Matthew Rosato <mjrosato@linux.ibm.com>, Ilya Leoshkevich <iii@linux.ibm.com>, Thomas Huth <thuth@redhat.com>
There is a newer version of this series
[PATCH v1 00/28] Introduce support for confidential guest reset
Posted by Ani Sinha 1 month, 4 weeks ago
This change introduces support for confidential guests
(SEV-ES, SEV-SNP and TDX) to reset/reboot just like non-confidential
guests. Currently, a reboot initiated from the confidential guest results
in termination of the QEMU hypervisor as the CPUs are not resettable. As the
initial state of the guest including private memory is locked and encrypted,
the contents of that memory will not be accessible post reset. Hence a new
KVM file descriptor must be opened to create a new confidential VM context,
closing the old one. All KVM VM specific ioctls must be called again. New
VCPU file descriptors must be created against the new KVM fd and most VCPU
ioctls must be called again as well.

This change closes the old KVM fd and creates a new one. After
the new KVM fd is opened, all generic and architecture specific ioctl calls
are issued again. Notifiers are added to notify subsystems that:
- The KVM fd is about to be changed, so state syncing from KVM to QEMU
  should be done if required.
- The KVM fd has changed, so ioctl calls to the new KVM fd have to be
  performed again.
- New VCPU fds have been created, so VCPU ioctl calls must be issued again
  where required.

Specific subsystems use these notifiers to re-issue ioctl calls where required.
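
As a rough illustration of the three notifications listed above, here is a
minimal, self-contained sketch. It does not use QEMU's Notifier API or any
code from this series: every name in it (ResetEvent, reset_notifier_add,
simulate_reset, DemoDevice) is hypothetical, and the fd values are plain
integers rather than real KVM descriptors. It only shows the ordering a
subsystem could rely on: save state before the fd changes, then re-issue VM
level setup, then re-issue VCPU level setup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds, one per notification in the cover letter. */
typedef enum {
    VMFD_PRE_CHANGE,  /* KVM fd about to change: sync state from KVM to QEMU */
    VMFD_CHANGED,     /* KVM fd has changed: re-issue VM level ioctls */
    VCPUFD_CHANGED    /* new VCPU fds exist: re-issue VCPU level ioctls */
} ResetEvent;

typedef void (*ResetNotifyFn)(ResetEvent ev, int new_fd, void *opaque);

typedef struct {
    ResetNotifyFn fn;
    void *opaque;
} ResetNotifier;

#define MAX_NOTIFIERS 16
static ResetNotifier notifiers[MAX_NOTIFIERS];
static size_t n_notifiers;

/* Register a callback; returns 0 on success, -1 if the table is full. */
int reset_notifier_add(ResetNotifyFn fn, void *opaque)
{
    if (n_notifiers == MAX_NOTIFIERS) {
        return -1;
    }
    notifiers[n_notifiers].fn = fn;
    notifiers[n_notifiers].opaque = opaque;
    n_notifiers++;
    return 0;
}

/* Deliver one event to every registered subsystem, in registration order. */
static void notify_all(ResetEvent ev, int new_fd)
{
    for (size_t i = 0; i < n_notifiers; i++) {
        notifiers[i].fn(ev, new_fd, notifiers[i].opaque);
    }
}

/* Hypothetical subsystem that records what it was told, and in what order. */
typedef struct {
    int state_saved;  /* set once the pre-change event has been seen */
    int vm_fd;        /* fd that VM level setup was re-issued against */
    int vcpu_fd;      /* fd that VCPU level setup was re-issued against */
    int order_ok;     /* 1 if the events arrived in the expected order */
} DemoDevice;

void demo_device_notify(ResetEvent ev, int new_fd, void *opaque)
{
    DemoDevice *d = opaque;

    switch (ev) {
    case VMFD_PRE_CHANGE:
        d->state_saved = 1;             /* stand-in for a KVM to QEMU sync */
        break;
    case VMFD_CHANGED:
        d->order_ok = d->state_saved;   /* pre-change must have come first */
        d->vm_fd = new_fd;              /* stand-in for VM level ioctls */
        break;
    case VCPUFD_CHANGED:
        d->order_ok = d->order_ok && d->vm_fd >= 0;
        d->vcpu_fd = new_fd;            /* stand-in for VCPU level ioctls */
        break;
    }
}

/* Simulated reset: the three events fire in the order described above. */
void simulate_reset(int new_vm_fd, int new_vcpu_fd)
{
    notify_all(VMFD_PRE_CHANGE, -1);
    notify_all(VMFD_CHANGED, new_vm_fd);
    notify_all(VCPUFD_CHANGED, new_vcpu_fd);
}
```

A caller would register demo_device_notify once at init time and then call
simulate_reset() for each reset; the real series wires equivalent callbacks
into the KVM accelerator code rather than a fixed-size table.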

Changes are made to SEV and TDX modules to reinitialize the confidential guest
state and seal it again. Along the way, some bug fixes are made so that some
initialization functions can be called again. Some refactoring of existing
code is done so that both init and reset paths can use them.

Tested on TDX and SEV-SNP.
CI pipeline passes: https://gitlab.com/anisinha/qemu/-/pipelines/2211550528
Rebased on top of version 10.2.0-rc3

CC: pbonzini@redhat.com
CC: kraxel@redhat.com
CC: vkuznets@redhat.com

Ani Sinha (28):
  i386/kvm: avoid installing duplicate msr entries in msr_handlers
  hw/accel: add a per-accelerator callback to change VM accelerator
    handle
  system/physmem: add helper to reattach existing memory after KVM VM fd
    change
  accel/kvm: add changes required to support KVM VM file descriptor
    change
  accel/kvm: mark guest state as unprotected after vm file descriptor
    change
  accel/kvm: add a notifier to indicate KVM VM file descriptor has
    changed
  kvm/i386: implement architecture support for kvm file descriptor
    change
  hw/i386: refactor x86_bios_rom_init for reuse in confidential guest
    reset
  kvm/i386: reload firmware for confidential guest reset
  accel/kvm: Add notifier to inform that the KVM VM file fd is about to
    be changed
  accel/kvm: rebind current VCPUs to the new KVM VM file descriptor upon
    reset
  i386/tdx: refactor TDX firmware memory initialization code into a new
    function
  i386/tdx: finalize TDX guest state upon reset
  i386/tdx: add a pre-vmfd change notifier to reset tdx state
  i386/sev: add migration blockers only once
  i386/sev: add notifiers only once
  i386/sev: free existing launch update data and kernel hashes data on
    init
  i386/sev: add support for confidential guest reset
  hw/vfio: generate new file fd for pseudo device and rebind existing
    descriptors
  kvm/i8254: add support for confidential guest reset
  hw/hyperv/vmbus: add support for confidential guest reset
  accel/kvm: add a per-confidential class callback to unlock guest state
  kvm/xen-emu: re-initialize capabilities during confidential guest
    reset
  kvm/xen_evtchn: add support for confidential guest reset
  ppc/openpic: create a new openpic device and reattach mem region on
    coco reset
  kvm/vcpu: add notifiers to inform vcpu file descriptor change
  kvm/i386/apic: set local apic after vcpu file descriptors changed
  kvm/clock: add support for confidential guest reset

 accel/kvm/kvm-all.c                         | 354 +++++++++++++++++---
 accel/stubs/kvm-stub.c                      |  26 ++
 hw/hyperv/vmbus.c                           |  30 ++
 hw/i386/kvm/apic.c                          |  13 +
 hw/i386/kvm/clock.c                         |  56 ++++
 hw/i386/kvm/i8254.c                         |  84 +++--
 hw/i386/kvm/xen_evtchn.c                    | 100 +++++-
 hw/i386/x86-common.c                        |  50 ++-
 hw/intc/openpic_kvm.c                       | 108 ++++--
 hw/vfio/helpers.c                           |  81 ++++-
 include/accel/accel-ops.h                   |   1 +
 include/hw/i386/apic_internal.h             |   1 +
 include/hw/i386/x86.h                       |   5 +-
 include/system/confidential-guest-support.h |  27 ++
 include/system/kvm.h                        |  54 +++
 include/system/physmem.h                    |   1 +
 system/physmem.c                            |  28 ++
 system/runstate.c                           |  31 +-
 target/arm/kvm.c                            |   5 +
 target/i386/kvm/kvm.c                       | 189 +++++++++--
 target/i386/kvm/tdx.c                       | 145 ++++++--
 target/i386/kvm/tdx.h                       |   1 +
 target/i386/kvm/xen-emu.c                   |  45 ++-
 target/i386/sev.c                           | 110 +++++-
 target/loongarch/kvm/kvm.c                  |   5 +
 target/mips/kvm.c                           |   5 +
 target/ppc/kvm.c                            |   5 +
 target/riscv/kvm/kvm-cpu.c                  |   5 +
 target/s390x/kvm/kvm.c                      |   5 +
 29 files changed, 1382 insertions(+), 188 deletions(-)

-- 
2.42.0
Re: [PATCH v1 00/28] Introduce support for confidential guest reset
Posted by Daniel P. Berrangé 1 month, 3 weeks ago
On Fri, Dec 12, 2025 at 08:33:28PM +0530, Ani Sinha wrote:
> This change introduces support for confidential guests
> (SEV-ES, SEV-SNP and TDX) to reset/reboot just like non-confidential
> guests. Currently, a reboot initiated from the confidential guest results
> in termination of the QEMU hypervisor as the CPUs are not resettable. As the
> initial state of the guest including private memory is locked and encrypted,
> the contents of that memory will not be accessible post reset. Hence a new
> KVM file descriptor must be opened to create a new confidential VM context,
> closing the old one. All KVM VM specific ioctls must be called again. New
> VCPU file descriptors must be created against the new KVM fd and most VCPU
> ioctls must be called again as well.
> 
> This change closes the old KVM fd and creates a new one. After
> the new KVM fd is opened, all generic and architecture specific ioctl calls
> are issued again. Notifiers are added to notify subsystems that:
> - The KVM fd is about to be changed, so state syncing from KVM to QEMU
>   should be done if required.
> - The KVM fd has changed, so ioctl calls to the new KVM fd have to be
>   performed again.
> - New VCPU fds have been created, so VCPU ioctl calls must be issued again
>   where required.

Presumably this re-opening of VCPU FDs means that all the KVM vCPU PIDs
are going to change?

If so, this is a significant semantic change that will break management
applications. vCPU PIDs are exposed in QMP and applications like libvirt
query them upon QEMU startup *BEFORE* vCPUs are started, and then do
things like setting CPU pinning or NUMA policies against them.

They cannot re-query the vCPU PIDs at time of reset, as by that point QEMU
has been running guest code, and so mgmt applications must assume that the
QEMU process (and thus QMP replies) are hostile. They cannot trust the vCPU
PIDs that would be reported as QEMU might have been compromised and now be
reporting vCPU PIDs of a completely different process as a form of DoS
against the mgmt app.

Can we get this reset functionality into KVM natively instead so QEMU
doesn't have to do this dance to re-create everything?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH v1 00/28] Introduce support for confidential guest reset
Posted by Paolo Bonzini 1 month, 3 weeks ago
On 12/15/25 11:38, Daniel P. Berrangé wrote:
> On Fri, Dec 12, 2025 at 08:33:28PM +0530, Ani Sinha wrote:
>> This change closes the old KVM fd and creates a new one. After
>> the new KVM fd is opened, all generic and architecture specific ioctl calls
>> are issued again. Notifiers are added to notify subsystems that:
>> - The KVM fd is about to be changed, so state syncing from KVM to QEMU
>>    should be done if required.
>> - The KVM fd has changed, so ioctl calls to the new KVM fd have to be
>>    performed again.
>> - New VCPU fds have been created, so VCPU ioctl calls must be issued again
>>    where required.
> 
> Presumably this re-opening of VCPU FDs means that all the KVM vCPU PIDs
> are going to change?

As Ani said, no - the PIDs are attached to QEMU threads, not KVM file 
descriptors.

I can answer this though:

> Can we get this reset functionality into KVM natively instead so QEMU
> doesn't have to do this dance to re-create everything?

The answer is no.  Unlike normal reset, resetting a confidential VM
entails performing all the encryption and measurement from scratch for
memory and registers, and the data is not available to KVM anymore.

QEMU can retrieve it again, just like it did when starting the original 
VM, but KVM does not save and therefore does not know the original 
contents of the memory.

Paolo


Re: [PATCH v1 00/28] Introduce support for confidential guest reset
Posted by Ani Sinha 1 month, 3 weeks ago

> On 15 Dec 2025, at 6:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 12/15/25 11:38, Daniel P. Berrangé wrote:
>> On Fri, Dec 12, 2025 at 08:33:28PM +0530, Ani Sinha wrote:
>>> This change closes the old KVM fd and creates a new one. After
>>> the new KVM fd is opened, all generic and architecture specific ioctl calls
>>> are issued again. Notifiers are added to notify subsystems that:
>>> - The KVM fd is about to be changed, so state syncing from KVM to QEMU
>>>   should be done if required.
>>> - The KVM fd has changed, so ioctl calls to the new KVM fd have to be
>>>   performed again.
>>> - New VCPU fds have been created, so VCPU ioctl calls must be issued again
>>>   where required.
>> Presumably this re-opening of VCPU FDs means that all the KVM vCPU PIDs
>> are going to change?
> 
> As Ani said, no - the PIDs are attached to QEMU threads, not KVM file descriptors.
> 
> I can answer this though:
> 
>> Can we get this reset functionality into KVM natively instead so QEMU
>> doesn't have to do this dance to re-create everything?
> 
> The answer is no.  Unlike normal reset, resetting a confidential VM entails performing all the encryption and measurement from scratch for memory and registers, and the data is not available to KVM anymore.

Wearing my FUKI hat, between resets one can also change the state, use a different starting CPU state, registers, firmware. So saving the old state in KVM would not do any good in that case either.

> 
> QEMU can retrieve it again, just like it did when starting the original VM, but KVM does not save and therefore does not know the original contents of the memory.
> 
> Paolo
> 
Re: [PATCH v1 00/28] Introduce support for confidential guest reset
Posted by Ani Sinha 1 month, 3 weeks ago

> On 15 Dec 2025, at 4:08 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> On Fri, Dec 12, 2025 at 08:33:28PM +0530, Ani Sinha wrote:
>> This change introduces support for confidential guests
>> (SEV-ES, SEV-SNP and TDX) to reset/reboot just like non-confidential
>> guests. Currently, a reboot initiated from the confidential guest results
>> in termination of the QEMU hypervisor as the CPUs are not resettable. As the
>> initial state of the guest including private memory is locked and encrypted,
>> the contents of that memory will not be accessible post reset. Hence a new
>> KVM file descriptor must be opened to create a new confidential VM context,
>> closing the old one. All KVM VM specific ioctls must be called again. New
>> VCPU file descriptors must be created against the new KVM fd and most VCPU
>> ioctls must be called again as well.
>> 
>> This change closes the old KVM fd and creates a new one. After
>> the new KVM fd is opened, all generic and architecture specific ioctl calls
>> are issued again. Notifiers are added to notify subsystems that:
>> - The KVM fd is about to be changed, so state syncing from KVM to QEMU
>>  should be done if required.
>> - The KVM fd has changed, so ioctl calls to the new KVM fd have to be
>>  performed again.
>> - New VCPU fds have been created, so VCPU ioctl calls must be issued again
>>  where required.
> 
> Presumably this re-opening of VCPU FDs means that all the KVM vCPU PIDs
> are going to change?

Only vcpu file descriptor numbers are going to change, not the PID for the corresponding threads. The same thread is going to be used.
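
To make the distinction concrete, here is a small Linux-only sketch, not
taken from the series. It uses /dev/null as a stand-in for a VCPU file
descriptor (an assumption for illustration; no KVM is involved) and the
hypothetical names ThreadDemo and run_thread_demo. It closes and re-opens
the descriptor from inside one thread and records the kernel thread ID
before and after, which is the ID a management application would see.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Results recorded by the demo thread. */
typedef struct {
    pid_t tid_before;   /* kernel thread ID while holding the first fd */
    pid_t tid_after;    /* kernel thread ID after close and re-open */
    int fd_reopened;    /* 1 if the second open() succeeded */
} ThreadDemo;

/* Kernel thread ID of the caller, via the raw gettid system call. */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

static void *vcpu_thread_fn(void *arg)
{
    ThreadDemo *r = arg;

    /* Stand-in for the original VCPU fd (assumption: /dev/null, not KVM). */
    int fd = open("/dev/null", O_RDWR);
    r->tid_before = current_tid();

    /* "Reset": drop the old descriptor and obtain a new one. */
    if (fd >= 0) {
        close(fd);
    }
    fd = open("/dev/null", O_RDWR);

    /* The thread itself was never re-created, so its ID is unchanged. */
    r->tid_after = current_tid();
    r->fd_reopened = (fd >= 0);
    if (fd >= 0) {
        close(fd);
    }
    return NULL;
}

/* Run the demo in a fresh thread; returns 0 on success, -1 on failure. */
int run_thread_demo(ThreadDemo *r)
{
    pthread_t t;

    if (pthread_create(&t, NULL, vcpu_thread_fn, r) != 0) {
        return -1;
    }
    return pthread_join(t, NULL) == 0 ? 0 : -1;
}
```

The descriptor number handed back by the second open() may or may not match
the first one; either way the thread ID recorded before and after is the
same, because the thread is reused rather than re-created.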

> 
> If so, this is a significant semantic change that will break management
> applications. vCPU PIDs are exposed in QMP and applications like libvirt
> query them upon QEMU startup *BEFORE* vCPUs are started, and then do
> things like setting CPU pinning or NUMA policies against them.
> 
> They cannot re-query the vCPU PIDs at time of reset, as by that point QEMU
> has been running guest code, and so mgmt applications must assume that the
> QEMU process (and thus QMP replies) are hostile. They cannot trust the vCPU
> PIDs that would be reported as QEMU might have been compromised and now be
> reporting vCPU PIDs of a completely different process as a form of DoS
> against the mgmt app.
> 
> Can we get this reset functionality into KVM natively instead so QEMU
> doesn't have to do this dance to re-create everything?
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|