[PATCH 0/7] qemu: tpm: Add support for migration across shared storage

Stefan Berger posted 7 patches 1 year, 8 months ago
git fetch https://github.com/patchew-project/libvirt tags/patchew/20220822120554.3529999-1-stefanb@linux.ibm.com
There is a newer version of this series
[PATCH 0/7] qemu: tpm: Add support for migration across shared storage
Posted by Stefan Berger 1 year, 8 months ago
This series of patches adds support for migrating vTPMs across hosts whose
storage has been set up to share the directory structure holding the state
of the TPM (swtpm). The domain XML is extended with a shared_storage
attribute that must be set to 'yes' when shared storage is used. It
influences the management of the directory structure holding the TPM state,
which, for example, is only to be removed when a domain is undefined (virsh
undefine) and not when a VM is removed on the migration source host.
Further, when shared storage is used, security labeling on the destination
side is skipped on the assumption that the labeling was already done on the
source side.
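
For illustration, the relevant domain XML could then look roughly like
this (a sketch based on this cover letter; the placement of the attribute
on the <backend> element is illustrative, see the formatdomain.rst change
in this series for the authoritative syntax):

  <devices>
    <tpm model='tpm-crb'>
      <backend type='emulator' version='2.0' shared_storage='yes'/>
    </tpm>
  </devices>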

I have tested this with an NFS setup where I had to turn SELinux off on
the hosts since the SELinux MLS range labeling is not supported.
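
(A common alternative to disabling SELinux entirely for QEMU's access to
NFS-backed files is the virt_use_nfs boolean, which keeps SELinux enabled
but gives up per-VM MLS/MCS isolation on the share; whether it also covers
swtpm's domain may depend on the policy:)

  # permit svirt-confined processes to access NFS-backed files
  setsebool -P virt_use_nfs on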

Shared storage migration requires the upcoming swtpm v0.8 with the PR
for shared storage merged: https://github.com/stefanberger/swtpm/pull/732
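
The swtpm invocation generated by libvirt would then roughly look like the
following (a sketch; the option spelling follows the swtpm PR above, and
the paths are illustrative only):

  swtpm socket --tpm2 \
      --tpmstate dir=/var/lib/libvirt/swtpm/<vm-uuid>/tpm2 \
      --ctrl type=unixio,path=/run/libvirt/qemu/swtpm/<vm>-swtpm.sock \
      --migration release-lock-outgoing,incoming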

   Stefan

Stefan Berger (7):
  qemu: tpm: Pass parameter indicating reason for domain removal
  util: Add parsing support for swtpm's cmdarg-migration capability
  qemu: tpm: Conditionally create storage on incoming migration
  qemu: tpm: Pass --migration option to swtpm when using shared storage
  qemu: tpm: Avoid security labels on incoming migration with shared
    storage
  qemu: tpm: Remove TPM state files and directory only when undefining a
    VM
  qemu: config: Extend TPM domain XML with shared storage support

 docs/formatdomain.rst             | 16 ++++++++
 src/conf/domain_conf.c            | 13 +++++++
 src/conf/domain_conf.h            |  1 +
 src/conf/schemas/domaincommon.rng |  5 +++
 src/qemu/qemu_domain.c            | 12 +++---
 src/qemu/qemu_domain.h            |  8 +++-
 src/qemu/qemu_driver.c            | 20 +++++-----
 src/qemu/qemu_extdevice.c         |  5 ++-
 src/qemu/qemu_extdevice.h         |  3 +-
 src/qemu/qemu_migration.c         | 13 ++++---
 src/qemu/qemu_process.c           |  4 +-
 src/qemu/qemu_snapshot.c          |  4 +-
 src/qemu/qemu_tpm.c               | 61 ++++++++++++++++++++++++++-----
 src/qemu/qemu_tpm.h               |  3 +-
 src/util/virtpm.c                 |  1 +
 src/util/virtpm.h                 |  1 +
 16 files changed, 131 insertions(+), 39 deletions(-)

-- 
2.37.1
Re: [PATCH 0/7] qemu: tpm: Add support for migration across shared storage
Posted by Daniel P. Berrangé 1 year, 8 months ago
On Mon, Aug 22, 2022 at 08:05:47AM -0400, Stefan Berger wrote:
> This series of patches adds support for migrating vTPMs across hosts whose
> storage has been set up to share the directory structure holding the state
> of the TPM (swtpm). The domain XML is extended with a shared_storage
> attribute that must be set to 'yes' when shared storage is used. It

We don't require any 'shared_storage' attribute for disk images - we just
aim to "do the right thing" automatically. If we want to support shared
storage for TPM, then IMHO it should likewise be made transparent.
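
For disks this is keyed off the filesystem; a hypothetical sketch of what
the same transparent detection could look like for the TPM state directory
(virFileIsSharedFS() is the existing helper used on the disk side;
'statedir' stands in for the swtpm state path):

  #include "virfile.h"

  /* Sketch only: detect shared storage instead of requiring an XML knob.
   * virFileIsSharedFS() returns 1 on a shared filesystem (NFS, GFS2, ...),
   * 0 on a local one, and -1 on error. */
  int shared = virFileIsSharedFS(statedir);
  if (shared < 0)
      return -1;
  if (shared == 1) {
      /* e.g. skip relabeling on the destination and skip state
       * removal on the migration source */
  }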

What's the thinking behind putting the TPM on shared storage though ?
The state is tiny so you're not going to notice it being transferred
in the regular migration stream, so there's no performance benefit
here.  Meanwhile it hurts security because we can't do SELinux isolation
on many NFS installs, and the complexity of the dance around locking makes
the migration process more fragile. The pain we've had dealing with NFS
makes it really unappealing.

> influences the management of the directory structure holding the TPM state,
> which, for example, is only to be removed when a domain is undefined (virsh
> undefine) and not when a VM is removed on the migration source host.
> Further, when shared storage is used, security labeling on the destination
> side is skipped on the assumption that the labeling was already done on the
> source side.
> 
> I have tested this with an NFS setup where I had to turn SELinux off on
> the hosts since the SELinux MLS range labeling is not supported.

Have you tested in systems where 'root_squash' is enabled on the NFS
server, such that libvirt can't create the files as 'root' and then
chown them to 'qemu' later ? That's been a major source of pain related
to NFS for disks and snapshots historically.
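
(For reference, root_squash is the default on Linux NFS servers; an
illustrative /etc/exports line for such a setup:)

  # root_squash (the default) maps uid 0 to an unprivileged user, so
  # libvirt cannot create state files as root and chown them to qemu later
  /srv/swtpm  *(rw,sync,root_squash)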

> 
> Shared storage migration requires the upcoming swtpm v0.8 with the PR
> for shared storage merged: https://github.com/stefanberger/swtpm/pull/732
> 
>    Stefan
> 
> Stefan Berger (7):
>   qemu: tpm: Pass parameter indicating reason for domain removal
>   util: Add parsing support for swtpm's cmdarg-migration capability
>   qemu: tpm: Conditionally create storage on incoming migration
>   qemu: tpm: Pass --migration option to swtpm when using shared storage
>   qemu: tpm: Avoid security labels on incoming migration with shared
>     storage
>   qemu: tpm: Remove TPM state files and directory only when undefining a
>     VM
>   qemu: config: Extend TPM domain XML with shared storage support
> 
>  docs/formatdomain.rst             | 16 ++++++++
>  src/conf/domain_conf.c            | 13 +++++++
>  src/conf/domain_conf.h            |  1 +
>  src/conf/schemas/domaincommon.rng |  5 +++
>  src/qemu/qemu_domain.c            | 12 +++---
>  src/qemu/qemu_domain.h            |  8 +++-
>  src/qemu/qemu_driver.c            | 20 +++++-----
>  src/qemu/qemu_extdevice.c         |  5 ++-
>  src/qemu/qemu_extdevice.h         |  3 +-
>  src/qemu/qemu_migration.c         | 13 ++++---
>  src/qemu/qemu_process.c           |  4 +-
>  src/qemu/qemu_snapshot.c          |  4 +-
>  src/qemu/qemu_tpm.c               | 61 ++++++++++++++++++++++++++-----
>  src/qemu/qemu_tpm.h               |  3 +-
>  src/util/virtpm.c                 |  1 +
>  src/util/virtpm.h                 |  1 +
>  16 files changed, 131 insertions(+), 39 deletions(-)
> 
> -- 
> 2.37.1
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH 0/7] qemu: tpm: Add support for migration across shared storage
Posted by Stefan Berger 1 year, 8 months ago
On 8/22/22 12:35, Daniel P. Berrangé wrote:
> On Mon, Aug 22, 2022 at 08:05:47AM -0400, Stefan Berger wrote:
>> This series of patches adds support for migrating vTPMs across hosts whose
>> storage has been set up to share the directory structure holding the state
>> of the TPM (swtpm). The domain XML is extended with a shared_storage
>> attribute that must be set to 'yes' when shared storage is used. It
> 
> We don't require any 'shared_storage' attribute for disk images - we just
> aim to "do the right thing" automatically. If we want to support shared
> storage for TPM, then IMHO it should likewise be made transparent.
> 
> What's the thinking behind putting the TPM on shared storage though ?

It's driven by swtpm user(s):

https://github.com/stefanberger/swtpm/pull/732

The driving force is having the state available on the destination to 
restart a VM there if the original host failed. Allegedly all hosts in 
their setup would share the necessary storage to be able to do that with 
TPM state but then presumably also with the disk image(s).

> The state is tiny so you're not going to notice it being transferred

Tiny is relative to disk sizes. It can become ~260 KB or so, depending on
how much is stored in the NVRAM areas, but yes, it's comparatively small.

> in the regular migration stream, so there's no performance benefit
> here.  Meanwhile it hurts security because we can't do SELinux isolation
> on many NFS installs, and the complexity of the dance around locking makes
> the migration process more fragile. The pain we've had dealing with NFS
> makes it really unappealing.


I heard others would want to use CephFS for it, and I am not sure what
the pain points of that shared storage technology are.

> 
>> influences the management of the directory structure holding the TPM state,
>> which, for example, is only to be removed when a domain is undefined (virsh
>> undefine) and not when a VM is removed on the migration source host.
>> Further, when shared storage is used, security labeling on the destination
>> side is skipped on the assumption that the labeling was already done on the
>> source side.
>>
>> I have tested this with an NFS setup where I had to turn SELinux off on
>> the hosts since the SELinux MLS range labeling is not supported.
> 
> Have you tested in systems where 'root_squash' is enabled on the NFS
> server, such that libvirt can't create the files as 'root' and then
> chown them to 'qemu' later ? That's been a major source of pain related
> to NFS for disks and snapshots historically.

Yes, that doesn't seem to work. I had used it with no_root_squash before.

# virsh start testVM
error: Failed to start domain 'testVM'
error: internal error: Could not create directory 
/var/lib/libvirt/swtpm/ecc221c4-6bb9-423f-ac31-72244fdbb1a1/tpm2 as 59:59

    Stefan
Re: [PATCH 0/7] qemu: tpm: Add support for migration across shared storage
Posted by Daniel P. Berrangé 1 year, 8 months ago
On Mon, Aug 22, 2022 at 02:52:26PM -0400, Stefan Berger wrote:
> On 8/22/22 12:35, Daniel P. Berrangé wrote:
> > On Mon, Aug 22, 2022 at 08:05:47AM -0400, Stefan Berger wrote:
> > > This series of patches adds support for migrating vTPMs across hosts whose
> > > storage has been set up to share the directory structure holding the state
> > > of the TPM (swtpm). The domain XML is extended with a shared_storage
> > > attribute that must be set to 'yes' when shared storage is used. It
> > 
> > We don't require any 'shared_storage' attribute for disk images - we just
> > aim to "do the right thing" automatically. If we want to support shared
> > storage for TPM, then IMHO it should likewise be made transparent.
> > 
> > What's the thinking behind putting the TPM on shared storage though ?
> 
> It's driven by swtpm user(s):
> 
> https://github.com/stefanberger/swtpm/pull/732
> 
> The driving force is having the state available on the destination to
> restart a VM there if the original host failed. Allegedly all hosts in their
> setup would share the necessary storage to be able to do that with TPM state
> but then presumably also with the disk image(s).

Ok, so use case is resilience rather than specifically live
migration.


> > The state is tiny so you're not going to notice it being transferred
> 
> Tiny is relative to disk sizes. It can become ~260 KB or so, depending on how
> much is stored in the NVRAM areas, but yes, it's comparatively small.

I meant 'tiny' in the sense that the time required to live migrate
it is not measurably significant, compared with, say, migrating disk
storage, which could add hours to a live migration if it wasn't on
shared storage.

> > in the regular migration stream, so there's no performance benefit
> > here.  Meanwhile it hurts security because we can't do SELinux isolation
> > on many NFS installs, and the complexity of the dance around locking makes
> > the migration process more fragile. The pain we've had dealing with NFS
> > makes it really unappealing.
> 
> 
> I heard others would want to use CephFS for it, and I am not sure what
> the pain points of that shared storage technology are.
> 
> > 
> > > influences the management of the directory structure holding the TPM state,
> > > which, for example, is only to be removed when a domain is undefined (virsh
> > > undefine) and not when a VM is removed on the migration source host.
> > > Further, when shared storage is used, security labeling on the destination
> > > side is skipped on the assumption that the labeling was already done on the
> > > source side.
> > > 
> > > I have tested this with an NFS setup where I had to turn SELinux off on
> > > the hosts since the SELinux MLS range labeling is not supported.
> > 
> > Have you tested in systems where 'root_squash' is enabled on the NFS
> > server, such that libvirt can't create the files as 'root' and then
> > chown them to 'qemu' later ? That's been a major source of pain related
> > to NFS for disks and snapshots historically.
> 
> Yes, that doesn't seem to work. I had used it with no_root_squash before.
> 
> # virsh start testVM
> error: Failed to start domain 'testVM'
> error: internal error: Could not create directory
> /var/lib/libvirt/swtpm/ecc221c4-6bb9-423f-ac31-72244fdbb1a1/tpm2 as 59:59
> 
>    Stefan
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH 0/7] qemu: tpm: Add support for migration across shared storage
Posted by Stefan Berger 1 year, 8 months ago

On 8/23/22 08:55, Daniel P. Berrangé wrote:
> On Mon, Aug 22, 2022 at 02:52:26PM -0400, Stefan Berger wrote:
>> On 8/22/22 12:35, Daniel P. Berrangé wrote:
>>> On Mon, Aug 22, 2022 at 08:05:47AM -0400, Stefan Berger wrote:
>>>> This series of patches adds support for migrating vTPMs across hosts whose
>>>> storage has been set up to share the directory structure holding the state
>>>> of the TPM (swtpm). The domain XML is extended with a shared_storage
>>>> attribute that must be set to 'yes' when shared storage is used. It
>>>
>>> We don't require any 'shared_storage' attribute for disk images - we just
>>> aim to "do the right thing" automatically. If we want to support shared
>>> storage for TPM, then IMHO it should likewise be made transparent.
>>>
>>> What's the thinking behind putting the TPM on shared storage though ?
>>
>> It's driven by swtpm user(s):
>>
>> https://github.com/stefanberger/swtpm/pull/732
>>
>> The driving force is having the state available on the destination to
>> restart a VM there if the original host failed. Allegedly all hosts in their
>> setup would share the necessary storage to be able to do that with TPM state
>> but then presumably also with the disk image(s).
> 
> Ok, so use case is resilience rather than specifically live
> migration.

Yes, support for HA.

> 
> 
>>> The state is tiny so you're not going to notice it being transferred
>>
>> Tiny is relative to disk sizes. It can become ~260 KB or so, depending on how
>> much is stored in the NVRAM areas, but yes, it's comparatively small.
> 
> I meant 'tiny' in the sense that the time required to live migrate
> it is not measurably significant, compared with, say, migrating disk
> storage, which could add hours to a live migration if it wasn't on
> shared storage.

I won't change QEMU to support shared storage setups; it will continue
migrating the TPM state. However, shared storage setups need special
management support, which is what this series tries to address.
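
As a rough sketch of the rule on the management side (names here are
illustrative, not taken from the patches, and the real conditions in the
series are more involved):

  #include <stdbool.h>

  /* May the swtpm state directory be deleted in a given teardown
   * situation? Covers only the cases discussed in this thread. */
  static bool
  tpmStateRemovable(bool undefiningDomain, bool outgoingMigration,
                    bool sharedStorage)
  {
      if (undefiningDomain)
          return true;            /* 'virsh undefine' always cleans up */
      if (outgoingMigration)
          return !sharedStorage;  /* the destination may still need it */
      return false;               /* a plain shutdown keeps the state */
  }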