[PATCH 8/8] qemu: migration: Don't remember seclabel for images shared from current host

Peter Krempa posted 8 patches 1 month, 3 weeks ago
There is a newer version of this series
[PATCH 8/8] qemu: migration: Don't remember seclabel for images shared from current host
Posted by Peter Krempa 1 month, 3 weeks ago
In case when the user exports images from current host and there is an
incoming migration from a remote host, security label remembering would
be possible but would attempt to remember the label allowing access to
the image as the image is already used by a VM on remote host.

To prevent remembering the wrong label, we'll skip the remembering of
the label for any shared resource, so that the code behaves identically
regardless of how the image is accessed.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---
 src/qemu/qemu_migration.c | 49 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 2c462fd5f5..ec8b77bab9 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -533,6 +533,53 @@ qemuMigrationDstPrepareStorage(virDomainObj *vm,
 }


+static void
+qemuMigrationDstPrepareDiskSeclabels(virDomainObj *vm,
+                                     size_t nmigrate_disks,
+                                     const char **migrate_disks,
+                                     unsigned int flags)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+    g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(priv->driver);
+    size_t i;
+
+    /* In case when storage is exported from this host, security label
+     * remembering would behave differently compared to the host which mounts
+     * the exported filesystem. Specifically for incoming migration remembering
+     * a seclabel would remember a seclabel already allowing access to the image,
+     * which is not desired. Thus we skip remembering of seclabels for images
+     * which are local to this host but accessed in a shared way from another
+     * host.
+     */
+    if (!cfg->sharedFilesystems ||
+        cfg->sharedFilesystems[0] == NULL)
+        return;
+
+    for (i = 0; i < vm->def->ndisks; i++) {
+        virDomainDiskDef *disk = vm->def->disks[i];
+
+        /* We care only about existing local storage */
+        if (virStorageSourceIsEmpty(disk->src) ||
+            !virStorageSourceIsLocalStorage(disk->src))
+            continue;
+
+        /* Only paths which are on local filesystem but shared elsewhere are
+         * relevant */
+        if (!virFileIsSharedFSOverride(disk->src->path, cfg->sharedFilesystems))
+            continue;
+
+        /* Any storage that was migrated via NBD is technically fully local so
+         * we want seclabels remembered */
+        if (flags & (VIR_MIGRATE_NON_SHARED_DISK | VIR_MIGRATE_NON_SHARED_INC)) {
+            if (qemuMigrationAnyCopyDisk(disk, nmigrate_disks, migrate_disks))
+                continue;
+        }
+
+        disk->src->seclabelSkipRemeber = true;
+    }
+}
+
+
 /**
  * qemuMigrationDstStartNBDServer:
  * @driver: qemu driver
@@ -3154,6 +3201,8 @@ qemuMigrationDstPrepareActive(virQEMUDriver *driver,
                                              dataFD[0])))
         goto error;

+    qemuMigrationDstPrepareDiskSeclabels(vm, nmigrate_disks, migrate_disks, flags);
+
     if (qemuProcessPrepareDomain(driver, vm, startFlags) < 0)
         goto error;

-- 
2.45.2
Re: [PATCH 8/8] qemu: migration: Don't remember seclabel for images shared from current host
Posted by Andrea Bolognani 1 month, 3 weeks ago
On Wed, Jul 24, 2024 at 05:34:32PM GMT, Peter Krempa wrote:
> +static void
> +qemuMigrationDstPrepareDiskSeclabels(virDomainObj *vm,
> +                                     size_t nmigrate_disks,
> +                                     const char **migrate_disks,
> +                                     unsigned int flags)
> +{
> +    qemuDomainObjPrivate *priv = vm->privateData;
> +    g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(priv->driver);
> +    size_t i;
> +
> +    /* In case when storage is exported from this host, security label
> +     * remembering would behave differently compared to the host which mounts
> +     * the exported filesystem. Specifically for incoming migration remembering
> +     * a seclabel would remember a seclabel already allowing access to the image,
> +     * which is not desired. Thus we skip remembering of seclabels for images
> +     * which are local to this host but accessed in a shared way from another
> +     * host.
> +     */
> +    if (!cfg->sharedFilesystems ||
> +        cfg->sharedFilesystems[0] == NULL)
> +        return;
> +
> +    for (i = 0; i < vm->def->ndisks; i++) {
> +        virDomainDiskDef *disk = vm->def->disks[i];

This only iterates over disks, so it's probably not surprising that
it fails when other types of storage are involved.

For example, if the domain uses UEFI, local -> remote migration works
but when attempting to migrate back I get

  error: Requested operation is not valid: Setting different SELinux
  label on /var/lib/libvirt/qemu/nvram/cirros-efi_VARS.fd which is
  already in use

If I add TPM into the mix, local -> remote migration fails with

  error: unable to lock
  /var/lib/libvirt/swtpm/50601047-47f7-4f45-9b41-9112dfaa3539/tpm2/.lock
  for metadata change: Resource temporarily unavailable

Setting remember_owner=0 in qemu.conf makes all these errors go away,
but of course that's a very big hammer and the idea here is to use a
much smaller, more targeted one.

I think it might be enough to extend this logic beyond disks so that
it covers UEFI and TPM too, but I'm not entirely sure those use a
virStorageSource behind the scenes. Hopefully that's the case and the
changes needed on top are minimal, but I figure you're in a better
position than me to look into it.

> +        /* We care only about existing local storage */
> +        if (virStorageSourceIsEmpty(disk->src) ||
> +            !virStorageSourceIsLocalStorage(disk->src))
> +            continue;
> +
> +        /* Only paths which are on local filesystem but shared elsewhere are
> +         * relevant */
> +        if (!virFileIsSharedFSOverride(disk->src->path, cfg->sharedFilesystems))
> +            continue;
> +
> +        /* Any storage that was migrated via NBD is technically fully local so
> +         * we want seclabels remembered */
> +        if (flags & (VIR_MIGRATE_NON_SHARED_DISK | VIR_MIGRATE_NON_SHARED_INC)) {
> +            if (qemuMigrationAnyCopyDisk(disk, nmigrate_disks, migrate_disks))
> +                continue;
> +        }
> +
> +        disk->src->seclabelSkipRemeber = true;

You misspelled "remember" here.

> @@ -3154,6 +3201,8 @@ qemuMigrationDstPrepareActive(virQEMUDriver *driver,
>                                               dataFD[0])))
>          goto error;
>
> +    qemuMigrationDstPrepareDiskSeclabels(vm, nmigrate_disks, migrate_disks, flags);
> +
>      if (qemuProcessPrepareDomain(driver, vm, startFlags) < 0)
>          goto error;

One thing that concerns me is that the changes appear to only be on
the destination side of the migration. So only the remote -> local
path is affected. But what about the local -> remote path?

If we assume remember_owner=1, then obviously the various XATTRs
related to remembering will be set on domain startup. What will take
care of collecting that garbage once local -> remote migration has
been performed? I think things might be fine based on the fact that
the XATTRs seems to be dropped for disks even in the current state,
but I thought I'd mention this specifically just in case.

Thanks a lot for looking into this!

-- 
Andrea Bolognani / Red Hat / Virtualization
Re: [PATCH 8/8] qemu: migration: Don't remember seclabel for images shared from current host
Posted by Peter Krempa 1 month, 2 weeks ago
On Fri, Jul 26, 2024 at 07:47:57 -0700, Andrea Bolognani wrote:
> On Wed, Jul 24, 2024 at 05:34:32PM GMT, Peter Krempa wrote:
> > +static void
> > +qemuMigrationDstPrepareDiskSeclabels(virDomainObj *vm,
> > +                                     size_t nmigrate_disks,
> > +                                     const char **migrate_disks,
> > +                                     unsigned int flags)
> > +{
> > +    qemuDomainObjPrivate *priv = vm->privateData;
> > +    g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(priv->driver);
> > +    size_t i;
> > +
> > +    /* In case when storage is exported from this host, security label
> > +     * remembering would behave differently compared to the host which mounts
> > +     * the exported filesystem. Specifically for incoming migration remembering
> > +     * a seclabel would remember a seclabel already allowing access to the image,
> > +     * which is not desired. Thus we skip remembering of seclabels for images
> > +     * which are local to this host but accessed in a shared way from another
> > +     * host.
> > +     */
> > +    if (!cfg->sharedFilesystems ||
> > +        cfg->sharedFilesystems[0] == NULL)
> > +        return;
> > +
> > +    for (i = 0; i < vm->def->ndisks; i++) {
> > +        virDomainDiskDef *disk = vm->def->disks[i];
> 
> This only iterates over disks, so it's probably not surprising that
> it fails when other types of storage are involved.

Hmm, yeah.

> For example, if the domain uses UEFI, local -> remote migration works
> but when attempting to migrate back I get
> 
>   error: Requested operation is not valid: Setting different SELinux
>   label on /var/lib/libvirt/qemu/nvram/cirros-efi_VARS.fd which is
>   already in use

So I've kind of forgot about this one. That's mainly as sharing this
path is not actually needed for migration to work, as the contents are
shared inside the migration stream automatically.

Given though that users might want to share this one, so that they can
start the VM withouth migration on a different host and also we allow
setting the path to a more reasonable directory for sharing
(users should not share /var/lib/libvirt/qemu/nvram/, or anything
belonging to libvirt in the first place) we should treat this one the
same as disk sources. Good thing that the nvram path is now a virStorage
source.

Now this also shows that we've forgotten to give it the same treatment
that disk images get inside qemuProcessStop() (see below for
explanation.

> If I add TPM into the mix, local -> remote migration fails with
> 
>   error: unable to lock
>   /var/lib/libvirt/swtpm/50601047-47f7-4f45-9b41-9112dfaa3539/tpm2/.lock
>   for metadata change: Resource temporarily unavailable

This makes it look like the file is still locked, that might be
indication of a different problem as libvirt will not keep the lock
after updating the XATTRs.

> Setting remember_owner=0 in qemu.conf makes all these errors go away,
> but of course that's a very big hammer and the idea here is to use a
> much smaller, more targeted one.

Yup, as I've stated we must never suggest that users disable this
feature.

> I think it might be enough to extend this logic beyond disks so that
> it covers UEFI and TPM too, but I'm not entirely sure those use a
> virStorageSource behind the scenes. Hopefully that's the case and the

At least the R/W pflash device is handled by virStorageSource so that
should be easy.

> changes needed on top are minimal, but I figure you're in a better
> position than me to look into it.
> 
> > +        /* We care only about existing local storage */
> > +        if (virStorageSourceIsEmpty(disk->src) ||
> > +            !virStorageSourceIsLocalStorage(disk->src))
> > +            continue;
> > +
> > +        /* Only paths which are on local filesystem but shared elsewhere are
> > +         * relevant */
> > +        if (!virFileIsSharedFSOverride(disk->src->path, cfg->sharedFilesystems))
> > +            continue;
> > +
> > +        /* Any storage that was migrated via NBD is technically fully local so
> > +         * we want seclabels remembered */
> > +        if (flags & (VIR_MIGRATE_NON_SHARED_DISK | VIR_MIGRATE_NON_SHARED_INC)) {
> > +            if (qemuMigrationAnyCopyDisk(disk, nmigrate_disks, migrate_disks))
> > +                continue;
> > +        }
> > +
> > +        disk->src->seclabelSkipRemeber = true;
> 
> You misspelled "remember" here.

Eh ... and copied it all over place :D

> 
> > @@ -3154,6 +3201,8 @@ qemuMigrationDstPrepareActive(virQEMUDriver *driver,
> >                                               dataFD[0])))
> >          goto error;
> >
> > +    qemuMigrationDstPrepareDiskSeclabels(vm, nmigrate_disks, migrate_disks, flags);
> > +
> >      if (qemuProcessPrepareDomain(driver, vm, startFlags) < 0)
> >          goto error;
> 
> One thing that concerns me is that the changes appear to only be on
> the destination side of the migration. So only the remote -> local
> path is affected. But what about the local -> remote path?

On the source side, at least for disks we simply nuke the
metadata(xattrs). In fact we do that in most cases when stopping the VM,
not only for migration.

See  qemuProcessStop(), in the loop which calls
qemuBlockRemoveImageMetadata(). That call removes the metadata from all
disk top level sources.

While the original intention might have been different (block jobs) it
also ensures that this behaves correctly in this corner case.

Now it seems that at least the varstore pflash virStorageSource requires
the same treatment there.

> If we assume remember_owner=1, then obviously the various XATTRs
> related to remembering will be set on domain startup. What will take
> care of collecting that garbage once local -> remote migration has
> been performed? I think things might be fine based on the fact that
> the XATTRs seems to be dropped for disks even in the current state,
> but I thought I'd mention this specifically just in case.

So apart from the code I've mentioned above which explicitly drops the
xattrs, they are also not shared via NFS. Thus the remote side itself
would not get them.

If we'd leak them though it would mean that a re-start or incoming
migration  on the source of
the migration would no longer work, so we need the same treatment for
the other resources as well.

I'll post another version adding the missing treatment for uefi, but I
don't think I care enough about TPMs to go digging how that stuff is
plumbed in or why it complains about locks being held which don't seem
to be held by libvirt.
Re: [PATCH 8/8] qemu: migration: Don't remember seclabel for images shared from current host
Posted by Andrea Bolognani 1 month, 2 weeks ago
On Mon, Jul 29, 2024 at 03:09:55PM GMT, Peter Krempa wrote:
> On Fri, Jul 26, 2024 at 07:47:57 -0700, Andrea Bolognani wrote:
> > For example, if the domain uses UEFI, local -> remote migration works
> > but when attempting to migrate back I get
> >
> >   error: Requested operation is not valid: Setting different SELinux
> >   label on /var/lib/libvirt/qemu/nvram/cirros-efi_VARS.fd which is
> >   already in use
>
> So I've kind of forgot about this one. That's mainly as sharing this
> path is not actually needed for migration to work, as the contents are
> shared inside the migration stream automatically.
>
> Given though that users might want to share this one, so that they can
> start the VM withouth migration on a different host and also we allow
> setting the path to a more reasonable directory for sharing
> (users should not share /var/lib/libvirt/qemu/nvram/, or anything
> belonging to libvirt in the first place)

What's wrong with sharing this? It's just another type of storage
after all. And it's true that in some cases you might be able to
start the domain even with the NVRAM file being missing, but that's
not true in the general case.

Or are you saying that it's okay to share the directory where NVRAM
files are stored, as long as it's not under /var/lib/libvirt?

> > If I add TPM into the mix, local -> remote migration fails with
> >
> >   error: unable to lock
> >   /var/lib/libvirt/swtpm/50601047-47f7-4f45-9b41-9112dfaa3539/tpm2/.lock
> >   for metadata change: Resource temporarily unavailable
>
> This makes it look like the file is still locked, that might be
> indication of a different problem as libvirt will not keep the lock
> after updating the XATTRs.

I think there were some interesting timing issues with swtpm
attempting to lock the file while libvirt was trying to perform its
own metadata bookkeeping. I need to revisit that because honestly
I've forgotten almost everything about it.

> > One thing that concerns me is that the changes appear to only be on
> > the destination side of the migration. So only the remote -> local
> > path is affected. But what about the local -> remote path?
>
> On the source side, at least for disks we simply nuke the
> metadata(xattrs). In fact we do that in most cases when stopping the VM,
> not only for migration.
>
> See  qemuProcessStop(), in the loop which calls
> qemuBlockRemoveImageMetadata(). That call removes the metadata from all
> disk top level sources.

I see, thanks for the pointer.

> > If we assume remember_owner=1, then obviously the various XATTRs
> > related to remembering will be set on domain startup. What will take
> > care of collecting that garbage once local -> remote migration has
> > been performed? I think things might be fine based on the fact that
> > the XATTRs seems to be dropped for disks even in the current state,
> > but I thought I'd mention this specifically just in case.
>
> So apart from the code I've mentioned above which explicitly drops the
> xattrs, they are also not shared via NFS. Thus the remote side itself
> would not get them.
>
> If we'd leak them though it would mean that a re-start or incoming
> migration  on the source of
> the migration would no longer work, so we need the same treatment for
> the other resources as well.

Right, that's exactly my concern. I have encountered at least one
case in which XATTRs not being cleaned up correctly on (failed?)
migration resulted in having to perform manual fixup tasks before the
domain would start again.

> I'll post another version adding the missing treatment for uefi, but I
> don't think I care enough about TPMs to go digging how that stuff is
> plumbed in or why it complains about locks being held which don't seem
> to be held by libvirt.

I will try to look into it myself on top of your updated patches. I'm
pretty sure TPM is a requirement for KubeVirt, so we definitely need
it to be handled correctly or the only recourse from their side will
be to set remember_owner=0 - and we've already established that we
don't want that to happen :)

-- 
Andrea Bolognani / Red Hat / Virtualization
Re: [PATCH 8/8] qemu: migration: Don't remember seclabel for images shared from current host
Posted by Peter Krempa 1 month, 2 weeks ago
On Mon, Jul 29, 2024 at 08:29:42 -0700, Andrea Bolognani wrote:
> On Mon, Jul 29, 2024 at 03:09:55PM GMT, Peter Krempa wrote:
> > On Fri, Jul 26, 2024 at 07:47:57 -0700, Andrea Bolognani wrote:
> > > For example, if the domain uses UEFI, local -> remote migration works
> > > but when attempting to migrate back I get
> > >
> > >   error: Requested operation is not valid: Setting different SELinux
> > >   label on /var/lib/libvirt/qemu/nvram/cirros-efi_VARS.fd which is
> > >   already in use
> >
> > So I've kind of forgot about this one. That's mainly as sharing this
> > path is not actually needed for migration to work, as the contents are
> > shared inside the migration stream automatically.
> >
> > Given though that users might want to share this one, so that they can
> > start the VM withouth migration on a different host and also we allow
> > setting the path to a more reasonable directory for sharing
> > (users should not share /var/lib/libvirt/qemu/nvram/, or anything
> > belonging to libvirt in the first place)
> 
> What's wrong with sharing this? It's just another type of storage
> after all.

Well, in case of this specific directory it'd be okay. But we had users
wanting to share the XML directory too. In general I'd suggest user not
share anything that belongs to libvirt.

In worst case they'd share /var/lib/libvirt, which also contains stuff
that must not be shared, such as the master keys for qemu, dnsmasq data
files etc.

Similarly, sharing /var/lib/libvirt/qemu as whole is a bad idea.

> And it's true that in some cases you might be able to
> start the domain even with the NVRAM file being missing, but that's
> not true in the general case.

Sure, this specific case will work. Arguably though you'd want to store
the nvrams in a location where you also have the VM images shared from,
so configuring specifically /var/li/libvirt/qemu/nvram to be shared
extra seems a bit weird.

> Or are you saying that it's okay to share the directory where NVRAM
> files are stored, as long as it's not under /var/lib/libvirt?

As said, the nvram directory itself is okay, but it's under a directory
where other libvirt stuff is shared so users might misinterpret that.

Either way, we need to be able to handle the NVRAM image labels properly
regardless of where they are stored.

> > > If I add TPM into the mix, local -> remote migration fails with
> > >
> > >   error: unable to lock
> > >   /var/lib/libvirt/swtpm/50601047-47f7-4f45-9b41-9112dfaa3539/tpm2/.lock
> > >   for metadata change: Resource temporarily unavailable
> >
> > This makes it look like the file is still locked, that might be
> > indication of a different problem as libvirt will not keep the lock
> > after updating the XATTRs.
> 
> I think there were some interesting timing issues with swtpm
> attempting to lock the file while libvirt was trying to perform its
> own metadata bookkeeping. I need to revisit that because honestly
> I've forgotten almost everything about it.

IIRC libvirt uses a specific block to apply the lock on so that it's
locking it only for internal purposes. If 'swtpm' is locking the same
block it might cause problems.

> > > One thing that concerns me is that the changes appear to only be on
> > > the destination side of the migration. So only the remote -> local
> > > path is affected. But what about the local -> remote path?
> >
> > On the source side, at least for disks we simply nuke the
> > metadata(xattrs). In fact we do that in most cases when stopping the VM,
> > not only for migration.
> >
> > See  qemuProcessStop(), in the loop which calls
> > qemuBlockRemoveImageMetadata(). That call removes the metadata from all
> > disk top level sources.
> 
> I see, thanks for the pointer.
> 
> > > If we assume remember_owner=1, then obviously the various XATTRs
> > > related to remembering will be set on domain startup. What will take
> > > care of collecting that garbage once local -> remote migration has
> > > been performed? I think things might be fine based on the fact that
> > > the XATTRs seems to be dropped for disks even in the current state,
> > > but I thought I'd mention this specifically just in case.
> >
> > So apart from the code I've mentioned above which explicitly drops the
> > xattrs, they are also not shared via NFS. Thus the remote side itself
> > would not get them.
> >
> > If we'd leak them though it would mean that a re-start or incoming
> > migration  on the source of
> > the migration would no longer work, so we need the same treatment for
> > the other resources as well.
> 
> Right, that's exactly my concern. I have encountered at least one
> case in which XATTRs not being cleaned up correctly on (failed?)
> migration resulted in having to perform manual fixup tasks before the
> domain would start again.

It'd be great to know when that happened.

> > I'll post another version adding the missing treatment for uefi, but I
> > don't think I care enough about TPMs to go digging how that stuff is
> > plumbed in or why it complains about locks being held which don't seem
> > to be held by libvirt.
> 
> I will try to look into it myself on top of your updated patches. I'm
> pretty sure TPM is a requirement for KubeVirt, so we definitely need
> it to be handled correctly or the only recourse from their side will
> be to set remember_owner=0 - and we've already established that we
> don't want that to happen :)
Re: [PATCH 8/8] qemu: migration: Don't remember seclabel for images shared from current host
Posted by Andrea Bolognani 1 month, 2 weeks ago
On Tue, Jul 30, 2024 at 12:21:56PM GMT, Peter Krempa wrote:
> On Mon, Jul 29, 2024 at 08:29:42 -0700, Andrea Bolognani wrote:
> > On Mon, Jul 29, 2024 at 03:09:55PM GMT, Peter Krempa wrote:
> > > Given though that users might want to share this one, so that they can
> > > start the VM withouth migration on a different host and also we allow
> > > setting the path to a more reasonable directory for sharing
> > > (users should not share /var/lib/libvirt/qemu/nvram/, or anything
> > > belonging to libvirt in the first place)
> >
> > What's wrong with sharing this? It's just another type of storage
> > after all.
>
> Well, in case of this specific directory it'd be okay. But we had users
> wanting to share the XML directory too. In general I'd suggest user not
> share anything that belongs to libvirt.
>
> In worst case they'd share /var/lib/libvirt, which also contains stuff
> that must not be shared, such as the master keys for qemu, dnsmasq data
> files etc.
>
> Similarly, sharing /var/lib/libvirt/qemu as whole is a bad idea.

Right, I agree with you that indiscriminately sharing libvirt-private
directories would be a terrible idea. I was talking about the NVRAM
location and that one only.

> > I think there were some interesting timing issues with swtpm
> > attempting to lock the file while libvirt was trying to perform its
> > own metadata bookkeeping. I need to revisit that because honestly
> > I've forgotten almost everything about it.
>
> IIRC libvirt uses a specific block to apply the lock on so that it's
> locking it only for internal purposes. If 'swtpm' is locking the same
> block it might cause problems.

IIRC libvirt wanted to acquire the lock opportunistically and
specifically for the scope of the relabel operation, while swtpm owns
the storage and so it wants to have a long-term exclusive lock on it.

> > I have encountered at least one
> > case in which XATTRs not being cleaned up correctly on (failed?)
> > migration resulted in having to perform manual fixup tasks before the
> > domain would start again.
>
> It'd be great to know when that happened.

Once I start looking into this again, on top of your updated patches,
I'll probably manage to hit the issue whether I like it or not. When
that happens, I'll make sure to write down some details this time
around.

-- 
Andrea Bolognani / Red Hat / Virtualization