[RFC PATCH 13/15] iommufd: Persist iommu domains for live update

Samiullah Khawaja posted 15 patches 3 days, 2 hours ago
[RFC PATCH 13/15] iommufd: Persist iommu domains for live update
Posted by Samiullah Khawaja 3 days, 2 hours ago
From: YiFei Zhu <zhuyifei@google.com>

Iterate through all the IOAS objects and the underlying hwpt_paging
objects. Persist each iommu domain using API iommu_domain_preserve.

This is temporary as only the domains attached to the persisted devices
need to preserved.

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
 drivers/iommu/iommufd/liveupdate.c | 47 ++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/drivers/iommu/iommufd/liveupdate.c b/drivers/iommu/iommufd/liveupdate.c
index 1bdd5a82af90..0af0c6fadff1 100644
--- a/drivers/iommu/iommufd/liveupdate.c
+++ b/drivers/iommu/iommufd/liveupdate.c
@@ -8,9 +8,52 @@
 #include <linux/kexec_handover.h>
 #include <linux/liveupdate.h>
 #include <linux/mm.h>
+#include <linux/pci.h>
 
 #include "iommufd_private.h"
 
+static int iommufd_save_ioas(struct iommufd_ctx *ictx,
+			     struct iommufd_lu *iommufd_lu)
+{
+	struct iommufd_hwpt_paging *hwpt_paging;
+	struct iommufd_ioas *ioas = NULL;
+	struct iommufd_object *obj;
+	unsigned long index;
+	int rc;
+
+	/* Iterate each ioas. */
+	xa_for_each(&ictx->objects, index, obj) {
+		if (obj->type != IOMMUFD_OBJ_IOAS)
+			continue;
+
+		ioas = (struct iommufd_ioas *)obj;
+		mutex_lock(&ioas->mutex);
+
+		/*
+		 * TODO: Iterate over each device of this iommufd and only save
+		 * hwpt/domain if the device is persisted.
+		 */
+		list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
+			if (!hwpt_paging->common.domain)
+				continue;
+
+			rc = iommu_domain_preserve(hwpt_paging->common.domain);
+			if (rc)
+				goto err;
+		}
+
+		mutex_unlock(&ioas->mutex);
+		ioas = NULL;
+	}
+
+	return 0;
+
+err:
+	if (ioas)
+		mutex_unlock(&ioas->mutex);
+	return rc;
+}
+
 static int iommufd_liveupdate_prepare(struct liveupdate_file_handler *handler,
 				      struct file *file, u64 *data)
 {
@@ -33,6 +76,10 @@ static int iommufd_liveupdate_prepare(struct liveupdate_file_handler *handler,
 
 	iommufd_lu = folio_address(folio_lu);
 
+	rc = iommufd_save_ioas(ictx, iommufd_lu);
+	if (rc)
+		goto err_folio_put;
+
 	rc = kho_preserve_folio(folio_lu);
 	if (rc)
 		goto err_folio_put;
-- 
2.51.0.536.g15c5d4f767-goog
Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
Posted by Jason Gunthorpe 2 days, 5 hours ago
On Sun, Sep 28, 2025 at 07:06:21PM +0000, Samiullah Khawaja wrote:
> +static int iommufd_save_ioas(struct iommufd_ctx *ictx,
> +			     struct iommufd_lu *iommufd_lu)
> +{
> +	struct iommufd_hwpt_paging *hwpt_paging;
> +	struct iommufd_ioas *ioas = NULL;
> +	struct iommufd_object *obj;
> +	unsigned long index;
> +	int rc;
> +
> +	/* Iterate each ioas. */
> +	xa_for_each(&ictx->objects, index, obj) {
> +		if (obj->type != IOMMUFD_OBJ_IOAS)
> +			continue;

Wrong locking

> +
> +		ioas = (struct iommufd_ioas *)obj;
> +		mutex_lock(&ioas->mutex);
> +
> +		/*
> +		 * TODO: Iterate over each device of this iommufd and only save
> +		 * hwpt/domain if the device is persisted.
> +		 */
> +		list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
> +			if (!hwpt_paging->common.domain)
> +				continue;

I don't think this should be automatic. The user should directly
serialize/unserialize HWPTs by ID.

Jason
Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
Posted by Pasha Tatashin 1 day, 8 hours ago
On Mon, Sep 29, 2025 at 12:00 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sun, Sep 28, 2025 at 07:06:21PM +0000, Samiullah Khawaja wrote:
> > +static int iommufd_save_ioas(struct iommufd_ctx *ictx,
> > +                          struct iommufd_lu *iommufd_lu)
> > +{
> > +     struct iommufd_hwpt_paging *hwpt_paging;
> > +     struct iommufd_ioas *ioas = NULL;
> > +     struct iommufd_object *obj;
> > +     unsigned long index;
> > +     int rc;
> > +
> > +     /* Iterate each ioas. */
> > +     xa_for_each(&ictx->objects, index, obj) {
> > +             if (obj->type != IOMMUFD_OBJ_IOAS)
> > +                     continue;
>
> Wrong locking
>
> > +
> > +             ioas = (struct iommufd_ioas *)obj;
> > +             mutex_lock(&ioas->mutex);
> > +
> > +             /*
> > +              * TODO: Iterate over each device of this iommufd and only save
> > +              * hwpt/domain if the device is persisted.
> > +              */
> > +             list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
> > +                     if (!hwpt_paging->common.domain)
> > +                             continue;
>
> I don't think this should be automatic. The user should directly
> serialize/unserialize HWPTs by ID.

Why not?  Live Updated uAPI is handled through FDs, and both iommufd
and vfiofd have to be preserved; I assume we can automatically
determine the hwpt to be preserved through dependencies. Why would we
delegate this to the user?

Pasha
Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
Posted by Jason Gunthorpe 1 day, 7 hours ago
On Tue, Sep 30, 2025 at 09:07:48AM -0400, Pasha Tatashin wrote:
> On Mon, Sep 29, 2025 at 12:00 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Sun, Sep 28, 2025 at 07:06:21PM +0000, Samiullah Khawaja wrote:
> > > +static int iommufd_save_ioas(struct iommufd_ctx *ictx,
> > > +                          struct iommufd_lu *iommufd_lu)
> > > +{
> > > +     struct iommufd_hwpt_paging *hwpt_paging;
> > > +     struct iommufd_ioas *ioas = NULL;
> > > +     struct iommufd_object *obj;
> > > +     unsigned long index;
> > > +     int rc;
> > > +
> > > +     /* Iterate each ioas. */
> > > +     xa_for_each(&ictx->objects, index, obj) {
> > > +             if (obj->type != IOMMUFD_OBJ_IOAS)
> > > +                     continue;
> >
> > Wrong locking
> >
> > > +
> > > +             ioas = (struct iommufd_ioas *)obj;
> > > +             mutex_lock(&ioas->mutex);
> > > +
> > > +             /*
> > > +              * TODO: Iterate over each device of this iommufd and only save
> > > +              * hwpt/domain if the device is persisted.
> > > +              */
> > > +             list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
> > > +                     if (!hwpt_paging->common.domain)
> > > +                             continue;
> >
> > I don't think this should be automatic. The user should directly
> > serialize/unserialize HWPTs by ID.
> 
> Why not?  Live Updated uAPI is handled through FDs, and both iommufd
> and vfiofd have to be preserved; I assume we can automatically
> determine the hwpt to be preserved through dependencies. Why would we
> delegate this to the user?

There are HWPTs outside the IOAS so it is inconsisent.

We are not going to reconstruct the IOAS.

The IDR ids of the HWPT may not be available on restore (we cannot
make this ABI), so without userspace expressly labeling them and
recovering the new IDR ids it doesn't work.

Finally we expect to discard the preserved HWPTs and replace them we
rebuilt ones at least as a first step. Userspace needs to sequence all
of this..

Jason
Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
Posted by Samiullah Khawaja 1 day, 1 hour ago
On Tue, Sep 30, 2025 at 6:59 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Sep 30, 2025 at 09:07:48AM -0400, Pasha Tatashin wrote:
> > On Mon, Sep 29, 2025 at 12:00 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Sun, Sep 28, 2025 at 07:06:21PM +0000, Samiullah Khawaja wrote:
> > > > +static int iommufd_save_ioas(struct iommufd_ctx *ictx,
> > > > +                          struct iommufd_lu *iommufd_lu)
> > > > +{
> > > > +     struct iommufd_hwpt_paging *hwpt_paging;
> > > > +     struct iommufd_ioas *ioas = NULL;
> > > > +     struct iommufd_object *obj;
> > > > +     unsigned long index;
> > > > +     int rc;
> > > > +
> > > > +     /* Iterate each ioas. */
> > > > +     xa_for_each(&ictx->objects, index, obj) {
> > > > +             if (obj->type != IOMMUFD_OBJ_IOAS)
> > > > +                     continue;
> > >
> > > Wrong locking
> > >
> > > > +
> > > > +             ioas = (struct iommufd_ioas *)obj;
> > > > +             mutex_lock(&ioas->mutex);
> > > > +
> > > > +             /*
> > > > +              * TODO: Iterate over each device of this iommufd and only save
> > > > +              * hwpt/domain if the device is persisted.
> > > > +              */
> > > > +             list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
> > > > +                     if (!hwpt_paging->common.domain)
> > > > +                             continue;
> > >
> > > I don't think this should be automatic. The user should directly
> > > serialize/unserialize HWPTs by ID.
> >
> > Why not?  Live Updated uAPI is handled through FDs, and both iommufd
> > and vfiofd have to be preserved; I assume we can automatically
> > determine the hwpt to be preserved through dependencies. Why would we
> > delegate this to the user?
>
> There are HWPTs outside the IOAS so it is inconsisent.

This makes sense. But if I understand correctly a HWPT should be
associated one way or another to a preserved device or IOAS. Also the
nested ones will have parent HWPT. Can we not look at the dependencies
here and find the HWPTs that need to preserved.
>
> We are not going to reconstruct the IOAS.
>
> The IDR ids of the HWPT may not be available on restore (we cannot
> make this ABI), so without userspace expressly labeling them and
> recovering the new IDR ids it doesn't work.
>
> Finally we expect to discard the preserved HWPTs and replace them we
> rebuilt ones at least as a first step. Userspace needs to sequence all
> of this..

But if we discard the old HWPTs and replace them with the new ones, we
shouldn't need labeling of the old HWPTs? We would definitely need to
sequence the replacement and discard of the old ones, but that can
also be inferred through the dependencies between the new HWPTs?
>
> Jason
Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
Posted by Jason Gunthorpe 1 day ago
On Tue, Sep 30, 2025 at 01:02:31PM -0700, Samiullah Khawaja wrote:
> > There are HWPTs outside the IOAS so it is inconsisent.
> 
> This makes sense. But if I understand correctly a HWPT should be
> associated one way or another to a preserved device or IOAS. Also the
> nested ones will have parent HWPT. Can we not look at the dependencies
> here and find the HWPTs that need to preserved.

Maybe in some capacity, but I would say more of don't allow preserving
things that depend on things not already preserved somehow.

> > Finally we expect to discard the preserved HWPTs and replace them we
> > rebuilt ones at least as a first step. Userspace needs to sequence all
> > of this..
> 
> But if we discard the old HWPTs and replace them with the new ones, we
> shouldn't need labeling of the old HWPTs? We would definitely need to
> sequence the replacement and discard of the old ones, but that can
> also be inferred through the dependencies between the new HWPTs?

It depends how this ends up being designed and who is responsible to
free the restored iommu_domain.

The iommu core code should be restoring the iommu_domain as soon as
the attached device is plugged in and attaching the preserved domain
instead of something else during the device probe sequence

This logic should not be in drivers.

From there you either put the hwpt back into iommufd and have it free
the iommu_domain when it destroys the hwpt

Or you have the iommu core code free the iommu_domain at some point
after iommufd has replaced the attachment with a new iommu_domain?

I'm not sure which is a better option..

Also there is an interesting behavior to note that if the iommu driver
restores a domain then it will also prevent a non-vfio driver from
binding to that device.

Jason
Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
Posted by Samiullah Khawaja 22 hours ago
On Tue, Sep 30, 2025 at 2:05 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Sep 30, 2025 at 01:02:31PM -0700, Samiullah Khawaja wrote:
> > > There are HWPTs outside the IOAS so it is inconsisent.
> >
> > This makes sense. But if I understand correctly a HWPT should be
> > associated one way or another to a preserved device or IOAS. Also the
> > nested ones will have parent HWPT. Can we not look at the dependencies
> > here and find the HWPTs that need to preserved.
>
> Maybe in some capacity, but I would say more of don't allow preserving
> things that depend on things not already preserved somehow.

I agree. I think this makes sense. Users can explicitly indicate that
they want to preserve HWPTs and iommufd can enforce the dependencies.
>
> > > Finally we expect to discard the preserved HWPTs and replace them we
> > > rebuilt ones at least as a first step. Userspace needs to sequence all
> > > of this..
> >
> > But if we discard the old HWPTs and replace them with the new ones, we
> > shouldn't need labeling of the old HWPTs? We would definitely need to
> > sequence the replacement and discard of the old ones, but that can
> > also be inferred through the dependencies between the new HWPTs?
>
> It depends how this ends up being designed and who is responsible to
> free the restored iommu_domain.

Agreed. I think it depends on how much is restored from the previous
kernel. Discussed further below inline.
>
> The iommu core code should be restoring the iommu_domain as soon as
> the attached device is plugged in and attaching the preserved domain
> instead of something else during the device probe sequence
>
> This logic should not be in drivers.
>
> From there you either put the hwpt back into iommufd and have it free
> the iommu_domain when it destroys the hwpt
>
> Or you have the iommu core code free the iommu_domain at some point
> after iommufd has replaced the attachment with a new iommu_domain?

But we cannot do the replacement during domain attachment because
userspace might not have fully prepared the new domain with all the
required DMA mappings. Replace during LUO finish?

This is actually very close to what I had in mind for the "Hotswap"
model. My thought was:

1. During boot, the IOMMU core sets up a default domain but doesn't
program the context entries for the preserved device. The hardware
keeps on using the old preserved tables.
2. Userspace restores the iommufd, creates a new HWPT/domain and
populates mappings.
3. On FINISH, the IOMMU core updates the context entries of preserved
devices to point to the new domain.

I have a sequence diagram for this in the cover letter also.

I understand the desire to have the preserved iommu domain be restored
during boot so the device has a default domain and there is an owner
of the attached restored domain, but that would prevent the iommfud
from cooking a clean new domain.

Maybe we can refine the "Hotswap" model I had in mind. Basically on
boot the core restores the preserved iommu domain, but core lets
iommufd attach a new domain with preserved devices without replacing
the underlying context entries? The core replaces the context entries
when the iommufd indicates that the domain is fully prepared (during
luo finish).
>
> I'm not sure which is a better option..
>
> Also there is an interesting behavior to note that if the iommu driver
> restores a domain then it will also prevent a non-vfio driver from
> binding to that device.

Agreed. I think in the "Hotswap" approach I discussed above, if we
don't restore the domain, the core can just commit the context entries
of the new default domain if a non-vfio driver is bound to the device.
>
> Jason
Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
Posted by Jason Gunthorpe 9 hours ago
On Tue, Sep 30, 2025 at 04:15:43PM -0700, Samiullah Khawaja wrote:

> > The iommu core code should be restoring the iommu_domain as soon as
> > the attached device is plugged in and attaching the preserved domain
> > instead of something else during the device probe sequence
> >
> > This logic should not be in drivers.
> >
> > From there you either put the hwpt back into iommufd and have it free
> > the iommu_domain when it destroys the hwpt
> >
> > Or you have the iommu core code free the iommu_domain at some point
> > after iommufd has replaced the attachment with a new iommu_domain?
> 
> But we cannot do the replacement during domain attachment because
> userspace might not have fully prepared the new domain with all the
> required DMA mappings. Replace during LUO finish?

The idea is the kernel will restore the iommu_domain during early boot
in the iommu_core and then attach it. This should "rewrite" the IOMMU
HW context for that device with identical content. Drivers must be
enhanced to support this hitless rewrite (AMD and ARM are already
done).

At this point the kernel is operating normally with a normal domain
and a normal driver, no special luo stuff.

Later iommufd will come along and establish a HWPT that has an
identical translation. Then we replace the luo domain with the new
HWPT and free the luo domain.

> 1. During boot, the IOMMU core sets up a default domain but doesn't
> program the context entries for the preserved device. The hardware
> keeps on using the old preserved tables.

When the iommu driver first starts up it can take over the context
memory from the predecessor kernel. But it has to go through it and
clear out most of the context entries.

Only context entries belonging to devices marked for preservation
should be kept unchanged.

Later we probe the struct device to the iommu and do as I said above
to restore consistency.

> 2. Userspace restores the iommufd, creates a new HWPT/domain and
> populates mappings.

Yes

> 3. On FINISH, the IOMMU core updates the context entries of preserved
> devices to point to the new domain.

No, finish should never do anything on the restore path, IMHO. User
should directly attach the newly created HWPT when it is ready.

> I understand the desire to have the preserved iommu domain be restored
> during boot so the device has a default domain and there is an owner
> of the attached restored domain, but that would prevent the iommfud
> from cooking a clean new domain.

The "default domain" is the "DMA API domain" and it has to be created
and setup always. The change here is instead of attaching the default
domain we attach the luo restored domain at early boot.

This sets the device into an "owned" mode but vfio can still attach
and nothing prevents iommufd from building a new hwpt and attaching
it.

> Maybe we can refine the "Hotswap" model I had in mind. Basically on
> boot the core restores the preserved iommu domain, but core lets
> iommufd attach a new domain with preserved devices without replacing
> the underlying context entries? 

Replace the context entries. If everything is working properly the
preserved domain should compute an identical context entry, so no
reason to not just "replace" it which should be a NOP.

> > Also there is an interesting behavior to note that if the iommu driver
> > restores a domain then it will also prevent a non-vfio driver from
> > binding to that device.
> 
> Agreed. I think in the "Hotswap" approach I discussed above, if we
> don't restore the domain, the core can just commit the context entries
> of the new default domain if a non-vfio driver is bound to the device.

As I said, the owned nature of the device will prevent attaching a
non-vfio driver in the first place.

So the only path forward for userspace is to attach vfio, and then
iommufd should take over that luo restored iommu_domain and eventually
free it.

You might consider that finish should de-own the device if vfio didn't
claim it. But that is a bit tricky since it needs a FLR before the
domains can be switched around.

Jason
Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
Posted by Pasha Tatashin an hour ago
> > 3. On FINISH, the IOMMU core updates the context entries of preserved
> > devices to point to the new domain.
>
> No, finish should never do anything on the restore path, IMHO. User
> should directly attach the newly created HWPT when it is ready.

But, finish is our indicator that a particular session (VM) is out of
blackout, and now we are free to do slow things, such as
re-allocating/recreating page tables. Why start it before a VM is out
of blackout?
Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
Posted by Pasha Tatashin 1 day, 6 hours ago
On Tue, Sep 30, 2025 at 9:59 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Sep 30, 2025 at 09:07:48AM -0400, Pasha Tatashin wrote:
> > On Mon, Sep 29, 2025 at 12:00 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Sun, Sep 28, 2025 at 07:06:21PM +0000, Samiullah Khawaja wrote:
> > > > +static int iommufd_save_ioas(struct iommufd_ctx *ictx,
> > > > +                          struct iommufd_lu *iommufd_lu)
> > > > +{
> > > > +     struct iommufd_hwpt_paging *hwpt_paging;
> > > > +     struct iommufd_ioas *ioas = NULL;
> > > > +     struct iommufd_object *obj;
> > > > +     unsigned long index;
> > > > +     int rc;
> > > > +
> > > > +     /* Iterate each ioas. */
> > > > +     xa_for_each(&ictx->objects, index, obj) {
> > > > +             if (obj->type != IOMMUFD_OBJ_IOAS)
> > > > +                     continue;
> > >
> > > Wrong locking
> > >
> > > > +
> > > > +             ioas = (struct iommufd_ioas *)obj;
> > > > +             mutex_lock(&ioas->mutex);
> > > > +
> > > > +             /*
> > > > +              * TODO: Iterate over each device of this iommufd and only save
> > > > +              * hwpt/domain if the device is persisted.
> > > > +              */
> > > > +             list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
> > > > +                     if (!hwpt_paging->common.domain)
> > > > +                             continue;
> > >
> > > I don't think this should be automatic. The user should directly
> > > serialize/unserialize HWPTs by ID.
> >
> > Why not?  Live Updated uAPI is handled through FDs, and both iommufd
> > and vfiofd have to be preserved; I assume we can automatically
> > determine the hwpt to be preserved through dependencies. Why would we
> > delegate this to the user?
>
> There are HWPTs outside the IOAS so it is inconsisent.
>
> We are not going to reconstruct the IOAS.
>
> The IDR ids of the HWPT may not be available on restore (we cannot
> make this ABI), so without userspace expressly labeling them and
> recovering the new IDR ids it doesn't work.
>
> Finally we expect to discard the preserved HWPTs and replace them we
> rebuilt ones at least as a first step. Userspace needs to sequence all
> of this..

The way LUOv4 is implemented, "LUO sessions" are always participating
LU. Once a user adds file descriptors to a session, that session and
its contents are automatically carried across multiple consecutive
live updates. The user only needs to act if they explicitly want to
remove an FD and opt-out of preservation, or close session. This is
consistent and convenient for long-running VM that should survive by
default.

I was hoping for a similar "preserve by default" or "opt-in-once"
model for iommufd objects that are put into the LUO session to avoid a
flurry of IOCTLs to re-register before every single live update.

On the other hand, userspace still has to issue IOCTLs after retrieval
to bring the restored FDs and associated objects back to a workable
state. Perhaps, we could do something like "Yes, I'm actively using
this object again, so please preserve it if another live update
happens." ?

Pasha
Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
Posted by Jason Gunthorpe 1 day, 4 hours ago
On Tue, Sep 30, 2025 at 11:09:59AM -0400, Pasha Tatashin wrote:
>
> The way LUOv4 is implemented, "LUO sessions" are always participating
> LU. Once a user adds file descriptors to a session, that session and
> its contents are automatically carried across multiple consecutive
> live updates. The user only needs to act if they explicitly want to
> remove an FD and opt-out of preservation, or close session. This is
> consistent and convenient for long-running VM that should survive by
> default.

I don't think this is a good idea. Each kernel should decide on its
own what and how things get included and manage the labels, from
scratch.

If you do this then alot more stuff becomes ABI and I think it will
turn into a huge PITA.

The userspace already has to have the code to setup the luo if it is
on a clean reboot - what is the point of not running that every time?

Jason
Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
Posted by Samiullah Khawaja 2 days, 3 hours ago
On Mon, Sep 29, 2025 at 9:00 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sun, Sep 28, 2025 at 07:06:21PM +0000, Samiullah Khawaja wrote:
> > +static int iommufd_save_ioas(struct iommufd_ctx *ictx,
> > +                          struct iommufd_lu *iommufd_lu)
> > +{
> > +     struct iommufd_hwpt_paging *hwpt_paging;
> > +     struct iommufd_ioas *ioas = NULL;
> > +     struct iommufd_object *obj;
> > +     unsigned long index;
> > +     int rc;
> > +
> > +     /* Iterate each ioas. */
> > +     xa_for_each(&ictx->objects, index, obj) {
> > +             if (obj->type != IOMMUFD_OBJ_IOAS)
> > +                     continue;
>
> Wrong locking
>
> > +
> > +             ioas = (struct iommufd_ioas *)obj;
> > +             mutex_lock(&ioas->mutex);
> > +
> > +             /*
> > +              * TODO: Iterate over each device of this iommufd and only save
> > +              * hwpt/domain if the device is persisted.
> > +              */
> > +             list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
> > +                     if (!hwpt_paging->common.domain)
> > +                             continue;
>
> I don't think this should be automatic. The user should directly
> serialize/unserialize HWPTs by ID.
Interesting. So the user should be able to serialize/unserialize HWPTs
before the Live Update PREPARE event? But what if a device was marked
for preservation but the user never serialized the attached HWPT,
would that be considered an error during LUO PREPARE or should iommufd
serialize the remaining HWPTs here?
>
> Jason
Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
Posted by Pasha Tatashin 2 days, 3 hours ago
On Mon, Sep 29, 2025 at 1:32 PM Samiullah Khawaja <skhawaja@google.com> wrote:
>
> On Mon, Sep 29, 2025 at 9:00 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Sun, Sep 28, 2025 at 07:06:21PM +0000, Samiullah Khawaja wrote:
> > > +static int iommufd_save_ioas(struct iommufd_ctx *ictx,
> > > +                          struct iommufd_lu *iommufd_lu)
> > > +{
> > > +     struct iommufd_hwpt_paging *hwpt_paging;
> > > +     struct iommufd_ioas *ioas = NULL;
> > > +     struct iommufd_object *obj;
> > > +     unsigned long index;
> > > +     int rc;
> > > +
> > > +     /* Iterate each ioas. */
> > > +     xa_for_each(&ictx->objects, index, obj) {
> > > +             if (obj->type != IOMMUFD_OBJ_IOAS)
> > > +                     continue;
> >
> > Wrong locking
> >
> > > +
> > > +             ioas = (struct iommufd_ioas *)obj;
> > > +             mutex_lock(&ioas->mutex);
> > > +
> > > +             /*
> > > +              * TODO: Iterate over each device of this iommufd and only save
> > > +              * hwpt/domain if the device is persisted.
> > > +              */
> > > +             list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
> > > +                     if (!hwpt_paging->common.domain)
> > > +                             continue;
> >
> > I don't think this should be automatic. The user should directly
> > serialize/unserialize HWPTs by ID.
> Interesting. So the user should be able to serialize/unserialize HWPTs
> before the Live Update PREPARE event? But what if a device was marked
> for preservation but the user never serialized the attached HWPT,
> would that be considered an error during LUO PREPARE or should iommufd
> serialize the remaining HWPTs here?

Users ~can~ serialize their sessions before system-wide prepare event.
During prepare event all unserialized sessions and their FDs are going
to be serialized anyways.

Pasha

> >
> > Jason
Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
Posted by Jason Gunthorpe 2 days, 3 hours ago
On Mon, Sep 29, 2025 at 10:32:22AM -0700, Samiullah Khawaja wrote:
> On Mon, Sep 29, 2025 at 9:00 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Sun, Sep 28, 2025 at 07:06:21PM +0000, Samiullah Khawaja wrote:
> > > +static int iommufd_save_ioas(struct iommufd_ctx *ictx,
> > > +                          struct iommufd_lu *iommufd_lu)
> > > +{
> > > +     struct iommufd_hwpt_paging *hwpt_paging;
> > > +     struct iommufd_ioas *ioas = NULL;
> > > +     struct iommufd_object *obj;
> > > +     unsigned long index;
> > > +     int rc;
> > > +
> > > +     /* Iterate each ioas. */
> > > +     xa_for_each(&ictx->objects, index, obj) {
> > > +             if (obj->type != IOMMUFD_OBJ_IOAS)
> > > +                     continue;
> >
> > Wrong locking
> >
> > > +
> > > +             ioas = (struct iommufd_ioas *)obj;
> > > +             mutex_lock(&ioas->mutex);
> > > +
> > > +             /*
> > > +              * TODO: Iterate over each device of this iommufd and only save
> > > +              * hwpt/domain if the device is persisted.
> > > +              */
> > > +             list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
> > > +                     if (!hwpt_paging->common.domain)
> > > +                             continue;
> >
> > I don't think this should be automatic. The user should directly
> > serialize/unserialize HWPTs by ID.
> Interesting. So the user should be able to serialize/unserialize HWPTs
> before the Live Update PREPARE event? But what if a device was marked
> for preservation but the user never serialized the attached HWPT,
> would that be considered an error during LUO PREPARE or should iommufd
> serialize the remaining HWPTs here?

yes that would be an error

I also think your patch series is a bit upside down, you should
present the iommufd and core pieces first, then come with a driver
implementation last.

It will be easier to understand the context that having a driver
implementation appear out of no where with no callers..

And everything should be driven by iommufd in this step, the iommu
driver should not be magically auto-preserving itself. Just preserve
the drivers linked to devices being preserved by iommufd.

Jason