[RFC PATCH 03/15] iommu/vt-d: Prevent hotplugs when live update state is not normal

Samiullah Khawaja posted 15 patches 3 days, 2 hours ago
[RFC PATCH 03/15] iommu/vt-d: Prevent hotplugs when live update state is not normal
Posted by Samiullah Khawaja 3 days, 2 hours ago
Hotplugs should not be allowed when the live update state is not normal.
This means either we have preserved the state of IOMMU hardware units or
restoring the preserved state.

The live update semaphore read lock should be taken before checking the
live update state.

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
 drivers/iommu/intel/dmar.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index ec975c73cfe6..248bc7e9b035 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -26,6 +26,7 @@
 #include <linux/dmi.h>
 #include <linux/slab.h>
 #include <linux/iommu.h>
+#include <linux/liveupdate.h>
 #include <linux/numa.h>
 #include <linux/limits.h>
 #include <asm/irq_remapping.h>
@@ -2357,6 +2358,10 @@ static int dmar_device_hotplug(acpi_handle handle, bool insert)
 	if (tmp == NULL)
 		return 0;
 
+	guard_liveupdate_state_read();
+	if (!liveupdate_state_normal())
+		return -EBUSY;
+
 	down_write(&dmar_global_lock);
 	if (insert)
 		ret = dmar_hotplug_insert(tmp);
-- 
2.51.0.536.g15c5d4f767-goog
Re: [RFC PATCH 03/15] iommu/vt-d: Prevent hotplugs when live update state is not normal
Posted by Jason Gunthorpe 2 days, 5 hours ago
On Sun, Sep 28, 2025 at 07:06:11PM +0000, Samiullah Khawaja wrote:
> Hotplugs should not be allowed when the live update state is not normal.
> This means either we have preserved the state of IOMMU hardware units or
> restoring the preserved state.
> 
> The live update semaphore read lock should be taken before checking the
> live update state.
> 
> Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
> ---
>  drivers/iommu/intel/dmar.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index ec975c73cfe6..248bc7e9b035 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -26,6 +26,7 @@
>  #include <linux/dmi.h>
>  #include <linux/slab.h>
>  #include <linux/iommu.h>
> +#include <linux/liveupdate.h>
>  #include <linux/numa.h>
>  #include <linux/limits.h>
>  #include <asm/irq_remapping.h>
> @@ -2357,6 +2358,10 @@ static int dmar_device_hotplug(acpi_handle handle, bool insert)
>  	if (tmp == NULL)
>  		return 0;
>  
> +	guard_liveupdate_state_read();
> +	if (!liveupdate_state_normal())
> +		return -EBUSY;

Pasha, this is madness!

Exactly why I said we should not have these crazy globals, people are
just going to sprinkle them randomly everywhere with no possible way
of ever understanding why or what they even are supposed to protect!

There is no reason to block hotplug. Do the locking and state tracking
properly so you only manage the instances that need to participate in
luo because they are linked to already plugged devices that are also
participating in luo.

Jason
Re: [RFC PATCH 03/15] iommu/vt-d: Prevent hotplugs when live update state is not normal
Posted by Samiullah Khawaja 2 days, 4 hours ago
On Mon, Sep 29, 2025 at 8:51 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sun, Sep 28, 2025 at 07:06:11PM +0000, Samiullah Khawaja wrote:
> > Hotplugs should not be allowed when the live update state is not normal.
> > This means either we have preserved the state of IOMMU hardware units or
> > restoring the preserved state.
> >
> > The live update semaphore read lock should be taken before checking the
> > live update state.
> >
> > Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
> > ---
> >  drivers/iommu/intel/dmar.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> > index ec975c73cfe6..248bc7e9b035 100644
> > --- a/drivers/iommu/intel/dmar.c
> > +++ b/drivers/iommu/intel/dmar.c
> > @@ -26,6 +26,7 @@
> >  #include <linux/dmi.h>
> >  #include <linux/slab.h>
> >  #include <linux/iommu.h>
> > +#include <linux/liveupdate.h>
> >  #include <linux/numa.h>
> >  #include <linux/limits.h>
> >  #include <asm/irq_remapping.h>
> > @@ -2357,6 +2358,10 @@ static int dmar_device_hotplug(acpi_handle handle, bool insert)
> >       if (tmp == NULL)
> >               return 0;
> >
> > +     guard_liveupdate_state_read();
> > +     if (!liveupdate_state_normal())
> > +             return -EBUSY;
>
> Pasha, this is madness!
>
> Exactly why I said we should not have these crazy globals, people are
> just going to sprinkle them randomly everywhere with no possible way
> of ever understanding why or what they even are supposed to protect!
>
> There is no reason to block hotplug. Do the locking and state tracking
> properly so you only manage the instances that need to participate in
> luo because they are linked to already plugged devices that are also
> participating in luo.
Agreed.

I'll rework this and do proper state tracking once I rebase on top of
LUOv4 for next revision.
>
> Jason
Re: [RFC PATCH 03/15] iommu/vt-d: Prevent hotplugs when live update state is not normal
Posted by Pasha Tatashin 2 days, 4 hours ago
On Mon, Sep 29, 2025 at 11:51 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sun, Sep 28, 2025 at 07:06:11PM +0000, Samiullah Khawaja wrote:
> > Hotplugs should not be allowed when the live update state is not normal.
> > This means either we have preserved the state of IOMMU hardware units or
> > restoring the preserved state.
> >
> > The live update semaphore read lock should be taken before checking the
> > live update state.
> >
> > Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
> > ---
> >  drivers/iommu/intel/dmar.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> > index ec975c73cfe6..248bc7e9b035 100644
> > --- a/drivers/iommu/intel/dmar.c
> > +++ b/drivers/iommu/intel/dmar.c
> > @@ -26,6 +26,7 @@
> >  #include <linux/dmi.h>
> >  #include <linux/slab.h>
> >  #include <linux/iommu.h>
> > +#include <linux/liveupdate.h>
> >  #include <linux/numa.h>
> >  #include <linux/limits.h>
> >  #include <asm/irq_remapping.h>
> > @@ -2357,6 +2358,10 @@ static int dmar_device_hotplug(acpi_handle handle, bool insert)
> >       if (tmp == NULL)
> >               return 0;
> >
> > +     guard_liveupdate_state_read();
> > +     if (!liveupdate_state_normal())
> > +             return -EBUSY;
>
> Pasha, this is madness!
>
> Exactly why I said we should not have these crazy globals, people are
> just going to sprinkle them randomly everywhere with no possible way

We now have per session "state", so presumably, LUO should provide an interface:
"struct file" -> session LUO state.

We should probably add interfaces like these:

liveupdate_is_preserved(struct file *) -> return true if file is preserved.
liveupdate_state(struct file *) -> returns the current state (or
LIVEUPDATE_STATE_UNDEFINED if unpreserved) for the session to which
this FD belongs (or  (in the future we could improve to per FD
granularity, if needed, but I think per-session is going to be
scalable enought).
liveupdate_state_read_enter(struct file *) -> to protect state
transition for the session to which this file belongs.

> of ever understanding why or what they even are supposed to protect!
>
> There is no reason to block hotplug. Do the locking and state tracking

This makes sense, adding a new device should be fine.

> properly so you only manage the instances that need to participate in
> luo because they are linked to already plugged devices that are also
> participating in luo.

Pasha