[PATCH v2 2/3] iommu/vt-d: Clear Present bit before tearing down context entry

Lu Baolu posted 3 patches 2 weeks, 4 days ago
[PATCH v2 2/3] iommu/vt-d: Clear Present bit before tearing down context entry
Posted by Lu Baolu 2 weeks, 4 days ago
When tearing down a context entry, the current implementation zeros the
entire 128-bit entry using multiple 64-bit writes. This creates a window
where the hardware can fetch a "torn" entry — where some fields are
already zeroed while the 'Present' bit is still set — leading to
unpredictable behavior or spurious faults.

While x86 provides strong write ordering, the compiler may reorder writes
to the two 64-bit halves of the context entry. Even without compiler
reordering, the hardware fetch is not guaranteed to be atomic with
respect to multiple CPU writes.

Align with the "Guidance to Software for Invalidations" in the VT-d spec
(Section 6.5.3.3) by implementing the recommended ownership handshake:

1. Clear only the 'Present' (P) bit of the context entry first to
   signal the transition of ownership from hardware to software.
2. Use dma_wmb() to ensure the cleared bit is visible to the IOMMU.
3. Perform the required cache and context-cache invalidation to ensure
   hardware no longer has cached references to the entry.
4. Fully zero out the entry only after the invalidation is complete.

Also, add a dma_wmb() to context_set_present() to ensure the entry
is fully initialized before the 'Present' bit becomes visible.

Fixes: ba39592764ed2 ("Intel IOMMU: Intel IOMMU driver")
Reported-by: Dmytro Maluka <dmaluka@chromium.org>
Closes: https://lore.kernel.org/all/aTG7gc7I5wExai3S@google.com/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.h | 21 ++++++++++++++++++++-
 drivers/iommu/intel/iommu.c |  4 +++-
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 25c5e22096d4..599913fb65d5 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -900,7 +900,26 @@ static inline int pfn_level_offset(u64 pfn, int level)
 
 static inline void context_set_present(struct context_entry *context)
 {
-	context->lo |= 1;
+	u64 val;
+
+	dma_wmb();
+	val = READ_ONCE(context->lo) | 1;
+	WRITE_ONCE(context->lo, val);
+}
+
+/*
+ * Clear the Present (P) bit (bit 0) of a context table entry. This initiates
+ * the transition of the entry's ownership from hardware to software. The
+ * caller is responsible for fulfilling the invalidation handshake recommended
+ * by the VT-d spec, Section 6.5.3.3 (Guidance to Software for Invalidations).
+ */
+static inline void context_clear_present(struct context_entry *context)
+{
+	u64 val;
+
+	val = READ_ONCE(context->lo) & GENMASK_ULL(63, 1);
+	WRITE_ONCE(context->lo, val);
+	dma_wmb();
 }
 
 static inline void context_set_fault_enable(struct context_entry *context)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 134302fbcd92..c66cc51f9e51 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1240,10 +1240,12 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
 	}
 
 	did = context_domain_id(context);
-	context_clear_entry(context);
+	context_clear_present(context);
 	__iommu_flush_cache(iommu, context, sizeof(*context));
 	spin_unlock(&iommu->lock);
 	intel_context_flush_no_pasid(info, context, did);
+	context_clear_entry(context);
+	__iommu_flush_cache(iommu, context, sizeof(*context));
 }
 
 int __domain_setup_first_level(struct intel_iommu *iommu, struct device *dev,
-- 
2.43.0
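
For readers following along outside the kernel tree, the four-step teardown and the publish ordering in context_set_present() can be sketched as a small user-space model. This is only an illustration under stated assumptions: the model_* names are hypothetical, a C11 release fence stands in for dma_wmb() (it orders CPU stores but says nothing about device visibility), and the invalidation step is a placeholder counter rather than a real context-cache flush.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the 128-bit context entry. */
struct model_context_entry {
	_Atomic uint64_t lo;	/* bit 0 models the Present (P) bit */
	_Atomic uint64_t hi;
};

/* Assumption: a release fence stands in for the kernel's dma_wmb(). */
static inline void model_dma_wmb(void)
{
	atomic_thread_fence(memory_order_release);
}

/* Publish: every other field is written before P becomes visible. */
static inline void model_set_present(struct model_context_entry *c)
{
	uint64_t val;

	model_dma_wmb();
	val = atomic_load_explicit(&c->lo, memory_order_relaxed) | 1;
	atomic_store_explicit(&c->lo, val, memory_order_relaxed);
}

/* Steps 1 and 2: clear only P, then order that store before later writes. */
static inline void model_clear_present(struct model_context_entry *c)
{
	uint64_t val;

	val = atomic_load_explicit(&c->lo, memory_order_relaxed) & ~1ULL;
	atomic_store_explicit(&c->lo, val, memory_order_relaxed);
	model_dma_wmb();
}

/* Step 3 placeholder: a real driver issues cache invalidations here. */
static int model_invalidations_done;

static inline void model_invalidate(void)
{
	model_invalidations_done++;
}

/* Step 4: zero both halves once hardware no longer references the entry. */
static inline void model_clear_entry(struct model_context_entry *c)
{
	atomic_store_explicit(&c->lo, 0, memory_order_relaxed);
	atomic_store_explicit(&c->hi, 0, memory_order_relaxed);
}

/* Run the whole handshake; returns 0 on success, -1 if P is still set. */
static int model_teardown(struct model_context_entry *c)
{
	model_clear_present(c);
	if (atomic_load(&c->lo) & 1)
		return -1;
	model_invalidate();
	model_clear_entry(c);
	return 0;
}
```

The point of the ordering is that the only store hardware can observe before the invalidation is the single 64-bit write that clears P; the remaining fields are left intact until software owns the entry.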

RE: [PATCH v2 2/3] iommu/vt-d: Clear Present bit before tearing down context entry
Posted by Tian, Kevin 2 weeks, 3 days ago
> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Tuesday, January 20, 2026 2:18 PM
> 
> When tearing down a context entry, the current implementation zeros the
> entire 128-bit entry using multiple 64-bit writes. This creates a window
> where the hardware can fetch a "torn" entry — where some fields are
> already zeroed while the 'Present' bit is still set — leading to
> unpredictable behavior or spurious faults.
> 
> While x86 provides strong write ordering, the compiler may reorder writes
> to the two 64-bit halves of the context entry. Even without compiler
> reordering, the hardware fetch is not guaranteed to be atomic with
> respect to multiple CPU writes.
> 
> Align with the "Guidance to Software for Invalidations" in the VT-d spec
> (Section 6.5.3.3) by implementing the recommended ownership handshake:
> 
> 1. Clear only the 'Present' (P) bit of the context entry first to
>    signal the transition of ownership from hardware to software.
> 2. Use dma_wmb() to ensure the cleared bit is visible to the IOMMU.
> 3. Perform the required cache and context-cache invalidation to ensure
>    hardware no longer has cached references to the entry.
> 4. Fully zero out the entry only after the invalidation is complete.
> 
> Also, add a dma_wmb() to context_set_present() to ensure the entry
> is fully initialized before the 'Present' bit becomes visible.
> 
> Fixes: ba39592764ed2 ("Intel IOMMU: Intel IOMMU driver")
> Reported-by: Dmytro Maluka <dmaluka@chromium.org>
> Closes: https://lore.kernel.org/all/aTG7gc7I5wExai3S@google.com/
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

By the way, there is also a context_clear_entry() call for copied
context entries in device_pasid_table_setup(), but this patch doesn't
touch that path. It seems to assume that no in-flight DMA will exist
at that point:

        if (context_copied(iommu, bus, devfn)) {
                context_clear_entry(context);
                ...
                /*
                 * At this point, the device is supposed to finish reset at
                 * its driver probe stage, so no in-flight DMA will exist,
                 * and we don't need to worry anymore hereafter.
                 */
                clear_context_copied(iommu, bus, devfn);

Is that guaranteed for all devices? From the kdump point of view, if
that assumption is broken it just means potential DMA errors in this
transition window. But with regard to the issue this patch fixes,
in-flight DMAs may lead to undesired behavior, including memory
corruption.

So, should it be fixed too?
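
The concern about in-flight DMA can be illustrated with a small model that lists the intermediate states an observer could see during teardown. It is a simplified sketch, not driver code: struct snapshot and the count_* helpers are hypothetical names, and each snapshot is assumed to be read atomically, which is more generous than what the hardware guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the two 64-bit halves as seen by an observer. */
struct snapshot {
	uint64_t lo;	/* bit 0 models the Present (P) bit */
	uint64_t hi;
};

/* "Torn": P is still set but some other field has already changed. */
static bool snapshot_is_torn(struct snapshot seen, struct snapshot orig)
{
	bool present = seen.lo & 1;

	return present && (seen.lo != orig.lo || seen.hi != orig.hi);
}

static int count_torn(const struct snapshot *steps, int n,
		      struct snapshot orig)
{
	int torn = 0;

	assert(steps && n >= 0);
	for (int i = 0; i < n; i++)
		torn += snapshot_is_torn(steps[i], orig);
	return torn;
}

/* Zero the whole entry with two stores, with hi happening to land first. */
static int count_torn_direct_zero(struct snapshot orig)
{
	const struct snapshot steps[] = {
		{ orig.lo, 0 },		/* hi zeroed, lo still has P set */
		{ 0, 0 },
	};

	return count_torn(steps, 2, orig);
}

/* Clear P first, then zero the remaining fields. */
static int count_torn_present_first(struct snapshot orig)
{
	const struct snapshot steps[] = {
		{ orig.lo & ~1ULL, orig.hi },	/* P cleared, rest intact */
		{ 0, orig.hi },
		{ 0, 0 },
	};

	return count_torn(steps, 3, orig);
}
```

With a present entry whose high half is non-zero, the direct-zero sequence exposes one state with P set and a zeroed high half, while the P-first sequence exposes none.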
Re: [PATCH v2 2/3] iommu/vt-d: Clear Present bit before tearing down context entry
Posted by Baolu Lu 2 weeks, 3 days ago
On 1/21/26 14:23, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Tuesday, January 20, 2026 2:18 PM
>>
>> When tearing down a context entry, the current implementation zeros the
>> entire 128-bit entry using multiple 64-bit writes. This creates a window
>> where the hardware can fetch a "torn" entry — where some fields are
>> already zeroed while the 'Present' bit is still set — leading to
>> unpredictable behavior or spurious faults.
>>
>> While x86 provides strong write ordering, the compiler may reorder writes
>> to the two 64-bit halves of the context entry. Even without compiler
>> reordering, the hardware fetch is not guaranteed to be atomic with
>> respect to multiple CPU writes.
>>
>> Align with the "Guidance to Software for Invalidations" in the VT-d spec
>> (Section 6.5.3.3) by implementing the recommended ownership handshake:
>>
>> 1. Clear only the 'Present' (P) bit of the context entry first to
>>     signal the transition of ownership from hardware to software.
>> 2. Use dma_wmb() to ensure the cleared bit is visible to the IOMMU.
>> 3. Perform the required cache and context-cache invalidation to ensure
>>     hardware no longer has cached references to the entry.
>> 4. Fully zero out the entry only after the invalidation is complete.
>>
>> Also, add a dma_wmb() to context_set_present() to ensure the entry
>> is fully initialized before the 'Present' bit becomes visible.
>>
>> Fixes: ba39592764ed2 ("Intel IOMMU: Intel IOMMU driver")
>> Reported-by: Dmytro Maluka <dmaluka@chromium.org>
>> Closes: https://lore.kernel.org/all/aTG7gc7I5wExai3S@google.com/
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> 
> btw there is a context_clear_entry() for copied context entry in
> device_pasid_table_setup(), but this patch doesn't touch that
> path. It seems to assume that no in-flight DMA will exist at that
> point:
> 
>          if (context_copied(iommu, bus, devfn)) {
>                  context_clear_entry(context);
>                  ...
>                  /*
>                   * At this point, the device is supposed to finish reset at
>                   * its driver probe stage, so no in-flight DMA will exist,
>                   * and we don't need to worry anymore hereafter.
>                   */
>                  clear_context_copied(iommu, bus, devfn);
> 
> Is that guaranteed by all devices? from kdump feature p.o.v. if
> that assumption is broken it just means potential DMA errors
> in this transition window. But regarding to the issue which this
> patch tries to fix, in-fly DMAs may lead to undesired behaviors
> including memory corruption etc.
> 
> So, should it be fixed too?

This path is triggered when the device driver has probed the device
(ensuring it has been reset) and then calls the kernel DMA API for the
first time. At this stage, there should be no in-flight DMAs. We can
apply the same logic here to improve code readability, but this is not a
bug that requires a fix. Or is it?

Thanks,
baolu
RE: [PATCH v2 2/3] iommu/vt-d: Clear Present bit before tearing down context entry
Posted by Tian, Kevin 2 weeks, 3 days ago
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Wednesday, January 21, 2026 3:29 PM
> 
> On 1/21/26 14:23, Tian, Kevin wrote:
> >> From: Lu Baolu <baolu.lu@linux.intel.com>
> >> Sent: Tuesday, January 20, 2026 2:18 PM
> >>
> >> When tearing down a context entry, the current implementation zeros
> the
> >> entire 128-bit entry using multiple 64-bit writes. This creates a window
> >> where the hardware can fetch a "torn" entry — where some fields are
> >> already zeroed while the 'Present' bit is still set — leading to
> >> unpredictable behavior or spurious faults.
> >>
> >> While x86 provides strong write ordering, the compiler may reorder
> writes
> >> to the two 64-bit halves of the context entry. Even without compiler
> >> reordering, the hardware fetch is not guaranteed to be atomic with
> >> respect to multiple CPU writes.
> >>
> >> Align with the "Guidance to Software for Invalidations" in the VT-d spec
> >> (Section 6.5.3.3) by implementing the recommended ownership
> handshake:
> >>
> >> 1. Clear only the 'Present' (P) bit of the context entry first to
> >>     signal the transition of ownership from hardware to software.
> >> 2. Use dma_wmb() to ensure the cleared bit is visible to the IOMMU.
> >> 3. Perform the required cache and context-cache invalidation to ensure
> >>     hardware no longer has cached references to the entry.
> >> 4. Fully zero out the entry only after the invalidation is complete.
> >>
> >> Also, add a dma_wmb() to context_set_present() to ensure the entry
> >> is fully initialized before the 'Present' bit becomes visible.
> >>
> >> Fixes: ba39592764ed2 ("Intel IOMMU: Intel IOMMU driver")
> >> Reported-by: Dmytro Maluka <dmaluka@chromium.org>
> >> Closes: https://lore.kernel.org/all/aTG7gc7I5wExai3S@google.com/
> >> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> >
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> >
> > btw there is a context_clear_entry() for copied context entry in
> > device_pasid_table_setup(), but this patch doesn't touch that
> > path. It seems to assume that no in-flight DMA will exist at that
> > point:
> >
> >          if (context_copied(iommu, bus, devfn)) {
> >                  context_clear_entry(context);
> >                  ...
> >                  /*
> >                   * At this point, the device is supposed to finish reset at
> >                   * its driver probe stage, so no in-flight DMA will exist,
> >                   * and we don't need to worry anymore hereafter.
> >                   */
> >                  clear_context_copied(iommu, bus, devfn);
> >
> > Is that guaranteed by all devices? from kdump feature p.o.v. if
> > that assumption is broken it just means potential DMA errors
> > in this transition window. But regarding to the issue which this
> > patch tries to fix, in-fly DMAs may lead to undesired behaviors
> > including memory corruption etc.
> >
> > So, should it be fixed too?
> 
> This path is triggered when the device driver has probed the device
> (ensuring it has been reset) and then calls the kernel DMA API for the
> first time. At this stage, there should be no in-flight DMAs. We can
> apply the same logic here to improve code readability, but this is not a
> bug that requires a fix. Or not?
> 

A device could be in any state when kdump is triggered, and I'm not
sure that all device drivers reset the device at probe time.

I just thought that applying the same due diligence here could prevent
any undesired damage, just in case. It is not necessarily something to
backport, but it's always good to have consistent logic and avoid
special cases based on subtle assumptions.

Re: [PATCH v2 2/3] iommu/vt-d: Clear Present bit before tearing down context entry
Posted by Baolu Lu 2 weeks, 3 days ago
On 1/21/26 15:50, Tian, Kevin wrote:
>> From: Baolu Lu <baolu.lu@linux.intel.com>
>> Sent: Wednesday, January 21, 2026 3:29 PM
>>
>> On 1/21/26 14:23, Tian, Kevin wrote:
>>>> From: Lu Baolu <baolu.lu@linux.intel.com>
>>>> Sent: Tuesday, January 20, 2026 2:18 PM
>>>>
>>>> When tearing down a context entry, the current implementation zeros
>> the
>>>> entire 128-bit entry using multiple 64-bit writes. This creates a window
>>>> where the hardware can fetch a "torn" entry — where some fields are
>>>> already zeroed while the 'Present' bit is still set — leading to
>>>> unpredictable behavior or spurious faults.
>>>>
>>>> While x86 provides strong write ordering, the compiler may reorder
>> writes
>>>> to the two 64-bit halves of the context entry. Even without compiler
>>>> reordering, the hardware fetch is not guaranteed to be atomic with
>>>> respect to multiple CPU writes.
>>>>
>>>> Align with the "Guidance to Software for Invalidations" in the VT-d spec
>>>> (Section 6.5.3.3) by implementing the recommended ownership
>> handshake:
>>>>
>>>> 1. Clear only the 'Present' (P) bit of the context entry first to
>>>>      signal the transition of ownership from hardware to software.
>>>> 2. Use dma_wmb() to ensure the cleared bit is visible to the IOMMU.
>>>> 3. Perform the required cache and context-cache invalidation to ensure
>>>>      hardware no longer has cached references to the entry.
>>>> 4. Fully zero out the entry only after the invalidation is complete.
>>>>
>>>> Also, add a dma_wmb() to context_set_present() to ensure the entry
>>>> is fully initialized before the 'Present' bit becomes visible.
>>>>
>>>> Fixes: ba39592764ed2 ("Intel IOMMU: Intel IOMMU driver")
>>>> Reported-by: Dmytro Maluka <dmaluka@chromium.org>
>>>> Closes: https://lore.kernel.org/all/aTG7gc7I5wExai3S@google.com/
>>>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>>>
>>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>>>
>>> btw there is a context_clear_entry() for copied context entry in
>>> device_pasid_table_setup(), but this patch doesn't touch that
>>> path. It seems to assume that no in-flight DMA will exist at that
>>> point:
>>>
>>>           if (context_copied(iommu, bus, devfn)) {
>>>                   context_clear_entry(context);
>>>                   ...
>>>                   /*
>>>                    * At this point, the device is supposed to finish reset at
>>>                    * its driver probe stage, so no in-flight DMA will exist,
>>>                    * and we don't need to worry anymore hereafter.
>>>                    */
>>>                   clear_context_copied(iommu, bus, devfn);
>>>
>>> Is that guaranteed by all devices? from kdump feature p.o.v. if
>>> that assumption is broken it just means potential DMA errors
>>> in this transition window. But regarding to the issue which this
>>> patch tries to fix, in-fly DMAs may lead to undesired behaviors
>>> including memory corruption etc.
>>>
>>> So, should it be fixed too?
>>
>> This path is triggered when the device driver has probed the device
>> (ensuring it has been reset) and then calls the kernel DMA API for the
>> first time. At this stage, there should be no in-flight DMAs. We can
>> apply the same logic here to improve code readability, but this is not a
>> bug that requires a fix. Or not?
>>
> 
> device could be in whatever state when kdump is triggered. I'm not
> sure whether all device drivers will reset the device at probe time.

Okay, agreed.

> Just thought that applying the same due diligence here could prevent
> any undesired damage just in case. Not exactly for backporting, but
> it's always good to have consistent logic to avoid special case based
> on subtle assumptions...

So I will add the following additional change:

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 34f4af4e9b5c..b63a71904cfb 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -840,7 +840,7 @@ static int device_pasid_table_setup(struct device *dev, u8 bus, u8 devfn)
         }

         if (context_copied(iommu, bus, devfn)) {
-               context_clear_entry(context);
+               context_clear_present(context);
                 __iommu_flush_cache(iommu, context, sizeof(*context));

                 /*
@@ -860,6 +860,9 @@ static int device_pasid_table_setup(struct device *dev, u8 bus, u8 devfn)
                 iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
                 devtlb_invalidation_with_pasid(iommu, dev, IOMMU_NO_PASID);

+               context_clear_entry(context);
+               __iommu_flush_cache(iommu, context, sizeof(*context));
+
                 /*
                  * At this point, the device is supposed to finish reset at
                  * its driver probe stage, so no in-flight DMA will exist,

Does this look good to you?

Thanks,
baolu
RE: [PATCH v2 2/3] iommu/vt-d: Clear Present bit before tearing down context entry
Posted by Tian, Kevin 2 weeks, 3 days ago
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Wednesday, January 21, 2026 4:04 PM
> 
> On 1/21/26 15:50, Tian, Kevin wrote:
> >> From: Baolu Lu <baolu.lu@linux.intel.com>
> >> Sent: Wednesday, January 21, 2026 3:29 PM
> >>
> >> On 1/21/26 14:23, Tian, Kevin wrote:
> >>>> From: Lu Baolu <baolu.lu@linux.intel.com>
> >>>> Sent: Tuesday, January 20, 2026 2:18 PM
> >>>>
> >>>> When tearing down a context entry, the current implementation zeros
> >> the
> >>>> entire 128-bit entry using multiple 64-bit writes. This creates a window
> >>>> where the hardware can fetch a "torn" entry — where some fields are
> >>>> already zeroed while the 'Present' bit is still set — leading to
> >>>> unpredictable behavior or spurious faults.
> >>>>
> >>>> While x86 provides strong write ordering, the compiler may reorder
> >> writes
> >>>> to the two 64-bit halves of the context entry. Even without compiler
> >>>> reordering, the hardware fetch is not guaranteed to be atomic with
> >>>> respect to multiple CPU writes.
> >>>>
> >>>> Align with the "Guidance to Software for Invalidations" in the VT-d spec
> >>>> (Section 6.5.3.3) by implementing the recommended ownership
> >> handshake:
> >>>>
> >>>> 1. Clear only the 'Present' (P) bit of the context entry first to
> >>>>      signal the transition of ownership from hardware to software.
> >>>> 2. Use dma_wmb() to ensure the cleared bit is visible to the IOMMU.
> >>>> 3. Perform the required cache and context-cache invalidation to ensure
> >>>>      hardware no longer has cached references to the entry.
> >>>> 4. Fully zero out the entry only after the invalidation is complete.
> >>>>
> >>>> Also, add a dma_wmb() to context_set_present() to ensure the entry
> >>>> is fully initialized before the 'Present' bit becomes visible.
> >>>>
> >>>> Fixes: ba39592764ed2 ("Intel IOMMU: Intel IOMMU driver")
> >>>> Reported-by: Dmytro Maluka <dmaluka@chromium.org>
> >>>> Closes: https://lore.kernel.org/all/aTG7gc7I5wExai3S@google.com/
> >>>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> >>>
> >>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> >>>
> >>> btw there is a context_clear_entry() for copied context entry in
> >>> device_pasid_table_setup(), but this patch doesn't touch that
> >>> path. It seems to assume that no in-flight DMA will exist at that
> >>> point:
> >>>
> >>>           if (context_copied(iommu, bus, devfn)) {
> >>>                   context_clear_entry(context);
> >>>                   ...
> >>>                   /*
> >>>                    * At this point, the device is supposed to finish reset at
> >>>                    * its driver probe stage, so no in-flight DMA will exist,
> >>>                    * and we don't need to worry anymore hereafter.
> >>>                    */
> >>>                   clear_context_copied(iommu, bus, devfn);
> >>>
> >>> Is that guaranteed by all devices? from kdump feature p.o.v. if
> >>> that assumption is broken it just means potential DMA errors
> >>> in this transition window. But regarding to the issue which this
> >>> patch tries to fix, in-fly DMAs may lead to undesired behaviors
> >>> including memory corruption etc.
> >>>
> >>> So, should it be fixed too?
> >>
> >> This path is triggered when the device driver has probed the device
> >> (ensuring it has been reset) and then calls the kernel DMA API for the
> >> first time. At this stage, there should be no in-flight DMAs. We can
> >> apply the same logic here to improve code readability, but this is not a
> >> bug that requires a fix. Or not?
> >>
> >
> > device could be in whatever state when kdump is triggered. I'm not
> > sure whether all device drivers will reset the device at probe time.
> 
> Okay, agreed.
> 
> > Just thought that applying the same due diligence here could prevent
> > any undesired damage just in case. Not exactly for backporting, but
> > it's always good to have consistent logic to avoid special case based
> > on subtle assumptions...
> 
> So I will add the following additional change:
> 
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 34f4af4e9b5c..b63a71904cfb 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -840,7 +840,7 @@ static int device_pasid_table_setup(struct device *dev, u8 bus, u8 devfn)
>          }
> 
>          if (context_copied(iommu, bus, devfn)) {
> -               context_clear_entry(context);
> +               context_clear_present(context);
>                  __iommu_flush_cache(iommu, context, sizeof(*context));
> 
>                  /*
> @@ -860,6 +860,9 @@ static int device_pasid_table_setup(struct device *dev, u8 bus, u8 devfn)
>                  iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
>                  devtlb_invalidation_with_pasid(iommu, dev, IOMMU_NO_PASID);
> 
> +               context_clear_entry(context);
> +               __iommu_flush_cache(iommu, context, sizeof(*context));
> +
>                  /*
>                   * At this point, the device is supposed to finish reset at
>                   * its driver probe stage, so no in-flight DMA will exist,
> 
> Appears good to you?
> 

Yes.
Re: [PATCH v2 2/3] iommu/vt-d: Clear Present bit before tearing down context entry
Posted by Samiullah Khawaja 2 weeks, 4 days ago
On Mon, Jan 19, 2026 at 10:20 PM Lu Baolu <baolu.lu@linux.intel.com> wrote:
>
> When tearing down a context entry, the current implementation zeros the
> entire 128-bit entry using multiple 64-bit writes. This creates a window
> where the hardware can fetch a "torn" entry — where some fields are
> already zeroed while the 'Present' bit is still set — leading to
> unpredictable behavior or spurious faults.
>
> While x86 provides strong write ordering, the compiler may reorder writes
> to the two 64-bit halves of the context entry. Even without compiler
> reordering, the hardware fetch is not guaranteed to be atomic with
> respect to multiple CPU writes.
>
> Align with the "Guidance to Software for Invalidations" in the VT-d spec
> (Section 6.5.3.3) by implementing the recommended ownership handshake:
>
> 1. Clear only the 'Present' (P) bit of the context entry first to
>    signal the transition of ownership from hardware to software.
> 2. Use dma_wmb() to ensure the cleared bit is visible to the IOMMU.
> 3. Perform the required cache and context-cache invalidation to ensure
>    hardware no longer has cached references to the entry.
> 4. Fully zero out the entry only after the invalidation is complete.
>
> Also, add a dma_wmb() to context_set_present() to ensure the entry
> is fully initialized before the 'Present' bit becomes visible.
>
> Fixes: ba39592764ed2 ("Intel IOMMU: Intel IOMMU driver")
> Reported-by: Dmytro Maluka <dmaluka@chromium.org>
> Closes: https://lore.kernel.org/all/aTG7gc7I5wExai3S@google.com/
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel/iommu.h | 21 ++++++++++++++++++++-
>  drivers/iommu/intel/iommu.c |  4 +++-
>  2 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 25c5e22096d4..599913fb65d5 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -900,7 +900,26 @@ static inline int pfn_level_offset(u64 pfn, int level)
>
>  static inline void context_set_present(struct context_entry *context)
>  {
> -       context->lo |= 1;
> +       u64 val;
> +
> +       dma_wmb();
> +       val = READ_ONCE(context->lo) | 1;
> +       WRITE_ONCE(context->lo, val);
> +}
> +
> +/*
> + * Clear the Present (P) bit (bit 0) of a context table entry. This initiates
> + * the transition of the entry's ownership from hardware to software. The
> + * caller is responsible for fulfilling the invalidation handshake recommended
> + * by the VT-d spec, Section 6.5.3.3 (Guidance to Software for Invalidations).
> + */
> +static inline void context_clear_present(struct context_entry *context)
> +{
> +       u64 val;
> +
> +       val = READ_ONCE(context->lo) & GENMASK_ULL(63, 1);
> +       WRITE_ONCE(context->lo, val);
> +       dma_wmb();
>  }
>
>  static inline void context_set_fault_enable(struct context_entry *context)
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 134302fbcd92..c66cc51f9e51 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1240,10 +1240,12 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
>         }
>
>         did = context_domain_id(context);
> -       context_clear_entry(context);
> +       context_clear_present(context);
>         __iommu_flush_cache(iommu, context, sizeof(*context));
>         spin_unlock(&iommu->lock);
>         intel_context_flush_no_pasid(info, context, did);
> +       context_clear_entry(context);
> +       __iommu_flush_cache(iommu, context, sizeof(*context));
>  }
>
>  int __domain_setup_first_level(struct intel_iommu *iommu, struct device *dev,
> --
> 2.43.0
>

Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Re: [PATCH v2 2/3] iommu/vt-d: Clear Present bit before tearing down context entry
Posted by Dmytro Maluka 2 weeks, 4 days ago
On Tue, Jan 20, 2026 at 02:18:13PM +0800, Lu Baolu wrote:
> When tearing down a context entry, the current implementation zeros the
> entire 128-bit entry using multiple 64-bit writes. This creates a window
> where the hardware can fetch a "torn" entry — where some fields are
> already zeroed while the 'Present' bit is still set — leading to
> unpredictable behavior or spurious faults.
> 
> While x86 provides strong write ordering, the compiler may reorder writes
> to the two 64-bit halves of the context entry. Even without compiler
> reordering, the hardware fetch is not guaranteed to be atomic with
> respect to multiple CPU writes.
> 
> Align with the "Guidance to Software for Invalidations" in the VT-d spec
> (Section 6.5.3.3) by implementing the recommended ownership handshake:
> 
> 1. Clear only the 'Present' (P) bit of the context entry first to
>    signal the transition of ownership from hardware to software.
> 2. Use dma_wmb() to ensure the cleared bit is visible to the IOMMU.
> 3. Perform the required cache and context-cache invalidation to ensure
>    hardware no longer has cached references to the entry.
> 4. Fully zero out the entry only after the invalidation is complete.
> 
> Also, add a dma_wmb() to context_set_present() to ensure the entry
> is fully initialized before the 'Present' bit becomes visible.
> 
> Fixes: ba39592764ed2 ("Intel IOMMU: Intel IOMMU driver")
> Reported-by: Dmytro Maluka <dmaluka@chromium.org>
> Closes: https://lore.kernel.org/all/aTG7gc7I5wExai3S@google.com/
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel/iommu.h | 21 ++++++++++++++++++++-
>  drivers/iommu/intel/iommu.c |  4 +++-
>  2 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 25c5e22096d4..599913fb65d5 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -900,7 +900,26 @@ static inline int pfn_level_offset(u64 pfn, int level)
>  
>  static inline void context_set_present(struct context_entry *context)
>  {
> -	context->lo |= 1;
> +	u64 val;
> +
> +	dma_wmb();
> +	val = READ_ONCE(context->lo) | 1;

IIRC Jason noted that READ_ONCE() is not really necessary here?

> +	WRITE_ONCE(context->lo, val);
> +}
> +
> +/*
> + * Clear the Present (P) bit (bit 0) of a context table entry. This initiates
> + * the transition of the entry's ownership from hardware to software. The
> + * caller is responsible for fulfilling the invalidation handshake recommended
> + * by the VT-d spec, Section 6.5.3.3 (Guidance to Software for Invalidations).
> + */
> +static inline void context_clear_present(struct context_entry *context)
> +{
> +	u64 val;
> +
> +	val = READ_ONCE(context->lo) & GENMASK_ULL(63, 1);

Maybe "& ~1ULL" would be a bit more readable? (And READ_ONCE() is not
necessary here either?)

Anyway,

Reviewed-by: Dmytro Maluka <dmaluka@chromium.org>

> +	WRITE_ONCE(context->lo, val);
> +	dma_wmb();
>  }
>  
>  static inline void context_set_fault_enable(struct context_entry *context)
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 134302fbcd92..c66cc51f9e51 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1240,10 +1240,12 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
>  	}
>  
>  	did = context_domain_id(context);
> -	context_clear_entry(context);
> +	context_clear_present(context);
>  	__iommu_flush_cache(iommu, context, sizeof(*context));
>  	spin_unlock(&iommu->lock);
>  	intel_context_flush_no_pasid(info, context, did);
> +	context_clear_entry(context);
> +	__iommu_flush_cache(iommu, context, sizeof(*context));
>  }
>  
>  int __domain_setup_first_level(struct intel_iommu *iommu, struct device *dev,
> -- 
> 2.43.0
>
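
The readability suggestion above relies on "& ~1ULL" and "& GENMASK_ULL(63, 1)" producing the same mask. A quick user-space check can confirm this. The sketch below assumes a local MODEL_GENMASK_ULL macro modelled on the usual 64-bit shift-and-mask definition, not the kernel header itself, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: user-space reimplementation of a 64-bit GENMASK-style macro,
 * setting bits l..h inclusive. It is not taken from the kernel headers.
 */
#define MODEL_GENMASK_ULL(h, l) \
	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

/* Clear bit 0 using the mask form from the patch. */
static inline uint64_t clear_p_genmask(uint64_t lo)
{
	return lo & MODEL_GENMASK_ULL(63, 1);
}

/* Clear bit 0 using the form suggested in review. */
static inline uint64_t clear_p_not(uint64_t lo)
{
	return lo & ~1ULL;
}

/* Returns 1 if both forms agree on every sample value, 0 otherwise. */
static int masks_agree(void)
{
	static const uint64_t samples[] = {
		0, 1, 2, 3, 0x1001, UINT64_MAX, UINT64_MAX - 1,
		0x8000000000000000ULL,
	};
	unsigned int i;

	/* The two masks themselves must be bit-for-bit identical. */
	assert(MODEL_GENMASK_ULL(63, 1) == ~1ULL);

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		if (clear_p_genmask(samples[i]) != clear_p_not(samples[i]))
			return 0;
	}
	return 1;
}
```

Since both expressions evaluate to the same constant, the choice between them is purely about readability.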