Queued invalidation wait descriptor status is volatile in that IOMMU hardware
writes the data upon completion.
Use READ_ONCE() to prevent compiler optimizations, which ensures that memory
is read every time. As a side effect, READ_ONCE() also enforces strict types
and may add an extra instruction. But it should not have a negative
performance impact since we use cpu_relax() anyway, and the extra time (by
adding an instruction) may make it easier for IOMMU HW to request cacheline
ownership.
e.g. gcc 12.3
BEFORE:
81 38 ad de 00 00 cmpl $0x2,(%rax)
AFTER (with READ_ONCE())
772f: 8b 00 mov (%rax),%eax
7731: 3d ad de 00 00 cmp $0x2,%eax //status data is 32 bit
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
drivers/iommu/intel/dmar.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 304e84949ca7..1c8d3141cb55 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1446,7 +1446,7 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
 	 */
 	writel(qi->free_head << shift, iommu->reg + DMAR_IQT_REG);
 
-	while (qi->desc_status[wait_index] != QI_DONE) {
+	while (READ_ONCE(qi->desc_status[wait_index]) != QI_DONE) {
 		/*
 		 * We will leave the interrupts disabled, to prevent interrupt
 		 * context to queue another cmd while a cmd is already submitted
--
2.25.1
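
For readers outside the kernel tree, the effect of the one-line change can be sketched in user space. This is only an illustration under stated assumptions: READ_ONCE_SKETCH is a simplified stand-in for the kernel's READ_ONCE() (the real definition is more elaborate), SKETCH_QI_DONE mirrors the value 2 shown in the disassembly above, and poll_status() is a hypothetical helper that is not part of the patch. The macro relies on the GCC/Clang __typeof__ extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-in for the kernel's READ_ONCE(): going through a
 * volatile-qualified pointer obliges the compiler to emit a real load
 * each time the expression is evaluated, so the load cannot be hoisted
 * out of a loop or merged with an earlier one.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical mirror of QI_DONE; the value 2 is taken from the disassembly. */
enum { SKETCH_QI_DONE = 0x2 };

/*
 * Poll status[idx] until it equals SKETCH_QI_DONE or max_spins loads
 * have been issued. Returns the number of loads performed, or -1 if
 * the status never reached SKETCH_QI_DONE within the budget.
 */
long poll_status(uint32_t *status, size_t idx, long max_spins)
{
	long spins;

	assert(status != NULL);

	for (spins = 1; spins <= max_spins; spins++) {
		/* One explicit 32-bit load per iteration. */
		if (READ_ONCE_SKETCH(status[idx]) == SKETCH_QI_DONE)
			return spins;
	}
	return -1;
}
```

Unlike the loop in qi_submit_sync(), the sketch is bounded by max_spins so that it terminates when nothing writes the status; that bound is an addition for the example only.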
On 2024/6/8 01:38, Jacob Pan wrote:
> Queued invalidation wait descriptor status is volatile in that IOMMU hardware
> writes the data upon completion.
>
> Use READ_ONCE() to prevent compiler optimizations which ensures memory
> reads every time. As a side effect, READ_ONCE() also enforces strict types and
> may add an extra instruction. But it should not have negative
> performance impact since we use cpu_relax anyway and the extra time(by
> adding an instruction) may allow IOMMU HW request cacheline ownership easier.
>
> e.g. gcc 12.3
> BEFORE:
> 81 38 ad de 00 00 cmpl $0x2,(%rax)
>
> AFTER (with READ_ONCE())
> 772f: 8b 00 mov (%rax),%eax
> 7731: 3d ad de 00 00 cmp $0x2,%eax //status data is 32 bit
>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> drivers/iommu/intel/dmar.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 304e84949ca7..1c8d3141cb55 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1446,7 +1446,7 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
>  	 */
>  	writel(qi->free_head << shift, iommu->reg + DMAR_IQT_REG);
>  
> -	while (qi->desc_status[wait_index] != QI_DONE) {
> +	while (READ_ONCE(qi->desc_status[wait_index]) != QI_DONE) {
>  		/*
>  		 * We will leave the interrupts disabled, to prevent interrupt
>  		 * context to queue another cmd while a cmd is already submitted
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
--
Regards,
Yi Liu
> From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Sent: Saturday, June 8, 2024 1:38 AM
>
> Queued invalidation wait descriptor status is volatile in that IOMMU hardware
> writes the data upon completion.
>
> Use READ_ONCE() to prevent compiler optimizations which ensures memory
> reads every time. As a side effect, READ_ONCE() also enforces strict types and
> may add an extra instruction. But it should not have negative
> performance impact since we use cpu_relax anyway and the extra time(by
> adding an instruction) may allow IOMMU HW request cacheline ownership easier.

I didn't get the meaning of the last sentence.

>
> e.g. gcc 12.3
> BEFORE:
> 81 38 ad de 00 00 cmpl $0x2,(%rax)
>
> AFTER (with READ_ONCE())
> 772f: 8b 00 mov (%rax),%eax
> 7731: 3d ad de 00 00 cmp $0x2,%eax //status data is 32 bit
>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>

Do we need a fix tag here?

otherwise looks good to me:

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
On Mon, 17 Jun 2024 03:04:36 +0000, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Sent: Saturday, June 8, 2024 1:38 AM
> >
> > Queued invalidation wait descriptor status is volatile in that IOMMU hardware
> > writes the data upon completion.
> >
> > Use READ_ONCE() to prevent compiler optimizations which ensures memory
> > reads every time. As a side effect, READ_ONCE() also enforces strict types and
> > may add an extra instruction. But it should not have negative
> > performance impact since we use cpu_relax anyway and the extra time(by
> > adding an instruction) may allow IOMMU HW request cacheline ownership easier.
>
> I didn't get the meaning of the last sentence.

The wait descriptor is polled by the CPU and written by the IOMMU
concurrently. The IOMMU needs to have the cacheline ownership before
writing the status data to signal completion of the wait descriptor. If the
CPU polling loop is very tight, it might make IOMMU request for ownership
contentious/difficult. Since we already use pause (cpu_relax()) to ease the
contention, adding an additional instruction
	mov (%rax),%eax
will make the cacheline even less contentious since it is just register
mov, no memory access.

> >
> > e.g. gcc 12.3
> > BEFORE:
> > 81 38 ad de 00 00 cmpl $0x2,(%rax)
> >
> > AFTER (with READ_ONCE())
> > 772f: 8b 00 mov (%rax),%eax
> > 7731: 3d ad de 00 00 cmp $0x2,%eax //status data is 32 bit
> >
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>
> Do we need a fix tag here?

I cannot find the exact commit, this is really old code.

> otherwise looks good to me:
>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Thanks,

Jacob
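
The shape of the wait loop discussed above (load the status, compare, pause, repeat) can be sketched in portable C11. Everything here is hypothetical and for illustration only: relax_sketch() stands in for cpu_relax(), struct fake_dev and fake_dev_tick() simulate the IOMMU posting completion after a fixed number of polls, and C11 atomics are used as a user-space analogue of the kernel's READ_ONCE() read. Real hardware writes the status concurrently, so this single-threaded sketch shows only the loop structure and says nothing about cacheline ownership or timing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DONE 2u	/* hypothetical mirror of QI_DONE */

/* Hypothetical stand-in for cpu_relax(): PAUSE on x86, otherwise a no-op. */
static inline void relax_sketch(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#endif
}

/* Simulated device that posts completion once it has seen `delay` polls. */
struct fake_dev {
	_Atomic uint32_t status;
	unsigned int delay;
	unsigned int polls;
};

/* Called once per loop iteration to stand in for concurrent hardware. */
static void fake_dev_tick(struct fake_dev *d)
{
	if (++d->polls >= d->delay)
		atomic_store_explicit(&d->status, SKETCH_DONE,
				      memory_order_release);
}

/* Spin until the status reads SKETCH_DONE; returns the number of pauses. */
unsigned int wait_done(struct fake_dev *d)
{
	unsigned int spins = 0;

	assert(d != NULL);

	/* Each iteration issues exactly one load of the status word. */
	while (atomic_load_explicit(&d->status, memory_order_acquire)
	       != SKETCH_DONE) {
		fake_dev_tick(d);	/* hardware would do this on its own */
		relax_sketch();		/* back off between loads */
		spins++;
	}
	return spins;
}
```

With delay set to 3, the loop pauses three times before the fourth load observes completion.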