[PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait

David Woodhouse posted 1 patch 5 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/0122cbabc0adcc3cf878f5fd7834d8f258c7a2f2.camel@infradead.org
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, Yi Liu <yi.l.liu@intel.com>, "Clément Mathieu--Drif" <clement.mathieu--drif@eviden.com>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
hw/i386/intel_iommu.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
[PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
Posted by David Woodhouse 5 months ago
From: David Woodhouse <dwmw@amazon.co.uk>

FreeBSD does both, and this appears to be perfectly valid. The VT-d
spec even talks about the ordering (the status write should be done
first, unsurprisingly).

We certainly shouldn't assert() and abort QEMU if the guest asks for
both.

Fixes: ed7b8fbcfb88 ("intel-iommu: add supports for queued invalidation interface")
Closes: https://gitlab.com/qemu-project/qemu/-/issues/3028
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
v2:
 • Only generate the interrupt once.
 • Spaces around bitwise OR.

This stops QEMU crashing, but I still can't get FreeBSD to boot and use
CPUs with APIC ID > 255 using *either* Intel or AMD IOMMU with
interrupt remapping, or the native 15-bit APIC ID enlightenment.
cf. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=288122


 hw/i386/intel_iommu.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 69d72ad35c..851c4656c5 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2822,6 +2822,7 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
 {
     uint64_t mask[4] = {VTD_INV_DESC_WAIT_RSVD_LO, VTD_INV_DESC_WAIT_RSVD_HI,
                         VTD_INV_DESC_ALL_ONE, VTD_INV_DESC_ALL_ONE};
+    bool ret = true;
 
     if (!vtd_inv_desc_reserved_check(s, inv_desc, mask, false,
                                      __func__, "wait")) {
@@ -2833,8 +2834,6 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
         uint32_t status_data = (uint32_t)(inv_desc->lo >>
                                VTD_INV_DESC_WAIT_DATA_SHIFT);
 
-        assert(!(inv_desc->lo & VTD_INV_DESC_WAIT_IF));
-
         /* FIXME: need to be masked with HAW? */
         dma_addr_t status_addr = inv_desc->hi;
         trace_vtd_inv_desc_wait_sw(status_addr, status_data);
@@ -2843,18 +2842,22 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
                              &status_data, sizeof(status_data),
                              MEMTXATTRS_UNSPECIFIED)) {
             trace_vtd_inv_desc_wait_write_fail(inv_desc->hi, inv_desc->lo);
-            return false;
+            ret = false;
         }
-    } else if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
+    }
+
+    if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
         /* Interrupt flag */
         vtd_generate_completion_event(s);
-    } else {
+    }
+
+    if (!(inv_desc->lo & (VTD_INV_DESC_WAIT_IF | VTD_INV_DESC_WAIT_SW))) {
         error_report_once("%s: invalid wait desc: hi=%"PRIx64", lo=%"PRIx64
                           " (unknown type)", __func__, inv_desc->hi,
                           inv_desc->lo);
         return false;
     }
-    return true;
+    return ret;
 }
 
 static bool vtd_process_context_cache_desc(IntelIOMMUState *s,
-- 
2.43.0


Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
Posted by Michael Tokarev 4 months, 1 week ago
On 14.07.2025 11:00, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> FreeBSD does both, and this appears to be perfectly valid. The VT-d
> spec even talks about the ordering (the status write should be done
> first, unsurprisingly).
> 
> We certainly shouldn't assert() and abort QEMU if the guest asks for
> both.
> 
> Fixes: ed7b8fbcfb88 ("intel-iommu: add supports for queued invalidation interface")
> Closes: https://gitlab.com/qemu-project/qemu/-/issues/3028
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> v2:
>   • Only generate the interrupt once.
>   • Spaces around bitwise OR.
> 
> This stops QEMU crashing, but I still can't get FreeBSD to boot and use
> CPUs with APIC ID > 255 using *either* Intel or AMD IOMMU with
> interrupt remapping, or the native 15-bit APIC ID enlightenment.
> cf. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=288122
> 
> 
>   hw/i386/intel_iommu.c | 15 +++++++++------
>   1 file changed, 9 insertions(+), 6 deletions(-)

This looks like a qemu-stable material (for 10.0).
Please let me know if it is not.

Thanks,

/mjt

Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
Posted by Yi Liu 5 months ago
Hi David,

On 2025/7/14 16:00, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> FreeBSD does both, and this appears to be perfectly valid. The VT-d
> spec even talks about the ordering (the status write should be done
> first, unsurprisingly).

interesting. Have you tried setting both flags on baremetal and the hw
gives you both the status code and an interrupt?

> We certainly shouldn't assert() and abort QEMU if the guest asks for
> both.
> 
> Fixes: ed7b8fbcfb88 ("intel-iommu: add supports for queued invalidation interface")
> Closes: https://gitlab.com/qemu-project/qemu/-/issues/3028
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> v2:
>   • Only generate the interrupt once.
>   • Spaces around bitwise OR.
> 
> This stops QEMU crashing, but I still can't get FreeBSD to boot and use
> CPUs with APIC ID > 255 using *either* Intel or AMD IOMMU with
> interrupt remapping, or the native 15-bit APIC ID enlightenment.
> cf. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=288122
> 
> 
>   hw/i386/intel_iommu.c | 15 +++++++++------
>   1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 69d72ad35c..851c4656c5 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2822,6 +2822,7 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
>   {
>       uint64_t mask[4] = {VTD_INV_DESC_WAIT_RSVD_LO, VTD_INV_DESC_WAIT_RSVD_HI,
>                           VTD_INV_DESC_ALL_ONE, VTD_INV_DESC_ALL_ONE};
> +    bool ret = true;
>   
>       if (!vtd_inv_desc_reserved_check(s, inv_desc, mask, false,
>                                        __func__, "wait")) {
> @@ -2833,8 +2834,6 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
>           uint32_t status_data = (uint32_t)(inv_desc->lo >>
>                                  VTD_INV_DESC_WAIT_DATA_SHIFT);
>   
> -        assert(!(inv_desc->lo & VTD_INV_DESC_WAIT_IF));
> -
>           /* FIXME: need to be masked with HAW? */
>           dma_addr_t status_addr = inv_desc->hi;
>           trace_vtd_inv_desc_wait_sw(status_addr, status_data);
> @@ -2843,18 +2842,22 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
>                                &status_data, sizeof(status_data),
>                                MEMTXATTRS_UNSPECIFIED)) {
>               trace_vtd_inv_desc_wait_write_fail(inv_desc->hi, inv_desc->lo);
> -            return false;
> +            ret = false;
>           }
> -    } else if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
> +    }
> +
> +    if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
>           /* Interrupt flag */
>           vtd_generate_completion_event(s);
> -    } else {
> +    }
> +
> +    if (!(inv_desc->lo & (VTD_INV_DESC_WAIT_IF | VTD_INV_DESC_WAIT_SW))) {
>           error_report_once("%s: invalid wait desc: hi=%"PRIx64", lo=%"PRIx64
>                             " (unknown type)", __func__, inv_desc->hi,
>                             inv_desc->lo);
>           return false;
>       }

I think this "if branch" can be moved just after the inv_desc non-zero
reserved bit checking. Hence you don't need a ret at all. :) btw. I'm
also asking if VT-d spec allows it or not. So let's wait for a while..

> -    return true;
> +    return ret;
>   }
>   
>   static bool vtd_process_context_cache_desc(IntelIOMMUState *s,

-- 
Regards,
Yi Liu

Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
Posted by David Woodhouse 5 months ago
On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> wrote:
>Hi David,
>
>On 2025/7/14 16:00, David Woodhouse wrote:
>> From: David Woodhouse <dwmw@amazon.co.uk>
>> 
>> FreeBSD does both, and this appears to be perfectly valid. The VT-d
>> spec even talks about the ordering (the status write should be done
>> first, unsurprisingly).
>
>interesting. Have you tried setting both flags on baremetal and the hw
>gives you both the status code and an interrupt?

I see no reason why it shouldn't. The spec (§6.5.2.8) gives no that the IF and SW bits should be mutually exclusive and even talks about ordering:

Section 6.5.2.11 describes queued invalidation ordering considerations. Hardware completes an 
invalidation wait command as follows:
• If a status write is specified in the wait descriptor (SW=1), hardware performs a coherent write of 
the status data to the status address.
• If an interrupt is requested in the wait descriptor (IF=1), hardware sets the IWC field in the 
Invalidation Completion Status Register. An invalidation completion interrupt may be generated as 
described in the following section



>I think this "if branch" can be moved just after the inv_desc non-zero
>reserved bit checking. Hence you don't need a ret at all. :)

We want to return false if the memory write fails, and the interrupt has to happen afterwards.

> btw. I'm
>also asking if VT-d spec allows it or not. So let's wait for a while..

Ok.
Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
Posted by Konstantin Belousov via 5 months ago
On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> wrote:
> >Hi David,
> >
> >On 2025/7/14 16:00, David Woodhouse wrote:
> >> From: David Woodhouse <dwmw@amazon.co.uk>
> >> 
> >> FreeBSD does both, and this appears to be perfectly valid. The VT-d
> >> spec even talks about the ordering (the status write should be done
> >> first, unsurprisingly).
> >
> >interesting. Have you tried setting both flags on baremetal and the hw
> >gives you both the status code and an interrupt?
> 
> I see no reason why it shouldn't. The spec (§6.5.2.8) gives no that the IF and SW bits should be mutually exclusive and even talks about ordering:
> 
> Section 6.5.2.11 describes queued invalidation ordering considerations. Hardware completes an 
> invalidation wait command as follows:
> • If a status write is specified in the wait descriptor (SW=1), hardware performs a coherent write of 
> the status data to the status address.
> • If an interrupt is requested in the wait descriptor (IF=1), hardware sets the IWC field in the 
> Invalidation Completion Status Register. An invalidation completion interrupt may be generated as 
> described in the following section
> 

Yes, and the FreeBSD DMAR code uses that, and relies on that, as was
mentioned earlier in the mail thread.

> 
> 
> >I think this "if branch" can be moved just after the inv_desc non-zero
> >reserved bit checking. Hence you don't need a ret at all. :)
> 
> We want to return false if the memory write fails, and the interrupt has to happen afterwards.
> 
> > btw. I'm
> >also asking if VT-d spec allows it or not. So let's wait for a while..
> 
> Ok.
> 
> 

Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
Posted by CLEMENT MATHIEU--DRIF 5 months ago

On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
> Caution: External email. Do not open attachments or click links, unless this email comes from a known sender and you know the content is safe.
> 
> 
> On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
>> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> wrote:
>>> Hi David,
>>>
>>> On 2025/7/14 16:00, David Woodhouse wrote:
>>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>>
>>>> FreeBSD does both, and this appears to be perfectly valid. The VT-d
>>>> spec even talks about the ordering (the status write should be done
>>>> first, unsurprisingly).

Are you talking about the ordering constraint mentioned in bullet 
"Page-request Drain (PD)"?

>>>
>>> interesting. Have you tried setting both flags on baremetal and the hw
>>> gives you both the status code and an interrupt?
>>
>> I see no reason why it shouldn't. The spec (§6.5.2.8) gives no that the IF and SW bits should be mutually exclusive and even talks about ordering:
>>
>> Section 6.5.2.11 describes queued invalidation ordering considerations. Hardware completes an
>> invalidation wait command as follows:
>> • If a status write is specified in the wait descriptor (SW=1), hardware performs a coherent write of
>> the status data to the status address.
>> • If an interrupt is requested in the wait descriptor (IF=1), hardware sets the IWC field in the
>> Invalidation Completion Status Register. An invalidation completion interrupt may be generated as
>> described in the following section
>>
> 
> Yes, and the FreeBSD DMAR code uses that, and relies on that, as was
> mentioned earlier in the mail thread.
> 
>>
>>
>>> I think this "if branch" can be moved just after the inv_desc non-zero
>>> reserved bit checking. Hence you don't need a ret at all. :)
>>
>> We want to return false if the memory write fails, and the interrupt has to happen afterwards.

Per spec: "Hardware behavior is undefined if the Status Address 
specified is not an address route-able to memory"

Do we want to trigger the interrupt even when the DMA fails?

>>
>>> btw. I'm
>>> also asking if VT-d spec allows it or not. So let's wait for a while..

Thanks Yi

>>
>> Ok.
>>
>>
Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
Posted by Yi Liu 5 months ago
On 2025/7/15 14:11, CLEMENT MATHIEU--DRIF wrote:
> 
> 
> On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
>> Caution: External email. Do not open attachments or click links, unless this email comes from a known sender and you know the content is safe.
>>
>>
>> On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
>>> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> wrote:
>>>> Hi David,
>>>>
>>>> On 2025/7/14 16:00, David Woodhouse wrote:
>>>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>>>
>>>>> FreeBSD does both, and this appears to be perfectly valid. The VT-d
>>>>> spec even talks about the ordering (the status write should be done
>>>>> first, unsurprisingly).
> 
> Are you talking about the ordering constraint mentioned in bullet
> "Page-request Drain (PD)"?

David is talking about the IF and SW flags. And he is correct. Spec has 
below sentence. It means a wait descriptor can have both IF and SW set
and indeed completion interrupt happens later than status write.  Let's
go on refine the patch. :)

"The invalidation completion event interrupt must push any in-flight 
invalidation completion status
writes, including status writes that may have originated from the same 
inv_wait_dsc for which the interrupt was generated."


Regards,
Yi Liu
Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
Posted by David Woodhouse 4 months, 3 weeks ago
On Tue, 2025-07-15 at 20:35 +0800, Yi Liu wrote:
> 
> David is talking about the IF and SW flags. And he is correct. Spec has 
> below sentence. It means a wait descriptor can have both IF and SW set
> and indeed completion interrupt happens later than status write.  Let's
> go on refine the patch. :)

Are there any refinements you believe are necessary? 

https://gitlab.com/dwmw2/qemu/-/pipelines/1941324674 looks like it's
going to be clean, and I think we resolved everything we talked about.
RE: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
Posted by Liu, Yi L 4 months, 2 weeks ago
> From: David Woodhouse <dwmw2@infradead.org>
> Sent: Tuesday, July 22, 2025 8:04 PM
> 
> On Tue, 2025-07-15 at 20:35 +0800, Yi Liu wrote:
> >
> > David is talking about the IF and SW flags. And he is correct. Spec has
> > below sentence. It means a wait descriptor can have both IF and SW set
> > and indeed completion interrupt happens later than status write.  Let's
> > go on refine the patch. :)
> 
> Are there any refinements you believe are necessary?

No extra change I suppose. I was just waiting for a respond from VT-d arch. I've now
got a clear answer. It's ok to generate the interrupt no matter the status write failed
or not if IF is set. Sorry for the delayed respond.

> https://gitlab.com/dwmw2/qemu/-/pipelines/1941324674 looks like it's
> going to be clean, and I think we resolved everything we talked about.

Reviewed-by: Yi Liu <yi.l.liu@intel.com>

Thanks,
Yi Liu
Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
Posted by CLEMENT MATHIEU--DRIF 5 months ago

On 15/07/2025 2:35 pm, Yi Liu wrote:
> Caution: External email. Do not open attachments or click links, unless 
> this email comes from a known sender and you know the content is safe.
> 
> 
> On 2025/7/15 14:11, CLEMENT MATHIEU--DRIF wrote:
>>
>>
>> On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
>>> Caution: External email. Do not open attachments or click links, 
>>> unless this email comes from a known sender and you know the content 
>>> is safe.
>>>
>>>
>>> On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
>>>> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> wrote:
>>>>> Hi David,
>>>>>
>>>>> On 2025/7/14 16:00, David Woodhouse wrote:
>>>>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>>>>
>>>>>> FreeBSD does both, and this appears to be perfectly valid. The VT-d
>>>>>> spec even talks about the ordering (the status write should be done
>>>>>> first, unsurprisingly).
>>
>> Are you talking about the ordering constraint mentioned in bullet
>> "Page-request Drain (PD)"?
> 
> David is talking about the IF and SW flags. And he is correct. Spec has
> below sentence. It means a wait descriptor can have both IF and SW set
> and indeed completion interrupt happens later than status write.  Let's
> go on refine the patch. :)
> 
> "The invalidation completion event interrupt must push any in-flight
> invalidation completion status
> writes, including status writes that may have originated from the same
> inv_wait_dsc for which the interrupt was generated."

Yep, I agree

> 
> 
> Regards,
> Yi Liu
Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
Posted by David Woodhouse 5 months ago
On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote:
> 
> 
> On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
> > Caution: External email. Do not open attachments or click links,
> > unless this email comes from a known sender and you know the
> > content is safe.
> > 
> > 
> > On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
> > > On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com>
> > > wrote:
> > > > Hi David,
> > > > 
> > > > On 2025/7/14 16:00, David Woodhouse wrote:
> > > > > From: David Woodhouse <dwmw@amazon.co.uk>
> > > > > 
> > > > > FreeBSD does both, and this appears to be perfectly valid. The VT-d
> > > > > spec even talks about the ordering (the status write should be done
> > > > > first, unsurprisingly).
> 
> Are you talking about the ordering constraint mentioned in bullet 
> "Page-request Drain (PD)"?
> 

No, in the v4.0 spec it's just below that bullet list, at the bottom of
§6.5.2.8.

The wording in §6.5.2.9 is even clearer:

"The invalidation completion event interrupt must push any in-flight
invalidation completion status writes, including status writes that may
have originated from the same inv_wait_dsc for which the interrupt was
generated."

> > > > I think this "if branch" can be moved just after the inv_desc non-zero
> > > > reserved bit checking. Hence you don't need a ret at all. :)
> > > 
> > > We want to return false if the memory write fails, and the
> > > interrupt has to happen afterwards.
> 
> Per spec: "Hardware behavior is undefined if the Status Address 
> specified is not an address route-able to memory"
> 
> Do we want to trigger the interrupt even when the DMA fails?

Yes, we do. That's a quality of implementation issue. Just because the
behaviour is 'undefined' and theoretically gives us permission to do
whatever we like to the guest, we should still be as sensible as
possible.


> > > 
> > > > btw. I'm
> > > > also asking if VT-d spec allows it or not. So let's wait for a
> > > > while..

Rapidly losing interest in this conversation. QEMU currently has a
guest-triggerable crash, for crying out loud. I'd like to fix it and
move on to finding out why FreeBSD doesn't work *even* when QEMU
doesn't abort...
Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
Posted by CLEMENT MATHIEU--DRIF 5 months ago

On 15/07/2025 10:27 am, David Woodhouse wrote:
> On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote:
>>
>>
>> On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
>>> Caution: External email. Do not open attachments or click links,
>>> unless this email comes from a known sender and you know the
>>> content is safe.
>>>
>>>
>>> On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
>>>> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com>
>>>> wrote:
>>>>> Hi David,
>>>>>
>>>>> On 2025/7/14 16:00, David Woodhouse wrote:
>>>>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>>>>
>>>>>> FreeBSD does both, and this appears to be perfectly valid. The VT-d
>>>>>> spec even talks about the ordering (the status write should be done
>>>>>> first, unsurprisingly).
>>
>> Are you talking about the ordering constraint mentioned in bullet
>> "Page-request Drain (PD)"?
>>
> 
> No, in the v4.0 spec it's just below that bullet list, at the bottom of
> §6.5.2.8.
> 
> The wording in §6.5.2.9 is even clearer:
> 
> "The invalidation completion event interrupt must push any in-flight
> invalidation completion status writes, including status writes that may
> have originated from the same inv_wait_dsc for which the interrupt was
> generated."

Fine

> 
>>>>> I think this "if branch" can be moved just after the inv_desc non-zero
>>>>> reserved bit checking. Hence you don't need a ret at all. :)
>>>>
>>>> We want to return false if the memory write fails, and the
>>>> interrupt has to happen afterwards.
>>
>> Per spec: "Hardware behavior is undefined if the Status Address
>> specified is not an address route-able to memory"
>>
>> Do we want to trigger the interrupt even when the DMA fails?
> 
> Yes, we do. That's a quality of implementation issue. Just because the
> behaviour is 'undefined' and theoretically gives us permission to do
> whatever we like to the guest, we should still be as sensible as
> possible.

lgtm

> 
> 
>>>>
>>>>> btw. I'm
>>>>> also asking if VT-d spec allows it or not. So let's wait for a
>>>>> while..
> 
> Rapidly losing interest in this conversation. QEMU currently has a
> guest-triggerable crash, for crying out loud. I'd like to fix it and
> move on to finding out why FreeBSD doesn't work *even* when QEMU
> doesn't abort...
Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
Posted by Yi Liu 5 months ago
On 2025/7/15 20:27, CLEMENT MATHIEU--DRIF wrote:
> 
> 
> On 15/07/2025 10:27 am, David Woodhouse wrote:
>> On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote:
>>>
>>>
>>> On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
>>>>
>>>> On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
>>>>> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com>
>>>>> wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> On 2025/7/14 16:00, David Woodhouse wrote:
>>>>>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>>>>>
>>
>>>>>> I think this "if branch" can be moved just after the inv_desc non-zero
>>>>>> reserved bit checking. Hence you don't need a ret at all. :)
>>>>>
>>>>> We want to return false if the memory write fails, and the
>>>>> interrupt has to happen afterwards.
>>>
>>> Per spec: "Hardware behavior is undefined if the Status Address
>>> specified is not an address route-able to memory"
>>>
>>> Do we want to trigger the interrupt even when the DMA fails?
>>
>> Yes, we do. That's a quality of implementation issue. Just because the
>> behaviour is 'undefined' and theoretically gives us permission to do
>> whatever we like to the guest, we should still be as sensible as
>> possible.
> 

Personally, I'm fine with generating the interrupt even the status write
failed. But to avoid potential conflict, I've also raised this question to
the VT-d spec owner. Haven't got a clear answer yet. To further understand
this, may I ask some dumb questions here. Why FreeBSD set both SW and IF
flag. What is the usage model here. Would software consider that all the QI
descriptors prior to this specific wait descriptor has succeeded when
either the interrupt got invoked or the expected status is written back?

Regards,
Yi Liu
Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
Posted by Konstantin Belousov 5 months ago
On Wed, Jul 16, 2025 at 12:01:44PM +0800, Yi Liu wrote:
> On 2025/7/15 20:27, CLEMENT MATHIEU--DRIF wrote:
> > 
> > 
> > On 15/07/2025 10:27 am, David Woodhouse wrote:
> > > On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote:
> > > > 
> > > > 
> > > > On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
> > > > > 
> > > > > On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
> > > > > > On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com>
> > > > > > wrote:
> > > > > > > Hi David,
> > > > > > > 
> > > > > > > On 2025/7/14 16:00, David Woodhouse wrote:
> > > > > > > > From: David Woodhouse <dwmw@amazon.co.uk>
> > > > > > > > 
> > > 
> > > > > > > I think this "if branch" can be moved just after the inv_desc non-zero
> > > > > > > reserved bit checking. Hence you don't need a ret at all. :)
> > > > > > 
> > > > > > We want to return false if the memory write fails, and the
> > > > > > interrupt has to happen afterwards.
> > > > 
> > > > Per spec: "Hardware behavior is undefined if the Status Address
> > > > specified is not an address route-able to memory"
> > > > 
> > > > Do we want to trigger the interrupt even when the DMA fails?
> > > 
> > > Yes, we do. That's a quality of implementation issue. Just because the
> > > behaviour is 'undefined' and theoretically gives us permission to do
> > > whatever we like to the guest, we should still be as sensible as
> > > possible.
> > 
> 
> Personally, I'm fine with generating the interrupt even the status write
> failed. But to avoid potential conflict, I've also raised this question to
> the VT-d spec owner. Haven't got a clear answer yet. To further understand
> this, may I ask some dumb questions here. Why FreeBSD set both SW and IF
> flag. What is the usage model here. Would software consider that all the QI
> descriptors prior to this specific wait descriptor has succeeded when
> either the interrupt got invoked or the expected status is written back?

FreeBSD queues invalidations, each invalidation has the gen number. To
know that some invalidation finished, FreeBSD waits for the interrupt,
we do not scan the invalidation sequence word otherwise. There might be
further generations of the invalidation descriptors in flight when we
get the interrupt, which means that we need to know which generation is
finished.
Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
Posted by Yi Liu 5 months ago
On 2025/7/16 12:05, Konstantin Belousov wrote:
> On Wed, Jul 16, 2025 at 12:01:44PM +0800, Yi Liu wrote:
>> On 2025/7/15 20:27, CLEMENT MATHIEU--DRIF wrote:
>>>
>>>
>>> On 15/07/2025 10:27 am, David Woodhouse wrote:
>>>> On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote:
>>>>>
>>>>>
>>>>> On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
>>>>>>
>>>>>> On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
>>>>>>> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com>
>>>>>>> wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> On 2025/7/14 16:00, David Woodhouse wrote:
>>>>>>>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>>>>>>>
>>>>
>>>>>>>> I think this "if branch" can be moved just after the inv_desc non-zero
>>>>>>>> reserved bit checking. Hence you don't need a ret at all. :)
>>>>>>>
>>>>>>> We want to return false if the memory write fails, and the
>>>>>>> interrupt has to happen afterwards.
>>>>>
>>>>> Per spec: "Hardware behavior is undefined if the Status Address
>>>>> specified is not an address route-able to memory"
>>>>>
>>>>> Do we want to trigger the interrupt even when the DMA fails?
>>>>
>>>> Yes, we do. That's a quality of implementation issue. Just because the
>>>> behaviour is 'undefined' and theoretically gives us permission to do
>>>> whatever we like to the guest, we should still be as sensible as
>>>> possible.
>>>
>>
>> Personally, I'm fine with generating the interrupt even the status write
>> failed. But to avoid potential conflict, I've also raised this question to
>> the VT-d spec owner. Haven't got a clear answer yet. To further understand
>> this, may I ask some dumb questions here. Why FreeBSD set both SW and IF
>> flag. What is the usage model here. Would software consider that all the QI
>> descriptors prior to this specific wait descriptor has succeeded when
>> either the interrupt got invoked or the expected status is written back?
> 
> FreeBSD queues invalidations, each invalidation has the gen number. To
> know that some invalidation finished, FreeBSD waits for the interrupt,
> we do not scan the invalidation sequence word otherwise. There might be
> further generations of the invalidation descriptors in flight when we
> get the interrupt, which means that we need to know which generation is
> finished.

thanks for the explanation. So software still relies on checking the
written back status of the wait descriptor to identify finished
invalidation. If so might be better to generate interrupt when status write
is succeeded? Otherwise, the interrupt is meaningless to software. Does the
current software implementation rely on this interrupt even status write
failed?

Regards,
Yi Liu
Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
Posted by Konstantin Belousov via 5 months ago
On Wed, Jul 16, 2025 at 05:23:57PM +0800, Yi Liu wrote:
> On 2025/7/16 12:05, Konstantin Belousov wrote:
> > On Wed, Jul 16, 2025 at 12:01:44PM +0800, Yi Liu wrote:
> > > On 2025/7/15 20:27, CLEMENT MATHIEU--DRIF wrote:
> > > > 
> > > > 
> > > > On 15/07/2025 10:27 am, David Woodhouse wrote:
> > > > > On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote:
> > > > > > 
> > > > > > 
> > > > > > On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
> > > > > > > 
> > > > > > > On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
> > > > > > > > On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com>
> > > > > > > > wrote:
> > > > > > > > > Hi David,
> > > > > > > > > 
> > > > > > > > > On 2025/7/14 16:00, David Woodhouse wrote:
> > > > > > > > > > From: David Woodhouse <dwmw@amazon.co.uk>
> > > > > > > > > > 
> > > > > 
> > > > > > > > > I think this "if branch" can be moved just after the inv_desc non-zero
> > > > > > > > > reserved bit checking. Hence you don't need a ret at all. :)
> > > > > > > > 
> > > > > > > > We want to return false if the memory write fails, and the
> > > > > > > > interrupt has to happen afterwards.
> > > > > > 
> > > > > > Per spec: "Hardware behavior is undefined if the Status Address
> > > > > > specified is not an address route-able to memory"
> > > > > > 
> > > > > > Do we want to trigger the interrupt even when the DMA fails?
> > > > > 
> > > > > Yes, we do. That's a quality of implementation issue. Just because the
> > > > > behaviour is 'undefined' and theoretically gives us permission to do
> > > > > whatever we like to the guest, we should still be as sensible as
> > > > > possible.
> > > > 
> > > 
> > > Personally, I'm fine with generating the interrupt even the status write
> > > failed. But to avoid potential conflict, I've also raised this question to
> > > the VT-d spec owner. Haven't got a clear answer yet. To further understand
> > > this, may I ask some dumb questions here. Why FreeBSD set both SW and IF
> > > flag. What is the usage model here. Would software consider that all the QI
> > > descriptors prior to this specific wait descriptor has succeeded when
> > > either the interrupt got invoked or the expected status is written back?
> > 
> > FreeBSD queues invalidations, each invalidation has the gen number. To
> > know that some invalidation finished, FreeBSD waits for the interrupt,
> > we do not scan the invalidation sequence word otherwise. There might be
> > further generations of the invalidation descriptors in flight when we
> > get the interrupt, which means that we need to know which generation is
> > finished.
> 
> thanks for the explanation. So software still relies on checking the
> written back status of the wait descriptor to identify finished
> invalidation. If so might be better to generate interrupt when status write
> is succeeded? Otherwise, the interrupt is meaningless to software. Does the
> current software implementation rely on this interrupt even status write
> failed?

Yes, if the inv sequence word is not updated past the requested value,
the interrupt is basically ignored.

OTOH, the FreeBSD DMAR driver (should) never program non-main memory
physical address into the address field of the inv descriptor. And we
use the same location for all invalidations issued from the single DMAR
unit.