hw/i386/intel_iommu.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
From: David Woodhouse <dwmw@amazon.co.uk>
FreeBSD does both, and this appears to be perfectly valid. The VT-d
spec even talks about the ordering (the status write should be done
first, unsurprisingly).
We certainly shouldn't assert() and abort QEMU if the guest asks for
both.
Fixes: ed7b8fbcfb88 ("intel-iommu: add supports for queued invalidation interface")
Closes: https://gitlab.com/qemu-project/qemu/-/issues/3028
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
v2:
• Only generate the interrupt once.
• Spaces around bitwise OR.
This stops QEMU crashing, but I still can't get FreeBSD to boot and use
CPUs with APIC ID > 255 using *either* Intel or AMD IOMMU with
interrupt remapping, or the native 15-bit APIC ID enlightenment.
cf. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=288122
hw/i386/intel_iommu.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 69d72ad35c..851c4656c5 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2822,6 +2822,7 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
{
uint64_t mask[4] = {VTD_INV_DESC_WAIT_RSVD_LO, VTD_INV_DESC_WAIT_RSVD_HI,
VTD_INV_DESC_ALL_ONE, VTD_INV_DESC_ALL_ONE};
+ bool ret = true;
if (!vtd_inv_desc_reserved_check(s, inv_desc, mask, false,
__func__, "wait")) {
@@ -2833,8 +2834,6 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
uint32_t status_data = (uint32_t)(inv_desc->lo >>
VTD_INV_DESC_WAIT_DATA_SHIFT);
- assert(!(inv_desc->lo & VTD_INV_DESC_WAIT_IF));
-
/* FIXME: need to be masked with HAW? */
dma_addr_t status_addr = inv_desc->hi;
trace_vtd_inv_desc_wait_sw(status_addr, status_data);
@@ -2843,18 +2842,22 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
&status_data, sizeof(status_data),
MEMTXATTRS_UNSPECIFIED)) {
trace_vtd_inv_desc_wait_write_fail(inv_desc->hi, inv_desc->lo);
- return false;
+ ret = false;
}
- } else if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
+ }
+
+ if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
/* Interrupt flag */
vtd_generate_completion_event(s);
- } else {
+ }
+
+ if (!(inv_desc->lo & (VTD_INV_DESC_WAIT_IF | VTD_INV_DESC_WAIT_SW))) {
error_report_once("%s: invalid wait desc: hi=%"PRIx64", lo=%"PRIx64
" (unknown type)", __func__, inv_desc->hi,
inv_desc->lo);
return false;
}
- return true;
+ return ret;
}
static bool vtd_process_context_cache_desc(IntelIOMMUState *s,
--
2.43.0
On 14.07.2025 11:00, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> FreeBSD does both, and this appears to be perfectly valid. The VT-d
> spec even talks about the ordering (the status write should be done
> first, unsurprisingly).
>
> We certainly shouldn't assert() and abort QEMU if the guest asks for
> both.
>
> Fixes: ed7b8fbcfb88 ("intel-iommu: add supports for queued invalidation interface")
> Closes: https://gitlab.com/qemu-project/qemu/-/issues/3028
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> v2:
> • Only generate the interrupt once.
> • Spaces around bitwise OR.
>
> This stops QEMU crashing, but I still can't get FreeBSD to boot and use
> CPUs with APIC ID > 255 using *either* Intel or AMD IOMMU with
> interrupt remapping, or the native 15-bit APIC ID enlightenment.
> cf. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=288122
>
>
> hw/i386/intel_iommu.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
This looks like a qemu-stable material (for 10.0).
Please let me know if it is not.
Thanks,
/mjt
Hi David,
On 2025/7/14 16:00, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> FreeBSD does both, and this appears to be perfectly valid. The VT-d
> spec even talks about the ordering (the status write should be done
> first, unsurprisingly).
interesting. Have you tried setting both flags on baremetal and the hw
gives you both the status code and an interrupt?
> We certainly shouldn't assert() and abort QEMU if the guest asks for
> both.
>
> Fixes: ed7b8fbcfb88 ("intel-iommu: add supports for queued invalidation interface")
> Closes: https://gitlab.com/qemu-project/qemu/-/issues/3028
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> v2:
> • Only generate the interrupt once.
> • Spaces around bitwise OR.
>
> This stops QEMU crashing, but I still can't get FreeBSD to boot and use
> CPUs with APIC ID > 255 using *either* Intel or AMD IOMMU with
> interrupt remapping, or the native 15-bit APIC ID enlightenment.
> cf. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=288122
>
>
> hw/i386/intel_iommu.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 69d72ad35c..851c4656c5 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2822,6 +2822,7 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
> {
> uint64_t mask[4] = {VTD_INV_DESC_WAIT_RSVD_LO, VTD_INV_DESC_WAIT_RSVD_HI,
> VTD_INV_DESC_ALL_ONE, VTD_INV_DESC_ALL_ONE};
> + bool ret = true;
>
> if (!vtd_inv_desc_reserved_check(s, inv_desc, mask, false,
> __func__, "wait")) {
> @@ -2833,8 +2834,6 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
> uint32_t status_data = (uint32_t)(inv_desc->lo >>
> VTD_INV_DESC_WAIT_DATA_SHIFT);
>
> - assert(!(inv_desc->lo & VTD_INV_DESC_WAIT_IF));
> -
> /* FIXME: need to be masked with HAW? */
> dma_addr_t status_addr = inv_desc->hi;
> trace_vtd_inv_desc_wait_sw(status_addr, status_data);
> @@ -2843,18 +2842,22 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
> &status_data, sizeof(status_data),
> MEMTXATTRS_UNSPECIFIED)) {
> trace_vtd_inv_desc_wait_write_fail(inv_desc->hi, inv_desc->lo);
> - return false;
> + ret = false;
> }
> - } else if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
> + }
> +
> + if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
> /* Interrupt flag */
> vtd_generate_completion_event(s);
> - } else {
> + }
> +
> + if (!(inv_desc->lo & (VTD_INV_DESC_WAIT_IF | VTD_INV_DESC_WAIT_SW))) {
> error_report_once("%s: invalid wait desc: hi=%"PRIx64", lo=%"PRIx64
> " (unknown type)", __func__, inv_desc->hi,
> inv_desc->lo);
> return false;
> }
I think this "if branch" can be moved just after the inv_desc non-zero
reserved bit checking. Hence you don't need a ret at all. :) btw. I'm
also asking if VT-d spec allows it or not. So let's wait for a while..
> - return true;
> + return ret;
> }
>
> static bool vtd_process_context_cache_desc(IntelIOMMUState *s,
--
Regards,
Yi Liu
On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> wrote: >Hi David, > >On 2025/7/14 16:00, David Woodhouse wrote: >> From: David Woodhouse <dwmw@amazon.co.uk> >> >> FreeBSD does both, and this appears to be perfectly valid. The VT-d >> spec even talks about the ordering (the status write should be done >> first, unsurprisingly). > >interesting. Have you tried setting both flags on baremetal and the hw >gives you both the status code and an interrupt? I see no reason why it shouldn't. The spec (§6.5.2.8) gives no that the IF and SW bits should be mutually exclusive and even talks about ordering: Section 6.5.2.11 describes queued invalidation ordering considerations. Hardware completes an invalidation wait command as follows: • If a status write is specified in the wait descriptor (SW=1), hardware performs a coherent write of the status data to the status address. • If an interrupt is requested in the wait descriptor (IF=1), hardware sets the IWC field in the Invalidation Completion Status Register. An invalidation completion interrupt may be generated as described in the following section >I think this "if branch" can be moved just after the inv_desc non-zero >reserved bit checking. Hence you don't need a ret at all. :) We want to return false if the memory write fails, and the interrupt has to happen afterwards. > btw. I'm >also asking if VT-d spec allows it or not. So let's wait for a while.. Ok.
On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote: > On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> wrote: > >Hi David, > > > >On 2025/7/14 16:00, David Woodhouse wrote: > >> From: David Woodhouse <dwmw@amazon.co.uk> > >> > >> FreeBSD does both, and this appears to be perfectly valid. The VT-d > >> spec even talks about the ordering (the status write should be done > >> first, unsurprisingly). > > > >interesting. Have you tried setting both flags on baremetal and the hw > >gives you both the status code and an interrupt? > > I see no reason why it shouldn't. The spec (§6.5.2.8) gives no that the IF and SW bits should be mutually exclusive and even talks about ordering: > > Section 6.5.2.11 describes queued invalidation ordering considerations. Hardware completes an > invalidation wait command as follows: > • If a status write is specified in the wait descriptor (SW=1), hardware performs a coherent write of > the status data to the status address. > • If an interrupt is requested in the wait descriptor (IF=1), hardware sets the IWC field in the > Invalidation Completion Status Register. An invalidation completion interrupt may be generated as > described in the following section > Yes, and the FreeBSD DMAR code uses that, and relies on that, as was mentioned earlier in the mail thread. > > > >I think this "if branch" can be moved just after the inv_desc non-zero > >reserved bit checking. Hence you don't need a ret at all. :) > > We want to return false if the memory write fails, and the interrupt has to happen afterwards. > > > btw. I'm > >also asking if VT-d spec allows it or not. So let's wait for a while.. > > Ok. > >
On 14/07/2025 11:22 pm, Konstantin Belousov wrote: > Caution: External email. Do not open attachments or click links, unless this email comes from a known sender and you know the content is safe. > > > On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote: >> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> wrote: >>> Hi David, >>> >>> On 2025/7/14 16:00, David Woodhouse wrote: >>>> From: David Woodhouse <dwmw@amazon.co.uk> >>>> >>>> FreeBSD does both, and this appears to be perfectly valid. The VT-d >>>> spec even talks about the ordering (the status write should be done >>>> first, unsurprisingly). Are you talking about the ordering constraint mentioned in bullet "Page-request Drain (PD)"? >>> >>> interesting. Have you tried setting both flags on baremetal and the hw >>> gives you both the status code and an interrupt? >> >> I see no reason why it shouldn't. The spec (§6.5.2.8) gives no that the IF and SW bits should be mutually exclusive and even talks about ordering: >> >> Section 6.5.2.11 describes queued invalidation ordering considerations. Hardware completes an >> invalidation wait command as follows: >> • If a status write is specified in the wait descriptor (SW=1), hardware performs a coherent write of >> the status data to the status address. >> • If an interrupt is requested in the wait descriptor (IF=1), hardware sets the IWC field in the >> Invalidation Completion Status Register. An invalidation completion interrupt may be generated as >> described in the following section >> > > Yes, and the FreeBSD DMAR code uses that, and relies on that, as was > mentioned earlier in the mail thread. > >> >> >>> I think this "if branch" can be moved just after the inv_desc non-zero >>> reserved bit checking. Hence you don't need a ret at all. :) >> >> We want to return false if the memory write fails, and the interrupt has to happen afterwards. Per spec: "Hardware behavior is undefined if the Status Address specified is not an address route-able to memory" Do we want to trigger the interrupt even when the DMA fails? >> >>> btw. I'm >>> also asking if VT-d spec allows it or not. So let's wait for a while.. Thanks Yi >> >> Ok. >> >>
On 2025/7/15 14:11, CLEMENT MATHIEU--DRIF wrote: > > > On 14/07/2025 11:22 pm, Konstantin Belousov wrote: >> Caution: External email. Do not open attachments or click links, unless this email comes from a known sender and you know the content is safe. >> >> >> On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote: >>> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> wrote: >>>> Hi David, >>>> >>>> On 2025/7/14 16:00, David Woodhouse wrote: >>>>> From: David Woodhouse <dwmw@amazon.co.uk> >>>>> >>>>> FreeBSD does both, and this appears to be perfectly valid. The VT-d >>>>> spec even talks about the ordering (the status write should be done >>>>> first, unsurprisingly). > > Are you talking about the ordering constraint mentioned in bullet > "Page-request Drain (PD)"? David is talking about the IF and SW flags. And he is correct. Spec has below sentence. It means a wait descriptor can have both IF and SW set and indeed completion interrupt happens later than status write. Let's go on refine the patch. :) "The invalidation completion event interrupt must push any in-flight invalidation completion status writes, including status writes that may have originated from the same inv_wait_dsc for which the interrupt was generated." Regards, Yi Liu
On Tue, 2025-07-15 at 20:35 +0800, Yi Liu wrote: > > David is talking about the IF and SW flags. And he is correct. Spec has > below sentence. It means a wait descriptor can have both IF and SW set > and indeed completion interrupt happens later than status write. Let's > go on refine the patch. :) Are there any refinements you believe are necessary? https://gitlab.com/dwmw2/qemu/-/pipelines/1941324674 looks like it's going to be clean, and I think we resolved everything we talked about.
> From: David Woodhouse <dwmw2@infradead.org> > Sent: Tuesday, July 22, 2025 8:04 PM > > On Tue, 2025-07-15 at 20:35 +0800, Yi Liu wrote: > > > > David is talking about the IF and SW flags. And he is correct. Spec has > > below sentence. It means a wait descriptor can have both IF and SW set > > and indeed completion interrupt happens later than status write. Let's > > go on refine the patch. :) > > Are there any refinements you believe are necessary? No extra change I suppose. I was just waiting for a respond from VT-d arch. I've now got a clear answer. It's ok to generate the interrupt no matter the status write failed or not if IF is set. Sorry for the delayed respond. > https://gitlab.com/dwmw2/qemu/-/pipelines/1941324674 looks like it's > going to be clean, and I think we resolved everything we talked about. Reviewed-by: Yi Liu <yi.l.liu@intel.com> Thanks, Yi Liu
On 15/07/2025 2:35 pm, Yi Liu wrote: > Caution: External email. Do not open attachments or click links, unless > this email comes from a known sender and you know the content is safe. > > > On 2025/7/15 14:11, CLEMENT MATHIEU--DRIF wrote: >> >> >> On 14/07/2025 11:22 pm, Konstantin Belousov wrote: >>> Caution: External email. Do not open attachments or click links, >>> unless this email comes from a known sender and you know the content >>> is safe. >>> >>> >>> On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote: >>>> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> wrote: >>>>> Hi David, >>>>> >>>>> On 2025/7/14 16:00, David Woodhouse wrote: >>>>>> From: David Woodhouse <dwmw@amazon.co.uk> >>>>>> >>>>>> FreeBSD does both, and this appears to be perfectly valid. The VT-d >>>>>> spec even talks about the ordering (the status write should be done >>>>>> first, unsurprisingly). >> >> Are you talking about the ordering constraint mentioned in bullet >> "Page-request Drain (PD)"? > > David is talking about the IF and SW flags. And he is correct. Spec has > below sentence. It means a wait descriptor can have both IF and SW set > and indeed completion interrupt happens later than status write. Let's > go on refine the patch. :) > > "The invalidation completion event interrupt must push any in-flight > invalidation completion status > writes, including status writes that may have originated from the same > inv_wait_dsc for which the interrupt was generated." Yep, I agree > > > Regards, > Yi Liu
On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote: > > > On 14/07/2025 11:22 pm, Konstantin Belousov wrote: > > Caution: External email. Do not open attachments or click links, > > unless this email comes from a known sender and you know the > > content is safe. > > > > > > On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote: > > > On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> > > > wrote: > > > > Hi David, > > > > > > > > On 2025/7/14 16:00, David Woodhouse wrote: > > > > > From: David Woodhouse <dwmw@amazon.co.uk> > > > > > > > > > > FreeBSD does both, and this appears to be perfectly valid. The VT-d > > > > > spec even talks about the ordering (the status write should be done > > > > > first, unsurprisingly). > > Are you talking about the ordering constraint mentioned in bullet > "Page-request Drain (PD)"? > No, in the v4.0 spec it's just below that bullet list, at the bottom of §6.5.2.8. The wording in §6.5.2.9 is even clearer: "The invalidation completion event interrupt must push any in-flight invalidation completion status writes, including status writes that may have originated from the same inv_wait_dsc for which the interrupt was generated." > > > > I think this "if branch" can be moved just after the inv_desc non-zero > > > > reserved bit checking. Hence you don't need a ret at all. :) > > > > > > We want to return false if the memory write fails, and the > > > interrupt has to happen afterwards. > > Per spec: "Hardware behavior is undefined if the Status Address > specified is not an address route-able to memory" > > Do we want to trigger the interrupt even when the DMA fails? Yes, we do. That's a quality of implementation issue. Just because the behaviour is 'undefined' and theoretically gives us permission to do whatever we like to the guest, we should still be as sensible as possible. > > > > > > > btw. I'm > > > > also asking if VT-d spec allows it or not. So let's wait for a > > > > while.. Rapidly losing interest in this conversation. QEMU currently has a guest-triggerable crash, for crying out loud. I'd like to fix it and move on to finding out why FreeBSD doesn't work *even* when QEMU doesn't abort...
On 15/07/2025 10:27 am, David Woodhouse wrote: > On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote: >> >> >> On 14/07/2025 11:22 pm, Konstantin Belousov wrote: >>> Caution: External email. Do not open attachments or click links, >>> unless this email comes from a known sender and you know the >>> content is safe. >>> >>> >>> On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote: >>>> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> >>>> wrote: >>>>> Hi David, >>>>> >>>>> On 2025/7/14 16:00, David Woodhouse wrote: >>>>>> From: David Woodhouse <dwmw@amazon.co.uk> >>>>>> >>>>>> FreeBSD does both, and this appears to be perfectly valid. The VT-d >>>>>> spec even talks about the ordering (the status write should be done >>>>>> first, unsurprisingly). >> >> Are you talking about the ordering constraint mentioned in bullet >> "Page-request Drain (PD)"? >> > > No, in the v4.0 spec it's just below that bullet list, at the bottom of > §6.5.2.8. > > The wording in §6.5.2.9 is even clearer: > > "The invalidation completion event interrupt must push any in-flight > invalidation completion status writes, including status writes that may > have originated from the same inv_wait_dsc for which the interrupt was > generated." Fine > >>>>> I think this "if branch" can be moved just after the inv_desc non-zero >>>>> reserved bit checking. Hence you don't need a ret at all. :) >>>> >>>> We want to return false if the memory write fails, and the >>>> interrupt has to happen afterwards. >> >> Per spec: "Hardware behavior is undefined if the Status Address >> specified is not an address route-able to memory" >> >> Do we want to trigger the interrupt even when the DMA fails? > > Yes, we do. That's a quality of implementation issue. Just because the > behaviour is 'undefined' and theoretically gives us permission to do > whatever we like to the guest, we should still be as sensible as > possible. lgtm > > >>>> >>>>> btw. I'm >>>>> also asking if VT-d spec allows it or not. So let's wait for a >>>>> while.. > > Rapidly losing interest in this conversation. QEMU currently has a > guest-triggerable crash, for crying out loud. I'd like to fix it and > move on to finding out why FreeBSD doesn't work *even* when QEMU > doesn't abort...
On 2025/7/15 20:27, CLEMENT MATHIEU--DRIF wrote: > > > On 15/07/2025 10:27 am, David Woodhouse wrote: >> On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote: >>> >>> >>> On 14/07/2025 11:22 pm, Konstantin Belousov wrote: >>>> >>>> On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote: >>>>> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> >>>>> wrote: >>>>>> Hi David, >>>>>> >>>>>> On 2025/7/14 16:00, David Woodhouse wrote: >>>>>>> From: David Woodhouse <dwmw@amazon.co.uk> >>>>>>> >> >>>>>> I think this "if branch" can be moved just after the inv_desc non-zero >>>>>> reserved bit checking. Hence you don't need a ret at all. :) >>>>> >>>>> We want to return false if the memory write fails, and the >>>>> interrupt has to happen afterwards. >>> >>> Per spec: "Hardware behavior is undefined if the Status Address >>> specified is not an address route-able to memory" >>> >>> Do we want to trigger the interrupt even when the DMA fails? >> >> Yes, we do. That's a quality of implementation issue. Just because the >> behaviour is 'undefined' and theoretically gives us permission to do >> whatever we like to the guest, we should still be as sensible as >> possible. > Personally, I'm fine with generating the interrupt even the status write failed. But to avoid potential conflict, I've also raised this question to the VT-d spec owner. Haven't got a clear answer yet. To further understand this, may I ask some dumb questions here. Why FreeBSD set both SW and IF flag. What is the usage model here. Would software consider that all the QI descriptors prior to this specific wait descriptor has succeeded when either the interrupt got invoked or the expected status is written back? Regards, Yi Liu
On Wed, Jul 16, 2025 at 12:01:44PM +0800, Yi Liu wrote: > On 2025/7/15 20:27, CLEMENT MATHIEU--DRIF wrote: > > > > > > On 15/07/2025 10:27 am, David Woodhouse wrote: > > > On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote: > > > > > > > > > > > > On 14/07/2025 11:22 pm, Konstantin Belousov wrote: > > > > > > > > > > On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote: > > > > > > On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> > > > > > > wrote: > > > > > > > Hi David, > > > > > > > > > > > > > > On 2025/7/14 16:00, David Woodhouse wrote: > > > > > > > > From: David Woodhouse <dwmw@amazon.co.uk> > > > > > > > > > > > > > > > > > > I think this "if branch" can be moved just after the inv_desc non-zero > > > > > > > reserved bit checking. Hence you don't need a ret at all. :) > > > > > > > > > > > > We want to return false if the memory write fails, and the > > > > > > interrupt has to happen afterwards. > > > > > > > > Per spec: "Hardware behavior is undefined if the Status Address > > > > specified is not an address route-able to memory" > > > > > > > > Do we want to trigger the interrupt even when the DMA fails? > > > > > > Yes, we do. That's a quality of implementation issue. Just because the > > > behaviour is 'undefined' and theoretically gives us permission to do > > > whatever we like to the guest, we should still be as sensible as > > > possible. > > > > Personally, I'm fine with generating the interrupt even the status write > failed. But to avoid potential conflict, I've also raised this question to > the VT-d spec owner. Haven't got a clear answer yet. To further understand > this, may I ask some dumb questions here. Why FreeBSD set both SW and IF > flag. What is the usage model here. Would software consider that all the QI > descriptors prior to this specific wait descriptor has succeeded when > either the interrupt got invoked or the expected status is written back? FreeBSD queues invalidations, each invalidation has the gen number. To know that some invalidation finished, FreeBSD waits for the interrupt, we do not scan the invalidation sequence word otherwise. There might be further generations of the invalidation descriptors in flight when we get the interrupt, which means that we need to know which generation is finished.
On 2025/7/16 12:05, Konstantin Belousov wrote: > On Wed, Jul 16, 2025 at 12:01:44PM +0800, Yi Liu wrote: >> On 2025/7/15 20:27, CLEMENT MATHIEU--DRIF wrote: >>> >>> >>> On 15/07/2025 10:27 am, David Woodhouse wrote: >>>> On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote: >>>>> >>>>> >>>>> On 14/07/2025 11:22 pm, Konstantin Belousov wrote: >>>>>> >>>>>> On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote: >>>>>>> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> >>>>>>> wrote: >>>>>>>> Hi David, >>>>>>>> >>>>>>>> On 2025/7/14 16:00, David Woodhouse wrote: >>>>>>>>> From: David Woodhouse <dwmw@amazon.co.uk> >>>>>>>>> >>>> >>>>>>>> I think this "if branch" can be moved just after the inv_desc non-zero >>>>>>>> reserved bit checking. Hence you don't need a ret at all. :) >>>>>>> >>>>>>> We want to return false if the memory write fails, and the >>>>>>> interrupt has to happen afterwards. >>>>> >>>>> Per spec: "Hardware behavior is undefined if the Status Address >>>>> specified is not an address route-able to memory" >>>>> >>>>> Do we want to trigger the interrupt even when the DMA fails? >>>> >>>> Yes, we do. That's a quality of implementation issue. Just because the >>>> behaviour is 'undefined' and theoretically gives us permission to do >>>> whatever we like to the guest, we should still be as sensible as >>>> possible. >>> >> >> Personally, I'm fine with generating the interrupt even the status write >> failed. But to avoid potential conflict, I've also raised this question to >> the VT-d spec owner. Haven't got a clear answer yet. To further understand >> this, may I ask some dumb questions here. Why FreeBSD set both SW and IF >> flag. What is the usage model here. Would software consider that all the QI >> descriptors prior to this specific wait descriptor has succeeded when >> either the interrupt got invoked or the expected status is written back? > > FreeBSD queues invalidations, each invalidation has the gen number. To > know that some invalidation finished, FreeBSD waits for the interrupt, > we do not scan the invalidation sequence word otherwise. There might be > further generations of the invalidation descriptors in flight when we > get the interrupt, which means that we need to know which generation is > finished. thanks for the explanation. So software still relies on checking the written back status of the wait descriptor to identify finished invalidation. If so might be better to generate interrupt when status write is succeeded? Otherwise, the interrupt is meaningless to software. Does the current software implementation rely on this interrupt even status write failed? Regards, Yi Liu
On Wed, Jul 16, 2025 at 05:23:57PM +0800, Yi Liu wrote: > On 2025/7/16 12:05, Konstantin Belousov wrote: > > On Wed, Jul 16, 2025 at 12:01:44PM +0800, Yi Liu wrote: > > > On 2025/7/15 20:27, CLEMENT MATHIEU--DRIF wrote: > > > > > > > > > > > > On 15/07/2025 10:27 am, David Woodhouse wrote: > > > > > On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote: > > > > > > > > > > > > > > > > > > On 14/07/2025 11:22 pm, Konstantin Belousov wrote: > > > > > > > > > > > > > > On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote: > > > > > > > > On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> > > > > > > > > wrote: > > > > > > > > > Hi David, > > > > > > > > > > > > > > > > > > On 2025/7/14 16:00, David Woodhouse wrote: > > > > > > > > > > From: David Woodhouse <dwmw@amazon.co.uk> > > > > > > > > > > > > > > > > > > > > > > > > I think this "if branch" can be moved just after the inv_desc non-zero > > > > > > > > > reserved bit checking. Hence you don't need a ret at all. :) > > > > > > > > > > > > > > > > We want to return false if the memory write fails, and the > > > > > > > > interrupt has to happen afterwards. > > > > > > > > > > > > Per spec: "Hardware behavior is undefined if the Status Address > > > > > > specified is not an address route-able to memory" > > > > > > > > > > > > Do we want to trigger the interrupt even when the DMA fails? > > > > > > > > > > Yes, we do. That's a quality of implementation issue. Just because the > > > > > behaviour is 'undefined' and theoretically gives us permission to do > > > > > whatever we like to the guest, we should still be as sensible as > > > > > possible. > > > > > > > > > > Personally, I'm fine with generating the interrupt even the status write > > > failed. But to avoid potential conflict, I've also raised this question to > > > the VT-d spec owner. Haven't got a clear answer yet. To further understand > > > this, may I ask some dumb questions here. Why FreeBSD set both SW and IF > > > flag. What is the usage model here. Would software consider that all the QI > > > descriptors prior to this specific wait descriptor has succeeded when > > > either the interrupt got invoked or the expected status is written back? > > > > FreeBSD queues invalidations, each invalidation has the gen number. To > > know that some invalidation finished, FreeBSD waits for the interrupt, > > we do not scan the invalidation sequence word otherwise. There might be > > further generations of the invalidation descriptors in flight when we > > get the interrupt, which means that we need to know which generation is > > finished. > > thanks for the explanation. So software still relies on checking the > written back status of the wait descriptor to identify finished > invalidation. If so might be better to generate interrupt when status write > is succeeded? Otherwise, the interrupt is meaningless to software. Does the > current software implementation rely on this interrupt even status write > failed? Yes, if the inv sequence word is not updated past the requested value, the interrupt is basically ignored. OTOH, the FreeBSD DMAR driver (should) never program non-main memory physical address into the address field of the inv descriptor. And we use the same location for all invalidations issued from the single DMAR unit.
© 2016 - 2025 Red Hat, Inc.