[PATCH] xen/arm: Enhance IPMMU-VMSA driver robustness and debug output

Jahan Murudi posted 1 patch 4 months, 1 week ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20250620103123.2174529-1-jahan.murudi.zg@renesas.com
xen/drivers/passthrough/arm/ipmmu-vmsa.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
[PATCH] xen/arm: Enhance IPMMU-VMSA driver robustness and debug output
Posted by Jahan Murudi 4 months, 1 week ago
- Fix typo in source comment ("you can found" -> "which can be found").
- Add dsb(sy) after IMCTR write to ensure flush is complete before polling.
- Add dev_info() log in ipmmu_device_reset() to indicate the number of disabled contexts.

These changes improve memory operation ordering, code readability, and runtime traceability
for IPMMU on R-Car Gen3/Gen4 SoCs

Signed-off-by: Jahan Murudi <jahan.murudi.zg@renesas.com>
---
 xen/drivers/passthrough/arm/ipmmu-vmsa.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
index d828d9cf6a..dac0dd6d46 100644
--- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
+++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
@@ -13,7 +13,7 @@
  *
  * Based on Linux's IPMMU-VMSA driver from Renesas BSP:
  *    drivers/iommu/ipmmu-vmsa.c
- * you can found at:
+ * which can be found at:
  *    url: git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git
  *    branch: v4.14.75-ltsi/rcar-3.9.6
  *    commit: e206eb5b81a60e64c35fbc3a999b1a0db2b98044
@@ -433,6 +433,8 @@ static void ipmmu_tlb_invalidate(struct ipmmu_vmsa_domain *domain)
     data |= IMCTR_FLUSH;
     ipmmu_ctx_write_all(domain, IMCTR, data);
 
+    /* Force IMCTR write to complete before polling to avoid false completion check. */
+    dsb(sy);
     ipmmu_tlb_sync(domain);
 }
 
@@ -780,6 +782,8 @@ static void ipmmu_device_reset(struct ipmmu_vmsa_device *mmu)
     /* Disable all contexts. */
     for ( i = 0; i < mmu->num_ctx; ++i )
         ipmmu_ctx_write(mmu, i, IMCTR, 0);
+
+    dev_info(mmu->dev, "Reset completed, disabled %u contexts\n", mmu->num_ctx);
 }
 
 /* R-Car Gen3/Gen4 SoCs product and cut information. */
-- 
2.34.1
Re: [PATCH] xen/arm: Enhance IPMMU-VMSA driver robustness and debug output
Posted by Orzel, Michal 4 months, 1 week ago

On 20/06/2025 12:31, Jahan Murudi wrote:
> - Fix typo in source comment ("you can found" -> "which can be found").
> - Add dsb(sy) after IMCTR write to ensure flush is complete before polling.
> - Add dev_info() log in ipmmu_device_reset() to indicate the number of disabled contexts.
> 
> These changes improve memory operation ordering, code readability, and runtime traceability
> for IPMMU on R-Car Gen3/Gen4 SoCs
> 
> Signed-off-by: Jahan Murudi <jahan.murudi.zg@renesas.com>
Acked-by: Michal Orzel <michal.orzel@amd.com>

> ---
>  xen/drivers/passthrough/arm/ipmmu-vmsa.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> index d828d9cf6a..dac0dd6d46 100644
> --- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> +++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> @@ -13,7 +13,7 @@
>   *
>   * Based on Linux's IPMMU-VMSA driver from Renesas BSP:
>   *    drivers/iommu/ipmmu-vmsa.c
> - * you can found at:
> + * which can be found at:
>   *    url: git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git
>   *    branch: v4.14.75-ltsi/rcar-3.9.6
>   *    commit: e206eb5b81a60e64c35fbc3a999b1a0db2b98044
> @@ -433,6 +433,8 @@ static void ipmmu_tlb_invalidate(struct ipmmu_vmsa_domain *domain)
>      data |= IMCTR_FLUSH;
>      ipmmu_ctx_write_all(domain, IMCTR, data);
>  
> +    /* Force IMCTR write to complete before polling to avoid false completion check. */
> +    dsb(sy);
Any clue why Linux (mainline) does not do that?

~Michal
Re: [PATCH] xen/arm: Enhance IPMMU-VMSA driver robustness and debug output
Posted by Julien Grall 3 months, 2 weeks ago
Hi,

On 23/06/2025 08:25, Orzel, Michal wrote:
> 
> 
> On 20/06/2025 12:31, Jahan Murudi wrote:
>> - Fix typo in source comment ("you can found" -> "which can be found").
>> - Add dsb(sy) after IMCTR write to ensure flush is complete before polling.
>> - Add dev_info() log in ipmmu_device_reset() to indicate the number of disabled contexts.
>>
>> These changes improve memory operation ordering, code readability, and runtime traceability
>> for IPMMU on R-Car Gen3/Gen4 SoCs
>>
>> Signed-off-by: Jahan Murudi <jahan.murudi.zg@renesas.com>
> Acked-by: Michal Orzel <michal.orzel@amd.com>

This is now committed.

Cheers,

-- 
Julien Grall
RE: [PATCH] xen/arm: Enhance IPMMU-VMSA driver robustness and debug output
Posted by Jahan Murudi 4 months, 1 week ago
Hi Michal,

Thank you for your review and the Ack.

>> +    dsb(sy);
> Any clue why Linux (mainline) does not do that?

The implementation writel() which contains an implicit dsb(st) which likely sufficient for Linux for its Stage-1 IOMMU usage where CPU and IOMMU interactions are coherent. 
However, Xen uses the IPMMU as a Stage-2 IOMMU for non-coherent DMA operations (such as PCIe passthrough), requiring the stronger dsb(sy) to ensure writes fully propagate to the IPMMU hardware before continuing.

Regards,
Jahan 

-----Original Message-----
From: Orzel, Michal <michal.orzel@amd.com> 
Sent: 23 June 2025 12:56
To: Jahan Murudi <jahan.murudi.zg@renesas.com>; xen-devel@lists.xenproject.org
Cc: Stefano Stabellini <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Bertrand Marquis <bertrand.marquis@arm.com>; Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Subject: Re: [PATCH] xen/arm: Enhance IPMMU-VMSA driver robustness and debug output



On 20/06/2025 12:31, Jahan Murudi wrote:
> - Fix typo in source comment ("you can found" -> "which can be found").
> - Add dsb(sy) after IMCTR write to ensure flush is complete before polling.
> - Add dev_info() log in ipmmu_device_reset() to indicate the number of disabled contexts.
> 
> These changes improve memory operation ordering, code readability, and 
> runtime traceability for IPMMU on R-Car Gen3/Gen4 SoCs
> 
> Signed-off-by: Jahan Murudi <jahan.murudi.zg@renesas.com>
Acked-by: Michal Orzel <michal.orzel@amd.com>

> ---
>  xen/drivers/passthrough/arm/ipmmu-vmsa.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c 
> b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> index d828d9cf6a..dac0dd6d46 100644
> --- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> +++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> @@ -13,7 +13,7 @@
>   *
>   * Based on Linux's IPMMU-VMSA driver from Renesas BSP:
>   *    drivers/iommu/ipmmu-vmsa.c
> - * you can found at:
> + * which can be found at:
>   *    url: git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git
>   *    branch: v4.14.75-ltsi/rcar-3.9.6
>   *    commit: e206eb5b81a60e64c35fbc3a999b1a0db2b98044
> @@ -433,6 +433,8 @@ static void ipmmu_tlb_invalidate(struct ipmmu_vmsa_domain *domain)
>      data |= IMCTR_FLUSH;
>      ipmmu_ctx_write_all(domain, IMCTR, data);
>  
> +    /* Force IMCTR write to complete before polling to avoid false completion check. */
> +    dsb(sy);
Any clue why Linux (mainline) does not do that?

~Michal

Re: [PATCH] xen/arm: Enhance IPMMU-VMSA driver robustness and debug output
Posted by Julien Grall 4 months, 1 week ago

On 25/06/2025 11:28, Jahan Murudi wrote:
> Hi Michal,

Hi Jahan,

> 
> Thank you for your review and the Ack.
> 
>>> +    dsb(sy);
>> Any clue why Linux (mainline) does not do that?

One process remark, we typically comment inline rather than pasting a 
quote and replying at the top of the e-mail.

> 
> The implementation writel() which contains an implicit dsb(st) which likely sufficient for Linux for its Stage-1 IOMMU usage where CPU and IOMMU interactions are coherent.
> However, Xen uses the IPMMU as a Stage-2 IOMMU for non-coherent DMA operations (such as PCIe passthrough), requiring the stronger dsb(sy) to ensure writes fully propagate to the IPMMU hardware before continuing.

I don't follow. Are you saying the IPMMU driver in Linux doesn't 
non-coherent DMA operations?

But even if that's the case, I still don't see why non-coherent DMA 
would matter. From my understanding, here we want to make sure the TLB 
walker sees the change before the flush.

So if the TLB walker is coherent with the rest of the system. Then it 
would be similar to the CPU TLBs where we only need a "dsb st" (well we 
use "nshst" because the TLB is in non-shareable domain).

If the walker is not coherent, then that's a different topic.

Anyway, I am not against using "dsb(sy)". It is stronger than necessary 
but also probably not a massive deal in the TLB flush path.

Cheers,

-- 
Julien Grall
RE: [PATCH] xen/arm: Enhance IPMMU-VMSA driver robustness and debug output
Posted by Jahan Murudi 4 months ago
Hi Julien,

On 25/06/2025 16:53, Julien Grall wrote:

>Hi Jahan,

>>>> +    dsb(sy);
>>> Any clue why Linux (mainline) does not do that?

> One process remark, we typically comment inline rather than pasting a quote and replying at the top of the e-mail.

 Thanks for the style note - I'll follow the inline commenting convention moving forward.

>> The implementation writel() which contains an implicit dsb(st) which likely sufficient for Linux for its Stage-1 IOMMU usage where CPU and IOMMU interactions are coherent.
>> However, Xen uses the IPMMU as a Stage-2 IOMMU for non-coherent DMA operations (such as PCIe passthrough), requiring the stronger dsb(sy) to ensure writes fully propagate to the IPMMU >>hardware before continuing.

> I don't follow. Are you saying the IPMMU driver in Linux doesn't non-coherent DMA operations?

Let me clarify my understanding:  In native Linux, the IOMMU works at stage-1 (VA -> PA) and typically assumes coherency between CPU and IOMMU. The implicit dsb(st) in writel() is enough there. But in Xen, we use this as stage-2 (GPA -> HPA) for cases like PCI passthrough where devices might be non-coherent. We might need stronger barrier dsb(sy) in xen because: 1) We can't assume the TLB walker is coherent for stage -2 and we must also prevent(minimise) any DMA operations during TLB invalidation( observed some IPMMU hardware limitations in the documentation) .

> But even if that's the case, I still don't see why non-coherent DMA would matter. From my understanding, here we want to make sure the TLB walker sees the change before the flush.
> So if the TLB walker is coherent with the rest of the system. Then it would be similar to the CPU TLBs where we only need a "dsb st" (well we use "nshst" because the TLB is in non-shareable domain).
> If the walker is not coherent, then that's a different topic.

You're correct that dsb(st) would suffice in an ideal coherent system. However, with PCI passthrough we must handle non-coherent devices. While dsb(st) ensures writes complete, dsb(sy) provides the stronger system-wide visibility we need - guaranteeing all components (including non-coherent devices) see the changes before proceeding.

> Anyway, I am not against using "dsb(sy)". It is stronger than necessary but also probably not a massive deal in the TLB flush path.

Thank you. I agree the performance impact is negligible in the flush path, and it's better to be safe when dealing with passthrough devices in xen.

Regards,
Jahan Murudi
Re: [PATCH] xen/arm: Enhance IPMMU-VMSA driver robustness and debug output
Posted by Julien Grall 4 months ago

On 30/06/2025 07:37, Jahan Murudi wrote:
> Hi Julien,
> 
> On 25/06/2025 16:53, Julien Grall wrote:
> 
>> Hi Jahan,
> 
>>>>> +    dsb(sy);
>>>> Any clue why Linux (mainline) does not do that?
> 
>> One process remark, we typically comment inline rather than pasting a quote and replying at the top of the e-mail.
> 
>   Thanks for the style note - I'll follow the inline commenting convention moving forward.
> 
>>> The implementation writel() which contains an implicit dsb(st) which likely sufficient for Linux for its Stage-1 IOMMU usage where CPU and IOMMU interactions are coherent.
>>> However, Xen uses the IPMMU as a Stage-2 IOMMU for non-coherent DMA operations (such as PCIe passthrough), requiring the stronger dsb(sy) to ensure writes fully propagate to the IPMMU >>hardware before continuing.
> 
>> I don't follow. Are you saying the IPMMU driver in Linux doesn't non-coherent DMA operations?
> 
> Let me clarify my understanding:  In native Linux, the IOMMU works at stage-1 (VA -> PA) and typically assumes coherency between CPU and IOMMU. The implicit dsb(st) in writel() is enough there. But in Xen, we use this as stage-2 (GPA -> HPA) for cases like PCI passthrough where devices might be non-coherent. 


I understand for the PCI passthrough, Xen will be using stage-2, so in 
theory the stage-1 could be used by the guest OS. But ultimately, this 
is the same PCI device behind. So if it is not coherent, it should be 
for both stages. Do you have any pointer to the documentation that would 
state otherwise?

 > We might need stronger barrier dsb(sy) in xen because: 1) We can't 
assume the TLB walker is coherent for stage -2

Why would the TLB walker coherent for stage-2 but not stage-1? Any 
pointer to the documentation?

Note, I just noticed that IOMMU_FEAT_COHERENT_WALK is not set for the 
IPMMU. So the "dsb sy" is coherent. However, I find doubful an IOMMU 
would have a difference of coherency between two stages. So maybe we 
should set the flag either unconditionally or based on a register.

 > and we must also prevent(minimise) any DMA operations during TLB 
invalidation( observed some IPMMU hardware limitations in the 
documentation) .

I don't understand what you wrote in parentheses. But isn't it what you 
wrote all true for stage-1?

Cheers,

-- 
Julien Grall
RE: [PATCH] xen/arm: Enhance IPMMU-VMSA driver robustness and debug output
Posted by Jahan Murudi 3 months, 3 weeks ago
Hi Julien,

> On 30/06/2025 13:44, Julien Grall wrote:

>> On 25/06/2025 16:53, Julien Grall wrote:
>
>> Hi Jahan,
> 
>>>>>> +    dsb(sy);
>>>>> Any clue why Linux (mainline) does not do that?

> I understand for the PCI passthrough, Xen will be using stage-2, so in theory the stage-1 could be used by the guest OS. But ultimately, this is the same PCI device behind. So if it is not coherent, it should be for both stages. Do you have any pointer to the documentation that would state otherwise?

You're right - coherency characteristics are identical for both stages. My earlier understanding was incorrect.

> Note, I just noticed that IOMMU_FEAT_COHERENT_WALK is not set for the IPMMU. So the "dsb sy" is coherent. However, I find doubful an IOMMU would have a difference of coherency between two stages. So maybe we should set the flag either unconditionally or based on a register.

Excellent observation. Current R-car IPMMU doesn't supports coherent walks - we should indeed set this flag unconditionally.

 >> and we must also prevent(minimise) any DMA operations during TLB invalidation( observed some IPMMU hardware limitations in the
documentation) .

> I don't understand what you wrote in parentheses. But isn't it what you wrote all true for stage-1?

Correct – the hardware reference doc guidelines about minimizing DMA during flushes applies globally. This is true for both stage-1 and stage-2. 
Given that the patch has already been Acked by Michal, can we proceed with applying it?

Regards,
Jahan Murudi

Re: [PATCH] xen/arm: Enhance IPMMU-VMSA driver robustness and debug output
Posted by Julien Grall 3 months, 3 weeks ago
Hi Jahan,

On 07/07/2025 12:24, Jahan Murudi wrote:
>> On 30/06/2025 13:44, Julien Grall wrote:
> 
>>> On 25/06/2025 16:53, Julien Grall wrote:
>>
>>> Hi Jahan,
>>
>>>>>>> +    dsb(sy);
>>>>>> Any clue why Linux (mainline) does not do that?
> 
>> I understand for the PCI passthrough, Xen will be using stage-2, so in theory the stage-1 could be used by the guest OS. But ultimately, this is the same PCI device behind. So if it is not coherent, it should be for both stages. Do you have any pointer to the documentation that would state otherwise?
> 
> You're right - coherency characteristics are identical for both stages. My earlier understanding was incorrect.
> 
>> Note, I just noticed that IOMMU_FEAT_COHERENT_WALK is not set for the IPMMU. So the "dsb sy" is coherent. However, I find doubful an IOMMU would have a difference of coherency between two stages. So maybe we should set the flag either unconditionally or based on a register.
> 
> Excellent observation. Current R-car IPMMU doesn't supports coherent walks - we should indeed set this flag unconditionally.

To clarify, the R-car IPMMU doesn't support coherent walk, then the flag 
IOMMU_FEAT_COHERENT_WALK *must not* be set.

Cheers,

-- 
Julien Grall