Set/Way flushes never work correctly in a virtualized environment.
Our current implementation is based on clearing the valid bit in the p2m
pagetable to track guest memory accesses. This technique doesn't work
when the IOMMU is enabled for the domain and the pagetable is shared
between IOMMU and MMU because it triggers IOMMU faults.
Specifically, p2m_invalidate_root causes IOMMU faults if
iommu_use_hap_pt returns true for the domain.
Add a check in vsysreg.c and vcpreg.c: if a set/way instruction is used
and iommu_use_hap_pt returns true, rather than failing with obscure
IOMMU faults, inject an undef exception straight away into the guest,
and print a verbose error message to explain the problem.
Also add an ASSERT in p2m_invalidate_root to make sure we don't
inadvertently stumble across this problem again in the future.
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
---
This patch is an improvement over the IOMMU faults. I don't know if we
want to give users an option to get past these errors for development
and debugging.
We could add a Xen command line option to make Xen ignore Set/Way
instructions (do nothing on trap). Or we could add an option to avoid
trapping Set/Way instructions altogether (remove HCR_TSW).
Both workarounds are obviously not correct and could lead to memory
corruptions (the former) or bad interference between guests (the latter).
Either way, we can start with this patch.
diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c
index caf17174b8..125a9281fc 100644
--- a/xen/arch/arm/arm64/vsysreg.c
+++ b/xen/arch/arm/arm64/vsysreg.c
@@ -105,6 +105,13 @@ void do_sysreg(struct cpu_user_regs *regs,
case HSR_SYSREG_DCISW:
case HSR_SYSREG_DCCSW:
case HSR_SYSREG_DCCISW:
+ if ( iommu_use_hap_pt(current->domain) )
+ {
+ gdprintk(XENLOG_ERR,
+ "d%u uses set/way cache flushes with the IOMMU on. It cannot work. Replace set/way instructions with dc [ci]vac and retry. Injecting exception into the guest now.\n",
+ current->domain->domain_id);
+ return inject_undef_exception(regs, hsr);
+ }
if ( !hsr.sysreg.read )
p2m_set_way_flush(current);
break;
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index d414c4feb9..240913d5ac 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1171,6 +1171,9 @@ void p2m_invalidate_root(struct p2m_domain *p2m)
{
unsigned int i;
+ /* Clearing the valid bit causes IOMMU faults. */
+ ASSERT(!iommu_use_hap_pt(p2m->domain));
+
p2m_write_lock(p2m);
for ( i = 0; i < P2M_ROOT_LEVEL; i++ )
diff --git a/xen/arch/arm/vcpreg.c b/xen/arch/arm/vcpreg.c
index e3ce56d875..04b68f6901 100644
--- a/xen/arch/arm/vcpreg.c
+++ b/xen/arch/arm/vcpreg.c
@@ -231,6 +231,13 @@ void do_cp15_32(struct cpu_user_regs *regs, const union hsr hsr)
case HSR_CPREG32(DCISW):
case HSR_CPREG32(DCCSW):
case HSR_CPREG32(DCCISW):
+ if ( iommu_use_hap_pt(current->domain) )
+ {
+ gdprintk(XENLOG_ERR,
+ "d%u uses set/way cache flushes with the IOMMU on. It cannot work. Replace set/way instructions with dc [ci]vac and retry. Injecting exception into the guest now.\n",
+ current->domain->domain_id);
+ return inject_undef_exception(regs, hsr);
+ }
if ( !cp32.read )
p2m_set_way_flush(current);
break;
Hi, Title: How about: "xen/arm: Do not invalidate the P2M when the PT is shared with the IOMMU" On 03/08/2021 22:37, Stefano Stabellini wrote: > Set/Way flushes never work correctly in a virtualized environment. > > Our current implementation is based on clearing the valid bit in the p2m > pagetable to track guest memory accesses. This technique doesn't work > when the IOMMU is enabled for the domain and the pagetable is shared > between IOMMU and MMU because it triggers IOMMU faults. > > Specifically, p2m_invalidate_root causes IOMMU faults if > iommu_use_hap_pt returns true for the domain. > > Add a check in vsysreg.c and vcpreg.c: if a set/way instruction is used > and iommu_use_hap_pt returns true, rather than failing with obscure > IOMMU faults, inject an undef exception straight away into the guest, > and print a verbose error message to explain the problem. > > Also add an ASSERT in p2m_invalidate_root to make sure we don't > inadvertently stumble across this problem again in the future. > > Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> > --- > > This patch is an improvement over the IOMMU faults. I don't know if we > want to give users an option to get past these errors for development > and debugging. > > We could add a Xen command line option to make Xen ignore Set/Way > instructions (do nothing on trap). Or we could add an option to avoid > trapping Set/Way instructions altogether (remove HCR_TSW). > > Both workarounds are obviously not correct and could lead to memory > corruptions (the former) or bad interference between guests (the latter). My answer is similar to when you suggested to not trap the SMCs for all the domains. Yes, it may allow a domain to boot but such option will not do a favor to anyone because more weird behavior may happen. If there is a will to handle set/way when device is assigned, then we need to add support for unsharing the page-tables or figure out a different way to emulate set/way. > Either way, we can start with this patch. > > diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c > index caf17174b8..125a9281fc 100644 > --- a/xen/arch/arm/arm64/vsysreg.c > +++ b/xen/arch/arm/arm64/vsysreg.c > @@ -105,6 +105,13 @@ void do_sysreg(struct cpu_user_regs *regs, > case HSR_SYSREG_DCISW: > case HSR_SYSREG_DCCSW: > case HSR_SYSREG_DCCISW: > + if ( iommu_use_hap_pt(current->domain) ) > + { > + gdprintk(XENLOG_ERR, > + "d%u uses set/way cache flushes with the IOMMU on. It cannot work. Replace set/way instructions with dc [ci]vac and retry. Injecting exception into the guest now.\n", This line would be far too long to print on the serial. I think you want to add a few newline here. > + current->domain->domain_id); Please use %pd. > + return inject_undef_exception(regs, hsr); > + } I would prefer if the undef is added in p2m_set_way_flush(). This will avoid the duplication between the cpreg and sysreg code. > if ( !hsr.sysreg.read ) > p2m_set_way_flush(current); > break; > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c > index d414c4feb9..240913d5ac 100644 > --- a/xen/arch/arm/p2m.c > +++ b/xen/arch/arm/p2m.c > @@ -1171,6 +1171,9 @@ void p2m_invalidate_root(struct p2m_domain *p2m) > { > unsigned int i; > > + /* Clearing the valid bit causes IOMMU faults. */ How about moving this comment on top of the function and writing: "p2m_invalid_root() should not be called when the P2M is shared with the IOMMU because it will cause IOMMU fault." So one doesn't need to read the invalidation to understand that the function should not be called when the P2M is shared with the IOMMU. > + ASSERT(!iommu_use_hap_pt(p2m->domain)); > + > p2m_write_lock(p2m); > > for ( i = 0; i < P2M_ROOT_LEVEL; i++ ) > diff --git a/xen/arch/arm/vcpreg.c b/xen/arch/arm/vcpreg.c > index e3ce56d875..04b68f6901 100644 > --- a/xen/arch/arm/vcpreg.c > +++ b/xen/arch/arm/vcpreg.c > @@ -231,6 +231,13 @@ void do_cp15_32(struct cpu_user_regs *regs, const union hsr hsr) > case HSR_CPREG32(DCISW): > case HSR_CPREG32(DCCSW): > case HSR_CPREG32(DCCISW): > + if ( iommu_use_hap_pt(current->domain) ) > + { > + gdprintk(XENLOG_ERR, > + "d%u uses set/way cache flushes with the IOMMU on. It cannot work. Replace set/way instructions with dc [ci]vac and retry. Injecting exception into the guest now.\n", > + current->domain->domain_id); > + return inject_undef_exception(regs, hsr); > + } > if ( !cp32.read ) > p2m_set_way_flush(current); > break; > Cheers, -- Julien Grall
On Tue, 3 Aug 2021, Julien Grall wrote: > Hi, > > Title: How about: > > "xen/arm: Do not invalidate the P2M when the PT is shared with the IOMMU" OK > On 03/08/2021 22:37, Stefano Stabellini wrote: > > Set/Way flushes never work correctly in a virtualized environment. > > > > Our current implementation is based on clearing the valid bit in the p2m > > pagetable to track guest memory accesses. This technique doesn't work > > when the IOMMU is enabled for the domain and the pagetable is shared > > between IOMMU and MMU because it triggers IOMMU faults. > > > > Specifically, p2m_invalidate_root causes IOMMU faults if > > iommu_use_hap_pt returns true for the domain. > > > > Add a check in vsysreg.c and vcpreg.c: if a set/way instruction is used > > and iommu_use_hap_pt returns true, rather than failing with obscure > > IOMMU faults, inject an undef exception straight away into the guest, > > and print a verbose error message to explain the problem. > > > > Also add an ASSERT in p2m_invalidate_root to make sure we don't > > inadvertently stumble across this problem again in the future. > > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> > > --- > > > > This patch is an improvement over the IOMMU faults. I don't know if we > > want to give users an option to get past these errors for development > > and debugging. > > > > We could add a Xen command line option to make Xen ignore Set/Way > > instructions (do nothing on trap). Or we could add an option to avoid > > trapping Set/Way instructions altogether (remove HCR_TSW). > > > > Both workarounds are obviously not correct and could lead to memory > > corruptions (the former) or bad interference between guests (the latter). > > My answer is similar to when you suggested to not trap the SMCs for all the > domains. Yes, it may allow a domain to boot but such option will not do a > favor to anyone because more weird behavior may happen. > > If there is a will to handle set/way when device is assigned, then we need to > add support for unsharing the page-tables or figure out a different way to > emulate set/way. > > > Either way, we can start with this patch. > > > > diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c > > index caf17174b8..125a9281fc 100644 > > --- a/xen/arch/arm/arm64/vsysreg.c > > +++ b/xen/arch/arm/arm64/vsysreg.c > > @@ -105,6 +105,13 @@ void do_sysreg(struct cpu_user_regs *regs, > > case HSR_SYSREG_DCISW: > > case HSR_SYSREG_DCCSW: > > case HSR_SYSREG_DCCISW: > > + if ( iommu_use_hap_pt(current->domain) ) > > + { > > + gdprintk(XENLOG_ERR, > > + "d%u uses set/way cache flushes with the IOMMU on. It > > cannot work. Replace set/way instructions with dc [ci]vac and retry. > > Injecting exception into the guest now.\n", > > This line would be far too long to print on the serial. I think you want to > add a few newline here. Fair enough but I'll try to keep most info on the same line because otherwise with a dom0less boot it can get confusing. I suggest: gprintk(XENLOG_ERR, "uses set/way cache flushes with the IOMMU on. It cannot work. Replace them with dc [ci]vac and retry.\n" "Injecting exception into the guest now.\n"); > > + current->domain->domain_id); > > Please use %pd. I realized I should use gprintk and not gdprintk, and also that either way the domain id gets printed automatically, so there is no need to add d%u. > > + return inject_undef_exception(regs, hsr); > > + } > > I would prefer if the undef is added in p2m_set_way_flush(). This will avoid > the duplication between the cpreg and sysreg code. Yes, I can do that > > if ( !hsr.sysreg.read ) > > p2m_set_way_flush(current); > > break; > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c > > index d414c4feb9..240913d5ac 100644 > > --- a/xen/arch/arm/p2m.c > > +++ b/xen/arch/arm/p2m.c > > @@ -1171,6 +1171,9 @@ void p2m_invalidate_root(struct p2m_domain *p2m) > > { > > unsigned int i; > > + /* Clearing the valid bit causes IOMMU faults. */ > > How about moving this comment on top of the function and writing: > > "p2m_invalid_root() should not be called when the P2M is shared with the IOMMU > because it will cause IOMMU fault." > > So one doesn't need to read the invalidation to understand that the function > should not be called when the P2M is shared with the IOMMU. Sure > > + ASSERT(!iommu_use_hap_pt(p2m->domain)); > + > > p2m_write_lock(p2m); > > for ( i = 0; i < P2M_ROOT_LEVEL; i++ ) > > diff --git a/xen/arch/arm/vcpreg.c b/xen/arch/arm/vcpreg.c > > index e3ce56d875..04b68f6901 100644 > > --- a/xen/arch/arm/vcpreg.c > > +++ b/xen/arch/arm/vcpreg.c > > @@ -231,6 +231,13 @@ void do_cp15_32(struct cpu_user_regs *regs, const union > > hsr hsr) > > case HSR_CPREG32(DCISW): > > case HSR_CPREG32(DCCSW): > > case HSR_CPREG32(DCCISW): > > + if ( iommu_use_hap_pt(current->domain) ) > > + { > > + gdprintk(XENLOG_ERR, > > + "d%u uses set/way cache flushes with the IOMMU on. It > > cannot work. Replace set/way instructions with dc [ci]vac and retry. > > Injecting exception into the guest now.\n", > > + current->domain->domain_id); > > + return inject_undef_exception(regs, hsr); > > + } > > if ( !cp32.read ) > > p2m_set_way_flush(current); > > break; > > > > Cheers, > > -- > Julien Grall >
Hi Stefano, On 04/08/2021 01:08, Stefano Stabellini wrote: >>> diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c >>> index caf17174b8..125a9281fc 100644 >>> --- a/xen/arch/arm/arm64/vsysreg.c >>> +++ b/xen/arch/arm/arm64/vsysreg.c >>> @@ -105,6 +105,13 @@ void do_sysreg(struct cpu_user_regs *regs, >>> case HSR_SYSREG_DCISW: >>> case HSR_SYSREG_DCCSW: >>> case HSR_SYSREG_DCCISW: >>> + if ( iommu_use_hap_pt(current->domain) ) >>> + { >>> + gdprintk(XENLOG_ERR, >>> + "d%u uses set/way cache flushes with the IOMMU on. It >>> cannot work. Replace set/way instructions with dc [ci]vac and retry. >>> Injecting exception into the guest now.\n", >> >> This line would be far too long to print on the serial. I think you want to >> add a few newline here. > > Fair enough but I'll try to keep most info on the same line because > otherwise with a dom0less boot it can get confusing. I suggest: I am not quite too sure to understand why it would get confusing with dom0less. Can you give an example? Cheers, -- Julien Grall
On Wed, 4 Aug 2021, Julien Grall wrote: > Hi Stefano, > > On 04/08/2021 01:08, Stefano Stabellini wrote: > > > > diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c > > > > index caf17174b8..125a9281fc 100644 > > > > --- a/xen/arch/arm/arm64/vsysreg.c > > > > +++ b/xen/arch/arm/arm64/vsysreg.c > > > > @@ -105,6 +105,13 @@ void do_sysreg(struct cpu_user_regs *regs, > > > > case HSR_SYSREG_DCISW: > > > > case HSR_SYSREG_DCCSW: > > > > case HSR_SYSREG_DCCISW: > > > > + if ( iommu_use_hap_pt(current->domain) ) > > > > + { > > > > + gdprintk(XENLOG_ERR, > > > > + "d%u uses set/way cache flushes with the IOMMU on. > > > > It > > > > cannot work. Replace set/way instructions with dc [ci]vac and retry. > > > > Injecting exception into the guest now.\n", > > > > > > This line would be far too long to print on the serial. I think you want > > > to > > > add a few newline here. > > > > Fair enough but I'll try to keep most info on the same line because > > otherwise with a dom0less boot it can get confusing. I suggest: > > I am not quite too sure to understand why it would get confusing with > dom0less. Can you give an example? I was doing tests with the error messages before implementing the undef exception injection. This is the output of a regular domU (not dom0less): https://pastebin.com/Wytg660j The entire message in this test should be: (XEN) d1v0 uses set/way cache flushes with the IOMMU on. It cannot work. (XEN) Replace them with dc [ci]vac and retry. But actually the first line gets eaten, so we only see: (XEN) Replace them with dc [ci]vac and retry. several times at the bottom of the logs.
Hi Stefano, On 04/08/2021 18:41, Stefano Stabellini wrote: > On Wed, 4 Aug 2021, Julien Grall wrote: >> Hi Stefano, >> >> On 04/08/2021 01:08, Stefano Stabellini wrote: >>>>> diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c >>>>> index caf17174b8..125a9281fc 100644 >>>>> --- a/xen/arch/arm/arm64/vsysreg.c >>>>> +++ b/xen/arch/arm/arm64/vsysreg.c >>>>> @@ -105,6 +105,13 @@ void do_sysreg(struct cpu_user_regs *regs, >>>>> case HSR_SYSREG_DCISW: >>>>> case HSR_SYSREG_DCCSW: >>>>> case HSR_SYSREG_DCCISW: >>>>> + if ( iommu_use_hap_pt(current->domain) ) >>>>> + { >>>>> + gdprintk(XENLOG_ERR, >>>>> + "d%u uses set/way cache flushes with the IOMMU on. >>>>> It >>>>> cannot work. Replace set/way instructions with dc [ci]vac and retry. >>>>> Injecting exception into the guest now.\n", >>>> >>>> This line would be far too long to print on the serial. I think you want >>>> to >>>> add a few newline here. >>> >>> Fair enough but I'll try to keep most info on the same line because >>> otherwise with a dom0less boot it can get confusing. I suggest: >> >> I am not quite too sure to understand why it would get confusing with >> dom0less. Can you give an example? > > I was doing tests with the error messages before implementing the undef > exception injection. This is the output of a regular domU (not > dom0less): https://pastebin.com/Wytg660j > > The entire message in this test should be: > > (XEN) d1v0 uses set/way cache flushes with the IOMMU on. It cannot work. > (XEN) Replace them with dc [ci]vac and retry. > > But actually the first line gets eaten, so we only see: > > (XEN) Replace them with dc [ci]vac and retry. > > several times at the bottom of the logs. That's most likely because you are hitting the ratelimit. However, writing a 80+ line is almost the wrong way to go. This making a lot more difficult to serial the console output... Looking at the output, I think you can shorten to: "The cache should be flushed by VA rather than by set/way." At least this would make clear that using them even without the IOMMU is a bad idea for performance reason. Cheers, -- Julien Grall
© 2016 - 2024 Red Hat, Inc.