From: Julien Grall <jgrall@amazon.com>
It is a bit pointless to crash a domain that is already dying. This will
become more of an annoyance with a follow-up change, where page-table
allocation will be forbidden when the domain is dying.
Security-wise, there is no change, as the devices would still have access
to the IOMMU page-tables even if the domain has crashed, until Xen
starts to relinquish the resources.
For x86, we rely on dom_iommu(d)->arch.mapping.lock to ensure
d->is_dying is correctly observed (a follow-up patch will hold it in the
relinquish path).
For Arm, there is still a small race possible. But there is so far no
failure specific to a domain dying.
Signed-off-by: Julien Grall <jgrall@amazon.com>
---
This was spotted when trying to destroy IOREQ servers while the domain
is dying. The code will try to add the entry back in the P2M and
therefore update the P2M (see arch_ioreq_server_disable() ->
hvm_add_ioreq_gfn()).
It should be possible to skip the mapping in hvm_add_ioreq_gfn(); however,
I didn't try a patch yet because checking d->is_dying can be racy (I
can't find a proper lock).
Changes in v2:
- Patch added
---
xen/drivers/passthrough/iommu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 879d238bcd31..75419f20f76d 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -272,7 +272,7 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
                          flush_flags) )
             continue;
 
-        if ( !is_hardware_domain(d) )
+        if ( !is_hardware_domain(d) && !d->is_dying )
             domain_crash(d);
 
         break;
--
2.17.1
> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 09 February 2021 15:28
> To: xen-devel@lists.xenproject.org
> Cc: hongyxia@amazon.co.uk; iwj@xenproject.org; Julien Grall <jgrall@amazon.com>; Jan Beulich
> <jbeulich@suse.com>; Paul Durrant <paul@xen.org>
> Subject: [for-4.15][PATCH v2 3/5] xen/iommu: iommu_map: Don't crash the domain if it is dying
>
> From: Julien Grall <jgrall@amazon.com>
>
> It is a bit pointless to crash a domain that is already dying. This will
> become more an annoyance with a follow-up change where page-table
> allocation will be forbidden when the domain is dying.
>
> Security wise, there is no change as the devices would still have access
> to the IOMMU page-tables even if the domain has crashed until Xen
> start to relinquish the resources.
>
> For x86, we rely on dom_iommu(d)->arch.mapping.lock to ensure
> d->is_dying is correctly observed (a follow-up patch will held it in the

s/held/hold

> relinquish path).
>
> For Arm, there is still a small race possible. But there is so far no
> failure specific to a domain dying.
>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
>
> ---
>
> This was spotted when trying to destroy IOREQ servers while the domain
> is dying. The code will try to add the entry back in the P2M and
> therefore update the P2M (see arch_ioreq_server_disable() ->
> hvm_add_ioreq_gfn()).
>
> It should be possible to skip the mappin in hvm_add_ioreq_gfn(), however

s/mappin/mapping

> I didn't try a patch yet because checking d->is_dying can be racy (I
> can't find a proper lock).
>
> Changes in v2:
>     - Patch added
> ---
>  xen/drivers/passthrough/iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> index 879d238bcd31..75419f20f76d 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -272,7 +272,7 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
>                           flush_flags) )
>              continue;
>
> -        if ( !is_hardware_domain(d) )
> +        if ( !is_hardware_domain(d) && !d->is_dying )
>              domain_crash(d);

Would it make more sense to check is_dying inside domain_crash() (and turn it into a no-op in that case)?

  Paul

>
>          break;
> --
> 2.17.1
On Tue, 9 Feb 2021 at 20:28, Paul Durrant <xadimgnik@gmail.com> wrote:
>
> > -----Original Message-----
> > From: Julien Grall <julien@xen.org>
> > Sent: 09 February 2021 15:28
> > To: xen-devel@lists.xenproject.org
> > Cc: hongyxia@amazon.co.uk; iwj@xenproject.org; Julien Grall <jgrall@amazon.com>; Jan Beulich
> > <jbeulich@suse.com>; Paul Durrant <paul@xen.org>
> > Subject: [for-4.15][PATCH v2 3/5] xen/iommu: iommu_map: Don't crash the domain if it is dying
> >
> > From: Julien Grall <jgrall@amazon.com>
> >
> > It is a bit pointless to crash a domain that is already dying. This will
> > become more an annoyance with a follow-up change where page-table
> > allocation will be forbidden when the domain is dying.
> >
> > Security wise, there is no change as the devices would still have access
> > to the IOMMU page-tables even if the domain has crashed until Xen
> > start to relinquish the resources.
> >
> > For x86, we rely on dom_iommu(d)->arch.mapping.lock to ensure
> > d->is_dying is correctly observed (a follow-up patch will held it in the
>
> s/held/hold
>
> > relinquish path).
> >
> > For Arm, there is still a small race possible. But there is so far no
> > failure specific to a domain dying.
> >
> > Signed-off-by: Julien Grall <jgrall@amazon.com>
> >
> > ---
> >
> > This was spotted when trying to destroy IOREQ servers while the domain
> > is dying. The code will try to add the entry back in the P2M and
> > therefore update the P2M (see arch_ioreq_server_disable() ->
> > hvm_add_ioreq_gfn()).
> >
> > It should be possible to skip the mappin in hvm_add_ioreq_gfn(), however
>
> s/mappin/mapping
>
> > I didn't try a patch yet because checking d->is_dying can be racy (I
> > can't find a proper lock).
> >
> > Changes in v2:
> >     - Patch added
> > ---
> >  xen/drivers/passthrough/iommu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> > index 879d238bcd31..75419f20f76d 100644
> > --- a/xen/drivers/passthrough/iommu.c
> > +++ b/xen/drivers/passthrough/iommu.c
> > @@ -272,7 +272,7 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
> >                           flush_flags) )
> >              continue;
> >
> > -        if ( !is_hardware_domain(d) )
> > +        if ( !is_hardware_domain(d) && !d->is_dying )
> >              domain_crash(d);
>
> Would it make more sense to check is_dying inside domain_crash() (and turn it into a no-op in that case)?

Jan also suggested moving the check in domain_crash(). However, I felt
it is potentially a too risky change for 4.15 as there are quite a few
callers.

Cheers,
On 09.02.2021 22:14, Julien Grall wrote:
> On Tue, 9 Feb 2021 at 20:28, Paul Durrant <xadimgnik@gmail.com> wrote:
>>> From: Julien Grall <julien@xen.org>
>>> Sent: 09 February 2021 15:28
>>>
>>> It is a bit pointless to crash a domain that is already dying. This will
>>> become more an annoyance with a follow-up change where page-table
>>> allocation will be forbidden when the domain is dying.
>>>
>>> Security wise, there is no change as the devices would still have access
>>> to the IOMMU page-tables even if the domain has crashed until Xen
>>> start to relinquish the resources.
>>>
>>> For x86, we rely on dom_iommu(d)->arch.mapping.lock to ensure
>>> d->is_dying is correctly observed (a follow-up patch will held it in the
>>> relinquish path).

Am I to understand this to mean that at this point of the series
things aren't really correct yet in this regard? If so, wouldn't
it be better to re-order?

>>> For Arm, there is still a small race possible. But there is so far no
>>> failure specific to a domain dying.
>>>
>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>
>>> ---
>>>
>>> This was spotted when trying to destroy IOREQ servers while the domain
>>> is dying. The code will try to add the entry back in the P2M and
>>> therefore update the P2M (see arch_ioreq_server_disable() ->
>>> hvm_add_ioreq_gfn()).
>>>
>>> It should be possible to skip the mappin in hvm_add_ioreq_gfn(), however
>>> I didn't try a patch yet because checking d->is_dying can be racy (I
>>> can't find a proper lock).

I understand the concern. I find it odd though that we permit
iommu_map() to do anything at all when the domain is already
dying. So irrespective of the remark below, how about bailing
from iommu_map() earlier when the domain is dying?

>>> --- a/xen/drivers/passthrough/iommu.c
>>> +++ b/xen/drivers/passthrough/iommu.c
>>> @@ -272,7 +272,7 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
>>>                           flush_flags) )
>>>              continue;
>>>
>>> -        if ( !is_hardware_domain(d) )
>>> +        if ( !is_hardware_domain(d) && !d->is_dying )
>>>              domain_crash(d);
>>
>> Would it make more sense to check is_dying inside domain_crash() (and turn it into a no-op in that case)?
>
> Jan also suggested moving the check in domain_crash(). However, I felt
> it is potentially a too risky change for 4.15 as there are quite a few
> callers.

This is a fair point. However, in such a case I'd prefer symmetry
at least throughout this one source file (there are three more
places), unless there are strong reasons against doing so.

Jan
Hi Jan,

On 10/02/2021 14:14, Jan Beulich wrote:
> On 09.02.2021 22:14, Julien Grall wrote:
>> On Tue, 9 Feb 2021 at 20:28, Paul Durrant <xadimgnik@gmail.com> wrote:
>>>> From: Julien Grall <julien@xen.org>
>>>> Sent: 09 February 2021 15:28
>>>>
>>>> It is a bit pointless to crash a domain that is already dying. This will
>>>> become more an annoyance with a follow-up change where page-table
>>>> allocation will be forbidden when the domain is dying.
>>>>
>>>> Security wise, there is no change as the devices would still have access
>>>> to the IOMMU page-tables even if the domain has crashed until Xen
>>>> start to relinquish the resources.
>>>>
>>>> For x86, we rely on dom_iommu(d)->arch.mapping.lock to ensure
>>>> d->is_dying is correctly observed (a follow-up patch will held it in the
>>>> relinquish path).
>
> Am I to understand this to mean that at this point of the series
> things aren't really correct yet in this regard? If so, wouldn't
> it be better to re-order?

You asked this specific order... So are you saying you want me to use
the original ordering?

>
>>>> For Arm, there is still a small race possible. But there is so far no
>>>> failure specific to a domain dying.
>>>>
>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>
>>>> ---
>>>>
>>>> This was spotted when trying to destroy IOREQ servers while the domain
>>>> is dying. The code will try to add the entry back in the P2M and
>>>> therefore update the P2M (see arch_ioreq_server_disable() ->
>>>> hvm_add_ioreq_gfn()).
>>>>
>>>> It should be possible to skip the mappin in hvm_add_ioreq_gfn(), however
>>>> I didn't try a patch yet because checking d->is_dying can be racy (I
>>>> can't find a proper lock).
>
> I understand the concern. I find it odd though that we permit
> iommu_map() to do anything at all when the domain is already
> dying. So irrespective of the remark below, how about bailing
> from iommu_map() earlier when the domain is dying?

I felt this was potentially too racy to use it. But it should be fine if
keep the !d->is_dying below.

>
>>>> --- a/xen/drivers/passthrough/iommu.c
>>>> +++ b/xen/drivers/passthrough/iommu.c
>>>> @@ -272,7 +272,7 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
>>>>                           flush_flags) )
>>>>              continue;
>>>>
>>>> -        if ( !is_hardware_domain(d) )
>>>> +        if ( !is_hardware_domain(d) && !d->is_dying )
>>>>              domain_crash(d);
>>>
>>> Would it make more sense to check is_dying inside domain_crash() (and turn it into a no-op in that case)?
>>
>> Jan also suggested moving the check in domain_crash(). However, I felt
>> it is potentially a too risky change for 4.15 as there are quite a few
>> callers.
>
> This is a fair point. However, in such a case I'd prefer symmetry
> at least throughout this one source file (there are three more
> places), unless there are strong reasons against doing so.

I can have a look and see if the decision is easy to make.

Cheers,

-- 
Julien Grall
On 10.02.2021 15:58, Julien Grall wrote:
> Hi Jan,
>
> On 10/02/2021 14:14, Jan Beulich wrote:
>> On 09.02.2021 22:14, Julien Grall wrote:
>>> On Tue, 9 Feb 2021 at 20:28, Paul Durrant <xadimgnik@gmail.com> wrote:
>>>>> From: Julien Grall <julien@xen.org>
>>>>> Sent: 09 February 2021 15:28
>>>>>
>>>>> It is a bit pointless to crash a domain that is already dying. This will
>>>>> become more an annoyance with a follow-up change where page-table
>>>>> allocation will be forbidden when the domain is dying.
>>>>>
>>>>> Security wise, there is no change as the devices would still have access
>>>>> to the IOMMU page-tables even if the domain has crashed until Xen
>>>>> start to relinquish the resources.
>>>>>
>>>>> For x86, we rely on dom_iommu(d)->arch.mapping.lock to ensure
>>>>> d->is_dying is correctly observed (a follow-up patch will held it in the
>>>>> relinquish path).
>>
>> Am I to understand this to mean that at this point of the series
>> things aren't really correct yet in this regard? If so, wouldn't
>> it be better to re-order?
>
> You asked this specific order... So are you saying you want me to use
> the original ordering?

Well, it's been a while and I don't recall the specific reason for the
request. But then at least the spin_barrier() you mean to rely on
could / should be moved here?

>>>>> For Arm, there is still a small race possible. But there is so far no
>>>>> failure specific to a domain dying.
>>>>>
>>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>>
>>>>> ---
>>>>>
>>>>> This was spotted when trying to destroy IOREQ servers while the domain
>>>>> is dying. The code will try to add the entry back in the P2M and
>>>>> therefore update the P2M (see arch_ioreq_server_disable() ->
>>>>> hvm_add_ioreq_gfn()).
>>>>>
>>>>> It should be possible to skip the mappin in hvm_add_ioreq_gfn(), however
>>>>> I didn't try a patch yet because checking d->is_dying can be racy (I
>>>>> can't find a proper lock).
>>
>> I understand the concern. I find it odd though that we permit
>> iommu_map() to do anything at all when the domain is already
>> dying. So irrespective of the remark below, how about bailing
>> from iommu_map() earlier when the domain is dying?
>
> I felt this was potentially too racy to use it. But it should be fine if
> keep the !d->is_dying below.

Why? As per later comments I didn't necessarily mean iommu_map()
literally - as indicated, the per-vendor functions ought to be
suitable to place the check, right after having taken the lock.

Jan