port_is_valid() and evtchn_from_port() are fine to use without holding
any locks. Accordingly acquire the per-domain lock slightly later in
evtchn_close() and evtchn_bind_vcpu().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v6: Re-base for re-ordering / shrinking of series.
v4: New.
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -606,17 +606,14 @@ int evtchn_close(struct domain *d1, int
int port2;
long rc = 0;
- again:
- spin_lock(&d1->event_lock);
-
if ( !port_is_valid(d1, port1) )
- {
- rc = -EINVAL;
- goto out;
- }
+ return -EINVAL;
chn1 = evtchn_from_port(d1, port1);
+ again:
+ spin_lock(&d1->event_lock);
+
/* Guest cannot close a Xen-attached event channel. */
if ( unlikely(consumer_is_xen(chn1)) && guest )
{
@@ -1041,16 +1038,13 @@ long evtchn_bind_vcpu(unsigned int port,
if ( (v = domain_vcpu(d, vcpu_id)) == NULL )
return -ENOENT;
- spin_lock(&d->event_lock);
-
if ( !port_is_valid(d, port) )
- {
- rc = -EINVAL;
- goto out;
- }
+ return -EINVAL;
chn = evtchn_from_port(d, port);
+ spin_lock(&d->event_lock);
+
/* Guest cannot re-bind a Xen-attached event channel. */
if ( unlikely(consumer_is_xen(chn)) )
{
On Thu, May 27, 2021 at 01:28:07PM +0200, Jan Beulich wrote: > port_is_valid() and evtchn_from_port() are fine to use without holding > any locks. Accordingly acquire the per-domain lock slightly later in > evtchn_close() and evtchn_bind_vcpu(). > > Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Thanks, Roger.
Hi Jan,
On 27/05/2021 12:28, Jan Beulich wrote:
> port_is_valid() and evtchn_from_port() are fine to use without holding
> any locks. Accordingly acquire the per-domain lock slightly later in
> evtchn_close() and evtchn_bind_vcpu().
So I agree that port_is_valid() and evtchn_from_port() are fine to use
without holding any locks in evtchn_bind_vcpu(). However, this is
misleading to say there is no problem with evtchn_close().
evtchn_close() can be called with current != d and therefore, there is a
risk that port_is_valid() may be valid and then invalid because
d->valid_evtchns is decremented in evtchn_destroy().
Thankfully the memory is still there. So the current code is okayish and
I could reluctantly accept this behavior to be spread. However, I don't
think this should be left uncommented in both the code (maybe on top of
port_is_valid()?) and the commit message.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v6: Re-base for re-ordering / shrinking of series.
> v4: New.
>
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -606,17 +606,14 @@ int evtchn_close(struct domain *d1, int
> int port2;
> long rc = 0;
>
> - again:
> - spin_lock(&d1->event_lock);
> -
> if ( !port_is_valid(d1, port1) )
> - {
> - rc = -EINVAL;
> - goto out;
> - }
> + return -EINVAL;
>
> chn1 = evtchn_from_port(d1, port1);
>
> + again:
> + spin_lock(&d1->event_lock);
> +
> /* Guest cannot close a Xen-attached event channel. */
> if ( unlikely(consumer_is_xen(chn1)) && guest )
> {
> @@ -1041,16 +1038,13 @@ long evtchn_bind_vcpu(unsigned int port,
> if ( (v = domain_vcpu(d, vcpu_id)) == NULL )
> return -ENOENT;
>
> - spin_lock(&d->event_lock);
> -
> if ( !port_is_valid(d, port) )
> - {
> - rc = -EINVAL;
> - goto out;
> - }
> + return -EINVAL;
>
> chn = evtchn_from_port(d, port);
>
> + spin_lock(&d->event_lock);
> +
> /* Guest cannot re-bind a Xen-attached event channel. */
> if ( unlikely(consumer_is_xen(chn)) )
> {
>
Cheers,
--
Julien Grall
On Thu, May 27, 2021 at 07:48:41PM +0100, Julien Grall wrote: > Hi Jan, > > On 27/05/2021 12:28, Jan Beulich wrote: > > port_is_valid() and evtchn_from_port() are fine to use without holding > > any locks. Accordingly acquire the per-domain lock slightly later in > > evtchn_close() and evtchn_bind_vcpu(). > > So I agree that port_is_valid() and evtchn_from_port() are fine to use > without holding any locks in evtchn_bind_vcpu(). However, this is misleading > to say there is no problem with evtchn_close(). > > evtchn_close() can be called with current != d and therefore, there is a The only instances evtchn_close is called with current != d and the domain could be unpaused is in free_xen_event_channel AFAICT. > risk that port_is_valid() may be valid and then invalid because > d->valid_evtchns is decremented in evtchn_destroy(). Hm, I guess you could indeed have parallel calls to free_xen_event_channel and evtchn_destroy in a way that free_xen_event_channel could race with valid_evtchns getting decreased? All callers of free_xen_event_channel are internal to the hypervisor, so maybe with proper ordering of the operations this could be avoided, albeit it's not easy to assert. > Thankfully the memory is still there. So the current code is okayish and I > could reluctantly accept this behavior to be spread. However, I don't think > this should be left uncommented in both the code (maybe on top of > port_is_valid()?) and the commit message. Indeed, I think we need some expansion of the comment in port_is_valid to clarify all this. I'm not sure I understand it properly myself when it's fine to use port_is_valid without holding the per domain event lock. evtchn_close shouldn't be a very common operation anyway, so it holding the per domain lock a bit longer doesn't seem that critical to me anyway. Thanks, Roger.
On 28.05.2021 10:30, Roger Pau Monné wrote: > On Thu, May 27, 2021 at 07:48:41PM +0100, Julien Grall wrote: >> On 27/05/2021 12:28, Jan Beulich wrote: >>> port_is_valid() and evtchn_from_port() are fine to use without holding >>> any locks. Accordingly acquire the per-domain lock slightly later in >>> evtchn_close() and evtchn_bind_vcpu(). >> >> So I agree that port_is_valid() and evtchn_from_port() are fine to use >> without holding any locks in evtchn_bind_vcpu(). However, this is misleading >> to say there is no problem with evtchn_close(). >> >> evtchn_close() can be called with current != d and therefore, there is a > > The only instances evtchn_close is called with current != d and the > domain could be unpaused is in free_xen_event_channel AFAICT. As long as the domain is not paused, ->valid_evtchns can't ever decrease: The only point where this gets done is in evtchn_destroy(). Hence ... >> risk that port_is_valid() may be valid and then invalid because >> d->valid_evtchns is decremented in evtchn_destroy(). > > Hm, I guess you could indeed have parallel calls to > free_xen_event_channel and evtchn_destroy in a way that > free_xen_event_channel could race with valid_evtchns getting > decreased? ... I don't see this as relevant. >> Thankfully the memory is still there. So the current code is okayish and I >> could reluctantly accept this behavior to be spread. However, I don't think >> this should be left uncommented in both the code (maybe on top of >> port_is_valid()?) and the commit message. > > Indeed, I think we need some expansion of the comment in port_is_valid > to clarify all this. I'm not sure I understand it properly myself when > it's fine to use port_is_valid without holding the per domain event > lock. Because of the above property plus the fact that even if ->valid_evtchns decreases, the underlying struct evtchn instance will remain valid (i.e. won't get de-allocated, which happens only in evtchn_destroy_final()), it is always fine to use it without lock. With this I'm having trouble seeing what would need adding to port_is_valid()'s commentary. The only thing which shouldn't happen anywhere is following a port_is_valid() check which has returned false by code assuming the port is going to remain invalid. But I think that's obvious when you don't hold any suitable lock. I do intend to follow Julien's request to explain things more for evtchn_close(). Jan
Hi Jan, On 28/05/2021 11:23, Jan Beulich wrote: > On 28.05.2021 10:30, Roger Pau Monné wrote: >> On Thu, May 27, 2021 at 07:48:41PM +0100, Julien Grall wrote: >>> On 27/05/2021 12:28, Jan Beulich wrote: >>>> port_is_valid() and evtchn_from_port() are fine to use without holding >>>> any locks. Accordingly acquire the per-domain lock slightly later in >>>> evtchn_close() and evtchn_bind_vcpu(). >>> >>> So I agree that port_is_valid() and evtchn_from_port() are fine to use >>> without holding any locks in evtchn_bind_vcpu(). However, this is misleading >>> to say there is no problem with evtchn_close(). >>> >>> evtchn_close() can be called with current != d and therefore, there is a >> >> The only instances evtchn_close is called with current != d and the >> domain could be unpaused is in free_xen_event_channel AFAICT. > > As long as the domain is not paused, ->valid_evtchns can't ever > decrease: The only point where this gets done is in evtchn_destroy(). > Hence ... > >>> risk that port_is_valid() may be valid and then invalid because >>> d->valid_evtchns is decremented in evtchn_destroy(). >> >> Hm, I guess you could indeed have parallel calls to >> free_xen_event_channel and evtchn_destroy in a way that >> free_xen_event_channel could race with valid_evtchns getting >> decreased? > > ... I don't see this as relevant. > >>> Thankfully the memory is still there. So the current code is okayish and I >>> could reluctantly accept this behavior to be spread. However, I don't think >>> this should be left uncommented in both the code (maybe on top of >>> port_is_valid()?) and the commit message. >> >> Indeed, I think we need some expansion of the comment in port_is_valid >> to clarify all this. I'm not sure I understand it properly myself when >> it's fine to use port_is_valid without holding the per domain event >> lock. > > Because of the above property plus the fact that even if > ->valid_evtchns decreases, the underlying struct evtchn instance > will remain valid (i.e. won't get de-allocated, which happens only > in evtchn_destroy_final()), it is always fine to use it without > lock. With this I'm having trouble seeing what would need adding > to port_is_valid()'s commentary. Lets take the example of free_xen_event_channel(). The function is checking if the port is valid. If it is, then evtchn_close() will be called. At this point, it would be fair for a developper to assume that port_is_valid() will also return true in event_close(). To push to the extreme, if free_xen_event_channel() was the only caller of evtchn_close(), one could argue that the check in evtchn_close() could be a BUG_ON(). However, this can't be because this would sooner or later turn to an XSA. Effectively, it means that is_port_valid() *cannot* be used in an ASSERT()/BUG_ON() and every caller should check the return even if the port was previously validated. So I think a comment on top of is_port_valid() would be really useful before someone rediscover it the hard way. Cheers, -- Julien Grall
On Fri, May 28, 2021 at 11:48:51AM +0100, Julien Grall wrote: > Hi Jan, > > On 28/05/2021 11:23, Jan Beulich wrote: > > On 28.05.2021 10:30, Roger Pau Monné wrote: > > > On Thu, May 27, 2021 at 07:48:41PM +0100, Julien Grall wrote: > > > > On 27/05/2021 12:28, Jan Beulich wrote: > > > > > port_is_valid() and evtchn_from_port() are fine to use without holding > > > > > any locks. Accordingly acquire the per-domain lock slightly later in > > > > > evtchn_close() and evtchn_bind_vcpu(). > > > > > > > > So I agree that port_is_valid() and evtchn_from_port() are fine to use > > > > without holding any locks in evtchn_bind_vcpu(). However, this is misleading > > > > to say there is no problem with evtchn_close(). > > > > > > > > evtchn_close() can be called with current != d and therefore, there is a > > > > > > The only instances evtchn_close is called with current != d and the > > > domain could be unpaused is in free_xen_event_channel AFAICT. > > > > As long as the domain is not paused, ->valid_evtchns can't ever > > decrease: The only point where this gets done is in evtchn_destroy(). > > Hence ... > > > > > > risk that port_is_valid() may be valid and then invalid because > > > > d->valid_evtchns is decremented in evtchn_destroy(). > > > > > > Hm, I guess you could indeed have parallel calls to > > > free_xen_event_channel and evtchn_destroy in a way that > > > free_xen_event_channel could race with valid_evtchns getting > > > decreased? > > > > ... I don't see this as relevant. > > > > > > Thankfully the memory is still there. So the current code is okayish and I > > > > could reluctantly accept this behavior to be spread. However, I don't think > > > > this should be left uncommented in both the code (maybe on top of > > > > port_is_valid()?) and the commit message. > > > > > > Indeed, I think we need some expansion of the comment in port_is_valid > > > to clarify all this. I'm not sure I understand it properly myself when > > > it's fine to use port_is_valid without holding the per domain event > > > lock. > > > > Because of the above property plus the fact that even if > > ->valid_evtchns decreases, the underlying struct evtchn instance > > will remain valid (i.e. won't get de-allocated, which happens only > > in evtchn_destroy_final()), it is always fine to use it without > > lock. With this I'm having trouble seeing what would need adding > > to port_is_valid()'s commentary. > > Lets take the example of free_xen_event_channel(). The function is checking > if the port is valid. If it is, then evtchn_close() will be called. > > At this point, it would be fair for a developper to assume that > port_is_valid() will also return true in event_close(). > > To push to the extreme, if free_xen_event_channel() was the only caller of > evtchn_close(), one could argue that the check in evtchn_close() could be a > BUG_ON(). > > However, this can't be because this would sooner or later turn to an XSA. > > Effectively, it means that is_port_valid() *cannot* be used in an > ASSERT()/BUG_ON() and every caller should check the return even if the port > was previously validated. We already have cases of is_port_valid being used in ASSERTs (in the shim) and a BUG_ON (with the domain event lock held in evtchn_close). > So I think a comment on top of is_port_valid() would be really useful before > someone rediscover it the hard way. I think I'm being extremely dull here, sorry. From your text I understand that the value returned by is_port_valid could be stale by the time the user reads it? I think there's some condition that makes this value stale, and it's not the common case? Thanks, Roger.
On 28.05.2021 15:31, Roger Pau Monné wrote: > I think I'm being extremely dull here, sorry. From your text I > understand that the value returned by is_port_valid could be stale by > the time the user reads it? > > I think there's some condition that makes this value stale, and it's > not the common case? It's evtchn_destroy() running in parallel which can have this effect. Jan
Hi Roger, On 28/05/2021 14:31, Roger Pau Monné wrote: > On Fri, May 28, 2021 at 11:48:51AM +0100, Julien Grall wrote: >> Hi Jan, >> >> On 28/05/2021 11:23, Jan Beulich wrote: >>> On 28.05.2021 10:30, Roger Pau Monné wrote: >>>> On Thu, May 27, 2021 at 07:48:41PM +0100, Julien Grall wrote: >>>>> On 27/05/2021 12:28, Jan Beulich wrote: >>>>>> port_is_valid() and evtchn_from_port() are fine to use without holding >>>>>> any locks. Accordingly acquire the per-domain lock slightly later in >>>>>> evtchn_close() and evtchn_bind_vcpu(). >>>>> >>>>> So I agree that port_is_valid() and evtchn_from_port() are fine to use >>>>> without holding any locks in evtchn_bind_vcpu(). However, this is misleading >>>>> to say there is no problem with evtchn_close(). >>>>> >>>>> evtchn_close() can be called with current != d and therefore, there is a >>>> >>>> The only instances evtchn_close is called with current != d and the >>>> domain could be unpaused is in free_xen_event_channel AFAICT. >>> >>> As long as the domain is not paused, ->valid_evtchns can't ever >>> decrease: The only point where this gets done is in evtchn_destroy(). >>> Hence ... >>> >>>>> risk that port_is_valid() may be valid and then invalid because >>>>> d->valid_evtchns is decremented in evtchn_destroy(). >>>> >>>> Hm, I guess you could indeed have parallel calls to >>>> free_xen_event_channel and evtchn_destroy in a way that >>>> free_xen_event_channel could race with valid_evtchns getting >>>> decreased? >>> >>> ... I don't see this as relevant. >>> >>>>> Thankfully the memory is still there. So the current code is okayish and I >>>>> could reluctantly accept this behavior to be spread. However, I don't think >>>>> this should be left uncommented in both the code (maybe on top of >>>>> port_is_valid()?) and the commit message. >>>> >>>> Indeed, I think we need some expansion of the comment in port_is_valid >>>> to clarify all this. I'm not sure I understand it properly myself when >>>> it's fine to use port_is_valid without holding the per domain event >>>> lock. >>> >>> Because of the above property plus the fact that even if >>> ->valid_evtchns decreases, the underlying struct evtchn instance >>> will remain valid (i.e. won't get de-allocated, which happens only >>> in evtchn_destroy_final()), it is always fine to use it without >>> lock. With this I'm having trouble seeing what would need adding >>> to port_is_valid()'s commentary. >> >> Lets take the example of free_xen_event_channel(). The function is checking >> if the port is valid. If it is, then evtchn_close() will be called. >> >> At this point, it would be fair for a developper to assume that >> port_is_valid() will also return true in event_close(). >> >> To push to the extreme, if free_xen_event_channel() was the only caller of >> evtchn_close(), one could argue that the check in evtchn_close() could be a >> BUG_ON(). >> >> However, this can't be because this would sooner or later turn to an XSA. >> >> Effectively, it means that is_port_valid() *cannot* be used in an >> ASSERT()/BUG_ON() and every caller should check the return even if the port >> was previously validated. > > We already have cases of is_port_valid being used in ASSERTs (in the > shim) and a BUG_ON (with the domain event lock held in evtchn_close). I was likely a bit too restrictive in my remark. The BUG_ON() in evtchn_close() is fine because this is used on a (in theory) an open port with the both domains event lock held. This would be more a problem if we have either: if ( !port_is_valid(d1, port1) ) return -EINVAL; /* .... */ BUG_ON(port_is_valid(d1, port1)); > >> So I think a comment on top of is_port_valid() would be really useful before >> someone rediscover it the hard way. > > I think I'm being extremely dull here, sorry. From your text I > understand that the value returned by is_port_valid could be stale by > the time the user reads it? > > I think there's some condition that makes this value stale, and it's > not the common case? It is any code that race with evtchn_destroy(). Cheers, -- Julien Grall
On 27.05.2021 20:48, Julien Grall wrote: > On 27/05/2021 12:28, Jan Beulich wrote: >> port_is_valid() and evtchn_from_port() are fine to use without holding >> any locks. Accordingly acquire the per-domain lock slightly later in >> evtchn_close() and evtchn_bind_vcpu(). > > So I agree that port_is_valid() and evtchn_from_port() are fine to use > without holding any locks in evtchn_bind_vcpu(). However, this is > misleading to say there is no problem with evtchn_close(). > > evtchn_close() can be called with current != d and therefore, there is a > risk that port_is_valid() may be valid and then invalid because > d->valid_evtchns is decremented in evtchn_destroy(). While this is the case for other functions as well (and hence a comment along the lines of what you ask for below should have been in place already), I've added /* * While calling the function is okay without holding a suitable lock yet * (see the comment ahead of struct evtchn_port_ops for which ones those * are), for a dying domain it may start returning false at any point - see * evtchn_destroy(). This is not a fundamental problem though, as the * struct evtchn instance won't disappear (and will continue to hold valid * data) until final cleanup of the domain, at which point the domain itself * cannot be looked up anymore and hence calls here can't occur anymore in * the first place. */ ... > Thankfully the memory is still there. So the current code is okayish and > I could reluctantly accept this behavior to be spread. However, I don't > think this should be left uncommented in both the code (maybe on top of > port_is_valid()?) and the commit message. ... ahead of port_is_valid() (and not, as I did intend originally, in evtchn_close()). As far as the commit message goes, I'll have it refer to the comment only. I hope this satisfies the requests of both of you. I'll take the liberty and retain your ack, Roger. Jan
Hi Jan, On 01/06/2021 12:54, Jan Beulich wrote: > On 27.05.2021 20:48, Julien Grall wrote: >> On 27/05/2021 12:28, Jan Beulich wrote: >>> port_is_valid() and evtchn_from_port() are fine to use without holding >>> any locks. Accordingly acquire the per-domain lock slightly later in >>> evtchn_close() and evtchn_bind_vcpu(). >> >> So I agree that port_is_valid() and evtchn_from_port() are fine to use >> without holding any locks in evtchn_bind_vcpu(). However, this is >> misleading to say there is no problem with evtchn_close(). >> >> evtchn_close() can be called with current != d and therefore, there is a >> risk that port_is_valid() may be valid and then invalid because >> d->valid_evtchns is decremented in evtchn_destroy(). > > While this is the case for other functions as well (and hence a > comment along the lines of what you ask for below should have > been in place already), I've added > > /* > * While calling the function is okay without holding a suitable lock yet > * (see the comment ahead of struct evtchn_port_ops for which ones those > * are), for a dying domain it may start returning false at any point - see > * evtchn_destroy(). This is not a fundamental problem though, as the > * struct evtchn instance won't disappear (and will continue to hold valid > * data) until final cleanup of the domain, at which point the domain itself > * cannot be looked up anymore and hence calls here can't occur anymore in > * the first place. > */ > > ... > >> Thankfully the memory is still there. So the current code is okayish and >> I could reluctantly accept this behavior to be spread. However, I don't >> think this should be left uncommented in both the code (maybe on top of >> port_is_valid()?) and the commit message. > > ... ahead of port_is_valid() (and not, as I did intend originally, > in evtchn_close()). As far as the commit message goes, I'll have it > refer to the comment only. > > I hope this satisfies the requests of both of you. I'll take the > liberty and retain your ack, Roger. Yes, this satistfies my requests. Feel free to add my reviewed-by on the patch. Cheers, -- Julien Grall
© 2016 - 2026 Red Hat, Inc.