[PATCH RFC] evtchn: add early-out to evtchn_move_pirqs()

Jan Beulich posted 1 patch 2 years ago
Failed in applying to current master (apply log)
There is a newer version of this series
[PATCH RFC] evtchn: add early-out to evtchn_move_pirqs()
Posted by Jan Beulich 2 years ago
See the code comment. The higher the rate of vCPU-s migrating across
pCPU-s, the less useful this attempted optimization actually is. With
credit2 the migration rate looks to be unduly high even on mostly idle
systems, and hence on large systems lock contention here isn't very
difficult to observe.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1559,6 +1559,16 @@ void evtchn_move_pirqs(struct vcpu *v)
     unsigned int port;
     struct evtchn *chn;
 
+    /*
+     * The work done below is an attempt to keep pIRQ-s on the pCPU-s that the
+     * vCPU-s they're to be delivered to run on. In order to limit lock
+     * contention, check for an empty list prior to acquiring the lock. In the
+     * worst case a pIRQ just bound to this vCPU will be delivered elsewhere
+     * until the vCPU is migrated (again) to another pCPU.
+     */
+    if ( !v->pirq_evtchn_head )
+        return;
+
     spin_lock(&d->event_lock);
     for ( port = v->pirq_evtchn_head; port; port = chn->u.pirq.next_port )
     {
Re: [PATCH RFC] evtchn: add early-out to evtchn_move_pirqs()
Posted by Julien Grall 2 years ago
Hi Jan,

On 08/04/2022 08:16, Jan Beulich wrote:
> See the code comment. The higher the rate of vCPU-s migrating across
> pCPU-s, the less useful this attempted optimization actually is. With
> credit2 the migration rate looks to be unduly high even on mostly idle
> systems, and hence on large systems lock contention here isn't very
> difficult to observe.

"high" and "large" is quite vague. Do you have more details on where you 
observed this issue and the improvement after this patch?

> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -1559,6 +1559,16 @@ void evtchn_move_pirqs(struct vcpu *v)
>       unsigned int port;
>       struct evtchn *chn;
>   
> +    /*
> +     * The work done below is an attempt to keep pIRQ-s on the pCPU-s that the
> +     * vCPU-s they're to be delivered to run on. In order to limit lock
> +     * contention, check for an empty list prior to acquiring the lock. In the
> +     * worst case a pIRQ just bound to this vCPU will be delivered elsewhere
> +     * until the vCPU is migrated (again) to another pCPU.
> +     */

AFAIU, the downside is another pCPU (and therefore vCPU) will get 
disturbed by the interrupt. Maybe we should revive "evtchn: convert 
domain event lock to an r/w one"?

> +    if ( !v->pirq_evtchn_head )
> +        return;
> +
>       spin_lock(&d->event_lock);
>       for ( port = v->pirq_evtchn_head; port; port = chn->u.pirq.next_port )
>       {
> 

Cheers,

-- 
Julien Grall
Re: [PATCH RFC] evtchn: add early-out to evtchn_move_pirqs()
Posted by Jan Beulich 2 years ago
On 08.04.2022 13:02, Julien Grall wrote:
> On 08/04/2022 08:16, Jan Beulich wrote:
>> See the code comment. The higher the rate of vCPU-s migrating across
>> pCPU-s, the less useful this attempted optimization actually is. With
>> credit2 the migration rate looks to be unduly high even on mostly idle
>> systems, and hence on large systems lock contention here isn't very
>> difficult to observe.
> 
> "high" and "large" is quite vague. Do you have more details on where you 
> observed this issue and the improvement after this patch?

I have no data beyond the observation on the failed 4.12 osstest flights,
where I mentioned I would make such a patch and send out as RFC.

>> --- a/xen/common/event_channel.c
>> +++ b/xen/common/event_channel.c
>> @@ -1559,6 +1559,16 @@ void evtchn_move_pirqs(struct vcpu *v)
>>       unsigned int port;
>>       struct evtchn *chn;
>>   
>> +    /*
>> +     * The work done below is an attempt to keep pIRQ-s on the pCPU-s that the
>> +     * vCPU-s they're to be delivered to run on. In order to limit lock
>> +     * contention, check for an empty list prior to acquiring the lock. In the
>> +     * worst case a pIRQ just bound to this vCPU will be delivered elsewhere
>> +     * until the vCPU is migrated (again) to another pCPU.
>> +     */
> 
> AFAIU, the downside is another pCPU (and therefore vCPU) will get 
> disturbed by the interrupt.

But only rarely, i.e. in case a race would actually have occurred.

> Maybe we should revive "evtchn: convert 
> domain event lock to an r/w one"?

Not sure - the patch was rejected for there, overall, being too few
cases where read_lock() would suffice.

Jan
Re: [PATCH RFC] evtchn: add early-out to evtchn_move_pirqs()
Posted by Julien Grall 2 years ago
Hi,

On 08/04/2022 13:25, Jan Beulich wrote:
> On 08.04.2022 13:02, Julien Grall wrote:
>> On 08/04/2022 08:16, Jan Beulich wrote:
>>> See the code comment. The higher the rate of vCPU-s migrating across
>>> pCPU-s, the less useful this attempted optimization actually is. With
>>> credit2 the migration rate looks to be unduly high even on mostly idle
>>> systems, and hence on large systems lock contention here isn't very
>>> difficult to observe.
>>
>> "high" and "large" is quite vague. Do you have more details on where you
>> observed this issue and the improvement after this patch?
> 
> I have no data beyond the observation on the failed 4.12 osstest flights,
> where I mentioned I would make such a patch and send out as RFC.

Ok. I think a pointer to the failed 4.12 osstest would be good here.

> 
>>> --- a/xen/common/event_channel.c
>>> +++ b/xen/common/event_channel.c
>>> @@ -1559,6 +1559,16 @@ void evtchn_move_pirqs(struct vcpu *v)
>>>        unsigned int port;
>>>        struct evtchn *chn;
>>>    
>>> +    /*
>>> +     * The work done below is an attempt to keep pIRQ-s on the pCPU-s that the
>>> +     * vCPU-s they're to be delivered to run on. In order to limit lock
>>> +     * contention, check for an empty list prior to acquiring the lock. In the
>>> +     * worst case a pIRQ just bound to this vCPU will be delivered elsewhere
>>> +     * until the vCPU is migrated (again) to another pCPU.
>>> +     */
>>
>> AFAIU, the downside is another pCPU (and therefore vCPU) will get
>> disturbed by the interrupt.
> 
> But only rarely, i.e. in case a race would actually have occurred.

Maybe I am too paranoid here. The other side of race is controlled by a 
domain. So wouldn't it be possible to increase how often the race is hit?

Cheers,

-- 
Julien Grall
Re: [PATCH RFC] evtchn: add early-out to evtchn_move_pirqs()
Posted by Jan Beulich 2 years ago
On 08.04.2022 19:17, Julien Grall wrote:
> On 08/04/2022 13:25, Jan Beulich wrote:
>> On 08.04.2022 13:02, Julien Grall wrote:
>>> On 08/04/2022 08:16, Jan Beulich wrote:
>>>> See the code comment. The higher the rate of vCPU-s migrating across
>>>> pCPU-s, the less useful this attempted optimization actually is. With
>>>> credit2 the migration rate looks to be unduly high even on mostly idle
>>>> systems, and hence on large systems lock contention here isn't very
>>>> difficult to observe.
>>>
>>> "high" and "large" is quite vague. Do you have more details on where you
>>> observed this issue and the improvement after this patch?
>>
>> I have no data beyond the observation on the failed 4.12 osstest flights,
>> where I mentioned I would make such a patch and send out as RFC.
> 
> Ok. I think a pointer to the failed 4.12 osstest would be good here.

Done, albeit personally I don't think that's overly helpful.

>>>> --- a/xen/common/event_channel.c
>>>> +++ b/xen/common/event_channel.c
>>>> @@ -1559,6 +1559,16 @@ void evtchn_move_pirqs(struct vcpu *v)
>>>>        unsigned int port;
>>>>        struct evtchn *chn;
>>>>    
>>>> +    /*
>>>> +     * The work done below is an attempt to keep pIRQ-s on the pCPU-s that the
>>>> +     * vCPU-s they're to be delivered to run on. In order to limit lock
>>>> +     * contention, check for an empty list prior to acquiring the lock. In the
>>>> +     * worst case a pIRQ just bound to this vCPU will be delivered elsewhere
>>>> +     * until the vCPU is migrated (again) to another pCPU.
>>>> +     */
>>>
>>> AFAIU, the downside is another pCPU (and therefore vCPU) will get
>>> disturbed by the interrupt.
>>
>> But only rarely, i.e. in case a race would actually have occurred.
> 
> Maybe I am too paranoid here. The other side of race is controlled by a 
> domain. So wouldn't it be possible to increase how often the race is hit?

Yes, of course - just to harm itself. The important points are
- that correctness will be maintained, and
- that this is relevant for pass-through guests only (which are
  already not supposed to be entirely untrusted).

Jan
Re: [PATCH RFC] evtchn: add early-out to evtchn_move_pirqs()
Posted by Julien Grall 2 years ago
Hi,

On 11/04/2022 07:13, Jan Beulich wrote:
>>>>> --- a/xen/common/event_channel.c
>>>>> +++ b/xen/common/event_channel.c
>>>>> @@ -1559,6 +1559,16 @@ void evtchn_move_pirqs(struct vcpu *v)
>>>>>         unsigned int port;
>>>>>         struct evtchn *chn;
>>>>>     
>>>>> +    /*
>>>>> +     * The work done below is an attempt to keep pIRQ-s on the pCPU-s that the
>>>>> +     * vCPU-s they're to be delivered to run on. In order to limit lock
>>>>> +     * contention, check for an empty list prior to acquiring the lock. In the
>>>>> +     * worst case a pIRQ just bound to this vCPU will be delivered elsewhere
>>>>> +     * until the vCPU is migrated (again) to another pCPU.
>>>>> +     */
>>>>
>>>> AFAIU, the downside is another pCPU (and therefore vCPU) will get
>>>> disturbed by the interrupt.
>>>
>>> But only rarely, i.e. in case a race would actually have occurred.
>>
>> Maybe I am too paranoid here. The other side of race is controlled by a
>> domain. So wouldn't it be possible to increase how often the race is hit?
> 
> Yes, of course - just to harm itself.

Are you sure? Wouldn't this also harm the next vCPU running on the pCPU 
because it will get interrupted more often?

Cheers,

-- 
Julien Grall
Re: [PATCH RFC] evtchn: add early-out to evtchn_move_pirqs()
Posted by Jan Beulich 2 years ago
On 11.04.2022 12:25, Julien Grall wrote:
> On 11/04/2022 07:13, Jan Beulich wrote:
>>>>>> --- a/xen/common/event_channel.c
>>>>>> +++ b/xen/common/event_channel.c
>>>>>> @@ -1559,6 +1559,16 @@ void evtchn_move_pirqs(struct vcpu *v)
>>>>>>         unsigned int port;
>>>>>>         struct evtchn *chn;
>>>>>>     
>>>>>> +    /*
>>>>>> +     * The work done below is an attempt to keep pIRQ-s on the pCPU-s that the
>>>>>> +     * vCPU-s they're to be delivered to run on. In order to limit lock
>>>>>> +     * contention, check for an empty list prior to acquiring the lock. In the
>>>>>> +     * worst case a pIRQ just bound to this vCPU will be delivered elsewhere
>>>>>> +     * until the vCPU is migrated (again) to another pCPU.
>>>>>> +     */
>>>>>
>>>>> AFAIU, the downside is another pCPU (and therefore vCPU) will get
>>>>> disturbed by the interrupt.
>>>>
>>>> But only rarely, i.e. in case a race would actually have occurred.
>>>
>>> Maybe I am too paranoid here. The other side of race is controlled by a
>>> domain. So wouldn't it be possible to increase how often the race is hit?
>>
>> Yes, of course - just to harm itself.
> 
> Are you sure? Wouldn't this also harm the next vCPU running on the pCPU 
> because it will get interrupted more often?

Possibly, sure. But we don't make any promises here. And recall that
this optimization as a whole has been put under question in the past.
If we'd drop it, we'd also impact various vCPU-s in not really
predictable ways.

Jan
Re: [PATCH RFC] evtchn: add early-out to evtchn_move_pirqs()
Posted by Julien Grall 2 years ago
Hi,

On 11/04/2022 11:45, Jan Beulich wrote:
> On 11.04.2022 12:25, Julien Grall wrote:
>> On 11/04/2022 07:13, Jan Beulich wrote:
>>>>>>> --- a/xen/common/event_channel.c
>>>>>>> +++ b/xen/common/event_channel.c
>>>>>>> @@ -1559,6 +1559,16 @@ void evtchn_move_pirqs(struct vcpu *v)
>>>>>>>          unsigned int port;
>>>>>>>          struct evtchn *chn;
>>>>>>>      
>>>>>>> +    /*
>>>>>>> +     * The work done below is an attempt to keep pIRQ-s on the pCPU-s that the
>>>>>>> +     * vCPU-s they're to be delivered to run on. In order to limit lock
>>>>>>> +     * contention, check for an empty list prior to acquiring the lock. In the
>>>>>>> +     * worst case a pIRQ just bound to this vCPU will be delivered elsewhere
>>>>>>> +     * until the vCPU is migrated (again) to another pCPU.
>>>>>>> +     */
>>>>>>
>>>>>> AFAIU, the downside is another pCPU (and therefore vCPU) will get
>>>>>> disturbed by the interrupt.
>>>>>
>>>>> But only rarely, i.e. in case a race would actually have occurred.
>>>>
>>>> Maybe I am too paranoid here. The other side of race is controlled by a
>>>> domain. So wouldn't it be possible to increase how often the race is hit?
>>>
>>> Yes, of course - just to harm itself.
>>
>> Are you sure? Wouldn't this also harm the next vCPU running on the pCPU
>> because it will get interrupted more often?
> 
> Possibly, sure. But we don't make any promises here. And recall that
> this optimization as a whole has been put under question in the past.

I don't remember this discussion. Do you have a pointer?

> If we'd drop it, we'd also impact various vCPU-s in not really
> predictable ways.

Cheers,

-- 
Julien Grall
Re: [PATCH RFC] evtchn: add early-out to evtchn_move_pirqs()
Posted by Jan Beulich 2 years ago
On 11.04.2022 13:00, Julien Grall wrote:
> On 11/04/2022 11:45, Jan Beulich wrote:
>> On 11.04.2022 12:25, Julien Grall wrote:
>>> On 11/04/2022 07:13, Jan Beulich wrote:
>>>>>>>> --- a/xen/common/event_channel.c
>>>>>>>> +++ b/xen/common/event_channel.c
>>>>>>>> @@ -1559,6 +1559,16 @@ void evtchn_move_pirqs(struct vcpu *v)
>>>>>>>>          unsigned int port;
>>>>>>>>          struct evtchn *chn;
>>>>>>>>      
>>>>>>>> +    /*
>>>>>>>> +     * The work done below is an attempt to keep pIRQ-s on the pCPU-s that the
>>>>>>>> +     * vCPU-s they're to be delivered to run on. In order to limit lock
>>>>>>>> +     * contention, check for an empty list prior to acquiring the lock. In the
>>>>>>>> +     * worst case a pIRQ just bound to this vCPU will be delivered elsewhere
>>>>>>>> +     * until the vCPU is migrated (again) to another pCPU.
>>>>>>>> +     */
>>>>>>>
>>>>>>> AFAIU, the downside is another pCPU (and therefore vCPU) will get
>>>>>>> disturbed by the interrupt.
>>>>>>
>>>>>> But only rarely, i.e. in case a race would actually have occurred.
>>>>>
>>>>> Maybe I am too paranoid here. The other side of race is controlled by a
>>>>> domain. So wouldn't it be possible to increase how often the race is hit?
>>>>
>>>> Yes, of course - just to harm itself.
>>>
>>> Are you sure? Wouldn't this also harm the next vCPU running on the pCPU
>>> because it will get interrupted more often?
>>
>> Possibly, sure. But we don't make any promises here. And recall that
>> this optimization as a whole has been put under question in the past.
> 
> I don't remember this discussion. Do you have a pointer?

I'm sorry, but no, I don't have a pointer. This may even have been on irc.
All I recall (or at least I think I do) is that it was Andrew who raised
the concern.

Jan
Re: [PATCH RFC] evtchn: add early-out to evtchn_move_pirqs()
Posted by Luca Fancellu 2 years ago

> On 8 Apr 2022, at 08:16, Jan Beulich <jbeulich@suse.com> wrote:
> 
> See the code comment. The higher the rate of vCPU-s migrating across
> pCPU-s, the less useful this attempted optimization actually is. With
> credit2 the migration rate looks to be unduly high even on mostly idle
> systems, and hence on large systems lock contention here isn't very
> difficult to observe.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

I’ve tested this patch on a Juno board, starting Dom0, creating/destroying few guests,
doing some networking from guests of different cpu pool and everything worked.

Tested-by: Luca Fancellu <luca.fancellu@arm.com>

> 
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -1559,6 +1559,16 @@ void evtchn_move_pirqs(struct vcpu *v)
>     unsigned int port;
>     struct evtchn *chn;
> 
> +    /*
> +     * The work done below is an attempt to keep pIRQ-s on the pCPU-s that the
> +     * vCPU-s they're to be delivered to run on. In order to limit lock
> +     * contention, check for an empty list prior to acquiring the lock. In the
> +     * worst case a pIRQ just bound to this vCPU will be delivered elsewhere
> +     * until the vCPU is migrated (again) to another pCPU.
> +     */
> +    if ( !v->pirq_evtchn_head )
> +        return;
> +
>     spin_lock(&d->event_lock);
>     for ( port = v->pirq_evtchn_head; port; port = chn->u.pirq.next_port )
>     {
> 
> 

Re: [PATCH RFC] evtchn: add early-out to evtchn_move_pirqs()
Posted by Julien Grall 2 years ago
Hi Luca,

On 08/04/2022 11:41, Luca Fancellu wrote:
> 
> 
>> On 8 Apr 2022, at 08:16, Jan Beulich <jbeulich@suse.com> wrote:
>>
>> See the code comment. The higher the rate of vCPU-s migrating across
>> pCPU-s, the less useful this attempted optimization actually is. With
>> credit2 the migration rate looks to be unduly high even on mostly idle
>> systems, and hence on large systems lock contention here isn't very
>> difficult to observe.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> I’ve tested this patch on a Juno board, starting Dom0, creating/destroying few guests,
> doing some networking from guests of different cpu pool and everything worked.

FWIW, we don't support PIRQ on Arm (alloc_pirq_struct() will always 
return NULL). So evtchn_mode_pirqs() is basically a NOP.

This patch has the advantage for Arm to avoid taking the lock for 
nothing. I will comment on the x86 part separately.

Cheers,

-- 
Julien Grall