[PATCH] KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
Posted by Prasad Pandit 1 year, 11 months ago
From: Prasad Pandit <pjp@fedoraproject.org>

kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI'
request for a vcpu even when its 'events->nmi.pending' is zero.
Ex:
    qemu_thread_start
     kvm_vcpu_thread_fn
      qemu_wait_io_event
       qemu_wait_io_event_common
        process_queued_cpu_work
         do_kvm_cpu_synchronize_post_init/_reset
          kvm_arch_put_registers
           kvm_put_vcpu_events (cpu, level=[2|3])

This leads vCPU threads in QEMU to constantly acquire & release the
global mutex lock, delaying the guest boot due to lock contention.
Add check to make KVM_REQ_NMI request only if vcpu has NMI pending.

Fixes: bdedff263132 ("KVM: x86: Route pending NMIs from userspace through process_nmi()")
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
 arch/x86/kvm/x86.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1a3aaa7dafae..468870450b8b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5405,7 +5405,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
 		vcpu->arch.nmi_pending = 0;
 		atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
-		kvm_make_request(KVM_REQ_NMI, vcpu);
+		if (events->nmi.pending)
+			kvm_make_request(KVM_REQ_NMI, vcpu);
 	}
 	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);

--
2.43.0
Re: [PATCH] KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
Posted by Dongli Zhang 1 year, 10 months ago
Hi Prasad,

On 1/2/24 23:53, Prasad Pandit wrote:
> From: Prasad Pandit <pjp@fedoraproject.org>
> 
> kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI'
> request for a vcpu even when its 'events->nmi.pending' is zero.
> Ex:
>     qemu_thread_start
>      kvm_vcpu_thread_fn
>       qemu_wait_io_event
>        qemu_wait_io_event_common
>         process_queued_cpu_work
>          do_kvm_cpu_synchronize_post_init/_reset
>           kvm_arch_put_registers
>            kvm_put_vcpu_events (cpu, level=[2|3])
> 
> This leads vCPU threads in QEMU to constantly acquire & release the
> global mutex lock, delaying the guest boot due to lock contention.

Would you mind sharing where and how the lock contention occurs in QEMU? That
is, how is the QEMU mutex lock impacted by KVM's KVM_REQ_NMI?

Or did you mean line 3031 on the QEMU side?

2858 int kvm_cpu_exec(CPUState *cpu)
2859 {
2860     struct kvm_run *run = cpu->kvm_run;
2861     int ret, run_ret;
... ...
3023         default:
3024             DPRINTF("kvm_arch_handle_exit\n");
3025             ret = kvm_arch_handle_exit(cpu, run);
3026             break;
3027         }
3028     } while (ret == 0);
3029
3030     cpu_exec_end(cpu);
3031     qemu_mutex_lock_iothread();
3032
3033     if (ret < 0) {
3034         cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
3035         vm_stop(RUN_STATE_INTERNAL_ERROR);
3036     }
3037
3038     qatomic_set(&cpu->exit_request, 0);
3039     return ret;
3040 }

Thank you very much!

Dongli Zhang

> Add check to make KVM_REQ_NMI request only if vcpu has NMI pending.
> 
> Fixes: bdedff263132 ("KVM: x86: Route pending NMIs from userspace through process_nmi()")
> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
> ---
>  arch/x86/kvm/x86.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1a3aaa7dafae..468870450b8b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5405,7 +5405,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>  	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
>  		vcpu->arch.nmi_pending = 0;
>  		atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
> -		kvm_make_request(KVM_REQ_NMI, vcpu);
> +		if (events->nmi.pending)
> +			kvm_make_request(KVM_REQ_NMI, vcpu);
>  	}
>  	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> 
> --
> 2.43.0
> 
>
Re: [PATCH] KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
Posted by Sean Christopherson 1 year, 10 months ago
On Tue, Feb 06, 2024, Dongli Zhang wrote:
> Hi Prasad,
> 
> On 1/2/24 23:53, Prasad Pandit wrote:
> > From: Prasad Pandit <pjp@fedoraproject.org>
> > 
> > kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI'
> > request for a vcpu even when its 'events->nmi.pending' is zero.
> > Ex:
> >     qemu_thread_start
> >      kvm_vcpu_thread_fn
> >       qemu_wait_io_event
> >        qemu_wait_io_event_common
> >         process_queued_cpu_work
> >          do_kvm_cpu_synchronize_post_init/_reset
> >           kvm_arch_put_registers
> >            kvm_put_vcpu_events (cpu, level=[2|3])
> > 
> > This leads vCPU threads in QEMU to constantly acquire & release the
> > global mutex lock, delaying the guest boot due to lock contention.
> 
> Would you mind sharing where and how the lock contention is at QEMU space? That
> is, how the QEMU mutex lock is impacted by KVM KVM_REQ_NMI?
> 
> Or you meant line 3031 at QEMU side?

Yeah, something like that.  Details in this thread.

https://lore.kernel.org/all/CAE8KmOyffXD4k69vRAFwesaqrBGzFY3i+kefbkHcQf4=jNYzOA@mail.gmail.com

> 2858 int kvm_cpu_exec(CPUState *cpu)
> 2859 {
> 2860     struct kvm_run *run = cpu->kvm_run;
> 2861     int ret, run_ret;
> ... ...
> 3023         default:
> 3024             DPRINTF("kvm_arch_handle_exit\n");
> 3025             ret = kvm_arch_handle_exit(cpu, run);
> 3026             break;
> 3027         }
> 3028     } while (ret == 0);
> 3029
> 3030     cpu_exec_end(cpu);
> 3031     qemu_mutex_lock_iothread();
> 3032
> 3033     if (ret < 0) {
> 3034         cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
> 3035         vm_stop(RUN_STATE_INTERNAL_ERROR);
> 3036     }
> 3037
> 3038     qatomic_set(&cpu->exit_request, 0);
> 3039     return ret;
> 3040 }
Re: [PATCH] KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
Posted by Dongli Zhang 1 year, 10 months ago

On 2/6/24 12:58, Sean Christopherson wrote:
> On Tue, Feb 06, 2024, Dongli Zhang wrote:
>> Hi Prasad,
>>
>> On 1/2/24 23:53, Prasad Pandit wrote:
>>> From: Prasad Pandit <pjp@fedoraproject.org>
>>>
>>> kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI'
>>> request for a vcpu even when its 'events->nmi.pending' is zero.
>>> Ex:
>>>     qemu_thread_start
>>>      kvm_vcpu_thread_fn
>>>       qemu_wait_io_event
>>>        qemu_wait_io_event_common
>>>         process_queued_cpu_work
>>>          do_kvm_cpu_synchronize_post_init/_reset
>>>           kvm_arch_put_registers
>>>            kvm_put_vcpu_events (cpu, level=[2|3])
>>>
>>> This leads vCPU threads in QEMU to constantly acquire & release the
>>> global mutex lock, delaying the guest boot due to lock contention.
>>
>> Would you mind sharing where and how the lock contention is at QEMU space? That
>> is, how the QEMU mutex lock is impacted by KVM KVM_REQ_NMI?
>>
>> Or you meant line 3031 at QEMU side?
> 
> Yeah, something like that.  Details in this thread.
> 
> https://lore.kernel.org/all/CAE8KmOyffXD4k69vRAFwesaqrBGzFY3i+kefbkHcQf4=jNYzOA@mail.gmail.com

Thank you very much for pointing to the discussion. I should have found them :)

Here is my understanding.

1. During VM creation, the mp_state of an AP (non-BSP vCPU) is
KVM_MP_STATE_UNINITIALIZED until it receives INIT/SIPI.

2. Ideally, the AP should block in the loop below. That is, the check at line
3775 stays false and the vCPU keeps waiting.

3760 bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
3761 {
3762         struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
3763         bool waited = false;
3764
3765         vcpu->stat.generic.blocking = 1;
3766
3767         preempt_disable();
3768         kvm_arch_vcpu_blocking(vcpu);
3769         prepare_to_rcuwait(wait);
3770         preempt_enable();
3771
3772         for (;;) {
3773                 set_current_state(TASK_INTERRUPTIBLE);
3774
3775                 if (kvm_vcpu_check_block(vcpu) < 0)
3776                         break;
3777
3778                 waited = true;
3779                 schedule();
3780         }

3. Unfortunately, the bug may leave KVM_REQ_NMI set for the AP.

4. This causes kvm_vcpu_check_block() to return a negative value:

kvm_arch_vcpu_ioctl_run()
-> kvm_vcpu_block()
   -> kvm_vcpu_check_block()
      -> kvm_arch_vcpu_runnable()
         -> kvm_vcpu_has_events()
            -> kvm_test_request(KVM_REQ_NMI, vcpu)


5. kvm_arch_vcpu_ioctl_run() returns to QEMU with -EAGAIN.

6. QEMU cannot do anything useful with -EAGAIN; it breaks out of the loop and
returns at line 2984.

On the way out it acquires the global mutex at line 2976 (released again before
entering the guest). KVM_REQ_NMI is never cleared until INIT/SIPI.


2808 int kvm_cpu_exec(CPUState *cpu)
2809 {
... ...
2868         if (run_ret < 0) {
2869             if (run_ret == -EINTR || run_ret == -EAGAIN) {
2870                 trace_kvm_io_window_exit();
2871                 kvm_eat_signals(cpu);
2872                 ret = EXCP_INTERRUPT;
2873                 break;
2874             }
... ...
2973     } while (ret == 0);
2974
2975     cpu_exec_end(cpu);
2976     bql_lock();
2977
2978     if (ret < 0) {
2979         cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
2980         vm_stop(RUN_STATE_INTERNAL_ERROR);
2981     }
2982
2983     qatomic_set(&cpu->exit_request, 0);
2984     return ret;
2985 }


7. The QEMU AP vCPU thread enters KVM_RUN again and repeats from step 4, over
and over.

The lock is acquired and released at high frequency. vCPU 0 is unhappy with
it, especially when the number of APs is large!

I guess it is not an issue after a VM reboot (without re-creating the QEMU
instance), because the mp_state is no longer KVM_MP_STATE_UNINITIALIZED.


Thank you very much!

Dongli Zhang

> 
>> 2858 int kvm_cpu_exec(CPUState *cpu)
>> 2859 {
>> 2860     struct kvm_run *run = cpu->kvm_run;
>> 2861     int ret, run_ret;
>> ... ...
>> 3023         default:
>> 3024             DPRINTF("kvm_arch_handle_exit\n");
>> 3025             ret = kvm_arch_handle_exit(cpu, run);
>> 3026             break;
>> 3027         }
>> 3028     } while (ret == 0);
>> 3029
>> 3030     cpu_exec_end(cpu);
>> 3031     qemu_mutex_lock_iothread();
>> 3032
>> 3033     if (ret < 0) {
>> 3034         cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
>> 3035         vm_stop(RUN_STATE_INTERNAL_ERROR);
>> 3036     }
>> 3037
>> 3038     qatomic_set(&cpu->exit_request, 0);
>> 3039     return ret;
>> 3040 }
Re: [PATCH] KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
Posted by Sean Christopherson 1 year, 10 months ago
On Wed, 03 Jan 2024 13:23:43 +0530, Prasad Pandit wrote:
> kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI'
> request for a vcpu even when its 'events->nmi.pending' is zero.
> Ex:
>     qemu_thread_start
>      kvm_vcpu_thread_fn
>       qemu_wait_io_event
>        qemu_wait_io_event_common
>         process_queued_cpu_work
>          do_kvm_cpu_synchronize_post_init/_reset
>           kvm_arch_put_registers
>            kvm_put_vcpu_events (cpu, level=[2|3])
> 
> [...]

Applied to kvm-x86 fixes, thanks!

[1/1] KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
      https://github.com/kvm-x86/linux/commit/6231c9e1a9f3

--
https://github.com/kvm-x86/linux/tree/next
Re: [PATCH] KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
Posted by Prasad Pandit 1 year, 11 months ago
On Wed, 3 Jan 2024 at 13:24, Prasad Pandit <ppandit@redhat.com> wrote:
> kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI'
> request for a vcpu even when its 'events->nmi.pending' is zero.
> Ex:
>     qemu_thread_start
>      kvm_vcpu_thread_fn
>       qemu_wait_io_event
>        qemu_wait_io_event_common
>         process_queued_cpu_work
>          do_kvm_cpu_synchronize_post_init/_reset
>           kvm_arch_put_registers
>            kvm_put_vcpu_events (cpu, level=[2|3])
>
> This leads vCPU threads in QEMU to constantly acquire & release the
> global mutex lock, delaying the guest boot due to lock contention.
> Add check to make KVM_REQ_NMI request only if vcpu has NMI pending.
>
> Fixes: bdedff263132 ("KVM: x86: Route pending NMIs from userspace through process_nmi()")
> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
> ---
>  arch/x86/kvm/x86.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1a3aaa7dafae..468870450b8b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5405,7 +5405,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>         if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
>                 vcpu->arch.nmi_pending = 0;
>                 atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
> -               kvm_make_request(KVM_REQ_NMI, vcpu);
> +               if (events->nmi.pending)
> +                       kvm_make_request(KVM_REQ_NMI, vcpu);
>         }
>         static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> --
> 2.43.0

Ping...!
---
  - Prasad
Re: [PATCH] KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
Posted by Sean Christopherson 1 year, 11 months ago
On Mon, Jan 08, 2024, Prasad Pandit wrote:
> On Wed, 3 Jan 2024 at 13:24, Prasad Pandit <ppandit@redhat.com> wrote:
> > kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI'
> > request for a vcpu even when its 'events->nmi.pending' is zero.
> > Ex:
> >     qemu_thread_start
> >      kvm_vcpu_thread_fn
> >       qemu_wait_io_event
> >        qemu_wait_io_event_common
> >         process_queued_cpu_work
> >          do_kvm_cpu_synchronize_post_init/_reset
> >           kvm_arch_put_registers
> >            kvm_put_vcpu_events (cpu, level=[2|3])
> >
> > This leads vCPU threads in QEMU to constantly acquire & release the
> > global mutex lock, delaying the guest boot due to lock contention.
> > Add check to make KVM_REQ_NMI request only if vcpu has NMI pending.
> >
> > Fixes: bdedff263132 ("KVM: x86: Route pending NMIs from userspace through process_nmi()")
> > Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
> > ---
> >  arch/x86/kvm/x86.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 1a3aaa7dafae..468870450b8b 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5405,7 +5405,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
> >         if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
> >                 vcpu->arch.nmi_pending = 0;
> >                 atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
> > -               kvm_make_request(KVM_REQ_NMI, vcpu);
> > +               if (events->nmi.pending)
> > +                       kvm_make_request(KVM_REQ_NMI, vcpu);
> >         }
> >         static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> > --
> > 2.43.0
> 
> Ping...!

This is on my list of things to grab for 6.8, I'm just waiting for various pull
requests to fully land in order to simplify my branch management.
Re: [PATCH] KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
Posted by Prasad Pandit 1 year, 11 months ago
On Mon, 8 Jan 2024 at 23:08, Sean Christopherson <seanjc@google.com> wrote:
> This is on my list of things to grab for 6.8, I'm just waiting for various pull
> requests to fully land in order to simplify my branch management.

* Okay, cool.

Thank you.
---
  - Prasad