target/i386/kvm/kvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Recently, we meet a SMI race bug triggered by one monitor tool in our
production environment. This monitor executes 'info registers -a' hmp
at a fixed frequency, even during VM startup process, which makes
some AP stay in KVM_MP_STATE_UNINITIALIZED forever, thus VM hangs.
The complete calling processes for the SMI race are as follows:
//thread1 //thread2 //thread3
`info registers -a` hmp [1] AP(vcpu1) thread [2] BSP(vcpu0) send INIT/SIPI [3]
[2]
KVM: KVM_RUN and then
schedule() in kvm_vcpu_block() loop
[1]
for each cpu: cpu_synchronize_state
if !qemu_thread_is_self()
1. insert to cpu->work_list, and handle asynchronously
2. then kick the AP(vcpu1) by sending SIG_IPI/SIGUSR1 signal
[2]
KVM: checks signal_pending, breaks loop and returns -EINTR
Qemu: break kvm_cpu_exec loop, run
1. qemu_wait_io_event()
=> process_queued_cpu_work => cpu->work_list.func()
e.i. do_kvm_cpu_synchronize_state() callback
=> kvm_arch_get_registers
=> kvm_get_mp_state /* KVM: get_mpstate also calls
kvm_apic_accept_events() to handle INIT and SIPI */
=> cpu->vcpu_dirty = true;
// end of qemu_wait_io_event
[3]
SeaBIOS: BSP enters non-root mode and runs reset_vector() in SeaBIOS.
send INIT and then SIPI by writing APIC_ICR during smp_scan
KVM: BSP(vcpu0) exits, then => handle_apic_write
=> kvm_lapic_reg_write => kvm_apic_send_ipi to all APs
=> for each AP: __apic_accept_irq, e.g. for AP(vcpu1)
==> case APIC_DM_INIT: apic->pending_events = (1UL << KVM_APIC_INIT)
(not kick the AP yet)
==> case APIC_DM_STARTUP: set_bit(KVM_APIC_SIPI, &apic->pending_events)
(not kick the AP yet)
[2]
Qemu continue:
2. kvm_cpu_exec()
=> if (cpu->vcpu_dirty):
=> kvm_arch_put_registers
=> kvm_put_vcpu_events
KVM: kvm_vcpu_ioctl_x86_set_vcpu_events
=> clear_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
e.i. pending_events changes from 11b to 10b
// end of kvm_vcpu_ioctl_x86_set_vcpu_events
Qemu: => after put_registers, cpu->vcpu_dirty = false;
=> kvm_vcpu_ioctl(cpu, KVM_RUN, 0)
KVM: KVM_RUN
=> schedule() in kvm_vcpu_block() until Qemu's next SIG_IPI/SIGUSR1 signal
/* But AP(vcpu1)'s mp_state will never change from KVM_MP_STATE_UNINITIALIZED
to KVM_MP_STATE_INIT_RECEIVED, even then to KVM_MP_STATE_RUNNABLE without
handling INIT inside kvm_apic_accept_events(), considering BSP will never
send INIT/SIPI again during smp_scan. Then AP(vcpu1) will never enter
non-root mode */
[3]
SeaBIOS: waits CountCPUs == expected_cpus_count and loops forever
e.i. the AP(vcpu1) stays: EIP=0000fff0 && CS =f000 ffff0000
and BSP(vcpu0) appears 100% utilized as it is in a while loop.
To fix this, avoid clobbering SMI when not putting "reset" state, just
like NMI abd SIPI does.
Signed-off-by: Fei Li <lifei.shirley@bytedance.com>
---
target/i386/kvm/kvm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 369626f8c8..598661799a 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5056,7 +5056,7 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level)
events.sipi_vector = env->sipi_vector;
- if (has_msr_smbase) {
+ if (has_msr_smbase && level >= KVM_PUT_RESET_STATE) {
events.flags |= KVM_VCPUEVENT_VALID_SMM;
events.smi.smm = !!(env->hflags & HF_SMM_MASK);
events.smi.smm_inside_nmi = !!(env->hflags2 & HF2_SMM_INSIDE_NMI_MASK);
--
2.39.2 (Apple Git-143)
Dear maintainers, Could you please help to review the patch [PATCH] KVM: x86: Restrict writeback of SMI VCPU state? This fixes a race condition causing VM hang when frequently running `info registers -a` via HMP during VM startup. The issue occurs because unrestricted SMI state writeback conflicts with vCPU initialization sequences. It would be very appreciated for us to know if this patch properly resolve the race condition, and if validated, we would like to apply it to our production environment. Let me know if further details are needed. :) Best regards, and thanks again! Fei On 9/9/25 2:33 PM, Fei Li wrote: > Recently, we meet a SMI race bug triggered by one monitor tool in our > production environment. This monitor executes 'info registers -a' hmp > at a fixed frequency, even during VM startup process, which makes > some AP stay in KVM_MP_STATE_UNINITIALIZED forever, thus VM hangs. > > The complete calling processes for the SMI race are as follows: > > //thread1 //thread2 //thread3 > `info registers -a` hmp [1] AP(vcpu1) thread [2] BSP(vcpu0) send INIT/SIPI [3] > > [2] > KVM: KVM_RUN and then > schedule() in kvm_vcpu_block() loop > > [1] > for each cpu: cpu_synchronize_state > if !qemu_thread_is_self() > 1. insert to cpu->work_list, and handle asynchronously > 2. then kick the AP(vcpu1) by sending SIG_IPI/SIGUSR1 signal > > [2] > KVM: checks signal_pending, breaks loop and returns -EINTR > Qemu: break kvm_cpu_exec loop, run > 1. qemu_wait_io_event() > => process_queued_cpu_work => cpu->work_list.func() > e.i. do_kvm_cpu_synchronize_state() callback > => kvm_arch_get_registers > => kvm_get_mp_state /* KVM: get_mpstate also calls > kvm_apic_accept_events() to handle INIT and SIPI */ > => cpu->vcpu_dirty = true; > // end of qemu_wait_io_event > > [3] > SeaBIOS: BSP enters non-root mode and runs reset_vector() in SeaBIOS. > send INIT and then SIPI by writing APIC_ICR during smp_scan > KVM: BSP(vcpu0) exits, then => handle_apic_write > => kvm_lapic_reg_write => kvm_apic_send_ipi to all APs > => for each AP: __apic_accept_irq, e.g. for AP(vcpu1) > ==> case APIC_DM_INIT: apic->pending_events = (1UL << KVM_APIC_INIT) > (not kick the AP yet) > ==> case APIC_DM_STARTUP: set_bit(KVM_APIC_SIPI, &apic->pending_events) > (not kick the AP yet) > > [2] > Qemu continue: > 2. kvm_cpu_exec() > => if (cpu->vcpu_dirty): > => kvm_arch_put_registers > => kvm_put_vcpu_events > KVM: kvm_vcpu_ioctl_x86_set_vcpu_events > => clear_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events); > e.i. pending_events changes from 11b to 10b > // end of kvm_vcpu_ioctl_x86_set_vcpu_events > Qemu: => after put_registers, cpu->vcpu_dirty = false; > => kvm_vcpu_ioctl(cpu, KVM_RUN, 0) > KVM: KVM_RUN > => schedule() in kvm_vcpu_block() until Qemu's next SIG_IPI/SIGUSR1 signal > /* But AP(vcpu1)'s mp_state will never change from KVM_MP_STATE_UNINITIALIZED > to KVM_MP_STATE_INIT_RECEIVED, even then to KVM_MP_STATE_RUNNABLE without > handling INIT inside kvm_apic_accept_events(), considering BSP will never > send INIT/SIPI again during smp_scan. Then AP(vcpu1) will never enter > non-root mode */ > > [3] > SeaBIOS: waits CountCPUs == expected_cpus_count and loops forever > e.i. the AP(vcpu1) stays: EIP=0000fff0 && CS =f000 ffff0000 > and BSP(vcpu0) appears 100% utilized as it is in a while loop. > > To fix this, avoid clobbering SMI when not putting "reset" state, just > like NMI abd SIPI does. > > Signed-off-by: Fei Li <lifei.shirley@bytedance.com> > --- > target/i386/kvm/kvm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c > index 369626f8c8..598661799a 100644 > --- a/target/i386/kvm/kvm.c > +++ b/target/i386/kvm/kvm.c > @@ -5056,7 +5056,7 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level) > > events.sipi_vector = env->sipi_vector; > > - if (has_msr_smbase) { > + if (has_msr_smbase && level >= KVM_PUT_RESET_STATE) { > events.flags |= KVM_VCPUEVENT_VALID_SMM; > events.smi.smm = !!(env->hflags & HF_SMM_MASK); > events.smi.smm_inside_nmi = !!(env->hflags2 & HF2_SMM_INSIDE_NMI_MASK);
On 9/9/25 2:33 PM, Fei Li wrote: > Recently, we meet a SMI race bug triggered by one monitor tool in our > production environment. This monitor executes 'info registers -a' hmp > at a fixed frequency, even during VM startup process, which makes > some AP stay in KVM_MP_STATE_UNINITIALIZED forever, thus VM hangs. > > The complete calling processes for the SMI race are as follows: > > //thread1 //thread2 //thread3 > `info registers -a` hmp [1] AP(vcpu1) thread [2] BSP(vcpu0) send INIT/SIPI [3] > > [2] > KVM: KVM_RUN and then > schedule() in kvm_vcpu_block() loop > > [1] > for each cpu: cpu_synchronize_state > if !qemu_thread_is_self() > 1. insert to cpu->work_list, and handle asynchronously > 2. then kick the AP(vcpu1) by sending SIG_IPI/SIGUSR1 signal > > [2] > KVM: checks signal_pending, breaks loop and returns -EINTR > Qemu: break kvm_cpu_exec loop, run > 1. qemu_wait_io_event() > => process_queued_cpu_work => cpu->work_list.func() > e.i. do_kvm_cpu_synchronize_state() callback > => kvm_arch_get_registers > => kvm_get_mp_state /* KVM: get_mpstate also calls > kvm_apic_accept_events() to handle INIT and SIPI */ > => cpu->vcpu_dirty = true; > // end of qemu_wait_io_event > > [3] > SeaBIOS: BSP enters non-root mode and runs reset_vector() in SeaBIOS. > send INIT and then SIPI by writing APIC_ICR during smp_scan > KVM: BSP(vcpu0) exits, then => handle_apic_write > => kvm_lapic_reg_write => kvm_apic_send_ipi to all APs > => for each AP: __apic_accept_irq, e.g. for AP(vcpu1) > ==> case APIC_DM_INIT: apic->pending_events = (1UL << KVM_APIC_INIT) > (not kick the AP yet) > ==> case APIC_DM_STARTUP: set_bit(KVM_APIC_SIPI, &apic->pending_events) > (not kick the AP yet) > > [2] > Qemu continue: > 2. kvm_cpu_exec() > => if (cpu->vcpu_dirty): > => kvm_arch_put_registers > => kvm_put_vcpu_events > KVM: kvm_vcpu_ioctl_x86_set_vcpu_events > => clear_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events); > e.i. pending_events changes from 11b to 10b > // end of kvm_vcpu_ioctl_x86_set_vcpu_events > Qemu: => after put_registers, cpu->vcpu_dirty = false; > => kvm_vcpu_ioctl(cpu, KVM_RUN, 0) > KVM: KVM_RUN > => schedule() in kvm_vcpu_block() until Qemu's next SIG_IPI/SIGUSR1 signal > /* But AP(vcpu1)'s mp_state will never change from KVM_MP_STATE_UNINITIALIZED > to KVM_MP_STATE_INIT_RECEIVED, even then to KVM_MP_STATE_RUNNABLE without > handling INIT inside kvm_apic_accept_events(), considering BSP will never > send INIT/SIPI again during smp_scan. Then AP(vcpu1) will never enter > non-root mode */ > > [3] > SeaBIOS: waits CountCPUs == expected_cpus_count and loops forever > e.i. the AP(vcpu1) stays: EIP=0000fff0 && CS =f000 ffff0000 > and BSP(vcpu0) appears 100% utilized as it is in a while loop. > > To fix this, avoid clobbering SMI when not putting "reset" state, just > like NMI abd SIPI does. > > Signed-off-by: Fei Li <lifei.shirley@bytedance.com> > --- > target/i386/kvm/kvm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c > index 369626f8c8..598661799a 100644 > --- a/target/i386/kvm/kvm.c > +++ b/target/i386/kvm/kvm.c > @@ -5056,7 +5056,7 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level) > > events.sipi_vector = env->sipi_vector; > > - if (has_msr_smbase) { > + if (has_msr_smbase && level >= KVM_PUT_RESET_STATE) { > events.flags |= KVM_VCPUEVENT_VALID_SMM; > events.smi.smm = !!(env->hflags & HF_SMM_MASK); > events.smi.smm_inside_nmi = !!(env->hflags2 & HF2_SMM_INSIDE_NMI_MASK); Hi all, The previous context is from one KVM topic: https://lore-kernel.gnuweeb.org/lkml/f904b674-98ba-4e13-a64c-fd30b6ac4a2e@bytedance.com/T/#m80f426b55e272f9b257e2d7f6ff4902b7eb60178 Please help to review this patch, thanks for your time. Have a nice day, thanks Fei
© 2016 - 2025 Red Hat, Inc.