If an interrupt arrives while a vCPU is running, the vCPU exits with an
interrupt exception. Currently the interrupt is handled only after
local_irq_enable() is called, and it is handled by the host kernel rather
than by the KVM hypervisor. This introduces an extra interrupt exception
before the host handles the irq.

If the KVM hypervisor detects that the exit is an interrupt exception, the
interrupt can be handled early in the KVM hypervisor, before
local_irq_enable() is called.

On a dual-way 3C5000 machine, there is a 10% -- 15% performance
improvement with the netperf UDP_RR test on a 10G ethernet card.

                   original    with patch    improvement
  netperf UDP_RR   7200        8100          +12%

The overall performance is low because the irqchip is emulated in the QEMU
VMM; however, on the same testbed there is still a measurable performance
improvement.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
---
v1 ... v2:
1. Move guest_timing_exit_irqoff() after host interrupt handling like
other architectures.
2. Construct interrupt context pt_regs from guest entering context
3. Add cond_resched() after irq enabling
---
arch/loongarch/kernel/traps.c | 1 +
arch/loongarch/kvm/vcpu.c | 36 ++++++++++++++++++++++++++++++++++-
2 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/arch/loongarch/kernel/traps.c b/arch/loongarch/kernel/traps.c
index 2ec3106c0da3..eed0d8b02ee3 100644
--- a/arch/loongarch/kernel/traps.c
+++ b/arch/loongarch/kernel/traps.c
@@ -1114,6 +1114,7 @@ asmlinkage void noinstr do_vint(struct pt_regs *regs, unsigned long sp)
irqentry_exit(regs, state);
}
+EXPORT_SYMBOL(do_vint);
unsigned long eentry;
unsigned long tlbrentry;
diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
index 9e1a9b4aa4c6..bab7a71eb965 100644
--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -5,6 +5,7 @@
#include <linux/kvm_host.h>
#include <linux/entry-kvm.h>
+#include <asm/exception.h>
#include <asm/fpu.h>
#include <asm/lbt.h>
#include <asm/loongarch.h>
@@ -304,6 +305,23 @@ static int kvm_pre_enter_guest(struct kvm_vcpu *vcpu)
return ret;
}
+static void kvm_handle_irq(struct kvm_vcpu *vcpu)
+{
+ struct pt_regs regs, *old;
+
+ /*
+ * Construct pseudo pt_regs, only necessary registers is added
+ * Interrupt context coming from guest enter context
+ */
+ old = (struct pt_regs *)(vcpu->arch.host_sp - sizeof(struct pt_regs));
+ /* Disable preemption in irq exit function irqentry_exit() */
+ regs.csr_prmd = 0;
+ regs.regs[LOONGARCH_GPR_SP] = vcpu->arch.host_sp;
+ regs.regs[LOONGARCH_GPR_FP] = old->regs[LOONGARCH_GPR_FP];
+ regs.csr_era = old->regs[LOONGARCH_GPR_RA];
+ do_vint(&regs, (unsigned long)&regs);
+}
+
/*
* Return 1 for resume guest and "<= 0" for resume host.
*/
@@ -321,8 +339,23 @@ static int kvm_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu)
kvm_lose_pmu(vcpu);
- guest_timing_exit_irqoff();
guest_state_exit_irqoff();
+
+ /*
+ * VM exit because of host interrupts
+ * Handle irq directly before enabling irq
+ */
+ if (!ecode && intr)
+ kvm_handle_irq(vcpu);
+
+ /*
+ * Wait until after servicing IRQs to account guest time so that any
+ * ticks that occurred while running the guest are properly accounted
+ * to the guest. Waiting until IRQs are enabled degrades the accuracy
+ * of accounting via context tracking, but the loss of accuracy is
+ * acceptable for all known use cases.
+ */
+ guest_timing_exit_irqoff();
local_irq_enable();
trace_kvm_exit(vcpu, ecode);
@@ -331,6 +364,7 @@ static int kvm_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu)
} else {
WARN(!intr, "vm exiting with suspicious irq\n");
++vcpu->stat.int_exits;
+ cond_resched();
}
if (ret == RESUME_GUEST)
base-commit: 80e54e84911a923c40d7bee33a34c1b4be148d7a
--
2.39.3
Hi Paolo, Sean

This idea comes from x86. Do you have any guidance or suggestions about it?

Also, I notice that x86 has the following irq_enable()/irq_disable() pair,
and I do not know why it is needed:

    local_irq_enable();
    ++vcpu->stat.exits;
    local_irq_disable();
    guest_timing_exit_irqoff();
    local_irq_enable();

Regards
Bibo Mao

On 2025/3/11 3:47 PM, Bibo Mao wrote:
> If interrupt arrive when vCPU is running, vCPU will exit because of
> interrupt exception. Currently interrupt exception is handled after
> local_irq_enable() is called, and it is handled by host kernel rather
> than KVM hypervisor. It will introduce extra another interrupt
> exception and then host will handle irq.
>
> If KVM hypervisor detect that it is interrupt exception, interrupt
> can be handle early in KVM hypervisor before local_irq_enable() is
> called.
>
> On 3C5000 dual-way machine, there will be 10% -- 15% performance
> improvement with netperf UDP_RR option with 10G ethernet card.
> original with patch improvement
> netperf UDP_RR 7200 8100 +12%
>
> The total performance is low because irqchip is emulated in qemu VMM,
> however from the same testbed, there is performance improvement
> actually.
>
> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> ---
> v1 ... v2:
> 1. Move guest_timing_exit_irqoff() after host interrupt handling like
> other architectures.
> 2. Construct interrupt context pt_regs from guest entering context
> 3. Add cond_resched() after irq enabling
> ---
> arch/loongarch/kernel/traps.c | 1 +
> arch/loongarch/kvm/vcpu.c | 36 ++++++++++++++++++++++++++++++++++-
> 2 files changed, 36 insertions(+), 1 deletion(-)
>
> diff --git a/arch/loongarch/kernel/traps.c b/arch/loongarch/kernel/traps.c
> index 2ec3106c0da3..eed0d8b02ee3 100644
> --- a/arch/loongarch/kernel/traps.c
> +++ b/arch/loongarch/kernel/traps.c
> @@ -1114,6 +1114,7 @@ asmlinkage void noinstr do_vint(struct pt_regs *regs, unsigned long sp)
>
> irqentry_exit(regs, state);
> }
> +EXPORT_SYMBOL(do_vint);
>
> unsigned long eentry;
> unsigned long tlbrentry;
> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
> index 9e1a9b4aa4c6..bab7a71eb965 100644
> --- a/arch/loongarch/kvm/vcpu.c
> +++ b/arch/loongarch/kvm/vcpu.c
> @@ -5,6 +5,7 @@
>
> #include <linux/kvm_host.h>
> #include <linux/entry-kvm.h>
> +#include <asm/exception.h>
> #include <asm/fpu.h>
> #include <asm/lbt.h>
> #include <asm/loongarch.h>
> @@ -304,6 +305,23 @@ static int kvm_pre_enter_guest(struct kvm_vcpu *vcpu)
> return ret;
> }
>
> +static void kvm_handle_irq(struct kvm_vcpu *vcpu)
> +{
> + struct pt_regs regs, *old;
> +
> + /*
> + * Construct pseudo pt_regs, only necessary registers is added
> + * Interrupt context coming from guest enter context
> + */
> + old = (struct pt_regs *)(vcpu->arch.host_sp - sizeof(struct pt_regs));
> + /* Disable preemption in irq exit function irqentry_exit() */
> + regs.csr_prmd = 0;
> + regs.regs[LOONGARCH_GPR_SP] = vcpu->arch.host_sp;
> + regs.regs[LOONGARCH_GPR_FP] = old->regs[LOONGARCH_GPR_FP];
> + regs.csr_era = old->regs[LOONGARCH_GPR_RA];
> + do_vint(&regs, (unsigned long)&regs);
> +}
> +
> /*
> * Return 1 for resume guest and "<= 0" for resume host.
> */
> @@ -321,8 +339,23 @@ static int kvm_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu)
>
> kvm_lose_pmu(vcpu);
>
> - guest_timing_exit_irqoff();
> guest_state_exit_irqoff();
> +
> + /*
> + * VM exit because of host interrupts
> + * Handle irq directly before enabling irq
> + */
> + if (!ecode && intr)
> + kvm_handle_irq(vcpu);
> +
> + /*
> + * Wait until after servicing IRQs to account guest time so that any
> + * ticks that occurred while running the guest are properly accounted
> + * to the guest. Waiting until IRQs are enabled degrades the accuracy
> + * of accounting via context tracking, but the loss of accuracy is
> + * acceptable for all known use cases.
> + */
> + guest_timing_exit_irqoff();
> local_irq_enable();
>
> trace_kvm_exit(vcpu, ecode);
> @@ -331,6 +364,7 @@ static int kvm_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu)
> } else {
> WARN(!intr, "vm exiting with suspicious irq\n");
> ++vcpu->stat.int_exits;
> + cond_resched();
> }
>
> if (ret == RESUME_GUEST)
>
> base-commit: 80e54e84911a923c40d7bee33a34c1b4be148d7a
>

On Tue, Mar 25, 2025, bibo mao wrote:
> Hi Paolo, Sean
>
> This idea comes from x86, do you have any guidance or suggestion about it?
>
> Also I notice that there is such irq_enable()/irq_disable() pair on x86, I
> do not know why it is so.

Because on AMD (SVM), IRQ VM-Exits don't consume the IRQ, i.e. the exit is purely
a notification. KVM still needs to enable IRQs to actually handle the pending IRQ.
And if the IRQ that triggered VM-Exit is for the host's tick, then it's desirable
to handle the tick IRQ before guest_timing_exit_irqoff() so that the timeslice is
accounted to the guest, not the host (the tick IRQ arrived while the guest was
active).

On Intel (VMX), KVM always runs in a mode where the VM-Exit acknowledges/consumes
the IRQ, and so KVM _must_ manually call into the appropriate interrupt handler.

>   local_irq_enable();
>   ++vcpu->stat.exits;
>   local_irq_disable();
>   guest_timing_exit_irqoff();
>   local_irq_enable();
>
> Regards
> Bibo Mao
>
> On 2025/3/11 3:47 PM, Bibo Mao wrote:
> > If interrupt arrive when vCPU is running, vCPU will exit because of
> > interrupt exception. Currently interrupt exception is handled after
> > local_irq_enable() is called, and it is handled by host kernel rather
> > than KVM hypervisor. It will introduce extra another interrupt
> > exception and then host will handle irq.
> >
> > If KVM hypervisor detect that it is interrupt exception, interrupt
> > can be handle early in KVM hypervisor before local_irq_enable() is
> > called.

The correctness of this depends on how LoongArch virtualization processes IRQs.
If the IRQ is consumed by the VM-Exit, then manually handling the IRQ early is
both optimal and necessary for correctness. If the IRQ is NOT consumed by the
VM-Exit, then manually calling the interrupt handler from KVM will result in every
IRQ effectively happening twice: once on the manual call, and again when KVM
enables IRQs and the "real" IRQ fires.
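
The ordering argument about the host tick can be illustrated with a small
userspace model. This is only a sketch, not kernel code: struct tick_model
and the model_*() helpers are hypothetical names made up for illustration,
and the model assumes a single tick IRQ that arrived while the guest was
running. It charges the tick to whichever side is "active" at the moment the
tick handler runs.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct tick_model {
    bool guest_timing_active; /* true until the modeled guest timing exit */
    bool tick_pending;        /* host tick IRQ arrived while guest ran */
    unsigned guest_ticks;     /* ticks charged to the guest */
    unsigned host_ticks;      /* ticks charged to the host */
};

/* Service the pending tick and charge it to the currently active side. */
void model_handle_tick(struct tick_model *t)
{
    if (!t->tick_pending)
        return;
    t->tick_pending = false;
    if (t->guest_timing_active)
        t->guest_ticks++;
    else
        t->host_ticks++;
}

/* Stands in for the point where guest time accounting stops. */
void model_guest_timing_exit(struct tick_model *t)
{
    t->guest_timing_active = false;
}

/* Run one modeled exit with either ordering and return the final state. */
struct tick_model model_exit_path(bool handle_irq_first)
{
    struct tick_model t = { .guest_timing_active = true, .tick_pending = true };

    if (handle_irq_first) {
        model_handle_tick(&t);
        model_guest_timing_exit(&t);
    } else {
        model_guest_timing_exit(&t);
        model_handle_tick(&t);
    }
    return t;
}
```

In this model, model_exit_path(true) charges the tick to the guest, while
model_exit_path(false) charges it to the host, which is the accounting
difference described above.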

On 2025/3/26 11:09 PM, Sean Christopherson wrote:
> On Tue, Mar 25, 2025, bibo mao wrote:
>> Hi Paolo, Sean
>>
>> This idea comes from x86, do you have any guidance or suggestion about it?
>>
>> Also I notice that there is such irq_enable()/irq_disable() pair on x86, I
>> do not know why it is so.
>
> Because on AMD (SVM), IRQ VM-Exits don't consume the IRQ, i.e. the exit is purely
> a notification. KVM still needs to enable IRQs to actually handle the pending IRQ.

Good design. Previously, only some realtime HW platforms allowed a HW
interrupt to be configured as a high-priority event. With this design, an IRQ
triggers a VM-Exit but there is no IRQ context, since it is treated as an
async event.

> And if the IRQ that triggered VM-Exit is for the host's tick, then it's desirable
> to handle the tick IRQ before guest_timing_exit_irqoff() so that the timeslice is
> accounted to the guest, not the host (the tick IRQ arrived while the guest was
> active).
>
> On Intel (VMX), KVM always runs in a mode where the VM-Exit acknowledge/consumes
> the IRQ, and so KVM _must_ manually call into the appropriate interrupt handler.
>
>>   local_irq_enable();
>>   ++vcpu->stat.exits;
>>   local_irq_disable();
>>   guest_timing_exit_irqoff();
>>   local_irq_enable();
>>
>> Regards
>> Bibo Mao
>>
>> On 2025/3/11 3:47 PM, Bibo Mao wrote:
>>> If interrupt arrive when vCPU is running, vCPU will exit because of
>>> interrupt exception. Currently interrupt exception is handled after
>>> local_irq_enable() is called, and it is handled by host kernel rather
>>> than KVM hypervisor. It will introduce extra another interrupt
>>> exception and then host will handle irq.
>>>
>>> If KVM hypervisor detect that it is interrupt exception, interrupt
>>> can be handle early in KVM hypervisor before local_irq_enable() is
>>> called.
>
> The correctness of this depends on how LoongArch virtualization processes IRQs.
> If the IRQ is consumed by the VM-Exit, then manually handling the IRQ early is
> both optimal and necessary for correctness. If the IRQ is NOT consumed by the

LoongArch KVM is similar to Intel VMX: a host interrupt causes a VM-Exit, and
there will also be an extra interrupt exception if local_irq_enable() is
called in the VM-Exit path.

> VM-Exit, then manually calling the interrupt handler from KVM will result in every
> IRQ effectively happening twice: once on the manual call, and against when KVM

In testing on the LoongArch platform, manually calling the IRQ handler at an
early stage lowers the interrupt level and acks the IRQ. The IRQ does not
trigger again. So I think it is consumed by the VM-Exit.

And thanks for your guidance.

Regards
Bibo Mao

> enables IRQs and the "real" IRQ fires.
>
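
The cases discussed in this thread can be compared with a toy userspace
model. This is a sketch under stated assumptions, not a description of any
real hardware: struct irq_model and the model_*() functions are hypothetical
names, and the two flags are simplifications. exit_consumes models a VM-Exit
that itself acknowledges the IRQ, and handler_lowers_level models a handler
whose execution deasserts the interrupt, as reported from the LoongArch test.

```c
#include <stdbool.h>

/* Toy model only: every name below is hypothetical, not a kernel API. */
struct irq_model {
    bool pending;              /* IRQ still asserted toward the host CPU */
    bool exit_consumes;        /* the VM-Exit itself acknowledges the IRQ */
    bool handler_lowers_level; /* running the handler deasserts the IRQ */
    unsigned handler_runs;     /* times the host handler executed */
    unsigned extra_exceptions; /* exceptions taken once IRQs are enabled */
};

/* A host IRQ forces the exit; it stays pending unless the exit consumes it. */
void model_vm_exit(struct irq_model *m)
{
    m->pending = !m->exit_consumes;
}

/* The hypervisor calls the handler directly, with IRQs still disabled. */
void model_manual_handler(struct irq_model *m)
{
    m->handler_runs++;
    if (m->handler_lowers_level)
        m->pending = false;
}

/* Enabling IRQs delivers anything still pending as a fresh exception. */
void model_irq_enable(struct irq_model *m)
{
    if (m->pending) {
        m->extra_exceptions++;
        m->handler_runs++;
        m->pending = false;
    }
}

/* Run one modeled exit and return the final counters. */
struct irq_model model_run(bool exit_consumes, bool handler_lowers_level,
                           bool call_manually)
{
    struct irq_model m = {
        .exit_consumes = exit_consumes,
        .handler_lowers_level = handler_lowers_level,
    };

    model_vm_exit(&m);
    if (call_manually)
        model_manual_handler(&m);
    model_irq_enable(&m);
    return m;
}
```

In this model, an exit that consumes the IRQ needs the manual call or the
handler never runs. An exit that is only a notification, paired with a
handler that does not clear the source, runs the handler twice when it is
also called manually. When the handler lowers the level, the manual call
yields a single handler run with no extra exception, while skipping it yields
a single run plus one extra exception.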