There is a generic entry framework to handle signals and to check for
reschedule etc. before entering the guest. Use that framework for
powerpc.
Advantages:
- Less code duplication.
- powerpc KVM now understands the NEED_RESCHED (NR) and NEED_RESCHED_LAZY
  (NR_lazy) bits.
- powerpc can enable HAVE_POSIX_CPU_TIMERS_TASK_WORK; currently the
  powerpc/kvm code doesn't handle TIF_NOTIFY_RESUME.
Testing: no splats seen in the cases below.
- Booted a KVM guest on PowerVM and PowerNV systems.
- Ran stress-ng CPU stressors in each of the above cases.
- On a PowerNV host, tried preempt=lazy/full and ran stress-ng CPU stressors
  in the KVM guest.
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
arch/powerpc/Kconfig | 1 +
arch/powerpc/kvm/book3s_hv.c | 13 +++++++------
arch/powerpc/kvm/powerpc.c | 22 ++++++++--------------
3 files changed, 16 insertions(+), 20 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6722625a4..83807ae44 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -299,6 +299,7 @@ config PPC
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
select KASAN_VMALLOC if KASAN && EXECMEM
+ select KVM_XFER_TO_GUEST_WORK
select LOCK_MM_AND_FIND_VMA
select MMU_GATHER_PAGE_SIZE
select MMU_GATHER_RCU_TABLE_FREE
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 19f4d298d..123539642 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -80,8 +80,8 @@
#include <asm/ultravisor.h>
#include <asm/dtl.h>
#include <asm/plpar_wrappers.h>
-
#include <trace/events/ipi.h>
+#include <linux/entry-kvm.h>
#include "book3s.h"
#include "book3s_hv.h"
@@ -4901,7 +4901,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
}
if (need_resched())
- cond_resched();
+ schedule();
kvmppc_update_vpas(vcpu);
@@ -5097,10 +5097,11 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
return -EINVAL;
}
- /* No need to go into the guest when all we'll do is come back out */
- if (signal_pending(current)) {
- run->exit_reason = KVM_EXIT_INTR;
- return -EINTR;
+ /* use generic frameworks to handle signals, need_resched */
+ if (__xfer_to_guest_mode_work_pending()) {
+ r = xfer_to_guest_mode_handle_work(vcpu);
+ if (r)
+ return r;
}
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 153587741..4ff334532 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -34,6 +34,7 @@
#endif
#include <asm/ultravisor.h>
#include <asm/setup.h>
+#include <linux/entry-kvm.h>
#include "timing.h"
#include "../mm/mmu_decl.h"
@@ -80,24 +81,17 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
{
int r;
+ /* use generic framework to handle need resched and signals */
+ if (__xfer_to_guest_mode_work_pending()) {
+ r = xfer_to_guest_mode_handle_work(vcpu);
+ if (r)
+ return r;
+ }
+
WARN_ON(irqs_disabled());
hard_irq_disable();
while (true) {
- if (need_resched()) {
- local_irq_enable();
- cond_resched();
- hard_irq_disable();
- continue;
- }
-
- if (signal_pending(current)) {
- kvmppc_account_exit(vcpu, SIGNAL_EXITS);
- vcpu->run->exit_reason = KVM_EXIT_INTR;
- r = -EINTR;
- break;
- }
-
vcpu->mode = IN_GUEST_MODE;
/*
--
2.48.1
On 2025-04-21 15:58:36 [+0530], Shrikanth Hegde wrote:
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 19f4d298d..123539642 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -80,8 +80,8 @@
> #include <asm/ultravisor.h>
> #include <asm/dtl.h>
> #include <asm/plpar_wrappers.h>
> -
> #include <trace/events/ipi.h>
> +#include <linux/entry-kvm.h>
>
> #include "book3s.h"
> #include "book3s_hv.h"
> @@ -4901,7 +4901,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
> }
>
> if (need_resched())
> - cond_resched();
> + schedule();
This looks unrelated and odd. I don't know why, but this should be a
cond_resched() so it can be optimized away on PREEMPT kernels.
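(For reference, on a preempt=full kernel cond_resched() boils down to nothing;
roughly, simplified from include/linux/sched.h. With PREEMPT_DYNAMIC it is a
static call that gets patched to a no-op instead:)

    #if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC)
    extern int _cond_resched(void);         /* may really reschedule */
    #else
    static inline int _cond_resched(void)
    {
            return 0;                       /* full preemption: nothing to do */
    }
    #endif

    #define cond_resched() _cond_resched()  /* plus a might_resched() debug check */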
> kvmppc_update_vpas(vcpu);
>
> @@ -5097,10 +5097,11 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
> return -EINVAL;
> }
>
> - /* No need to go into the guest when all we'll do is come back out */
> - if (signal_pending(current)) {
> - run->exit_reason = KVM_EXIT_INTR;
> - return -EINTR;
> + /* use generic frameworks to handle signals, need_resched */
> + if (__xfer_to_guest_mode_work_pending()) {
> + r = xfer_to_guest_mode_handle_work(vcpu);
This could be unconditional.
> + if (r)
> + return r;
> }
>
> #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 153587741..4ff334532 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -34,6 +34,7 @@
> #endif
> #include <asm/ultravisor.h>
> #include <asm/setup.h>
> +#include <linux/entry-kvm.h>
>
> #include "timing.h"
> #include "../mm/mmu_decl.h"
> @@ -80,24 +81,17 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
> {
> int r;
>
> + /* use generic framework to handle need resched and signals */
> + if (__xfer_to_guest_mode_work_pending()) {
> + r = xfer_to_guest_mode_handle_work(vcpu);
There is nothing special you do between checking and handling the work. Couldn't
you invoke xfer_to_guest_mode_handle_work() unconditionally?
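(For reference, the helper already does the pending-work check itself. A
simplified sketch of kernel/entry/kvm.c, with the arch hook and the lazy
need-resched bit left out:)

    static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
    {
            do {
                    if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
                            kvm_handle_signal_exit(vcpu);
                            return -EINTR;
                    }

                    if (ti_work & _TIF_NEED_RESCHED)
                            schedule();

                    if (ti_work & _TIF_NOTIFY_RESUME)
                            resume_user_mode_work(NULL);

                    ti_work = read_thread_flags();
            } while (ti_work & XFER_TO_GUEST_MODE_WORK || need_resched());
            return 0;
    }

    int xfer_to_guest_mode_handle_work(struct kvm_vcpu *vcpu)
    {
            unsigned long ti_work = read_thread_flags();

            /* Nothing to do if no work bit is set. */
            if (!(ti_work & XFER_TO_GUEST_MODE_WORK))
                    return 0;

            return xfer_to_guest_mode_work(vcpu, ti_work);
    }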
> + if (r)
> + return r;
> + }
> +
> WARN_ON(irqs_disabled());
> hard_irq_disable();
>
> while (true) {
> - if (need_resched()) {
> - local_irq_enable();
> - cond_resched();
> - hard_irq_disable();
> - continue;
> - }
> -
> - if (signal_pending(current)) {
> - kvmppc_account_exit(vcpu, SIGNAL_EXITS);
> - vcpu->run->exit_reason = KVM_EXIT_INTR;
> - r = -EINTR;
> - break;
I don't know how this works, but couldn't SIGNAL_EXITS vanish now that it
isn't updated anymore? The stat itself moves in kvm_handle_signal_exit()
to a different counter so it is not lost. The reader just needs to look
somewhere else for it.
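(kvm_handle_signal_exit() is tiny; roughly, from include/linux/entry-kvm.h:)

    static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
    {
            vcpu->run->exit_reason = KVM_EXIT_INTR;
            vcpu->stat.signal_exits++;      /* generic signal-exit counter */
    }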
> - }
> -
> vcpu->mode = IN_GUEST_MODE;
>
> /*
Sebastian
On 4/24/25 20:12, Sebastian Andrzej Siewior wrote:
Thanks Sebastian for taking a look.
> On 2025-04-21 15:58:36 [+0530], Shrikanth Hegde wrote:
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 19f4d298d..123539642 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -80,8 +80,8 @@
>> #include <asm/ultravisor.h>
>> #include <asm/dtl.h>
>> #include <asm/plpar_wrappers.h>
>> -
>> #include <trace/events/ipi.h>
>> +#include <linux/entry-kvm.h>
>>
>> #include "book3s.h"
>> #include "book3s_hv.h"
>> @@ -4901,7 +4901,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>> }
>>
>> if (need_resched())
>> - cond_resched();
>> + schedule();
>
> This looks unrelated and odd. I don't know why, but this should be a
> cond_resched() so it can be optimized away on PREEMPT kernels.
This is needed; otherwise KVM on a PowerVM setup gets stuck with preempt=full/lazy.
>
>> kvmppc_update_vpas(vcpu);
>>
>> @@ -5097,10 +5097,11 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
>> return -EINVAL;
>> }
>>
>> - /* No need to go into the guest when all we'll do is come back out */
>> - if (signal_pending(current)) {
>> - run->exit_reason = KVM_EXIT_INTR;
>> - return -EINTR;
>> + /* use generic frameworks to handle signals, need_resched */
>> + if (__xfer_to_guest_mode_work_pending()) {
>> + r = xfer_to_guest_mode_handle_work(vcpu);
> This could be unconditional.
>
>> + if (r)
>> + return r;
>> }
>>
>> #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>> index 153587741..4ff334532 100644
>> --- a/arch/powerpc/kvm/powerpc.c
>> +++ b/arch/powerpc/kvm/powerpc.c
>> @@ -34,6 +34,7 @@
>> #endif
>> #include <asm/ultravisor.h>
>> #include <asm/setup.h>
>> +#include <linux/entry-kvm.h>
>>
>> #include "timing.h"
>> #include "../mm/mmu_decl.h"
>> @@ -80,24 +81,17 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
>> {
>> int r;
>>
>> + /* use generic framework to handle need resched and signals */
>> + if (__xfer_to_guest_mode_work_pending()) {
>> + r = xfer_to_guest_mode_handle_work(vcpu);
>
> There is nothing special you do between checking and handling the work. Couldn't
> you invoke xfer_to_guest_mode_handle_work() unconditionally?
>
I followed what was in arch/x86/kvm/x86.c. Since xfer_to_guest_mode_handle_work does the same check
it makes sense to call it without checks too.
Will update in v2.
>> + if (r)
>> + return r;
>> + }
>> +
>> WARN_ON(irqs_disabled());
>> hard_irq_disable();
>>
>> while (true) {
>> - if (need_resched()) {
>> - local_irq_enable();
>> - cond_resched();
>> - hard_irq_disable();
>> - continue;
>> - }
>> -
>> - if (signal_pending(current)) {
>> - kvmppc_account_exit(vcpu, SIGNAL_EXITS);
>> - vcpu->run->exit_reason = KVM_EXIT_INTR;
>> - r = -EINTR;
>> - break;
>
> I don't know how this works, but couldn't SIGNAL_EXITS vanish now that it
> isn't updated anymore? The stat itself moves in kvm_handle_signal_exit()
> to a different counter so it is not lost. The reader just needs to look
> somewhere else for it.
ok. thanks for pointing out.
AFAIU it is updating the stats mostly. But below could keep the stats happy.
I will update that in v2.
if (__xfer_to_guest_mode_work_pending()) {
r = xfer_to_guest_mode_handle_work(vcpu);
+ /* generic framework doesn't update ppc specific stats*/
+ if (r == -EINTR)
+ kvmppc_account_exit(vcpu, SIGNAL_EXITS);
if (r)
return r;
>
>> - }
>> -
>> vcpu->mode = IN_GUEST_MODE;
>>
>> /*
>
> Sebastian
On 2025-04-24 21:27:59 [+0530], Shrikanth Hegde wrote:
> > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > > index 19f4d298d..123539642 100644
> > > --- a/arch/powerpc/kvm/book3s_hv.c
> > > +++ b/arch/powerpc/kvm/book3s_hv.c
> > > @@ -4901,7 +4901,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
> > > }
> > > if (need_resched())
> > > - cond_resched();
> > > + schedule();
> >
>
>
> > This looks unrelated and odd. I don't know why, but this should be a
> > cond_resched() so it can be optimized away on PREEMPT kernels.
>
> This is needed; otherwise KVM on a PowerVM setup gets stuck with preempt=full/lazy.
But this makes no sense. On preempt=full the cond_resched() gets patched
out while schedule() doesn't. Okay, this explains the hang.
On preempt=full need_resched() should not return true because the
preemption happens right away, unless you are in a preempt-disabled
or interrupt-disabled section. But neither of those can be the case here,
because in both of them you couldn't invoke schedule(). So you must have had
a wakeup on the local CPU which set need-resched but the schedule()
was delayed for some reason. Once that need-resched bit is observed by
a remote CPU, it won't send an interrupt for a scheduling request
because it expects the reschedule to happen any time soon… This should be fixed.
If you replace the above with preempt_disable(); preempt_enable() then it
should also work…
…
> > > --- a/arch/powerpc/kvm/powerpc.c
> > > +++ b/arch/powerpc/kvm/powerpc.c
> > > @@ -34,6 +34,7 @@
> > > #endif
> > > #include <asm/ultravisor.h>
> > > #include <asm/setup.h>
> > > +#include <linux/entry-kvm.h>
> > > #include "timing.h"
> > > #include "../mm/mmu_decl.h"
> > > @@ -80,24 +81,17 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
> > > {
> > > int r;
> > > + /* use generic framework to handle need resched and signals */
> > > + if (__xfer_to_guest_mode_work_pending()) {
> > > + r = xfer_to_guest_mode_handle_work(vcpu);
> >
> > There is nothing special you do between checking and handling the work. Couldn't
> > you invoke xfer_to_guest_mode_handle_work() unconditionally?
> >
>
> I followed what was in arch/x86/kvm/x86.c. Since xfer_to_guest_mode_handle_work does the same check
> it makes sense to call it without checks too.
Yeah but I guess x86 did some other updates, too.
> Will update in v2.
>
…
> > > -
> > > - if (signal_pending(current)) {
> > > - kvmppc_account_exit(vcpu, SIGNAL_EXITS);
> > > - vcpu->run->exit_reason = KVM_EXIT_INTR;
> > > - r = -EINTR;
> > > - break;
> >
> > I don't know how this works, but couldn't SIGNAL_EXITS vanish now that it
> > isn't updated anymore? The stat itself moves in kvm_handle_signal_exit()
> > to a different counter so it is not lost. The reader just needs to look
> > somewhere else for it.
>
> ok. thanks for pointing out.
>
> AFAIU it is updating the stats mostly. But below could keep the stats happy.
> I will update that in v2.
>
> if (__xfer_to_guest_mode_work_pending()) {
> r = xfer_to_guest_mode_handle_work(vcpu);
> + /* generic framework doesn't update ppc specific stats*/
> + if (r == -EINTR)
> + kvmppc_account_exit(vcpu, SIGNAL_EXITS);
> if (r)
> return r;
Either that or you rip it out entirely but that is not my call.
Sebastian
On 4/25/25 00:08, Sebastian Andrzej Siewior wrote:
> On 2025-04-24 21:27:59 [+0530], Shrikanth Hegde wrote:
>>>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>>>> index 19f4d298d..123539642 100644
>>>> --- a/arch/powerpc/kvm/book3s_hv.c
>>>> +++ b/arch/powerpc/kvm/book3s_hv.c
>>>> @@ -4901,7 +4901,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>>>> }
>>>> if (need_resched())
>>>> - cond_resched();
>>>> + schedule();
>>>
>>
>>
>>> This looks unrelated and odd. I don't know why, but this should be a
>>> cond_resched() so it can be optimized away on PREEMPT kernels.
>>
>> This is needed; otherwise KVM on a PowerVM setup gets stuck with preempt=full/lazy.
>
> But this makes no sense. On preempt=full the cond_resched() gets patched
> out while schedule() doesn't. Okay, this explains the hang.
cond_resched() works. What you said about schedule() and the preemption models is right.
Initially I had some other code changes and they were causing it to get stuck. I retested it.
But looking at the semantics of how xfer_to_guest_mode_work is used,
I think using schedule() is probably right here.
Correct me if I got it all wrong.
on x86:
kvm_arch_vcpu_ioctl_run
vcpu_run
for () {
.. run guest..
xfer_to_guest_mode_handle_work
schedule
}
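(That loop, sketched from memory and simplified from arch/x86/kvm/x86.c:vcpu_run(),
with request handling and most SRCU details trimmed:)

    for (;;) {
            if (kvm_vcpu_running(vcpu))
                    r = vcpu_enter_guest(vcpu);
            else
                    r = vcpu_block(vcpu);
            if (r <= 0)
                    break;

            /* ... timer/irq injection requests ... */

            if (__xfer_to_guest_mode_work_pending()) {
                    kvm_vcpu_srcu_read_unlock(vcpu);
                    r = xfer_to_guest_mode_handle_work(vcpu);
                    kvm_vcpu_srcu_read_lock(vcpu);
                    if (r)
                            return r;
            }
    }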
on Powerpc: ( taking book3s_hv flavour):
kvm_arch_vcpu_ioctl_run
kvmppc_vcpu_run_hv *1
do while() {
kvmhv_run_single_vcpu or kvmppc_run_vcpu
-- checking for need_resched and signals and bails out *2
}
*1 - checks for need resched and signals before entering guest
*2 - checks for need resched and signals while running the guest
This patch is addressing only *1, but it needs to address *2 as well using the generic framework.
I think it is doable for book3s_hv at least (though it might need a rewrite).
__kvmppc_vcpu_run is still a black box to me. I think it first makes sense
to move it to C and then try to use xfer_to_guest_mode_handle_work.
Nick, Vaibhav, any idea how __kvmppc_vcpu_run handles signal_pending and need_resched?
So this is going to need more work, especially on *2, and doing that is also key for preempt=lazy/full to work
for KVM on powerpc. Will try to figure it out.
On 2025-04-25 16:49:19 [+0530], Shrikanth Hegde wrote:
> On 4/25/25 00:08, Sebastian Andrzej Siewior wrote:
> > On 2025-04-24 21:27:59 [+0530], Shrikanth Hegde wrote:
> > > > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > > > > index 19f4d298d..123539642 100644
> > > > > --- a/arch/powerpc/kvm/book3s_hv.c
> > > > > +++ b/arch/powerpc/kvm/book3s_hv.c
> > > > > @@ -4901,7 +4901,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
> > > > > }
> > > > > if (need_resched())
> > > > > - cond_resched();
> > > > > + schedule();
> > > >
> > >
> > >
> > > > This looks unrelated and odd. I don't know why, but this should be a
> > > > cond_resched() so it can be optimized away on PREEMPT kernels.
> > >
> > > This is needed; otherwise KVM on a PowerVM setup gets stuck with preempt=full/lazy.
> >
> > But this makes no sense. On preempt=full the cond_resched() gets patched
> > out while schedule() doesn't. Okay, this explains the hang.
>
> cond_resched() works. What you said about schedule() and the preemption models is right.
> Initially I had some other code changes and they were causing it to get stuck. I retested it.
so it is unrelated then ;)
> But looking at the semantics of how xfer_to_guest_mode_work is used,
> I think using schedule() is probably right here.
> Correct me if I got it all wrong.
No, if you do xfer_to_guest_mode_work() then it will invoke schedule()
when appropriate. It's just that the thing in kvmhv_run_single_vcpu() looks odd
and might have been duct tape or an accident and could probably be
removed.
> on x86:
> kvm_arch_vcpu_ioctl_run
> vcpu_run
> for () {
> .. run guest..
> xfer_to_guest_mode_handle_work
> schedule
> }
>
>
> on Powerpc: ( taking book3s_hv flavour):
> kvm_arch_vcpu_ioctl_run
> kvmppc_vcpu_run_hv *1
> do while() {
> kvmhv_run_single_vcpu or kvmppc_run_vcpu
> -- checking for need_resched and signals and bails out *2
> }
>
>
> *1 - checks for need resched and signals before entering guest
I don't see the need_resched() check here.
> *2 - checks for need resched and signals while running the guest
>
>
> This patch is addressing only *1, but it needs to address *2 as well using the generic framework.
> I think it is doable for book3s_hv at least (though it might need a rewrite).
>
> __kvmppc_vcpu_run is still a black box to me. I think it first makes sense
> to move it to C and then try to use xfer_to_guest_mode_handle_work.
> Nick, Vaibhav, any idea how __kvmppc_vcpu_run handles signal_pending and need_resched?
>
>
> So this is going to need more work, especially on *2, and doing that is also key for preempt=lazy/full to work
> for KVM on powerpc. Will try to figure it out.
Okay.
Sebastian
On 4/25/25 19:01, Sebastian Andrzej Siewior wrote:
> On 2025-04-25 16:49:19 [+0530], Shrikanth Hegde wrote:
>> On 4/25/25 00:08, Sebastian Andrzej Siewior wrote:
>>> On 2025-04-24 21:27:59 [+0530], Shrikanth Hegde wrote:
>>>>>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>>>>>> index 19f4d298d..123539642 100644
>>>>>> --- a/arch/powerpc/kvm/book3s_hv.c
>>>>>> +++ b/arch/powerpc/kvm/book3s_hv.c
>>>>>> @@ -4901,7 +4901,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>>>>>> }
>>>>>> if (need_resched())
>>>>>> - cond_resched();
>>>>>> + schedule();
>>>>>
>>>>
>>>>
>>>>> This looks unrelated and odd. I don't know why, but this should be a
>>>>> cond_resched() so it can be optimized away on PREEMPT kernels.
>>>>
>>>> This is needed; otherwise KVM on a PowerVM setup gets stuck with preempt=full/lazy.
>>>
When I look back, I think the below makes sense, since this is what even the
generic framework will do, and it would work for all preemption models.
Since this happens in a loop, it is necessary to check for reschedule:

    if (need_resched())
            schedule();

Ideally we would call xfer_to_guest_mode_handle_work() in the loop, which
would handle the above too.
>>> But this makes no sense. On preempt=full the cond_resched() gets patched
>>> out while schedule() doesn't. Okay, this explains the hang.
>>
>> cond_resched() works. What you said about schedule() and the preemption models is right.
>> Initially I had some other code changes and they were causing it to get stuck. I retested it.
>
> so it is unrelated then ;)
>
>> But looking at the semantics of how xfer_to_guest_mode_work is used,
>> I think using schedule() is probably right here.
>> Correct me if I got it all wrong.
>
> No, if you do xfer_to_guest_mode_work() then it will invoke schedule()
> when appropriate. It's just that the thing in kvmhv_run_single_vcpu() looks odd
> and might have been duct tape or an accident and could probably be
> removed.
>
>> on x86:
>> kvm_arch_vcpu_ioctl_run
>> vcpu_run
>> for () {
>> .. run guest..
>> xfer_to_guest_mode_handle_work
>> schedule
>> }
>>
>>
>> on Powerpc: ( taking book3s_hv flavour):
>> kvm_arch_vcpu_ioctl_run
>> kvmppc_vcpu_run_hv *1
>> do while() {
>> kvmhv_run_single_vcpu or kvmppc_run_vcpu
>> -- checking for need_resched and signals and bails out *2
>> }
>>
>>
>> *1 - checks for need resched and signals before entering guest
> I don't see the need_resched() check here.
>
>> *2 - checks for need resched and signals while running the guest
>>
>>
>> This patch is addressing only *1, but it needs to address *2 as well using the generic framework.
>> I think it is doable for book3s_hv at least (though it might need a rewrite).
>>
>> __kvmppc_vcpu_run is still a black box to me. I think it first makes sense
>> to move it to C and then try to use xfer_to_guest_mode_handle_work.
>> Nick, Vaibhav, any idea how __kvmppc_vcpu_run handles signal_pending and need_resched?
>>
>>
>> So this is going to need more work, especially on *2, and doing that is also key for preempt=lazy/full to work
>> for KVM on powerpc. Will try to figure it out.
>
> Okay.
>
> Sebastian
On 4/25/25 19:01, Sebastian Andrzej Siewior wrote:
> On 2025-04-25 16:49:19 [+0530], Shrikanth Hegde wrote:
>> On 4/25/25 00:08, Sebastian Andrzej Siewior wrote:
>>> On 2025-04-24 21:27:59 [+0530], Shrikanth Hegde wrote:
>>>>>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>>>>>> index 19f4d298d..123539642 100644
>>>>>> --- a/arch/powerpc/kvm/book3s_hv.c
>>>>>> +++ b/arch/powerpc/kvm/book3s_hv.c
>>>>>> @@ -4901,7 +4901,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>>>>>> }
>>>>>> if (need_resched())
>>>>>> - cond_resched();
>>>>>> + schedule();
>>>>>
>>>>
>>>>
>>>>> This looks unrelated and odd. I don't know why, but this should be a
>>>>> cond_resched() so it can be optimized away on PREEMPT kernels.
>>>>
>>>> This is needed; otherwise KVM on a PowerVM setup gets stuck with preempt=full/lazy.
>>>
>>> But this makes no sense. On preempt=full the cond_resched() gets patched
>>> out while schedule() doesn't. Okay, this explains the hang.
>>
>> cond_resched() works. What you said about schedule() and the preemption models is right.
>> Initially I had some other code changes and they were causing it to get stuck. I retested it.
>
> so it is unrelated then ;)
>
>> But looking at the semantics of how xfer_to_guest_mode_work is used,
>> I think using schedule() is probably right here.
>> Correct me if I got it all wrong.
>
> No, if you do xfer_to_guest_mode_work() then it will invoke schedule()
> when appropriate. It's just that the thing in kvmhv_run_single_vcpu() looks odd
> and might have been duct tape or an accident and could probably be
> removed.
>
I was wondering whether xfer_to_guest_mode_work could also call cond_resched()
instead of schedule(), since preempt=full/lazy should be preemptible
as early as possible, right?
>> on x86:
>> kvm_arch_vcpu_ioctl_run
>> vcpu_run
>> for () {
>> .. run guest..
>> xfer_to_guest_mode_handle_work
>> schedule
>> }
>>
>>
>> on Powerpc: ( taking book3s_hv flavour):
>> kvm_arch_vcpu_ioctl_run
>> kvmppc_vcpu_run_hv *1
>> do while() {
>> kvmhv_run_single_vcpu or kvmppc_run_vcpu
>> -- checking for need_resched and signals and bails out *2
>> }
>>
>>
>> *1 - checks for need resched and signals before entering guest
> I don't see the need_resched() check here.
>
Right, I think *2 is critical since it is in a loop.
*1 is probably an optimization to skip a few cycles.
>> *2 - checks for need resched and signals while running the guest
>>
>>
>> This patch is addressing only *1, but it needs to address *2 as well using the generic framework.
>> I think it is doable for book3s_hv at least (though it might need a rewrite).
>>
>> __kvmppc_vcpu_run is still a black box to me. I think it first makes sense
>> to move it to C and then try to use xfer_to_guest_mode_handle_work.
>> Nick, Vaibhav, any idea how __kvmppc_vcpu_run handles signal_pending and need_resched?
>>
>>
>> So this is going to need more work, especially on *2, and doing that is also key for preempt=lazy/full to work
>> for KVM on powerpc. Will try to figure it out.
>
> Okay.
>
> Sebastian
On 2025-04-25 19:54:56 [+0530], Shrikanth Hegde wrote:
> > > But looking at the semantics of how xfer_to_guest_mode_work is used,
> > > I think using schedule() is probably right here.
> > > Correct me if I got it all wrong.
> >
> > No, if you do xfer_to_guest_mode_work() then it will invoke schedule()
> > when appropriate. It's just that the thing in kvmhv_run_single_vcpu() looks odd
> > and might have been duct tape or an accident and could probably be
> > removed.
> >
> I was wondering whether xfer_to_guest_mode_work could also call cond_resched()
> instead of schedule(), since preempt=full/lazy should be preemptible
> as early as possible, right?
No, I think it is okay. For preempt=full you shouldn't observe the flag
in xfer_to_guest_mode_work() so it does not matter.
Sebastian