There is a generic entry framework to handle signals and to check for
reschedule etc. before entering the guest. Use that framework for
powerpc.
Advantages:
- Less code duplication.
- powerpc KVM now understands the NEED_RESCHED (NR) and NEED_RESCHED_LAZY
  (NR_lazy) bits.
- powerpc can enable HAVE_POSIX_CPU_TIMERS_TASK_WORK; currently the
  powerpc/kvm code doesn't handle TIF_NOTIFY_RESUME.
Testing: no splats seen in the cases below.
- Booted a KVM guest on PowerVM and PowerNV systems.
- Ran stress-ng CPU stressors in each of the above cases.
- On a PowerNV host, tried preempt=lazy/full and ran stress-ng CPU stressors
  in the KVM guest.
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
arch/powerpc/Kconfig | 1 +
arch/powerpc/kvm/book3s_hv.c | 13 +++++++------
arch/powerpc/kvm/powerpc.c | 22 ++++++++--------------
3 files changed, 16 insertions(+), 20 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6722625a4..83807ae44 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -299,6 +299,7 @@ config PPC
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
select KASAN_VMALLOC if KASAN && EXECMEM
+ select KVM_XFER_TO_GUEST_WORK
select LOCK_MM_AND_FIND_VMA
select MMU_GATHER_PAGE_SIZE
select MMU_GATHER_RCU_TABLE_FREE
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 19f4d298d..123539642 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -80,8 +80,8 @@
#include <asm/ultravisor.h>
#include <asm/dtl.h>
#include <asm/plpar_wrappers.h>
-
#include <trace/events/ipi.h>
+#include <linux/entry-kvm.h>
#include "book3s.h"
#include "book3s_hv.h"
@@ -4901,7 +4901,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
}
if (need_resched())
- cond_resched();
+ schedule();
kvmppc_update_vpas(vcpu);
@@ -5097,10 +5097,11 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
return -EINVAL;
}
- /* No need to go into the guest when all we'll do is come back out */
- if (signal_pending(current)) {
- run->exit_reason = KVM_EXIT_INTR;
- return -EINTR;
+ /* use generic frameworks to handle signals, need_resched */
+ if (__xfer_to_guest_mode_work_pending()) {
+ r = xfer_to_guest_mode_handle_work(vcpu);
+ if (r)
+ return r;
}
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 153587741..4ff334532 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -34,6 +34,7 @@
#endif
#include <asm/ultravisor.h>
#include <asm/setup.h>
+#include <linux/entry-kvm.h>
#include "timing.h"
#include "../mm/mmu_decl.h"
@@ -80,24 +81,17 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
{
int r;
+ /* use generic framework to handle need resched and signals */
+ if (__xfer_to_guest_mode_work_pending()) {
+ r = xfer_to_guest_mode_handle_work(vcpu);
+ if (r)
+ return r;
+ }
+
WARN_ON(irqs_disabled());
hard_irq_disable();
while (true) {
- if (need_resched()) {
- local_irq_enable();
- cond_resched();
- hard_irq_disable();
- continue;
- }
-
- if (signal_pending(current)) {
- kvmppc_account_exit(vcpu, SIGNAL_EXITS);
- vcpu->run->exit_reason = KVM_EXIT_INTR;
- r = -EINTR;
- break;
- }
-
vcpu->mode = IN_GUEST_MODE;
/*
--
2.48.1
On 2025-04-21 15:58:36 [+0530], Shrikanth Hegde wrote:
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 19f4d298d..123539642 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -80,8 +80,8 @@
> #include <asm/ultravisor.h>
> #include <asm/dtl.h>
> #include <asm/plpar_wrappers.h>
> -
> #include <trace/events/ipi.h>
> +#include <linux/entry-kvm.h>
>
> #include "book3s.h"
> #include "book3s_hv.h"
> @@ -4901,7 +4901,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
> }
>
> if (need_resched())
> - cond_resched();
> + schedule();
This looks unrelated and odd. I don't know why, but this should be a
cond_resched() so it can be optimized away on PREEMPT kernels.
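(For reference, on a preempt=full kernel cond_resched() boils down to nothing;
roughly, simplified from include/linux/sched.h. With PREEMPT_DYNAMIC it is a
static call that gets patched to a no-op instead:)

    #if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC)
    extern int _cond_resched(void);         /* may really reschedule */
    #else
    static inline int _cond_resched(void)
    {
            return 0;                       /* full preemption: nothing to do */
    }
    #endif

    #define cond_resched() _cond_resched()  /* plus a might_resched() debug check */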
> kvmppc_update_vpas(vcpu);
>
> @@ -5097,10 +5097,11 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
> return -EINVAL;
> }
>
> - /* No need to go into the guest when all we'll do is come back out */
> - if (signal_pending(current)) {
> - run->exit_reason = KVM_EXIT_INTR;
> - return -EINTR;
> + /* use generic frameworks to handle signals, need_resched */
> + if (__xfer_to_guest_mode_work_pending()) {
> + r = xfer_to_guest_mode_handle_work(vcpu);
This could be unconditional.
> + if (r)
> + return r;
> }
>
> #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 153587741..4ff334532 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -34,6 +34,7 @@
> #endif
> #include <asm/ultravisor.h>
> #include <asm/setup.h>
> +#include <linux/entry-kvm.h>
>
> #include "timing.h"
> #include "../mm/mmu_decl.h"
> @@ -80,24 +81,17 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
> {
> int r;
>
> + /* use generic framework to handle need resched and signals */
> + if (__xfer_to_guest_mode_work_pending()) {
> + r = xfer_to_guest_mode_handle_work(vcpu);
There is nothing special you do between checking and handling the work. Couldn't
you invoke xfer_to_guest_mode_handle_work() unconditionally?
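(For reference, the helper already does the pending-work check itself. A
simplified sketch of kernel/entry/kvm.c, with the arch hook and the lazy
need-resched bit left out:)

    static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
    {
            do {
                    if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
                            kvm_handle_signal_exit(vcpu);
                            return -EINTR;
                    }

                    if (ti_work & _TIF_NEED_RESCHED)
                            schedule();

                    if (ti_work & _TIF_NOTIFY_RESUME)
                            resume_user_mode_work(NULL);

                    ti_work = read_thread_flags();
            } while (ti_work & XFER_TO_GUEST_MODE_WORK || need_resched());
            return 0;
    }

    int xfer_to_guest_mode_handle_work(struct kvm_vcpu *vcpu)
    {
            unsigned long ti_work = read_thread_flags();

            /* Nothing to do if no work bit is set. */
            if (!(ti_work & XFER_TO_GUEST_MODE_WORK))
                    return 0;

            return xfer_to_guest_mode_work(vcpu, ti_work);
    }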
> + if (r)
> + return r;
> + }
> +
> WARN_ON(irqs_disabled());
> hard_irq_disable();
>
> while (true) {
> - if (need_resched()) {
> - local_irq_enable();
> - cond_resched();
> - hard_irq_disable();
> - continue;
> - }
> -
> - if (signal_pending(current)) {
> - kvmppc_account_exit(vcpu, SIGNAL_EXITS);
> - vcpu->run->exit_reason = KVM_EXIT_INTR;
> - r = -EINTR;
> - break;
I don't know how this works, but couldn't SIGNAL_EXITS vanish now that it
isn't updated anymore? The stat itself moves in kvm_handle_signal_exit()
to a different counter so it is not lost. The reader just needs to look
somewhere else for it.
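(kvm_handle_signal_exit() is tiny; roughly, from include/linux/entry-kvm.h:)

    static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
    {
            vcpu->run->exit_reason = KVM_EXIT_INTR;
            vcpu->stat.signal_exits++;      /* generic signal-exit counter */
    }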
> - }
> -
> vcpu->mode = IN_GUEST_MODE;
>
> /*
Sebastian
On 4/24/25 20:12, Sebastian Andrzej Siewior wrote:
Thanks Sebastian for taking a look.
> On 2025-04-21 15:58:36 [+0530], Shrikanth Hegde wrote:
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 19f4d298d..123539642 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -80,8 +80,8 @@
>> #include <asm/ultravisor.h>
>> #include <asm/dtl.h>
>> #include <asm/plpar_wrappers.h>
>> -
>> #include <trace/events/ipi.h>
>> +#include <linux/entry-kvm.h>
>>
>> #include "book3s.h"
>> #include "book3s_hv.h"
>> @@ -4901,7 +4901,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>> }
>>
>> if (need_resched())
>> - cond_resched();
>> + schedule();
>
> This looks unrelated and odd. I don't know why, but this should be a
> cond_resched() so it can be optimized away on PREEMPT kernels.
This is needed; otherwise KVM on a PowerVM setup gets stuck with preempt=full/lazy.
>
>> kvmppc_update_vpas(vcpu);
>>
>> @@ -5097,10 +5097,11 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
>> return -EINVAL;
>> }
>>
>> - /* No need to go into the guest when all we'll do is come back out */
>> - if (signal_pending(current)) {
>> - run->exit_reason = KVM_EXIT_INTR;
>> - return -EINTR;
>> + /* use generic frameworks to handle signals, need_resched */
>> + if (__xfer_to_guest_mode_work_pending()) {
>> + r = xfer_to_guest_mode_handle_work(vcpu);
> This could be unconditional.
>
>> + if (r)
>> + return r;
>> }
>>
>> #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>> index 153587741..4ff334532 100644
>> --- a/arch/powerpc/kvm/powerpc.c
>> +++ b/arch/powerpc/kvm/powerpc.c
>> @@ -34,6 +34,7 @@
>> #endif
>> #include <asm/ultravisor.h>
>> #include <asm/setup.h>
>> +#include <linux/entry-kvm.h>
>>
>> #include "timing.h"
>> #include "../mm/mmu_decl.h"
>> @@ -80,24 +81,17 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
>> {
>> int r;
>>
>> + /* use generic framework to handle need resched and signals */
>> + if (__xfer_to_guest_mode_work_pending()) {
>> + r = xfer_to_guest_mode_handle_work(vcpu);
>
> There is nothing special you do between checking and handling the work. Couldn't
> you invoke xfer_to_guest_mode_handle_work() unconditionally?
>
I followed what was in arch/x86/kvm/x86.c. Since xfer_to_guest_mode_handle_work does the same check
it makes sense to call it without checks too.
Will update in v2.
>> + if (r)
>> + return r;
>> + }
>> +
>> WARN_ON(irqs_disabled());
>> hard_irq_disable();
>>
>> while (true) {
>> - if (need_resched()) {
>> - local_irq_enable();
>> - cond_resched();
>> - hard_irq_disable();
>> - continue;
>> - }
>> -
>> - if (signal_pending(current)) {
>> - kvmppc_account_exit(vcpu, SIGNAL_EXITS);
>> - vcpu->run->exit_reason = KVM_EXIT_INTR;
>> - r = -EINTR;
>> - break;
>
> I don't know how this works, but couldn't SIGNAL_EXITS vanish now that it
> isn't updated anymore? The stat itself moves in kvm_handle_signal_exit()
> to a different counter so it is not lost. The reader just needs to look
> somewhere else for it.
ok. thanks for pointing out.
AFAIU it is updating the stats mostly. But below could keep the stats happy.
I will update that in v2.
if (__xfer_to_guest_mode_work_pending()) {
r = xfer_to_guest_mode_handle_work(vcpu);
+ /* generic framework doesn't update ppc specific stats*/
+ if (r == -EINTR)
+ kvmppc_account_exit(vcpu, SIGNAL_EXITS);
if (r)
return r;
>
>> - }
>> -
>> vcpu->mode = IN_GUEST_MODE;
>>
>> /*
>
> Sebastian
On 2025-04-24 21:27:59 [+0530], Shrikanth Hegde wrote:
> > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > > index 19f4d298d..123539642 100644
> > > --- a/arch/powerpc/kvm/book3s_hv.c
> > > +++ b/arch/powerpc/kvm/book3s_hv.c
> > > @@ -4901,7 +4901,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
> > > }
> > > if (need_resched())
> > > - cond_resched();
> > > + schedule();
> >
>
>
> > This looks unrelated and odd. I don't know why, but this should be a
> > cond_resched() so it can be optimized away on PREEMPT kernels.
>
> This is needed; otherwise KVM on a PowerVM setup gets stuck with preempt=full/lazy.
But this makes no sense. On preempt=full the cond_resched() gets patched
out while schedule() doesn't. Okay, this explains the hang.
On preempt=full need_resched() should not return true because the
preemption happens right away, unless you are in a preempt-disabled
or interrupt-disabled section. But neither of those can be the case here,
because in both of them you couldn't invoke schedule(). So you must have had
a wakeup on the local CPU which set need-resched but the schedule()
was delayed for some reason. Once that need-resched bit is observed by
a remote CPU, it won't send an interrupt for a scheduling request
because it expects the reschedule to happen any time soon… This should be fixed.
If you replace the above with preempt_disable(); preempt_enable() then it
should also work…
…
> > > --- a/arch/powerpc/kvm/powerpc.c
> > > +++ b/arch/powerpc/kvm/powerpc.c
> > > @@ -34,6 +34,7 @@
> > > #endif
> > > #include <asm/ultravisor.h>
> > > #include <asm/setup.h>
> > > +#include <linux/entry-kvm.h>
> > > #include "timing.h"
> > > #include "../mm/mmu_decl.h"
> > > @@ -80,24 +81,17 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
> > > {
> > > int r;
> > > + /* use generic framework to handle need resched and signals */
> > > + if (__xfer_to_guest_mode_work_pending()) {
> > > + r = xfer_to_guest_mode_handle_work(vcpu);
> >
> > There is nothing special you do between checking and handling the work. Couldn't
> > you invoke xfer_to_guest_mode_handle_work() unconditionally?
> >
>
> I followed what was in arch/x86/kvm/x86.c. Since xfer_to_guest_mode_handle_work does the same check
> it makes sense to call it without checks too.
Yeah but I guess x86 did some other updates, too.
> Will update in v2.
>
…
> > > -
> > > - if (signal_pending(current)) {
> > > - kvmppc_account_exit(vcpu, SIGNAL_EXITS);
> > > - vcpu->run->exit_reason = KVM_EXIT_INTR;
> > > - r = -EINTR;
> > > - break;
> >
> > I don't know how this works, but couldn't SIGNAL_EXITS vanish now that it
> > isn't updated anymore? The stat itself moves in kvm_handle_signal_exit()
> > to a different counter so it is not lost. The reader just needs to look
> > somewhere else for it.
>
> ok. thanks for pointing out.
>
> AFAIU it is updating the stats mostly. But below could keep the stats happy.
> I will update that in v2.
>
> if (__xfer_to_guest_mode_work_pending()) {
> r = xfer_to_guest_mode_handle_work(vcpu);
> + /* generic framework doesn't update ppc specific stats*/
> + if (r == -EINTR)
> + kvmppc_account_exit(vcpu, SIGNAL_EXITS);
> if (r)
> return r;
Either that or you rip it out entirely but that is not my call.
Sebastian
On 4/25/25 00:08, Sebastian Andrzej Siewior wrote:
> On 2025-04-24 21:27:59 [+0530], Shrikanth Hegde wrote:
>>>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>>>> index 19f4d298d..123539642 100644
>>>> --- a/arch/powerpc/kvm/book3s_hv.c
>>>> +++ b/arch/powerpc/kvm/book3s_hv.c
>>>> @@ -4901,7 +4901,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>>>> }
>>>> if (need_resched())
>>>> - cond_resched();
>>>> + schedule();
>>>
>>
>>
>>> This looks unrelated and odd. I don't know why, but this should be a
>>> cond_resched() so it can be optimized away on PREEMPT kernels.
>>
>> This is needed; otherwise KVM on a PowerVM setup gets stuck with preempt=full/lazy.
>
> But this makes no sense. On preempt=full the cond_resched() gets patched
> out while schedule() doesn't. Okay, this explains the hang.
cond_resched() works. What you said about schedule() and the preemption models is right.
Initially I had some other code changes and they were causing it to get stuck. I retested it.
But looking at the semantics of how xfer_to_guest_mode_work is used,
I think using schedule() is probably right here.
Correct me if I got it all wrong.
on x86:
kvm_arch_vcpu_ioctl_run
vcpu_run
for () {
.. run guest..
xfer_to_guest_mode_handle_work
schedule
}
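(That loop, sketched from memory and simplified from arch/x86/kvm/x86.c:vcpu_run(),
with request handling and most SRCU details trimmed:)

    for (;;) {
            if (kvm_vcpu_running(vcpu))
                    r = vcpu_enter_guest(vcpu);
            else
                    r = vcpu_block(vcpu);
            if (r <= 0)
                    break;

            /* ... timer/irq injection requests ... */

            if (__xfer_to_guest_mode_work_pending()) {
                    kvm_vcpu_srcu_read_unlock(vcpu);
                    r = xfer_to_guest_mode_handle_work(vcpu);
                    kvm_vcpu_srcu_read_lock(vcpu);
                    if (r)
                            return r;
            }
    }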
on Powerpc: ( taking book3s_hv flavour):
kvm_arch_vcpu_ioctl_run
kvmppc_vcpu_run_hv *1
do while() {
kvmhv_run_single_vcpu or kvmppc_run_vcpu
-- checking for need_resched and signals and bails out *2
}
*1 - checks for need resched and signals before entering guest
*2 - checks for need resched and signals while running the guest
This patch is addressing only *1, but it needs to address *2 as well using the generic framework.
I think it is doable for book3s_hv at least (though it might need a rewrite).
__kvmppc_vcpu_run is still a black box to me. I think it first makes sense
to move it to C and then try to use xfer_to_guest_mode_handle_work.
Nick, Vaibhav, any idea how __kvmppc_vcpu_run handles signal_pending and need_resched?
So this is going to need more work, especially on *2, and doing that is also key for preempt=lazy/full to work
for KVM on powerpc. Will try to figure it out.
On 2025-04-25 16:49:19 [+0530], Shrikanth Hegde wrote:
> On 4/25/25 00:08, Sebastian Andrzej Siewior wrote:
> > On 2025-04-24 21:27:59 [+0530], Shrikanth Hegde wrote:
> > > > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > > > > index 19f4d298d..123539642 100644
> > > > > --- a/arch/powerpc/kvm/book3s_hv.c
> > > > > +++ b/arch/powerpc/kvm/book3s_hv.c
> > > > > @@ -4901,7 +4901,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
> > > > > }
> > > > > if (need_resched())
> > > > > - cond_resched();
> > > > > + schedule();
> > > >
> > >
> > >
> > > > This looks unrelated and odd. I don't know why, but this should be a
> > > > cond_resched() so it can be optimized away on PREEMPT kernels.
> > >
> > > This is needed; otherwise KVM on a PowerVM setup gets stuck with preempt=full/lazy.
> >
> > But this makes no sense. On preempt=full the cond_resched() gets patched
> > out while schedule() doesn't. Okay, this explains the hang.
>
> cond_resched() works. What you said about schedule() and the preemption models is right.
> Initially I had some other code changes and they were causing it to get stuck. I retested it.
so it is unrelated then ;)
> But looking at the semantics of how xfer_to_guest_mode_work is used,
> I think using schedule() is probably right here.
> Correct me if I got it all wrong.
No, if you do xfer_to_guest_mode_work() then it will invoke schedule()
when appropriate. It's just that the thing in kvmhv_run_single_vcpu() looks odd
and might have been duct tape or an accident and could probably be
removed.
> on x86:
> kvm_arch_vcpu_ioctl_run
> vcpu_run
> for () {
> .. run guest..
> xfer_to_guest_mode_handle_work
> schedule
> }
>
>
> on Powerpc: ( taking book3s_hv flavour):
> kvm_arch_vcpu_ioctl_run
> kvmppc_vcpu_run_hv *1
> do while() {
> kvmhv_run_single_vcpu or kvmppc_run_vcpu
> -- checking for need_resched and signals and bails out *2
> }
>
>
> *1 - checks for need resched and signals before entering guest
I don't see the need_resched() check here.
> *2 - checks for need resched and signals while running the guest
>
>
> This patch is addressing only *1, but it needs to address *2 as well using the generic framework.
> I think it is doable for book3s_hv at least (though it might need a rewrite).
>
> __kvmppc_vcpu_run is still a black box to me. I think it first makes sense
> to move it to C and then try to use xfer_to_guest_mode_handle_work.
> Nick, Vaibhav, any idea how __kvmppc_vcpu_run handles signal_pending and need_resched?
>
>
> So this is going to need more work, especially on *2, and doing that is also key for preempt=lazy/full to work
> for KVM on powerpc. Will try to figure it out.
Okay.
Sebastian
On 4/25/25 19:01, Sebastian Andrzej Siewior wrote:
> On 2025-04-25 16:49:19 [+0530], Shrikanth Hegde wrote:
>> On 4/25/25 00:08, Sebastian Andrzej Siewior wrote:
>>> On 2025-04-24 21:27:59 [+0530], Shrikanth Hegde wrote:
>>>>>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>>>>>> index 19f4d298d..123539642 100644
>>>>>> --- a/arch/powerpc/kvm/book3s_hv.c
>>>>>> +++ b/arch/powerpc/kvm/book3s_hv.c
>>>>>> @@ -4901,7 +4901,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>>>>>> }
>>>>>> if (need_resched())
>>>>>> - cond_resched();
>>>>>> + schedule();
>>>>>
>>>>
>>>>
>>>>> This looks unrelated and odd. I don't know why, but this should be a
>>>>> cond_resched() so it can be optimized away on PREEMPT kernels.
>>>>
>>>> This is needed; otherwise KVM on a PowerVM setup gets stuck with preempt=full/lazy.
>>>
When I look back, I think the below makes sense, since this is what even the
generic framework will do, and it would work for all preemption models.
Since this happens in a loop, it is necessary to check for reschedule:

    if (need_resched())
            schedule();

Ideally we would call xfer_to_guest_mode_handle_work() in the loop, which
would handle the above too.
>>> But this makes no sense. On preempt=full the cond_resched() gets patched
>>> out while schedule() doesn't. Okay, this explains the hang.
>>
>> cond_resched() works. What you said about schedule() and the preemption models is right.
>> Initially I had some other code changes and they were causing it to get stuck. I retested it.
>
> so it is unrelated then ;)
>
>> But looking at the semantics of how xfer_to_guest_mode_work is used,
>> I think using schedule() is probably right here.
>> Correct me if I got it all wrong.
>
> No, if you do xfer_to_guest_mode_work() then it will invoke schedule()
> when appropriate. It's just that the thing in kvmhv_run_single_vcpu() looks odd
> and might have been duct tape or an accident and could probably be
> removed.
>
>> on x86:
>> kvm_arch_vcpu_ioctl_run
>> vcpu_run
>> for () {
>> .. run guest..
>> xfer_to_guest_mode_handle_work
>> schedule
>> }
>>
>>
>> on Powerpc: ( taking book3s_hv flavour):
>> kvm_arch_vcpu_ioctl_run
>> kvmppc_vcpu_run_hv *1
>> do while() {
>> kvmhv_run_single_vcpu or kvmppc_run_vcpu
>> -- checking for need_resched and signals and bails out *2
>> }
>>
>>
>> *1 - checks for need resched and signals before entering guest
> I don't see the need_resched() check here.
>
>> *2 - checks for need resched and signals while running the guest
>>
>>
>> This patch is addressing only *1, but it needs to address *2 as well using the generic framework.
>> I think it is doable for book3s_hv at least (though it might need a rewrite).
>>
>> __kvmppc_vcpu_run is still a black box to me. I think it first makes sense
>> to move it to C and then try to use xfer_to_guest_mode_handle_work.
>> Nick, Vaibhav, any idea how __kvmppc_vcpu_run handles signal_pending and need_resched?
>>
>>
>> So this is going to need more work, especially on *2, and doing that is also key for preempt=lazy/full to work
>> for KVM on powerpc. Will try to figure it out.
>
> Okay.
>
> Sebastian
On 4/25/25 19:01, Sebastian Andrzej Siewior wrote:
> On 2025-04-25 16:49:19 [+0530], Shrikanth Hegde wrote:
>> On 4/25/25 00:08, Sebastian Andrzej Siewior wrote:
>>> On 2025-04-24 21:27:59 [+0530], Shrikanth Hegde wrote:
>>>>>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>>>>>> index 19f4d298d..123539642 100644
>>>>>> --- a/arch/powerpc/kvm/book3s_hv.c
>>>>>> +++ b/arch/powerpc/kvm/book3s_hv.c
>>>>>> @@ -4901,7 +4901,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>>>>>> }
>>>>>> if (need_resched())
>>>>>> - cond_resched();
>>>>>> + schedule();
>>>>>
>>>>
>>>>
>>>>> This looks unrelated and odd. I don't know why, but this should be a
>>>>> cond_resched() so it can be optimized away on PREEMPT kernels.
>>>>
>>>> This is needed; otherwise KVM on a PowerVM setup gets stuck with preempt=full/lazy.
>>>
>>> But this makes no sense. On preempt=full the cond_resched() gets patched
>>> out while schedule() doesn't. Okay, this explains the hang.
>>
>> cond_resched() works. What you said about schedule() and the preemption models is right.
>> Initially I had some other code changes and they were causing it to get stuck. I retested it.
>
> so it is unrelated then ;)
>
>> But looking at the semantics of how xfer_to_guest_mode_work is used,
>> I think using schedule() is probably right here.
>> Correct me if I got it all wrong.
>
> No, if you do xfer_to_guest_mode_work() then it will invoke schedule()
> when appropriate. It's just that the thing in kvmhv_run_single_vcpu() looks odd
> and might have been duct tape or an accident and could probably be
> removed.
>
I was wondering whether xfer_to_guest_mode_work could also call cond_resched()
instead of schedule(), since preempt=full/lazy should be preemptible
as early as possible, right?
>> on x86:
>> kvm_arch_vcpu_ioctl_run
>> vcpu_run
>> for () {
>> .. run guest..
>> xfer_to_guest_mode_handle_work
>> schedule
>> }
>>
>>
>> on Powerpc: ( taking book3s_hv flavour):
>> kvm_arch_vcpu_ioctl_run
>> kvmppc_vcpu_run_hv *1
>> do while() {
>> kvmhv_run_single_vcpu or kvmppc_run_vcpu
>> -- checking for need_resched and signals and bails out *2
>> }
>>
>>
>> *1 - checks for need resched and signals before entering guest
> I don't see the need_resched() check here.
>
Right, I think *2 is critical since it is in a loop.
*1 is probably an optimization to skip a few cycles.
>> *2 - checks for need resched and signals while running the guest
>>
>>
>> This patch is addressing only *1, but it needs to address *2 as well using the generic framework.
>> I think it is doable for book3s_hv at least (though it might need a rewrite).
>>
>> __kvmppc_vcpu_run is still a black box to me. I think it first makes sense
>> to move it to C and then try to use xfer_to_guest_mode_handle_work.
>> Nick, Vaibhav, any idea how __kvmppc_vcpu_run handles signal_pending and need_resched?
>>
>>
>> So this is going to need more work, especially on *2, and doing that is also key for preempt=lazy/full to work
>> for KVM on powerpc. Will try to figure it out.
>
> Okay.
>
> Sebastian
On 2025-04-25 19:54:56 [+0530], Shrikanth Hegde wrote:
> > > But looking at the semantics of how xfer_to_guest_mode_work is used,
> > > I think using schedule() is probably right here.
> > > Correct me if I got it all wrong.
> >
> > No, if you do xfer_to_guest_mode_work() then it will invoke schedule()
> > when appropriate. It's just that the thing in kvmhv_run_single_vcpu() looks odd
> > and might have been duct tape or an accident and could probably be
> > removed.
> >
> I was wondering whether xfer_to_guest_mode_work could also call cond_resched()
> instead of schedule(), since preempt=full/lazy should be preemptible
> as early as possible, right?
No, I think it is okay. For preempt=full you shouldn't observe the flag
in xfer_to_guest_mode_work() so it does not matter.
Sebastian