An AP destroy request for a target vCPU is typically followed by an
RMPADJUST to remove the VMSA attribute from the page currently being
used as the VMSA for the target vCPU. This can result in a vCPU that
is about to VMRUN exiting with #VMEXIT_INVALID.

This usually does not happen, as APs are typically sitting in HLT when
being destroyed, and therefore the vCPU thread is not running at the time.
However, if HLT is allowed inside the VM, then the vCPU could be about to
VMRUN when the VMSA attribute is removed from the VMSA page, resulting in
a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the
guest to crash. An RMPADJUST against an in-use (already running) VMSA
results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA
attribute cannot be changed until the VMRUN for the target vCPU exits. The
QEMU command line option '-overcommit cpu-pm=on' is an example of allowing
HLT inside the guest.

Use kvm_test_request() to ensure that the target vCPU sees the AP destroy
request before returning to the initiating vCPU.
Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
arch/x86/kvm/svm/sev.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0d898d6b697f..a040f29bb07b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4071,6 +4071,16 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
 	if (kick) {
 		kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
 		kvm_vcpu_kick(target_vcpu);
+
+		if (request == SVM_VMGEXIT_AP_DESTROY) {
+			/*
+			 * A destroy is likely to be followed by an RMPADJUST
+			 * that will remove the VMSA flag, so be sure the vCPU
+			 * got the request in case it is on the way to a VMRUN.
+			 */
+			while (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu))
+				cond_resched();
+		}
 	}
 
 	mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
base-commit: f8d892c137f7448d7b49f5e3ad7aa7b5a48a64ed
--
2.46.2
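
[The polling in the patch above relies on kvm_test_request() being a
non-destructive read: the bit set by kvm_make_request() stays set until
the target vCPU consumes the request via kvm_check_request(), so the
initiating vCPU can spin until the bit clears. A simplified sketch of
the two helpers, paraphrased from include/linux/kvm_host.h:]

	/* Paraphrased from include/linux/kvm_host.h. */
	static inline bool kvm_test_request(int req, struct kvm_vcpu *vcpu)
	{
		/* Non-destructive: only reads the request bit. */
		return test_bit(req & KVM_REQUEST_MASK, (void *)&vcpu->requests);
	}

	static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
	{
		/*
		 * Destructive: the target vCPU clears the bit when it
		 * processes the request, which is what the polling loop
		 * in the patch above waits for.
		 */
		if (!kvm_test_request(req, vcpu))
			return false;

		kvm_clear_request(req, vcpu);
		smp_mb__after_atomic();
		return true;
	}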
On 3/17/25 12:20, Tom Lendacky wrote:
> An AP destroy request for a target vCPU is typically followed by an
> RMPADJUST to remove the VMSA attribute from the page currently being
> used as the VMSA for the target vCPU. This can result in a vCPU that
> is about to VMRUN to exit with #VMEXIT_INVALID.
>
> This usually does not happen as APs are typically sitting in HLT when
> being destroyed and therefore the vCPU thread is not running at the time.
> However, if HLT is allowed inside the VM, then the vCPU could be about to
> VMRUN when the VMSA attribute is removed from the VMSA page, resulting in
> a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the
> guest to crash. An RMPADJUST against an in-use (already running) VMSA
> results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA
> attribute cannot be changed until the VMRUN for target vCPU exits. The
> Qemu command line option '-overcommit cpu-pm=on' is an example of allowing
> HLT inside the guest.
>
> Use kvm_test_request() to ensure that the target vCPU sees the AP destroy
> request before returning to the initiating vCPU.
>
> Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Sean,
If you're ok with this approach for the fix, this patch may need to be
adjusted given your series around AP creation fixes, unless you want to
put this as an early patch in your series. Let me know what you'd like
to do.
Thanks,
Tom
> ---
> arch/x86/kvm/svm/sev.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 0d898d6b697f..a040f29bb07b 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4071,6 +4071,16 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
> if (kick) {
> kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
> kvm_vcpu_kick(target_vcpu);
> +
> + if (request == SVM_VMGEXIT_AP_DESTROY) {
> + /*
> + * A destroy is likely to be followed by an RMPADJUST
> + * that will remove the VMSA flag, so be sure the vCPU
> + * got the request in case it is on the way to a VMRUN.
> + */
> + while (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu))
> + cond_resched();
> + }
> }
>
> mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
>
> base-commit: f8d892c137f7448d7b49f5e3ad7aa7b5a48a64ed
On Mon, Mar 17, 2025, Tom Lendacky wrote:
> On 3/17/25 12:20, Tom Lendacky wrote:
> > An AP destroy request for a target vCPU is typically followed by an
> > RMPADJUST to remove the VMSA attribute from the page currently being
> > used as the VMSA for the target vCPU. This can result in a vCPU that
> > is about to VMRUN to exit with #VMEXIT_INVALID.
> >
> > This usually does not happen as APs are typically sitting in HLT when
> > being destroyed and therefore the vCPU thread is not running at the time.
> > However, if HLT is allowed inside the VM, then the vCPU could be about to
> > VMRUN when the VMSA attribute is removed from the VMSA page, resulting in
> > a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the
> > guest to crash. An RMPADJUST against an in-use (already running) VMSA
> > results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA
> > attribute cannot be changed until the VMRUN for target vCPU exits. The
> > Qemu command line option '-overcommit cpu-pm=on' is an example of allowing
> > HLT inside the guest.
> >
> > Use kvm_test_request() to ensure that the target vCPU sees the AP destroy
> > request before returning to the initiating vCPU.
> >
> > Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
> > Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>
> Sean,
>
> If you're ok with this approach for the fix, this patch may need to be
> adjusted given your series around AP creation fixes, unless you want to
> put this as an early patch in your series. Let me know what you'd like
> to do.
This is unsafe as it requires userspace to do KVM_RUN _and_ for the vCPU to get
far enough along to consume the request.
Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs
to be annotated with KVM_REQUEST_WAIT.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 04e6c5604bc3..67abfe97c600 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -124,7 +124,8 @@
KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_HV_TLB_FLUSH \
KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
+ KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
#define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
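
[For background, KVM_REQUEST_WAIT changes how the kick IPI is sent when
a request is made through the mask-based API: the sender does not return
until the targeted CPUs have acked the IPI, i.e. until the vCPU is out
of guest mode and guaranteed to see the request. A simplified sketch of
the relevant pieces, paraphrased from virt/kvm/kvm_main.c with fast
paths and details elided:]

	static bool kvm_request_needs_ipi(struct kvm_vcpu *vcpu, unsigned int req)
	{
		int mode = kvm_vcpu_exiting_guest_mode(vcpu);

		/*
		 * With KVM_REQUEST_WAIT, an IPI is needed unless the target
		 * is fully outside guest mode; without it, only a vCPU that
		 * is currently in guest mode gets an IPI.
		 */
		if (req & KVM_REQUEST_WAIT)
			return mode != OUTSIDE_GUEST_MODE;

		return mode == IN_GUEST_MODE;
	}

	static bool kvm_kick_many_cpus(struct cpumask *cpus, bool wait)
	{
		if (cpumask_empty(cpus))
			return false;

		/* wait=true makes the sender spin until the IPI handlers finish. */
		smp_call_function_many(cpus, ack_kick, NULL, wait);
		return true;
	}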
On 3/17/25 12:28, Sean Christopherson wrote:
> On Mon, Mar 17, 2025, Tom Lendacky wrote:
>> On 3/17/25 12:20, Tom Lendacky wrote:
>>> An AP destroy request for a target vCPU is typically followed by an
>>> RMPADJUST to remove the VMSA attribute from the page currently being
>>> used as the VMSA for the target vCPU. This can result in a vCPU that
>>> is about to VMRUN to exit with #VMEXIT_INVALID.
>>>
>>> This usually does not happen as APs are typically sitting in HLT when
>>> being destroyed and therefore the vCPU thread is not running at the time.
>>> However, if HLT is allowed inside the VM, then the vCPU could be about to
>>> VMRUN when the VMSA attribute is removed from the VMSA page, resulting in
>>> a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the
>>> guest to crash. An RMPADJUST against an in-use (already running) VMSA
>>> results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA
>>> attribute cannot be changed until the VMRUN for target vCPU exits. The
>>> Qemu command line option '-overcommit cpu-pm=on' is an example of allowing
>>> HLT inside the guest.
>>>
>>> Use kvm_test_request() to ensure that the target vCPU sees the AP destroy
>>> request before returning to the initiating vCPU.
>>>
>>> Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> Sean,
>>
>> If you're ok with this approach for the fix, this patch may need to be
>> adjusted given your series around AP creation fixes, unless you want to
>> put this as an early patch in your series. Let me know what you'd like
>> to do.
>
> This is unsafe as it requires userspace to do KVM_RUN _and_ for the vCPU to get
> far enough along to consume the request.
>
> Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs
> to be annotated with KVM_REQUEST_WAIT.
Ok, nice. I wasn't sure if KVM_REQUEST_WAIT would be appropriate here.
This is much simpler. Let me test it out and resend if everything goes ok.
Thanks,
Tom
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 04e6c5604bc3..67abfe97c600 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -124,7 +124,8 @@
> KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> #define KVM_REQ_HV_TLB_FLUSH \
> KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> -#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34)
> +#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
> + KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
>
> #define CR0_RESERVED_BITS \
> (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
>
>
On 3/17/25 12:36, Tom Lendacky wrote:
> On 3/17/25 12:28, Sean Christopherson wrote:
>> On Mon, Mar 17, 2025, Tom Lendacky wrote:
>>> On 3/17/25 12:20, Tom Lendacky wrote:
>>>> An AP destroy request for a target vCPU is typically followed by an
>>>> RMPADJUST to remove the VMSA attribute from the page currently being
>>>> used as the VMSA for the target vCPU. This can result in a vCPU that
>>>> is about to VMRUN to exit with #VMEXIT_INVALID.
>>>>
>>>> This usually does not happen as APs are typically sitting in HLT when
>>>> being destroyed and therefore the vCPU thread is not running at the time.
>>>> However, if HLT is allowed inside the VM, then the vCPU could be about to
>>>> VMRUN when the VMSA attribute is removed from the VMSA page, resulting in
>>>> a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the
>>>> guest to crash. An RMPADJUST against an in-use (already running) VMSA
>>>> results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA
>>>> attribute cannot be changed until the VMRUN for target vCPU exits. The
>>>> Qemu command line option '-overcommit cpu-pm=on' is an example of allowing
>>>> HLT inside the guest.
>>>>
>>>> Use kvm_test_request() to ensure that the target vCPU sees the AP destroy
>>>> request before returning to the initiating vCPU.
>>>>
>>>> Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
>>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>>>
>>> Sean,
>>>
>>> If you're ok with this approach for the fix, this patch may need to be
>>> adjusted given your series around AP creation fixes, unless you want to
>>> put this as an early patch in your series. Let me know what you'd like
>>> to do.
>>
>> This is unsafe as it requires userspace to do KVM_RUN _and_ for the vCPU to get
>> far enough along to consume the request.
>>
>> Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs
>> to be annotated with KVM_REQUEST_WAIT.
>
> Ok, nice. I wasn't sure if KVM_REQUEST_WAIT would be appropriate here.
> This is much simpler. Let me test it out and resend if everything goes ok.
So that doesn't work. I can still get an occasional #VMEXIT_INVALID. Let
me try to track down what is happening with this approach...
Thanks,
Tom
>
> Thanks,
> Tom
>
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 04e6c5604bc3..67abfe97c600 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -124,7 +124,8 @@
>> KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>> #define KVM_REQ_HV_TLB_FLUSH \
>> KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>> -#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34)
>> +#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
>> + KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
>>
>> #define CR0_RESERVED_BITS \
>> (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
>>
>>
On 3/18/25 07:43, Tom Lendacky wrote:
> On 3/17/25 12:36, Tom Lendacky wrote:
>> On 3/17/25 12:28, Sean Christopherson wrote:
>>> On Mon, Mar 17, 2025, Tom Lendacky wrote:
>>>> On 3/17/25 12:20, Tom Lendacky wrote:
>>>>> An AP destroy request for a target vCPU is typically followed by an
>>>>> RMPADJUST to remove the VMSA attribute from the page currently being
>>>>> used as the VMSA for the target vCPU. This can result in a vCPU that
>>>>> is about to VMRUN to exit with #VMEXIT_INVALID.
>>>>>
>>>>> This usually does not happen as APs are typically sitting in HLT when
>>>>> being destroyed and therefore the vCPU thread is not running at the time.
>>>>> However, if HLT is allowed inside the VM, then the vCPU could be about to
>>>>> VMRUN when the VMSA attribute is removed from the VMSA page, resulting in
>>>>> a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the
>>>>> guest to crash. An RMPADJUST against an in-use (already running) VMSA
>>>>> results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA
>>>>> attribute cannot be changed until the VMRUN for target vCPU exits. The
>>>>> Qemu command line option '-overcommit cpu-pm=on' is an example of allowing
>>>>> HLT inside the guest.
>>>>>
>>>>> Use kvm_test_request() to ensure that the target vCPU sees the AP destroy
>>>>> request before returning to the initiating vCPU.
>>>>>
>>>>> Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
>>>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>>>>
>>>> Sean,
>>>>
>>>> If you're ok with this approach for the fix, this patch may need to be
>>>> adjusted given your series around AP creation fixes, unless you want to
>>>> put this as an early patch in your series. Let me know what you'd like
>>>> to do.
>>>
>>> This is unsafe as it requires userspace to do KVM_RUN _and_ for the vCPU to get
>>> far enough along to consume the request.
>>>
>>> Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs
>>> to be annotated with KVM_REQUEST_WAIT.
>>
>> Ok, nice. I wasn't sure if KVM_REQUEST_WAIT would be appropriate here.
>> This is much simpler. Let me test it out and resend if everything goes ok.
>
> So that doesn't work. I can still get an occasional #VMEXIT_INVALID. Let
> me try to track down what is happening with this approach...
Looks like I need to use kvm_make_vcpus_request_mask() instead of just a
plain kvm_make_request() followed by a kvm_vcpu_kick().
Let me try that and see how this works.
Thanks,
Tom
>
> Thanks,
> Tom
>
>>
>> Thanks,
>> Tom
>>
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index 04e6c5604bc3..67abfe97c600 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -124,7 +124,8 @@
>>> KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>> #define KVM_REQ_HV_TLB_FLUSH \
>>> KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>> -#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34)
>>> +#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
>>> + KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
>>>
>>> #define CR0_RESERVED_BITS \
>>> (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
>>>
>>>
On 3/18/25 08:47, Tom Lendacky wrote:
> On 3/18/25 07:43, Tom Lendacky wrote:
>> On 3/17/25 12:36, Tom Lendacky wrote:
>>> On 3/17/25 12:28, Sean Christopherson wrote:
>>>> On Mon, Mar 17, 2025, Tom Lendacky wrote:
>>>>> On 3/17/25 12:20, Tom Lendacky wrote:
>>>>>> An AP destroy request for a target vCPU is typically followed by an
>>>>>> RMPADJUST to remove the VMSA attribute from the page currently being
>>>>>> used as the VMSA for the target vCPU. This can result in a vCPU that
>>>>>> is about to VMRUN to exit with #VMEXIT_INVALID.
>>>>>>
>>>>>> This usually does not happen as APs are typically sitting in HLT when
>>>>>> being destroyed and therefore the vCPU thread is not running at the time.
>>>>>> However, if HLT is allowed inside the VM, then the vCPU could be about to
>>>>>> VMRUN when the VMSA attribute is removed from the VMSA page, resulting in
>>>>>> a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the
>>>>>> guest to crash. An RMPADJUST against an in-use (already running) VMSA
>>>>>> results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA
>>>>>> attribute cannot be changed until the VMRUN for target vCPU exits. The
>>>>>> Qemu command line option '-overcommit cpu-pm=on' is an example of allowing
>>>>>> HLT inside the guest.
>>>>>>
>>>>>> Use kvm_test_request() to ensure that the target vCPU sees the AP destroy
>>>>>> request before returning to the initiating vCPU.
>>>>>>
>>>>>> Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
>>>>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>>>>
>>>> Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs
>>>> to be annotated with KVM_REQUEST_WAIT.
>>>
>>> Ok, nice. I wasn't sure if KVM_REQUEST_WAIT would be appropriate here.
>>> This is much simpler. Let me test it out and resend if everything goes ok.
>>
>> So that doesn't work. I can still get an occasional #VMEXIT_INVALID. Let
>> me try to track down what is happening with this approach...
>
> Looks like I need to use kvm_make_vcpus_request_mask() instead of just a
> plain kvm_make_request() followed by a kvm_vcpu_kick().
>
> Let me try that and see how this works.
(Lost the thread headers somehow on previous response, resending to keep
it in the thread)
This appears to be working ok. The kvm_make_vcpus_request_mask() function
would need to be exported with EXPORT_SYMBOL_GPL, though; any objections
to that?
I could also simplify this a bit by creating a new function that takes a
target vCPU and then calls kvm_make_vcpus_request_mask() from there.
Thoughts?
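
[A rough sketch of what such a wrapper might look like; the name
kvm_make_vcpu_request_wait is invented here for illustration and is not
from the thread or the tree:]

	/*
	 * Hypothetical helper: route a single-vCPU request through
	 * kvm_make_vcpus_request_mask() so that KVM_REQUEST_WAIT semantics
	 * are honored for the one target.
	 */
	static bool kvm_make_vcpu_request_wait(struct kvm_vcpu *target,
					       unsigned int req)
	{
		DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);

		bitmap_zero(vcpu_bitmap, KVM_MAX_VCPUS);
		bitmap_set(vcpu_bitmap, target->vcpu_idx, 1);

		return kvm_make_vcpus_request_mask(target->kvm, req, vcpu_bitmap);
	}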
This is what the patch currently looks like:
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 32ae3aa50c7e..51aa63591b0a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -123,7 +123,8 @@
 	KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_HV_TLB_FLUSH \
 	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE	KVM_ARCH_REQ(34)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
+	KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
 
 #define CR0_RESERVED_BITS \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6e3f5042d9ce..0c45cc0c0571 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4038,8 +4038,13 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
 out:
 	if (kick) {
-		kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
-		kvm_vcpu_kick(target_vcpu);
+		DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
+
+		bitmap_zero(vcpu_bitmap, KVM_MAX_VCPUS);
+		bitmap_set(vcpu_bitmap, target_vcpu->vcpu_idx, 1);
+		kvm_make_vcpus_request_mask(vcpu->kvm,
+					    KVM_REQ_UPDATE_PROTECTED_GUEST_STATE,
+					    vcpu_bitmap);
 	}
 
 	mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ba0327e2d0d3..08c135f3d31f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -268,6 +268,7 @@ bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
 	return called;
 }
+EXPORT_SYMBOL_GPL(kvm_make_vcpus_request_mask);
 
 bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
 {
Thanks,
Tom
>
> Thanks,
> Tom
>
>>
>> Thanks,
>> Tom
>>
>>>
>>> Thanks,
>>> Tom
>>>
>>>>
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index 04e6c5604bc3..67abfe97c600 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -124,7 +124,8 @@
>>>> KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>>> #define KVM_REQ_HV_TLB_FLUSH \
>>>> KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>>> -#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34)
>>>> +#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
>>>> + KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
>>>>
>>>> #define CR0_RESERVED_BITS \
>>>> (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
>>>>
>>>>
On Fri, Mar 21, 2025, Tom Lendacky wrote:
> On 3/18/25 08:47, Tom Lendacky wrote:
> > On 3/18/25 07:43, Tom Lendacky wrote:
> >>>> Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs
> >>>> to be annotated with KVM_REQUEST_WAIT.
> >>>
> >>> Ok, nice. I wasn't sure if KVM_REQUEST_WAIT would be appropriate here.
> >>> This is much simpler. Let me test it out and resend if everything goes ok.
> >>
> >> So that doesn't work. I can still get an occasional #VMEXIT_INVALID. Let
> >> me try to track down what is happening with this approach...
> >
> > Looks like I need to use kvm_make_vcpus_request_mask() instead of just a
> > plain kvm_make_request() followed by a kvm_vcpu_kick().
Ugh, I was going to say "you don't need to do that", but I forgot that
kvm_vcpu_kick() subtly doesn't honor KVM_REQUEST_WAIT.
Ooof, I'm 99% certain that's causing bugs elsewhere. E.g. arm64's KVM_REQ_SLEEP
uses the same "broken" pattern (LOL, which means that of course RISC-V does too).
In quotes, because kvm_vcpu_kick() is the one that sucks.
I would rather fix that a bit more directly and obviously. IMO, converting to
smp_call_function_single() instead of bastardizing smp_send_reschedule() is worth
doing regardless of the WAIT mess. This will allow cleaning up a bunch of
make_request+kick pairs; it'll just take a bit of care to make sure we don't
create a WAIT where one isn't wanted (though those probably should have a big fat
comment anyways).
Compiled tested only.
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5de20409bcd9..fd9d9a3ee075 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1505,7 +1505,16 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu);
 bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu);
-void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
+
+#ifndef CONFIG_S390
+void __kvm_vcpu_kick(struct kvm_vcpu *vcpu, bool wait);
+
+static inline void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
+{
+	__kvm_vcpu_kick(vcpu, false);
+}
+#endif
+
 int kvm_vcpu_yield_to(struct kvm_vcpu *target);
 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool yield_to_kernel_mode);
 
@@ -2253,6 +2262,14 @@ static __always_inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
 	__kvm_make_request(req, vcpu);
 }
 
+#ifndef CONFIG_S390
+static inline void kvm_make_request_and_kick(int req, struct kvm_vcpu *vcpu)
+{
+	kvm_make_request(req, vcpu);
+	__kvm_vcpu_kick(vcpu, req & KVM_REQUEST_WAIT);
+}
+#endif
+
 static inline bool kvm_request_pending(struct kvm_vcpu *vcpu)
 {
 	return READ_ONCE(vcpu->requests);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 201c14ff476f..2a5120e2e6b4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3734,7 +3734,7 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_wake_up);
 /*
  * Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode.
  */
-void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
+void __kvm_vcpu_kick(struct kvm_vcpu *vcpu, bool wait)
 {
 	int me, cpu;
 
@@ -3764,12 +3764,12 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 	if (kvm_arch_vcpu_should_kick(vcpu)) {
 		cpu = READ_ONCE(vcpu->cpu);
 		if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
-			smp_send_reschedule(cpu);
+			smp_call_function_single(cpu, ack_kick, NULL, wait);
 	}
 out:
 	put_cpu();
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_kick);
+EXPORT_SYMBOL_GPL(__kvm_vcpu_kick);
 #endif /* !CONFIG_S390 */
 
 int kvm_vcpu_yield_to(struct kvm_vcpu *target)
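
[With those changes, and assuming KVM_REQ_UPDATE_PROTECTED_GUEST_STATE is
annotated with KVM_REQUEST_WAIT as suggested earlier, the sev.c call site
from the original patch would collapse to something like this sketch:]

	if (kick) {
		/*
		 * Sketch: kvm_make_request_and_kick() waits for the IPI to
		 * be acked when the request carries KVM_REQUEST_WAIT, so the
		 * target vCPU is guaranteed to be out of guest mode (and
		 * thus to see the request) before this returns.
		 */
		kvm_make_request_and_kick(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE,
					  target_vcpu);
	}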
On 3/21/25 18:17, Sean Christopherson wrote:
> On Fri, Mar 21, 2025, Tom Lendacky wrote:
>> On 3/18/25 08:47, Tom Lendacky wrote:
>>> On 3/18/25 07:43, Tom Lendacky wrote:
>>>>>> Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs
>>>>>> to be annotated with KVM_REQUEST_WAIT.
>>>>>
>>>>> Ok, nice. I wasn't sure if KVM_REQUEST_WAIT would be appropriate here.
>>>>> This is much simpler. Let me test it out and resend if everything goes ok.
>>>>
>>>> So that doesn't work. I can still get an occasional #VMEXIT_INVALID. Let
>>>> me try to track down what is happening with this approach...
>>>
>>> Looks like I need to use kvm_make_vcpus_request_mask() instead of just a
>>> plain kvm_make_request() followed by a kvm_vcpu_kick().
>
> Ugh, I was going to say "you don't need to do that", but I forgot that
> kvm_vcpu_kick() subtly doesn't honor KVM_REQUEST_WAIT.
>
> Ooof, I'm 99% certain that's causing bugs elsewhere. E.g. arm64's KVM_REQ_SLEEP
> uses the same "broken" pattern (LOL, which means that of course RISC-V does too).
> In quotes, because kvm_vcpu_kick() is the one that sucks.
>
> I would rather fix that a bit more directly and obviously. IMO, converting to
> smp_call_function_single() isntead of bastardizing smp_send_reschedule() is worth
> doing regardless of the WAIT mess. This will allow cleaning up a bunch of
> make_request+kick pairs, it'll just take a bit of care to make sure we don't
> create a WAIT where one isn't wanted (though those probably should have a big fat
> comment anyways).
>
> Compiled tested only.
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 5de20409bcd9..fd9d9a3ee075 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1505,7 +1505,16 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu);
> void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu);
> void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu);
> bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu);
> -void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
> +
> +#ifndef CONFIG_S390
> +void __kvm_vcpu_kick(struct kvm_vcpu *vcpu, bool wait);
> +
> +static inline void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
> +{
> + __kvm_vcpu_kick(vcpu, false);
> +}
> +#endif
> +
> int kvm_vcpu_yield_to(struct kvm_vcpu *target);
> void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool yield_to_kernel_mode);
>
> @@ -2253,6 +2262,14 @@ static __always_inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
> __kvm_make_request(req, vcpu);
> }
>
> +#ifndef CONFIG_S390
> +static inline void kvm_make_request_and_kick(int req, struct kvm_vcpu *vcpu)
> +{
> + kvm_make_request(req, vcpu);
> + __kvm_vcpu_kick(vcpu, req & KVM_REQUEST_WAIT);
> +}
> +#endif
> +
> static inline bool kvm_request_pending(struct kvm_vcpu *vcpu)
> {
> return READ_ONCE(vcpu->requests);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 201c14ff476f..2a5120e2e6b4 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3734,7 +3734,7 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_wake_up);
> /*
> * Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode.
> */
> -void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
> +void __kvm_vcpu_kick(struct kvm_vcpu *vcpu, bool wait)
> {
> int me, cpu;
>
> @@ -3764,12 +3764,12 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
> if (kvm_arch_vcpu_should_kick(vcpu)) {
> cpu = READ_ONCE(vcpu->cpu);
> if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
> - smp_send_reschedule(cpu);
> + smp_call_function_single(cpu, ack_kick, NULL, wait);
In general, this approach works. However, this change triggered

	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
		     && !oops_in_progress);

in kernel/smp.c.
Call path was:
WARNING: CPU: 13 PID: 3467 at kernel/smp.c:652 smp_call_function_single+0x100/0x120
...
Call Trace:
<TASK>
? show_regs+0x69/0x80
? __warn+0x8d/0x130
? smp_call_function_single+0x100/0x120
? report_bug+0x182/0x190
? handle_bug+0x63/0xa0
? exc_invalid_op+0x19/0x70
? asm_exc_invalid_op+0x1b/0x20
? __pfx_ack_kick+0x10/0x10 [kvm]
? __pfx_ack_kick+0x10/0x10 [kvm]
? smp_call_function_single+0x100/0x120
? srso_alias_return_thunk+0x5/0xfbef5
? migrate_folio_done+0x7f/0x90
__kvm_vcpu_kick+0xa1/0xb0 [kvm]
svm_complete_interrupt_delivery+0x93/0xa0 [kvm_amd]
svm_deliver_interrupt+0x3e/0x50 [kvm_amd]
__apic_accept_irq+0x17f/0x2a0 [kvm]
kvm_irq_delivery_to_apic_fast+0x149/0x1b0 [kvm]
kvm_arch_set_irq_inatomic+0x9b/0xd0 [kvm]
irqfd_wakeup+0xf4/0x230 [kvm]
? __pfx_kvm_set_msi+0x10/0x10 [kvm]
__wake_up_common+0x7b/0xa0
__wake_up_locked_key+0x18/0x20
eventfd_write+0xbe/0x1d0
? srso_alias_return_thunk+0x5/0xfbef5
? security_file_permission+0x134/0x150
vfs_write+0xfb/0x3f0
? srso_alias_return_thunk+0x5/0xfbef5
? __handle_mm_fault+0x930/0x1040
ksys_write+0x6a/0xe0
__x64_sys_write+0x19/0x20
x64_sys_call+0x16af/0x2140
do_syscall_64+0x6b/0x110
? srso_alias_return_thunk+0x5/0xfbef5
? count_memcg_events.constprop.0+0x1e/0x40
? srso_alias_return_thunk+0x5/0xfbef5
? handle_mm_fault+0x18c/0x270
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? irqentry_exit_to_user_mode+0x2f/0x170
? srso_alias_return_thunk+0x5/0xfbef5
? irqentry_exit+0x1d/0x30
? srso_alias_return_thunk+0x5/0xfbef5
? exc_page_fault+0x89/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Thanks,
Tom
> }
> out:
> put_cpu();
> }
> -EXPORT_SYMBOL_GPL(kvm_vcpu_kick);
> +EXPORT_SYMBOL_GPL(__kvm_vcpu_kick);
> #endif /* !CONFIG_S390 */
>
> int kvm_vcpu_yield_to(struct kvm_vcpu *target)
>
On 3/25/25 12:49, Tom Lendacky wrote:
> On 3/21/25 18:17, Sean Christopherson wrote:
>> On Fri, Mar 21, 2025, Tom Lendacky wrote:
>>> On 3/18/25 08:47, Tom Lendacky wrote:
>>>> On 3/18/25 07:43, Tom Lendacky wrote:
>>>>>>> Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs
>>>>>>> to be annotated with KVM_REQUEST_WAIT.
>>>>>>
>>>>>> Ok, nice. I wasn't sure if KVM_REQUEST_WAIT would be appropriate here.
>>>>>> This is much simpler. Let me test it out and resend if everything goes ok.
>>>>>
>>>>> So that doesn't work. I can still get an occasional #VMEXIT_INVALID. Let
>>>>> me try to track down what is happening with this approach...
>>>>
>>>> Looks like I need to use kvm_make_vcpus_request_mask() instead of just a
>>>> plain kvm_make_request() followed by a kvm_vcpu_kick().
>>
>> Ugh, I was going to say "you don't need to do that", but I forgot that
>> kvm_vcpu_kick() subtly doesn't honor KVM_REQUEST_WAIT.
>>
>> Ooof, I'm 99% certain that's causing bugs elsewhere. E.g. arm64's KVM_REQ_SLEEP
>> uses the same "broken" pattern (LOL, which means that of course RISC-V does too).
>> In quotes, because kvm_vcpu_kick() is the one that sucks.
>>
>> I would rather fix that a bit more directly and obviously. IMO, converting to
>> smp_call_function_single() isntead of bastardizing smp_send_reschedule() is worth
>> doing regardless of the WAIT mess. This will allow cleaning up a bunch of
>> make_request+kick pairs, it'll just take a bit of care to make sure we don't
>> create a WAIT where one isn't wanted (though those probably should have a big fat
>> comment anyways).
>>
>> Compiled tested only.
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 5de20409bcd9..fd9d9a3ee075 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -1505,7 +1505,16 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu);
>> void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu);
>> void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu);
>> bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu);
>> -void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
>> +
>> +#ifndef CONFIG_S390
>> +void __kvm_vcpu_kick(struct kvm_vcpu *vcpu, bool wait);
>> +
>> +static inline void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
>> +{
>> + __kvm_vcpu_kick(vcpu, false);
>> +}
>> +#endif
>> +
>> int kvm_vcpu_yield_to(struct kvm_vcpu *target);
>> void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool yield_to_kernel_mode);
>>
>> @@ -2253,6 +2262,14 @@ static __always_inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
>> __kvm_make_request(req, vcpu);
>> }
>>
>> +#ifndef CONFIG_S390
>> +static inline void kvm_make_request_and_kick(int req, struct kvm_vcpu *vcpu)
>> +{
>> + kvm_make_request(req, vcpu);
>> + __kvm_vcpu_kick(vcpu, req & KVM_REQUEST_WAIT);
>> +}
>> +#endif
>> +
>> static inline bool kvm_request_pending(struct kvm_vcpu *vcpu)
>> {
>> return READ_ONCE(vcpu->requests);
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 201c14ff476f..2a5120e2e6b4 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -3734,7 +3734,7 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_wake_up);
>> /*
>> * Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode.
>> */
>> -void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
>> +void __kvm_vcpu_kick(struct kvm_vcpu *vcpu, bool wait)
>> {
>> int me, cpu;
>>
>> @@ -3764,12 +3764,12 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
>> if (kvm_arch_vcpu_should_kick(vcpu)) {
>> cpu = READ_ONCE(vcpu->cpu);
>> if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
>> - smp_send_reschedule(cpu);
>> + smp_call_function_single(cpu, ack_kick, NULL, wait);
>
> In general, this approach works. However, this change triggered
>
> WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
> && !oops_in_progress);
>
> in kernel/smp.c.
Is keeping the old behavior desirable when IRQs are disabled? Something
like:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a6fedcadd036..81cbc55eac3a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3754,8 +3754,14 @@ void __kvm_vcpu_kick(struct kvm_vcpu *vcpu, bool wait)
 	 */
 	if (kvm_arch_vcpu_should_kick(vcpu)) {
 		cpu = READ_ONCE(vcpu->cpu);
-		if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
-			smp_call_function_single(cpu, ack_kick, NULL, wait);
+		if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) {
+			WARN_ON_ONCE(wait && irqs_disabled());
+
+			if (irqs_disabled())
+				smp_send_reschedule(cpu);
+			else
+				smp_call_function_single(cpu, ack_kick, NULL, wait);
+		}
 	}
 out:
 	put_cpu();
>
> Call path was:
> WARNING: CPU: 13 PID: 3467 at kernel/smp.c:652 smp_call_function_single+0x100/0x120
> ...
>
> Call Trace:
> <TASK>
> ? show_regs+0x69/0x80
> ? __warn+0x8d/0x130
> ? smp_call_function_single+0x100/0x120
> ? report_bug+0x182/0x190
> ? handle_bug+0x63/0xa0
> ? exc_invalid_op+0x19/0x70
> ? asm_exc_invalid_op+0x1b/0x20
> ? __pfx_ack_kick+0x10/0x10 [kvm]
> ? __pfx_ack_kick+0x10/0x10 [kvm]
> ? smp_call_function_single+0x100/0x120
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? migrate_folio_done+0x7f/0x90
> __kvm_vcpu_kick+0xa1/0xb0 [kvm]
> svm_complete_interrupt_delivery+0x93/0xa0 [kvm_amd]
> svm_deliver_interrupt+0x3e/0x50 [kvm_amd]
> __apic_accept_irq+0x17f/0x2a0 [kvm]
> kvm_irq_delivery_to_apic_fast+0x149/0x1b0 [kvm]
> kvm_arch_set_irq_inatomic+0x9b/0xd0 [kvm]
> irqfd_wakeup+0xf4/0x230 [kvm]
> ? __pfx_kvm_set_msi+0x10/0x10 [kvm]
> __wake_up_common+0x7b/0xa0
> __wake_up_locked_key+0x18/0x20
> eventfd_write+0xbe/0x1d0
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? security_file_permission+0x134/0x150
> vfs_write+0xfb/0x3f0
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? __handle_mm_fault+0x930/0x1040
> ksys_write+0x6a/0xe0
> __x64_sys_write+0x19/0x20
> x64_sys_call+0x16af/0x2140
> do_syscall_64+0x6b/0x110
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? count_memcg_events.constprop.0+0x1e/0x40
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? handle_mm_fault+0x18c/0x270
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? irqentry_exit_to_user_mode+0x2f/0x170
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? irqentry_exit+0x1d/0x30
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? exc_page_fault+0x89/0x160
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Thanks,
> Tom
>
>> }
>> out:
>> put_cpu();
>> }
>> -EXPORT_SYMBOL_GPL(kvm_vcpu_kick);
>> +EXPORT_SYMBOL_GPL(__kvm_vcpu_kick);
>> #endif /* !CONFIG_S390 */
>>
>> int kvm_vcpu_yield_to(struct kvm_vcpu *target)
>>
On Wed, Mar 26, 2025, Tom Lendacky wrote:
> On 3/25/25 12:49, Tom Lendacky wrote:
> > On 3/21/25 18:17, Sean Christopherson wrote:
> >> On Fri, Mar 21, 2025, Tom Lendacky wrote:
> >>> On 3/18/25 08:47, Tom Lendacky wrote:
> >>>> On 3/18/25 07:43, Tom Lendacky wrote:
> >>>>>>> Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs
> >>>>>>> to be annotated with KVM_REQUEST_WAIT.
> >>>>>>
> >>>>>> Ok, nice. I wasn't sure if KVM_REQUEST_WAIT would be appropriate here.
> >>>>>> This is much simpler. Let me test it out and resend if everything goes ok.
> >>>>>
> >>>>> So that doesn't work. I can still get an occasional #VMEXIT_INVALID. Let
> >>>>> me try to track down what is happening with this approach...
> >>>>
> >>>> Looks like I need to use kvm_make_vcpus_request_mask() instead of just a
> >>>> plain kvm_make_request() followed by a kvm_vcpu_kick().
> >>
> >> Ugh, I was going to say "you don't need to do that", but I forgot that
> >> kvm_vcpu_kick() subtly doesn't honor KVM_REQUEST_WAIT.
> >>
> >> Ooof, I'm 99% certain that's causing bugs elsewhere. E.g. arm64's KVM_REQ_SLEEP
> >> uses the same "broken" pattern (LOL, which means that of course RISC-V does too).
> >> In quotes, because kvm_vcpu_kick() is the one that sucks.
> >>
> >> I would rather fix that a bit more directly and obviously. IMO, converting to
> >> smp_call_function_single() isntead of bastardizing smp_send_reschedule() is worth
> >> doing regardless of the WAIT mess. This will allow cleaning up a bunch of
> >> make_request+kick pairs, it'll just take a bit of care to make sure we don't
> >> create a WAIT where one isn't wanted (though those probably should have a big fat
> >> comment anyways).
...
> >> @@ -3764,12 +3764,12 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
> >> if (kvm_arch_vcpu_should_kick(vcpu)) {
> >> cpu = READ_ONCE(vcpu->cpu);
> >> if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
> >> - smp_send_reschedule(cpu);
> >> + smp_call_function_single(cpu, ack_kick, NULL, wait);
> >
> > In general, this approach works. However, this change triggered
> >
> > WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
> > && !oops_in_progress);
> >
> > in kernel/smp.c.
Drat, I forgot that smp_call_function_xxx() can deadlock even if wait=false due
to needing to take locks to set the callback function.
> Is keeping the old behavior desirable when IRQs are disabled? Something
> like:
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a6fedcadd036..81cbc55eac3a 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3754,8 +3754,14 @@ void __kvm_vcpu_kick(struct kvm_vcpu *vcpu, bool wait)
> */
> if (kvm_arch_vcpu_should_kick(vcpu)) {
> cpu = READ_ONCE(vcpu->cpu);
> - if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
> - smp_call_function_single(cpu, ack_kick, NULL, wait);
> + if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) {
> + WARN_ON_ONCE(wait && irqs_disabled());
> +
> + if (irqs_disabled())
> + smp_send_reschedule(cpu);
> + else
> + smp_call_function_single(cpu, ack_kick, NULL, wait);
> + }
> }
> out:
> put_cpu();
That, or keying off "wait" and letting smp_call_function_xxx() yell about
trying to use it with IRQs disabled, i.e.

	if (wait)
		smp_call_function_single(cpu, ack_kick, NULL, wait);
	else
		smp_send_reschedule(cpu);
My vote would be for checking "wait", so that the behavior is consistent for
a given request.
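
[Folding that into the earlier __kvm_vcpu_kick() conversion would look
roughly like this; a sketch combining the two diffs above, not a tested
patch, with the wakeup fast path elided:]

	void __kvm_vcpu_kick(struct kvm_vcpu *vcpu, bool wait)
	{
		int me, cpu;

		/* ... wakeup fast path elided, see the diff earlier in the thread ... */

		me = get_cpu();

		if (kvm_arch_vcpu_should_kick(vcpu)) {
			cpu = READ_ONCE(vcpu->cpu);
			if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) {
				/*
				 * Keying off "wait" keeps the behavior consistent
				 * for a given request: waiting requests take the
				 * IPI+ack path (which warns if IRQs are disabled),
				 * everything else keeps the old reschedule IPI.
				 */
				if (wait)
					smp_call_function_single(cpu, ack_kick, NULL, wait);
				else
					smp_send_reschedule(cpu);
			}
		}

		put_cpu();
	}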