target/i386/kvm/kvm.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
Userspace should set the ret field of hypercall after handling
KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM.
Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
Reported-by: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
---
To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
otherwise, TDX guest boot could fail.
A matching QEMU tree including this patch is here:
https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
Previously, the issue was not triggered because no one would modify the ret
value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
value could be modified.
---
target/i386/kvm/kvm.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8e17942c3b..4bcccb48d1 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
static int kvm_handle_hypercall(struct kvm_run *run)
{
+ int ret = -EINVAL;
+
if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
- return kvm_handle_hc_map_gpa_range(run);
+ ret = kvm_handle_hc_map_gpa_range(run);
+
+ run->hypercall.ret = ret;
- return -EINVAL;
+ return ret;
}
#define VMX_INVALID_GUEST_STATE 0x80000021
base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7
--
2.46.0
On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote:
> Date: Thu, 12 Dec 2024 11:26:28 +0800
> From: Binbin Wu <binbin.wu@linux.intel.com>
> Subject: [PATCH] i386/kvm: Set return value after handling
> KVM_EXIT_HYPERCALL
> X-Mailer: git-send-email 2.46.0
>
> Userspace should set the ret field of hypercall after handling
> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM.
>
> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
> Reported-by: Farrah Chen <farrah.chen@intel.com>
> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> Tested-by: Farrah Chen <farrah.chen@intel.com>
> ---
> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
> otherwise, TDX guest boot could fail.
> A matching QEMU tree including this patch is here:
> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
>
> Previously, the issue was not triggered because no one would modify the ret
> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
> value could be modified.
Could you explain the specific reasons here in detail? It would be
helpful with debugging or reproducing the issue.
> ---
> target/i386/kvm/kvm.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 8e17942c3b..4bcccb48d1 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>
> static int kvm_handle_hypercall(struct kvm_run *run)
> {
> + int ret = -EINVAL;
> +
> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
> - return kvm_handle_hc_map_gpa_range(run);
> + ret = kvm_handle_hc_map_gpa_range(run);
> +
> + run->hypercall.ret = ret;
ret may be negative but hypercall.ret is u64. Do we need to set it to
-ret?
> - return -EINVAL;
> + return ret;
> }
>
> #define VMX_INVALID_GUEST_STATE 0x80000021
>
> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7
> --
> 2.46.0
>
>
On 12/12/24 09:07, Zhao Liu wrote:
> On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote:
>> Date: Thu, 12 Dec 2024 11:26:28 +0800
>> From: Binbin Wu <binbin.wu@linux.intel.com>
>> Subject: [PATCH] i386/kvm: Set return value after handling
>> KVM_EXIT_HYPERCALL
>> X-Mailer: git-send-email 2.46.0
>>
>> Userspace should set the ret field of hypercall after handling
>> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM.
>>
>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
>> Reported-by: Farrah Chen <farrah.chen@intel.com>
>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>> Tested-by: Farrah Chen <farrah.chen@intel.com>
>> ---
>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
>> otherwise, TDX guest boot could fail.
>> A matching QEMU tree including this patch is here:
>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
>>
>> Previously, the issue was not triggered because no one would modify the ret
>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
>> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
>> value could be modified.
>
> Could you explain the specific reasons here in detail? It would be
> helpful with debugging or reproducing the issue.
>
>> ---
>> target/i386/kvm/kvm.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index 8e17942c3b..4bcccb48d1 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>>
>> static int kvm_handle_hypercall(struct kvm_run *run)
>> {
>> + int ret = -EINVAL;
>> +
>> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
>> - return kvm_handle_hc_map_gpa_range(run);
>> + ret = kvm_handle_hc_map_gpa_range(run);
>> +
>> + run->hypercall.ret = ret;
>
> ret may be negative but hypercall.ret is u64. Do we need to set it to
> -ret?
If ret is less than zero, will stop the VM anyway as
RUN_STATE_INTERNAL_ERROR.
If this has to be fixed in QEMU, I think there's no need to set anything
if ret != 0; also because kvm_convert_memory() returns -1 on error and
that's not how the error would be passed to the guest.
However, I think the right fix should simply be this in KVM:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 83fe0a78146f..e2118ba93ef6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
}
vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
+ vcpu->run->ret = 0;
vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
vcpu->run->hypercall.args[0] = gpa;
vcpu->run->hypercall.args[1] = npages;
While there is arguably a change in behavior of the kernel both with
the patches in kvm-coco-queue and with the above one, _in practice_
the above change is one that userspace will not notice.
Paolo
>> - return -EINVAL;
>> + return ret;
>> }
>>
>> #define VMX_INVALID_GUEST_STATE 0x80000021
>>
>> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7
>> --
>> 2.46.0
>>
>>
On Thu, Dec 12, 2024, Paolo Bonzini wrote:
> On 12/12/24 09:07, Zhao Liu wrote:
> > On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote:
> > > Date: Thu, 12 Dec 2024 11:26:28 +0800
> > > From: Binbin Wu <binbin.wu@linux.intel.com>
> > > Subject: [PATCH] i386/kvm: Set return value after handling
> > > KVM_EXIT_HYPERCALL
> > > X-Mailer: git-send-email 2.46.0
> > >
> > > Userspace should set the ret field of hypercall after handling
> > > KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM.
> > >
> > > Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
> > > Reported-by: Farrah Chen <farrah.chen@intel.com>
> > > Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> > > Tested-by: Farrah Chen <farrah.chen@intel.com>
> > > ---
> > > To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
> > > otherwise, TDX guest boot could fail.
> > > A matching QEMU tree including this patch is here:
> > > https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
> > >
> > > Previously, the issue was not triggered because no one would modify the ret
> > > value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
> > > https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
> > > value could be modified.
> >
> > Could you explain the specific reasons here in detail? It would be
> > helpful with debugging or reproducing the issue.
> >
> > > ---
> > > target/i386/kvm/kvm.c | 8 ++++++--
> > > 1 file changed, 6 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > > index 8e17942c3b..4bcccb48d1 100644
> > > --- a/target/i386/kvm/kvm.c
> > > +++ b/target/i386/kvm/kvm.c
> > > @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
> > > static int kvm_handle_hypercall(struct kvm_run *run)
> > > {
> > > + int ret = -EINVAL;
> > > +
> > > if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
> > > - return kvm_handle_hc_map_gpa_range(run);
> > > + ret = kvm_handle_hc_map_gpa_range(run);
> > > +
> > > + run->hypercall.ret = ret;
> >
> > ret may be negative but hypercall.ret is u64. Do we need to set it to
> > -ret?
>
> If ret is less than zero, will stop the VM anyway as
> RUN_STATE_INTERNAL_ERROR.
>
> If this has to be fixed in QEMU, I think there's no need to set anything
> if ret != 0; also because kvm_convert_memory() returns -1 on error and
> that's not how the error would be passed to the guest.
>
> However, I think the right fix should simply be this in KVM:
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 83fe0a78146f..e2118ba93ef6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
> }
> vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
> + vcpu->run->ret = 0;
vcpu->run->hypercall.ret
> vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
> vcpu->run->hypercall.args[0] = gpa;
> vcpu->run->hypercall.args[1] = npages;
>
> While there is arguably a change in behavior of the kernel both with
> the patches in kvm-coco-queue and with the above one, _in practice_
> the above change is one that userspace will not notice.
I agree that KVM should initialize "ret", but I don't think '0' is the right
value. KVM shouldn't assume userspace will successfully handle the hypercall.
What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS?
On 12/12/24 20:13, Sean Christopherson wrote:
> On Thu, Dec 12, 2024, Paolo Bonzini wrote:
>> On 12/12/24 09:07, Zhao Liu wrote:
>>> On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote:
>>>> Date: Thu, 12 Dec 2024 11:26:28 +0800
>>>> From: Binbin Wu <binbin.wu@linux.intel.com>
>>>> Subject: [PATCH] i386/kvm: Set return value after handling
>>>> KVM_EXIT_HYPERCALL
>>>> X-Mailer: git-send-email 2.46.0
>>>>
>>>> Userspace should set the ret field of hypercall after handling
>>>> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM.
>>>>
>>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
>>>> Reported-by: Farrah Chen <farrah.chen@intel.com>
>>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>>>> Tested-by: Farrah Chen <farrah.chen@intel.com>
>>>> ---
>>>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
>>>> otherwise, TDX guest boot could fail.
>>>> A matching QEMU tree including this patch is here:
>>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
>>>>
>>>> Previously, the issue was not triggered because no one would modify the ret
>>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
>>>> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
>>>> value could be modified.
>>>
>>> Could you explain the specific reasons here in detail? It would be
>>> helpful with debugging or reproducing the issue.
>>>
>>>> ---
>>>> target/i386/kvm/kvm.c | 8 ++++++--
>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>>> index 8e17942c3b..4bcccb48d1 100644
>>>> --- a/target/i386/kvm/kvm.c
>>>> +++ b/target/i386/kvm/kvm.c
>>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>>>> static int kvm_handle_hypercall(struct kvm_run *run)
>>>> {
>>>> + int ret = -EINVAL;
>>>> +
>>>> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
>>>> - return kvm_handle_hc_map_gpa_range(run);
>>>> + ret = kvm_handle_hc_map_gpa_range(run);
>>>> +
>>>> + run->hypercall.ret = ret;
>>>
>>> ret may be negative but hypercall.ret is u64. Do we need to set it to
>>> -ret?
>>
>> If ret is less than zero, will stop the VM anyway as
>> RUN_STATE_INTERNAL_ERROR.
>>
>> If this has to be fixed in QEMU, I think there's no need to set anything
>> if ret != 0; also because kvm_convert_memory() returns -1 on error and
>> that's not how the error would be passed to the guest.
>>
>> However, I think the right fix should simply be this in KVM:
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 83fe0a78146f..e2118ba93ef6 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>> }
>> vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
>> + vcpu->run->ret = 0;
>
> vcpu->run->hypercall.ret
>
>> vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
>> vcpu->run->hypercall.args[0] = gpa;
>> vcpu->run->hypercall.args[1] = npages;
>>
>> While there is arguably a change in behavior of the kernel both with
>> the patches in kvm-coco-queue and with the above one, _in practice_
>> the above change is one that userspace will not notice.
>
> I agree that KVM should initialize "ret", but I don't think '0' is the right
> value. KVM shouldn't assume userspace will successfully handle the hypercall.
> What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS?
Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the
guest sees -KVM_ENOSYS; this is basically the same bug that Binbin is
fixing, just with a different value passed to the guest.
In other words, the above one-liner is pulling the "don't break
userspace" card.
Paolo
On 12/13/2024 5:28 AM, Paolo Bonzini wrote:
> On 12/12/24 20:13, Sean Christopherson wrote:
>> On Thu, Dec 12, 2024, Paolo Bonzini wrote:
>>> On 12/12/24 09:07, Zhao Liu wrote:
>>>> On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote:
>>>>> Date: Thu, 12 Dec 2024 11:26:28 +0800
>>>>> From: Binbin Wu <binbin.wu@linux.intel.com>
>>>>> Subject: [PATCH] i386/kvm: Set return value after handling
>>>>> KVM_EXIT_HYPERCALL
>>>>> X-Mailer: git-send-email 2.46.0
>>>>>
>>>>> Userspace should set the ret field of hypercall after handling
>>>>> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM.
>>>>>
>>>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
>>>>> Reported-by: Farrah Chen <farrah.chen@intel.com>
>>>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>>>>> Tested-by: Farrah Chen <farrah.chen@intel.com>
>>>>> ---
>>>>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
>>>>> otherwise, TDX guest boot could fail.
>>>>> A matching QEMU tree including this patch is here:
>>>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
>>>>>
>>>>> Previously, the issue was not triggered because no one would modify the ret
>>>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
>>>>> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
>>>>> value could be modified.
>>>>
>>>> Could you explain the specific reasons here in detail? It would be
>>>> helpful with debugging or reproducing the issue.
>>>>
>>>>> ---
>>>>> target/i386/kvm/kvm.c | 8 ++++++--
>>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>>>> index 8e17942c3b..4bcccb48d1 100644
>>>>> --- a/target/i386/kvm/kvm.c
>>>>> +++ b/target/i386/kvm/kvm.c
>>>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>>>>> static int kvm_handle_hypercall(struct kvm_run *run)
>>>>> {
>>>>> + int ret = -EINVAL;
>>>>> +
>>>>> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
>>>>> - return kvm_handle_hc_map_gpa_range(run);
>>>>> + ret = kvm_handle_hc_map_gpa_range(run);
>>>>> +
>>>>> + run->hypercall.ret = ret;
>>>>
>>>> ret may be negative but hypercall.ret is u64. Do we need to set it to
>>>> -ret?
>>>
>>> If ret is less than zero, will stop the VM anyway as
>>> RUN_STATE_INTERNAL_ERROR.
>>>
>>> If this has to be fixed in QEMU, I think there's no need to set anything
>>> if ret != 0; also because kvm_convert_memory() returns -1 on error and
>>> that's not how the error would be passed to the guest.
>>>
>>> However, I think the right fix should simply be this in KVM:
>>>
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 83fe0a78146f..e2118ba93ef6 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>>> }
>>> vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
>>> + vcpu->run->ret = 0;
>>
>> vcpu->run->hypercall.ret
>>
>>> vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
>>> vcpu->run->hypercall.args[0] = gpa;
>>> vcpu->run->hypercall.args[1] = npages;
>>>
>>> While there is arguably a change in behavior of the kernel both with
>>> the patches in kvm-coco-queue and with the above one, _in practice_
>>> the above change is one that userspace will not notice.
>>
>> I agree that KVM should initialize "ret", but I don't think '0' is the right
>> value. KVM shouldn't assume userspace will successfully handle the hypercall.
>> What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS?
>
> Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just with a different value passed to the guest.
>
> In other words, the above one-liner is pulling the "don't break userspace" card.
>
> Paolo
>
>
If the change need to be done in KVM, there are other 3 functions that use
KVM_EXIT_HYPERCALL based on the code in kvm-coco-queue.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 40fe7258843e..a624f7289282 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3633,6 +3633,7 @@ static int snp_begin_psc_msr(struct vcpu_svm *svm, u64 ghcb_msr)
}
vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
+ vcpu->run->ret = 0;
vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
vcpu->run->hypercall.args[0] = gpa;
vcpu->run->hypercall.args[1] = 1;
@@ -3796,6 +3797,7 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
case VMGEXIT_PSC_OP_PRIVATE:
case VMGEXIT_PSC_OP_SHARED:
vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
+ vcpu->run->ret = 0;
vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
vcpu->run->hypercall.args[0] = gfn_to_gpa(gfn);
vcpu->run->hypercall.args[1] = npages;
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 85c8aee263c1..c50c2edc8c56 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1161,6 +1161,7 @@ static void __tdx_map_gpa(struct vcpu_tdx * tdx)
pr_err("%s: gpa = 0x%llx, size = 0x%llx", __func__, gpa, size);
tdx->vcpu.run->exit_reason = KVM_EXIT_HYPERCALL;
+ tdx->vcpu->run->ret = 0;
tdx->vcpu.run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm));
tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4f94b1e24eae..3f82bb2357e3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10070,6 +10070,7 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
}
vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
+ vcpu->run->ret = 0;
vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
vcpu->run->hypercall.args[0] = gpa;
vcpu->run->hypercall.args[1] = npages;
On 12/13/2024 9:46 AM, Binbin Wu wrote:
>
>
>
> On 12/13/2024 5:28 AM, Paolo Bonzini wrote:
>> On 12/12/24 20:13, Sean Christopherson wrote:
>>> On Thu, Dec 12, 2024, Paolo Bonzini wrote:
>>>> On 12/12/24 09:07, Zhao Liu wrote:
>>>>> On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote:
>>>>>> Date: Thu, 12 Dec 2024 11:26:28 +0800
>>>>>> From: Binbin Wu <binbin.wu@linux.intel.com>
>>>>>> Subject: [PATCH] i386/kvm: Set return value after handling
>>>>>> KVM_EXIT_HYPERCALL
>>>>>> X-Mailer: git-send-email 2.46.0
>>>>>>
>>>>>> Userspace should set the ret field of hypercall after handling
>>>>>> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM.
>>>>>>
>>>>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
>>>>>> Reported-by: Farrah Chen <farrah.chen@intel.com>
>>>>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>>>>>> Tested-by: Farrah Chen <farrah.chen@intel.com>
>>>>>> ---
>>>>>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
>>>>>> otherwise, TDX guest boot could fail.
>>>>>> A matching QEMU tree including this patch is here:
>>>>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
>>>>>>
>>>>>> Previously, the issue was not triggered because no one would modify the ret
>>>>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
>>>>>> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
>>>>>> value could be modified.
>>>>>
>>>>> Could you explain the specific reasons here in detail? It would be
>>>>> helpful with debugging or reproducing the issue.
>>>>>
>>>>>> ---
>>>>>> target/i386/kvm/kvm.c | 8 ++++++--
>>>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>>>>> index 8e17942c3b..4bcccb48d1 100644
>>>>>> --- a/target/i386/kvm/kvm.c
>>>>>> +++ b/target/i386/kvm/kvm.c
>>>>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>>>>>> static int kvm_handle_hypercall(struct kvm_run *run)
>>>>>> {
>>>>>> + int ret = -EINVAL;
>>>>>> +
>>>>>> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
>>>>>> - return kvm_handle_hc_map_gpa_range(run);
>>>>>> + ret = kvm_handle_hc_map_gpa_range(run);
>>>>>> +
>>>>>> + run->hypercall.ret = ret;
>>>>>
>>>>> ret may be negative but hypercall.ret is u64. Do we need to set it to
>>>>> -ret?
>>>>
>>>> If ret is less than zero, will stop the VM anyway as
>>>> RUN_STATE_INTERNAL_ERROR.
>>>>
>>>> If this has to be fixed in QEMU, I think there's no need to set anything
>>>> if ret != 0; also because kvm_convert_memory() returns -1 on error and
>>>> that's not how the error would be passed to the guest.
>>>>
>>>> However, I think the right fix should simply be this in KVM:
>>>>
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index 83fe0a78146f..e2118ba93ef6 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>>>> }
>>>> vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
>>>> + vcpu->run->ret = 0;
>>>
>>> vcpu->run->hypercall.ret
>>>
>>>> vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
>>>> vcpu->run->hypercall.args[0] = gpa;
>>>> vcpu->run->hypercall.args[1] = npages;
>>>>
>>>> While there is arguably a change in behavior of the kernel both with
>>>> the patches in kvm-coco-queue and with the above one, _in practice_
>>>> the above change is one that userspace will not notice.
>>>
>>> I agree that KVM should initialize "ret", but I don't think '0' is the right
>>> value. KVM shouldn't assume userspace will successfully handle the hypercall.
>>> What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS?
>>
>> Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just with a different value passed to the guest.
>>
>> In other words, the above one-liner is pulling the "don't break userspace" card.
>>
>> Paolo
>>
>>
> If the change need to be done in KVM, there are other 3 functions that use
> KVM_EXIT_HYPERCALL based on the code in kvm-coco-queue.
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 40fe7258843e..a624f7289282 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3633,6 +3633,7 @@ static int snp_begin_psc_msr(struct vcpu_svm *svm, u64 ghcb_msr)
> }
>
> vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
> + vcpu->run->ret = 0;
> vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
> vcpu->run->hypercall.args[0] = gpa;
> vcpu->run->hypercall.args[1] = 1;
> @@ -3796,6 +3797,7 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
> case VMGEXIT_PSC_OP_PRIVATE:
> case VMGEXIT_PSC_OP_SHARED:
> vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
> + vcpu->run->ret = 0;
> vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
> vcpu->run->hypercall.args[0] = gfn_to_gpa(gfn);
> vcpu->run->hypercall.args[1] = npages;
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 85c8aee263c1..c50c2edc8c56 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1161,6 +1161,7 @@ static void __tdx_map_gpa(struct vcpu_tdx * tdx)
> pr_err("%s: gpa = 0x%llx, size = 0x%llx", __func__, gpa, size);
>
> tdx->vcpu.run->exit_reason = KVM_EXIT_HYPERCALL;
> + tdx->vcpu->run->ret = 0;
> tdx->vcpu.run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
> tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm));
> tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4f94b1e24eae..3f82bb2357e3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10070,6 +10070,7 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
> }
>
> vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
> + vcpu->run->ret = 0;
> vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
> vcpu->run->hypercall.args[0] = gpa;
> vcpu->run->hypercall.args[1] = npages;
>
Maybe we could add a helper to fill the vcpu->run?
On 12/13/2024 9:46 AM, Binbin Wu wrote:
>
>
>
> On 12/13/2024 5:28 AM, Paolo Bonzini wrote:
>> On 12/12/24 20:13, Sean Christopherson wrote:
>>> On Thu, Dec 12, 2024, Paolo Bonzini wrote:
>>>> On 12/12/24 09:07, Zhao Liu wrote:
>>>>> On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote:
>>>>>> Date: Thu, 12 Dec 2024 11:26:28 +0800
>>>>>> From: Binbin Wu <binbin.wu@linux.intel.com>
>>>>>> Subject: [PATCH] i386/kvm: Set return value after handling
>>>>>> KVM_EXIT_HYPERCALL
>>>>>> X-Mailer: git-send-email 2.46.0
>>>>>>
>>>>>> Userspace should set the ret field of hypercall after handling
>>>>>> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM.
>>>>>>
>>>>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
>>>>>> Reported-by: Farrah Chen <farrah.chen@intel.com>
>>>>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>>>>>> Tested-by: Farrah Chen <farrah.chen@intel.com>
>>>>>> ---
>>>>>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
>>>>>> otherwise, TDX guest boot could fail.
>>>>>> A matching QEMU tree including this patch is here:
>>>>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
>>>>>>
>>>>>> Previously, the issue was not triggered because no one would modify the ret
>>>>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
>>>>>> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
>>>>>> value could be modified.
>>>>>
>>>>> Could you explain the specific reasons here in detail? It would be
>>>>> helpful with debugging or reproducing the issue.
>>>>>
>>>>>> ---
>>>>>> target/i386/kvm/kvm.c | 8 ++++++--
>>>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>>>>> index 8e17942c3b..4bcccb48d1 100644
>>>>>> --- a/target/i386/kvm/kvm.c
>>>>>> +++ b/target/i386/kvm/kvm.c
>>>>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>>>>>> static int kvm_handle_hypercall(struct kvm_run *run)
>>>>>> {
>>>>>> + int ret = -EINVAL;
>>>>>> +
>>>>>> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
>>>>>> - return kvm_handle_hc_map_gpa_range(run);
>>>>>> + ret = kvm_handle_hc_map_gpa_range(run);
>>>>>> +
>>>>>> + run->hypercall.ret = ret;
>>>>>
>>>>> ret may be negative but hypercall.ret is u64. Do we need to set it to
>>>>> -ret?
>>>>
>>>> If ret is less than zero, will stop the VM anyway as
>>>> RUN_STATE_INTERNAL_ERROR.
>>>>
>>>> If this has to be fixed in QEMU, I think there's no need to set anything
>>>> if ret != 0; also because kvm_convert_memory() returns -1 on error and
>>>> that's not how the error would be passed to the guest.
>>>>
>>>> However, I think the right fix should simply be this in KVM:
>>>>
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index 83fe0a78146f..e2118ba93ef6 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>>>> }
>>>> vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
>>>> + vcpu->run->ret = 0;
>>>
>>> vcpu->run->hypercall.ret
>>>
>>>> vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
>>>> vcpu->run->hypercall.args[0] = gpa;
>>>> vcpu->run->hypercall.args[1] = npages;
>>>>
>>>> While there is arguably a change in behavior of the kernel both with
>>>> the patches in kvm-coco-queue and with the above one, _in practice_
>>>> the above change is one that userspace will not notice.
>>>
>>> I agree that KVM should initialize "ret", but I don't think '0' is the right
>>> value. KVM shouldn't assume userspace will successfully handle the hypercall.
>>> What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS?
>>
>> Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just with a different value passed to the guest.
>>
>> In other words, the above one-liner is pulling the "don't break userspace" card.
>>
>> Paolo
>>
>>
> If the change need to be done in KVM, there are other 3 functions that use
> KVM_EXIT_HYPERCALL based on the code in kvm-coco-queue.
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 40fe7258843e..a624f7289282 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3633,6 +3633,7 @@ static int snp_begin_psc_msr(struct vcpu_svm *svm, u64 ghcb_msr)
> }
>
> vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
> + vcpu->run->ret = 0;
> vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
> vcpu->run->hypercall.args[0] = gpa;
> vcpu->run->hypercall.args[1] = 1;
> @@ -3796,6 +3797,7 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
> case VMGEXIT_PSC_OP_PRIVATE:
> case VMGEXIT_PSC_OP_SHARED:
> vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
> + vcpu->run->ret = 0;
> vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
> vcpu->run->hypercall.args[0] = gfn_to_gpa(gfn);
> vcpu->run->hypercall.args[1] = npages;
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 85c8aee263c1..c50c2edc8c56 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1161,6 +1161,7 @@ static void __tdx_map_gpa(struct vcpu_tdx * tdx)
> pr_err("%s: gpa = 0x%llx, size = 0x%llx", __func__, gpa, size);
>
> tdx->vcpu.run->exit_reason = KVM_EXIT_HYPERCALL;
> + tdx->vcpu->run->ret = 0;
Sorry, this should be " tdx->vcpu.run->ret = 0;"
> tdx->vcpu.run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
> tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm));
> tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4f94b1e24eae..3f82bb2357e3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10070,6 +10070,7 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
> }
>
> vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
> + vcpu->run->ret = 0;
> vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
> vcpu->run->hypercall.args[0] = gpa;
> vcpu->run->hypercall.args[1] = npages;
>
On Thu, Dec 12, 2024, Paolo Bonzini wrote:
> On 12/12/24 20:13, Sean Christopherson wrote:
> > On Thu, Dec 12, 2024, Paolo Bonzini wrote:
> > > If ret is less than zero, will stop the VM anyway as
> > > RUN_STATE_INTERNAL_ERROR.
> > >
> > > If this has to be fixed in QEMU, I think there's no need to set anything
> > > if ret != 0; also because kvm_convert_memory() returns -1 on error and
> > > that's not how the error would be passed to the guest.
> > >
> > > However, I think the right fix should simply be this in KVM:
> > >
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 83fe0a78146f..e2118ba93ef6 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
> > > }
> > > vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
> > > + vcpu->run->ret = 0;
> >
> > vcpu->run->hypercall.ret
> >
> > > vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
> > > vcpu->run->hypercall.args[0] = gpa;
> > > vcpu->run->hypercall.args[1] = npages;
> > >
> > > While there is arguably a change in behavior of the kernel both with
> > > the patches in kvm-coco-queue and with the above one, _in practice_
> > > the above change is one that userspace will not notice.
> >
> > I agree that KVM should initialize "ret", but I don't think '0' is the right
> > value. KVM shouldn't assume userspace will successfully handle the hypercall.
> > What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS?
>
> Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest
> sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just
> with a different value passed to the guest.
>
> In other words, the above one-liner is pulling the "don't break userspace"
> card.
But how is anything breaking userspace? QEMU needs to opt-in to intercepting
KVM_HC_MAP_GPA_RANGE, and this has been KVM's behavior since commit 0dbb11230437
("KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall").
Ah, "ret" happens to be deep in the union and KVM zero allocates vcpu->run, so
QEMU gets lucky and "ret" happens to be zero because no other non-fatal userspace
exit on x86 happens to need as many bytes. Hilarious.
FWIW, if TDX marshalls hypercall state into KVM's "normal" registers, then KVM's
shenanigans with vcpu->run->hypercall.ret might go away? Though regardless of
what happens on that front, I think it makes to explicitly initialize "ret" to
*something*.
I checked our VMM, and it does the right thing, so I don't have any objection
to explicitly zeroing "ret". Though it needs a comment explaining that it's a
terrible hack for broken userspace ;-)
On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote:
> Userspace should set the ret field of hypercall after handling
> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM.
>
> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
> Reported-by: Farrah Chen <farrah.chen@intel.com>
> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> Tested-by: Farrah Chen <farrah.chen@intel.com>
> ---
> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
> otherwise, TDX guest boot could fail.
> A matching QEMU tree including this patch is here:
> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
>
> Previously, the issue was not triggered because no one would modify the ret
> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
> value could be modified.
> ---
> target/i386/kvm/kvm.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 8e17942c3b..4bcccb48d1 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>
> static int kvm_handle_hypercall(struct kvm_run *run)
> {
> + int ret = -EINVAL;
> +
> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
> - return kvm_handle_hc_map_gpa_range(run);
> + ret = kvm_handle_hc_map_gpa_range(run);
LGTM to the issue it tries to fix :-)
> +
> + run->hypercall.ret = ret;
>
> - return -EINVAL;
> + return ret;
> }
>
> #define VMX_INVALID_GUEST_STATE 0x80000021
>
> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7
> --
> 2.46.0
>
On 12/12/2024 11:26 AM, Binbin Wu wrote:
> Userspace should set the ret field of hypercall after handling
> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM.
>
> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
> Reported-by: Farrah Chen <farrah.chen@intel.com>
> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> Tested-by: Farrah Chen <farrah.chen@intel.com>
> ---
> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
> otherwise, TDX guest boot could fail.
> A matching QEMU tree including this patch is here:
> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
>
> Previously, the issue was not triggered because no one would modify the ret
> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
> value could be modified.
> ---
> target/i386/kvm/kvm.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 8e17942c3b..4bcccb48d1 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>
> static int kvm_handle_hypercall(struct kvm_run *run)
> {
> + int ret = -EINVAL;
> +
> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
> - return kvm_handle_hc_map_gpa_range(run);
> + ret = kvm_handle_hc_map_gpa_range(run);
> +
> + run->hypercall.ret = ret;
Updating run->hypercall.ret is useful only when QEMU needs to re-enter
the guest. For the case of ret < 0, QEMU will stop the vcpu.
I think we might need re-think on the handling of KVM_EXIT_HYPERCALL.
E.g., in what error case should QEMU stop the vcpu, and in what case can
QEMU return the error back to the guest via run->hypercall.ret.
> - return -EINVAL;
> + return ret;
> }
>
> #define VMX_INVALID_GUEST_STATE 0x80000021
>
> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7
On 12/12/2024 11:44 AM, Xiaoyao Li wrote:
> On 12/12/2024 11:26 AM, Binbin Wu wrote:
>> Userspace should set the ret field of hypercall after handling
>> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM.
>>
>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
>> Reported-by: Farrah Chen <farrah.chen@intel.com>
>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>> Tested-by: Farrah Chen <farrah.chen@intel.com>
>> ---
>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
>> otherwise, TDX guest boot could fail.
>> A matching QEMU tree including this patch is here:
>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
>>
>> Previously, the issue was not triggered because no one would modify the ret
>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
>> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
>> value could be modified.
>> ---
>> target/i386/kvm/kvm.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index 8e17942c3b..4bcccb48d1 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>> static int kvm_handle_hypercall(struct kvm_run *run)
>> {
>> + int ret = -EINVAL;
>> +
>> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
>> - return kvm_handle_hc_map_gpa_range(run);
>> + ret = kvm_handle_hc_map_gpa_range(run);
>> +
>> + run->hypercall.ret = ret;
>
> Updating run->hypercall.ret is useful only when QEMU needs to re-enter the guest. For the case of ret < 0, QEMU will stop the vcpu.
IMHO, assign run->hypercall.ret anyway should be OK, no need to add a
per-condition on ret, although the value is not used when ret < 0.
Currently, since QEMU will stop the vcpu when ret < 0, this patch doesn't
convert ret to -Exxx that the ABI expects.
>
> I think we might need re-think on the handling of KVM_EXIT_HYPERCALL. E.g., in what error case should QEMU stop the vcpu, and in what case can QEMU return the error back to the guest via run->hypercall.ret.
Actually, I had the similar question before.
https://lore.kernel.org/kvm/d25cc62c-0f56-4be2-968a-63c8b1d63b5a@linux.intel.com/
It might depends on the hypercall number?
Another option is QEMU always sets run->hypercall.ret appropriately and continues the vcpu thread.
>
>> - return -EINVAL;
>> + return ret;
>> }
>> #define VMX_INVALID_GUEST_STATE 0x80000021
>>
>> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7
>
On 12/12/2024 1:18 PM, Binbin Wu wrote:
>
>
>
> On 12/12/2024 11:44 AM, Xiaoyao Li wrote:
>> On 12/12/2024 11:26 AM, Binbin Wu wrote:
>>> Userspace should set the ret field of hypercall after handling
>>> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM.
>>>
>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for
>>> KVM_HC_MAP_GPA_RANGE")
>>> Reported-by: Farrah Chen <farrah.chen@intel.com>
>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>>> Tested-by: Farrah Chen <farrah.chen@intel.com>
>>> ---
>>> To test the TDX code in kvm-coco-queue, please apply the patch to the
>>> QEMU,
>>> otherwise, TDX guest boot could fail.
>>> A matching QEMU tree including this patch is here:
>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-
>>> upstream-v6.1-fix_kvm_hypercall_return_value
>>>
>>> Previously, the issue was not triggered because no one would modify
>>> the ret
>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
>>> https://lore.kernel.org/kvm/20241128004344.4072099-7-
>>> seanjc@google.com/, the
>>> value could be modified.
>>> ---
>>> target/i386/kvm/kvm.c | 8 ++++++--
>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>> index 8e17942c3b..4bcccb48d1 100644
>>> --- a/target/i386/kvm/kvm.c
>>> +++ b/target/i386/kvm/kvm.c
>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct
>>> kvm_run *run)
>>> static int kvm_handle_hypercall(struct kvm_run *run)
>>> {
>>> + int ret = -EINVAL;
>>> +
>>> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
>>> - return kvm_handle_hc_map_gpa_range(run);
>>> + ret = kvm_handle_hc_map_gpa_range(run);
>>> +
>>> + run->hypercall.ret = ret;
>>
>> Updating run->hypercall.ret is useful only when QEMU needs to re-enter
>> the guest. For the case of ret < 0, QEMU will stop the vcpu.
>
> IMHO, assign run->hypercall.ret anyway should be OK, no need to add a
> per-condition on ret, although the value is not used when ret < 0.
>
> Currently, since QEMU will stop the vcpu when ret < 0, this patch doesn't
> convert ret to -Exxx that the ABI expects.
I was thinking if it is better to let each specific handling function to
update run->hypercall.ret with its own logic. E.g., for this case,
let kvm_handle_hc_map_gpa_range() to update the run->hypercall.ret.
Reusing the return value of the handling function to update
run->hypercall.ret seems not logically correct to me.
>>
>> I think we might need re-think on the handling of KVM_EXIT_HYPERCALL.
>> E.g., in what error case should QEMU stop the vcpu, and in what case
>> can QEMU return the error back to the guest via run->hypercall.ret.
>
> Actually, I had the similar question before.
> https://lore.kernel.org/kvm/
> d25cc62c-0f56-4be2-968a-63c8b1d63b5a@linux.intel.com/
>
> It might depends on the hypercall number?
> Another option is QEMU always sets run->hypercall.ret appropriately and
> continues the vcpu thread.
>
>
>>
>>> - return -EINVAL;
>>> + return ret;
>>> }
>>> #define VMX_INVALID_GUEST_STATE 0x80000021
>>>
>>> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7
>>
>
On 12/12/2024 3:09 PM, Xiaoyao Li wrote:
> On 12/12/2024 1:18 PM, Binbin Wu wrote:
>>
>>
>>
>> On 12/12/2024 11:44 AM, Xiaoyao Li wrote:
>>> On 12/12/2024 11:26 AM, Binbin Wu wrote:
>>>> Userspace should set the ret field of hypercall after handling
>>>> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM.
>>>>
>>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
>>>> Reported-by: Farrah Chen <farrah.chen@intel.com>
>>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>>>> Tested-by: Farrah Chen <farrah.chen@intel.com>
>>>> ---
>>>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
>>>> otherwise, TDX guest boot could fail.
>>>> A matching QEMU tree including this patch is here:
>>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu- upstream-v6.1-fix_kvm_hypercall_return_value
>>>>
>>>> Previously, the issue was not triggered because no one would modify the ret
>>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
>>>> https://lore.kernel.org/kvm/20241128004344.4072099-7- seanjc@google.com/, the
>>>> value could be modified.
>>>> ---
>>>> target/i386/kvm/kvm.c | 8 ++++++--
>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>>> index 8e17942c3b..4bcccb48d1 100644
>>>> --- a/target/i386/kvm/kvm.c
>>>> +++ b/target/i386/kvm/kvm.c
>>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>>>> static int kvm_handle_hypercall(struct kvm_run *run)
>>>> {
>>>> + int ret = -EINVAL;
>>>> +
>>>> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
>>>> - return kvm_handle_hc_map_gpa_range(run);
>>>> + ret = kvm_handle_hc_map_gpa_range(run);
>>>> +
>>>> + run->hypercall.ret = ret;
>>>
>>> Updating run->hypercall.ret is useful only when QEMU needs to re-enter the guest. For the case of ret < 0, QEMU will stop the vcpu.
>>
>> IMHO, assign run->hypercall.ret anyway should be OK, no need to add a
>> per-condition on ret, although the value is not used when ret < 0.
>>
>> Currently, since QEMU will stop the vcpu when ret < 0, this patch doesn't
>> convert ret to -Exxx that the ABI expects.
>
> I was thinking if it is better to let each specific handling function to update run->hypercall.ret with its own logic. E.g., for this case, let kvm_handle_hc_map_gpa_range() to update the run->hypercall.ret.
I think it makes sense.
Also, each handling function can decide whether the vcpu should continue if the handling failed.
- Return 0 and set the error code ( 0 or -Exxx) to run->hypercall.ret if it want to continue.
- Return negative value if it want to stop the vcpu thread.
>
> Reusing the return value of the handling function to update
> run->hypercall.ret seems not logically correct to me.
>
>>>
>>> I think we might need re-think on the handling of KVM_EXIT_HYPERCALL. E.g., in what error case should QEMU stop the vcpu, and in what case can QEMU return the error back to the guest via run->hypercall.ret.
>>
>> Actually, I had the similar question before.
>> https://lore.kernel.org/kvm/ d25cc62c-0f56-4be2-968a-63c8b1d63b5a@linux.intel.com/
>>
>> It might depends on the hypercall number?
>> Another option is QEMU always sets run->hypercall.ret appropriately and continues the vcpu thread.
>>
>>
>>>
>>>> - return -EINVAL;
>>>> + return ret;
>>>> }
>>>> #define VMX_INVALID_GUEST_STATE 0x80000021
>>>>
>>>> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7
>>>
>>
>
© 2016 - 2026 Red Hat, Inc.