arch/x86/hyperv/hv_init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
With commit 0e20f1f4c2cb ("x86/hyperv: Clean up hv_do_hypercall()"),
config checks were added to conditionally restrict export
of hv_hypercall_pg symbol at the same time when a usage of that symbol
was added in mshv_vtl_main.c driver. This results in missing symbol
warning when mshv_vtl_main is compiled. Change the logic to
export it unconditionally.
Fixes: 96a1d2495c2f ("Drivers: hv: Introduce mshv_vtl driver")
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
---
Omitted cc:stable tag, since mshv_vtl_main changes have not yet
landed in stable/rc kernels. I can add if required.
---
arch/x86/hyperv/hv_init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 896e2913bc41..0ba9cae9ffaf 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -36,6 +36,7 @@
#include <linux/export.h>
void *hv_hypercall_pg;
+EXPORT_SYMBOL_GPL(hv_hypercall_pg);
#ifdef CONFIG_X86_64
static u64 __hv_hyperfail(u64 control, u64 param1, u64 param2)
@@ -73,7 +74,6 @@ static inline void hv_set_hypercall_pg(void *ptr)
{
hv_hypercall_pg = ptr;
}
-EXPORT_SYMBOL_GPL(hv_hypercall_pg);
#endif
union hv_ghcb * __percpu *hv_ghcb_pg;
--
2.34.1
On Mon, Aug 25, 2025 at 11:22:08AM +0530, Naman Jain wrote:
> With commit 0e20f1f4c2cb ("x86/hyperv: Clean up hv_do_hypercall()"),
> config checks were added to conditionally restrict export
> of hv_hypercall_pg symbol at the same time when a usage of that symbol
> was added in mshv_vtl_main.c driver. This results in missing symbol
> warning when mshv_vtl_main is compiled. Change the logic to
> export it unconditionally.
>
> Fixes: 96a1d2495c2f ("Drivers: hv: Introduce mshv_vtl driver")
> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
Oh gawd, that commit is terrible and adds yet another hypercall
interface.
I would argue the proper fix is moving the whole of mshv_vtl_return()
into the kernel proper and doing it like hv_std_hypercall() on x86_64.
Additionally, how is that function not utterly broken? What happens if
an interrupt or NMI comes in after native_write_cr2() and before the
actual hypercall does VMEXIT and trips a #PF?
And an rax:rcx return, I though the canonical pair was AX,DX !?!?
Also, that STACK_FRAME_NON_STANDARD() annotation is broken, this must
not be used for anything that can end up in vmlinux.o -- that is, the
moment you built-in this driver (=y) this comes unstuck.
The reason you're getting warnings is because you're violating the
normal calling convention and scribbling BP, we yelled at the TDX guys
for doing this, now you're getting yelled at. WTF !?!
Please explain how just shutting up objtool makes the unwind work when
the NMI hits your BP scribble?
All in all, I would suggest fixing this by reverting that patch and
trying again after fixing the calling convention of that hypercall.
Yours grumpy..
On 8/25/2025 3:12 PM, Peter Zijlstra wrote:
> On Mon, Aug 25, 2025 at 11:22:08AM +0530, Naman Jain wrote:
>> With commit 0e20f1f4c2cb ("x86/hyperv: Clean up hv_do_hypercall()"),
>> config checks were added to conditionally restrict export
>> of hv_hypercall_pg symbol at the same time when a usage of that symbol
>> was added in mshv_vtl_main.c driver. This results in missing symbol
>> warning when mshv_vtl_main is compiled. Change the logic to
>> export it unconditionally.
>>
>> Fixes: 96a1d2495c2f ("Drivers: hv: Introduce mshv_vtl driver")
>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>
> Oh gawd, that commit is terrible and adds yet another hypercall
> interface.
>
> I would argue the proper fix is moving the whole of mshv_vtl_return()
> into the kernel proper and doing it like hv_std_hypercall() on x86_64.
Thanks for the review comments.
This is doable, I can move the hypercall part of it to
arch/x86/hyperv/hv_init.c if I understand correctly.
>
> Additionally, how is that function not utterly broken? What happens if
> an interrupt or NMI comes in after native_write_cr2() and before the
> actual hypercall does VMEXIT and trips a #PF?
mshv_vtl driver is used for OpenHCL paravisor. The interrupts are
disabled, and NMIs aren't sent to the paravisor by the virt stack.
>
> And an rax:rcx return, I though the canonical pair was AX,DX !?!?
Here, the code uses rax and rcx not as a means to return one 128 bit
value. The code uses them in that way as an ABI.
>
> Also, that STACK_FRAME_NON_STANDARD() annotation is broken, this must
> not be used for anything that can end up in vmlinux.o -- that is, the
> moment you built-in this driver (=y) this comes unstuck.
>
> The reason you're getting warnings is because you're violating the
> normal calling convention and scribbling BP, we yelled at the TDX guys
> for doing this, now you're getting yelled at. WTF !?!
>
> Please explain how just shutting up objtool makes the unwind work when
> the NMI hits your BP scribble?
>
Returning to a lower VTL treats the base pointer register as a general
purpose one and this VTL transition function makes sure to preserve the
bp register due to which the objtool trips over on the assembly touching
the bp register. We considered this warning harmless and thus we are
using this macro. Moreover NMIs are not an issue here as they won't be
coming as mentioned other. If there are alternate approaches that I
should be using, please suggest.
I now understand that as part of your effort to enable IBT config on
x64, you changed the indirect calls to direct calls in Hyper-V. As of
today, there is no requirement to enable IBT in OpenHCL kernel as this
runs as a paravisor in VTL2 and it does not effect the guest VMs running
in VTL0.
I can disable CONFIG_X86_KERNEL_IBT when CONFIG_MSHV_VTL is enabled in
Kconfig in next version.
> All in all, I would suggest fixing this by reverting that patch and
> trying again after fixing the calling convention of that hypercall.
>
>
> Yours grumpy..
Regards,
Naman
On Tue, Aug 26, 2025 at 05:00:31PM +0530, Naman Jain wrote:
>
>
> On 8/25/2025 3:12 PM, Peter Zijlstra wrote:
> > On Mon, Aug 25, 2025 at 11:22:08AM +0530, Naman Jain wrote:
> > > With commit 0e20f1f4c2cb ("x86/hyperv: Clean up hv_do_hypercall()"),
> > > config checks were added to conditionally restrict export
> > > of hv_hypercall_pg symbol at the same time when a usage of that symbol
> > > was added in mshv_vtl_main.c driver. This results in missing symbol
> > > warning when mshv_vtl_main is compiled. Change the logic to
> > > export it unconditionally.
> > >
> > > Fixes: 96a1d2495c2f ("Drivers: hv: Introduce mshv_vtl driver")
> > > Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
> >
> > Oh gawd, that commit is terrible and adds yet another hypercall
> > interface.
> >
> > I would argue the proper fix is moving the whole of mshv_vtl_return()
> > into the kernel proper and doing it like hv_std_hypercall() on x86_64.
>
> Thanks for the review comments.
>
> This is doable, I can move the hypercall part of it to
> arch/x86/hyperv/hv_init.c if I understand correctly.
>
> >
> > Additionally, how is that function not utterly broken? What happens if
> > an interrupt or NMI comes in after native_write_cr2() and before the
> > actual hypercall does VMEXIT and trips a #PF?
>
> mshv_vtl driver is used for OpenHCL paravisor. The interrupts are
> disabled, and NMIs aren't sent to the paravisor by the virt stack.
I do not know what OpenHCL is. Nor is it clear from the code what NMIs
can't happen. Anyway, same can be achieved with breakpoints / kprobes.
You can get a trap after setting CR2 and scribble it.
You simply cannot use CR2 this way.
> > And an rax:rcx return, I though the canonical pair was AX,DX !?!?
>
> Here, the code uses rax and rcx not as a means to return one 128 bit
> value. The code uses them in that way as an ABI.
Still daft. Creating an ABI that goes against pre-existing conventions
is weird.
> > Also, that STACK_FRAME_NON_STANDARD() annotation is broken, this must
> > not be used for anything that can end up in vmlinux.o -- that is, the
> > moment you built-in this driver (=y) this comes unstuck.
> >
> > The reason you're getting warnings is because you're violating the
> > normal calling convention and scribbling BP, we yelled at the TDX guys
> > for doing this, now you're getting yelled at. WTF !?!
> >
> > Please explain how just shutting up objtool makes the unwind work when
> > the NMI hits your BP scribble?
>
> Returning to a lower VTL treats the base pointer register as a general
> purpose one and this VTL transition function makes sure to preserve the
> bp register due to which the objtool trips over on the assembly touching
> the bp register. We considered this warning harmless and thus we are
> using this macro. Moreover NMIs are not an issue here as they won't be
> coming as mentioned other. If there are alternate approaches that I should
> be using, please suggest.
Using BP in an ABI like that is ridiculous and broken. We told the same
to the TDX folks when they tried, IIRC TDX was fixed.
It is simply not acceptable to break the regular calling convention with
a new ABI.
Again, I can put a breakpoint or kprobe in the region where BP is
scribbled.
Basically the argument is really simple: you run in Linux, you play by
the Linux rules. Using BP as argument is simply not possible. If your
ABI requires that, your ABI is broken and will not be supported. Rev the
ABI and try again. Same for CR2, that is not an available register in
any way.
> I now understand that as part of your effort to enable IBT config on
> x64, you changed the indirect calls to direct calls in Hyper-V.
Yeah, I was cleaning up indirect calls, and this really didn't need to
be one.
> As of today, there is no requirement to enable IBT in OpenHCL kernel
> as this runs as a paravisor in VTL2 and it does not effect the guest
> VMs running
> in VTL0.
I do not know what OpenHCL or VTLn means and as such pretty much the
whole of your statement makes no sense.
Anyway, AFAICT the whole idea of a hypercall page is to 'abtract' out
the VMCALL vs VMMCALL nonsense Intel/AMD inflicted on us. Surely you
don't actually need that. HyperV already knows about all the gazillion
of ways to do hypercalls.
> I can disable CONFIG_X86_KERNEL_IBT when CONFIG_MSHV_VTL is enabled in
> Kconfig in next version.
Or you can just straight up say: "We at microsoft don't care about
security." :-/
Doing that will ensure no distro will build your module, most build bots
will not build your module, nobody cares about your module.
And no, the problems with BP and CR2 are not related to IBT, that is
separate and no less broken. They violate the basic rules of x86_64.
On Tue, Aug 26, 2025, Peter Zijlstra wrote:
> On Tue, Aug 26, 2025 at 05:00:31PM +0530, Naman Jain wrote:
> >
> >
> > On 8/25/2025 3:12 PM, Peter Zijlstra wrote:
> > > On Mon, Aug 25, 2025 at 11:22:08AM +0530, Naman Jain wrote:
> > > > With commit 0e20f1f4c2cb ("x86/hyperv: Clean up hv_do_hypercall()"),
> > > > config checks were added to conditionally restrict export
> > > > of hv_hypercall_pg symbol at the same time when a usage of that symbol
> > > > was added in mshv_vtl_main.c driver. This results in missing symbol
> > > > warning when mshv_vtl_main is compiled. Change the logic to
> > > > export it unconditionally.
> > > >
> > > > Fixes: 96a1d2495c2f ("Drivers: hv: Introduce mshv_vtl driver")
> > > > Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
> > >
> > > Oh gawd, that commit is terrible and adds yet another hypercall
> > > interface.
> > >
> > > I would argue the proper fix is moving the whole of mshv_vtl_return()
> > > into the kernel proper and doing it like hv_std_hypercall() on x86_64.
> >
> > Thanks for the review comments.
> >
> > This is doable, I can move the hypercall part of it to
> > arch/x86/hyperv/hv_init.c if I understand correctly.
> >
> > >
> > > Additionally, how is that function not utterly broken? What happens if
> > > an interrupt or NMI comes in after native_write_cr2() and before the
> > > actual hypercall does VMEXIT and trips a #PF?
> >
> > mshv_vtl driver is used for OpenHCL paravisor. The interrupts are
> > disabled, and NMIs aren't sent to the paravisor by the virt stack.
>
> I do not know what OpenHCL is. Nor is it clear from the code what NMIs
> can't happen.
FWIW, NMIs likely aren't a problem because the NMI handler saves/restores CR2
specifically to guard against #PFs in NMI context clobbering guest state. AMD
CPUs can block NMIs via GIF=0, but blocking NMIs on Intel for the VM-Entry =>
VM-Exit would require worse hacks than saving/restoring CR2 in the NMI handler.
:-(
> Anyway, same can be achieved with breakpoints / kprobes. You can get a trap
> after setting CR2 and scribble it.
Ya, KVM marks everything for vmx_vcpu_enter_exit() to/from VM-Enter/VM-Exit as
noinstr (no instrumentation) to prevent breakpoints/kprobes from clobbering CR2
and other state (and DR7 is zero for good measure).
On 8/26/2025 5:07 AM, Peter Zijlstra wrote: > On Tue, Aug 26, 2025 at 05:00:31PM +0530, Naman Jain wrote: Peter, Naman is OOF; genuinely hoping it's not presumptuous of me to reply to the thread - seems time sensitive. [...] > > I do not know what OpenHCL is. Nor is it clear from the code what NMIs > can't happen. Anyway, same can be achieved with breakpoints / kprobes. > You can get a trap after setting CR2 and scribble it. > > You simply cannot use CR2 this way. > The code in question runs with interrupts disabled, and the kernel runs without the memory swapping when using that module - the kernel is a firmware to host a vTPM for virtual machines. Somewhat similar to SMM. That should've been reflected somewhere in the comments and in Kconfig, we could do better. All in all, the page fault cannot happen in that path thus CR2 won't be trashed. Nor this kind of code can be stepped through in a self-hosted kernel debugger like kgdb. There are other examples of such code iiuc: the asm glue code in the interrupt exception entries where the trap frames are handled, likely return from the kernel to the user land. The thread context-swap code hardly can be stepped through with convenience if at all in the self-hosted debugger, too. >>> And an rax:rcx return, I though the canonical pair was AX,DX !?!? >> >> Here, the code uses rax and rcx not as a means to return one 128 bit >> value. The code uses them in that way as an ABI. > > Still daft. Creating an ABI that goes against pre-existing conventions > is weird. > It is weird. It really sucks to have to conform to ABIs introduced a decade+ ago when Hyper-V just appeared in the kernel and just for x86. As weird as the pair ax:cx looks for anyone who takes joy and pride in writing x86 asm code, it still works for the customers, we have to care about the backward compat. From the discussion, it doesn't appear we can ask for much as you're right: the asm chunk looks abominable due to being essentially a context-swap with an entity foreign to the Linux/System-V ABI. Same must be true for the calls into the UEFI runtime services to a certain extent due to different calling conventions. We have a much cleaner story on ARM64 due to no legacy and using their calling conventions aka AAPCS64 and other standards everywhere (not only in Linux Hyper-V code) to the best of my knowledge. If nothing of that saves the patches from the death row, maybe it'd be possible to give the patches the experimental status or get some time extension to learn what can be improved? I am asking to save the time spent by folks reviewing the parts that you don't see as being prohibitively bad. >>> Also, that STACK_FRAME_NON_STANDARD() annotation is broken, this must >>> not be used for anything that can end up in vmlinux.o -- that is, the >>> moment you built-in this driver (=y) this comes unstuck. >>> >>> The reason you're getting warnings is because you're violating the >>> normal calling convention and scribbling BP, we yelled at the TDX guys >>> for doing this, now you're getting yelled at. WTF !?! >>> >>> Please explain how just shutting up objtool makes the unwind work when >>> the NMI hits your BP scribble? >> >> Returning to a lower VTL treats the base pointer register as a general >> purpose one and this VTL transition function makes sure to preserve the >> bp register due to which the objtool trips over on the assembly touching >> the bp register. We considered this warning harmless and thus we are >> using this macro. Moreover NMIs are not an issue here as they won't be >> coming as mentioned other. If there are alternate approaches that I should >> be using, please suggest. > > Using BP in an ABI like that is ridiculous and broken. We told the same > to the TDX folks when they tried, IIRC TDX was fixed. > > It is simply not acceptable to break the regular calling convention with > a new ABI. > > Again, I can put a breakpoint or kprobe in the region where BP is > scribbled. > > Basically the argument is really simple: you run in Linux, you play by > the Linux rules. Using BP as argument is simply not possible. If your > ABI requires that, your ABI is broken and will not be supported. Rev the > ABI and try again. Same for CR2, that is not an available register in > any way. > >> I now understand that as part of your effort to enable IBT config on >> x64, you changed the indirect calls to direct calls in Hyper-V. > > Yeah, I was cleaning up indirect calls, and this really didn't need to > be one. > >> As of today, there is no requirement to enable IBT in OpenHCL kernel >> as this runs as a paravisor in VTL2 and it does not effect the guest >> VMs running >> in VTL0. > > I do not know what OpenHCL or VTLn means and as such pretty much the > whole of your statement makes no sense. > > Anyway, AFAICT the whole idea of a hypercall page is to 'abtract' out > the VMCALL vs VMMCALL nonsense Intel/AMD inflicted on us. Surely you > don't actually need that. HyperV already knows about all the gazillion > of ways to do hypercalls. > >> I can disable CONFIG_X86_KERNEL_IBT when CONFIG_MSHV_VTL is enabled in >> Kconfig in next version. > > Or you can just straight up say: "We at microsoft don't care about > security." :-/ > > Doing that will ensure no distro will build your module, most build bots > will not build your module, nobody cares about your module. > > And no, the problems with BP and CR2 are not related to IBT, that is > separate and no less broken. They violate the basic rules of x86_64. -- Thank you, Roman
On 8/27/25 01:04, Roman Kisel wrote:
> On 8/26/2025 5:07 AM, Peter Zijlstra wrote:
>> I do not know what OpenHCL is. Nor is it clear from the code what NMIs
>> can't happen. Anyway, same can be achieved with breakpoints / kprobes.
>> You can get a trap after setting CR2 and scribble it.
>>
>> You simply cannot use CR2 this way.
>
> The code in question runs with interrupts disabled, and the kernel runs
> without the memory swapping when using that module - the kernel is
> a firmware to host a vTPM for virtual machines. Somewhat similar to SMM.
> That should've been reflected somewhere in the comments and in Kconfig,
> we could do better. All in all, the page fault cannot happen in that
> path thus CR2 won't be trashed.
>
> Nor this kind of code can be stepped through in a self-hosted
> kernel debugger like kgdb. There are other examples of such code iiuc:
As Sean mentioned, you do have to make sure that this is annotated as
noinstr (not instrumentable). And also just use assembly - KVM started
with a similar asm block, though without the sketchy "register asm", and
I was initially skeptical but using a dedicated .S file was absolutely
the right thing to do.
This will reduce mshv_vtl_return to a much nicer
hvp = hv_vp_assist_page[smp_processor_id()];
if (mshv_vsm_capabilities.return_action_available) {
...
}
kernel_fpu_begin_mask(0);
fxrstor(&vtl0->fx_state);
__mshv_vtl_return(vtl0, hvp);
fxsave(&vtl0->fx_state);
kernel_fpu_end();
and your assembly function __mshv_vtl_return() will look like
.section .noinstr.text, "ax"
SYM_FUNC_START(__mshv_vtl_return)
push %rbp
mov %rsp, %rbp
/* push other callee-save registers r12-r15, rbx */
...
/* register switch clobbers all registers except rax/rcx */
mov %_ASM_ARG1, %rax
mov %_ASM_ARG2, %rcx
/* grab rbx/rbp/rsi/rdi/r8-r15 */
mov MSHV_VTL_CPU_CONTEXT_rbx(%rax), %rbx
mov MSHV_VTL_CPU_CONTEXT_rbp(%rax), %rbp
...
/* these are special... */
mov MSHV_VTL_CPU_CONTEXT_rax(%rax), %rdx
mov %rdx, HV_VP_ASSIST_PAGE_vtl_ret_x64rax(%rcx)
mov MSHV_VTL_CPU_CONTEXT_rcx(%rax), %rdx
mov %rdx, HV_VP_ASSIST_PAGE_vtl_ret_x64rcx(%rcx)
mov MSHV_VTL_CPU_CONTEXT_cr2(%rax), %rdx
mov %rdx, %cr2
mov MSHV_VTL_CPU_CONTEXT_rdx(%rax), %rdx
/* stash host registers on stack */
pushq %rax
pushq %rcx
xor %ecx, %ecx
(call here, see below)
/* stash guest registers on stack, restore saved host copies */
pushq %rax
pushq %rcx
mov 16(%rsp), %rcx
mov 24(%rsp), %rax
/* these are special... */
mov %rdx, MSHV_VTL_CPU_CONTEXT_rdx(%rax)
mov %cr2, %rdx
mov %rdx, MSHV_VTL_CPU_CONTEXT_cr2(%rax)
pop MSHV_VTL_CPU_CONTEXT_rcx(%rax)
pop MSHV_VTL_CPU_CONTEXT_rax(%rax)
add $16, %%rsp
/* save rbx/rbp/rsi/rdi/r8-r15 */
mov %rbx, MSHV_VTL_CPU_CONTEXT_rbx(%rax)
mov %rbp, MSHV_VTL_CPU_CONTEXT_rbp(%rax)
...
/* pop callee-save registers r12-r15, rbx */
...
pop %rbp
RET
SYM_FUNC_END(__mshv_vtl_return)
(You can use a custom mshv-asm-offsets.c similar to
arch/x86/kvm/asm-offsets.c, with corresponding Makefile rules). It's a
tiny bit longer, but not even that much given all the register and asm
constraints boilerplate. And you get more freedom in return.
The indirect call is potentially problematic, but one possibility is to
use a static call. I haven't looked too deeply into it, but I *think*
it's mostly a matter of making mshv "depends on HAVE_STATIC_CALL" and
making it possible to include include/linux/static_call_types.h from
assembly, so that you can just write
DEFINE_STATIC_CALL_NULL(hv_vtl_return_hypercall, void (*)(void));
...
static_call_update(hv_vtl_return_hypercall,
(u64)((u8 *)hv_hypercall_pg +
mshv_vsm_page_offsets.vtl_return_offset));
and in the assembly file:
call STATIC_CALL_TRAMP(hv_vtl_return_hypercall)
> We have a much cleaner story on ARM64 due to no legacy and using
> their calling conventions aka AAPCS64 and other standards everywhere
> (not only in Linux Hyper-V code) to the best of my knowledge.
It's fine. The RAX/RCX pair is weird, but more in the sense that you
don't even need to do that for two registers (RAX and RSP, or RCX and
RSP, would be enough). It makes sense though that they just used the
first two in the x86 encoding order.
And anyhow, you can't fix it just like
drivers/gpu/drm/vmwgfx/vmwgfx_msg.c cannot be fixed. It's just that
STACK_FRAME_NON_STANDARD is pretty rough instrument, and in your case
the whole register + asm + STACK_FRAME_NON_STANDARD is a lot more
complex than just putting together a .S file for the noinstr parts.
>>> Returning to a lower VTL treats the base pointer register as a general
>>> purpose one and this VTL transition function makes sure to preserve the
>>> bp register due to which the objtool trips over on the assembly touching
>>> the bp register. We considered this warning harmless and thus we are
>>> using this macro. Moreover NMIs are not an issue here as they won't be
>>> coming as mentioned other. If there are alternate approaches that I
>>> should be using, please suggest.
>>
>> Using BP in an ABI like that is ridiculous and broken. We told the same
>> to the TDX folks when they tried, IIRC TDX was fixed.
I agree it's not great, but it's doable, after all VMX also uses RBP as
an "argument" (in the sense that it's both an input and an output
register to VMLAUNCH and VMRESUME).
>> Basically the argument is really simple: you run in Linux, you play by
>> the Linux rules. Using BP as argument is simply not possible. If your
>> ABI requires that, your ABI is broken and will not be supported. Rev the
>> ABI and try again. Same for CR2, that is not an available register in
>> any way.
Aside for Peter: please tone down the rhetoric, or just turn it off
completely. It helps no one and this hardly makes sense for this kind
of code (which should not be C, but that's easily fixed isn't it?).
Yes, it would be easier if the register switch was done in the
hypervisor and not in Linux, with a nice 16*8-byte block passed in RAX
or something like that, but so is life.
Paolo
On Tue, Sep 16, 2025, Paolo Bonzini wrote: > On 8/27/25 01:04, Roman Kisel wrote: > > On 8/26/2025 5:07 AM, Peter Zijlstra wrote: > > > I do not know what OpenHCL is. Nor is it clear from the code what NMIs > > > can't happen. Anyway, same can be achieved with breakpoints / kprobes. > > > You can get a trap after setting CR2 and scribble it. > > > > > > You simply cannot use CR2 this way. > > > > The code in question runs with interrupts disabled, and the kernel runs > > without the memory swapping when using that module - the kernel is > > a firmware to host a vTPM for virtual machines. Somewhat similar to SMM. > > That should've been reflected somewhere in the comments and in Kconfig, > > we could do better. All in all, the page fault cannot happen in that > > path thus CR2 won't be trashed. > > > > Nor this kind of code can be stepped through in a self-hosted > > kernel debugger like kgdb. There are other examples of such code iiuc: > > As Sean mentioned, you do have to make sure that this is annotated as > noinstr (not instrumentable). And also just use assembly - KVM started with > a similar asm block, though without the sketchy "register asm", Ooh, yeah, don't use "register asm". I missed that when I peeked at the code. Using "register asm" will most definitely cause problems, because the compiler doesn't track usage in C code, i.e. will happily use the GPR and clobber your asm value in the process. That inevitably leads to very confusing and somewhat transient errors. E.g. if someone inserts a printk for debugging, the call to printk can clobber the very state it's trying to print. > and I was initially skeptical but using a dedicated .S file was absolutely > the right thing to do. +1000 to putting the assembly in a .S file. I too was a bit skeptical about moving the entire sequence into proper assembly; thankfully, some non-KVM folks talked us into it :-)
On 9/16/2025 8:22 PM, Sean Christopherson wrote: > On Tue, Sep 16, 2025, Paolo Bonzini wrote: >> On 8/27/25 01:04, Roman Kisel wrote: >>> On 8/26/2025 5:07 AM, Peter Zijlstra wrote: >>>> I do not know what OpenHCL is. Nor is it clear from the code what NMIs >>>> can't happen. Anyway, same can be achieved with breakpoints / kprobes. >>>> You can get a trap after setting CR2 and scribble it. >>>> >>>> You simply cannot use CR2 this way. >>> >>> The code in question runs with interrupts disabled, and the kernel runs >>> without the memory swapping when using that module - the kernel is >>> a firmware to host a vTPM for virtual machines. Somewhat similar to SMM. >>> That should've been reflected somewhere in the comments and in Kconfig, >>> we could do better. All in all, the page fault cannot happen in that >>> path thus CR2 won't be trashed. >>> >>> Nor this kind of code can be stepped through in a self-hosted >>> kernel debugger like kgdb. There are other examples of such code iiuc: >> >> As Sean mentioned, you do have to make sure that this is annotated as >> noinstr (not instrumentable). And also just use assembly - KVM started with >> a similar asm block, though without the sketchy "register asm", > > Ooh, yeah, don't use "register asm". I missed that when I peeked at the code. > Using "register asm" will most definitely cause problems, because the compiler > doesn't track usage in C code, i.e. will happily use the GPR and clobber your > asm value in the process. That inevitably leads to very confusing and somewhat > transient errors. E.g. if someone inserts a printk for debugging, the call to > printk can clobber the very state it's trying to print. > >> and I was initially skeptical but using a dedicated .S file was absolutely >> the right thing to do. > > +1000 to putting the assembly in a .S file. I too was a bit skeptical about > moving the entire sequence into proper assembly; thankfully, some non-KVM folks > talked us into it :-) Thank you so much Sean and Paolo for your valuable inputs. I will try out these things. Summarizing the suggestions here: * Use noinstr (no instrumentation) * Have separate .S file * Don't use "register asm". * Use static calls for solving IBT problems * RAX:RCX is probably ok to be used, considering ABI. Whether we would still need to use STACK_FRAME_NON_STANDARD, I am not sure, but I will see based on how it goes. I hope this addresses the concerns Peter raised. If there's anything I might have missed, I'm happy to make further adjustments if needed. Regards, Naman
On Thu, Sep 18, 2025 at 11:33:18AM +0530, Naman Jain wrote: > Thank you so much Sean and Paolo for your valuable inputs. I will try > out these things. Summarizing the suggestions here: > * Use noinstr (no instrumentation) > * Have separate .S file > * Don't use "register asm". > * Use static calls for solving IBT problems > * RAX:RCX is probably ok to be used, considering ABI. Whether we would still > need to use STACK_FRAME_NON_STANDARD, I am not sure, but I will see based on > how it goes. > > I hope this addresses the concerns Peter raised. If there's anything I might > have missed, I'm happy to make further adjustments if needed. It would be a definite improvement. I'm just *really* sad people still create interfaces like this, even though we've known for years how bad they are. At some point we've really have to push back and say enough is enough.
On 9/18/2025 12:17 PM, Peter Zijlstra wrote: > On Thu, Sep 18, 2025 at 11:33:18AM +0530, Naman Jain wrote: > >> Thank you so much Sean and Paolo for your valuable inputs. I will try >> out these things. Summarizing the suggestions here: >> * Use noinstr (no instrumentation) >> * Have separate .S file >> * Don't use "register asm". >> * Use static calls for solving IBT problems >> * RAX:RCX is probably ok to be used, considering ABI. Whether we would still >> need to use STACK_FRAME_NON_STANDARD, I am not sure, but I will see based on >> how it goes. >> >> I hope this addresses the concerns Peter raised. If there's anything I might >> have missed, I'm happy to make further adjustments if needed. > > It would be a definite improvement. I'm just *really* sad people still > create interfaces like this, even though we've known for years how bad > they are. > > At some point we've really have to push back and say enough is enough. Hi Peter, Paolo, Sean, I am facing issues in this approach, after moving the assembly code to a separate file, using static calls, and making it noinstr. We need to make a call to STATIC_CALL_TRAMP_STR(hv_hypercall_pg + offset) in the assembly code. This offset is populated at run time in the driver, so I have to pass this offset to the assembly function via function parameters or a shared variable. This leaves noinstr section and results in below warning: [1]: vmlinux.o: warning: objtool: __mshv_vtl_return_call+0x4f: call to mshv_vtl_call_addr() leaves .noinstr.text section To fix this, one of the ways was to avoid making indirect calls. So I used EXPORT_STATIC_CALL to export the static call *trampoline and key* for the static call we created in C driver. Then I figured, we could simply call __SCT__<static_callname> in assembly code and it should work fine. But then it leads to this error in objtool. [2]: arch/x86/hyperv/mshv_vtl_asm.o: error: objtool: static_call: can't find static_call_key symbol: __SCK__mshv_vtl_return_hypercall This gets hidden/fixed with X86_KERNEL_IBT config enablement. If I comment out the objtool check that leads to this warning, I can see the same symbols for __SCT__, __SCK__ in final vmlinux with and without IBT, and it even works fine on the VM. So it seems to be a matter of timing when objtool is checking for this symbol. My theory is IBT enablement is helping here as it adds ENDBR instructions to various static/indirect/direct calls, which may be adding the missing symbol to the section before objtool checks for it. Other ways I tried to fix the initial problem of noinstr was to make the exported shared pointer variable as noinstr, but it does not work since noinstr can only be applied to functions and not data types. KVM code had similar noisntr calls from assembly, but they have actual functions which are marked noinstr, not just some address stored in a variable. I also tried of introducing wrapper functions marked as noinstr, to initialize static calls but it does not help. Adding compilation dependencies in Makefile also does not help. $(obj)/mshv_vtl_asm.o: $(obj)/hv_vtl.o This left me with no option, but to enable IBT and add the dependency. But yes, this is controversial and if there are alternate ways to handle this or make it work, I would be more than happy to hear your suggestions on it and implement them. Regards, Naman
On Mon, Oct 06, 2025 at 04:20:03PM +0530, Naman Jain wrote: > I am facing issues in this approach, after moving the assembly code to a > separate file, using static calls, and making it noinstr. > > We need to make a call to STATIC_CALL_TRAMP_STR(hv_hypercall_pg + offset) in > the assembly code. This offset is populated at run time in the driver, so I > have to pass this offset to the assembly function via function parameters or > a shared variable. This leaves noinstr section and results in below warning: > > [1]: vmlinux.o: warning: objtool: __mshv_vtl_return_call+0x4f: call to > mshv_vtl_call_addr() leaves .noinstr.text section > > > To fix this, one of the ways was to avoid making indirect calls. So I used > EXPORT_STATIC_CALL to export the static call *trampoline and key* for the > static call we created in C driver. Then I figured, we could simply call > __SCT__<static_callname> in assembly code and it should work fine. But then > it leads to this error in objtool. Easiest solution is to create a second static_call and have hv_set_hypercall_pg() set that to +offset. DEFINE_STATIC_CALL(__hv_vtl_hypercall); hv_set_hypercall() ... static_call_update(__hv_vtl_hypercall, ptr+offset); /* +- cast to right function type */ > [2]: arch/x86/hyperv/mshv_vtl_asm.o: error: objtool: static_call: can't find > static_call_key symbol: __SCK__mshv_vtl_return_hypercall Look at arch/x86/include/asm/preempt.h you might need that __STATIC_CALL_MOD_ADDRESSABLE() thing somewhere. Also, what's actually in that hypercall page that is so magical and can't just be an ALTERNATIVE() ?
On Mon, Oct 6, 2025 at 1:10 PM Peter Zijlstra <peterz@infradead.org> wrote: > > On Mon, Oct 06, 2025 at 04:20:03PM +0530, Naman Jain wrote: > > > I am facing issues in this approach, after moving the assembly code to a > > separate file, using static calls, and making it noinstr. > > > > We need to make a call to STATIC_CALL_TRAMP_STR(hv_hypercall_pg + offset) in > > the assembly code. This offset is populated at run time in the driver, so I > > have to pass this offset to the assembly function via function parameters or > > a shared variable. This leaves noinstr section and results in below warning: > > > > [1]: vmlinux.o: warning: objtool: __mshv_vtl_return_call+0x4f: call to > > mshv_vtl_call_addr() leaves .noinstr.text section > > > > > > To fix this, one of the ways was to avoid making indirect calls. So I used > > EXPORT_STATIC_CALL to export the static call *trampoline and key* for the > > static call we created in C driver. Then I figured, we could simply call > > __SCT__<static_callname> in assembly code and it should work fine. But then > > it leads to this error in objtool. > > Easiest solution is to create a second static_call and have > hv_set_hypercall_pg() set that to +offset. Yes, my idea was to add +offset directly in the static_call_update, as you sketched below. Sorry if that wasn't too clear. I didn't think of using a static call also for the base of the page, since that's not what the assembly code needs. And I think we agree that you absolutely don't want indirect calls, as that makes register usage much simpler. This way static calls end up killing multiple birds with a stone. > Also, what's actually in that hypercall page that is so magical and > can't just be an ALTERNATIVE() ? Do you mean an alternative for VMCALL vs. VMMCALL? If so, that's just not guaranteed to work: "An attempt to invoke a hypercall by any other means (for example, copying the code from the hypercall code page to an alternate location and executing it from there) might result in an undefined operation (#UD) exception" (or it might not). Paolo
On 10/6/2025 4:49 PM, Paolo Bonzini wrote: > On Mon, Oct 6, 2025 at 1:10 PM Peter Zijlstra <peterz@infradead.org> wrote: >> >> On Mon, Oct 06, 2025 at 04:20:03PM +0530, Naman Jain wrote: >> >>> I am facing issues in this approach, after moving the assembly code to a >>> separate file, using static calls, and making it noinstr. >>> >>> We need to make a call to STATIC_CALL_TRAMP_STR(hv_hypercall_pg + offset) in >>> the assembly code. This offset is populated at run time in the driver, so I >>> have to pass this offset to the assembly function via function parameters or >>> a shared variable. This leaves noinstr section and results in below warning: >>> >>> [1]: vmlinux.o: warning: objtool: __mshv_vtl_return_call+0x4f: call to >>> mshv_vtl_call_addr() leaves .noinstr.text section >>> >>> >>> To fix this, one of the ways was to avoid making indirect calls. So I used >>> EXPORT_STATIC_CALL to export the static call *trampoline and key* for the >>> static call we created in C driver. Then I figured, we could simply call >>> __SCT__<static_callname> in assembly code and it should work fine. But then >>> it leads to this error in objtool. >> >> Easiest solution is to create a second static_call and have >> hv_set_hypercall_pg() set that to +offset. > > Yes, my idea was to add +offset directly in the static_call_update, as > you sketched below. Sorry if that wasn't too clear. I didn't think of > using a static call also for the base of the page, since that's not > what the assembly code needs. > > And I think we agree that you absolutely don't want indirect calls, as > that makes register usage much simpler. This way static calls end up > killing multiple birds with a stone. > Actually I was implementing it in the same way by introducing a new static call using hypercall_pg+offset. The problem was using it in asm file. When I tried introducing a wrapper static call earlier, I did not use ASM_CALL_CONSTRAINT, so objtool was asking me to save/restore registers again. Since this was not possible, I thought it may not work. With something similar to how its done in arch/x86/include/asm/preempt.h, I was able to make it work, and remove dependency on IBT config. Thanks a lot Peter and Paolo for helping on this. Regards, Naman >> Also, what's actually in that hypercall page that is so magical and >> can't just be an ALTERNATIVE() ? > > Do you mean an alternative for VMCALL vs. VMMCALL? If so, that's just > not guaranteed to work: "An attempt to invoke a hypercall by any other > means (for example, copying the code from the hypercall code page to > an alternate location and executing it from there) might result in an > undefined operation (#UD) exception" (or it might not). > > Paolo >
On Thu, 2025-09-18 at 08:47 +0200, Peter Zijlstra wrote: > On Thu, Sep 18, 2025 at 11:33:18AM +0530, Naman Jain wrote: > > > Thank you so much Sean and Paolo for your valuable inputs. I will > > try out these things. Summarizing the suggestions here: > > * Use noinstr (no instrumentation) > > * Have separate .S file > > * Don't use "register asm". > > * Use static calls for solving IBT problems > > * RAX:RCX is probably ok to be used, considering ABI. Whether we > > would still need to use STACK_FRAME_NON_STANDARD, I am not sure, > > but I will see based on how it goes. > > > > I hope this addresses the concerns Peter raised. If there's > > anything I might have missed, I'm happy to make further adjustments > > if needed. > > It would be a definite improvement. I'm just *really* sad people > still create interfaces like this, even though we've known for years > how bad they are. Thanks for that. Our only defence at Microsoft is that the hypervisor ABI for this is 10 years old and counting. > At some point we've really have to push back and say enough is > enough. Although we can't really do anything about the old ABI since we have tons of things that use it, there is interest in Microsoft for co- designing a new ABI that we could add to hyper-v for Linux use and which would also be a part of bringing up and using virtual secure mode (VSM) in KVM via the planes API. I was thinking the best way of discussing it as a community would be the PUCK call, but if there are alternative forums, we could use that as well. This is going to be pretty long term: besides designing and testing a new ABI, we were counting on Amazon upstreaming their VSM implementation for KVM and we're a bit under resourced here if it turns out we have to do everything. Regards, James
On Mon, Aug 25, 2025 at 11:22:08AM +0530, Naman Jain wrote:
> With commit 0e20f1f4c2cb ("x86/hyperv: Clean up hv_do_hypercall()"),
> config checks were added to conditionally restrict export
> of hv_hypercall_pg symbol at the same time when a usage of that symbol
> was added in mshv_vtl_main.c driver. This results in missing symbol
> warning when mshv_vtl_main is compiled. Change the logic to
> export it unconditionally.
Note that exporting variables like this always is a bad idea only
used as a last resort. It would be much better to just build a proper
API for using it instead.
© 2016 - 2026 Red Hat, Inc.