From: Andrew Cooper <andrew.cooper3@citrix.com>
Per the discussion about FRED MSR writes with the WRMSRNS instruction [1],
use the alternatives mechanism to choose WRMSRNS when it's available,
otherwise fall back to WRMSR.
[1] https://lore.kernel.org/lkml/15f56e6a-6edd-43d0-8e83-bb6430096514@citrix.com/
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/include/asm/msr.h | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index d642037f9ed5..3e402d717815 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -99,19 +99,6 @@ static __always_inline void __wrmsr(unsigned int msr, u32 low, u32 high)
: : "c" (msr), "a"(low), "d" (high) : "memory");
}
-/*
- * WRMSRNS behaves exactly like WRMSR with the only difference being
- * that it is not a serializing instruction by default.
- */
-static __always_inline void __wrmsrns(u32 msr, u32 low, u32 high)
-{
- /* Instruction opcode for WRMSRNS; supported in binutils >= 2.40. */
- asm volatile("1: .byte 0x0f,0x01,0xc6\n"
- "2:\n"
- _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
- : : "c" (msr), "a"(low), "d" (high));
-}
-
#define native_rdmsr(msr, val1, val2) \
do { \
u64 __val = __rdmsr((msr)); \
@@ -312,9 +299,22 @@ do { \
#endif /* !CONFIG_PARAVIRT_XXL */
+/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
+#define WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
+
+/* Non-serializing WRMSR, when available. Falls back to a serializing WRMSR. */
static __always_inline void wrmsrns(u32 msr, u64 val)
{
- __wrmsrns(msr, val, val >> 32);
+ /*
+ * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a redundant
+ * DS prefix to avoid a trailing NOP.
+ */
+ asm volatile("1: "
+ ALTERNATIVE("ds wrmsr",
+ WRMSRNS, X86_FEATURE_WRMSRNS)
+ "2:\n"
+ _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
+ : : "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32)));
}
/*
--
2.45.2
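As a side note on the operand setup in the patch above: the constraints "a" ((u32)val) and "d" ((u32)(val >> 32)) implement the EDX:EAX convention WRMSR and WRMSRNS both use. A minimal plain-C sketch of that split (helper name hypothetical, not kernel code):

```c
#include <stdint.h>

/* Sketch of the value split done by the asm constraints in wrmsrns():
 * WRMSR/WRMSRNS take the low 32 bits of the value in EAX and the high
 * 32 bits in EDX, with the MSR number in ECX. */
static void msr_split(uint64_t val, uint32_t *eax, uint32_t *edx)
{
    *eax = (uint32_t)val;         /* low half  -> EAX */
    *edx = (uint32_t)(val >> 32); /* high half -> EDX */
}
```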
On 07/08/2024 6:47 am, Xin Li (Intel) wrote:
> From: Andrew Cooper <andrew.cooper3@citrix.com>
>
> Per the discussion about FRED MSR writes with the WRMSRNS instruction [1],
> use the alternatives mechanism to choose WRMSRNS when it's available,
> otherwise fall back to WRMSR.
>
> [1] https://lore.kernel.org/lkml/15f56e6a-6edd-43d0-8e83-bb6430096514@citrix.com/
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
> ---
> arch/x86/include/asm/msr.h | 28 ++++++++++++++--------------
> 1 file changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
> index d642037f9ed5..3e402d717815 100644
> --- a/arch/x86/include/asm/msr.h
> +++ b/arch/x86/include/asm/msr.h
> @@ -99,19 +99,6 @@ static __always_inline void __wrmsr(unsigned int msr, u32 low, u32 high)
> : : "c" (msr), "a"(low), "d" (high) : "memory");
> }
>
> -/*
> - * WRMSRNS behaves exactly like WRMSR with the only difference being
> - * that it is not a serializing instruction by default.
> - */
> -static __always_inline void __wrmsrns(u32 msr, u32 low, u32 high)
> -{
> - /* Instruction opcode for WRMSRNS; supported in binutils >= 2.40. */
> - asm volatile("1: .byte 0x0f,0x01,0xc6\n"
> - "2:\n"
> - _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
> - : : "c" (msr), "a"(low), "d" (high));
> -}
> -
> #define native_rdmsr(msr, val1, val2) \
> do { \
> u64 __val = __rdmsr((msr)); \
> @@ -312,9 +299,22 @@ do { \
>
> #endif /* !CONFIG_PARAVIRT_XXL */
>
> +/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
> +#define WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
> +
> +/* Non-serializing WRMSR, when available. Falls back to a serializing WRMSR. */
> static __always_inline void wrmsrns(u32 msr, u64 val)
> {
> - __wrmsrns(msr, val, val >> 32);
> + /*
> + * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a redundant
> + * DS prefix to avoid a trailing NOP.
> + */
> + asm volatile("1: "
> + ALTERNATIVE("ds wrmsr",
This isn't the version I presented, and there's no discussion of the
alteration.
The choice of CS over DS was deliberate, and came from Intel:
https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf
So unless Intel want to retract that whitepaper, and all the binutils
work with it, I'd suggest keeping it as CS like we use elsewhere, and as
explicitly instructed by Intel.
~Andrew
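For concreteness, the two padded encodings being debated differ only in the override byte, and both match the 3-byte length of WRMSRNS quoted in the patch. The byte arrays below are purely illustrative (opcodes per the patch and standard prefix values), not kernel code:

```c
#include <stdint.h>

/* WRMSR is 0F 30; a DS (3E) or CS (2E) segment-override prefix pads it
 * to 3 bytes, the length of WRMSRNS (0F 01 C6), so the ALTERNATIVE()
 * needs no trailing NOP either way. Only the prefix byte differs. */
static const uint8_t ds_wrmsr[] = { 0x3e, 0x0f, 0x30 };
static const uint8_t cs_wrmsr[] = { 0x2e, 0x0f, 0x30 };
static const uint8_t wrmsrns[]  = { 0x0f, 0x01, 0xc6 };
```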
On 8/9/2024 4:07 PM, Andrew Cooper wrote:
> On 07/08/2024 6:47 am, Xin Li (Intel) wrote:
>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>> +/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
>> +#define WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
>> +
>> +/* Non-serializing WRMSR, when available. Falls back to a serializing WRMSR. */
>> static __always_inline void wrmsrns(u32 msr, u64 val)
>> {
>> - __wrmsrns(msr, val, val >> 32);
>> + /*
>> + * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a redundant
>> + * DS prefix to avoid a trailing NOP.
>> + */
>> + asm volatile("1: "
>> + ALTERNATIVE("ds wrmsr",
>
> This isn't the version I presented, and there's no discussion of the
> alteration.
I'm trying to implement wrmsr() as
static __always_inline void wrmsr(u32 msr, u64 val)
{
asm volatile("1: " ALTERNATIVE_2("wrmsr", WRMSRNS, X86_FEATURE_WRMSRNS,
"call asm_xen_write_msr", X86_FEATURE_XENPV)
"2: " _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
: : "c" (msr), "a" (val), "d" ((u32)(val >> 32)),
"D" (msr), "S" (val));
}
As the CALL instruction is 5-byte long, and we need to pad nop for both
WRMSR and WRMSRNS, what about not using segment prefix at all?
> The choice of CS over DS was deliberate, and came from Intel:
>
> https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf
>
> So unless Intel want to retract that whitepaper, and all the binutils
> work with it, I'd suggest keeping it as CS like we use elsewhere, and as
> explicitly instructed by Intel.
On August 16, 2024 8:52:51 PM GMT+03:00, Xin Li <xin@zytor.com> wrote:
>On 8/9/2024 4:07 PM, Andrew Cooper wrote:
>> On 07/08/2024 6:47 am, Xin Li (Intel) wrote:
>>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>>> +/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
>>> +#define WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
>>> +
>>> +/* Non-serializing WRMSR, when available. Falls back to a serializing WRMSR. */
>>> static __always_inline void wrmsrns(u32 msr, u64 val)
>>> {
>>> - __wrmsrns(msr, val, val >> 32);
>>> + /*
>>> + * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a redundant
>>> + * DS prefix to avoid a trailing NOP.
>>> + */
>>> + asm volatile("1: "
>>> + ALTERNATIVE("ds wrmsr",
>>
>> This isn't the version I presented, and there's no discussion of the
>> alteration.
>
>I'm trying to implement wrmsr() as
>
>static __always_inline void wrmsr(u32 msr, u64 val)
>{
> asm volatile("1: " ALTERNATIVE_2("wrmsr", WRMSRNS, X86_FEATURE_WRMSRNS,
> "call asm_xen_write_msr", X86_FEATURE_XENPV)
> "2: " _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
> : : "c" (msr), "a" (val), "d" ((u32)(val >> 32)),
> "D" (msr), "S" (val));
>}
>
>
>As the CALL instruction is 5-byte long, and we need to pad nop for both
>WRMSR and WRMSRNS, what about not using segment prefix at all?
The alternatives macros can handle arbitrary insn sizes - no need for any padding.
--
Sent from a small device: formatting sucks and brevity is inevitable.
On August 17, 2024 7:23:07 AM PDT, Borislav Petkov <bp@alien8.de> wrote:
>On August 16, 2024 8:52:51 PM GMT+03:00, Xin Li <xin@zytor.com> wrote:
>>On 8/9/2024 4:07 PM, Andrew Cooper wrote:
>>> On 07/08/2024 6:47 am, Xin Li (Intel) wrote:
>>>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> +/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
>>>> +#define WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
>>>> +
>>>> +/* Non-serializing WRMSR, when available. Falls back to a serializing WRMSR. */
>>>> static __always_inline void wrmsrns(u32 msr, u64 val)
>>>> {
>>>> - __wrmsrns(msr, val, val >> 32);
>>>> + /*
>>>> + * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a redundant
>>>> + * DS prefix to avoid a trailing NOP.
>>>> + */
>>>> + asm volatile("1: "
>>>> + ALTERNATIVE("ds wrmsr",
>>>
>>> This isn't the version I presented, and there's no discussion of the
>>> alteration.
>>
>>I'm trying to implement wrmsr() as
>>
>>static __always_inline void wrmsr(u32 msr, u64 val)
>>{
>> asm volatile("1: " ALTERNATIVE_2("wrmsr", WRMSRNS, X86_FEATURE_WRMSRNS,
>> "call asm_xen_write_msr", X86_FEATURE_XENPV)
>> "2: " _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
>> : : "c" (msr), "a" (val), "d" ((u32)(val >> 32)),
>> "D" (msr), "S" (val));
>>}
>>
>>
>>As the CALL instruction is 5-byte long, and we need to pad nop for both
>>WRMSR and WRMSRNS, what about not using segment prefix at all?
>
>The alternatives macros can handle arbitrary insn sizes - no need for any padding.
>
>
The padding avoids unnecessary NOPs.
On Sat, Aug 17, 2024 at 12:22:22PM -0700, H. Peter Anvin wrote:
> The padding avoids unnecessary NOPs.
And a NOP vs some other prefix to have one less insn matters practically
how exactly here?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 8/17/24 22:49, Borislav Petkov wrote:
> On Sat, Aug 17, 2024 at 12:22:22PM -0700, H. Peter Anvin wrote:
>> The padding avoids unnecessary NOPs.
>
> And a NOP vs some other prefix to have one less insn matters practically
> how exactly here?

Pretty much nothing, but it is zero cost and possibly some insanely
small gain.

	-hpa
On Sat, Aug 17, 2024 at 05:23:07PM +0300, Borislav Petkov wrote:
> >static __always_inline void wrmsr(u32 msr, u64 val)
> >{
> > asm volatile("1: " ALTERNATIVE_2("wrmsr", WRMSRNS, X86_FEATURE_WRMSRNS,
> > "call asm_xen_write_msr", X86_FEATURE_XENPV)
> > "2: " _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
> > : : "c" (msr), "a" (val), "d" ((u32)(val >> 32)),
> > "D" (msr), "S" (val));
> >}
> >
> >
> >As the CALL instruction is 5-byte long, and we need to pad nop for both
> >WRMSR and WRMSRNS, what about not using segment prefix at all?
>
> The alternatives macros can handle arbitrary insn sizes - no need for any padding.
What you cannot do is have a CALL insn in only one of the alternative
insn groups because the compiler is free to clobber callee regs and
those might be live where you place the alternative and thus will have
a nice lovely corruption.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On August 17, 2024 7:43:16 AM PDT, Borislav Petkov <bp@alien8.de> wrote:
>On Sat, Aug 17, 2024 at 05:23:07PM +0300, Borislav Petkov wrote:
>> >static __always_inline void wrmsr(u32 msr, u64 val)
>> >{
>> > asm volatile("1: " ALTERNATIVE_2("wrmsr", WRMSRNS, X86_FEATURE_WRMSRNS,
>> > "call asm_xen_write_msr", X86_FEATURE_XENPV)
>> > "2: " _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
>> > : : "c" (msr), "a" (val), "d" ((u32)(val >> 32)),
>> > "D" (msr), "S" (val));
>> >}
>> >
>> >
>> >As the CALL instruction is 5-byte long, and we need to pad nop for both
>> >WRMSR and WRMSRNS, what about not using segment prefix at all?
>>
>> The alternatives macros can handle arbitrary insn sizes - no need for any padding.
>
>What you cannot do is have a CALL insn in only one of the alternative
>insn groups because the compiler is free to clobber callee regs and
>those might be live where you place the alternative and thus will have
>a nice lovely corruption.
>
Only if the call goes to a C function.
On Sat, Aug 17, 2024 at 12:22:56PM -0700, H. Peter Anvin wrote:
> Only if the call goes to a C function.
Aha, you have a patch in the thread which calls a by-hand asm thing.
I haven't looked in detail but we have a THUNK macro for such stuff.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 8/17/2024 7:43 AM, Borislav Petkov wrote:
> On Sat, Aug 17, 2024 at 05:23:07PM +0300, Borislav Petkov wrote:
>>> static __always_inline void wrmsr(u32 msr, u64 val)
>>> {
>>> asm volatile("1: " ALTERNATIVE_2("wrmsr", WRMSRNS, X86_FEATURE_WRMSRNS,
>>> "call asm_xen_write_msr", X86_FEATURE_XENPV)
>>> "2: " _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
>>> : : "c" (msr), "a" (val), "d" ((u32)(val >> 32)),
>>> "D" (msr), "S" (val));
>>> }
>>>
>>>
>>> As the CALL instruction is 5-byte long, and we need to pad nop for both
>>> WRMSR and WRMSRNS, what about not using segment prefix at all?
>>
>> The alternatives macros can handle arbitrary insn sizes - no need for any padding.
>
> What you cannot do is have a CALL insn in only one of the alternative
> insn groups because the compiler is free to clobber callee regs and
> those might be live where you place the alternative and thus will have
> a nice lovely corruption.
Yeah, asm_xen_write_msr() is a wrapper around xen_write_msr() which
saves/restores those regs.

My first draft patch called xen_write_msr() directly and worked fine,
but hpa raised the same point you did.
Thanks!
Xin
On 16/08/2024 6:52 pm, Xin Li wrote:
> On 8/9/2024 4:07 PM, Andrew Cooper wrote:
>> On 07/08/2024 6:47 am, Xin Li (Intel) wrote:
>>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>>> +/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
>>> +#define WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
>>> +
>>> +/* Non-serializing WRMSR, when available. Falls back to a
>>> serializing WRMSR. */
>>> static __always_inline void wrmsrns(u32 msr, u64 val)
>>> {
>>> - __wrmsrns(msr, val, val >> 32);
>>> + /*
>>> + * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a
>>> redundant
>>> + * DS prefix to avoid a trailing NOP.
>>> + */
>>> + asm volatile("1: "
>>> + ALTERNATIVE("ds wrmsr",
>>
>> This isn't the version I presented, and there's no discussion of the
>> alteration.
>
> I'm trying to implement wrmsr() as
>
> static __always_inline void wrmsr(u32 msr, u64 val)
> {
> asm volatile("1: " ALTERNATIVE_2("wrmsr", WRMSRNS,
> X86_FEATURE_WRMSRNS,
> "call asm_xen_write_msr", X86_FEATURE_XENPV)
> "2: " _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
> : : "c" (msr), "a" (val), "d" ((u32)(val >> 32)),
> "D" (msr), "S" (val));
> }
>
>
> As the CALL instruction is 5-byte long, and we need to pad nop for both
> WRMSR and WRMSRNS, what about not using segment prefix at all?
The prefix was a minor optimisation to avoid having a trailing nop at
the end.
When combined with a call, you need 3 prefixes on WRMSR and 2 prefixes
on WRMSRNS to make all options be 5 bytes long.
That said, there's already a paravirt hook for this, and if you're
looking to work around the code gen mess for that, then doing it like
this by doubling up into rdi and rsi isn't great either.
My suggestion, not that I've had time to experiment, was to change
paravirt to use a non-C ABI and have asm_xen_write_msr() recombine
edx:eax into rsi. That way the top level wrmsr() retains sensible
codegen for native even when paravirt is active.
~Andrew
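The length bookkeeping above can be made concrete: an E8 rel32 CALL is 5 bytes, so matching it means three prefixes on the 2-byte WRMSR and two on the 3-byte WRMSRNS. The arrays below are illustrative only (CS prefixes chosen arbitrarily here; which prefix to use is exactly what the thread is debating):

```c
#include <stdint.h>

/* Illustrative encodings, all padded to the 5-byte length of a
 * rel32 CALL: WRMSR (0F 30) gets three prefix bytes, WRMSRNS
 * (0F 01 C6) gets two, so no alternative needs a trailing NOP. */
static const uint8_t call_rel32[]     = { 0xe8, 0x00, 0x00, 0x00, 0x00 };
static const uint8_t wrmsr_padded[]   = { 0x2e, 0x2e, 0x2e, 0x0f, 0x30 };
static const uint8_t wrmsrns_padded[] = { 0x2e, 0x2e, 0x0f, 0x01, 0xc6 };
```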
On 8/16/24 11:40, Andrew Cooper wrote:
>>
>> As the CALL instruction is 5-byte long, and we need to pad nop for both
>> WRMSR and WRMSRNS, what about not using segment prefix at all?
>

You can use up to 4 prefixes of any kind (which includes opcode prefixes
before 0F) before most decoders start hurting, so we can pad it out to 5
bytes by doing 3f 3f .. .. ..

>
> My suggestion, not that I've had time to experiment, was to change
> paravirt to use a non-C ABI and have asm_xen_write_msr() recombine
> edx:eax into rsi.  That way the top level wrmsr() retains sensible
> codegen for native even when paravirt is active.
>

I have attached what should be an "obvious" example... famous last words.

	-hpa

/*
 * Input in %rax, MSR number in %ecx
 * %rdx is clobbered, as the native stub is assumed to do
 *
 * Change xen_do_write_msr to return its error code instead
 * of passing a pointer - this gives the extra benefit that
 * we can get the *actual invocation address* in the error
 * messages.
 *
 * ex_handler_msr() should be changed to get the MSR data
 * from %rax regardless of Xen vs native; alternatively the
 * Xen handler can set up %edx as well.
 *
 * Let the native pattern look like:
 *
 *	48 89 c2	mov %rax,%rdx
 *	48 c1 ea 20	shr $32,%rdx
 *	3e 0f 30	ds wrmsr	<--- trap point
 *
 * ... which can be replaced with ...
 *
 *	48 89 c2	mov %rax,%rdx
 *	48 c1 ea 20	shr $32,%rdx
 *	0f 01 c6	wrmsrns		<--- trap point
 *
 * ... or ...
 *
 *	e8 xx xx xx xx	call asm_xen_write_msr
 *	74 02		jz 1f
 *	3e 0f 0b	ds ud2		<--- trap point
 * 1:
 *
 * ds ud2 can of course be replaced with any other 3-byte trapping
 * instruction.
 *
 * This also removes the need for Xen to maintain different safe and
 * unsafe MSR routines, as the difference is handled by the same
 * trap handler as is used natively.
 */
SYM_FUNC_START(asm_xen_write_msr)
	FRAME_BEGIN
	push %rax		/* Save in case of error */
	push %rcx
	push %rsi
	push %rdi
	push %r8
	push %r9
	push %r10
	push %r11
	mov %rax,%rdi
	mov %ecx,%esi
	call xen_do_write_msr
	test %eax,%eax		/* ZF=0 on error */
	pop %r11
	pop %r10
	pop %r9
	pop %r8
	pop %rdi
	pop %rsi
	pop %rcx
#ifdef OPTIONAL
	mov 4(%rsp),%edx
#endif
	pop %rax
	FRAME_END
	RET
SYM_FUNC_END(asm_xen_write_msr)
On 8/16/24 14:26, H. Peter Anvin wrote:
> On 8/16/24 11:40, Andrew Cooper wrote:
>>>
>>> As the CALL instruction is 5-byte long, and we need to pad nop for both
>>> WRMSR and WRMSRNS, what about not using segment prefix at all?
>>
>
> You can use up to 4 prefixes of any kind (which includes opcode prefixes
> before 0F) before most decoders start hurting, so we can pad it out to 5
> bytes by doing 3f 3f .. .. ..
>
>>
>> My suggestion, not that I've had time to experiment, was to change
>> paravirt to use a non-C ABI and have asm_xen_write_msr() recombine
>> edx:eax into rsi. That way the top level wrmsr() retains sensible
>> codegen for native even when paravirt is active.
>>
>
> I have attached what should be an "obvious" example... famous last words.
>
After looking at what xen_do_write_msr looks like, I realized we can do
*much* better than that. This means using two alternative sequences, one
for native/Xen and the other for native wrmsr/wrmsrns.
The call/jz sequence is *exactly* the same length as mov, shr, which
means that xen_do_write_msr can simply return a flag if it wants the MSR
access to be performed natively.
An important point is that in no code path can the error code be set to
-EIO except by native MSR invocation, so the only possibilities are
"done successfully" or "execute natively." [That being said, that would
not be *that* hard to fix if needed.]
The new C prototype for xen_do_write_msr() becomes:
bool xen_do_write_msr(uint32_t msr, uint64_t val)
... with a true return meaning "execute natively."
(I changed the order of the arguments from my last version since it is
more consistent with what is used everywhere else.)
RDMSR is a bit trickier. I think the best option there is to set up a
new trap fixup handler type that amounts to "get the address from
the stack, apply a specific offset, and invoke the fixup handler for
that address":
case EX_TYPE_UPLEVEL: {
/* Let reg hold the unsigned number of machine
* words to pop off the stack before the return
* address, and imm the signed offset from the
* return address to the desired trap point.
*
* pointer in units of machine words, and imm the
* signed offset from this stack word...
*/
unsigned long *sp = (unsigned long *)regs->sp + reg;
regs->ip = *sp++ + (int16_t)imm;
regs->sp = (unsigned long)sp;
goto again; /* Loop back to the beginning */
}
Again, "obviously correct" code attached.
-hpa
NOTE:
I had to dig several levels down to uncover actual situation, but
pmu_msr_write() is actually a completely pointless function: all it does
is shuffle some arguments, then calls pmu_msr_chk_emulated() and if it
returns true AND the emulated flag is clear then does *exactly the same
thing* that the calling code would have done if pmu_msr_write() itself
had returned true... in other words, the whole function could be
replaced by:
bool pmu_msr_write(uint32_t msr, uint64_t val)
{
bool emulated;
return pmu_msr_chk_emulated(msr, &val, false, &emulated) &&
emulated;
}
pmu_msr_read() does the equivalent stupidity; the obvious fix(es) to
pmu_msr_chk_emulated() I hope are obvious.
/*
* Input in %rax, MSR number in %ecx
*
* %edx is set up to match %rax >> 32 like the native stub
* is expected to do
*
* Change xen_do_write_msr to return a true value if
* the MSR access should be executed natively (or vice versa,
* if you prefer.)
*
* bool xen_do_write_msr(uint32_t msr, uint64_t value)
*
* Let the native pattern look like:
*
* 48 89 c2 mov %rax,%rdx
* 48 c1 ea 20 shr $32,%rdx
* 3e 0f 30 ds wrmsr <--- trap point
*
* ... which can be replaced with ...
* 48 89 c2 mov %rax,%rdx
* 48 c1 ea 20 shr $32,%rdx
* 0f 01 c6 wrmsrns <--- trap point
*
* FOR XEN, replace the FIRST SEVEN BYTES with:
* e8 xx xx xx xx call asm_xen_write_msr
* 74 03 jz .+5
*
* If ZF=0 then this will fall down to the actual native WRMSR[NS]
* instruction.
*
* This also removes the need for Xen to maintain different safe and
* unsafe MSR routines, as the difference is handled by the same
* trap handler as is used natively.
*/
SYM_FUNC_START(asm_xen_write_msr)
FRAME_BEGIN
push %rax /* Save in case of native fallback */
push %rcx
push %rsi
push %rdi
push %r8
push %r9
push %r10
push %r11
mov %ecx,%edi /* MSR number */
mov %rax,%rsi /* MSR data */
call xen_do_write_msr
test %al,%al /* ZF=1 means we are done */
pop %r11
pop %r10
pop %r9
pop %r8
pop %rdi
pop %rsi
pop %rcx
mov 4(%rsp),%edx /* Set up %edx for native execution */
pop %rax
FRAME_END
RET
SYM_FUNC_END(asm_xen_write_msr)
/*
* RDMSR code. It isn't quite as clean; it requires a new trap handler
* type:
*
* case EX_TYPE_UPLEVEL: {
* /* Let reg hold the unsigned number of machine
* * words to pop off the stack before the return
* * address, and imm the signed offset from the
* * return address to the desired trap point.
* *
* * pointer in units of machine words, and imm the
* * signed offset from this stack word...
* * /
* unsigned long *sp = (unsigned long *)regs->sp + reg;
* regs->ip = *sp++ + (int16_t)imm;
* regs->sp = (unsigned long)sp;
* goto again; /* Loop back to the beginning * /
* }
*
* The prototype of the Xen C code:
* struct { uint64_t val, bool native } xen_do_read_msr(uint32_t msr)
*
* Native code:
* 0f 32 rdmsr <--- trap point
* 48 c1 e2 20 shl $0x20,%rdx
* 48 09 d0 or %rdx,%rax
*
* Xen code (cs rex is just for padding)
* 2e 40 e8 xx xx xx xx cs rex call asm_xen_read_msr
*/
SYM_FUNC_START(asm_xen_read_msr)
FRAME_BEGIN
push %rcx
push %rsi
push %rdi
push %r8
push %r9
push %r10
push %r11
mov %ecx,%edi /* MSR number */
call xen_do_read_msr
test %dl,%dl /* ZF=1 means we are done */
pop %r11
pop %r10
pop %r9
pop %r8
pop %rdi
pop %rsi
pop %rcx
jz 2f
1:
rdmsr
_ASM_EXTABLE_TYPE(1b, 2f, \
EX_TYPE_UPLEVEL|EX_DATA_IMM(-7)|EX_DATA_REG(0))
shl $32,%rdx
or %rdx,%rax
/*
* The top of the stack points directly at the return address;
* back up by 7 bytes from the return address.
*/
2:
FRAME_END
RET
SYM_FUNC_END(asm_xen_read_msr)
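The EX_TYPE_UPLEVEL arithmetic in the comment above can be modeled in plain C to check the pointer bookkeeping (struct and function names hypothetical, mirroring hpa's sketch): skip `reg` machine words of stack, consume the return address, and bias it by `imm` to land on the real trap point.

```c
#include <stdint.h>

struct fake_regs {
    unsigned long ip;
    unsigned long sp;
};

/* Model of the proposed EX_TYPE_UPLEVEL fixup: skip 'reg' words of
 * stack, pop the return address, and point ip at return address + imm.
 * In the RDMSR example imm is -7, backing up over the 7-byte padded
 * call to reach the in-function trap point. */
static void uplevel_fixup(struct fake_regs *regs, unsigned int reg, int16_t imm)
{
    unsigned long *sp = (unsigned long *)regs->sp + reg;

    regs->ip = *sp++ + imm;
    regs->sp = (unsigned long)sp;
}
```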
On 8/16/24 15:59, H. Peter Anvin wrote:
>
> RDMSR is a bit trickier. I think the best option there is to set up a
> new trap fixup handler type that amounts to "get the address from
> the stack, apply a specific offset, and invoke the fixup handler for
> that address:
>
>
> case EX_TYPE_UPLEVEL: {
> /* Let reg hold the unsigned number of machine
> * words to pop off the stack before the return
> * address, and imm the signed offset from the
> * return address to the desired trap point.
> *
> * pointer in units of machine words, and imm the
> * signed offset from this stack word...
> */
> unsigned long *sp = (unsigned long *)regs->sp + reg;
> regs->ip = *sp++ + (int16_t)imm;
> regs->sp = (unsigned long)sp;
> goto again; /* Loop back to the beginning */
> }
>
> Again, "obviously correct" code attached.
>
Here is an untested patch implementing the above functionality, tidied
up and wrapped in a macro as _ASM_EXTABLE_FUNC_REWIND().
-hpa
On 16/08/2024 10:26 pm, H. Peter Anvin wrote:
> On 8/16/24 11:40, Andrew Cooper wrote:
>>>
>>> As the CALL instruction is 5-byte long, and we need to pad nop for both
>>> WRMSR and WRMSRNS, what about not using segment prefix at all?
>>
>
> You can use up to 4 prefixes of any kind (which includes opcode
> prefixes before 0F) before most decoders start hurting, so we can pad
> it out to 5 bytes by doing 3f 3f .. .. ..
>
>>
>> My suggestion, not that I've had time to experiment, was to change
>> paravirt to use a non-C ABI and have asm_xen_write_msr() recombine
>> edx:eax into rsi.  That way the top level wrmsr() retains sensible
>> codegen for native even when paravirt is active.
>>
>
> I have attached what should be an "obvious" example... famous last words.

Ah, now I see what you mean about Xen's #GP semantics.

That's a neat way of doing it.  It means the faulting path will really
take 2 faults on Xen, but it's a faulting path anyway so speed is
already out of the window.

Do you mind about teaching the #UD handler to deal with WRMSR like that?

I ask, because I can't think of anything nicer.

There are plenty of 3-byte instructions which #GP in PV guests (CPL3),
and LTR is my go-to for debugging purposes, as it's not emulated by Xen.

Anything here (and it can't be an actual WRMSR) will be slightly
confusing to read in an OOPS, especially #UD for what is logically a #GP.

But, a clear UD of some form in the disassembly is probably better than
a random other instruction unrelated to the operation.

~Andrew
On 16/08/2024 11:27 pm, Andrew Cooper wrote:
> On 16/08/2024 10:26 pm, H. Peter Anvin wrote:
>> On 8/16/24 11:40, Andrew Cooper wrote:
>>>> As the CALL instruction is 5-byte long, and we need to pad nop for both
>>>> WRMSR and WRMSRNS, what about not using segment prefix at all?
>> You can use up to 4 prefixes of any kind (which includes opcode
>> prefixes before 0F) before most decoders start hurting, so we can pad
>> it out to 5 bytes by doing 3f 3f .. .. ..
>>
>>> My suggestion, not that I've had time to experiment, was to change
>>> paravirt to use a non-C ABI and have asm_xen_write_msr() recombine
>>> edx:eax into rsi.  That way the top level wrmsr() retains sensible
>>> codegen for native even when paravirt is active.
>>>
>> I have attached what should be an "obvious" example... famous last words.
> Ah, now I see what you mean about Xen's #GP semantics.
>
> That's a neat way of doing it.  It means the faulting path will really
> take 2 faults on Xen, but it's a faulting path anyway so speed is
> already out of the window.
>
> Do you mind about teaching the #UD handler to deal with WRMSR like that?
>
> I ask, because I can't think of anything nicer.
>
> There are plenty of 3-byte instructions which #GP in PV guests (CPL3),
> and LTR is my go-to for debugging purposes, as it's not emulated by Xen.
>
> Anything here (and it can't be an actual WRMSR) will be slightly
> confusing to read in an OOPS, especially #UD for what is logically a #GP.
>
> But, a clear UD of some form in the disassembly is probably better than
> a random other instruction unrelated to the operation.
>
> ~Andrew

Oh, P.S.

We can probably drop most of the register manipulation by making the new
xen_do_write_msr be no_caller_saved_registers.  As we're intentionally
not a C ABI to start with, we might as well not spill registers we don't
use either.

~Andrew
On August 16, 2024 3:34:53 PM PDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>On 16/08/2024 11:27 pm, Andrew Cooper wrote:
>> On 16/08/2024 10:26 pm, H. Peter Anvin wrote:
>>> On 8/16/24 11:40, Andrew Cooper wrote:
>>>>> As the CALL instruction is 5-byte long, and we need to pad nop for both
>>>>> WRMSR and WRMSRNS, what about not using segment prefix at all?
>>> You can use up to 4 prefixes of any kind (which includes opcode
>>> prefixes before 0F) before most decoders start hurting, so we can pad
>>> it out to 5 bytes by doing 3f 3f .. .. ..
>>>
>>>> My suggestion, not that I've had time to experiment, was to change
>>>> paravirt to use a non-C ABI and have asm_xen_write_msr() recombine
>>>> edx:eax into rsi.  That way the top level wrmsr() retains sensible
>>>> codegen for native even when paravirt is active.
>>>>
>>> I have attached what should be an "obvious" example... famous last words.
>> Ah, now I see what you mean about Xen's #GP semantics.
>>
>> That's a neat way of doing it.  It means the faulting path will really
>> take 2 faults on Xen, but it's a faulting path anyway so speed is
>> already out of the window.
>>
>> Do you mind about teaching the #UD handler to deal with WRMSR like that?
>>
>> I ask, because I can't think of anything nicer.
>>
>> There are plenty of 3-byte instructions which #GP in PV guests (CPL3),
>> and LTR is my go-to for debugging purposes, as it's not emulated by Xen.
>>
>> Anything here (and it can't be an actual WRMSR) will be slightly
>> confusing to read in an OOPS, especially #UD for what is logically a #GP.
>>
>> But, a clear UD of some form in the disassembly is probably better than
>> a random other instruction unrelated to the operation.
>>
>> ~Andrew
>
>Oh, P.S.
>
>We can probably drop most of the register manipulation by making the new
>xen_do_write_msr be no_caller_saved_registers.  As we're intentionally
>not a C ABI to start with, we might as well not spill registers we don't
>use either.
>
>~Andrew

If you are willing to require a new enough gcc, certainly.
On August 16, 2024 11:40:05 AM PDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>On 16/08/2024 6:52 pm, Xin Li wrote:
>> On 8/9/2024 4:07 PM, Andrew Cooper wrote:
>>> On 07/08/2024 6:47 am, Xin Li (Intel) wrote:
>>>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> +/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
>>>> +#define WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
>>>> +
>>>> +/* Non-serializing WRMSR, when available. Falls back to a
>>>> serializing WRMSR. */
>>>> static __always_inline void wrmsrns(u32 msr, u64 val)
>>>> {
>>>> - __wrmsrns(msr, val, val >> 32);
>>>> + /*
>>>> + * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a
>>>> redundant
>>>> + * DS prefix to avoid a trailing NOP.
>>>> + */
>>>> + asm volatile("1: "
>>>> + ALTERNATIVE("ds wrmsr",
>>>
>>> This isn't the version I presented, and there's no discussion of the
>>> alteration.
>>
>> I'm trying to implement wrmsr() as
>>
>> static __always_inline void wrmsr(u32 msr, u64 val)
>> {
>> asm volatile("1: " ALTERNATIVE_2("wrmsr", WRMSRNS,
>> X86_FEATURE_WRMSRNS,
>> "call asm_xen_write_msr", X86_FEATURE_XENPV)
>> "2: " _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
>> : : "c" (msr), "a" (val), "d" ((u32)(val >> 32)),
>> "D" (msr), "S" (val));
>> }
>>
>>
>> As the CALL instruction is 5-byte long, and we need to pad nop for both
>> WRMSR and WRMSRNS, what about not using segment prefix at all?
>
>The prefix was a minor optimisation to avoid having a trailing nop at
>the end.
>
>When combined with a call, you need 3 prefixes on WRMSR and 2 prefixes
>on WRMSRNS to make all options be 5 bytes long.
>
>That said, there's already a paravirt hook for this, and if you're
>looking to work around the code gen mess for that, then doing it like
>this by doubling up into rdi and rsi isn't great either.
>
>My suggestion, not that I've had time to experiment, was to change
>paravirt to use a non-C ABI and have asm_xen_write_msr() recombine
>edx:eax into rsi. That way the top level wrmsr() retains sensible
>codegen for native even when paravirt is active.
>
>~Andrew
Heh, that was my suggestion, except that I suggested defining it so rax
passes the full value; the generation of edx is still necessary, but
there is no real reason to have to recombine them. (One could even add
that code to the assembly pattern, as the CALL instruction is longer.)
My biggest question is how the #GP on an invalid MSR access is handled with Xen. The rest is trivial.
On 16/08/2024 8:18 pm, H. Peter Anvin wrote:
> On August 16, 2024 11:40:05 AM PDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 16/08/2024 6:52 pm, Xin Li wrote:
>>> On 8/9/2024 4:07 PM, Andrew Cooper wrote:
>>>> On 07/08/2024 6:47 am, Xin Li (Intel) wrote:
>>>>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>> +/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
>>>>> +#define WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
>>>>> +
>>>>> +/* Non-serializing WRMSR, when available. Falls back to a
>>>>> serializing WRMSR. */
>>>>> static __always_inline void wrmsrns(u32 msr, u64 val)
>>>>> {
>>>>> - __wrmsrns(msr, val, val >> 32);
>>>>> + /*
>>>>> + * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a
>>>>> redundant
>>>>> + * DS prefix to avoid a trailing NOP.
>>>>> + */
>>>>> + asm volatile("1: "
>>>>> + ALTERNATIVE("ds wrmsr",
>>>> This isn't the version I presented, and there's no discussion of the
>>>> alteration.
>>> I'm trying to implement wrmsr() as
>>>
>>> static __always_inline void wrmsr(u32 msr, u64 val)
>>> {
>>> asm volatile("1: " ALTERNATIVE_2("wrmsr", WRMSRNS,
>>> X86_FEATURE_WRMSRNS,
>>> "call asm_xen_write_msr", X86_FEATURE_XENPV)
>>> "2: " _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
>>> : : "c" (msr), "a" (val), "d" ((u32)(val >> 32)),
>>> "D" (msr), "S" (val));
>>> }
>>>
>>>
>>> As the CALL instruction is 5 bytes long, and we need NOP padding for
>>> both WRMSR and WRMSRNS, what about not using a segment prefix at all?
>> The prefix was a minor optimisation to avoid having a trailing nop at
>> the end.
>>
>> When combined with a call, you need 3 prefixes on WRMSR and 2 prefixes
>> on WRMSRNS to make all options be 5 bytes long.
>>
>> That said, there's already a paravirt hook for this, and if you're
>> looking to work around the code gen mess for that, then doing it like
>> this by doubling up into rdi and rsi isn't great either.
>>
>> My suggestion, not that I've had time to experiment, was to change
>> paravirt to use a non-C ABI and have asm_xen_write_msr() recombine
>> edx:eax into rsi. That way the top level wrmsr() retains sensible
>> codegen for native even when paravirt is active.
>>
>> ~Andrew
> Heh, that was my suggestion, except that I suggested defining it so rax passes the full value; generating edx is still necessary, but there is no real reason to have to recombine them. (One could even add that code to the assembly pattern, as the CALL instruction is longer.)
Hmm yeah, having %rax carry the full value is likely how the logic is
going to look before generating %edx.
> My biggest question is how the #GP on an invalid MSR access is handled with Xen. The rest is trivial.
For historical reasons it's a mess.
xen_do_write_msr() does several things.
* Turns FSBASE/GSBASE/GSKERN into their respective hypercalls (although
no error checking at all!)
* Discards modifications to the SYSCALL/SYSENTER MSRs
Otherwise, it uses the native accessor; the write takes the #GP path
into Xen and is emulated there (including a full instruction decode).
Way back in the day, XenPV's paravirt wrmsr would swallow #GP and
pretend success, and the kernel then started depending on this behaviour
in order to boot. But times have moved on; even the normal accessors
are really safe+warn now, and I'm sure all of this could be cleaned up.
For ages I've been wanting to make a single PV_PRIV_OP hypercall so we
can skip the x86 emulator, but I've never had time to do that either.
~Andrew
On August 9, 2024 4:07:35 PM PDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>On 07/08/2024 6:47 am, Xin Li (Intel) wrote:
>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>>
>> Per the discussion about FRED MSR writes with WRMSRNS instruction [1],
>> use the alternatives mechanism to choose WRMSRNS when it's available,
>> otherwise fallback to WRMSR.
>>
>> [1] https://lore.kernel.org/lkml/15f56e6a-6edd-43d0-8e83-bb6430096514@citrix.com/
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
>> ---
>> arch/x86/include/asm/msr.h | 28 ++++++++++++++--------------
>> 1 file changed, 14 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
>> index d642037f9ed5..3e402d717815 100644
>> --- a/arch/x86/include/asm/msr.h
>> +++ b/arch/x86/include/asm/msr.h
>> @@ -99,19 +99,6 @@ static __always_inline void __wrmsr(unsigned int msr, u32 low, u32 high)
>> : : "c" (msr), "a"(low), "d" (high) : "memory");
>> }
>>
>> -/*
>> - * WRMSRNS behaves exactly like WRMSR with the only difference being
>> - * that it is not a serializing instruction by default.
>> - */
>> -static __always_inline void __wrmsrns(u32 msr, u32 low, u32 high)
>> -{
>> - /* Instruction opcode for WRMSRNS; supported in binutils >= 2.40. */
>> - asm volatile("1: .byte 0x0f,0x01,0xc6\n"
>> - "2:\n"
>> - _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
>> - : : "c" (msr), "a"(low), "d" (high));
>> -}
>> -
>> #define native_rdmsr(msr, val1, val2) \
>> do { \
>> u64 __val = __rdmsr((msr)); \
>> @@ -312,9 +299,22 @@ do { \
>>
>> #endif /* !CONFIG_PARAVIRT_XXL */
>>
>> +/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
>> +#define WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
>> +
>> +/* Non-serializing WRMSR, when available. Falls back to a serializing WRMSR. */
>> static __always_inline void wrmsrns(u32 msr, u64 val)
>> {
>> - __wrmsrns(msr, val, val >> 32);
>> + /*
>> + * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a redundant
>> + * DS prefix to avoid a trailing NOP.
>> + */
>> + asm volatile("1: "
>> + ALTERNATIVE("ds wrmsr",
>
>This isn't the version I presented, and there's no discussion of the
>alteration.
>
>The choice of CS over DS was deliberate, and came from Intel:
>
>https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf
>
>So unless Intel want to retract that whitepaper, and all the binutils
>work with it, I'd suggest keeping it as CS like we use elsewhere, and as
>explicitly instructed by Intel.
>
>~Andrew
I looked around the kernel, and I believe we are inconsistent. I see both 0x2e (CS) and 0x3e (DS) prefixes used for padding where open-coded.
We can't use cs in all cases, since you can't do a store through a CS override (the code segment is always read-only), so we use 0x3e (DS) to patch out LOCK.
The paper you describe only mentions 0x2e as a "benign prefix" in a specific example, not as any kind of general recommendation. It is particularly irrelevant when it comes to padding two instructions to the same length, as the paper deals with alignment.
If you want, I'm perfectly happy to go and ask if there is any general recommendation (except for direct conditional branch hints, of course.)
On 10/08/2024 1:01 am, H. Peter Anvin wrote:
> On August 9, 2024 4:07:35 PM PDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 07/08/2024 6:47 am, Xin Li (Intel) wrote:
>>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>>>
>>> Per the discussion about FRED MSR writes with WRMSRNS instruction [1],
>>> use the alternatives mechanism to choose WRMSRNS when it's available,
>>> otherwise fallback to WRMSR.
>>>
>>> [1] https://lore.kernel.org/lkml/15f56e6a-6edd-43d0-8e83-bb6430096514@citrix.com/
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
>>> ---
>>> arch/x86/include/asm/msr.h | 28 ++++++++++++++--------------
>>> 1 file changed, 14 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
>>> index d642037f9ed5..3e402d717815 100644
>>> --- a/arch/x86/include/asm/msr.h
>>> +++ b/arch/x86/include/asm/msr.h
>>> @@ -99,19 +99,6 @@ static __always_inline void __wrmsr(unsigned int msr, u32 low, u32 high)
>>> : : "c" (msr), "a"(low), "d" (high) : "memory");
>>> }
>>>
>>> -/*
>>> - * WRMSRNS behaves exactly like WRMSR with the only difference being
>>> - * that it is not a serializing instruction by default.
>>> - */
>>> -static __always_inline void __wrmsrns(u32 msr, u32 low, u32 high)
>>> -{
>>> - /* Instruction opcode for WRMSRNS; supported in binutils >= 2.40. */
>>> - asm volatile("1: .byte 0x0f,0x01,0xc6\n"
>>> - "2:\n"
>>> - _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
>>> - : : "c" (msr), "a"(low), "d" (high));
>>> -}
>>> -
>>> #define native_rdmsr(msr, val1, val2) \
>>> do { \
>>> u64 __val = __rdmsr((msr)); \
>>> @@ -312,9 +299,22 @@ do { \
>>>
>>> #endif /* !CONFIG_PARAVIRT_XXL */
>>>
>>> +/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
>>> +#define WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
>>> +
>>> +/* Non-serializing WRMSR, when available. Falls back to a serializing WRMSR. */
>>> static __always_inline void wrmsrns(u32 msr, u64 val)
>>> {
>>> - __wrmsrns(msr, val, val >> 32);
>>> + /*
>>> + * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a redundant
>>> + * DS prefix to avoid a trailing NOP.
>>> + */
>>> + asm volatile("1: "
>>> + ALTERNATIVE("ds wrmsr",
>> This isn't the version I presented, and there's no discussion of the
>> alteration.
>>
>> The choice of CS over DS was deliberate, and came from Intel:
>>
>> https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf
>>
>> So unless Intel want to retract that whitepaper, and all the binutils
>> work with it, I'd suggest keeping it as CS like we use elsewhere, and as
>> explicitly instructed by Intel.
>>
>> ~Andrew
> I looked around the kernel, and I believe we are inconsistent. I see both 0x2e (CS) and 0x3e (DS) prefixes used for padding where open-coded.
>
> We can't use cs in all cases, since you can't do a store to the code segment (always readonly) so we use 0x3e (DS) to patch out LOCK.
>
> The paper you describe only mentions 0x2e as a "benign prefix" in a specific example, not as any kind of general recommendation. It is particularly irrelevant when it comes to padding two instructions to the same length, as the paper deals with alignment.
>
> If you want, I'm perfectly happy to go and ask if there is any general recommendation (except for direct conditional branch hints, of course.)
It would be lovely if there could be a single coherent statement.
In addition to store semantics, off the top of my head:
* CS is the P4 hint-not-taken (presumably Jcc only), ignored now.
* DS is both the P4 hint-taken (presumably Jcc only), newly reintroduced
in Redwood Cove with tweaked semantics (definitely Jcc only), and the
CET notrack prefix (JMP/CALL *IND only).
Plus whatever else I've missed[1].
~Andrew
[1] I'm ignoring XuCode mode memory operand semantics as not relevant to
CPL0 software (where AIUI unprefixed is physical and DS prefixed is
virtual) but I'm noting it here to highlight that there's definitely
extra complexity beyond what is in SDM Vol2.
On Tue, Aug 06 2024 at 22:47, Xin Li wrote:
>
> +/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
> +#define WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
> +
> +/* Non-serializing WRMSR, when available. Falls back to a serializing WRMSR. */
> static __always_inline void wrmsrns(u32 msr, u64 val)
> {
> - __wrmsrns(msr, val, val >> 32);
> + /*
> + * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a redundant
> + * DS prefix to avoid a trailing NOP.
> + */
> + asm volatile("1: "
> + ALTERNATIVE("ds wrmsr",
> + WRMSRNS, X86_FEATURE_WRMSRNS)
Please get rid of this horrible line break. You have 100 characters line width.
Thanks,
tglx