... on capable toolchains.
This avoids needing to hold rc in a register across the RDMSR, and in most
cases removes direct testing and branching based on rc, as the fault label can
be rearranged to directly land on the out-of-line block.
There is a subtle difference in behaviour. The old behaviour would, on fault,
still produce 0's and write to val.
The new behaviour only writes val on success, and write_msr() is the only
place where this matters. Move temp out of switch() scope and initialise it
to 0.
Resolves: https://gitlab.com/xen-project/xen/-/work_items/217
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
Doing this safely does depend on getting GCC-14 into CI somewhere. Debian
13/Trixie satisfies this, as does archlinux I expect.
---
xen/arch/x86/include/asm/msr.h | 19 +++++++++++++++++++
xen/arch/x86/pv/emul-priv-op.c | 3 +--
2 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/xen/arch/x86/include/asm/msr.h b/xen/arch/x86/include/asm/msr.h
index d2c86ddb09e9..6a97b41bae07 100644
--- a/xen/arch/x86/include/asm/msr.h
+++ b/xen/arch/x86/include/asm/msr.h
@@ -55,6 +55,24 @@ static inline void wrmsrns(uint32_t msr, uint64_t val)
/* rdmsr with exception handling */
static inline int rdmsr_safe(unsigned int msr, uint64_t *val)
{
+#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
+ uint64_t lo, hi;
+
+ asm_inline goto (
+ "1: rdmsr\n\t"
+ _ASM_EXTABLE(1b, %l[fault])
+ : "=a" (lo), "=d" (hi)
+ : "c" (msr)
+ :
+ : fault );
+
+ *val = lo | (hi << 32);
+
+ return 0;
+
+ fault:
+ return -EFAULT;
+#else
int rc;
uint64_t lo, hi;
@@ -73,6 +91,7 @@ static inline int rdmsr_safe(unsigned int msr, uint64_t *val)
*val = lo | (hi << 32);
return rc;
+#endif
}
/* wrmsr with exception handling */
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 4afbee59e53e..c3a484c50bf8 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -1027,6 +1027,7 @@ static int cf_check write_msr(
struct vcpu *curr = current;
const struct domain *currd = curr->domain;
const struct cpu_policy *cp = currd->arch.cpu_policy;
+ uint64_t temp = 0;
bool vpmu_msr = false;
int ret;
@@ -1040,8 +1041,6 @@ static int cf_check write_msr(
switch ( reg )
{
- uint64_t temp;
-
case MSR_FS_BASE:
case MSR_GS_BASE:
case MSR_SHADOW_GS_BASE:
--
2.39.5
On 15.08.2025 22:41, Andrew Cooper wrote:
> ... on capable toolchains.
>
> This avoids needing to hold rc in a register across the RDMSR, and in most
> cases removes direct testing and branching based on rc, as the fault label can
> be rearranged to directly land on the out-of-line block.
>
> There is a subtle difference in behaviour. The old behaviour would, on fault,
> still produce 0's and write to val.
>
> The new behaviour only writes val on success, and write_msr() is the only
> place where this matters. Move temp out of switch() scope and initialise it
> to 0.
But what's the motivation behind making this behavioral change? At least in
the cases where the return value isn't checked, it would feel safer if we
continued clearing the value. Even if in all cases where this could matter
(besides the one you cover here) one can prove correctness by looking at
surrounding code.
> --- a/xen/arch/x86/include/asm/msr.h
> +++ b/xen/arch/x86/include/asm/msr.h
> @@ -55,6 +55,24 @@ static inline void wrmsrns(uint32_t msr, uint64_t val)
> /* rdmsr with exception handling */
> static inline int rdmsr_safe(unsigned int msr, uint64_t *val)
> {
> +#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
> + uint64_t lo, hi;
Could at least this line move ahead of the #ifdef, to also cover ...
> + asm_inline goto (
> + "1: rdmsr\n\t"
> + _ASM_EXTABLE(1b, %l[fault])
> + : "=a" (lo), "=d" (hi)
> + : "c" (msr)
> + :
> + : fault );
> +
> + *val = lo | (hi << 32);
> +
> + return 0;
> +
> + fault:
> + return -EFAULT;
> +#else
> int rc;
> uint64_t lo, hi;
... the same being needed here?
Jan
On 18/08/2025 12:27 pm, Jan Beulich wrote:
> On 15.08.2025 22:41, Andrew Cooper wrote:
>> ... on capable toolchains.
>>
>> This avoids needing to hold rc in a register across the RDMSR, and in most
>> cases removes direct testing and branching based on rc, as the fault label can
>> be rearranged to directly land on the out-of-line block.
>>
>> There is a subtle difference in behaviour. The old behaviour would, on fault,
>> still produce 0's and write to val.
>>
>> The new behaviour only writes val on success, and write_msr() is the only
>> place where this matters. Move temp out of switch() scope and initialise it
>> to 0.
> But what's the motivation behind making this behavioral change? At least in
> the cases where the return value isn't checked, it would feel safer if we
> continued clearing the value. Even if in all cases where this could matter
> (besides the one you cover here) one can prove correctness by looking at
> surrounding code.
I didn't realise I'd made a change at first, but it's a consequence of
the compiler's ability to rearrange basic blocks.
It can be fixed with ...
>
>> --- a/xen/arch/x86/include/asm/msr.h
>> +++ b/xen/arch/x86/include/asm/msr.h
>> @@ -55,6 +55,24 @@ static inline void wrmsrns(uint32_t msr, uint64_t val)
>> /* rdmsr with exception handling */
>> static inline int rdmsr_safe(unsigned int msr, uint64_t *val)
>> {
>> +#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
>> + uint64_t lo, hi;
>> + asm_inline goto (
>> + "1: rdmsr\n\t"
>> + _ASM_EXTABLE(1b, %l[fault])
>> + : "=a" (lo), "=d" (hi)
>> + : "c" (msr)
>> + :
>> + : fault );
>> +
>> + *val = lo | (hi << 32);
>> +
>> + return 0;
>> +
>> + fault:
*val = 0;
here, but I don't want to do this. Because val is by pointer and
generally spilled to the stack, the compiler can't optimise away the store.
I'd far rather get a real compiler error, than to have logic relying on
the result of a faulting MSR read.
>> + return -EFAULT;
>> +#else
>> int rc;
>> uint64_t lo, hi;
> ... the same being needed here?
Fixed.
~Andrew
On 19.08.2025 15:52, Andrew Cooper wrote:
> On 18/08/2025 12:27 pm, Jan Beulich wrote:
>> On 15.08.2025 22:41, Andrew Cooper wrote:
>>> ... on capable toolchains.
>>>
>>> This avoids needing to hold rc in a register across the RDMSR, and in most
>>> cases removes direct testing and branching based on rc, as the fault label can
>>> be rearranged to directly land on the out-of-line block.
>>>
>>> There is a subtle difference in behaviour. The old behaviour would, on fault,
>>> still produce 0's and write to val.
>>>
>>> The new behaviour only writes val on success, and write_msr() is the only
>>> place where this matters. Move temp out of switch() scope and initialise it
>>> to 0.
>> But what's the motivation behind making this behavioral change? At least in
>> the cases where the return value isn't checked, it would feel safer if we
>> continued clearing the value. Even if in all cases where this could matter
>> (besides the one you cover here) one can prove correctness by looking at
>> surrounding code.
>
> I didn't realise I'd made a change at first, but it's a consequence of
> the compiler's ability to rearrange basic blocks.
>
> It can be fixed with ...
>
>>
>>> --- a/xen/arch/x86/include/asm/msr.h
>>> +++ b/xen/arch/x86/include/asm/msr.h
>>> @@ -55,6 +55,24 @@ static inline void wrmsrns(uint32_t msr, uint64_t val)
>>> /* rdmsr with exception handling */
>>> static inline int rdmsr_safe(unsigned int msr, uint64_t *val)
>>> {
>>> +#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
>>> + uint64_t lo, hi;
>>> + asm_inline goto (
>>> + "1: rdmsr\n\t"
>>> + _ASM_EXTABLE(1b, %l[fault])
>>> + : "=a" (lo), "=d" (hi)
>>> + : "c" (msr)
>>> + :
>>> + : fault );
>>> +
>>> + *val = lo | (hi << 32);
>>> +
>>> + return 0;
>>> +
>>> + fault:
>
> *val = 0;
>
> here, but I don't want to do this. Because val is by pointer and
> generally spilled to the stack, the compiler can't optimise away the store.
But the compiler is dealing with such indirection in inline functions just
fine. I don't expect it would typically spill val to the stack. Is there
anything specific here that you think would make this more likely?
> I'd far rather get a real compiler error, than to have logic relying on
> the result of a faulting MSR read.
A compiler error? (Hmm, perhaps you think of uninitialized variable
diagnostics. That may or may not trigger, depending on how else the
caller's variable is used.)
Jan
On 19/08/2025 5:23 pm, Jan Beulich wrote:
> On 19.08.2025 15:52, Andrew Cooper wrote:
>> On 18/08/2025 12:27 pm, Jan Beulich wrote:
>>> On 15.08.2025 22:41, Andrew Cooper wrote:
>>>> ... on capable toolchains.
>>>>
>>>> This avoids needing to hold rc in a register across the RDMSR, and in most
>>>> cases removes direct testing and branching based on rc, as the fault label can
>>>> be rearranged to directly land on the out-of-line block.
>>>>
>>>> There is a subtle difference in behaviour. The old behaviour would, on fault,
>>>> still produce 0's and write to val.
>>>>
>>>> The new behaviour only writes val on success, and write_msr() is the only
>>>> place where this matters. Move temp out of switch() scope and initialise it
>>>> to 0.
>>> But what's the motivation behind making this behavioral change? At least in
>>> the cases where the return value isn't checked, it would feel safer if we
>>> continued clearing the value. Even if in all cases where this could matter
>>> (besides the one you cover here) one can prove correctness by looking at
>>> surrounding code.
>> I didn't realise I'd made a change at first, but it's a consequence of
>> the compiler's ability to rearrange basic blocks.
>>
>> It can be fixed with ...
>>
>>>> --- a/xen/arch/x86/include/asm/msr.h
>>>> +++ b/xen/arch/x86/include/asm/msr.h
>>>> @@ -55,6 +55,24 @@ static inline void wrmsrns(uint32_t msr, uint64_t val)
>>>> /* rdmsr with exception handling */
>>>> static inline int rdmsr_safe(unsigned int msr, uint64_t *val)
>>>> {
>>>> +#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
>>>> + uint64_t lo, hi;
>>>> + asm_inline goto (
>>>> + "1: rdmsr\n\t"
>>>> + _ASM_EXTABLE(1b, %l[fault])
>>>> + : "=a" (lo), "=d" (hi)
>>>> + : "c" (msr)
>>>> + :
>>>> + : fault );
>>>> +
>>>> + *val = lo | (hi << 32);
>>>> +
>>>> + return 0;
>>>> +
>>>> + fault:
>> *val = 0;
>>
>> here, but I don't want to do this. Because val is by pointer and
>> generally spilled to the stack, the compiler can't optimise away the store.
> But the compiler is dealing with such indirection in inline functions just
> fine. I don't expect it would typically spill val to the stack. Is there
> anything specific here that you think would make this more likely?
Yes. The design of the functions they're used in. Adding this line
results in:
add/remove: 0/0 grow/shrink: 7/2 up/down: 109/-36 (73)
Function old new delta
read_msr 1243 1307 +64
resource_access 326 341 +15
hwp_init_msrs.cold 297 308 +11
probe_cpuid_faulting 168 175 +7
svm_msr_read_intercept 1034 1039 +5
hwp_write_request 113 117 +4
hwp_init_msrs 371 374 +3
amd_log_freq 844 828 -16
guest_rdmsr 2168 2148 -20
Taking read_msr() as a concrete example, this is because it's a store
into a parent functions variable, not into a local variable, and cannot
be elided.
>
>> I'd far rather get a real compiler error, than to have logic relying on
>> the result of a faulting MSR read.
> A compiler error? (Hmm, perhaps you think of uninitialized variable
> diagnostics. That may or may not trigger, depending on how else the
> caller's variable is used.)
Yes I was referring to the uninitialised variable diagnostic. *_safe()
are fairly rare, and we've got plenty of coverage in CI.
~Andrew
On 21.08.2025 18:20, Andrew Cooper wrote:
> On 19/08/2025 5:23 pm, Jan Beulich wrote:
>> On 19.08.2025 15:52, Andrew Cooper wrote:
>>> On 18/08/2025 12:27 pm, Jan Beulich wrote:
>>>> On 15.08.2025 22:41, Andrew Cooper wrote:
>>>>> ... on capable toolchains.
>>>>>
>>>>> This avoids needing to hold rc in a register across the RDMSR, and in most
>>>>> cases removes direct testing and branching based on rc, as the fault label can
>>>>> be rearranged to directly land on the out-of-line block.
>>>>>
>>>>> There is a subtle difference in behaviour. The old behaviour would, on fault,
>>>>> still produce 0's and write to val.
>>>>>
>>>>> The new behaviour only writes val on success, and write_msr() is the only
>>>>> place where this matters. Move temp out of switch() scope and initialise it
>>>>> to 0.
>>>> But what's the motivation behind making this behavioral change? At least in
>>>> the cases where the return value isn't checked, it would feel safer if we
>>>> continued clearing the value. Even if in all cases where this could matter
>>>> (besides the one you cover here) one can prove correctness by looking at
>>>> surrounding code.
>>> I didn't realise I'd made a change at first, but it's a consequence of
>>> the compiler's ability to rearrange basic blocks.
>>>
>>> It can be fixed with ...
>>>
>>>>> --- a/xen/arch/x86/include/asm/msr.h
>>>>> +++ b/xen/arch/x86/include/asm/msr.h
>>>>> @@ -55,6 +55,24 @@ static inline void wrmsrns(uint32_t msr, uint64_t val)
>>>>> /* rdmsr with exception handling */
>>>>> static inline int rdmsr_safe(unsigned int msr, uint64_t *val)
>>>>> {
>>>>> +#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
>>>>> + uint64_t lo, hi;
>>>>> + asm_inline goto (
>>>>> + "1: rdmsr\n\t"
>>>>> + _ASM_EXTABLE(1b, %l[fault])
>>>>> + : "=a" (lo), "=d" (hi)
>>>>> + : "c" (msr)
>>>>> + :
>>>>> + : fault );
>>>>> +
>>>>> + *val = lo | (hi << 32);
>>>>> +
>>>>> + return 0;
>>>>> +
>>>>> + fault:
>>> *val = 0;
>>>
>>> here, but I don't want to do this. Because val is by pointer and
>>> generally spilled to the stack, the compiler can't optimise away the store.
>> But the compiler is dealing with such indirection in inline functions just
>> fine. I don't expect it would typically spill val to the stack. Is there
>> anything specific here that you think would make this more likely?
>
> Yes. The design of the functions they're used in. Adding this line
> results in:
>
> add/remove: 0/0 grow/shrink: 7/2 up/down: 109/-36 (73)
> Function old new delta
> read_msr 1243 1307 +64
> resource_access 326 341 +15
> hwp_init_msrs.cold 297 308 +11
> probe_cpuid_faulting 168 175 +7
> svm_msr_read_intercept 1034 1039 +5
> hwp_write_request 113 117 +4
> hwp_init_msrs 371 374 +3
> amd_log_freq 844 828 -16
> guest_rdmsr 2168 2148 -20
>
> Taking read_msr() as a concrete example, this is because it's a store
> into a parent functions variable, not into a local variable, and cannot
> be elided.
>
>
>>
>>> I'd far rather get a real compiler error, than to have logic relying on
>>> the result of a faulting MSR read.
>> A compiler error? (Hmm, perhaps you think of uninitialized variable
>> diagnostics. That may or may not trigger, depending on how else the
>> caller's variable is used.)
>
> Yes I was referring to the uninitialised variable diagnostic. *_safe()
> are fairly rare, and we've got plenty of coverage in CI.
Well, okay, slightly hesitantly
Reviewed-by: Jan Beulich <jbeulich@suse.com>
preferably with the paragraph in the description that I commented on
slightly expanded to cover the "why" aspect.
Jan
© 2016 - 2025 Red Hat, Inc.