[PATCH] x86/cpuid: Clobber CPUID leaves 0x800000{1d..20}

Andrew Cooper posted 1 patch 2 years ago
Test gitlab-ci failed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20220407010121.11301-1-andrew.cooper3@citrix.com
xen/arch/x86/cpuid.c | 7 +++++++
1 file changed, 7 insertions(+)
[PATCH] x86/cpuid: Clobber CPUID leaves 0x800000{1d..20}
Posted by Andrew Cooper 2 years ago
c/s 1a914256dca5 increased the AMD max leaf from 0x8000001c to 0x80000021, but
did not adjust anything in the calculate_*_policy() chain.  As a result, on
hardware supporting these leaves, we read the real hardware values into the
raw policy, then copy into host, and all the way into the PV/HVM default
policies.

All 4 of these leaves have enable bits (first two by TopoExt, next by SEV,
next by PQOS), so any software following the rules is fine and will leave them
alone.  However, leaf 0x8000001d takes a subleaf input and at least two
userspace utilities have been observed to loop indefinitely under Xen (clearly
waiting for eax to report "no more cache levels").

Such userspace is buggy, but Xen's behaviour isn't great either.

In the short term, clobber all information in these leaves.  This is a giant
bodge, but there are complexities with implementing all of these leaves
properly.

Fixes: 1a914256dca5 ("x86/cpuid: support LFENCE always serialising CPUID bit")
Link: https://github.com/QubesOS/qubes-issues/issues/7392
Reported-by: fosslinux <fosslinux@aussies.space>
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>

I was hoping to do something better, but it turned into a rats nest, and this
fix wants backporting.

It turns out that Intel leaf 4 and AMD leaf 0x8000001d are *almost* identical.
They differ by the "complex" bit in edx, and the $X-per-cache fields in the
top of eax (Intel is threads-per-cache, AMD is cores-per-cache and lacks the
cores-per-package field).

As neither vendor implement each others version, I'm incredibly tempted to
reuse p->cache for both, rather than doubling the storage space.  Reading the
data out is easy to key on p->extd.topoext.  Writing the data can be done
without any further complexity if we simply trust the sending side to have its
indices the proper way around.  Particularly, this avoids needing to ensure
that p->extd.topoext is out of order and at the head of the stream.  Thoughts?
---
 xen/arch/x86/cpuid.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index bb554b06a73f..7e0b39569847 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -328,8 +328,15 @@ static void recalculate_misc(struct cpuid_policy *p)
 
         zero_leaves(p->extd.raw, 0xb, 0x18);
 
+        /* 0x19 - TLB details.  Pass through. */
+        /* 0x1a - Perf hints.   Pass through. */
+
         p->extd.raw[0x1b] = EMPTY_LEAF; /* IBS - not supported. */
         p->extd.raw[0x1c] = EMPTY_LEAF; /* LWP - not supported. */
+        p->extd.raw[0x1d] = EMPTY_LEAF; /* TopoExt Cache */
+        p->extd.raw[0x1e] = EMPTY_LEAF; /* TopoExt APIC ID/Core/Node */
+        p->extd.raw[0x1f] = EMPTY_LEAF; /* SEV */
+        p->extd.raw[0x20] = EMPTY_LEAF; /* Platform QoS */
         break;
     }
 }
-- 
2.11.0


Re: [PATCH] x86/cpuid: Clobber CPUID leaves 0x800000{1d..20}
Posted by Jan Beulich 2 years ago
On 07.04.2022 03:01, Andrew Cooper wrote:
> c/s 1a914256dca5 increased the AMD max leaf from 0x8000001c to 0x80000021, but
> did not adjust anything in the calculate_*_policy() chain.  As a result, on
> hardware supporting these leaves, we read the real hardware values into the
> raw policy, then copy into host, and all the way into the PV/HVM default
> policies.
> 
> All 4 of these leaves have enable bits (first two by TopoExt, next by SEV,
> next by PQOS), so any software following the rules is fine and will leave them
> alone.  However, leaf 0x8000001d takes a subleaf input and at least two
> userspace utilities have been observed to loop indefinitely under Xen (clearly
> waiting for eax to report "no more cache levels").
> 
> Such userspace is buggy, but Xen's behaviour isn't great either.

Just another remark, since I stumbled across this again while preparing
the backports: I'm not convinced such user space is to be called buggy.
Generic CPUID dumping tools won't normally look for particular features.
Their knowledge is usually limited to knowing where sub-leaves exist and
how to determine how many of them there are. Anything beyond that would
make a supposedly simple tool complicated.

Jan
Re: [PATCH] x86/cpuid: Clobber CPUID leaves 0x800000{1d..20}
Posted by Andrew Cooper 2 years ago
On 07/04/2022 15:27, Jan Beulich wrote:
> On 07.04.2022 03:01, Andrew Cooper wrote:
>> c/s 1a914256dca5 increased the AMD max leaf from 0x8000001c to 0x80000021, but
>> did not adjust anything in the calculate_*_policy() chain.  As a result, on
>> hardware supporting these leaves, we read the real hardware values into the
>> raw policy, then copy into host, and all the way into the PV/HVM default
>> policies.
>>
>> All 4 of these leaves have enable bits (first two by TopoExt, next by SEV,
>> next by PQOS), so any software following the rules is fine and will leave them
>> alone.  However, leaf 0x8000001d takes a subleaf input and at least two
>> userspace utilities have been observed to loop indefinitely under Xen (clearly
>> waiting for eax to report "no more cache levels").
>>
>> Such userspace is buggy, but Xen's behaviour isn't great either.
> Just another remark, since I stumbled across this again while preparing
> the backports: I'm not convinced such user space is to be called buggy.
> Generic CPUID dumping tools won't normally look for particular features.
> Their knowledge is usually limited to knowing where sub-leaves exist and
> how to determine how many of them there are. Anything beyond that would
> make a supposedly simple tool complicated.

It's basic input sanitisation.

If you have elected to ignore the rules AMD sets out to correctly
interpret the data, then you get to keep all the pieces when writing an
unbounded

do {
    x = read_untrusted_input();
} while ( x != 0 );

loop.  The only reason zeroing the data here unbreaks userspace is
because it aliases the loop exit condition.

~Andrew
Re: [PATCH] x86/cpuid: Clobber CPUID leaves 0x800000{1d..20}
Posted by Jan Beulich 2 years ago
On 07.04.2022 03:01, Andrew Cooper wrote:
> c/s 1a914256dca5 increased the AMD max leaf from 0x8000001c to 0x80000021, but
> did not adjust anything in the calculate_*_policy() chain.  As a result, on
> hardware supporting these leaves, we read the real hardware values into the
> raw policy, then copy into host, and all the way into the PV/HVM default
> policies.
> 
> All 4 of these leaves have enable bits (first two by TopoExt, next by SEV,
> next by PQOS), so any software following the rules is fine and will leave them
> alone.  However, leaf 0x8000001d takes a subleaf input and at least two
> userspace utilities have been observed to loop indefinitely under Xen (clearly
> waiting for eax to report "no more cache levels").
> 
> Such userspace is buggy, but Xen's behaviour isn't great either.
> 
> In the short term, clobber all information in these leaves.  This is a giant
> bodge, but there are complexities with implementing all of these leaves
> properly.
> 
> Fixes: 1a914256dca5 ("x86/cpuid: support LFENCE always serialising CPUID bit")
> Link: https://github.com/QubesOS/qubes-issues/issues/7392
> Reported-by: fosslinux <fosslinux@aussies.space>
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> It turns out that Intel leaf 4 and AMD leaf 0x8000001d are *almost* identical.
> They differ by the "complex" bit in edx, and the $X-per-cache fields in the
> top of eax (Intel is threads-per-cache, AMD is cores-per-cache and lacks the
> cores-per-package field).
> 
> As neither vendor implement each others version, I'm incredibly tempted to
> reuse p->cache for both, rather than doubling the storage space.  Reading the
> data out is easy to key on p->extd.topoext.  Writing the data can be done
> without any further complexity if we simply trust the sending side to have its
> indices the proper way around.  Particularly, this avoids needing to ensure
> that p->extd.topoext is out of order and at the head of the stream.  Thoughts?

Sounds quite reasonable to me. I guess the main risk is for new things
to appear on either vendor's side in a way breaking the overlaying
approach. But I guess that's not a significant risk.

As to ordering dependencies: Are there any in reality? Neither vendor
implements the other vendor's leaf, so there's only going to be one in
the stream anyway, and which one it is can be disambiguated by having
seen leaf 0 alone.

Jan
Re: [PATCH] x86/cpuid: Clobber CPUID leaves 0x800000{1d..20}
Posted by Andrew Cooper 2 years ago
On 07/04/2022 07:26, Jan Beulich wrote:
> On 07.04.2022 03:01, Andrew Cooper wrote:
>> c/s 1a914256dca5 increased the AMD max leaf from 0x8000001c to 0x80000021, but
>> did not adjust anything in the calculate_*_policy() chain.  As a result, on
>> hardware supporting these leaves, we read the real hardware values into the
>> raw policy, then copy into host, and all the way into the PV/HVM default
>> policies.
>>
>> All 4 of these leaves have enable bits (first two by TopoExt, next by SEV,
>> next by PQOS), so any software following the rules is fine and will leave them
>> alone.  However, leaf 0x8000001d takes a subleaf input and at least two
>> userspace utilities have been observed to loop indefinitely under Xen (clearly
>> waiting for eax to report "no more cache levels").
>>
>> Such userspace is buggy, but Xen's behaviour isn't great either.
>>
>> In the short term, clobber all information in these leaves.  This is a giant
>> bodge, but there are complexities with implementing all of these leaves
>> properly.
>>
>> Fixes: 1a914256dca5 ("x86/cpuid: support LFENCE always serialising CPUID bit")
>> Link: https://github.com/QubesOS/qubes-issues/issues/7392
>> Reported-by: fosslinux <fosslinux@aussies.space>
>> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

>
>> It turns out that Intel leaf 4 and AMD leaf 0x8000001d are *almost* identical.
>> They differ by the "complex" bit in edx, and the $X-per-cache fields in the
>> top of eax (Intel is threads-per-cache, AMD is cores-per-cache and lacks the
>> cores-per-package field).
>>
>> As neither vendor implement each others version, I'm incredibly tempted to
>> reuse p->cache for both, rather than doubling the storage space.  Reading the
>> data out is easy to key on p->extd.topoext.  Writing the data can be done
>> without any further complexity if we simply trust the sending side to have its
>> indices the proper way around.  Particularly, this avoids needing to ensure
>> that p->extd.topoext is out of order and at the head of the stream.  Thoughts?
> Sounds quite reasonable to me. I guess the main risk is for new things
> to appear on either vendor's side in a way breaking the overlaying
> approach. But I guess that's not a significant risk.

Neither of the vendors are going to change it in incompatible ways to
how they currently expose it, and it's data that Xen doesn't
particularly care about it - we never interpret it on behalf of the guest.

When we fix the toolstack side of things to calculate topology properly,
the $foo-per-cache fields need adjusting, but that logic will be fine to
switch ( vendor ) on.  Since writing this, I found AMD's
cores-per-package and it's in the adjacent leaf with a wider field.

> As to ordering dependencies: Are there any in reality? Neither vendor
> implements the other vendor's leaf, so there's only going to be one in
> the stream anyway, and which one it is can be disambiguated by having
> seen leaf 0 alone.

The complexity is what (if anything) we do in
x86_cpuid_copy_from_buffer().  I've done some prototyping, and the
easiest option is to accept both 4 and e1Dd, in a latest-takes-precedent
manner precedent, and that we don't create interlinkage with the topoext
bit.

I've also got a pile of fixes to the unit tests so we hopefully can't
make mistakes like this again, although that will depend on getting
test-cpuid-policy running in OSSTest which is a todo list item which
really needs to get done.

~Andrew