When running the pmu test on SPR, the following #GP fault is reported:
Unhandled exception 13 #GP at ip 000000000040771f
error_code=0000 rflags=00010046 cs=00000008
rax=00000000004031ad rcx=0000000000000186 rdx=0000000000000000 rbx=00000000005142f0
rbp=0000000000514260 rsi=0000000000000020 rdi=0000000000000340
r8=0000000000513a65 r9=00000000000003f8 r10=000000000000000d r11=00000000ffffffff
r12=000000000043003c r13=0000000000514450 r14=000000000000000b r15=0000000000000001
cr0=0000000080010011 cr2=0000000000000000 cr3=0000000001007000 cr4=0000000000000020
cr8=0000000000000000
STACK: @40771f 40040e 400976 400aef 40148d 401da9 4001ad
FAIL pmu
It looks like the EVENTSEL0 MSR (0x186) is written with an invalid value
(0x4031ad), which causes the #GP.
Further investigation shows the #GP is triggered by the code below in
__start_event().
wrmsr(MSR_GP_EVENT_SELECTx(event_to_global_idx(evt)),
evt->config | EVNTSEL_EN);
evt->config is correctly initialized but appears to be corrupted before
it is written to the MSR.
The original pmu_counter_t layout is as follows:
typedef struct {
uint32_t ctr;
uint64_t config;
uint64_t count;
int idx;
} pmu_counter_t;
The config field appears to cross two cache lines. When the two cache
lines are not updated simultaneously, the config value is corrupted.
Adjust the order of the pmu_counter_t fields to ensure the config field
is cache-line aligned.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
x86/pmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/x86/pmu.c b/x86/pmu.c
index 60db8bdf..a0268db8 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -21,9 +21,9 @@
typedef struct {
uint32_t ctr;
+ uint32_t idx;
uint64_t config;
uint64_t count;
- int idx;
} pmu_counter_t;
struct pmu_event {
--
2.40.1