[PATCH] perf/x86/intel/ds: Fix loop ordering in release_ds_buffers()

Rik van Riel posted 1 patch 1 month ago
release_ds_buffers() runs three loops over all possible CPUs, in this
order:
1. release_ds_buffer() - NULLs hwev->ds for each CPU
2. fini_debug_store_on_cpu() - clears MSR_IA32_DS_AREA
3. release_pebs_buffer()/release_bts_buffer() - unmaps CEA pages and
   frees backing pages

The problem is that fini_debug_store_on_cpu() returns early when
hwev->ds is NULL. Since loop 1 has already NULLed hwev->ds, loop 2
never clears MSR_IA32_DS_AREA. Loop 3 then unmaps the CEA pages,
leaving the MSR pointing at now-unmapped memory. When a PEBS overflow
fires, the hardware writes to the unmapped pages, and the resulting
page fault lands in whatever victim code happens to be running.

Fix by calling fini_debug_store_on_cpu() BEFORE release_ds_buffer(), so the
MSR is cleared while hwev->ds is still valid.

Observed crash signature:
  BUG: unable to handle kernel paging request in __lookup_object
  CR2: fffffe00004b7028 (CEA range)
  RIP: __lookup_object+0x39 (cmp %rdi,%rax -- register-only, can't fault)
  Secondary: TASK stack guard page hit (recursive page fault overflow)

Assisted-by: Claude:claude-opus-4.7 syzkaller
Signed-off-by: Rik van Riel <riel@surriel.com>
---
 arch/x86/events/intel/ds.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 2abfeb4e2908..85894673f03b 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -973,18 +973,25 @@ void release_ds_buffers(void)
 	if (!x86_pmu.bts && !x86_pmu.ds_pebs)
 		return;
 
-	for_each_possible_cpu(cpu)
-		release_ds_buffer(cpu);
-
 	for_each_possible_cpu(cpu) {
 		/*
-		 * Again, ignore errors from offline CPUs, they will no longer
-		 * observe cpu_hw_events.ds and not program the DS_AREA when
-		 * they come up.
+		 * Clear MSR_IA32_DS_AREA BEFORE NULLing hwev->ds.
+		 * fini_debug_store_on_cpu() checks hwev->ds and bails
+		 * if it's NULL, so calling release_ds_buffer() first
+		 * would prevent the MSR from being cleared. That leaves
+		 * the hardware writing into CEA pages that get unmapped
+		 * below, causing asynchronous page faults at random RIPs.
+		 *
+		 * Ignore errors from offline CPUs, they will no longer
+		 * observe cpu_hw_events.ds and not program the DS_AREA
+		 * when they come up.
 		 */
 		fini_debug_store_on_cpu(cpu);
 	}
 
+	for_each_possible_cpu(cpu)
+		release_ds_buffer(cpu);
+
 	for_each_possible_cpu(cpu) {
 		if (x86_pmu.ds_pebs)
 			release_pebs_buffer(cpu);
-- 
2.53.0-Meta