[PATCH] perf/x86/intel: Don't update PEBS_ENABLE MSR in NMI context

Posted by Jim Mattson 2 weeks ago
Writing MSR_IA32_PEBS_ENABLE in handle_pmi_common() when a PEBS event is
throttled is problematic for KVM, which may have an older value for the MSR
(previously obtained via perf_guest_get_msrs()) stored in the VM-exit
MSR-load list.

It isn't necessary to update the MSR in the PMI handler, because the
throttled counter's PerfEvtSel ENABLE bit has been cleared, so its
PEBS_ENABLE bit is irrelevant. However, the MSR must be updated before the
stale bit becomes relevant again (i.e. when the PMC starts counting a new
perf event).

When the PEBS_ENABLE MSR is rendered stale in the PMI handler, just record
that fact in a new cpu_hw_events boolean, pebs_stale.  Extend
intel_pmu_pebs_enable_all() to write MSR_IA32_PEBS_ENABLE when pebs_stale
is set and to clear pebs_stale.  This ensures stale bits are cleared
during the normal PMU enable path, before PERF_GLOBAL_CTRL is restored and
any new event can start counting.

Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/events/intel/core.c | 28 ++++++++++++++++++++++------
 arch/x86/events/intel/ds.c   |  4 +++-
 arch/x86/events/perf_event.h |  1 +
 3 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index cf3a4fe06ff2..3b0173d25232 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3548,14 +3548,26 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 		static_call(x86_pmu_drain_pebs)(regs, &data);
 
 		/*
-		 * PMI throttle may be triggered, which stops the PEBS event.
-		 * Although cpuc->pebs_enabled is updated accordingly, the
-		 * MSR_IA32_PEBS_ENABLE is not updated. Because the
-		 * cpuc->enabled has been forced to 0 in PMI.
-		 * Update the MSR if pebs_enabled is changed.
+		 * PMI throttling may be triggered, which stops the PEBS
+		 * event.  Although cpuc->pebs_enabled was updated
+		 * accordingly, MSR_IA32_PEBS_ENABLE has not been updated.
+		 *
+		 * The MSR must not be updated in NMI context, because KVM
+		 * may be in a critical region on its way to VM-entry, with
+		 * the now-stale MSR value stored in the VM-exit MSR-load
+		 * list.  On VM-exit, the CPU will restore the stale value.
+		 *
+		 * Deferring the MSR update is harmless because the
+		 * throttled event's PerfEvtSel ENABLE bit has been
+		 * cleared, so the stale PEBS_ENABLE bit is irrelevant for
+		 * now.
+		 *
+		 * Track when MSR_IA32_PEBS_ENABLE is stale, so that
+		 * pebs_enable_all() will write cpuc->pebs_enabled to the
+		 * MSR, even when cpuc->pebs_enabled is 0.
 		 */
 		if (pebs_enabled != cpuc->pebs_enabled)
-			wrmsrq(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
+			cpuc->pebs_stale = true;
 
 		/*
 		 * Above PEBS handler (PEBS counters snapshotting) has updated fixed
@@ -4978,6 +4990,10 @@ static int intel_pmu_hw_config(struct perf_event *event)
  * These values have nothing to do with the emulated values the guest sees
  * when it uses {RD,WR}MSR, which should be handled by the KVM context,
  * specifically in the intel_pmu_{get,set}_msr().
+ *
+ * Note that MSRs returned from this function must not be modified in NMI
+ * context. Doing so could result in a stale host value being restored at
+ * the next VM-exit.
  */
 static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 {
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 5027afc97b65..852baf6a05e9 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1982,8 +1982,10 @@ void intel_pmu_pebs_enable_all(void)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
-	if (cpuc->pebs_enabled)
+	if (cpuc->pebs_enabled || cpuc->pebs_stale)
 		wrmsrq(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
+
+	cpuc->pebs_stale = false;
 }
 
 void intel_pmu_pebs_disable_all(void)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index fad87d3c8b2c..22102296e31b 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -298,6 +298,7 @@ struct cpu_hw_events {
 	/* DS based PEBS or arch-PEBS buffer address */
 	void			*pebs_vaddr;
 	u64			pebs_enabled;
+	bool			pebs_stale;
 	int			n_pebs;
 	int			n_large_pebs;
 	int			n_pebs_via_pt;
-- 
2.53.0.959.g497ff81fa9-goog
Re: [PATCH] perf/x86/intel: Don't update PEBS_ENABLE MSR in NMI context
Posted by Jim Mattson 2 weeks ago
On Fri, Mar 20, 2026 at 8:29 AM Jim Mattson <jmattson@google.com> wrote:
> [full patch quoted in original, trimmed]

Apologies. Resend to fix Peter Zijlstra's address.