[PATCH v2 0/8] perf/x86/intel/uncore: PMU setup robustness fixes

Zide Chen posted 8 patches 6 days, 11 hours ago
This series fixes correctness issues in Intel uncore PMU setup:

- If every init_box() call on a PMU fails, the PMU sysfs node may still
  exist, and perf events on it read zero, silently reporting wrong data.
- If init_box() fails on only some dies, perf may return partial
  non-zero counts, which is harder to diagnose.
- CPU hotplug ref/unref ordering bugs can skip init_box() when the first
  CPU in a die comes online, and can call box_exit() prematurely when
  the second-to-last CPU goes offline.
- PCI PMU cleanup on setup failure leaks pmu->activeboxes references
  and can dereference a NULL pointer in error paths.
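
As a rough illustration of the hotplug ordering the third item is
concerned with, here is a minimal user-space sketch. It is not the
kernel code: struct toy_box, toy_box_ref() and toy_box_unref() are
hypothetical names, and the real code deals with per-die boxes,
locking and hardware access that are left out here. The only point
is that init must happen on the 0 -> 1 reference transition and exit
on the 1 -> 0 transition, never one step early or late.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-die uncore box; not the kernel struct. */
struct toy_box {
	int refcnt;        /* online CPUs in the die that reference the box */
	int init_calls;    /* how often the init step ran */
	int exit_calls;    /* how often the exit step ran */
	bool initialized;
};

/* A CPU in the die comes online. */
static void toy_box_ref(struct toy_box *box)
{
	/* Initialize only on the 0 -> 1 transition (first CPU in the die). */
	if (box->refcnt++ == 0) {
		box->initialized = true;
		box->init_calls++;
	}
}

/* A CPU in the die goes offline. */
static void toy_box_unref(struct toy_box *box)
{
	/* Tear down only on the 1 -> 0 transition (last CPU in the die). */
	if (--box->refcnt == 0) {
		box->initialized = false;
		box->exit_calls++;
	}
}
```

With two CPUs in a die, the box in this model is initialized once
when the first CPU comes online and stays initialized until the last
CPU goes offline.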

To address this, the series introduces a PMU broken state to track setup
failures and switches MSR/MMIO PMUs to lazy registration, matching
existing PCI behavior.
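
The intended state handling can be sketched roughly as below. This is
a simplified user-space model under stated assumptions, not the
patches themselves: the toy_* names and flag values are hypothetical,
"registered" is reduced to a flag bit, and how the real code reacts
to the broken state (for example whether it unregisters the PMU) is
not modeled. It only shows a sticky broken flag, registration on the
first active box, and decrementing activeboxes for active boxes only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real names and values may differ. */
#define TOY_PMU_REGISTERED	(1u << 0)
#define TOY_PMU_BROKEN		(1u << 1)

struct toy_pmu {
	unsigned int flags;
	int activeboxes;	/* number of boxes that were set up successfully */
};

struct toy_box {
	bool active;
};

/* init_ok stands in for the result of the box's init callback. */
static int toy_box_setup(struct toy_pmu *pmu, struct toy_box *box, bool init_ok)
{
	if (!init_ok) {
		/* Sticky: nothing in this model ever clears the flag. */
		pmu->flags |= TOY_PMU_BROKEN;
		return -1;
	}

	box->active = true;
	/* Lazy registration: register when the first box becomes active. */
	if (pmu->activeboxes++ == 0)
		pmu->flags |= TOY_PMU_REGISTERED;
	return 0;
}

static void toy_box_teardown(struct toy_pmu *pmu, struct toy_box *box)
{
	/* A box that never became active must not drop a reference. */
	if (!box->active)
		return;

	box->active = false;
	if (--pmu->activeboxes == 0)
		pmu->flags &= ~TOY_PMU_REGISTERED;
}

/* Assumption: a broken PMU is treated as unusable even if registered. */
static bool toy_pmu_usable(const struct toy_pmu *pmu)
{
	return (pmu->flags & TOY_PMU_REGISTERED) &&
	       !(pmu->flags & TOY_PMU_BROKEN);
}
```

In this model a single failed box marks the whole PMU unusable, and
tearing down the failed box leaves activeboxes unchanged, so the
count stays balanced on the removal path.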

To avoid merge conflicts, this series should be applied after:
https://lore.kernel.org/lkml/20260527151154.130505-1-zide.chen@intel.com/
(textual conflict, no logical dependency)

V2 changes:
- Add new patch 1 to fix PCI PMU cleanup issues (Sashiko)
- Keep pmu->activeboxes naming and semantics to avoid potential refcnt
  leaks in the uncore_pci_remove() path. To accomplish this, make the
  PMU broken flag sticky and decrement pmu->activeboxes only for
  active boxes.
- Update commit messages and changelogs accordingly.

V1: https://lore.kernel.org/lkml/20260512233048.9577-1-zide.chen@intel.com/
Sashiko's review: https://sashiko.dev/#/patchset/20260512233048.9577-1-zide.chen@intel.com

Zide Chen (8):
  perf/x86/intel/uncore: Fix PCI PMU cleanup on setup failure
  perf/x86/intel/uncore: Fix refcnt and other cleanups
  perf/x86/intel/uncore: Let init_box() callback report failures
  perf/x86/intel/uncore: Keep PCI PMUs working when MMIO/MSR setup fails
  perf/x86/intel/uncore: Factor out box setup code
  perf/x86/intel/uncore: Introduce PMU flags and broken state
  perf/x86/intel/uncore: Fix uncore_box ref/unref ordering on CPU
    hotplug
  perf/x86/intel/uncore: Implement lazy setup for MSR/MMIO PMU

 arch/x86/events/intel/uncore.c           | 222 +++++++++++------------
 arch/x86/events/intel/uncore.h           |  39 +++-
 arch/x86/events/intel/uncore_discovery.c |  21 ++-
 arch/x86/events/intel/uncore_discovery.h |   6 +-
 arch/x86/events/intel/uncore_nhmex.c     |   3 +-
 arch/x86/events/intel/uncore_snb.c       |  82 ++++++---
 arch/x86/events/intel/uncore_snbep.c     |  77 +++++---
 7 files changed, 254 insertions(+), 196 deletions(-)

-- 
2.54.0