[PATCH v8 0/8] AMD MCA interrupts rework

Yazen Ghannam posted 8 patches 1 month, 1 week ago
arch/x86/include/asm/mce.h          |  12 ++
arch/x86/kernel/acpi/apei.c         |   2 +
arch/x86/kernel/cpu/mce/amd.c       | 340 ++++++++++++++----------------------
arch/x86/kernel/cpu/mce/core.c      |  31 +++-
arch/x86/kernel/cpu/mce/internal.h  |   4 +
arch/x86/kernel/cpu/mce/threshold.c |  19 +-
6 files changed, 195 insertions(+), 213 deletions(-)
Hi all,

This set unifies the AMD MCA interrupt handlers with common MCA code.
The goal is to avoid duplicating functionality like reading and clearing
MCA banks.
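A minimal userspace sketch of the idea above: one bank-scanning routine that reads, logs and clears banks, shared by the polling timer and the AMD threshold (THR) and deferred (DFR) interrupt handlers. Every name here is hypothetical, including scan_banks(), inject_error(), the simulated bank array, and the STATUS_VAL bit modeled on the MCA status "valid" bit. None of it is the series' code. Per the shortlog, the series itself unifies the THR and DFR handlers with the common MCA polling path.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_BANKS  4
#define STATUS_VAL (1ULL << 63)	/* assumed "valid" bit, modeled on MCA status Val */

/* Hypothetical stand-ins for one CPU's MCA bank status registers. */
static uint64_t bank_status[NUM_BANKS];
static int errors_logged;

/* Hypothetical test hook: latch an error into a simulated bank. */
static void inject_error(int bank, uint64_t syndrome)
{
	assert(bank >= 0 && bank < NUM_BANKS);
	bank_status[bank] = STATUS_VAL | syndrome;
}

/* Hypothetical logger; a kernel would queue a record to its MCE log instead. */
static void log_error(const char *source, int bank, uint64_t status)
{
	printf("%s: bank %d status 0x%016llx\n", source, bank,
	       (unsigned long long)status);
	errors_logged++;
}

/* The one shared routine: read, log, and clear every valid bank. */
static void scan_banks(const char *source)
{
	for (int bank = 0; bank < NUM_BANKS; bank++) {
		uint64_t status = bank_status[bank];

		if (!(status & STATUS_VAL))
			continue;
		log_error(source, bank, status);
		bank_status[bank] = 0;	/* clear so each error is reported once */
	}
}

/* Every entry point funnels into scan_banks() rather than duplicating it. */
static void poll_timer_tick(void)     { scan_banks("poll"); }
static void threshold_interrupt(void) { scan_banks("THR"); }
static void deferred_interrupt(void)  { scan_banks("DFR"); }
```

In this model, reading and clearing happen in one place. Whichever path runs first reports a latched error, and a later poll sees an empty bank.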

Patches 1-2:
Unify AMD interrupt handlers with common MCE code.

Patches 3-4:
SMCA Corrected Error Interrupt support.

Patches 5-7:
Interrupt storm handling, rebased on top of this set.

Patch 8:
Add support to get threshold limit from APEI HEST.
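A sketch of how a threshold limit could drive the hardware counter, assuming a 12-bit per-bank error counter that raises the threshold interrupt when it reaches its maximum (0xFFF). Under that assumption, presetting the counter to (max - limit) makes the interrupt fire after `limit` corrected errors. The firmware value is taken from APEI HEST when present and valid, otherwise a default is used. threshold_preset(), count_error() and choose_limit() are hypothetical helpers, not the series' implementation.

```c
#include <assert.h>
#include <stdint.h>

#define THRESHOLD_MAX 0xFFFu	/* assumed 12-bit error counter maximum */

/* Hypothetical: preset so the counter reaches THRESHOLD_MAX after `limit` errors. */
static uint16_t threshold_preset(uint16_t limit)
{
	assert(limit >= 1 && limit <= THRESHOLD_MAX);
	return (uint16_t)(THRESHOLD_MAX - limit);
}

/* Hypothetical counter model: count one error; return 1 once the maximum is reached. */
static int count_error(uint16_t *counter)
{
	if (*counter < THRESHOLD_MAX)
		(*counter)++;
	return *counter == THRESHOLD_MAX;
}

/* Hypothetical: prefer a valid firmware (HEST) limit, else fall back to a default. */
static uint16_t choose_limit(int have_hest_value, uint32_t hest_value,
			     uint16_t dflt)
{
	if (have_hest_value && hest_value >= 1 && hest_value <= THRESHOLD_MAX)
		return (uint16_t)hest_value;
	return dflt;
}
```

With a limit of 10, the preset is 0xFF5. Nine errors leave the counter one short of the maximum, and the tenth triggers the interrupt.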

Thanks,
Yazen

---
Changes in v8:
- Apply "DFR unify" fixups. (Boris)
- Update "HEST threshold limit" string. (Boris)
- Link to v7: https://lore.kernel.org/r/20251016-wip-mca-updates-v7-0-5c139a4062cb@amd.com

Changes in v7:
- Rework DFR error handling to avoid reporting bogus errors.
- Don't modify polling banks for AMD systems after an interrupt storm.
- Link to v6: https://lore.kernel.org/r/20250908-wip-mca-updates-v6-0-eef5d6c74b9c@amd.com
- Link to "spurious errors" thread:
  https://lore.kernel.org/r/20250915010010.3547-1-spasswolf@web.de

Changes in v6:
- Rebase on tip/ras/core.
- Address comments from Boris for patches 1, 8, and 10.
- Link to v5: https://lore.kernel.org/r/20250825-wip-mca-updates-v5-0-865768a2eef8@amd.com

Changes in v5:
- Rebase on v6.17-rc1.
- Add tags and address comments from Nikolay.
- Add back a patch that was dropped in v4.
- Link to v4: https://lore.kernel.org/r/20250624-wip-mca-updates-v4-0-236dd74f645f@amd.com

Changes in v4:
- Rebase on v6.16-rc3.
- Address comments from Boris about function names.
- Redo DFR handler integration.
- Drop AMD APIC LVT rework.
- Include more AMD thresholding reworks and fixes.
- Add support to get threshold limit from APEI HEST.
- Reorder patches so most fixes and reworks are at the beginning.
- Link to v3: https://lore.kernel.org/r/20250415-wip-mca-updates-v3-0-8ffd9eb4aa56@amd.com

Changes in v3:
- Rebase on tip/x86/merge rather than tip/master.
- Update MSR access helpers (*msrl -> *msrq).
- Add patch to fix polling after a storm.
- Link to v2: https://lore.kernel.org/r/20250213-wip-mca-updates-v2-0-3636547fe05f@amd.com

Changes in v2:
- Add general cleanup pre-patches.
- Add changes for BSP-only init.
- Add interrupt storm handling for AMD.
- Link to v1: https://lore.kernel.org/r/20240523155641.2805411-1-yazen.ghannam@amd.com

---
Smita Koralahalli (1):
      x86/mce: Handle AMD threshold interrupt storms

Yazen Ghannam (7):
      x86/mce: Unify AMD THR handler with MCA Polling
      x86/mce: Unify AMD DFR handler with MCA Polling
      x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems
      x86/mce/amd: Support SMCA Corrected Error Interrupt
      x86/mce/amd: Remove redundant reset_block()
      x86/mce/amd: Define threshold restart function for banks
      x86/mce: Save and use APEI corrected threshold limit

---
base-commit: 5c6f123c419b6e20f84ac1683089a52f449273aa
change-id: 20250210-wip-mca-updates-bed2a67c9c57