[PATCH v6 00/15] AMD MCA interrupts rework

Yazen Ghannam posted 15 patches 1 day, 5 hours ago
Posted by Yazen Ghannam 1 day, 5 hours ago
Hi all,

This set unifies the AMD MCA interrupt handlers with common MCA code.
The goal is to avoid duplicating functionality like reading and clearing
MCA banks.
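
As a rough illustration of the idea (a user-space sketch, not code from
this series): the bank array, NR_BANKS, and every function name below are
hypothetical stand-ins, and the simulated array takes the place of real MSR
accesses. The only hardware detail assumed is that bit 63 of MCi_STATUS is
the VAL bit. The point is that one helper reads and clears a bank, and each
interrupt handler is a thin wrapper around the shared poll loop.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_BANKS   4               /* hypothetical bank count */
#define STATUS_VAL (1ULL << 63)    /* MCi_STATUS.VAL: bank holds a valid error */

/* Simulated MCA_STATUS registers; a real kernel would read and write MSRs. */
static uint64_t bank_status[NR_BANKS];

/* The single place that reads, logs, and clears a bank (hypothetical helper). */
static bool poll_bank(unsigned int bank)
{
	uint64_t status = bank_status[bank];

	if (!(status & STATUS_VAL))
		return false;

	/* Error logging would happen here. */
	bank_status[bank] = 0;	/* clear so the bank can record a new error */
	return true;
}

/* Shared poll loop: returns the number of valid errors found and cleared. */
static unsigned int poll_banks(void)
{
	unsigned int bank, found = 0;

	for (bank = 0; bank < NR_BANKS; bank++)
		found += poll_bank(bank);

	return found;
}

/* Both interrupt handlers reuse the shared loop instead of open-coding it. */
static unsigned int threshold_interrupt(void)
{
	return poll_banks();
}

static unsigned int deferred_error_interrupt(void)
{
	return poll_banks();
}
```

With this shape, a fix to the read/clear logic lands in poll_bank() once
rather than in each handler.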

Based on feedback, this revision also includes changes to the MCA init
flow.

Patch 1:
Update MCA init ordering.

Patches 2-5:
Add BSP-only init flow and related changes.

Patches 6-9:
Unify AMD interrupt handlers with common MCE code.

Patches 10-11:
SMCA Corrected Error Interrupt support.

Patches 12-14:
Interrupt storm handling rebased on current set.

Patch 15:
Add support to get threshold limit from APEI HEST.
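
The BSP-only init split in patches 2-5 can be pictured with the following
user-space sketch. It is not code from the series: mca_init_bsp(),
mca_init_cpu(), bring_up_all_cpus(), and NR_CPUS are hypothetical names,
and the function bodies only record that they ran. The assumption shown is
simply that system-wide setup runs once on the boot CPU, while per-CPU
setup runs on every CPU.

```c
#include <stdbool.h>

#define NR_CPUS 4                   /* hypothetical CPU count */

static int global_init_count;       /* how many times system-wide setup ran */
static bool cpu_ready[NR_CPUS];     /* whether per-CPU setup ran on each CPU */

/* Hypothetical one-time setup, e.g. vendor checks and global quirks. */
static void mca_init_bsp(void)
{
	global_init_count++;
}

/* Hypothetical per-CPU setup, e.g. per-CPU quirks and bank enablement. */
static void mca_init_cpu(unsigned int cpu)
{
	cpu_ready[cpu] = true;
}

/* Boot flow: global init runs once, then every CPU runs its own init. */
static void bring_up_all_cpus(void)
{
	unsigned int cpu;

	mca_init_bsp();

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		mca_init_cpu(cpu);
}
```

Keeping the one-time work out of the per-CPU path means secondary CPUs do
not repeat checks whose result cannot differ between CPUs.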

Thanks,
Yazen

---
Changes in v6:
- Rebase on tip/ras/core.
- Address comments from Boris for patches 1, 8, and 10.
- Link to v5: https://lore.kernel.org/r/20250825-wip-mca-updates-v5-0-865768a2eef8@amd.com

Changes in v5:
- Rebase on v6.17-rc1.
- Add tags and address comments from Nikolay.
- Added back patch that was dropped from v4.
- Link to v4: https://lore.kernel.org/r/20250624-wip-mca-updates-v4-0-236dd74f645f@amd.com

Changes in v4:
- Rebase on v6.16-rc3.
- Address comments from Boris about function names.
- Redo DFR handler integration.
- Drop AMD APIC LVT rework.
- Include more AMD thresholding reworks and fixes.
- Add support to get threshold limit from APEI HEST.
- Reorder patches so most fixes and reworks are at the beginning.
- Link to v3: https://lore.kernel.org/r/20250415-wip-mca-updates-v3-0-8ffd9eb4aa56@amd.com

Changes in v3:
- Rebased on tip/x86/merge rather than tip/master.
- Updated MSR access helpers (*msrl -> *msrq).
- Add patch to fix polling after a storm.
- Link to v2: https://lore.kernel.org/r/20250213-wip-mca-updates-v2-0-3636547fe05f@amd.com

Changes in v2:
- Add general cleanup pre-patches.
- Add changes for BSP-only init.
- Add interrupt storm handling for AMD.
- Link to v1: https://lore.kernel.org/r/20240523155641.2805411-1-yazen.ghannam@amd.com

---
Smita Koralahalli (1):
      x86/mce: Handle AMD threshold interrupt storms

Yazen Ghannam (14):
      x86/mce: Set CR4.MCE last during init
      x86/mce: Define BSP-only init
      x86/mce: Define BSP-only SMCA init
      x86/mce: Do 'UNKNOWN' vendor check early
      x86/mce: Separate global and per-CPU quirks
      x86/mce: Move machine_check_poll() status checks to helper functions
      x86/mce: Add clear_bank() helper
      x86/mce: Unify AMD THR handler with MCA Polling
      x86/mce: Unify AMD DFR handler with MCA Polling
      x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems
      x86/mce/amd: Support SMCA Corrected Error Interrupt
      x86/mce/amd: Remove redundant reset_block()
      x86/mce/amd: Define threshold restart function for banks
      x86/mce: Save and use APEI corrected threshold limit

 arch/x86/include/asm/mce.h          |  14 ++
 arch/x86/kernel/acpi/apei.c         |   2 +
 arch/x86/kernel/cpu/common.c        |   1 +
 arch/x86/kernel/cpu/mce/amd.c       | 367 +++++++++++++++---------------------
 arch/x86/kernel/cpu/mce/core.c      | 280 +++++++++++++++------------
 arch/x86/kernel/cpu/mce/intel.c     |  18 ++
 arch/x86/kernel/cpu/mce/internal.h  |   9 +
 arch/x86/kernel/cpu/mce/threshold.c |  16 ++
 8 files changed, 374 insertions(+), 333 deletions(-)
---
base-commit: 9f34032ec0deef58bd0eb7475f1981adfa998648
change-id: 20250210-wip-mca-updates-bed2a67c9c57
RE: [PATCH v6 00/15] AMD MCA interrupts rework
Posted by Luck, Tony 1 day, 5 hours ago
> This set unifies the AMD MCA interrupt handlers with common MCA code.
> The goal is to avoid duplicating functionality like reading and clearing
> MCA banks.

Still works fine on Intel Icelake system. Tested poison recovery, and CMCI
storm handling.

-Tony
Re: [PATCH v6 00/15] AMD MCA interrupts rework
Posted by Yazen Ghannam 7 hours ago
On Mon, Sep 08, 2025 at 04:10:30PM +0000, Luck, Tony wrote:
> > This set unifies the AMD MCA interrupt handlers with common MCA code.
> > The goal is to avoid duplicating functionality like reading and clearing
> > MCA banks.
> 
> Still works fine on Intel Icelake system. Tested poison recovery, and CMCI
> storm handling.
> 

Thanks for testing!

-Yazen