Hi all,
This set unifies the AMD MCA interrupt handlers with common MCA code.
The goal is to avoid duplicating functionality like reading and clearing
MCA banks.
Patches 1-2:
Unify AMD interrupt handlers with common MCE code.
Patches 3-4:
SMCA Corrected Error Interrupt support.
Patches 5-7:
Interrupt storm handling rebased on current set.
Patch 8:
Add support to get threshold limit from APEI HEST.
Thanks,
Yazen
---
Changes in v8:
- Apply "DFR unify" fixups. (Boris)
- Update "HEST threshold limit" string. (Boris)
- Link to v7: https://lore.kernel.org/r/20251016-wip-mca-updates-v7-0-5c139a4062cb@amd.com
Changes in v7:
- Rework DFR error handling to avoid reporting bogus errors.
- Don't modify polling banks for AMD systems after an interrupt storm.
- Link to v6: https://lore.kernel.org/r/20250908-wip-mca-updates-v6-0-eef5d6c74b9c@amd.com
- Link to "spurious errors" thread: https://lore.kernel.org/r/20250915010010.3547-1-spasswolf@web.de
Changes in v6:
- Rebase on tip/ras/core.
- Address comments from Boris for patches 1, 8, and 10.
- Link to v5: https://lore.kernel.org/r/20250825-wip-mca-updates-v5-0-865768a2eef8@amd.com
Changes in v5:
- Rebase on v6.17-rc1.
- Add tags and address comments from Nikolay.
- Add back the patch that was dropped from v4.
- Link to v4: https://lore.kernel.org/r/20250624-wip-mca-updates-v4-0-236dd74f645f@amd.com
Changes in v4:
- Rebase on v6.16-rc3.
- Address comments from Boris about function names.
- Redo DFR handler integration.
- Drop AMD APIC LVT rework.
- Include more AMD thresholding reworks and fixes.
- Add support to get threshold limit from APEI HEST.
- Reorder patches so most fixes and reworks are at the beginning.
- Link to v3: https://lore.kernel.org/r/20250415-wip-mca-updates-v3-0-8ffd9eb4aa56@amd.com
Changes in v3:
- Rebase on tip/x86/merge rather than tip/master.
- Update MSR access helpers (*msrl -> *msrq).
- Add patch to fix polling after a storm.
- Link to v2: https://lore.kernel.org/r/20250213-wip-mca-updates-v2-0-3636547fe05f@amd.com
Changes in v2:
- Add general cleanup pre-patches.
- Add changes for BSP-only init.
- Add interrupt storm handling for AMD.
- Link to v1: https://lore.kernel.org/r/20240523155641.2805411-1-yazen.ghannam@amd.com
---
Smita Koralahalli (1):
      x86/mce: Handle AMD threshold interrupt storms

Yazen Ghannam (7):
      x86/mce: Unify AMD THR handler with MCA Polling
      x86/mce: Unify AMD DFR handler with MCA Polling
      x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems
      x86/mce/amd: Support SMCA Corrected Error Interrupt
      x86/mce/amd: Remove redundant reset_block()
      x86/mce/amd: Define threshold restart function for banks
      x86/mce: Save and use APEI corrected threshold limit
 arch/x86/include/asm/mce.h          |  12 ++
 arch/x86/kernel/acpi/apei.c         |   2 +
 arch/x86/kernel/cpu/mce/amd.c       | 340 ++++++++++++++----------------------
 arch/x86/kernel/cpu/mce/core.c      |  31 +++-
 arch/x86/kernel/cpu/mce/internal.h  |   4 +
 arch/x86/kernel/cpu/mce/threshold.c |  19 +-
 6 files changed, 195 insertions(+), 213 deletions(-)
---
base-commit: 5c6f123c419b6e20f84ac1683089a52f449273aa
change-id: 20250210-wip-mca-updates-bed2a67c9c57