Hi all,

This set adds a new module to manage error records on persistent
storage.

Patch 1 moves a function from AMD64 EDAC to the AMD Address Translation
Library. This is needed for patch 2.

Patch 2 adds the new module. This is a near total rewrite based on patch
2 from the following set:
https://lore.kernel.org/r/20231129075034.2159223-1-muralimk@amd.com

I included questions in code comments where I think more attention is
needed.

I'd like to add Murali and Naveen as Co-developers, since this is based
on their work.

Also, I kept Naveen as a maintainer in case he's still interested.

Regarding the old set:
* Patch 1 exports a new function from the ERST driver. This is not
  necessary.
* Patch 3 adds a new sysfs interface. This needs more work.
* Patch 4 adds documentation. This needs updating.

I did some basic testing on a 2P server system without ERST support.
Mostly I tried to check out the memory layout of the structures. And I
did some memory error injections to check out the record updating flow.
I did some fixups after testing, so I apologize if I missed anything.

Thanks,
Yazen

Yazen Ghannam (2):
  RAS/AMD/ATL, EDAC/amd64: Move MI300 Row Retirement to ATL
  RAS: Introduce the FRU Memory Poison Manager

 MAINTAINERS                 |   7 +
 drivers/edac/Kconfig        |   1 -
 drivers/edac/amd64_edac.c   |  48 ---
 drivers/ras/Kconfig         |  13 +
 drivers/ras/Makefile        |   1 +
 drivers/ras/amd/atl/Kconfig |   1 +
 drivers/ras/amd/atl/umc.c   |  51 +++
 drivers/ras/amd/fmpm.c      | 776 ++++++++++++++++++++++++++++++++++++
 include/linux/ras.h         |   2 +
 9 files changed, 851 insertions(+), 49 deletions(-)
 create mode 100644 drivers/ras/amd/fmpm.c

base-commit: c2064388aa8765abd7c2c5785e7bfe266a2f6cd3
--
2.34.1

On Tue, Feb 13, 2024 at 09:35:14PM -0600, Yazen Ghannam wrote:
> I included questions in code comments where I think more attention is
> needed.
Lemme look.
> Also, I kept Naveen as a maintainer in case he's still interested.
I don't mind that as long as he responds to bug reports from users and
addresses them in timely manner.
> I did some basic testing on a 2P server system without ERST support.
> Mostly I tried to check out the memory layout of the structures. And I
> did some memory error injections to check out the record updating flow.
> I did some fixups after testing, so I apologize if I missed anything.
Right, I'd like for Murali and/or Naveen to test the final version but
lemme go through them first.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette

Hi Boris,

On 2/14/2024 1:22 PM, Borislav Petkov wrote:
> On Tue, Feb 13, 2024 at 09:35:14PM -0600, Yazen Ghannam wrote:
>> I included questions in code comments where I think more attention is
>> needed.
>
> Lemme look.
>
>> Also, I kept Naveen as a maintainer in case he's still interested.
>
> I don't mind that as long as he responds to bug reports from users and
> addresses them in timely manner.
>
>> I did some basic testing on a 2P server system without ERST support.
>> Mostly I tried to check out the memory layout of the structures. And I
>> did some memory error injections to check out the record updating flow.
>> I did some fixups after testing, so I apologize if I missed anything.
>
> Right, I'd like for Murali and/or Naveen to test the final version but
> lemme go through them first.
>

Please include the tags below; we have worked on previous versions of
this patch set.

Co-developed-by: naveenkrishna.chatradhi@amd.com
Signed-off-by: naveenkrishna.chatradhi@amd.com
Co-developed-by: muralidhara.mk@amd.com
Signed-off-by: muralidhara.mk@amd.com
Co-developed-by: sathyapriya.k@amd.com
Signed-off-by: sathyapriya.k@amd.com

> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>

On 2/20/2024 5:59 PM, M K, Muralidhara wrote:
> Hi Boris,
>
> On 2/14/2024 1:22 PM, Borislav Petkov wrote:
>> On Tue, Feb 13, 2024 at 09:35:14PM -0600, Yazen Ghannam wrote:
>>> I included questions in code comments where I think more attention is
>>> needed.
>>
>> Lemme look.
>>
>>> Also, I kept Naveen as a maintainer in case he's still interested.
>>
>> I don't mind that as long as he responds to bug reports from users and
>> addresses them in timely manner.
>>
>>> I did some basic testing on a 2P server system without ERST support.
>>> Mostly I tried to check out the memory layout of the structures. And I
>>> did some memory error injections to check out the record updating flow.
>>> I did some fixups after testing, so I apologize if I missed anything.
>>
>> Right, I'd like for Murali and/or Naveen to test the final version but
>> lemme go through them first.
>>
> Please include, we have worked previous versions of this patch set.
> Co-developed-by: naveenkrishna.chatradhi@amd.com
> Signed-off-by: naveenkrishna.chatradhi@amd.com
> Co-developed-by: muralidhara.mk@amd.com
> Signed-off-by: muralidhara.mk@amd.com
> Co-developed-by: sathyapriya.k@amd.com
> Signed-off-by: sathyapriya.k@amd.com
>

Sorry, just re-arranging the tags. Please add the tags below:

Co-developed-by: naveenkrishna.chatradhi@amd.com
Signed-off-by: naveenkrishna.chatradhi@amd.com
Co-developed-by: muralidhara.mk@amd.com
Signed-off-by: muralidhara.mk@amd.com
Tested-by: sathyapriya.k@amd.com

>> Thx.
>>
>> --
>> Regards/Gruss,
>> Boris.
>>
>> https://people.kernel.org/tglx/notes-about-netiquette
>>