drivers/edac/amd64_edac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Starting Zen4, AMD SOCs have 12 Unified Memory Controllers (UMCs) per
socket.
When the amd64_edac module is being loaded, these UMCs are traversed to
determine if they have SdpInit (SdpCtrl[31]) and EccEnabled (UmcCapHi[30])
bits set and create masks in umc_en_mask and ecc_en_mask respectively.
However, the current data type of these variables is u8. As a result, if
only the last 4 UMCs (UMC8 - UMC11) of the system have been utilized,
umc_ecc_enabled() will return false. Consequently, the module may fail to
load on these systems.
Change the data type of these variables to u16.
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
drivers/edac/amd64_edac.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index ddfbdb66b794..583685e8e60f 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3362,7 +3362,7 @@ static bool dct_ecc_enabled(struct amd64_pvt *pvt)
static bool umc_ecc_enabled(struct amd64_pvt *pvt)
{
- u8 umc_en_mask = 0, ecc_en_mask = 0;
+ u16 umc_en_mask = 0, ecc_en_mask = 0;
u16 nid = pvt->mc_node_id;
struct amd64_umc *umc;
u8 ecc_en = 0, i;
--
2.43.0
On Mon, Dec 09, 2024 at 09:55:10PM +0000, Avadhut Naik wrote: > Starting Zen4, AMD SOCs have 12 Unified Memory Controllers (UMCs) per > socket. > > When the amd64_edac module is being loaded, these UMCs are traversed to > determine if they have SdpInit (SdpCtrl[31]) and EccEnabled (UmcCapHi[30]) > bits set and create masks in umc_en_mask and ecc_en_mask respectively. > > However, the current data type of these variables is u8. As a result, if > only the last 4 UMCs (UMC8 - UMC11) of the system have been utilized, > umc_ecc_enabled() will return false. Consequently, the module may fail to > load on these systems. > > Change the data type of these variables to u16. No need to explain what the patch does. The "why" is enough. > Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> > --- > drivers/edac/amd64_edac.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) This looks like it needs a CC:stable and a Fixes: tag, right? While at it, you can simply make those vars int and be done with it. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
On 12/10/2024 03:55, Borislav Petkov wrote: > On Mon, Dec 09, 2024 at 09:55:10PM +0000, Avadhut Naik wrote: >> Starting Zen4, AMD SOCs have 12 Unified Memory Controllers (UMCs) per >> socket. >> >> When the amd64_edac module is being loaded, these UMCs are traversed to >> determine if they have SdpInit (SdpCtrl[31]) and EccEnabled (UmcCapHi[30]) >> bits set and create masks in umc_en_mask and ecc_en_mask respectively. >> >> However, the current data type of these variables is u8. As a result, if >> only the last 4 UMCs (UMC8 - UMC11) of the system have been utilized, >> umc_ecc_enabled() will return false. Consequently, the module may fail to >> load on these systems. >> >> Change the data type of these variables to u16. > > No need to explain what the patch does. The "why" is enough. > Will remove this line. >> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> >> --- >> drivers/edac/amd64_edac.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) > > This looks like it needs a CC:stable and a Fixes: tag, right? > It does need a Fixes: tag. Will add that. But not so sure about the CC:stable. Quoting from stable-kernel-rules about the patches that are accepted into stable. It must fix a real bug that bothers people (not a, “This could be a problem...” type thing). This has not been reported by anyone yet. IMO, it classifies as "This could be a problem" type issue. Would you agree? > While at it, you can simply make those vars int and be done with it. > Okay, will change the data type of these 2 variables to int. > Thx. > -- Thanks, Avadhut Naik
On Tue, Dec 10, 2024 at 11:32:06AM -0600, Naik, Avadhut wrote: > But not so sure about the CC:stable. > Quoting from stable-kernel-rules about the patches that are accepted into stable. > > It must fix a real bug that bothers people (not a, “This could be a problem...” type thing). > > This has not been reported by anyone yet. IMO, it classifies as > "This could be a problem" type issue. > Would you agree? So if someone populates the last 4 UMCs, he won't have EDAC load on long-term kernels. So are you sure about "this could be a problem"? Or is it more like a "this is real problem for some configurations" issue? IOW, at least "oh, that's not good" issue. Or? -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
On 12/10/2024 12:04, Borislav Petkov wrote: > On Tue, Dec 10, 2024 at 11:32:06AM -0600, Naik, Avadhut wrote: >> But not so sure about the CC:stable. >> Quoting from stable-kernel-rules about the patches that are accepted into stable. >> >> It must fix a real bug that bothers people (not a, “This could be a problem...” type thing). >> >> This has not been reported by anyone yet. IMO, it classifies as >> "This could be a problem" type issue. >> Would you agree? > > So if someone populates the last 4 UMCs, he won't have EDAC load on long-term > kernels. So are you sure about "this could be a problem"? > > Or is it more like a "this is real problem for some configurations" issue? > > IOW, at least "oh, that's not good" issue. > > Or? Agreed that this is a real problem for some configurations, particularly since the module wont load, even on long term kernels. Will add stable to Cc and resend. > -- Thanks, Avadhut Naik
© 2016 - 2024 Red Hat, Inc.