Wrap allocation size calculation in size_add() and size_mul() to avoid
any potential overflow.
Signed-off-by: Mehdi Ben Hadj Khelifa <mehdi.benhadjkhelifa@gmail.com>
---
lib/cpu_rmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/cpu_rmap.c b/lib/cpu_rmap.c
index f03d9be3f06b..18b2146a73d2 100644
--- a/lib/cpu_rmap.c
+++ b/lib/cpu_rmap.c
@@ -36,7 +36,7 @@ struct cpu_rmap *alloc_cpu_rmap(unsigned int size, gfp_t flags)
 	obj_offset = ALIGN(offsetof(struct cpu_rmap, near[nr_cpu_ids]),
 			   sizeof(void *));
 
-	rmap = kzalloc(obj_offset + size * sizeof(rmap->obj[0]), flags);
+	rmap = kzalloc(size_add(obj_offset, size_mul(size, sizeof(rmap->obj[0]))), flags);
 	if (!rmap)
 		return NULL;
--
2.51.0
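
For readers following the thread, here is a minimal user-space sketch of the difference between the open-coded size calculation and the saturating form the patch proposes. The names sat_size_add(), sat_size_mul(), rmap_bytes_open_coded(), rmap_bytes_saturating() and rmap_size_selfcheck() are hypothetical stand-ins, not kernel API. The sketch assumes GCC or Clang (for the __builtin_add_overflow() and __builtin_mul_overflow() builtins) and models the kernel helpers as saturating at SIZE_MAX on overflow, so that the allocator is handed an impossibly large request and fails instead of returning a short buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for size_add(): saturate at SIZE_MAX. */
static inline size_t sat_size_add(size_t a, size_t b)
{
	size_t sum;

	/* GCC/Clang builtin: returns true if a + b does not fit in sum. */
	if (__builtin_add_overflow(a, b, &sum))
		return SIZE_MAX;
	return sum;
}

/* Hypothetical user-space stand-in for size_mul(): saturate at SIZE_MAX. */
static inline size_t sat_size_mul(size_t a, size_t b)
{
	size_t prod;

	if (__builtin_mul_overflow(a, b, &prod))
		return SIZE_MAX;
	return prod;
}

/* Open-coded form, shaped like the removed line: wraps silently. */
static inline size_t rmap_bytes_open_coded(size_t obj_offset, size_t count,
					   size_t elem)
{
	return obj_offset + count * elem;
}

/* Saturating form, shaped like the added line. */
static inline size_t rmap_bytes_saturating(size_t obj_offset, size_t count,
					   size_t elem)
{
	return sat_size_add(obj_offset, sat_size_mul(count, elem));
}

/* Illustrative values only; they are not taken from lib/cpu_rmap.c. */
static inline void rmap_size_selfcheck(void)
{
	size_t huge = SIZE_MAX / 8 + 1;	/* huge * 8 wraps to exactly 0 */

	/* Normal case: both forms agree. */
	assert(rmap_bytes_open_coded(64, 4, 8) == 96);
	assert(rmap_bytes_saturating(64, 4, 8) == 96);

	/* Overflow case: the open-coded form shrinks to obj_offset alone. */
	assert(rmap_bytes_open_coded(64, huge, 8) == 64);

	/* The saturating form reports SIZE_MAX instead. */
	assert(rmap_bytes_saturating(64, huge, 8) == SIZE_MAX);
}
```

Calling rmap_size_selfcheck() from a main() exercises both paths. Whether `size` can actually reach such values in alloc_cpu_rmap() is a separate question that this sketch does not answer.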
On 9/30/25 03:23, Mehdi Ben Hadj Khelifa wrote:
> Wrap allocation size calculation in size_add() and size_mul() to avoid
> any potential overflow.

How did you find this problem and how did you test this change?

>
> Signed-off-by: Mehdi Ben Hadj Khelifa <mehdi.benhadjkhelifa@gmail.com>
> ---
> lib/cpu_rmap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/cpu_rmap.c b/lib/cpu_rmap.c
> index f03d9be3f06b..18b2146a73d2 100644
> --- a/lib/cpu_rmap.c
> +++ b/lib/cpu_rmap.c
> @@ -36,7 +36,7 @@ struct cpu_rmap *alloc_cpu_rmap(unsigned int size, gfp_t flags)
>  	obj_offset = ALIGN(offsetof(struct cpu_rmap, near[nr_cpu_ids]),
>  			   sizeof(void *));
>
> -	rmap = kzalloc(obj_offset + size * sizeof(rmap->obj[0]), flags);
> +	rmap = kzalloc(size_add(obj_offset, size_mul(size, sizeof(rmap->obj[0]))), flags);
>  	if (!rmap)
>  		return NULL;
>

thanks,
-- Shuah
On 10/7/25 11:23 PM, Shuah Khan wrote:
>
> How did you find this problem and how did you test this change?

For the first part of your question: I simply referred to the
deprecated documentation[1], which states the following:

'For other calculations, please compose the use of the size_mul(),
size_add(), and size_sub() helpers'

This is about dynamic calculations made inside kzalloc() and
kmalloc(). Specifically, the quoted part is talking about calculations
which can't simply be divided into two parameters (the number of
elements and the size per element), and about cases where we can't
use struct_size() either. After that, it was a matter of finding code
where that could be a problem, which is the case for the changed code.

For the second part: as with any patch, I make a copy of all dmesg
warning, error, and critical messages, then I compile, install, and
boot the new kernel, and then check whether there is any change or
regression in dmesg.

For this particular change, there are no selftests because it is in a
utility library (in my case, cpu_rmap is used in the networking
subsystem), so I did some fault injection with a custom module to test
whether, in case of overflow, it fails safely: the issue is reported
in dmesg, where it is caught by the __alloc_frozen_pages_noprof()
function in mm/page_alloc.c, and NULL is returned for rmap instead of
wrapping to a smaller size.

If there is any further testing or work to be done, or if you have
suggestions for improving my testing methodology, I would gladly hear
any advice.

Thank you for your time.

>
> thanks,
> -- Shuah

Best Regards,
Mehdi Ben Hadj Khelifa
On 10/9/25 09:16, Mehdi Ben Hadj Khelifa wrote:
> On 10/7/25 11:23 PM, Shuah Khan wrote:
>
>>
>> How did you find this problem and how did you test this change?

Bummer - you trimmed the code entirely from the thread. Next time
leave it in for context for the discussion.

> For the first part of your question: I simply referred to the
> deprecated documentation[1], which states the following:

Looks like you forgot to add the link to the deprecated documentation[1].
It sounds like this is a potential problem without a reproducer.
Fixes for these types of problems, made to a critical piece of code,
require substantial testing.

> 'For other calculations, please compose the use of the size_mul(),
> size_add(), and size_sub() helpers'
> This is about dynamic calculations made inside kzalloc() and
> kmalloc(). Specifically, the quoted part is talking about calculations
> which can't simply be divided into two parameters (the number of
> elements and the size per element), and about cases where we can't
> use struct_size() either. After that, it was a matter of finding code
> where that could be a problem, which is the case for the changed code.
>
> For the second part: as with any patch, I make a copy of all dmesg
> warning, error, and critical messages, then I compile, install, and
> boot the new kernel, and then check whether there is any change or
> regression in dmesg.

This is a basic boot test which isn't sufficient in this case.

> For this particular change, there are no selftests because it is in a
> utility library (in my case, cpu_rmap is used in the networking
> subsystem), so I did some fault injection with a custom module to test
> whether, in case of overflow, it fails safely: the issue is reported
> in dmesg, where it is caught by the __alloc_frozen_pages_noprof()
> function in mm/page_alloc.c, and NULL is returned for rmap instead of
> wrapping to a smaller size.

Custom module testing doesn't test this change in a wider scope,
which is necessary when you are making changes such as these
without a reproducer or a way to reproduce the problem. How do you know
this change doesn't introduce regressions?

thanks,
-- Shuah
On 10/10/25 6:00 PM, Shuah Khan wrote:
> On 10/9/25 09:16, Mehdi Ben Hadj Khelifa wrote:
>> On 10/7/25 11:23 PM, Shuah Khan wrote:
>>
>>>
>>> How did you find this problem and how did you test this change?
>
> Bummer - you trimmed the code entirely from the thread. Next time
> leave it in for context for the discussion.
>

Ah, I saw in other LKML threads that some people do delete the code,
so I thought it was okay. Will do next time.

>> For the first part of your question: I simply referred to the
>> deprecated documentation[1], which states the following:
>
> Looks like you forgot to add the link to the deprecated documentation[1].
> It sounds like this is a potential problem without a reproducer.
> Fixes for these types of problems, made to a critical piece of code,
> require substantial testing.
>

Ack. This is the doc that I was referencing:
https://docs.kernel.org/process/deprecated.html

I'm not sure what exactly is required for substantial testing. My
guess was to do normal testing as I mentioned, add some fault
injection to test the change in case of failure, and compare dmesg
outputs. I have also run the selftests for the net subsystem since my
last mail, with no sign of regression. Any suggestions on what testing
for this case should look like, instead of or on top of what I did?

>> 'For other calculations, please compose the use of the size_mul(),
>> size_add(), and size_sub() helpers'
>> This is about dynamic calculations made inside kzalloc() and
>> kmalloc(). Specifically, the quoted part is talking about calculations
>> which can't simply be divided into two parameters (the number of
>> elements and the size per element), and about cases where we can't
>> use struct_size() either. After that, it was a matter of finding code
>> where that could be a problem, which is the case for the changed code.
>>
>> For the second part: as with any patch, I make a copy of all dmesg
>> warning, error, and critical messages, then I compile, install, and
>> boot the new kernel, and then check whether there is any change or
>> regression in dmesg.
>
> This is a basic boot test which isn't sufficient in this case.
>
>> For this particular change, there are no selftests because it is in a
>> utility library (in my case, cpu_rmap is used in the networking
>> subsystem), so I did some fault injection with a custom module to test
>> whether, in case of overflow, it fails safely: the issue is reported
>> in dmesg, where it is caught by the __alloc_frozen_pages_noprof()
>> function in mm/page_alloc.c, and NULL is returned for rmap instead of
>> wrapping to a smaller size.
>
> Custom module testing doesn't test this change in a wider scope,
> which is necessary when you are making changes such as these
> without a reproducer or a way to reproduce the problem. How do you know
> this change doesn't introduce regressions?
>

My custom module testing specifically tested the change in the failure
case, which is what the change is for in the first place. The change,
which the documentation presents as simple since we are just wrapping
calculations in helpers instead of using operators, only safeguards
calculations made inside kzalloc() so that no unwanted behavior is
produced in case of overflow. As I mentioned above, I tested for
regressions by running the selftests for the net subsystem, which
showed no regressions, on top of the fault injection mentioned.

I would like more guidance on what I could do to make the testing more
robust in this case.

> thanks,
> -- Shuah

Regards,
Mehdi Ben Hadj Khelifa
On 10/18/25 10:52, Mehdi Ben Hadj Khelifa wrote:
> On 10/10/25 6:00 PM, Shuah Khan wrote:
>> On 10/9/25 09:16, Mehdi Ben Hadj Khelifa wrote:
>>> On 10/7/25 11:23 PM, Shuah Khan wrote:
>>>
>>>>
>>>> How did you find this problem and how did you test this change?
>>
>> Bummer - you trimmed the code entirely from the thread. Next time
>> leave it in for context for the discussion.
>>
> Ah, I saw in other LKML threads that some people do delete the code,
> so I thought it was okay. Will do next time.
>
>>> For the first part of your question: I simply referred to the
>>> deprecated documentation[1], which states the following:
>>
>> Looks like you forgot to add the link to the deprecated documentation[1].
>> It sounds like this is a potential problem without a reproducer.
>> Fixes for these types of problems, made to a critical piece of code,
>> require substantial testing.
>>
>
> Ack. This is the doc that I was referencing:
> https://docs.kernel.org/process/deprecated.html
>
> I'm not sure what exactly is required for substantial testing. My
> guess was to do normal testing as I mentioned, add some fault
> injection to test the change in case of failure, and compare dmesg
> outputs. I have also run the selftests for the net subsystem since my
> last mail, with no sign of regression. Any suggestions on what testing
> for this case should look like, instead of or on top of what I did?
>
>>> 'For other calculations, please compose the use of the size_mul(),
>>> size_add(), and size_sub() helpers'
>>> This is about dynamic calculations made inside kzalloc() and
>>> kmalloc(). Specifically, the quoted part is talking about calculations
>>> which can't simply be divided into two parameters (the number of
>>> elements and the size per element), and about cases where we can't
>>> use struct_size() either. After that, it was a matter of finding code
>>> where that could be a problem, which is the case for the changed code.
>>>
>>> For the second part: as with any patch, I make a copy of all dmesg
>>> warning, error, and critical messages, then I compile, install, and
>>> boot the new kernel, and then check whether there is any change or
>>> regression in dmesg.
>>
>> This is a basic boot test which isn't sufficient in this case.
>>
>>> For this particular change, there are no selftests because it is in a
>>> utility library (in my case, cpu_rmap is used in the networking
>>> subsystem), so I did some fault injection with a custom module to test
>>> whether, in case of overflow, it fails safely: the issue is reported
>>> in dmesg, where it is caught by the __alloc_frozen_pages_noprof()
>>> function in mm/page_alloc.c, and NULL is returned for rmap instead of
>>> wrapping to a smaller size.

Why not write a test for this then?

>>
>> Custom module testing doesn't test this change in a wider scope,
>> which is necessary when you are making changes such as these
>> without a reproducer or a way to reproduce the problem. How do you know
>> this change doesn't introduce regressions?
>>
> My custom module testing specifically tested the change in the failure
> case, which is what the change is for in the first place. The change,
> which the documentation presents as simple since we are just wrapping
> calculations in helpers instead of using operators, only safeguards
> calculations made inside kzalloc() so that no unwanted behavior is
> produced in case of overflow. As I mentioned above, I tested for
> regressions by running the selftests for the net subsystem, which
> showed no regressions, on top of the fault injection mentioned.
>
> I would like more guidance on what I could do to make the testing more
> robust in this case.
>
>> thanks,

So, as you say this is a potential overflow, can you explain in which
cases you would run into it?

thanks,
-- Shuah