Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which
can require large contiguous memory (up to order=9) depending on the
implementation. This change prevents allocation failures by allowing the
system to fall back to vmalloc when contiguous memory allocation fails.
Since this buffer is only used for debugging purposes, physical memory
contiguity is not required, making vmalloc a suitable alternative.
Cc: stable@vger.kernel.org
Fixes: 07814a9439a3b0 ("sched_ext: Print debug dump after an error exit")
Suggested-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Andrea Righi <arighi@nvidia.com>
---
Changes in v2:
- Use kvfree() on the free path as well.
- Link to v1: https://lore.kernel.org/r/20250407-scx-v1-1-774ba74a2c17@debian.org
---
kernel/sched/ext.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 66bcd40a28ca1..db9af6a3c04fd 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4623,7 +4623,7 @@ static void scx_ops_bypass(bool bypass)
 static void free_exit_info(struct scx_exit_info *ei)
 {
-	kfree(ei->dump);
+	kvfree(ei->dump);
 	kfree(ei->msg);
 	kfree(ei->bt);
 	kfree(ei);
@@ -4639,7 +4639,7 @@ static struct scx_exit_info *alloc_exit_info(size_t exit_dump_len)

 	ei->bt = kcalloc(SCX_EXIT_BT_LEN, sizeof(ei->bt[0]), GFP_KERNEL);
 	ei->msg = kzalloc(SCX_EXIT_MSG_LEN, GFP_KERNEL);
-	ei->dump = kzalloc(exit_dump_len, GFP_KERNEL);
+	ei->dump = kvzalloc(exit_dump_len, GFP_KERNEL);

 	if (!ei->bt || !ei->msg || !ei->dump) {
 		free_exit_info(ei);
---
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
change-id: 20250407-scx-11dbf94803c3
Best regards,
--
Breno Leitao <leitao@debian.org>
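
The try-contiguous-then-fall-back idea described in the commit message can be modelled in userspace. The sketch below is only an illustration under stated assumptions: every toy_* name and TOY_CONTIG_LIMIT are hypothetical, calloc() stands in for both kernel allocators, and nothing here is the kernel's actual kvzalloc() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical size above which the toy "contiguous" allocator fails. */
#define TOY_CONTIG_LIMIT ((size_t)64 * 1024)

struct toy_buf {
	void *ptr;
	bool contiguous;	/* true if the first (contiguous) attempt succeeded */
};

/* Stand-in for kzalloc(): pretend large contiguous requests always fail. */
void *toy_contig_zalloc(size_t len)
{
	return len > TOY_CONTIG_LIMIT ? NULL : calloc(1, len);
}

/* Stand-in for kvzalloc(): try the contiguous path first, then fall back. */
struct toy_buf toy_kvzalloc(size_t len)
{
	struct toy_buf b = { .ptr = toy_contig_zalloc(len), .contiguous = true };

	if (!b.ptr) {
		b.ptr = calloc(1, len);	/* models the vmalloc fallback */
		b.contiguous = false;
	}
	return b;
}

/* Stand-in for kvfree(): one free path for both kinds of buffer. */
void toy_kvfree(struct toy_buf *b)
{
	assert(b != NULL);
	free(b->ptr);
	b->ptr = NULL;
}
```

In this model a 32K request is served by the contiguous path and a 2M request by the fallback. In the kernel the two kinds of memory are released differently, so a buffer obtained from kvzalloc() has to be released with kvfree(), which is what the v2 change to free_exit_info() does.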
Hi Breno,
I already acked even the buggy version, so this one looks good. :)
On Tue, Apr 08, 2025 at 04:09:02AM -0700, Breno Leitao wrote:
> Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which
> can require large contiguous memory (up to order=9) depending on the
BTW, from where this order=9 is coming from? exit_dump_len is 32K by
default, but a BPF scheduler can arbitrarily set it to any value via
ops->exit_dump_len, so it could be even bigger than an order 9 allocation.
Thanks,
-Andrea
> implementation. This change prevents allocation failures by allowing the
> system to fall back to vmalloc when contiguous memory allocation fails.
>
> Since this buffer is only used for debugging purposes, physical memory
> contiguity is not required, making vmalloc a suitable alternative.
>
> Cc: stable@vger.kernel.org
> Fixes: 07814a9439a3b0 ("sched_ext: Print debug dump after an error exit")
> Suggested-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> Acked-by: Andrea Righi <arighi@nvidia.com>
> ---
> Changes in v2:
> - Use kvfree() on the free path as well.
> - Link to v1: https://lore.kernel.org/r/20250407-scx-v1-1-774ba74a2c17@debian.org
> ---
> kernel/sched/ext.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 66bcd40a28ca1..db9af6a3c04fd 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -4623,7 +4623,7 @@ static void scx_ops_bypass(bool bypass)
>
>  static void free_exit_info(struct scx_exit_info *ei)
>  {
> -	kfree(ei->dump);
> +	kvfree(ei->dump);
>  	kfree(ei->msg);
>  	kfree(ei->bt);
>  	kfree(ei);
> @@ -4639,7 +4639,7 @@ static struct scx_exit_info *alloc_exit_info(size_t exit_dump_len)
>
>  	ei->bt = kcalloc(SCX_EXIT_BT_LEN, sizeof(ei->bt[0]), GFP_KERNEL);
>  	ei->msg = kzalloc(SCX_EXIT_MSG_LEN, GFP_KERNEL);
> -	ei->dump = kzalloc(exit_dump_len, GFP_KERNEL);
> +	ei->dump = kvzalloc(exit_dump_len, GFP_KERNEL);
>
>  	if (!ei->bt || !ei->msg || !ei->dump) {
>  		free_exit_info(ei);
>
> ---
> base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
> change-id: 20250407-scx-11dbf94803c3
>
> Best regards,
> --
> Breno Leitao <leitao@debian.org>
>
Hello Andrea,

On Tue, Apr 08, 2025 at 01:30:32PM +0200, Andrea Righi wrote:
> Hi Breno,
>
> I already acked even the buggy version, so this one looks good. :)
>
> On Tue, Apr 08, 2025 at 04:09:02AM -0700, Breno Leitao wrote:
> > Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which
> > can require large contiguous memory (up to order=9) depending on the
>
> BTW, from where this order=9 is coming from? exit_dump_len is 32K by
> default, but a BPF scheduler can arbitrarily set it to any value via
> ops->exit_dump_len, so it could be even bigger than an order 9 allocation.

You are absolutely correct, this allocation could be of any size.

I've got this problem because I was monitoring the Meta fleet, and saw
a bunch of allocation failures and decided to investigate. In this case
specifically, the users were using order=9 (512 pages), but, again, this
could be even bigger.

Thanks for the review,
--breno
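
For reference, the sizes discussed above map to allocation orders as follows. This is a minimal userspace sketch, assuming 4 KiB pages (the real page size depends on the architecture and configuration); alloc_order() is a hypothetical helper written for this illustration, not a kernel function, although it follows the same rounding idea as the kernel's get_order().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Smallest order such that (page_size << order) >= len.
 * Hypothetical helper for illustration only.
 */
unsigned int alloc_order(size_t len, size_t page_size)
{
	size_t pages;
	unsigned int order = 0;

	assert(page_size > 0);

	/* Round the request up to whole pages. */
	pages = (len + page_size - 1) / page_size;

	/* Find the smallest power-of-two page count that covers it. */
	while (((size_t)1 << order) < pages)
		order++;

	return order;
}
```

Under that assumption, the default 32K exit_dump_len is 8 pages (order 3), and the order=9 case seen in the fleet is 512 pages, i.e. 2 MiB. Anything larger than 2 MiB rounds up to order 10 or more, which is why the commit message should not suggest order 9 as an upper bound.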
On Tue, Apr 08, 2025 at 05:17:16AM -0700, Breno Leitao wrote:
> Hello Andrea,
>
> On Tue, Apr 08, 2025 at 01:30:32PM +0200, Andrea Righi wrote:
> > Hi Breno,
> >
> > I already acked even the buggy version, so this one looks good. :)
> >
> > On Tue, Apr 08, 2025 at 04:09:02AM -0700, Breno Leitao wrote:
> > > Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which
> > > can require large contiguous memory (up to order=9) depending on the
> >
> > BTW, from where this order=9 is coming from? exit_dump_len is 32K by
> > default, but a BPF scheduler can arbitrarily set it to any value via
> > ops->exit_dump_len, so it could be even bigger than an order 9 allocation.
>
> You are absolutely correct, this allocation could be of any size.
>
> I've got this problem because I was monitoring the Meta fleet, and saw
> a bunch of allocation failures and decided to investigate. In this case
> specifically, the users were using order=9 (512 pages), but, again, this
> could be even bigger.

I see, makes sense. Maybe we can rephrase this part to not mention the
order=9 allocation and avoid potential confusion.

Thanks,
-Andrea
On Tue, Apr 08, 2025 at 03:12:43PM +0200, Andrea Righi wrote:
> On Tue, Apr 08, 2025 at 05:17:16AM -0700, Breno Leitao wrote:
> > Hello Andrea,
> >
> > On Tue, Apr 08, 2025 at 01:30:32PM +0200, Andrea Righi wrote:
> > > Hi Breno,
> > >
> > > I already acked even the buggy version, so this one looks good. :)
> > >
> > > On Tue, Apr 08, 2025 at 04:09:02AM -0700, Breno Leitao wrote:
> > > > Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which
> > > > can require large contiguous memory (up to order=9) depending on the
> > >
> > > BTW, from where this order=9 is coming from? exit_dump_len is 32K by
> > > default, but a BPF scheduler can arbitrarily set it to any value via
> > > ops->exit_dump_len, so it could be even bigger than an order 9 allocation.
> >
> > You are absolutely correct, this allocation could be of any size.
> >
> > I've got this problem because I was monitoring the Meta fleet, and saw
> > a bunch of allocation failures and decided to investigate. In this case
> > specifically, the users were using order=9 (512 pages), but, again, this
> > could be even bigger.
>
> I see, makes sense. Maybe we can rephrase this part to not mention the
> order=9 allocation and avoid potential confusion.

Sure! I will send a v3 later today, then.