[PATCH v2] sched_ext: Use kvzalloc for large exit_dump allocation

Breno Leitao posted 1 patch 10 months ago
There is a newer version of this series
Posted by Breno Leitao 10 months ago
Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which
can require large contiguous memory (up to order=9) depending on the
implementation. This change prevents allocation failures by allowing the
system to fall back to vmalloc when contiguous memory allocation fails.

Since this buffer is only used for debugging purposes, physical memory
contiguity is not required, making vmalloc a suitable alternative.

Cc: stable@vger.kernel.org
Fixes: 07814a9439a3b0 ("sched_ext: Print debug dump after an error exit")
Suggested-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Andrea Righi <arighi@nvidia.com>
---
Changes in v2:
- Use kvfree() on the free path as well.
- Link to v1: https://lore.kernel.org/r/20250407-scx-v1-1-774ba74a2c17@debian.org
---
 kernel/sched/ext.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 66bcd40a28ca1..db9af6a3c04fd 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4623,7 +4623,7 @@ static void scx_ops_bypass(bool bypass)
 
 static void free_exit_info(struct scx_exit_info *ei)
 {
-	kfree(ei->dump);
+	kvfree(ei->dump);
 	kfree(ei->msg);
 	kfree(ei->bt);
 	kfree(ei);
@@ -4639,7 +4639,7 @@ static struct scx_exit_info *alloc_exit_info(size_t exit_dump_len)
 
 	ei->bt = kcalloc(SCX_EXIT_BT_LEN, sizeof(ei->bt[0]), GFP_KERNEL);
 	ei->msg = kzalloc(SCX_EXIT_MSG_LEN, GFP_KERNEL);
-	ei->dump = kzalloc(exit_dump_len, GFP_KERNEL);
+	ei->dump = kvzalloc(exit_dump_len, GFP_KERNEL);
 
 	if (!ei->bt || !ei->msg || !ei->dump) {
 		free_exit_info(ei);

---
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
change-id: 20250407-scx-11dbf94803c3

Best regards,
-- 
Breno Leitao <leitao@debian.org>
Re: [PATCH v2] sched_ext: Use kvzalloc for large exit_dump allocation
Posted by Andrea Righi 10 months ago
Hi Breno,

I already acked even the buggy version, so this one looks good. :)

On Tue, Apr 08, 2025 at 04:09:02AM -0700, Breno Leitao wrote:
> Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which
> can require large contiguous memory (up to order=9) depending on the

BTW, where is this order=9 coming from? exit_dump_len is 32K by default,
but a BPF scheduler can set it to an arbitrary value via
ops->exit_dump_len, so it could be even bigger than an order-9 allocation.

Thanks,
-Andrea

Re: [PATCH v2] sched_ext: Use kvzalloc for large exit_dump allocation
Posted by Breno Leitao 10 months ago
Hello Andrea,

On Tue, Apr 08, 2025 at 01:30:32PM +0200, Andrea Righi wrote:
> Hi Breno,
> 
> I already acked even the buggy version, so this one looks good. :)
> 
> On Tue, Apr 08, 2025 at 04:09:02AM -0700, Breno Leitao wrote:
> > Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which
> > can require large contiguous memory (up to order=9) depending on the
> 
> BTW, where is this order=9 coming from? exit_dump_len is 32K by default,
> but a BPF scheduler can set it to an arbitrary value via
> ops->exit_dump_len, so it could be even bigger than an order-9 allocation.

You are absolutely correct, this allocation could be of any size.

I ran into this problem while monitoring the Meta fleet: I saw a bunch of
allocation failures and decided to investigate. In this specific case, the
users' configuration required an order=9 allocation (512 pages), but,
again, this could be even bigger.

Thanks for the review,
--breno
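
For reference, the arithmetic behind the figures in this exchange can be
sketched in userspace C. This is only a sketch, not kernel code: it assumes
4 KiB pages, and alloc_order() and order_to_pages() are hypothetical helpers
written in the spirit of the kernel's get_order(), not functions taken from
the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on a typical x86-64 configuration. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/*
 * Hypothetical helper: smallest order such that
 * (page_size << order) >= len, i.e. the size of the physically
 * contiguous block a kzalloc()-style allocation of len bytes would need.
 */
static inline unsigned int alloc_order(size_t len, size_t page_size)
{
	unsigned int order = 0;
	size_t span = page_size;

	/* page_size must be a non-zero power of two. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	while (span < len) {
		if (span > SIZE_MAX / 2)
			break;	/* doubling again would overflow size_t */
		span <<= 1;
		order++;
	}
	return order;
}

/* Hypothetical helper: number of pages in a block of the given order. */
static inline size_t order_to_pages(unsigned int order)
{
	return (size_t)1 << order;
}
```

Under the 4 KiB assumption, alloc_order(32 * 1024, SKETCH_PAGE_SIZE) gives 3
for the 32K default, and alloc_order(2 * 1024 * 1024, SKETCH_PAGE_SIZE)
gives 9, which is 512 contiguous pages. Anything larger set through
ops->exit_dump_len needs a higher order still, which is the case the
vmalloc fallback is meant to cover.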
Re: [PATCH v2] sched_ext: Use kvzalloc for large exit_dump allocation
Posted by Andrea Righi 10 months ago
On Tue, Apr 08, 2025 at 05:17:16AM -0700, Breno Leitao wrote:
> Hello Andrea,
> 
> On Tue, Apr 08, 2025 at 01:30:32PM +0200, Andrea Righi wrote:
> > Hi Breno,
> > 
> > I already acked even the buggy version, so this one looks good. :)
> > 
> > On Tue, Apr 08, 2025 at 04:09:02AM -0700, Breno Leitao wrote:
> > > Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which
> > > can require large contiguous memory (up to order=9) depending on the
> > 
> > BTW, where is this order=9 coming from? exit_dump_len is 32K by default,
> > but a BPF scheduler can set it to an arbitrary value via
> > ops->exit_dump_len, so it could be even bigger than an order-9 allocation.
> 
> You are absolutely correct, this allocation could be of any size.
> 
> I ran into this problem while monitoring the Meta fleet: I saw a bunch of
> allocation failures and decided to investigate. In this specific case, the
> users' configuration required an order=9 allocation (512 pages), but,
> again, this could be even bigger.

I see, makes sense. Maybe we can rephrase this part to not mention the
order=9 allocation and avoid potential confusion.

Thanks,
-Andrea
Re: [PATCH v2] sched_ext: Use kvzalloc for large exit_dump allocation
Posted by Breno Leitao 10 months ago
On Tue, Apr 08, 2025 at 03:12:43PM +0200, Andrea Righi wrote:
> On Tue, Apr 08, 2025 at 05:17:16AM -0700, Breno Leitao wrote:
> > Hello Andrea,
> > 
> > On Tue, Apr 08, 2025 at 01:30:32PM +0200, Andrea Righi wrote:
> > > Hi Breno,
> > > 
> > > I already acked even the buggy version, so this one looks good. :)
> > > 
> > > On Tue, Apr 08, 2025 at 04:09:02AM -0700, Breno Leitao wrote:
> > > > Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which
> > > > can require large contiguous memory (up to order=9) depending on the
> > > 
> > > BTW, where is this order=9 coming from? exit_dump_len is 32K by default,
> > > but a BPF scheduler can set it to an arbitrary value via
> > > ops->exit_dump_len, so it could be even bigger than an order-9 allocation.
> > 
> > You are absolutely correct, this allocation could be of any size.
> > 
> > I ran into this problem while monitoring the Meta fleet: I saw a bunch of
> > allocation failures and decided to investigate. In this specific case, the
> > users' configuration required an order=9 allocation (512 pages), but,
> > again, this could be even bigger.
> 
> I see, makes sense. Maybe we can rephrase this part to not mention the
> order=9 allocation and avoid potential confusion.

Sure! I will send a v3 later today, then.