[PATCH v4 0/2] mm: slub: Enhanced debugging in slub error

Hyesoo Yu posted 2 patches 9 months, 3 weeks ago
mm/slab_common.c |  3 ---
mm/slub.c        | 63 +++++++++++++++++++++++++-----------------------
2 files changed, 33 insertions(+), 33 deletions(-)
[PATCH v4 0/2] mm: slub: Enhanced debugging in slub error
Posted by Hyesoo Yu 9 months, 3 weeks ago
Dear Maintainer,

The purpose is to improve the debugging capabilities of the slub allocator
when an error occurs. The following improvements have been made:

 - Added WARN() calls at specific locations (slab_err, object_err) to detect
errors effectively and to generate a crash dump if panic_on_warn is enabled.

 - Additionally, the error printing location in check_object has been adjusted to
display the broken data before the restoration process. This improvement
allows for a better understanding of how the data was corrupted.
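
The two points above can be sketched in userspace C. This is only an
illustration under stated assumptions, not the code from the series:
warn_sim(), panic_on_warn_sim and check_and_restore() are hypothetical
names standing in for WARN(), the panic_on_warn setting and the
check/restore path. It shows the ordering being described: capture the
broken bytes, warn, and only then restore the expected pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's panic_on_warn setting. */
static bool panic_on_warn_sim;
/* Number of warnings raised so far (illustration only). */
static int warn_count;

/* Hypothetical stand-in for WARN(): log, count, abort if requested. */
static void warn_sim(const char *msg)
{
	warn_count++;
	fprintf(stderr, "WARNING: %s\n", msg);
	if (panic_on_warn_sim)
		abort();	/* a kernel would take its crash dump here */
}

/*
 * Check that buf[0..len) holds 'expected'. On a mismatch, format the
 * broken bytes into 'dump' first, warn, and only then restore the
 * pattern. Returns true if the region was intact.
 */
static bool check_and_restore(unsigned char *buf, size_t len,
			      unsigned char expected,
			      char *dump, size_t dump_len)
{
	size_t i, off = 0;

	assert(dump_len > 0);

	for (i = 0; i < len; i++)
		if (buf[i] != expected)
			break;
	if (i == len)
		return true;

	/* 1. Capture the corrupted contents while they still exist. */
	dump[0] = '\0';
	for (i = 0; i < len && off + 3 < dump_len; i++)
		off += (size_t)snprintf(dump + off, dump_len - off,
					"%02x ", (unsigned)buf[i]);

	/* 2. Report, so a panic-on-warn setup stops with the data intact. */
	warn_sim("pattern overwritten");

	/* 3. Restore only after the evidence has been recorded. */
	memset(buf, expected, len);
	return false;
}
```

If the restore ran before the dump, the captured bytes would already
show the repaired pattern, which is the situation the first patch
avoids.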

This series combines two patches that were discussed separately in the links below.
https://lore.kernel.org/linux-mm/20250120082908.4162780-1-hyesoo.yu@samsung.com/
https://lore.kernel.org/linux-mm/20250120083023.4162932-1-hyesoo.yu@samsung.com/

Thank you.

version 2 changes
 - Replaced direct calling of BUG_ON with the use of WARN() to trigger a panic.
 - Modified the code to print the broken data only once before the restore.

version 3 changes
 - Moved WARN() from slab_fix to slab_err and object_err to call WARN on all error
 reporting paths.
 - Changed the parameter type of check_bytes_and_report.

version 4 changes
 - Modified the print format to include specific error names.
 - Removed the redundant warning by removing WARN() in kmem_cache_destroy.

Hyesoo Yu (2):
  mm: slub: Print the broken data before restoring slub.
  mm: slub: call WARN() when the slab detect an error

 mm/slab_common.c |  3 ---
 mm/slub.c        | 63 +++++++++++++++++++++++++-----------------------
 2 files changed, 33 insertions(+), 33 deletions(-)

-- 
2.28.0
Re: [PATCH v4 0/2] mm: slub: Enhanced debugging in slub error
Posted by Vlastimil Babka 9 months, 3 weeks ago
On 2/26/25 09:11, Hyesoo Yu wrote:
> Dear Maintainer,
> 
> The purpose is to improve the debugging capabilities of the slub allocator
> when an error occurs. The following improvements have been made:
> 
>  - Added WARN() calls at specific locations (slab_err, object_err) to detect
> errors effectively and to generate a crash dump if panic_on_warn is enabled.
> 
>  - Additionally, the error printing location in check_object has been adjusted to
> display the broken data before the restoration process. This improvement
> allows for a better understanding of how the data was corrupted.
> 
> This series combines two patches that were discussed separately in the links below.
> https://lore.kernel.org/linux-mm/20250120082908.4162780-1-hyesoo.yu@samsung.com/
> https://lore.kernel.org/linux-mm/20250120083023.4162932-1-hyesoo.yu@samsung.com/
> 
> Thank you.

Thanks. On top of things already mentioned, I added some kunit suppressions
in patch 2. Please check the result:

https://web.git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-6.15/fixes-cleanups
Re: [PATCH v4 0/2] mm: slub: Enhanced debugging in slub error
Posted by Vlastimil Babka 9 months, 3 weeks ago
On 2/27/25 17:12, Vlastimil Babka wrote:
> On 2/26/25 09:11, Hyesoo Yu wrote:
>> Dear Maintainer,
>> 
>> The purpose is to improve the debugging capabilities of the slub allocator
>> when an error occurs. The following improvements have been made:
>> 
>>  - Added WARN() calls at specific locations (slab_err, object_err) to detect
>> errors effectively and to generate a crash dump if panic_on_warn is enabled.
>> 
>>  - Additionally, the error printing location in check_object has been adjusted to
>> display the broken data before the restoration process. This improvement
>> allows for a better understanding of how the data was corrupted.
>> 
>> This series combines two patches that were discussed separately in the links below.
>> https://lore.kernel.org/linux-mm/20250120082908.4162780-1-hyesoo.yu@samsung.com/
>> https://lore.kernel.org/linux-mm/20250120083023.4162932-1-hyesoo.yu@samsung.com/
>> 
>> Thank you.
> 
> Thanks. On top of things already mentioned, I added some kunit suppressions
> in patch 2. Please check the result:
> 
> https://web.git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-6.15/fixes-cleanups

What do you think about the following patch on top?

---8<---
From c38dadde6293cacdb91f95afc3615c22dec5830a Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Thu, 27 Feb 2025 16:05:46 +0100
Subject: [PATCH] mm, slab: cleanup slab_bug() parameters

slab_err() has variadic printf arguments but instead of passing them to
slab_bug() it does vsnprintf() to a buffer and passes %s, buf.

To allow passing them directly, turn slab_bug() to __slab_bug() with a
va_list parameter, and slab_bug() a wrapper with fmt, ... parameters.
Then slab_err() can call __slab_bug() without the intermediate buffer.

Also constify fmt everywhere, which also simplifies object_err()'s
call to slab_bug().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index a9a02b4ae4d6..d94af020b305 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1017,12 +1017,12 @@ void skip_orig_size_check(struct kmem_cache *s, const void *object)
 	set_orig_size(s, (void *)object, s->object_size);
 }
 
-static void slab_bug(struct kmem_cache *s, char *fmt, ...)
+static void __slab_bug(struct kmem_cache *s, const char *fmt, va_list argsp)
 {
 	struct va_format vaf;
 	va_list args;
 
-	va_start(args, fmt);
+	va_copy(args, argsp);
 	vaf.fmt = fmt;
 	vaf.va = &args;
 	pr_err("=============================================================================\n");
@@ -1031,8 +1031,17 @@ static void slab_bug(struct kmem_cache *s, char *fmt, ...)
 	va_end(args);
 }
 
+static void slab_bug(struct kmem_cache *s, const char *fmt, ...)
+{
+	va_list args;
+
+	va_start(args, fmt);
+	__slab_bug(s, fmt, args);
+	va_end(args);
+}
+
 __printf(2, 3)
-static void slab_fix(struct kmem_cache *s, char *fmt, ...)
+static void slab_fix(struct kmem_cache *s, const char *fmt, ...)
 {
 	struct va_format vaf;
 	va_list args;
@@ -1088,12 +1097,12 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
 }
 
 static void object_err(struct kmem_cache *s, struct slab *slab,
-			u8 *object, char *reason)
+			u8 *object, const char *reason)
 {
 	if (slab_add_kunit_errors())
 		return;
 
-	slab_bug(s, "%s", reason);
+	slab_bug(s, reason);
 	print_trailer(s, slab, object);
 	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
 
@@ -1129,15 +1138,14 @@ static __printf(3, 4) void slab_err(struct kmem_cache *s, struct slab *slab,
 			const char *fmt, ...)
 {
 	va_list args;
-	char buf[100];
 
 	if (slab_add_kunit_errors())
 		return;
 
 	va_start(args, fmt);
-	vsnprintf(buf, sizeof(buf), fmt, args);
+	__slab_bug(s, fmt, args);
 	va_end(args);
-	slab_bug(s, "%s", buf);
+
 	__slab_err(slab);
 }
 
@@ -1175,7 +1183,7 @@ static void init_object(struct kmem_cache *s, void *object, u8 val)
 					  s->inuse - poison_size);
 }
 
-static void restore_bytes(struct kmem_cache *s, char *message, u8 data,
+static void restore_bytes(struct kmem_cache *s, const char *message, u8 data,
 						void *from, void *to)
 {
 	slab_fix(s, "Restoring %s 0x%p-0x%p=0x%x", message, from, to - 1, data);
@@ -1190,7 +1198,7 @@ static void restore_bytes(struct kmem_cache *s, char *message, u8 data,
 
 static pad_check_attributes int
 check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
-		       u8 *object, char *what, u8 *start, unsigned int value,
+		       u8 *object, const char *what, u8 *start, unsigned int value,
 		       unsigned int bytes, bool slab_obj_print)
 {
 	u8 *fault;
-- 
2.48.1
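
The refactoring in the patch above follows a common C idiom: one core
function takes a va_list, and thin variadic wrappers forward to it, so
no caller has to pre-format into a fixed-size buffer. The following is
a userspace sketch of that idiom only, under stated assumptions:
vreport(), report() and report_err() are hypothetical analogues of
__slab_bug(), slab_bug() and slab_err(), the "BUG %s: " prefix is
invented for the example, and __attribute__((format)) is the GCC/Clang
spelling of what the kernel writes as __printf().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Counts calls to report_err() (illustration only). */
static int err_count;

/* Analogue of __slab_bug(): the core takes a va_list, not "...". */
static void vreport(char *out, size_t out_len, const char *cache,
		    const char *fmt, va_list ap)
{
	va_list copy;
	size_t used;

	assert(out && out_len > 0);
	snprintf(out, out_len, "BUG %s: ", cache);
	used = strlen(out);

	/* Work on a copy so the caller's va_list is left untouched. */
	va_copy(copy, ap);
	vsnprintf(out + used, out_len - used, fmt, copy);
	va_end(copy);
}

/* Analogue of slab_bug(): a thin variadic wrapper around the core. */
__attribute__((format(printf, 4, 5)))
static void report(char *out, size_t out_len, const char *cache,
		   const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vreport(out, out_len, cache, fmt, ap);
	va_end(ap);
}

/*
 * Analogue of slab_err(): a second variadic entry point that hands its
 * arguments straight to the core, with no intermediate char buf[100].
 */
__attribute__((format(printf, 4, 5)))
static void report_err(char *out, size_t out_len, const char *cache,
		       const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vreport(out, out_len, cache, fmt, ap);
	va_end(ap);
	err_count++;
}
```

A call such as report(out, sizeof(out), "kmalloc-64", "%s at offset
%d", "Redzone", 8) then formats directly into out. The const-qualified
fmt also lets callers pass string literals and const char * values
without casts.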
Re: [PATCH v4 0/2] mm: slub: Enhanced debugging in slub error
Posted by Harry Yoo 9 months, 3 weeks ago
On Thu, Feb 27, 2025 at 05:26:26PM +0100, Vlastimil Babka wrote:
> On 2/27/25 17:12, Vlastimil Babka wrote:
> > On 2/26/25 09:11, Hyesoo Yu wrote:
> >> Dear Maintainer,
> >> 
> >> The purpose is to improve the debugging capabilities of the slub allocator
> >> when an error occurs. The following improvements have been made:
> >> 
> >>  - Added WARN() calls at specific locations (slab_err, object_err) to detect
> >> errors effectively and to generate a crash dump if panic_on_warn is enabled.
> >> 
> >>  - Additionally, the error printing location in check_object has been adjusted to
> >> display the broken data before the restoration process. This improvement
> >> allows for a better understanding of how the data was corrupted.
> >> 
> >> This series combines two patches that were discussed separately in the links below.
> >> https://lore.kernel.org/linux-mm/20250120082908.4162780-1-hyesoo.yu@samsung.com/
> >> https://lore.kernel.org/linux-mm/20250120083023.4162932-1-hyesoo.yu@samsung.com/
> >> 
> >> Thank you.
> > 
> > Thanks. On top of things already mentioned, I added some kunit suppressions
> > in patch 2. Please check the result:
> > 
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-6.15/fixes-cleanups
> 
> What do you think about the following patch on top?
> 
> ---8<---
> From c38dadde6293cacdb91f95afc3615c22dec5830a Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Thu, 27 Feb 2025 16:05:46 +0100
> Subject: [PATCH] mm, slab: cleanup slab_bug() parameters
> 
> slab_err() has variadic printf arguments but instead of passing them to
> slab_bug() it does vsnprintf() to a buffer and passes %s, buf.
> 
> To allow passing them directly, turn slab_bug() to __slab_bug() with a
> va_list parameter, and slab_bug() a wrapper with fmt, ... parameters.
> Then slab_err() can call __slab_bug() without the intermediate buffer.
> 
> Also constify fmt everywhere, which also simplifies object_err()'s
> call to slab_bug().
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---

Looks good to me.

FWIW,
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

-- 
Cheers,
Harry

Re: [PATCH v4 0/2] mm: slub: Enhanced debugging in slub error
Posted by Vlastimil Babka 9 months, 3 weeks ago
On 2/28/25 13:47, Harry Yoo wrote:
> On Thu, Feb 27, 2025 at 05:26:26PM +0100, Vlastimil Babka wrote:
>> ---8<---
>> From c38dadde6293cacdb91f95afc3615c22dec5830a Mon Sep 17 00:00:00 2001
>> From: Vlastimil Babka <vbabka@suse.cz>
>> Date: Thu, 27 Feb 2025 16:05:46 +0100
>> Subject: [PATCH] mm, slab: cleanup slab_bug() parameters
>> 
>> slab_err() has variadic printf arguments but instead of passing them to
>> slab_bug() it does vsnprintf() to a buffer and passes %s, buf.
>> 
>> To allow passing them directly, turn slab_bug() to __slab_bug() with a
>> va_list parameter, and slab_bug() a wrapper with fmt, ... parameters.
>> Then slab_err() can call __slab_bug() without the intermediate buffer.
>> 
>> Also constify fmt everywhere, which also simplifies object_err()'s
>> call to slab_bug().
>> 
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
> 
> Looks good to me.
> 
> FWIW,
> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

Thanks, adding to slab/for-next
Re: [PATCH v4 0/2] mm: slub: Enhanced debugging in slub error
Posted by Hyesoo Yu 9 months, 2 weeks ago
On Fri, Feb 28, 2025 at 05:02:00PM +0100, Vlastimil Babka wrote:
> On 2/28/25 13:47, Harry Yoo wrote:
> > On Thu, Feb 27, 2025 at 05:26:26PM +0100, Vlastimil Babka wrote:
> >> ---8<---
> >> From c38dadde6293cacdb91f95afc3615c22dec5830a Mon Sep 17 00:00:00 2001
> >> From: Vlastimil Babka <vbabka@suse.cz>
> >> Date: Thu, 27 Feb 2025 16:05:46 +0100
> >> Subject: [PATCH] mm, slab: cleanup slab_bug() parameters
> >> 
> >> slab_err() has variadic printf arguments but instead of passing them to
> >> slab_bug() it does vsnprintf() to a buffer and passes %s, buf.
> >> 
> >> To allow passing them directly, turn slab_bug() to __slab_bug() with a
> >> va_list parameter, and slab_bug() a wrapper with fmt, ... parameters.
> >> Then slab_err() can call __slab_bug() without the intermediate buffer.
> >> 
> >> Also constify fmt everywhere, which also simplifies object_err()'s
> >> call to slab_bug().
> >> 
> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >> ---
> > 
> > Looks good to me.
> > 
> > FWIW,
> > Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
> 
> Thanks, adding to slab/for-next
> 
> 

Looks good to me.
Thanks! 
Re: [PATCH v4 0/2] mm: slub: Enhanced debugging in slub error
Posted by Harry Yoo 9 months, 3 weeks ago
On Wed, Feb 26, 2025 at 05:11:59PM +0900, Hyesoo Yu wrote:
> Dear Maintainer,
> 
> The purpose is to improve the debugging capabilities of the slub allocator
> when an error occurs. The following improvements have been made:
> 
>  - Added WARN() calls at specific locations (slab_err, object_err) to detect
> errors effectively and to generate a crash dump if panic_on_warn is enabled.
> 
>  - Additionally, the error printing location in check_object has been adjusted to
> display the broken data before the restoration process. This improvement
> allows for a better understanding of how the data was corrupted.
> 
> This series combines two patches that were discussed separately in the links below.
> https://lore.kernel.org/linux-mm/20250120082908.4162780-1-hyesoo.yu@samsung.com/
> https://lore.kernel.org/linux-mm/20250120083023.4162932-1-hyesoo.yu@samsung.com/

IMHO it would be helpful if the cover letter included the error reporting
output before and after this patch series.
-- 
Cheers,
Harry
