[PATCH v3] mm/slub: Avoid list corruption when removing a slab from the full list

yuan.gao posted 1 patch 1 month, 2 weeks ago
There is a newer version of this series
mm/slab.h |  4 ++++
mm/slub.c | 11 ++++++++++-
2 files changed, 14 insertions(+), 1 deletion(-)
[PATCH v3] mm/slub: Avoid list corruption when removing a slab from the full list
Posted by yuan.gao 1 month, 2 weeks ago
Boot with slub_debug=UFPZ.

If allocated object failed in alloc_consistency_checks, all objects of
the slab will be marked as used, and then the slab will be removed from
the partial list.

When an object belonging to the slab got freed later, the remove_full()
function is called. Because the slab is neither on the partial list nor
on the full list, it eventually lead to a list corruption.

So we need to mark and isolate the corrupted slab page, do not put it
back in circulation.

Because the debug caches avoid all the fastpaths, reusing the frozen bit
seems to be fine.

[ 4277.385669] list_del corruption, ffffea00044b3e50->next is LIST_POISON1 (dead000000000100)
[ 4277.387023] ------------[ cut here ]------------
[ 4277.387880] kernel BUG at lib/list_debug.c:56!
[ 4277.388680] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 4277.389562] CPU: 5 PID: 90 Comm: kworker/5:1 Kdump: loaded Tainted: G           OE      6.6.1-1 #1
[ 4277.392113] Workqueue: xfs-inodegc/vda1 xfs_inodegc_worker [xfs]
[ 4277.393551] RIP: 0010:__list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.394518] Code: 48 91 82 e8 37 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 28 49 91 82 e8 26 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 58 49 91
[ 4277.397292] RSP: 0018:ffffc90000333b38 EFLAGS: 00010082
[ 4277.398202] RAX: 000000000000004e RBX: ffffea00044b3e50 RCX: 0000000000000000
[ 4277.399340] RDX: 0000000000000002 RSI: ffffffff828f8715 RDI: 00000000ffffffff
[ 4277.400545] RBP: ffffea00044b3e40 R08: 0000000000000000 R09: ffffc900003339f0
[ 4277.401710] R10: 0000000000000003 R11: ffffffff82d44088 R12: ffff888112cf9910
[ 4277.402887] R13: 0000000000000001 R14: 0000000000000001 R15: ffff8881000424c0
[ 4277.404049] FS:  0000000000000000(0000) GS:ffff88842fd40000(0000) knlGS:0000000000000000
[ 4277.405357] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4277.406389] CR2: 00007f2ad0b24000 CR3: 0000000102a3a006 CR4: 00000000007706e0
[ 4277.407589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4277.408780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4277.410000] PKRU: 55555554
[ 4277.410645] Call Trace:
[ 4277.411234]  <TASK>
[ 4277.411777]  ? die+0x32/0x80
[ 4277.412439]  ? do_trap+0xd6/0x100
[ 4277.413150]  ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.414158]  ? do_error_trap+0x6a/0x90
[ 4277.414948]  ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.415915]  ? exc_invalid_op+0x4c/0x60
[ 4277.416710]  ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.417675]  ? asm_exc_invalid_op+0x16/0x20
[ 4277.418482]  ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.419466]  ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.420410]  free_to_partial_list+0x515/0x5e0
[ 4277.421242]  ? xfs_iext_remove+0x41a/0xa10 [xfs]
[ 4277.422298]  xfs_iext_remove+0x41a/0xa10 [xfs]
[ 4277.423316]  ? xfs_inodegc_worker+0xb4/0x1a0 [xfs]
[ 4277.424383]  xfs_bmap_del_extent_delay+0x4fe/0x7d0 [xfs]
[ 4277.425490]  __xfs_bunmapi+0x50d/0x840 [xfs]
[ 4277.426445]  xfs_itruncate_extents_flags+0x13a/0x490 [xfs]
[ 4277.427553]  xfs_inactive_truncate+0xa3/0x120 [xfs]
[ 4277.428567]  xfs_inactive+0x22d/0x290 [xfs]
[ 4277.429500]  xfs_inodegc_worker+0xb4/0x1a0 [xfs]
[ 4277.430479]  process_one_work+0x171/0x340
[ 4277.431227]  worker_thread+0x277/0x390
[ 4277.431962]  ? __pfx_worker_thread+0x10/0x10
[ 4277.432752]  kthread+0xf0/0x120
[ 4277.433382]  ? __pfx_kthread+0x10/0x10
[ 4277.434134]  ret_from_fork+0x2d/0x50
[ 4277.434837]  ? __pfx_kthread+0x10/0x10
[ 4277.435566]  ret_from_fork_asm+0x1b/0x30
[ 4277.436280]  </TASK>

v3:
 - Reuse slab->fronzen bit as a corrupted marker.

v2:
 - Call remove_partial() and add_full() only for slab folios.
 - https://lore.kernel.org/linux-mm/20241007091850.16959-1-yuan.gao@ucloud.cn/

v1:
 - https://lore.kernel.org/linux-mm/20241006044755.79634-1-yuan.gao@ucloud.cn/

Signed-off-by: yuan.gao <yuan.gao@ucloud.cn>
Fixes: 643b113849d8 ("slub: enable tracking of full slabs")
Acked-by: Christoph Lameter <cl@linux.com>
Suggested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab.h |  4 ++++
 mm/slub.c | 11 ++++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/mm/slab.h b/mm/slab.h
index 6c6fe6d630ce..7681e71d9a13 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -73,6 +73,10 @@ struct slab {
 						struct {
 							unsigned inuse:16;
 							unsigned objects:15;
+							/*
+							 * Reuse frozen bit for slab with debug enabled:
+							 *	frozen == 1 means it is corrupted
+							 */
 							unsigned frozen:1;
 						};
 					};
diff --git a/mm/slub.c b/mm/slub.c
index 5b832512044e..b9265e9f11aa 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1423,6 +1423,11 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
 			slab->inuse, slab->objects);
 		return 0;
 	}
+	if (slab->frozen) {
+		slab_err(s, slab, "Corrupted slab");
+		return 0;
+	}
+
 	/* Slab_pad_check fixes things up after itself */
 	slab_pad_check(s, slab);
 	return 1;
@@ -1603,6 +1608,7 @@ static noinline bool alloc_debug_processing(struct kmem_cache *s,
 		slab_fix(s, "Marking all objects used");
 		slab->inuse = slab->objects;
 		slab->freelist = NULL;
+		slab->frozen = 1; /* mark consistency-failed slab as frozen */
 	}
 	return false;
 }
@@ -2744,7 +2750,10 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
 	slab->inuse++;
 
 	if (!alloc_debug_processing(s, slab, object, orig_size)) {
-		remove_partial(n, slab);
+		if (folio_test_slab(slab_folio(slab))) {
+			remove_partial(n, slab);
+			add_full(s, n, slab);
+		}
 		return NULL;
 	}
 
-- 
2.32.0
Re: [PATCH v3] mm/slub: Avoid list corruption when removing a slab from the full list
Posted by Christoph Lameter (Ampere) 1 month, 2 weeks ago
On Fri, 11 Oct 2024, yuan.gao wrote:

> When an object belonging to the slab got freed later, the remove_full()
> function is called. Because the slab is neither on the partial list nor
> on the full list, it eventually lead to a list corruption.

We detect list poison....

> diff --git a/mm/slab.h b/mm/slab.h
> index 6c6fe6d630ce..7681e71d9a13 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -73,6 +73,10 @@ struct slab {
>  						struct {
>  							unsigned inuse:16;
>  							unsigned objects:15;
> +							/*
> +							 * Reuse frozen bit for slab with debug enabled:

"If slab debugging is enabled then the frozen bit can bereused to
 indicate that the slab was corrupted"

> index 5b832512044e..b9265e9f11aa 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1423,6 +1423,11 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
>  			slab->inuse, slab->objects);
>  		return 0;
>  	}
> +	if (slab->frozen) {
> +		slab_err(s, slab, "Corrupted slab");


"Slab folio disabled due to metadata corruption" ?


> @@ -2744,7 +2750,10 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
>  	slab->inuse++;
>
>  	if (!alloc_debug_processing(s, slab, object, orig_size)) {
> -		remove_partial(n, slab);
> +		if (folio_test_slab(slab_folio(slab))) {


Does folio_test_slab test for the frozen bit??
Re: [PATCH v3] mm/slub: Avoid list corruption when removing a slab from the full list
Posted by yuan.gao 1 month, 2 weeks ago
On 24/10/11 11:07AM, Christoph Lameter (Ampere) wrote:
> On Fri, 11 Oct 2024, yuan.gao wrote:
> 
> > When an object belonging to the slab got freed later, the remove_full()
> > function is called. Because the slab is neither on the partial list nor
> > on the full list, it eventually lead to a list corruption.
> 
> We detect list poison....
> 
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 6c6fe6d630ce..7681e71d9a13 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -73,6 +73,10 @@ struct slab {
> >  						struct {
> >  							unsigned inuse:16;
> >  							unsigned objects:15;
> > +							/*
> > +							 * Reuse frozen bit for slab with debug enabled:
> 
> "If slab debugging is enabled then the frozen bit can bereused to
>  indicate that the slab was corrupted"
> 
> > index 5b832512044e..b9265e9f11aa 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1423,6 +1423,11 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
> >  			slab->inuse, slab->objects);
> >  		return 0;
> >  	}
> > +	if (slab->frozen) {
> > +		slab_err(s, slab, "Corrupted slab");
> 
> 
> "Slab folio disabled due to metadata corruption" ?
> 
> 

Yes, that's what I meant. 
Perhaps I should change the description from "Corrupted slab" to
"Metadata corrupt"?

> > @@ -2744,7 +2750,10 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
> >  	slab->inuse++;
> >
> >  	if (!alloc_debug_processing(s, slab, object, orig_size)) {
> > -		remove_partial(n, slab);
> > +		if (folio_test_slab(slab_folio(slab))) {
> 
> 
> Does folio_test_slab test for the frozen bit??
> 

For slab folios, slab->fronzen has been set to 1.
For non-slab folios, we should not call remove_partial().
I'm not sure if I understand this correctly.

Thanks
Re: [PATCH v3] mm/slub: Avoid list corruption when removing a slab from the full list
Posted by David Rientjes 1 month, 2 weeks ago
On Sat, 12 Oct 2024, yuan.gao wrote:

> On 24/10/11 11:07AM, Christoph Lameter (Ampere) wrote:
> > On Fri, 11 Oct 2024, yuan.gao wrote:
> > 
> > > When an object belonging to the slab got freed later, the remove_full()
> > > function is called. Because the slab is neither on the partial list nor
> > > on the full list, it eventually lead to a list corruption.
> > 
> > We detect list poison....
> > 
> > > diff --git a/mm/slab.h b/mm/slab.h
> > > index 6c6fe6d630ce..7681e71d9a13 100644
> > > --- a/mm/slab.h
> > > +++ b/mm/slab.h
> > > @@ -73,6 +73,10 @@ struct slab {
> > >  						struct {
> > >  							unsigned inuse:16;
> > >  							unsigned objects:15;
> > > +							/*
> > > +							 * Reuse frozen bit for slab with debug enabled:
> > 
> > "If slab debugging is enabled then the frozen bit can bereused to
> >  indicate that the slab was corrupted"
> > 
> > > index 5b832512044e..b9265e9f11aa 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -1423,6 +1423,11 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
> > >  			slab->inuse, slab->objects);
> > >  		return 0;
> > >  	}
> > > +	if (slab->frozen) {
> > > +		slab_err(s, slab, "Corrupted slab");
> > 
> > 
> > "Slab folio disabled due to metadata corruption" ?
> > 
> > 
> 
> Yes, that's what I meant. 
> Perhaps I should change the description from "Corrupted slab" to
> "Metadata corrupt"?
> 

I think the point here is that slab page corruption is different from slab 
metadata corruption :)

The suggested phrasing, "Slab folio disabled due to metadata corruption", 
sounds good to me.

> > > @@ -2744,7 +2750,10 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
> > >  	slab->inuse++;
> > >
> > >  	if (!alloc_debug_processing(s, slab, object, orig_size)) {
> > > -		remove_partial(n, slab);
> > > +		if (folio_test_slab(slab_folio(slab))) {
> > 
> > 
> > Does folio_test_slab test for the frozen bit??
> > 
> 
> For slab folios, slab->fronzen has been set to 1.
> For non-slab folios, we should not call remove_partial().
> I'm not sure if I understand this correctly.
> 
> Thanks
>
Re: [PATCH v3] mm/slub: Avoid list corruption when removing a slab from the full list
Posted by Vlastimil Babka 1 month, 1 week ago
On 10/13/24 22:46, David Rientjes wrote:
> On Sat, 12 Oct 2024, yuan.gao wrote:
> 
>> On 24/10/11 11:07AM, Christoph Lameter (Ampere) wrote:
>> > On Fri, 11 Oct 2024, yuan.gao wrote:
>> > 
>> > > When an object belonging to the slab got freed later, the remove_full()
>> > > function is called. Because the slab is neither on the partial list nor
>> > > on the full list, it eventually lead to a list corruption.
>> > 
>> > We detect list poison....
>> > 
>> > > diff --git a/mm/slab.h b/mm/slab.h
>> > > index 6c6fe6d630ce..7681e71d9a13 100644
>> > > --- a/mm/slab.h
>> > > +++ b/mm/slab.h
>> > > @@ -73,6 +73,10 @@ struct slab {
>> > >  						struct {
>> > >  							unsigned inuse:16;
>> > >  							unsigned objects:15;
>> > > +							/*
>> > > +							 * Reuse frozen bit for slab with debug enabled:
>> > 
>> > "If slab debugging is enabled then the frozen bit can bereused to
>> >  indicate that the slab was corrupted"
>> > 
>> > > index 5b832512044e..b9265e9f11aa 100644
>> > > --- a/mm/slub.c
>> > > +++ b/mm/slub.c
>> > > @@ -1423,6 +1423,11 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
>> > >  			slab->inuse, slab->objects);
>> > >  		return 0;
>> > >  	}
>> > > +	if (slab->frozen) {
>> > > +		slab_err(s, slab, "Corrupted slab");
>> > 
>> > 
>> > "Slab folio disabled due to metadata corruption" ?
>> > 
>> > 
>> 
>> Yes, that's what I meant. 
>> Perhaps I should change the description from "Corrupted slab" to
>> "Metadata corrupt"?
>> 
> 
> I think the point here is that slab page corruption is different from slab 
> metadata corruption :)
> 
> The suggested phrasing, "Slab folio disabled due to metadata corruption", 
> sounds good to me.

What about:

"Slab disabled due to previous consistency check failure" ?

> 
>> > > @@ -2744,7 +2750,10 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
>> > >  	slab->inuse++;
>> > >
>> > >  	if (!alloc_debug_processing(s, slab, object, orig_size)) {
>> > > -		remove_partial(n, slab);
>> > > +		if (folio_test_slab(slab_folio(slab))) {

Your patch adds add_full() here as in the previous versions. I wouldn't do
it anymore. Thanks to the frozen bit check in check_slab(), no further list
manipulation should happen that would trigger the list poison being detected.

Adding to full list would rather mean each sysfs-triggered validation will
reach the slab and output the new slab_err() message, which is not useful.
We want the corrupted slab to stay away from everything else, and only be
informed of further object freeing attempts.

>> > 
>> > 
>> > Does folio_test_slab test for the frozen bit??
>> > 
>> 
>> For slab folios, slab->fronzen has been set to 1.
>> For non-slab folios, we should not call remove_partial().
>> I'm not sure if I understand this correctly.
>> 
>> Thanks
>>
Re: [PATCH v3] mm/slub: Avoid list corruption when removing a slab from the full list
Posted by Christoph Lameter (Ampere) 1 month, 1 week ago
On Mon, 14 Oct 2024, Vlastimil Babka wrote:

> What about:
>
> "Slab disabled due to previous consistency check failure" ?

I think this implies more than we can actually check. We can only check
the metadata generated by SLUB. The consistency check for the object
itself does not exist and would have to be done by the subsystem.
Re: [PATCH v3] mm/slub: Avoid list corruption when removing a slab from the full list
Posted by Vlastimil Babka 1 month, 1 week ago
On 10/14/24 18:47, Christoph Lameter (Ampere) wrote:
> On Mon, 14 Oct 2024, Vlastimil Babka wrote:
> 
>> What about:
>>
>> "Slab disabled due to previous consistency check failure" ?
> 
> I think this implies more than we can actually check. We can only check
> the metadata generated by SLUB. The consistency check for the object
> itself does not exist and would have to be done by the subsystem.

"Slab disabled since SLUB metadata consistency check failure" ?
Re: [PATCH v3] mm/slub: Avoid list corruption when removing a slab from the full list
Posted by Christoph Lameter (Ampere) 1 month, 1 week ago
On Tue, 15 Oct 2024, Vlastimil Babka wrote:

> On 10/14/24 18:47, Christoph Lameter (Ampere) wrote:
> > On Mon, 14 Oct 2024, Vlastimil Babka wrote:
> >
> >> What about:
> >>
> >> "Slab disabled due to previous consistency check failure" ?
> >
> > I think this implies more than we can actually check. We can only check
> > the metadata generated by SLUB. The consistency check for the object
> > itself does not exist and would have to be done by the subsystem.
>
> "Slab disabled since SLUB metadata consistency check failure" ?

   "failed"

Sounds good.