[v2] mm/slub: Avoid list corruption when removing a slab from the full list

[PATCH v2] mm/slub: Avoid list corruption when removing a slab from the full list

Posted by yuan.gao 1 year, 4 months ago

boot with slub_debug=UFPZ.

If allocated object failed in alloc_consistency_checks, all objects of
the slab will be marked as used, and then the slab will be removed from
the partial list.

When an object belonging to the slab got freed later, the remove_full()
function is called. Because the slab is neither on the partial list nor
on the full list, it eventually lead to a list corruption.

So we need to add the slab to full list in this case.

[ 4277.385669] list_del corruption, ffffea00044b3e50->next is LIST_POISON1 (dead000000000100)
[ 4277.387023] ------------[ cut here ]------------
[ 4277.387880] kernel BUG at lib/list_debug.c:56!
[ 4277.388680] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 4277.389562] CPU: 5 PID: 90 Comm: kworker/5:1 Kdump: loaded Tainted: G           OE      6.6.1-1 #1
[ 4277.392113] Workqueue: xfs-inodegc/vda1 xfs_inodegc_worker [xfs]
[ 4277.393551] RIP: 0010:__list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.394518] Code: 48 91 82 e8 37 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 28 49 91 82 e8 26 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 58 49 91
[ 4277.397292] RSP: 0018:ffffc90000333b38 EFLAGS: 00010082
[ 4277.398202] RAX: 000000000000004e RBX: ffffea00044b3e50 RCX: 0000000000000000
[ 4277.399340] RDX: 0000000000000002 RSI: ffffffff828f8715 RDI: 00000000ffffffff
[ 4277.400545] RBP: ffffea00044b3e40 R08: 0000000000000000 R09: ffffc900003339f0
[ 4277.401710] R10: 0000000000000003 R11: ffffffff82d44088 R12: ffff888112cf9910
[ 4277.402887] R13: 0000000000000001 R14: 0000000000000001 R15: ffff8881000424c0
[ 4277.404049] FS:  0000000000000000(0000) GS:ffff88842fd40000(0000) knlGS:0000000000000000
[ 4277.405357] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4277.406389] CR2: 00007f2ad0b24000 CR3: 0000000102a3a006 CR4: 00000000007706e0
[ 4277.407589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4277.408780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4277.410000] PKRU: 55555554
[ 4277.410645] Call Trace:
[ 4277.411234]  <TASK>
[ 4277.411777]  ? die+0x32/0x80
[ 4277.412439]  ? do_trap+0xd6/0x100
[ 4277.413150]  ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.414158]  ? do_error_trap+0x6a/0x90
[ 4277.414948]  ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.415915]  ? exc_invalid_op+0x4c/0x60
[ 4277.416710]  ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.417675]  ? asm_exc_invalid_op+0x16/0x20
[ 4277.418482]  ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.419466]  ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.420410]  free_to_partial_list+0x515/0x5e0
[ 4277.421242]  ? xfs_iext_remove+0x41a/0xa10 [xfs]
[ 4277.422298]  xfs_iext_remove+0x41a/0xa10 [xfs]
[ 4277.423316]  ? xfs_inodegc_worker+0xb4/0x1a0 [xfs]
[ 4277.424383]  xfs_bmap_del_extent_delay+0x4fe/0x7d0 [xfs]
[ 4277.425490]  __xfs_bunmapi+0x50d/0x840 [xfs]
[ 4277.426445]  xfs_itruncate_extents_flags+0x13a/0x490 [xfs]
[ 4277.427553]  xfs_inactive_truncate+0xa3/0x120 [xfs]
[ 4277.428567]  xfs_inactive+0x22d/0x290 [xfs]
[ 4277.429500]  xfs_inodegc_worker+0xb4/0x1a0 [xfs]
[ 4277.430479]  process_one_work+0x171/0x340
[ 4277.431227]  worker_thread+0x277/0x390
[ 4277.431962]  ? __pfx_worker_thread+0x10/0x10
[ 4277.432752]  kthread+0xf0/0x120
[ 4277.433382]  ? __pfx_kthread+0x10/0x10
[ 4277.434134]  ret_from_fork+0x2d/0x50
[ 4277.434837]  ? __pfx_kthread+0x10/0x10
[ 4277.435566]  ret_from_fork_asm+0x1b/0x30
[ 4277.436280]  </TASK>

v2:
* Call remove_partial() and add_full() only for slab folios.

v1:
https://lore.kernel.org/linux-mm/20241006044755.79634-1-yuan.gao@ucloud.cn/

Signed-off-by: yuan.gao <yuan.gao@ucloud.cn>
---
 mm/slub.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 21f71cb6cc06..2992388c00f4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2745,7 +2745,10 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
 	slab->inuse++;
 
 	if (!alloc_debug_processing(s, slab, object, orig_size)) {
-		remove_partial(n, slab);
+		if (folio_test_slab(slab_folio(slab))) {
+			remove_partial(n, slab);
+			add_full(s, n, slab);
+		}
 		return NULL;
 	}
 
-- 
2.32.0

Re: [PATCH v2] mm/slub: Avoid list corruption when removing a slab from the full list

Posted by Vlastimil Babka 1 year, 4 months ago

On 10/7/24 11:18 AM, yuan.gao wrote:
> boot with slub_debug=UFPZ.
> 
> If allocated object failed in alloc_consistency_checks, all objects of
> the slab will be marked as used, and then the slab will be removed from
> the partial list.
> 
> When an object belonging to the slab got freed later, the remove_full()
> function is called. Because the slab is neither on the partial list nor
> on the full list, it eventually lead to a list corruption.
> 
> So we need to add the slab to full list in this case.
> 
> [ 4277.385669] list_del corruption, ffffea00044b3e50->next is LIST_POISON1 (dead000000000100)
> [ 4277.387023] ------------[ cut here ]------------
> [ 4277.387880] kernel BUG at lib/list_debug.c:56!
> [ 4277.388680] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> [ 4277.389562] CPU: 5 PID: 90 Comm: kworker/5:1 Kdump: loaded Tainted: G           OE      6.6.1-1 #1
> [ 4277.392113] Workqueue: xfs-inodegc/vda1 xfs_inodegc_worker [xfs]
> [ 4277.393551] RIP: 0010:__list_del_entry_valid_or_report+0x7b/0xc0
> [ 4277.394518] Code: 48 91 82 e8 37 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 28 49 91 82 e8 26 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 58 49 91
> [ 4277.397292] RSP: 0018:ffffc90000333b38 EFLAGS: 00010082
> [ 4277.398202] RAX: 000000000000004e RBX: ffffea00044b3e50 RCX: 0000000000000000
> [ 4277.399340] RDX: 0000000000000002 RSI: ffffffff828f8715 RDI: 00000000ffffffff
> [ 4277.400545] RBP: ffffea00044b3e40 R08: 0000000000000000 R09: ffffc900003339f0
> [ 4277.401710] R10: 0000000000000003 R11: ffffffff82d44088 R12: ffff888112cf9910
> [ 4277.402887] R13: 0000000000000001 R14: 0000000000000001 R15: ffff8881000424c0
> [ 4277.404049] FS:  0000000000000000(0000) GS:ffff88842fd40000(0000) knlGS:0000000000000000
> [ 4277.405357] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4277.406389] CR2: 00007f2ad0b24000 CR3: 0000000102a3a006 CR4: 00000000007706e0
> [ 4277.407589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 4277.408780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 4277.410000] PKRU: 55555554
> [ 4277.410645] Call Trace:
> [ 4277.411234]  <TASK>
> [ 4277.411777]  ? die+0x32/0x80
> [ 4277.412439]  ? do_trap+0xd6/0x100
> [ 4277.413150]  ? __list_del_entry_valid_or_report+0x7b/0xc0
> [ 4277.414158]  ? do_error_trap+0x6a/0x90
> [ 4277.414948]  ? __list_del_entry_valid_or_report+0x7b/0xc0
> [ 4277.415915]  ? exc_invalid_op+0x4c/0x60
> [ 4277.416710]  ? __list_del_entry_valid_or_report+0x7b/0xc0
> [ 4277.417675]  ? asm_exc_invalid_op+0x16/0x20
> [ 4277.418482]  ? __list_del_entry_valid_or_report+0x7b/0xc0
> [ 4277.419466]  ? __list_del_entry_valid_or_report+0x7b/0xc0
> [ 4277.420410]  free_to_partial_list+0x515/0x5e0
> [ 4277.421242]  ? xfs_iext_remove+0x41a/0xa10 [xfs]
> [ 4277.422298]  xfs_iext_remove+0x41a/0xa10 [xfs]
> [ 4277.423316]  ? xfs_inodegc_worker+0xb4/0x1a0 [xfs]
> [ 4277.424383]  xfs_bmap_del_extent_delay+0x4fe/0x7d0 [xfs]
> [ 4277.425490]  __xfs_bunmapi+0x50d/0x840 [xfs]
> [ 4277.426445]  xfs_itruncate_extents_flags+0x13a/0x490 [xfs]
> [ 4277.427553]  xfs_inactive_truncate+0xa3/0x120 [xfs]
> [ 4277.428567]  xfs_inactive+0x22d/0x290 [xfs]
> [ 4277.429500]  xfs_inodegc_worker+0xb4/0x1a0 [xfs]
> [ 4277.430479]  process_one_work+0x171/0x340
> [ 4277.431227]  worker_thread+0x277/0x390
> [ 4277.431962]  ? __pfx_worker_thread+0x10/0x10
> [ 4277.432752]  kthread+0xf0/0x120
> [ 4277.433382]  ? __pfx_kthread+0x10/0x10
> [ 4277.434134]  ret_from_fork+0x2d/0x50
> [ 4277.434837]  ? __pfx_kthread+0x10/0x10
> [ 4277.435566]  ret_from_fork_asm+0x1b/0x30
> [ 4277.436280]  </TASK>
> 
> v2:
> * Call remove_partial() and add_full() only for slab folios.
> 
> v1:
> https://lore.kernel.org/linux-mm/20241006044755.79634-1-yuan.gao@ucloud.cn/
> 
> Signed-off-by: yuan.gao <yuan.gao@ucloud.cn>

Is it possible to determine which commit introduced this issue, for a
stable cc?

Also in addition to what Hyeonggon proposed, we should perhaps mark
these consistency-failed slabs in a way that further freeing of objects
in them will also don't actually free anything, and thus not move the
slab back from full to partial list for further reuse. Right now the
(understandable) attempt to not use the corrupted slab further is only
partially successful.

> ---
>  mm/slub.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 21f71cb6cc06..2992388c00f4 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2745,7 +2745,10 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
>  	slab->inuse++;
>  
>  	if (!alloc_debug_processing(s, slab, object, orig_size)) {
> -		remove_partial(n, slab);
> +		if (folio_test_slab(slab_folio(slab))) {
> +			remove_partial(n, slab);
> +			add_full(s, n, slab);
> +		}
>  		return NULL;
>  	}
>

Re: [PATCH v2] mm/slub: Avoid list corruption when removing a slab from the full list

Posted by Hyeonggon Yoo 1 year, 4 months ago

On Mon, Oct 7, 2024 at 11:29 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 10/7/24 11:18 AM, yuan.gao wrote:
> > boot with slub_debug=UFPZ.
> >
> > If allocated object failed in alloc_consistency_checks, all objects of
> > the slab will be marked as used, and then the slab will be removed from
> > the partial list.
> >
> > When an object belonging to the slab got freed later, the remove_full()
> > function is called. Because the slab is neither on the partial list nor
> > on the full list, it eventually lead to a list corruption.
> >
> > So we need to add the slab to full list in this case.
> >
> > [ 4277.385669] list_del corruption, ffffea00044b3e50->next is LIST_POISON1 (dead000000000100)
> > [ 4277.387023] ------------[ cut here ]------------
> > [ 4277.387880] kernel BUG at lib/list_debug.c:56!
> > [ 4277.388680] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> > [ 4277.389562] CPU: 5 PID: 90 Comm: kworker/5:1 Kdump: loaded Tainted: G           OE      6.6.1-1 #1
> > [ 4277.392113] Workqueue: xfs-inodegc/vda1 xfs_inodegc_worker [xfs]
> > [ 4277.393551] RIP: 0010:__list_del_entry_valid_or_report+0x7b/0xc0
> > [ 4277.394518] Code: 48 91 82 e8 37 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 28 49 91 82 e8 26 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 58 49 91
> > [ 4277.397292] RSP: 0018:ffffc90000333b38 EFLAGS: 00010082
> > [ 4277.398202] RAX: 000000000000004e RBX: ffffea00044b3e50 RCX: 0000000000000000
> > [ 4277.399340] RDX: 0000000000000002 RSI: ffffffff828f8715 RDI: 00000000ffffffff
> > [ 4277.400545] RBP: ffffea00044b3e40 R08: 0000000000000000 R09: ffffc900003339f0
> > [ 4277.401710] R10: 0000000000000003 R11: ffffffff82d44088 R12: ffff888112cf9910
> > [ 4277.402887] R13: 0000000000000001 R14: 0000000000000001 R15: ffff8881000424c0
> > [ 4277.404049] FS:  0000000000000000(0000) GS:ffff88842fd40000(0000) knlGS:0000000000000000
> > [ 4277.405357] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 4277.406389] CR2: 00007f2ad0b24000 CR3: 0000000102a3a006 CR4: 00000000007706e0
> > [ 4277.407589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 4277.408780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 4277.410000] PKRU: 55555554
> > [ 4277.410645] Call Trace:
> > [ 4277.411234]  <TASK>
> > [ 4277.411777]  ? die+0x32/0x80
> > [ 4277.412439]  ? do_trap+0xd6/0x100
> > [ 4277.413150]  ? __list_del_entry_valid_or_report+0x7b/0xc0
> > [ 4277.414158]  ? do_error_trap+0x6a/0x90
> > [ 4277.414948]  ? __list_del_entry_valid_or_report+0x7b/0xc0
> > [ 4277.415915]  ? exc_invalid_op+0x4c/0x60
> > [ 4277.416710]  ? __list_del_entry_valid_or_report+0x7b/0xc0
> > [ 4277.417675]  ? asm_exc_invalid_op+0x16/0x20
> > [ 4277.418482]  ? __list_del_entry_valid_or_report+0x7b/0xc0
> > [ 4277.419466]  ? __list_del_entry_valid_or_report+0x7b/0xc0
> > [ 4277.420410]  free_to_partial_list+0x515/0x5e0
> > [ 4277.421242]  ? xfs_iext_remove+0x41a/0xa10 [xfs]
> > [ 4277.422298]  xfs_iext_remove+0x41a/0xa10 [xfs]
> > [ 4277.423316]  ? xfs_inodegc_worker+0xb4/0x1a0 [xfs]
> > [ 4277.424383]  xfs_bmap_del_extent_delay+0x4fe/0x7d0 [xfs]
> > [ 4277.425490]  __xfs_bunmapi+0x50d/0x840 [xfs]
> > [ 4277.426445]  xfs_itruncate_extents_flags+0x13a/0x490 [xfs]
> > [ 4277.427553]  xfs_inactive_truncate+0xa3/0x120 [xfs]
> > [ 4277.428567]  xfs_inactive+0x22d/0x290 [xfs]
> > [ 4277.429500]  xfs_inodegc_worker+0xb4/0x1a0 [xfs]
> > [ 4277.430479]  process_one_work+0x171/0x340
> > [ 4277.431227]  worker_thread+0x277/0x390
> > [ 4277.431962]  ? __pfx_worker_thread+0x10/0x10
> > [ 4277.432752]  kthread+0xf0/0x120
> > [ 4277.433382]  ? __pfx_kthread+0x10/0x10
> > [ 4277.434134]  ret_from_fork+0x2d/0x50
> > [ 4277.434837]  ? __pfx_kthread+0x10/0x10
> > [ 4277.435566]  ret_from_fork_asm+0x1b/0x30
> > [ 4277.436280]  </TASK>
> >
> > v2:
> > * Call remove_partial() and add_full() only for slab folios.
> >
> > v1:
> > https://lore.kernel.org/linux-mm/20241006044755.79634-1-yuan.gao@ucloud.cn/
> >
> > Signed-off-by: yuan.gao <yuan.gao@ucloud.cn>
>
> Is it possible to determine which commit introduced this issue, for a
> stable cc?

By code inspection I suspect it's around when SLUB's first introduced in 2007,
more specifically commit 643b113849d8 ("slub: enable tracking of full slabs").
Even v2.6 kernels do not seem to handle this case correctly.

> Also in addition to what Hyeonggon proposed, we should perhaps mark
> these consistency-failed slabs in a way that further freeing of objects
> in them will also don't actually free anything, and thus not move the
> slab back from full to partial list for further reuse.

Yeah I was thinking of that too.

If that is feasible Yuan you may use one bit from (in mm/slab.h)
struct slab's 'inuse' field
and change it to 15 bits to mark consistency-failed slabs.

IIUC 'inuse' doesn't have to be 16 bits and 'objects' is already 15
bits, so I think it should be fine.

> Right now the
> (understandable) attempt to not use the corrupted slab further is only
> partially successful.

Best,
Hyeonggon

> > ---
> >  mm/slub.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 21f71cb6cc06..2992388c00f4 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2745,7 +2745,10 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
> >       slab->inuse++;
> >
> >       if (!alloc_debug_processing(s, slab, object, orig_size)) {
> > -             remove_partial(n, slab);
> > +             if (folio_test_slab(slab_folio(slab))) {
> > +                     remove_partial(n, slab);
> > +                     add_full(s, n, slab);
> > +             }
> >               return NULL;
> >       }
> >

Re: [PATCH v2] mm/slub: Avoid list corruption when removing a slab from the full list

Posted by Vlastimil Babka 1 year, 4 months ago

On 10/7/24 17:03, Hyeonggon Yoo wrote:
> On Mon, Oct 7, 2024 at 11:29 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> On 10/7/24 11:18 AM, yuan.gao wrote:
>> > boot with slub_debug=UFPZ.
>> >
>> > If allocated object failed in alloc_consistency_checks, all objects of
>> > the slab will be marked as used, and then the slab will be removed from
>> > the partial list.
>> >
>> > When an object belonging to the slab got freed later, the remove_full()
>> > function is called. Because the slab is neither on the partial list nor
>> > on the full list, it eventually lead to a list corruption.
>> >
>> > So we need to add the slab to full list in this case.
>> >
>> > [ 4277.385669] list_del corruption, ffffea00044b3e50->next is LIST_POISON1 (dead000000000100)
>> > [ 4277.387023] ------------[ cut here ]------------
>> > [ 4277.387880] kernel BUG at lib/list_debug.c:56!
>> > [ 4277.388680] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>> > [ 4277.389562] CPU: 5 PID: 90 Comm: kworker/5:1 Kdump: loaded Tainted: G           OE      6.6.1-1 #1
>> > [ 4277.392113] Workqueue: xfs-inodegc/vda1 xfs_inodegc_worker [xfs]
>> > [ 4277.393551] RIP: 0010:__list_del_entry_valid_or_report+0x7b/0xc0
>> > [ 4277.394518] Code: 48 91 82 e8 37 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 28 49 91 82 e8 26 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 58 49 91
>> > [ 4277.397292] RSP: 0018:ffffc90000333b38 EFLAGS: 00010082
>> > [ 4277.398202] RAX: 000000000000004e RBX: ffffea00044b3e50 RCX: 0000000000000000
>> > [ 4277.399340] RDX: 0000000000000002 RSI: ffffffff828f8715 RDI: 00000000ffffffff
>> > [ 4277.400545] RBP: ffffea00044b3e40 R08: 0000000000000000 R09: ffffc900003339f0
>> > [ 4277.401710] R10: 0000000000000003 R11: ffffffff82d44088 R12: ffff888112cf9910
>> > [ 4277.402887] R13: 0000000000000001 R14: 0000000000000001 R15: ffff8881000424c0
>> > [ 4277.404049] FS:  0000000000000000(0000) GS:ffff88842fd40000(0000) knlGS:0000000000000000
>> > [ 4277.405357] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > [ 4277.406389] CR2: 00007f2ad0b24000 CR3: 0000000102a3a006 CR4: 00000000007706e0
>> > [ 4277.407589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > [ 4277.408780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > [ 4277.410000] PKRU: 55555554
>> > [ 4277.410645] Call Trace:
>> > [ 4277.411234]  <TASK>
>> > [ 4277.411777]  ? die+0x32/0x80
>> > [ 4277.412439]  ? do_trap+0xd6/0x100
>> > [ 4277.413150]  ? __list_del_entry_valid_or_report+0x7b/0xc0
>> > [ 4277.414158]  ? do_error_trap+0x6a/0x90
>> > [ 4277.414948]  ? __list_del_entry_valid_or_report+0x7b/0xc0
>> > [ 4277.415915]  ? exc_invalid_op+0x4c/0x60
>> > [ 4277.416710]  ? __list_del_entry_valid_or_report+0x7b/0xc0
>> > [ 4277.417675]  ? asm_exc_invalid_op+0x16/0x20
>> > [ 4277.418482]  ? __list_del_entry_valid_or_report+0x7b/0xc0
>> > [ 4277.419466]  ? __list_del_entry_valid_or_report+0x7b/0xc0
>> > [ 4277.420410]  free_to_partial_list+0x515/0x5e0
>> > [ 4277.421242]  ? xfs_iext_remove+0x41a/0xa10 [xfs]
>> > [ 4277.422298]  xfs_iext_remove+0x41a/0xa10 [xfs]
>> > [ 4277.423316]  ? xfs_inodegc_worker+0xb4/0x1a0 [xfs]
>> > [ 4277.424383]  xfs_bmap_del_extent_delay+0x4fe/0x7d0 [xfs]
>> > [ 4277.425490]  __xfs_bunmapi+0x50d/0x840 [xfs]
>> > [ 4277.426445]  xfs_itruncate_extents_flags+0x13a/0x490 [xfs]
>> > [ 4277.427553]  xfs_inactive_truncate+0xa3/0x120 [xfs]
>> > [ 4277.428567]  xfs_inactive+0x22d/0x290 [xfs]
>> > [ 4277.429500]  xfs_inodegc_worker+0xb4/0x1a0 [xfs]
>> > [ 4277.430479]  process_one_work+0x171/0x340
>> > [ 4277.431227]  worker_thread+0x277/0x390
>> > [ 4277.431962]  ? __pfx_worker_thread+0x10/0x10
>> > [ 4277.432752]  kthread+0xf0/0x120
>> > [ 4277.433382]  ? __pfx_kthread+0x10/0x10
>> > [ 4277.434134]  ret_from_fork+0x2d/0x50
>> > [ 4277.434837]  ? __pfx_kthread+0x10/0x10
>> > [ 4277.435566]  ret_from_fork_asm+0x1b/0x30
>> > [ 4277.436280]  </TASK>
>> >
>> > v2:
>> > * Call remove_partial() and add_full() only for slab folios.
>> >
>> > v1:
>> > https://lore.kernel.org/linux-mm/20241006044755.79634-1-yuan.gao@ucloud.cn/
>> >
>> > Signed-off-by: yuan.gao <yuan.gao@ucloud.cn>
>>
>> Is it possible to determine which commit introduced this issue, for a
>> stable cc?
> 
> By code inspection I suspect it's around when SLUB's first introduced in 2007,
> more specifically commit 643b113849d8 ("slub: enable tracking of full slabs").
> Even v2.6 kernels do not seem to handle this case correctly.
> 
>> Also in addition to what Hyeonggon proposed, we should perhaps mark
>> these consistency-failed slabs in a way that further freeing of objects
>> in them will also don't actually free anything, and thus not move the
>> slab back from full to partial list for further reuse.
> 
> Yeah I was thinking of that too.
> 
> If that is feasible Yuan you may use one bit from (in mm/slab.h)
> struct slab's 'inuse' field
> and change it to 15 bits to mark consistency-failed slabs.
> 
> IIUC 'inuse' doesn't have to be 16 bits and 'objects' is already 15
> bits, so I think it should be fine.

AFAICS we are in a rather comfortable position for the debug caches which
avoid all the fastpaths, and we could simply reuse the frozen bit or invent
a special value for slab->freelist?

Either way it should boil down to free_debug_processing() detecting such a
slab (or perhaps in check_slab()) and returning false/0 immediately which
would avoid further actions on the slab.

If it was done in check_slab() then free_debug_processing() would at least
report which of the objects in a broken slab were attempted to be freed
after it was marked broken, which could perhaps be useful:

   slab_fix(s, "Object at 0x%p not freed", object);

Vlastimil

>> Right now the
>> (understandable) attempt to not use the corrupted slab further is only
>> partially successful.
> 
> Best,
> Hyeonggon
> 
>> > ---
>> >  mm/slub.c | 5 ++++-
>> >  1 file changed, 4 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/mm/slub.c b/mm/slub.c
>> > index 21f71cb6cc06..2992388c00f4 100644
>> > --- a/mm/slub.c
>> > +++ b/mm/slub.c
>> > @@ -2745,7 +2745,10 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
>> >       slab->inuse++;
>> >
>> >       if (!alloc_debug_processing(s, slab, object, orig_size)) {
>> > -             remove_partial(n, slab);
>> > +             if (folio_test_slab(slab_folio(slab))) {
>> > +                     remove_partial(n, slab);
>> > +                     add_full(s, n, slab);
>> > +             }
>> >               return NULL;
>> >       }
>> >

Re: [PATCH v2] mm/slub: Avoid list corruption when removing a slab from the full list

Posted by Christoph Lameter (Ampere) 1 year, 4 months ago

On Tue, 8 Oct 2024, Hyeonggon Yoo wrote:

> > Is it possible to determine which commit introduced this issue, for a
> > stable cc?
>
> By code inspection I suspect it's around when SLUB's first introduced in 2007,
> more specifically commit 643b113849d8 ("slub: enable tracking of full slabs").
> Even v2.6 kernels do not seem to handle this case correctly.

Yes this is an error that was there in the beginning. Its a rare
condition that only occurs when debugging is enabled so its difficult to
trigger IRL.

> > Also in addition to what Hyeonggon proposed, we should perhaps mark
> > these consistency-failed slabs in a way that further freeing of objects
> > in them will also don't actually free anything, and thus not move the
> > slab back from full to partial list for further reuse.
>
> Yeah I was thinking of that too.
>

Right. Stop any processing on the slab page with metadata corruption.

Re: [PATCH v2] mm/slub: Avoid list corruption when removing a slab from the full list

Posted by Christoph Lameter (Ampere) 1 year, 4 months ago

list_del() in remove_partial() sets poison values for next/prev so there
should no list
corruption but a failure showing the poison values.

static inline void list_del(struct list_head *entry)
{
        __list_del_entry(entry);
        entry->next = LIST_POISON1;
        entry->prev = LIST_POISON2;
}

Re: [PATCH v2] mm/slub: Avoid list corruption when removing a slab from the full list

Posted by Vlastimil Babka 1 year, 4 months ago

On 10/7/24 18:40, Christoph Lameter (Ampere) wrote:
> 
> list_del() in remove_partial() sets poison values for next/prev so there
> should no list
> corruption but a failure showing the poison values.

Yeah that's what is reported, but there's still a mention of list corruption:

[ 4277.385669] list_del corruption, ffffea00044b3e50->next is LIST_POISON1 (dead000000000100)
 
> static inline void list_del(struct list_head *entry)
> {
>         __list_del_entry(entry);
>         entry->next = LIST_POISON1;
>         entry->prev = LIST_POISON2;
> }
>

Re: [PATCH v2] mm/slub: Avoid list corruption when removing a slab from the full list

Posted by yuan.gao 1 year, 4 months ago

On 24/10/08 11:17AM, Vlastimil Babka wrote:
> On 10/7/24 18:40, Christoph Lameter (Ampere) wrote:
> > 
> > list_del() in remove_partial() sets poison values for next/prev so there
> > should no list
> > corruption but a failure showing the poison values.
> 
> Yeah that's what is reported, but there's still a mention of list corruption:
> 
> [ 4277.385669] list_del corruption, ffffea00044b3e50->next is LIST_POISON1 (dead000000000100)
>  
> > static inline void list_del(struct list_head *entry)
> > {
> >         __list_del_entry(entry);
> >         entry->next = LIST_POISON1;
> >         entry->prev = LIST_POISON2;
> > }
> > 
> 

Actually, if panic_on_oops=0, kernel will hang on my x86 system. The
task(kworker here) that triggers the bug dies in the #ud trap handler.

Thanks