Running
rm -f /etc/test-file.bin
dd if=/dev/urandom of=/etc/test-file.bin bs=1M count=60 conv=fsync
in a loop, with `CONFIG_UBIFS_FS_AUTHENTICATION`, KASAN reports:
BUG: KASAN: use-after-free in ubifs_tnc_end_commit+0xa5c/0x1950
Write of size 32 at addr ffffff800a3af86c by task ubifs_bgt0_20/153
Call trace:
dump_backtrace+0x0/0x340
show_stack+0x18/0x24
dump_stack_lvl+0x9c/0xbc
print_address_description.constprop.0+0x74/0x2b0
kasan_report+0x1d8/0x1f0
kasan_check_range+0xf8/0x1a0
memcpy+0x84/0xf4
ubifs_tnc_end_commit+0xa5c/0x1950
do_commit+0x4e0/0x1340
ubifs_bg_thread+0x234/0x2e0
kthread+0x36c/0x410
ret_from_fork+0x10/0x20
Allocated by task 401:
kasan_save_stack+0x38/0x70
__kasan_kmalloc+0x8c/0xd0
__kmalloc+0x34c/0x5bc
tnc_insert+0x140/0x16a4
ubifs_tnc_add+0x370/0x52c
ubifs_jnl_write_data+0x5d8/0x870
do_writepage+0x36c/0x510
ubifs_writepage+0x190/0x4dc
__writepage+0x58/0x154
write_cache_pages+0x394/0x830
do_writepages+0x1f0/0x5b0
filemap_fdatawrite_wbc+0x170/0x25c
file_write_and_wait_range+0x140/0x190
ubifs_fsync+0xe8/0x290
vfs_fsync_range+0xc0/0x1e4
do_fsync+0x40/0x90
__arm64_sys_fsync+0x34/0x50
invoke_syscall.constprop.0+0xa8/0x260
do_el0_svc+0xc8/0x1f0
el0_svc+0x34/0x70
el0t_64_sync_handler+0x108/0x114
el0t_64_sync+0x1a4/0x1a8
Freed by task 403:
kasan_save_stack+0x38/0x70
kasan_set_track+0x28/0x40
kasan_set_free_info+0x28/0x4c
__kasan_slab_free+0xd4/0x13c
kfree+0xc4/0x3a0
tnc_delete+0x3f4/0xe40
ubifs_tnc_remove_range+0x368/0x73c
ubifs_tnc_remove_ino+0x29c/0x2e0
ubifs_jnl_delete_inode+0x150/0x260
ubifs_evict_inode+0x1d4/0x2e4
evict+0x1c8/0x450
iput+0x2a0/0x3c4
do_unlinkat+0x2cc/0x490
__arm64_sys_unlinkat+0x90/0x100
invoke_syscall.constprop.0+0xa8/0x260
do_el0_svc+0xc8/0x1f0
el0_svc+0x34/0x70
el0t_64_sync_handler+0x108/0x114
el0t_64_sync+0x1a4/0x1a8
The offending `memcpy` is in `ubifs_copy_hash()`. Fix this by checking
if the `znode` is obsolete before accessing the hash (just like we do
for `znode->parent`).
Fixes: 16a26b20d2af ("ubifs: authentication: Add hashes to index nodes")
Signed-off-by: Waqar Hameed <waqar.hameed@axis.com>
---
I'm not entirely sure if this is the _correct_ way to fix this. However,
testing shows that the problem indeed disappears.
My understanding is that the `znode` could concurrently be deleted (with
a reference in an unprotected code section without `tnc_mutex`). The
assumption is that in this case it would be sufficient to check
`ubifs_zn_obsolete(znode)`, as in the if-statement for
`znode->parent` just below.
I'll be happy to get any helpful feedback!
fs/ubifs/tnc_commit.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/ubifs/tnc_commit.c b/fs/ubifs/tnc_commit.c
index a55e04822d16..0b358254272b 100644
--- a/fs/ubifs/tnc_commit.c
+++ b/fs/ubifs/tnc_commit.c
@@ -891,8 +891,10 @@ static int write_index(struct ubifs_info *c)
 		mutex_lock(&c->tnc_mutex);
 
 		if (znode->cparent)
-			ubifs_copy_hash(c, hash,
-					znode->cparent->zbranch[znode->ciip].hash);
+			if (!ubifs_zn_obsolete(znode))
+				ubifs_copy_hash(c, hash,
+						znode->cparent->zbranch[znode->ciip]
+							.hash);
 
 		if (znode->parent) {
 			if (!ubifs_zn_obsolete(znode))
--
2.39.2
On 2024/10/9 22:46, Waqar Hameed wrote:
> Running
>
>    rm -f /etc/test-file.bin
>    dd if=/dev/urandom of=/etc/test-file.bin bs=1M count=60 conv=fsync
>
> in a loop, with `CONFIG_UBIFS_FS_AUTHENTICATION`, KASAN reports:
>
>    BUG: KASAN: use-after-free in ubifs_tnc_end_commit+0xa5c/0x1950
>    Write of size 32 at addr ffffff800a3af86c by task ubifs_bgt0_20/153
>
> [...]
>
>    Freed by task 403:
>     kasan_save_stack+0x38/0x70
>     kasan_set_track+0x28/0x40
>     kasan_set_free_info+0x28/0x4c
>     __kasan_slab_free+0xd4/0x13c

Hi Waqar, is that line 2639?

2540 static int tnc_delete()
2541 {
2608         if (!znode->parent) {
2609                 while (znode->child_cnt == 1 && znode->level != 0) {
2634                         if (zp->cnext) {
                                     ...
2638                         } else
2639                                 kfree(zp);
2644         }

>     kfree+0xc4/0x3a0
>     tnc_delete+0x3f4/0xe40
>     ubifs_tnc_remove_range+0x368/0x73c
>     ubifs_tnc_remove_ino+0x29c/0x2e0
>     ubifs_jnl_delete_inode+0x150/0x260
>     ubifs_evict_inode+0x1d4/0x2e4
>     evict+0x1c8/0x450
>     iput+0x2a0/0x3c4
>     do_unlinkat+0x2cc/0x490
>     __arm64_sys_unlinkat+0x90/0x100
> [...]

Looks like there is one possibility:

1. tnc_insert() triggers a TNC tree split:

          zroot
          /
       z_p1
       /
      zn

   z_p1 is full. After inserting zn_new (whose key order is smaller than
   zn's) under z_p1, zn->parent is switched to z_p2, but zn->cparent is
   still z_p1:

          zroot
          /   \
       z_p1   z_p2
       /        \
   zn_new        zn

2. tnc_delete() removes all znodes except 'zn':

          zroot
              \
              z_p2
                \
                 zn

   The TNC tree is collapsed, and zroot and z_p2 are freed:

          zroot'(zn)

3. get_znodes_to_commit() finds only one znode (zn, which is now also
   zroot). zn->cparent is not updated and still points to z_p1 (which
   was freed).

4. write_index() accesses zn->cparent->zbranch, which triggers a UAF!

Try the following modification to verify whether the problem is fixed:

diff --git a/fs/ubifs/tnc_commit.c b/fs/ubifs/tnc_commit.c
index a55e04822d16..7c43e0ccf6d4 100644
--- a/fs/ubifs/tnc_commit.c
+++ b/fs/ubifs/tnc_commit.c
@@ -657,6 +657,8 @@ static int get_znodes_to_commit(struct ubifs_info *c)
 		znode->alt = 0;
 		cnext = find_next_dirty(znode);
 		if (!cnext) {
+			ubifs_assert(c, !znode->parent);
+			znode->cparent = NULL;
 			znode->cnext = c->cnext;
 			break;
 		}
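
The four steps of the analysis above can be followed in a small
userspace model. This is only a sketch under stated assumptions, not
kernel code: `struct model_znode`, its `freed` flag (which stands in for
`kfree()`, so nothing is really released) and
`model_commit_hits_freed_cparent()` are hypothetical names made up for
illustration. Only `parent`, `cparent` and the znode names are taken
from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct ubifs_znode. */
struct model_znode {
	struct model_znode *parent;   /* current parent in the TNC */
	struct model_znode *cparent;  /* parent recorded for the commit */
	bool freed;                   /* models kfree(); no memory is released */
};

/*
 * Walk through the scenario. If reset_root_cparent is true, model the
 * suggested change (clear cparent of the last znode to commit).
 * Returns true if the hash copy would dereference a freed cparent.
 */
bool model_commit_hits_freed_cparent(bool reset_root_cparent)
{
	struct model_znode zroot = { 0 }, z_p1 = { 0 }, z_p2 = { 0 }, zn = { 0 };

	/* Earlier commit: zn sits under z_p1 and cparent mirrors parent. */
	z_p1.parent = &zroot;
	zn.parent = &z_p1;
	zn.cparent = &z_p1;

	/* 1. Split: zn moves under z_p2, cparent is left untouched. */
	z_p2.parent = &zroot;
	zn.parent = &z_p2;

	/* 2. Deletions collapse the tree: zn becomes the root, the rest goes. */
	z_p1.freed = true;
	z_p2.freed = true;
	zroot.freed = true;
	zn.parent = NULL;

	/* 3. Commit collection: zn is the only (and last) dirty znode. */
	if (reset_root_cparent)
		zn.cparent = NULL;

	/* 4. Index write: the hash copy uses cparent whenever it is set. */
	return zn.cparent != NULL && zn.cparent->freed;
}
```

Calling `model_commit_hits_freed_cparent(false)` follows the unpatched
path and reports the stale `cparent`; passing `true` models the
suggested `znode->cparent = NULL` and the dereference is skipped.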
On Thu, Nov 07, 2024 at 16:39 +0800 Zhihao Cheng <chengzhihao1@huawei.com> wrote:

> On 2024/10/9 22:46, Waqar Hameed wrote:
>> [...]
>>   Freed by task 403:
>>    kasan_save_stack+0x38/0x70
>>    kasan_set_track+0x28/0x40
>>    kasan_set_free_info+0x28/0x4c
>>    __kasan_slab_free+0xd4/0x13c
>
> Hi Waqar, is that line 2639?
>
> 2540 static int tnc_delete()
> 2541 {
> 2608         if (!znode->parent) {
> 2609                 while (znode->child_cnt == 1 && znode->level != 0) {
> 2634                         if (zp->cnext) {
>                                      ...
> 2638                         } else
> 2639                                 kfree(zp);
> 2644         }

`faddr2line` doesn't work for that func+offset. It just complains with

  bad symbol size: sym_addr: 0xc0830f81 cur_sym_addr: 0xc0831464

The offset and size hint that it should be somewhere at the end of
`tnc_delete()`. I tried with `addr2line`, and the addresses around that
offset point to the `kfree()` on line 2639. So yes, I would say that's
the offending one.

>>    kfree+0xc4/0x3a0
>>    tnc_delete+0x3f4/0xe40
>> [...]
>
> Looks like there is one possibility:
>
> [...]
>
> 3. get_znodes_to_commit() finds only one znode(zn, which is also zroot),
> zn->cparent is not updated and still points to z_p1(which was freed).
> 4. write_index() accesses the zn->cparent->zbranch, which triggers an UAF!
>
> Try following modification to verify whether the problem is fixed:
>
> [...]

Yup, I think this is it! Nice work!

I've started a test run with the following patch:

diff --git a/fs/ubifs/tnc_commit.c b/fs/ubifs/tnc_commit.c
index a55e04822d16..4c8150d5ed65 100644
--- a/fs/ubifs/tnc_commit.c
+++ b/fs/ubifs/tnc_commit.c
@@ -657,6 +657,11 @@ static int get_znodes_to_commit(struct ubifs_info *c)
 		znode->alt = 0;
 		cnext = find_next_dirty(znode);
 		if (!cnext) {
+			ubifs_assert(c, !znode->parent);
+			if (znode->cparent) {
+				printk("%s:%d\n", __func__, __LINE__);
+			}
+			znode->cparent = NULL;
 			znode->cnext = c->cnext;
 			break;
 		}

and can already see that the print was hit a couple of times in a few
hours (250 iterations). I'll let it spin for a little longer (recall
that the other patch "masked" the problem for almost 800 iterations).

This was quite intricate and I really enjoyed your little breakdown.
Thank you very much for the discussions/collaboration! I'll let you know
as soon as I have an updated test result (and thus send a new version).

P.S.
I wonder how many systems might have experienced this use-after-free
and got random memory corruption (or other security issues). This bug
has been there since v4.20.
D.S.
On Thu, Nov 07, 2024 at 23:38 +0100 Waqar Hameed <waqar.hameed@axis.com> wrote:

[...]

> I'll let it spin for a little longer (recall that the other patch
> "masked" the problem for almost 800 iterations).
>
> This was quite intricate and I really enjoyed your little breakdown.
> Thank you very much for the discussions/collaboration! I'll let you know
> as soon as I have an updated test result (and thus send a new version).

It has been running for more than 2100 iterations now (almost 24 hours)
without any issues. I just sent a new version and added you to the
commit footers as well. Hope that's OK!
On 2024/11/8 6:38, Waqar Hameed wrote:
> On Thu, Nov 07, 2024 at 16:39 +0800 Zhihao Cheng <chengzhihao1@huawei.com> wrote:
>
> [...]
>
> P.S.
> I wonder how many systems might have experienced this
> use-after-free and got random memory corruption (or other security
> issues). This bug has been there since v4.20.
> D.S.

Maybe the authentication feature is not widely used.
On 2024/10/9 22:46, Waqar Hameed wrote:
> Running
>
>    rm -f /etc/test-file.bin
>    dd if=/dev/urandom of=/etc/test-file.bin bs=1M count=60 conv=fsync
>
> in a loop, with `CONFIG_UBIFS_FS_AUTHENTICATION`, KASAN reports:
>
>    BUG: KASAN: use-after-free in ubifs_tnc_end_commit+0xa5c/0x1950
>    Write of size 32 at addr ffffff800a3af86c by task ubifs_bgt0_20/153
>
> [...]
>
> The offending `memcpy` is in `ubifs_copy_hash()`. Fix this by checking
> if the `znode` is obsolete before accessing the hash (just like we do
> for `znode->parent`).

Do you mean that the UAF occurs in the following path?

do_commit -> ubifs_tnc_end_commit -> write_index:
  while (1) {
    ...
    znode = cnext;
    ...
    if (znode->cparent)
      ubifs_copy_hash(c, hash,
                      znode->cparent->zbranch[znode->ciip].hash);
      // znode->cparent has been freed!
  }

If so, according to the current implementation (the latest Linux kernel
is v6.12), I cannot understand how znode->cparent can be freed before
write_index() has finished; it looks impossible.
All dirty znodes are gathered by get_znodes_to_commit(), which is
protected by c->tnc_mutex, and the 'cparent' member of every dirty znode
is assigned a non-NULL value. So I think the znode freeing path
'tnc_delete->kfree(znode)' cannot happen, because:
1) If a znode is dirtied, all its ancestor znodes (the path from the
   znode to the root znode) must be dirtied too, which is guaranteed by
   UBIFS. See dirty_cow_bottom_up/lookup_level0_dirty.
2) A dirty znode waiting for commit cannot be freed before write_index()
   has finished, which is guaranteed by tnc_delete:
     if (znode->cnext) {
       __set_bit(OBSOLETE_ZNODE, &znode->flags);
       ...
     } else {
       kfree(znode);
     }

>
> Fixes: 16a26b20d2af ("ubifs: authentication: Add hashes to index nodes")
> Signed-off-by: Waqar Hameed <waqar.hameed@axis.com>
> ---
> I'm not entirely sure if this is the _correct_ way to fix this. However,
> testing shows that the problem indeed disappears.
>
> My understanding is that the `znode` could concurrently be deleted (with
> a reference in an unprotected code section without `tnc_mutex`). The
> assumption is that in this case it would be sufficient to check
> `ubifs_zn_obsolete(znode)`, like as in the if-statement for
> `znode->parent` just below.

I have been analyzing the TNC-related code these days, but I can't find
any place that may concurrently operate on the same znode. I also cannot
reproduce the problem with your reproducer:

while true; do
    rm -f /UBIFS_MNT/test-file.bin
    dd if=/dev/urandom of=/UBIFS_MNT/test-file.bin bs=1M count=60 conv=fsync
done

Can you dig deeper by adding more debug messages, so that we can figure
out what is really happening?

>
> I'll be happy to get any helpful feedback!
>
> [...]
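
Point 2) above, the `tnc_delete()` rule that a znode still queued for
commit is only marked obsolete, can be sketched in userspace as follows.
This is a minimal model and not the kernel implementation:
`struct queued_znode`, `model_release_znode()` and
`model_is_freed_after_delete()` are hypothetical names, and the `freed`
flag merely stands in for `kfree()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of a znode that matter here. */
struct queued_znode {
	struct queued_znode *cnext;  /* non-NULL while queued for commit */
	bool obsolete;               /* models the OBSOLETE_ZNODE flag */
	bool freed;                  /* models kfree(); no memory is released */
};

/* Mirror of the quoted rule: defer the free while a commit needs the znode. */
void model_release_znode(struct queued_znode *znode)
{
	if (znode->cnext)
		znode->obsolete = true;  /* the commit path disposes of it later */
	else
		znode->freed = true;     /* nobody else holds it: free right away */
}

/* Delete one znode and report whether it was freed immediately. */
bool model_is_freed_after_delete(bool queued_for_commit)
{
	struct queued_znode next_in_list = { 0 };
	struct queued_znode victim = { 0 };

	if (queued_for_commit)
		victim.cnext = &next_in_list;

	model_release_znode(&victim);
	return victim.freed;
}
```

The rule only protects the znode that is itself on the commit list. It
says nothing about a pointer, such as `cparent`, that some other queued
znode may still hold to a znode that was not queued.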
On Sat, Oct 12, 2024 at 20:30 +0800 Zhihao Cheng <chengzhihao1@huawei.com> wrote:

> On 2024/10/9 22:46, Waqar Hameed wrote:
>> [...]
>> The offending `memcpy` is in `ubifs_copy_hash()`. Fix this by checking
>> if the `znode` is obsolete before accessing the hash (just like we do
>> for `znode->parent`).
>
> Do you mean that the UAF occurs in following path:
> do_commit -> ubifs_tnc_end_commit -> write_index:
>   while (1) {
>     ...
>     znode = cnext;
>     ...
>     if (znode->cparent)
>       ubifs_copy_hash(c, hash, znode->cparent->zbranch[znode->ciip].hash); //
> znode->cparent has been freed!
>   }

Yes, that's what KASAN reports. It's the `memcpy()` in
`ubifs_copy_hash()` that triggers the slab-use-after-free.

>
> If so, according to the current implementation(lastest linux kernel is v6.12), I
> cannot understand that how the znode->cparent is freed before write_index()
> finished, it looks impossible.
> All dirty znodes are gathered by function get_znodes_to_commit() which is
> protected by c->tnc_mutex, and the 'cparent' member in all dirty znodes is
> assigned with non-NULL. Then I think the znode memory freeing path
> 'tnc_delete->kfree(znode)' cannot happen, because:
> 1) If a znode is dirtied, all its' ancestor znodes(a path from znode to root
> znode) must be dirtied, which is guaranteed by UBIFS. See
> dirty_cow_bottom_up/lookup_level0_dirty.
> 2) A dirty znode waiting for commit cannot be freed before write_index()
> finished, which is guaranteed by tnc_delete:
>   if (znode->cnext) {
>     __set_bit(OBSOLETE_ZNODE, &znode->flags);
>     ...
>   } else {
>     kfree(znode);
>   }

I'm with you here. Initially I thought there was some lock missing
(since it is showing signs of a race, e.g. it is not deterministic). But
as you point out, it is protected with `tnc_mutex`, hence my "RFC" tag on
this patch.

>> [...]
>
> I'm analyzing tnc-related code these days, however I can't find places that may
> concurrently operate the same znode. And I cannot reproduce the problem with
> your reproducer:
> while true; do
>     rm -f /UBIFS_MNT/test-file.bin
>     dd if=/dev/urandom of=/UBIFS_MNT/test-file.bin bs=1M count=60 conv=fsync
> done

For completeness, here are the _exact_ steps that I have used to
reproduce this on my system with v6.12-rc2 (commit 75b607fab38d "Merge
tag 'sched_ext-for-6.12-rc2-fixes' of
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext"):

```
ubiattach -m 2

keyctl add logon dummy_key: dummy_load @us

ubimkvol /dev/ubi0 -s 80MiB -n 0 -N test-vol
ubiupdatevol /dev/ubi0_0 -t

mount -t ubifs /dev/ubi0_0 /mnt/flash -o auth_hash_name=sha256,auth_key=dummy_key:

count=0
while true; do
    date
    count=$(($count + 1))
    echo count=$count

    rm -f /mnt/flash/test-file.bin
    dd if=/dev/urandom of=/mnt/flash/test-file.bin bs=1M count=60 conv=fsync

    echo ""
done
```

Note that you need to have `CONFIG_UBIFS_FS_AUTHENTICATION=y` (and
`CONFIG_KASAN=y` obviously) in your `.config` in order to trigger the
offending `memcpy()` in `ubifs_copy_hash()`. Also, it takes a while. For
example, last time it took me 88 iterations of the above loop before it
triggered. So you might need to let it spin for a while.

>
> Can you dig more deeper by adding more debug message, so that we can figure out
> what is really happening.

Certainly! I could try to enable the debug prints from UBIFS, however
they are *a lot*. Moreover, printing that much changes the timing
behavior and might make it harder to trigger the use-after-free. Do you
have any tips on where we should try to focus the debug prints (a
dynamic debug filter)?

[...]
On 2024/10/16 2:52, Waqar Hameed wrote:

Hi Waqar,

[...]
>
> For completeness, here are the _exact_ steps that I have used to
> reproduce this on my system with v6.12-rc2 (commit 75b607fab38d "Merge
> tag 'sched_ext-for-6.12-rc2-fixes' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext"):
>
> [...]

Thanks for the complete program, I will try it again. BTW, what is the
configuration of your flash (e.g. erase size, page size)?

> Note that you need to have `CONFIG_UBIFS_FS_AUTHENTICATION=y` (and
> `CONFIG_KASAN=y` obviously) in your `.config` in order to trigger the
> offending `memcpy()` in `ubifs_copy_hash()`. Also, it takes a while. For
> example, last time it took me 88 iterations of the above loop before it
> triggered. So you might need to let it spin for a while.

Yes, the above two configs are enabled, and I have added printed
messages to confirm that the authentication path is active.

>
>>
>> Can you dig more deeper by adding more debug message, so that we can figure out
>> what is really happening.
>
> Certainly! I could try to enable the debug prints from UBIFS, however
> they are *a lot*. Moreover, printing that much changes the timing
> behavior and might make it harder to trigger the use-after-free. Do you
> have any tips on where we should try to focus the debug prints (a
> dynamic debug filter).
>

Well, let's do a preliminary analysis.
znode->cparent->zbranch[znode->ciip] is a freed address in
write_index(), which means one of the following:
1. 'znode->ciip' is valid and znode->cparent was freed by tnc_delete.
   However, a znode cannot be freed if its znode->cnext is not NULL,
   which means:
   a) 'znode->cparent' is not dirty. We should add an assertion like
      ubifs_assert(c, ubifs_zn_dirty(znode->cparent)) in
      get_znodes_to_commit(). Note, please check that 'znode->cparent'
      is not NULL before the assertion.
   b) 'znode->cparent' is dirty, but it was not added to the list
      'c->cnext'. We should traverse the entire TNC in
      get_znodes_to_commit() to make sure that all dirty znodes are
      collected into the list 'c->cnext', so another assertion is
      needed.
2. 'znode->ciip' is invalid, and the value is beyond the memory area of
   znode->cparent. All znodes are allocated with a size of
   'c->max_znode_sz', which means that 'znode->ciip' exceeds
   'c->fanout', so we can add an assertion like
   ubifs_assert(c, znode->ciip < c->fanout) in get_znodes_to_commit().

That's what I can think of. Are there any other possibilities?
On Wed, Oct 16, 2024 at 10:11 +0800 Zhihao Cheng <chengzhihao1@huawei.com> wrote:

[...]

> BTW, what is the configuration of your flash?(eg. erase size, page size)?

$ mtdinfo /dev/mtd2
mtd2
Name:                           firmware
Type:                           nand
Eraseblock size:                131072 bytes, 128.0 KiB
Amount of eraseblocks:          1832 (240123904 bytes, 229.0 MiB)
Minimum input/output unit size: 2048 bytes
Sub-page size:                  2048 bytes
OOB size:                       64 bytes
Character device major/minor:   90:4
Bad blocks are allowed:         true
Device is writable:             true

$ ubinfo /dev/ubi0_0
Volume ID:   0 (on ubi0)
Type:        dynamic
Alignment:   1
Size:        661 LEBs (83931136 bytes, 80.0 MiB)
State:       OK
Name:        test-vol
Character device major/minor: 244:1

[...]

> Well, let's do a preliminary analysis.
> The znode->cparent[znode->ciip] is a freed address in write_index(), which
> means:
> [...]
>
> That's what I can think of, are there any other possibilities?

I looked a little more at `get_znodes_to_commit()` when adding the
asserts you suggest, and I have a question: what happens when
`find_next_dirty()` returns `NULL`? In that case

```
znode->cnext = c->cnext;
```

but `znode->cparent` and `znode->ciip` are not updated. Shouldn't they
be?

By the way, I left a test running, and it actually triggered the same
KASAN report after 800 iterations... So at least we now know that this
patch does not actually fix the problem.

I also found another minor thing regarding the update of `cnt` in
`get_znodes_to_commit`. I'll send a separate patch for that.
On 2024/10/18 2:36, Waqar Hameed wrote:
> On Wed, Oct 16, 2024 at 10:11 +0800 Zhihao Cheng <chengzhihao1@huawei.com> wrote:
>
> [...]
>
>> BTW, what is the configuration of your flash?(eg. erase size, page size)?
>
> $ mtdinfo /dev/mtd2
> [...]
>
> $ ubinfo /dev/ubi0_0
> [...]

Thanks, I will change my nandsim configuration to generate an MTD device
of the same model.

>
>> Well, let's do a preliminary analysis.
>> [...]
>>
>> That's what I can think of, are there any other possibilities?
>
> I looked a little more at `get_znodes_to_commit()` when adding the
> asserts you suggest, and I have a question: what happens when
> `find_next_dirty()` returns `NULL`? In that case
>
> ```
> znode->cnext = c->cnext;
> ```
>
> but `znode->cparent` and `znode->ciip` are not updated. Shouldn't they?

Good thinking.
According to the implementation of find_next_dirty(), dirty znodes are
collected bottom-up, which means that the last dirty znode is the root
znode, so it doesn't have a parent. You can verify that by adding an
assertion that checks whether the last dirty znode is the root.

>
> By the way, I left a test running, and it actually triggered the same
> KASAN report after 800 iterations... So we now at least know that this
> patch doesn't indeed fix the problem.
>
> I also found another minor thing regarding the update of `cnt` in
> `get_znodes_to_commit`. I'll send a separate patch for that.
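
The bottom-up collection order described above can be illustrated with a
small userspace walk. This is a simplified sketch loosely modelled on
the description of `find_next_dirty()`, not a copy of the kernel
function: `struct walk_znode`, `MODEL_FANOUT` and all `walk_*()` helpers
are hypothetical, and the model assumes that every ancestor of a dirty
znode is dirty as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_FANOUT 4  /* arbitrary small fanout for the model */

/* Hypothetical, simplified znode: tree links plus a dirty flag. */
struct walk_znode {
	struct walk_znode *parent;
	struct walk_znode *child[MODEL_FANOUT];
	int child_cnt;
	int iip;     /* index of this znode in its parent */
	bool dirty;
};

/* Descend along the first dirty child until none is left. */
struct walk_znode *walk_first_dirty(struct walk_znode *znode)
{
	bool descended = true;

	while (descended) {
		descended = false;
		for (int i = 0; i < znode->child_cnt; i++) {
			if (znode->child[i] && znode->child[i]->dirty) {
				znode = znode->child[i];
				descended = true;
				break;
			}
		}
	}
	return znode->dirty ? znode : NULL;
}

/* Next znode in bottom-up order: a later dirty sibling subtree, else the parent. */
struct walk_znode *walk_next_dirty(struct walk_znode *znode)
{
	struct walk_znode *parent = znode->parent;

	if (!parent)
		return NULL;  /* the root has no successor */
	for (int n = znode->iip + 1; n < parent->child_cnt; n++) {
		if (parent->child[n] && parent->child[n]->dirty)
			return walk_first_dirty(parent->child[n]);
	}
	return parent;
}

/* Attach child to parent in the given slot. */
void walk_link(struct walk_znode *parent, struct walk_znode *child, int slot)
{
	parent->child[slot] = child;
	if (slot >= parent->child_cnt)
		parent->child_cnt = slot + 1;
	child->parent = parent;
	child->iip = slot;
}

/* Build root -> {a -> {leaf1, leaf2}, b}, all dirty, and walk it. */
bool walk_last_dirty_is_root(void)
{
	struct walk_znode root = { .dirty = true }, a = { .dirty = true };
	struct walk_znode b = { .dirty = true };
	struct walk_znode leaf1 = { .dirty = true }, leaf2 = { .dirty = true };
	struct walk_znode *znode, *last = NULL;
	int visited = 0;

	walk_link(&root, &a, 0);
	walk_link(&root, &b, 1);
	walk_link(&a, &leaf1, 0);
	walk_link(&a, &leaf2, 1);

	for (znode = walk_first_dirty(&root); znode; znode = walk_next_dirty(znode)) {
		last = znode;
		visited++;
	}
	/* All five znodes are visited and the root comes last. */
	return visited == 5 && last == &root;
}
```

In this model the walk ends exactly when the current znode has no
parent, which is the property the suggested root assertion would check.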
Sorry for the late response Zhihao! I've been quite busy these days... On Fri, Oct 18, 2024 at 09:40 +0800 Zhihao Cheng <chengzhihao1@huawei.com> wrote: > 在 2024/10/18 2:36, Waqar Hameed 写道: >> On Wed, Oct 16, 2024 at 10:11 +0800 Zhihao Cheng <chengzhihao1@huawei.com> wrote: >> [...] >> >>> BTW, what is the configuration of your flash?(eg. erase size, page size)? >> $ mtdinfo /dev/mtd2 >> mtd2 >> Name: firmware >> Type: nand >> Eraseblock size: 131072 bytes, 128.0 KiB >> Amount of eraseblocks: 1832 (240123904 bytes, 229.0 MiB) >> Minimum input/output unit size: 2048 bytes >> Sub-page size: 2048 bytes >> OOB size: 64 bytes >> Character device major/minor: 90:4 >> Bad blocks are allowed: true >> Device is writable: true >> $ ubinfo /dev/ubi0_0 >> Volume ID: 0 (on ubi0) >> Type: dynamic >> Alignment: 1 >> Size: 661 LEBs (83931136 bytes, 80.0 MiB) >> State: OK >> Name: test-vol >> Character device major/minor: 244:1 >> [...] > > Thanks, I will change my nandsim configurations to generate a mtd device the > same model. Did you manage to reproduce the issue with this? >> >>> Well, let's do a preliminary analysis. >>> The znode->cparent[znode->ciip] is a freed address in write_index(), which >>> means: >>> 1. 'znode->ciip' is valid, znode->cparent is freed by tnc_delete, however znode >>> cannot be freed if znode->cnext is not NULL, which means: >>> a) 'znode->cparent' is not dirty, we should add an assertion like >>> ubifs_assert(c, ubifs_zn_dirty(znode->cparent)) in get_znodes_to_commit(). >>> Note, please check that 'znode->cparent' is not NULL before the assertion. >>> b) 'znode->cparent' is dirty, but it is not added into list 'c->cnext', we >>> should traverse the entire TNC in get_znodes_to_commit() to make sure that all >>> dirty znodes are collected into list 'c->cnext', so another assertion is >>> needed. I'm a little worried that traversing the whole TNC could change the timing behavior, and thus might not trigger the race. Let's do that in steps? 
Start with the other asserts (see diff below) and later just do
this assert. Does that sound reasonable?

I could modify `dbg_check_tnc()` so that it also checks that each dirty
`znode` is present in `c->cnext` list. We then call this at the end of
`get_znodes_to_commit()`.

>>> 2. 'znode->ciip' is invalid, and the value goes beyond the memory area of
>>> znode->cparent. All znodes are allocated with size of 'c->max_znode_sz', which
>>> means that 'znode->ciip' exceeds the 'c->fanout', so we can add an assertion
>>> like ubifs_assert(c, znode->ciip < c->fanout) in get_znodes_to_commit().
>>>
>>> That's what I can think of, are there any other possibilities?
>> I looked a little more at `get_znodes_to_commit()` when adding the
>> asserts you suggest, and I have a question: what happens when
>> `find_next_dirty()` returns `NULL`? In that case
>> ```
>> znode->cnext = c->cnext;
>> ```
>> but `znode->cparent` and `znode->ciip` are not updated. Shouldn't they?
>
> Good thinking.
> According to the implementation of find_next_dirty(), the order of dirty znodes
> collection is bottom-up, which means that the last dirty znode is the root
> znode, so it doesn't have a parent. You can verify that by adding assertion to
> check whether the last dirty znode is the root.

[...]
To summarize, I'll start a run with the following asserts:

diff --git a/fs/ubifs/tnc_commit.c b/fs/ubifs/tnc_commit.c
index a55e04822d16..4eef82e02afe 100644
--- a/fs/ubifs/tnc_commit.c
+++ b/fs/ubifs/tnc_commit.c
@@ -652,11 +652,17 @@ static int get_znodes_to_commit(struct ubifs_info *c)
 	}
 	cnt += 1;
 	while (1) {
+		ubifs_assert(c, znode->ciip < c->fanout);
+		if (znode->cparent) {
+			ubifs_assert(c, ubifs_zn_dirty(znode->cparent));
+		}
+
 		ubifs_assert(c, !ubifs_zn_cow(znode));
 		__set_bit(COW_ZNODE, &znode->flags);
 		znode->alt = 0;
 		cnext = find_next_dirty(znode);
 		if (!cnext) {
+			ubifs_assert(c, znode == c->zroot.znode);
 			znode->cnext = c->cnext;
 			break;
 		}

Then later, another run with a modified `dbg_check_tnc()` to check that
all dirty `znode`s are indeed present in the list `c->cnext`.
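For reference, the "every dirty znode is on `c->cnext`" check could look roughly like the user-space sketch below. It is a toy model under stated assumptions, not the real `dbg_check_tnc()`: `struct mini_znode`, `mini_on_commit_list()` and `mini_count_missing()` are hypothetical names, and the commit list is assumed to be circular (the last node points back to the first, as in `znode->cnext = c->cnext`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MINI_FANOUT 4

/* Hypothetical toy node; this is NOT struct ubifs_znode. */
struct mini_znode {
	struct mini_znode *child[MINI_FANOUT];
	struct mini_znode *cnext; /* next node on the circular commit list */
	bool dirty;
};

/* Walk the circular list once from 'head' and report whether 'zn' is on it. */
bool mini_on_commit_list(const struct mini_znode *head,
			 const struct mini_znode *zn)
{
	const struct mini_znode *it = head;

	if (!head)
		return false;
	do {
		if (it == zn)
			return true;
		it = it->cnext;
	} while (it && it != head);
	return false;
}

/*
 * Visit the whole tree below 'zn' (clean nodes included) and count the
 * dirty nodes that are missing from the commit list starting at 'head'.
 */
size_t mini_count_missing(const struct mini_znode *head,
			  const struct mini_znode *zn)
{
	size_t missing = 0;

	if (!zn)
		return 0;
	if (zn->dirty && !mini_on_commit_list(head, zn))
		missing++;
	for (int i = 0; i < MINI_FANOUT; i++)
		missing += mini_count_missing(head, zn->child[i]);
	return missing;
}
```

A non-zero result from `mini_count_missing(list_head, root)` would correspond to case b) in the analysis. Note that this naive version rescans the list for every dirty node, so its cost grows with the product of tree size and list length; that is the kind of overhead that could change the timing of the race.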
On 2024/11/7 0:36, Waqar Hameed wrote:
> Sorry for the late response Zhihao! I've been quite busy these days...
>
> On Fri, Oct 18, 2024 at 09:40 +0800 Zhihao Cheng <chengzhihao1@huawei.com> wrote:
>
>> On 2024/10/18 2:36, Waqar Hameed wrote:
>>> On Wed, Oct 16, 2024 at 10:11 +0800 Zhihao Cheng <chengzhihao1@huawei.com> wrote:
>>> [...]
>>>
>>>> BTW, what is the configuration of your flash?(eg. erase size, page size)?
>>> $ mtdinfo /dev/mtd2
>>> mtd2
>>> Name: firmware
>>> Type: nand
>>> Eraseblock size: 131072 bytes, 128.0 KiB
>>> Amount of eraseblocks: 1832 (240123904 bytes, 229.0 MiB)
>>> Minimum input/output unit size: 2048 bytes
>>> Sub-page size: 2048 bytes
>>> OOB size: 64 bytes
>>> Character device major/minor: 90:4
>>> Bad blocks are allowed: true
>>> Device is writable: true
>>> $ ubinfo /dev/ubi0_0
>>> Volume ID: 0 (on ubi0)
>>> Type: dynamic
>>> Alignment: 1
>>> Size: 661 LEBs (83931136 bytes, 80.0 MiB)
>>> State: OK
>>> Name: test-vol
>>> Character device major/minor: 244:1
>>> [...]
>>
>> Thanks, I will change my nandsim configurations to generate a mtd device the
>> same model.
>
> Did you manage to reproduce the issue with this?

I tried, but I still cannot reproduce it on my local machine.

>
>>>
>>>> Well, let's do a preliminary analysis.
>>>> The znode->cparent[znode->ciip] is a freed address in write_index(), which
>>>> means:
>>>> 1. 'znode->ciip' is valid, znode->cparent is freed by tnc_delete, however znode
>>>> cannot be freed if znode->cnext is not NULL, which means:
>>>> a) 'znode->cparent' is not dirty, we should add an assertion like
>>>> ubifs_assert(c, ubifs_zn_dirty(znode->cparent)) in get_znodes_to_commit().
>>>> Note, please check that 'znode->cparent' is not NULL before the assertion.
>>>> b) 'znode->cparent' is dirty, but it is not added into list 'c->cnext', we
>>>> should traverse the entire TNC in get_znodes_to_commit() to make sure that all
>>>> dirty znodes are collected into list 'c->cnext', so another assertion is
>>>> needed.
>
> I'm a little worried that traversing the whole TNC could change the
> timing behavior, and thus might not trigger the race. Let's do that in
> steps? Start with the other asserts (see diff below) and later just do
> this assert. Does that sound reasonable?

Fine. I add one comment below.

>
> I could modify `dbg_check_tnc()` so that it also checks that each dirty
> `znode` is present in `c->cnext` list. We then call this at the end of
> `get_znodes_to_commit()`.
>

Sounds good to me, please remove other non-related checks in dbg_check_tnc().

>>>> 2. 'znode->ciip' is invalid, and the value goes beyond the memory area of
>>>> znode->cparent. All znodes are allocated with size of 'c->max_znode_sz', which
>>>> means that 'znode->ciip' exceeds the 'c->fanout', so we can add an assertion
>>>> like ubifs_assert(c, znode->ciip < c->fanout) in get_znodes_to_commit().
>>>>
>>>> That's what I can think of, are there any other possibilities?
>>> I looked a little more at `get_znodes_to_commit()` when adding the
>>> asserts you suggest, and I have a question: what happens when
>>> `find_next_dirty()` returns `NULL`? In that case
>>> ```
>>> znode->cnext = c->cnext;
>>> ```
>>> but `znode->cparent` and `znode->ciip` are not updated. Shouldn't they?
>>
>> Good thinking.
>> According to the implementation of find_next_dirty(), the order of dirty znodes
>> collection is bottom-up, which means that the last dirty znode is the root
>> znode, so it doesn't have a parent. You can verify that by adding assertion to
>> check whether the last dirty znode is the root.
>
> [...]
>
> To summarize, I'll start a run with the following asserts:
>
> diff --git a/fs/ubifs/tnc_commit.c b/fs/ubifs/tnc_commit.c
> index a55e04822d16..4eef82e02afe 100644
> --- a/fs/ubifs/tnc_commit.c
> +++ b/fs/ubifs/tnc_commit.c
> @@ -652,11 +652,17 @@ static int get_znodes_to_commit(struct ubifs_info *c)
>  	}
>  	cnt += 1;
>  	while (1) {

Please move the check after the assignment of 'znode->cparent', because
'znode->parent' could be switched by tnc_insert().

> +		ubifs_assert(c, znode->ciip < c->fanout);
> +		if (znode->cparent) {
> +			ubifs_assert(c, ubifs_zn_dirty(znode->cparent));
> +		}
> +
>  		ubifs_assert(c, !ubifs_zn_cow(znode));
>  		__set_bit(COW_ZNODE, &znode->flags);
>  		znode->alt = 0;
>  		cnext = find_next_dirty(znode);
>  		if (!cnext) {
> +			ubifs_assert(c, znode == c->zroot.znode);
>  			znode->cnext = c->cnext;
>  			break;
>  		}
>

@@ -662,6 +662,10 @@ static int get_znodes_to_commit(struct ubifs_info *c)
 		}
 		znode->cparent = znode->parent;
 		znode->ciip = znode->iip;
+		if (znode->cparent) {
+			ubifs_assert(c, ubifs_zn_dirty(znode->cparent));
+		}
+		ubifs_assert(c, znode->ciip < c->fanout);
 		znode->cnext = cnext;
 		znode = cnext;
 		cnt += 1;

> Then later, another run with a modified `dbg_check_tnc()` to check that
> all dirty `znode`s are indeed present in the list `c->cnext`.
On Thu, Nov 07, 2024 at 15:14 +0800 Zhihao Cheng <chengzhihao1@huawei.com> wrote:

> On 2024/11/7 0:36, Waqar Hameed wrote:

[...]

>> Did you manage to reproduce the issue with this?
>
> I tried, but I still cannot reproduce it on my local machine.

That's a bummer! Sometimes it really could take a while. For example, my
last attempt needed 248 iterations (almost 4 hours)...

[...]

> @@ -662,6 +662,10 @@ static int get_znodes_to_commit(struct ubifs_info *c)
>  		}
>  		znode->cparent = znode->parent;
>  		znode->ciip = znode->iip;
> +		if (znode->cparent) {
> +			ubifs_assert(c, ubifs_zn_dirty(znode->cparent));
> +		}
> +		ubifs_assert(c, znode->ciip < c->fanout);
>  		znode->cnext = cnext;
>  		znode = cnext;
>  		cnt += 1;

None of the asserts got hit during my last run, but KASAN still
complained.