[PATCH] gfs2: Fix memory leak in gfs2_trans_begin

Edward Adam Davis posted 1 patch 1 month, 1 week ago
Posted by Edward Adam Davis 1 month, 1 week ago
According to log [1], a "bad magic number" was found when checking the
metatype, which caused gfs2 to withdraw.

The root cause is that log flush treats a non-delayed withdraw like a
delayed one: it queues the transaction onto the ail1 list after
gfs2_ail_drain() has already run, so nobody reclaims the transaction's
memory. See the call stack below for details.

        CPU1                                    CPU2
        ====                                    ====
gfs2_meta_buffer()
gfs2_metatype_check()
gfs2_metatype_check_i()
gfs2_metatype_check_ii()                gfs2_log_flush()
gfs2_withdraw()                         tr = sdp->sd_log_tr
signal_our_withdraw()                   sdp->sd_log_tr = NULL
gfs2_ail_drain()                        goto out_withdraw
spin_unlock(&sdp->sd_ail_lock)          trans_drain()
                                        spin_lock(&sdp->sd_ail_lock)
                                        list_add(&tr->tr_list, &sdp->sd_ail1_list)
                                        tr = NULL
                                        goto out_end

Add a delayed withdraw check, gfs2_withdrawing(), so that the
transaction is only queued onto the ail1 list while a withdraw is still
pending. This avoids similar memory leaks.
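
As a rough illustration of the ordering in the call stack, here is a minimal
user-space sketch. It is not kernel code: every model_* name is hypothetical,
the singly linked list stands in for sd_ail1_list, and the "withdrawing" field
stands in for the result of gfs2_withdrawing(sdp). It assumes the caller of
the error path frees whatever transaction pointer is handed back.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; not the kernel structures. */
struct model_trans {
	struct model_trans *next;	/* stand-in for the tr_list linkage */
};

struct model_sbd {
	struct model_trans *ail1;	/* stand-in for sd_ail1_list */
	bool withdrawing;		/* stand-in for gfs2_withdrawing(sdp) */
	int live_trans;			/* transactions allocated but not freed */
};

static struct model_trans *model_trans_begin(struct model_sbd *sdp)
{
	struct model_trans *tr = calloc(1, sizeof(*tr));

	if (tr)
		sdp->live_trans++;
	return tr;
}

static void model_trans_free(struct model_sbd *sdp, struct model_trans *tr)
{
	if (tr) {
		free(tr);
		sdp->live_trans--;
	}
}

/* Stand-in for the drain step: frees everything queued on ail1. */
static void model_ail_drain(struct model_sbd *sdp)
{
	while (sdp->ail1) {
		struct model_trans *tr = sdp->ail1;

		sdp->ail1 = tr->next;
		model_trans_free(sdp, tr);
	}
}

/* Old error path: always queue onto ail1 and drop the pointer. */
static struct model_trans *model_flush_error_old(struct model_sbd *sdp,
						 struct model_trans *tr)
{
	if (tr) {
		tr->next = sdp->ail1;
		sdp->ail1 = tr;
	}
	return NULL;
}

/*
 * Patched error path: queue only while a withdraw is still pending,
 * otherwise hand the transaction back so the caller can free it.
 */
static struct model_trans *model_flush_error_new(struct model_sbd *sdp,
						 struct model_trans *tr)
{
	if (sdp->withdrawing)
		return model_flush_error_old(sdp, tr);
	return tr;
}
```

If the drain step runs before the error path, the old variant leaves the
transaction parked on a list that nobody drains again, while the patched
variant returns it to the caller for freeing.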

syzbot reported:
[1]
gfs2: fsid=syz:syz.0: fatal: invalid metadata block - bh = 9381 (bad magic number), function = gfs2_meta_buffer, file = fs/gfs2/meta_io.c, line = 499

[2]
BUG: memory leak
unreferenced object 0xffff888126cf1000 (size 144):
  backtrace (crc f56b339f):
    gfs2_trans_begin+0x29/0xa0 fs/gfs2/trans.c:115
    alloc_dinode fs/gfs2/inode.c:418 [inline]
    gfs2_create_inode+0xca0/0x1890 fs/gfs2/inode.c:807


Fixes: f5456b5d67cf ("gfs2: Clean up revokes on normal withdraws")
Reported-by: syzbot+63ba84f14f62e61a5fd0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=63ba84f14f62e61a5fd0
Tested-by: syzbot+63ba84f14f62e61a5fd0@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
---
 fs/gfs2/log.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 115c4ac457e9..7bba7951dbdb 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -1169,11 +1169,13 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags)
 	 * never queued onto any of the ail lists. Here we add it to
 	 * ail1 just so that ail_drain() will find and free it.
 	 */
-	spin_lock(&sdp->sd_ail_lock);
-	if (tr && list_empty(&tr->tr_list))
-		list_add(&tr->tr_list, &sdp->sd_ail1_list);
-	spin_unlock(&sdp->sd_ail_lock);
-	tr = NULL;
+	if (gfs2_withdrawing(sdp)) {
+		spin_lock(&sdp->sd_ail_lock);
+		if (tr && list_empty(&tr->tr_list))
+			list_add(&tr->tr_list, &sdp->sd_ail1_list);
+		spin_unlock(&sdp->sd_ail_lock);
+		tr = NULL;
+	}
 	goto out_end;
 }
 
-- 
2.43.0
Re: [PATCH] gfs2: Fix memory leak in gfs2_trans_begin
Posted by Andreas Gruenbacher 1 month, 1 week ago
Hello,

On Sat, Nov 8, 2025 at 10:13 AM Edward Adam Davis <eadavis@qq.com> wrote:
> According to log [1], a "bad magic number" was found when checking the
> metatype, which caused gfs2 withdraw.
>
> The root cause of the problem is: log flush treats non-delayed withdraw
> as withdraw, resulting in no one reclaiming the memory of transaction.
> See the call stack below for details.
>
>         CPU1                                    CPU2
>         ====                                    ====
> gfs2_meta_buffer()
> gfs2_metatype_check()
> gfs2_metatype_check_i()
> gfs2_metatype_check_ii()                gfs2_log_flush()
> gfs2_withdraw()                         tr = sdp->sd_log_tr
> signal_our_withdraw()                   sdp->sd_log_tr = NULL
> gfs2_ail_drain()                        goto out_withdraw
> spin_unlock(&sdp->sd_ail_lock)          trans_drain()
>                                         spin_lock(&sdp->sd_ail_lock)
>                                         list_add(&tr->tr_list, &sdp->sd_ail1_list)
>                                         tr = NULL
>                                         goto out_end
>

This bug report is against upstream commit c2c2ccfd4ba7, which
precedes the withdraw rework on gfs2's for-next branch. With those
patches, the race you are describing is no longer possible because
do_withdraw() now uses sdp->sd_log_flush_lock and the SDF_JOURNAL_LIVE
flag to synchronize with gfs2_log_flush().

I don't know why Bob chose to push the transaction onto the ail1 list
instead of freeing it in gfs2_log_flush(); that's something to clean
up. I've pushed an untested patch doing that to for-later.
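
To make the two points above concrete, here is a small user-space sketch of
the general pattern. It is only a model under stated assumptions, not the
code on for-next or for-later: all sync_* names are hypothetical, a pthread
mutex stands in for sdp->sd_log_flush_lock, a bool stands in for the
SDF_JOURNAL_LIVE flag, and the flush side frees the detached transaction
itself instead of parking it on a list for someone else to drain.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model; names are illustrative, not kernel API. */
struct sync_trans {
	int placeholder;
};

struct sync_sbd {
	pthread_mutex_t flush_lock;	/* stand-in for sd_log_flush_lock */
	bool journal_live;		/* stand-in for SDF_JOURNAL_LIVE */
	struct sync_trans *log_tr;	/* stand-in for sd_log_tr */
	int live_trans;			/* transactions allocated but not freed */
};

static void sync_sbd_init(struct sync_sbd *sdp)
{
	pthread_mutex_init(&sdp->flush_lock, NULL);
	sdp->journal_live = true;
	sdp->log_tr = NULL;
	sdp->live_trans = 0;
}

/* Attach a fresh transaction; returns false if one is pending or on OOM. */
static bool sync_trans_begin(struct sync_sbd *sdp)
{
	struct sync_trans *tr = calloc(1, sizeof(*tr));
	bool attached = false;

	if (!tr)
		return false;

	pthread_mutex_lock(&sdp->flush_lock);
	if (!sdp->log_tr) {
		sdp->log_tr = tr;
		sdp->live_trans++;
		attached = true;
	}
	pthread_mutex_unlock(&sdp->flush_lock);

	if (!attached)
		free(tr);
	return attached;
}

/* Withdraw side: mark the journal dead while holding the flush lock. */
static void sync_withdraw(struct sync_sbd *sdp)
{
	pthread_mutex_lock(&sdp->flush_lock);
	sdp->journal_live = false;
	pthread_mutex_unlock(&sdp->flush_lock);
}

/*
 * Flush side: detach the transaction under the lock and free it here,
 * whether or not the journal was still live. Returns true only if the
 * journal was live, i.e. a real flush would have written the data out.
 */
static bool sync_log_flush(struct sync_sbd *sdp)
{
	struct sync_trans *tr;
	bool flushed;

	pthread_mutex_lock(&sdp->flush_lock);
	tr = sdp->log_tr;
	sdp->log_tr = NULL;
	flushed = sdp->journal_live;
	pthread_mutex_unlock(&sdp->flush_lock);

	if (tr) {
		free(tr);
		sdp->live_trans--;
	}
	return flushed;
}
```

Because the flag is only changed and read under the same lock, a flush sees
either a live journal or a dead one, and the transaction has exactly one
owner responsible for freeing it in both cases.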

Related commits:
58e08e8d83ab ("gfs2: fix trans slab error when withdraw occurs inside log_flush")
f5456b5d67cf ("gfs2: Clean up revokes on normal withdraws")

Thanks,
Andreas
