From nobody Thu Jun 18 21:27:35 2026 Received: from mail-pf1-f195.google.com (mail-pf1-f195.google.com [209.85.210.195]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 08E7A26738C for ; Tue, 16 Jun 2026 06:49:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.195 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781592552; cv=none; b=Usctjb4xn4iiyll2tnfYmzloMmiT1XZ9u1Coir17ghFWXYnAA6FKsVGR1W+QdT0FeTXeLUNS+yggtCalfk57vD/XHnatHfOooipPiaUzj8TA5rKATDHt7DzcdRDibdy/ELlU82FdKb5ukYh7jCGfh8RnsQok5okynNlAW7hj1sM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781592552; c=relaxed/simple; bh=tVKGb4jbNFwZ3i4+wv0Slwocci9w0OW4zPpTvoYEToc=; h=From:To:Cc:Subject:Date:Message-Id:MIME-Version; b=Diu9kMw4/qIu1A0Z1Ep7psfI2ooMfU0rkk8QqKYT3coHtvnrWBNT5n3pXjo9UmCMIxEXJfjb4gwnygiDv4Oq8cJVkY2GFHmA7BpCcSFnyzjYHSSsLDPZiELdgr3tkWPvsBhqhO9bDCsOZh3CknjHODSLnOsLR7pxKRS43LRvjJI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=NXMRRbO7; arc=none smtp.client-ip=209.85.210.195 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="NXMRRbO7" Received: by mail-pf1-f195.google.com with SMTP id d2e1a72fcca58-8434840cea8so2514659b3a.3 for ; Mon, 15 Jun 2026 23:49:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1781592550; x=1782197350; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=GCVxv7Lr69nULKT5yqjAycdaU8fM+vHXY3/h+9NHSqs=; b=NXMRRbO7iuajHScjV3BRA+2Obg7Bo846bq0yPMdIJHYJqPZhlBIyI1a2YHbW9HcvQF s6C9LgoUQupF2i5l3UduA/FxuEMg0SnLVYkmbIzzEPRQ8zowbIXVxQ7L4H9zorJRWMvd kE2X2wwJDhgHJfOKyAbXwEGAAnhx6xBj4/ugcez8J4H4/wAq88mXux2ScPQ2XikYYl9Z q69vODU+U4Rcqquy9bKz1NANCjHFcKH9mYDz0VwgJO0rWPp8HdlNHkBIS2Xu1V6A/wPS ZXQZWZH1Tl56aZYCKoKjBqmR04d1XKHCsSbET62+Hlmz1QJ7YgFa9l1sToSdHv4xeTnc Xsxw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1781592550; x=1782197350; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=GCVxv7Lr69nULKT5yqjAycdaU8fM+vHXY3/h+9NHSqs=; b=Ap2Fb0b7npSu0QRNqkgqKaA7H8hDwqT6GjCvPODPybGCpREiLrAtoA6z7XQ8UR8zYY D7g9aKrZRaf9MiEosz7HWgZ+z8dFEsbD5qjpzS4W93HHeGkqYP/11ha27W9ihYlW9DpV 2Lx2+E6UTcwTaSS+XkQJa62avAJGMuvqnFflmWkN9EN0cLwynLzOVCki+atxqlbG+jcT lN2jJ1wVHPMdygTD2/yvbQfSqToYQfSklR+ySqey5f7wt1sDA/6PxbrNC6O+9dzBLqga RmGb14Pu3Hjim5qHM7JFqV50Xep6Yd5sf0v92/VGxqRb2vudvwOeM2RJlIKi1pRe8pqY UhqQ== X-Forwarded-Encrypted: i=1; AFNElJ/hjfDoR84F9VARwhdrK4VDOz6osEVy03QSofVYPJ5ZlmJS62UMXiiPJZ468iwEqv1J7BDtbptQJKZyqVc=@vger.kernel.org X-Gm-Message-State: AOJu0YzzsZV9YyivdE7idym/tqtlpBbPMessnTUD1KyAETLUr1bMI9aB 3qjlkUzzhUjOMluuRJnyhPi9qLy7fqf3qtPf7PYPmuhxNbo1IVtIU79y X-Gm-Gg: Acq92OGxX6x3+QBAjBcA6Rdoxz4se/KZvfgrluNc4vDLL49piwPk5zTrMUPtcbjONrg xisnJj+SF7gg0JpUaWoBcXyq0OM9XXo6PL8y9sD9uN5Zq8h/xq9lJ9nqvaIeLBrkJu58ce7AN8y fKht4jOX/Vq/Yww4Uq1YqA4Q4SgAjPbeLaYHP6advFxiHyw9Tx6gEzavb1SATUauJlinzu718AD JsJvyo3tfph+CIGx/TzgrmL+310IPf7JMg7yQenJ7sENxMo9lRo5wNleEJa2xvIiqIJG9GOZkO+ KPe09YFsUTg/3EUw4hf769IkqHcOs+2UgT8wQLYc652EEEoo9IJK+/K69qs6UbNQr1qgTj7+zys OqYGdu8WeH6VW9xOwLHjpQhriMxzEq48s9n9zyraAFmwXsodv6hcgn6Q3vkYjbQ4uqJOW6GKJW0 cFdGt5rWZePiOpZ1Ej91rEbCbNr6S4IM0nOgkLCP4jw/OJVt6sN7SEYoy5fesoYY5NpTs14Rq3P 5WuxQ== X-Received: by 2002:a05:6a00:2e10:b0:842:2cda:7aa3 with SMTP id d2e1a72fcca58-845154f4f82mr2432040b3a.28.1781592550190; Mon, 15 Jun 2026 23:49:10 -0700 (PDT) Received: from localhost ([111.228.63.84]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-8434ac9cfebsm11017535b3a.9.2026.06.15.23.49.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 15 Jun 2026 23:49:09 -0700 (PDT) From: Zhang Cen To: Jaegeuk Kim , Chao Yu Cc: linux-f2fs-devel@lists.sourceforge.net, linux-kernel@vger.kernel.org, zerocling0077@gmail.com, 2045gemini@gmail.com Subject: [PATCH v5] f2fs: protect published gc_thread during teardown Date: Tue, 16 Jun 2026 14:49:04 +0800 Message-Id: <20260616064904.3616123-1-rollkingzzc@gmail.com> X-Mailer: git-send-email 2.34.1 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" f2fs_stop_gc_thread() stops the background GC task, wakes GC_MERGE foreground waiters, frees sbi->gc_thread, and then clears the published pointer. A foreground f2fs_balance_fs() caller can already have copied that pointer and queued itself on gc_th->fggc_wq, so freeing gc_th at stop time can leave finish_wait() operating on a freed waitqueue. The buggy scenario involves two paths, with each column showing the order within that path: foreground f2fs_balance_fs() caller: shutdown path: 1. observes no checkpoint error 1. sets CP_ERROR_FLAG 2. snapshots sbi->gc_thread 2. enters f2fs_stop_gc_thread() 3. queues on gc_th->fggc_wq 3. stops gc_th->f2fs_gc_task 4. wakes gc_wait_queue_head 4. wakes gc_th->fggc_wq 5. sleeps for foreground GC 5. frees gc_th 6. finish_wait() touches fggc_wq GC_MERGE does not keep independent work_struct items that shutdown can cancel. Its pending foreground GC requests are waitqueue waiters. Drain them by withdrawing the GC task pointer, stopping the task, waking gc_th->fggc_wq, and leaving each waiter to remove its own wait entry with finish_wait(). Keep the allocated GC-thread state until the superblock is destroyed and use gc_th->f2fs_gc_task as the running-state marker. The stop path now withdraws the task pointer under gc_task_lock, stops the task, and wakes foreground waiters, but leaves the waitqueue storage valid. The start path reuses a stopped gc_thread object instead of reinitializing its waitqueues, and remount restart decisions check the task pointer rather than only the object pointer. f2fs_balance_fs() also snapshots sbi->gc_thread once and rechecks both f2fs_cp_error() and f2fs_gc_task after prepare_to_wait(). If shutdown is visible or the GC task has already been withdrawn, the caller removes its wait entry without waking the GC thread or sleeping for new foreground GC work. Thus shutdown drains already queued waiters and stops accepting new foreground GC work once shutdown is visible to the caller. Task pointer users are protected separately from the retained container lifetime. A sysfs critical_task_priority store now snapshots the GC task under gc_task_lock and holds a task_struct reference while calling set_user_nice(). Boolean running-state checks that do not dereference the task_struct continue to use READ_ONCE(). One observed report was: BUG: KASAN: slab-use-after-free in finish_wait+0x276/0x290 Write of size 8 at addr ffff8881150819b8 by task dd/802 The buggy address belongs to the object at ffff888115081900 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 184 bytes inside of freed 256-byte region Call trace: finish_wait() f2fs_balance_fs() f2fs_write_single_data_page() f2fs_write_cache_pages() __f2fs_write_data_pages() do_writepages() filemap_fdatawrite_wbc() __filemap_fdatawrite_range() file_write_and_wait_range() f2fs_do_sync_file() f2fs_sync_file() do_fsync() Freed by task stack: kfree() f2fs_stop_gc_thread() f2fs_do_shutdown() f2fs_shutdown() fs_bdev_mark_dead() Fixes: 5911d2d1d1a3 ("f2fs: introduce gc_merge mount option") Assisted-by: Codex:gpt-5.5 Signed-off-by: Zhang Cen --- v5: Explain that GC_MERGE foreground requests are waitqueue waiters, not independent work items; shutdown drains them by withdrawing the GC task, waking fggc_wq, and letting waiters dequeue themselves. Recheck checkpoint error and the GC task pointer after prepare_to_wait() so shutdown-visible f2fs_balance_fs() callers do not wake or sleep for new foreground GC work. v4: Replace the v3 SRCU/refcounted lifetime model with a smaller fix that keeps the existing heap-allocated gc_thread object alive until superblock teardown. Use f2fs_gc_task as the running-state marker and withdraw it with xchg() before waking GC_MERGE waiters. Reuse a stopped gc_thread object across remount restarts so the waitqueues are not reinitialized while old waiters can still finish. Recheck f2fs_gc_task after prepare_to_wait() so a waiter that races with teardown does not sleep after the worker has been withdrawn. v3: Fix checkpatch style issues in the broader lifetime variant. v2: Sashiko.dev pointed out that GC_MERGE foreground waiters and GC-thread users needed lifetime-safe access after teardown. fs/f2fs/gc.c | 52 ++++++++++++++++++++++++++++++----------------- fs/f2fs/gc.h | 38 ++++++++++++++++++++++++++++++++++ fs/f2fs/segment.c | 19 ++++++++++------- fs/f2fs/super.c | 7 +++++-- fs/f2fs/sysfs.c | 18 +++++++++++----- 5 files changed, 101 insertions(+), 33 deletions(-) diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index ba93010924c0..190d60b3cfc5 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -193,12 +193,24 @@ static int gc_thread_func(void *data) =20 int f2fs_start_gc_thread(struct f2fs_sb_info *sbi) { - struct f2fs_gc_kthread *gc_th; + struct f2fs_gc_kthread *gc_th =3D READ_ONCE(sbi->gc_thread); + struct task_struct *task; + bool allocated =3D false; dev_t dev =3D sbi->sb->s_bdev->bd_dev; =20 - gc_th =3D f2fs_kmalloc(sbi, sizeof(struct f2fs_gc_kthread), GFP_KERNEL); - if (!gc_th) - return -ENOMEM; + if (gc_th && READ_ONCE(gc_th->f2fs_gc_task)) + return 0; + + if (!gc_th) { + gc_th =3D f2fs_kmalloc(sbi, sizeof(*gc_th), GFP_KERNEL); + if (!gc_th) + return -ENOMEM; + gc_th->f2fs_gc_task =3D NULL; + spin_lock_init(&gc_th->gc_task_lock); + init_waitqueue_head(&gc_th->gc_wait_queue_head); + init_waitqueue_head(&gc_th->fggc_wq); + allocated =3D true; + } =20 gc_th->urgent_sleep_time =3D DEF_GC_THREAD_URGENT_SLEEP_TIME; gc_th->valid_thresh_ratio =3D DEF_GC_THREAD_VALID_THRESH_RATIO; @@ -221,34 +233,36 @@ int f2fs_start_gc_thread(struct f2fs_sb_info *sbi) =20 gc_th->gc_wake =3D false; =20 - sbi->gc_thread =3D gc_th; - init_waitqueue_head(&sbi->gc_thread->gc_wait_queue_head); - init_waitqueue_head(&sbi->gc_thread->fggc_wq); - sbi->gc_thread->f2fs_gc_task =3D kthread_run(gc_thread_func, sbi, - "f2fs_gc-%u:%u", MAJOR(dev), MINOR(dev)); - if (IS_ERR(gc_th->f2fs_gc_task)) { - int err =3D PTR_ERR(gc_th->f2fs_gc_task); + task =3D kthread_create(gc_thread_func, sbi, "f2fs_gc-%u:%u", + MAJOR(dev), MINOR(dev)); + if (IS_ERR(task)) { + int err =3D PTR_ERR(task); =20 - kfree(gc_th); - sbi->gc_thread =3D NULL; + if (allocated) + kfree(gc_th); return err; } =20 - set_user_nice(gc_th->f2fs_gc_task, - PRIO_TO_NICE(sbi->critical_task_priority)); + set_user_nice(task, PRIO_TO_NICE(sbi->critical_task_priority)); + if (allocated) + WRITE_ONCE(sbi->gc_thread, gc_th); + f2fs_set_gc_task(gc_th, task); + wake_up_process(task); return 0; } =20 void f2fs_stop_gc_thread(struct f2fs_sb_info *sbi) { - struct f2fs_gc_kthread *gc_th =3D sbi->gc_thread; + struct f2fs_gc_kthread *gc_th =3D READ_ONCE(sbi->gc_thread); + struct task_struct *task; =20 if (!gc_th) return; - kthread_stop(gc_th->f2fs_gc_task); + task =3D f2fs_detach_gc_task(gc_th); + if (!task) + return; + kthread_stop(task); wake_up_all(&gc_th->fggc_wq); - kfree(gc_th); - sbi->gc_thread =3D NULL; } =20 static int select_gc_type(struct f2fs_sb_info *sbi, int gc_type) diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h index 6c4d4567571e..6df406e98939 100644 --- a/fs/f2fs/gc.h +++ b/fs/f2fs/gc.h @@ -5,6 +5,9 @@ * Copyright (c) 2012 Samsung Electronics Co., Ltd. * http://www.samsung.com/ */ +#include +#include + #define GC_THREAD_MIN_WB_PAGES 1 /* * a threshold to determine * whether IO subsystem is idle @@ -47,6 +50,7 @@ =20 struct f2fs_gc_kthread { struct task_struct *f2fs_gc_task; + spinlock_t gc_task_lock; /* protects f2fs_gc_task */ wait_queue_head_t gc_wait_queue_head; =20 /* for gc sleep time */ @@ -72,6 +76,40 @@ struct f2fs_gc_kthread { unsigned int boost_gc_greedy; }; =20 +static inline struct task_struct *f2fs_get_gc_task(struct f2fs_gc_kthread = *gc_th) +{ + struct task_struct *task; + + spin_lock(&gc_th->gc_task_lock); + task =3D READ_ONCE(gc_th->f2fs_gc_task); + if (task) + get_task_struct(task); + spin_unlock(&gc_th->gc_task_lock); + + return task; +} + +static inline void f2fs_set_gc_task(struct f2fs_gc_kthread *gc_th, + struct task_struct *task) +{ + spin_lock(&gc_th->gc_task_lock); + WRITE_ONCE(gc_th->f2fs_gc_task, task); + spin_unlock(&gc_th->gc_task_lock); +} + +static inline struct task_struct * +f2fs_detach_gc_task(struct f2fs_gc_kthread *gc_th) +{ + struct task_struct *task; + + spin_lock(&gc_th->gc_task_lock); + task =3D READ_ONCE(gc_th->f2fs_gc_task); + WRITE_ONCE(gc_th->f2fs_gc_task, NULL); + spin_unlock(&gc_th->gc_task_lock); + + return task; +} + struct gc_inode_list { struct list_head ilist; struct radix_tree_root iroot; diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 788f8b050249..5f61627009a5 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -424,6 +424,8 @@ int f2fs_commit_atomic_write(struct inode *inode) */ void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need) { + struct f2fs_gc_kthread *gc_th; + if (f2fs_cp_error(sbi)) return; =20 @@ -444,15 +446,18 @@ void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool n= eed) if (has_enough_free_secs(sbi, 0, 0)) return; =20 - if (test_opt(sbi, GC_MERGE) && sbi->gc_thread && - sbi->gc_thread->f2fs_gc_task) { + gc_th =3D READ_ONCE(sbi->gc_thread); + if (test_opt(sbi, GC_MERGE) && gc_th && + READ_ONCE(gc_th->f2fs_gc_task)) { DEFINE_WAIT(wait); =20 - prepare_to_wait(&sbi->gc_thread->fggc_wq, &wait, - TASK_UNINTERRUPTIBLE); - wake_up(&sbi->gc_thread->gc_wait_queue_head); - io_schedule(); - finish_wait(&sbi->gc_thread->fggc_wq, &wait); + prepare_to_wait(&gc_th->fggc_wq, &wait, + TASK_UNINTERRUPTIBLE); + if (!f2fs_cp_error(sbi) && READ_ONCE(gc_th->f2fs_gc_task)) { + wake_up(&gc_th->gc_wait_queue_head); + io_schedule(); + } + finish_wait(&gc_th->fggc_wq, &wait); } else { struct f2fs_gc_control gc_control =3D { .victim_segno =3D NULL_SEGNO, diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index ccf806b676f5..d6863da05a7c 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -2925,11 +2925,12 @@ static int __f2fs_remount(struct fs_context *fc, st= ruct super_block *sb) if ((flags & SB_RDONLY) || (F2FS_OPTION(sbi).bggc_mode =3D=3D BGGC_MODE_OFF && !test_opt(sbi, GC_MERGE))) { - if (sbi->gc_thread) { + if (sbi->gc_thread && READ_ONCE(sbi->gc_thread->f2fs_gc_task)) { f2fs_stop_gc_thread(sbi); need_restart_gc =3D true; } - } else if (!sbi->gc_thread) { + } else if (!sbi->gc_thread || + !READ_ONCE(sbi->gc_thread->f2fs_gc_task)) { err =3D f2fs_start_gc_thread(sbi); if (err) goto restore_opts; @@ -5451,6 +5452,7 @@ static int f2fs_fill_super(struct super_block *sb, st= ruct fs_context *fc) free_sb_buf: kfree(raw_super); free_sbi: + kfree(sbi->gc_thread); #ifdef CONFIG_DEBUG_LOCK_ALLOC lockdep_unregister_key(&sbi->cp_global_sem_key); #endif @@ -5535,6 +5537,7 @@ static void kill_f2fs_super(struct super_block *sb) /* Release block devices last, after fscrypt_destroy_keyring(). */ if (sbi) { destroy_device_list(sbi); + kfree(sbi->gc_thread); #ifdef CONFIG_DEBUG_LOCK_ALLOC lockdep_unregister_key(&sbi->cp_global_sem_key); #endif diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c index 352e96ad5c3a..07543df2b5b1 100644 --- a/fs/f2fs/sysfs.c +++ b/fs/f2fs/sysfs.c @@ -12,6 +12,7 @@ #include #include #include +#include #include =20 #include "f2fs.h" @@ -981,17 +982,24 @@ static ssize_t __sbi_store(struct f2fs_attr *a, } =20 if (!strcmp(a->attr.name, "critical_task_priority")) { + struct f2fs_gc_kthread *gc_th; + struct task_struct *gc_task; + int nice; + if (t < NICE_TO_PRIO(MIN_NICE) || t > NICE_TO_PRIO(MAX_NICE)) return -EINVAL; if (!capable(CAP_SYS_NICE)) return -EPERM; sbi->critical_task_priority =3D t; + nice =3D PRIO_TO_NICE(sbi->critical_task_priority); if (sbi->cprc_info.f2fs_issue_ckpt) - set_user_nice(sbi->cprc_info.f2fs_issue_ckpt, - PRIO_TO_NICE(sbi->critical_task_priority)); - if (sbi->gc_thread && sbi->gc_thread->f2fs_gc_task) - set_user_nice(sbi->gc_thread->f2fs_gc_task, - PRIO_TO_NICE(sbi->critical_task_priority)); + set_user_nice(sbi->cprc_info.f2fs_issue_ckpt, nice); + gc_th =3D READ_ONCE(sbi->gc_thread); + gc_task =3D gc_th ? f2fs_get_gc_task(gc_th) : NULL; + if (gc_task) { + set_user_nice(gc_task, nice); + put_task_struct(gc_task); + } return count; } =20 --=20 2.43.0