From nobody Wed Jun 17 02:57:21 2026 Received: from mail-pl1-f173.google.com (mail-pl1-f173.google.com [209.85.214.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AB35243DA5B for ; Tue, 28 Apr 2026 14:09:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777385347; cv=none; b=IthKJsUOEmG8KbEsgTszCLwa3rmVbSl9JosXDQqOM/emzEj/mp97Gjgz9Q2or6SZfK2XgkygU3ZIUWqA3MacV8KNoaUMhANzJ+vWrrjlwSvdJ1mcrk41/Ztz7k0P2svmdQARhH2e9IceIZM1dzxOvbTgtyQ1HGoa3GmtSBbMuqM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777385347; c=relaxed/simple; bh=KGBe/gwXXssI0g1Bu4ClISnGFi2PC+zHggOckt79b48=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=bV3fWAQ+bXVF901J1OC81IG97TgoJCKYB6zJrq18XKoNRKe6sroutsKmOaLGJn+4F3ZCybMA5SJPYdcH0Q+hqXyA0NOUMEm2ZOrmUFFgM7ywtrJo6e6yGfvPmPovmtUgWdoznbiWTTgdZzScM6D9BsJEzOkC4LeukWEWUBq08mA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Nz0374Sj; arc=none smtp.client-ip=209.85.214.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Nz0374Sj" Received: by mail-pl1-f173.google.com with SMTP id d9443c01a7336-2b240d753ceso22171285ad.3 for ; Tue, 28 Apr 2026 07:09:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1777385344; x=1777990144; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=uEgePd5kGimwA7FpQCQXmU75w1iF7H1t3sq7wQi6GBA=; b=Nz0374SjEAiL7moB3moFwnI6h8XRFjSgEzQE3MdlB33qVntujG+Z3Rkxr5jN3io4wh qK39zCSBRcuHD9jxmZ4Pu/7NL+uf5r+F2hKxu/fRkJMciJCh/XbMSu+WjRwNHBaVX1tO ouzuhIcv4mqftuf5MqACYSda3WJ5EsfH9J+jK8nn5YeGgtqsK7/966j/Y/PcMyNuri7Q ivTP8egx6APQk2k2uwjaO5cDaCU/7p5OMYpbftCQMmiOxrqilIwRQ2lilYNZbljBSoLl KKohNRtKc/2OWejfcTpwqtR3ebVRzRiD0AgCfNE5mIUxvtJyIDjuuKVKC9rwW5CWR/+y 5KGg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1777385344; x=1777990144; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=uEgePd5kGimwA7FpQCQXmU75w1iF7H1t3sq7wQi6GBA=; b=fcex+7qLoToLYgyRNu70iPBEvQOe7JmD9a3Gn6WUaIfWISwN7jWOBENIqszHaameZ5 zeuHCfsL/LlqyJL1otNj9hKXlvRfu7YxFUW5bW7wAk0WXIbkc+q9Iav5uPEex3xcqSeM GV6/U+k2vJTrF5/DB/CE5Ab+VJwVsq0qpOLSC/oVhcZUtMNWDyVI3oZgLc3Ajm2wBBPL eiwln6GIKMUJY60P8E8w9o7VqJw3tKZZ5ROqk+WUceWbmx1dd0lhU3kUgZRyz7kIkEcm Xv+g+URRjgj4O54iVJ6SvfqEDUaATaXyMGyWOajZxHedtn7F97UqN89QEOQZivk/M1tM EP0A== X-Forwarded-Encrypted: i=1; AFNElJ8nlkx5hNkwGV2OEVtwsE3n1UQMNEzUCmVzlTQtf0w8u3IcTkL++HxZBALF6X/3F3N9z/Q9VKeptZfFKJg=@vger.kernel.org X-Gm-Message-State: AOJu0YywYuhU43BHVMuFkMNeqyRk2pJo/rEH7js/IbKeJvYI7WclZxX8 cWDhBDH1+fSOUatk+Jsm9nl5Aq0WAHCbOXNQ17k/orKtsYOTWaJcJHZv X-Gm-Gg: AeBDievAHnYvQB7ak4SaT9ca4eFHT1g7lhv6r2XPV7TxQ4GRM15+YC2YyQk1KpIiSZl L/Md72BZz5+iA9UbEkdtGGpyaHybRmuGYpULTO2qOT4XLP3va/ykSXy/crTnp9USpjd0xq/er4C kAcG2b3YM4GI6LXGamNhxy7PA9s5ohLjRr4xmfywR7F3CQMHbWSOhPjOTMI8s4CDV6cuhyZA8lZ gnGokPnqdpT0CObXGwDRZyZafIzzze46w2MICe2iRo5nx9Q01t7MiZDdT9MP3zqOuwaXGR0rlPS t+d9q0HNbitrCKMOFNrm0JkZFZSdapiNGpkI70SadD89NVNNettWF9FMVrG/W5aBakLwliVKMpo qUOnLZ2oCqw79zhllJGL6ddaFvmnfOnDRQX+QRWkea6lOghCZYY1HlEzCPFP33xdJ3uD2OcmIw9 ahqxvkYrFbkIXhzCjm8XJqs1oSSEo= X-Received: by 2002:a05:6300:4047:b0:3a1:6a7c:dba5 with SMTP id adf61e73a8af0-3a398e6360emr2347405637.6.1777385343908; Tue, 28 Apr 2026 07:09:03 -0700 (PDT) Received: from ser8.. ([221.156.231.192]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-c7fcade0f56sm2190526a12.21.2026.04.28.07.09.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Apr 2026 07:09:03 -0700 (PDT) From: DaeMyung Kang To: Namjae Jeon , Steve French Cc: Sergey Senozhatsky , Tom Talpey , Hyunchul Lee , Ronnie Sahlberg , linux-cifs@vger.kernel.org, linux-kernel@vger.kernel.org, DaeMyung Kang Subject: [PATCH v2 1/3] ksmbd: centralize ksmbd_conn final release to plug transport leak Date: Tue, 28 Apr 2026 23:08:54 +0900 Message-ID: <20260428140856.941847-2-charsyam@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260428140856.941847-1-charsyam@gmail.com> References: <20260428140856.941847-1-charsyam@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" ksmbd_conn_free() is one of four sites that can observe the last refcount drop of a struct ksmbd_conn. The other three fs/smb/server/connection.c ksmbd_conn_r_count_dec() fs/smb/server/oplock.c __free_opinfo() fs/smb/server/vfs_cache.c session_fd_check() end the conn with a bare kfree(), skipping ida_destroy(&conn->async_ida) and conn->transport->ops->free_transport(conn->transport). Whenever one of them is the last putter, the embedded async_ida and the entire transport struct leak -- for TCP, that is also the struct socket and the kvec iov. __free_opinfo() being a final putter is not theoretical. opinfo_put() queues the callback via call_rcu(&opinfo->rcu, free_opinfo_rcu), so ksmbd_server_terminate_conn() can deposit N opinfo releases in RCU and have ksmbd_conn_free() run in the handler thread before any of them fire. ksmbd_conn_free() then observes refcnt > 0 and short-circuits; the last RCU-delivered __free_opinfo() falls onto its bare kfree(conn) branch and the transport is lost. A/B validation in a QEMU/virtme guest, mounting //127.0.0.1/testshare: each iteration holds 8 files open via sleep processes, force-closes TCP with "ss -K sport =3D :445", kills the holders, lazy-umounts; repeated 10 times, then ksmbd shutdown and kmemleak scan. state conn_alloc conn_free tcp_free opi_rcu kmemleak ---------- ---------- --------- -------- ------- -------- pre-patch 20 20 10 160 7 with patch 20 20 20 160 0 Pre-patch conn_free=3D20 with tcp_free=3D10 directly demonstrates the bare-kfree paths skipping transport cleanup; kmemleak backtraces point into struct tcp_transport / iov. With this patch tcp_free matches conn_free at 20/20 and kmemleak is clean. Move the per-struct final release into __ksmbd_conn_release_work() and route the three bare-kfree final-put sites through a new ksmbd_conn_put(). Those sites now pair ida_destroy() and free_transport() with kfree(conn) regardless of which holder happens to release the last reference. stop_sessions() only triggers the transport shutdown and does not itself drop the last conn reference, so it is unaffected. The centralized release reaches sock_release() -> tcp_close() -> lock_sock_nested() (might_sleep) from every final putter, including __free_opinfo() invoked from an RCU softirq callback, which trips CONFIG_DEBUG_ATOMIC_SLEEP. Defer the release to a dedicated ksmbd_conn_wq workqueue so ksmbd_conn_put() is safe from any non-sleeping context. Make ksmbd_file own a strong connection reference while fp->conn is non-NULL so durable-preserve and final-close paths cannot dereference a stale connection. ksmbd_open_fd() and ksmbd_reopen_durable_fd() take the reference via ksmbd_conn_get() (the latter also reorders the fp->conn / fp->tcon assignments before __open_id() so the published fp is never observed with fp->conn =3D=3D NULL); session_fd_check() and __ksmbd_close_fd() drop it via ksmbd_conn_put(). With that invariant, session_fd_check() can take a local conn pointer once and use it across the m_op_list and lock_list iterations even though op->conn puts may otherwise drop the last reference. At module exit the workqueue is flushed and destroyed after rcu_barrier(), so any release queued by a trailing RCU callback is drained before the inode hash and module text go away. Fixes: ee426bfb9d09 ("ksmbd: add refcnt to ksmbd_conn struct") Signed-off-by: DaeMyung Kang --- fs/smb/server/connection.c | 101 +++++++++++++++++++++++++++++++------ fs/smb/server/connection.h | 6 +++ fs/smb/server/oplock.c | 7 +-- fs/smb/server/server.c | 12 +++++ fs/smb/server/vfs_cache.c | 60 ++++++++++++++++++---- 5 files changed, 156 insertions(+), 30 deletions(-) diff --git a/fs/smb/server/connection.c b/fs/smb/server/connection.c index fbbc0529743f..fc9b35d41185 100644 --- a/fs/smb/server/connection.c +++ b/fs/smb/server/connection.c @@ -79,6 +79,81 @@ static int create_proc_clients(void) { return 0; } static void delete_proc_clients(void) {} #endif =20 +static struct workqueue_struct *ksmbd_conn_wq; + +int ksmbd_conn_wq_init(void) +{ + ksmbd_conn_wq =3D alloc_workqueue("ksmbd-conn-release", + WQ_UNBOUND | WQ_MEM_RECLAIM, 0); + if (!ksmbd_conn_wq) + return -ENOMEM; + return 0; +} + +void ksmbd_conn_wq_destroy(void) +{ + if (ksmbd_conn_wq) { + destroy_workqueue(ksmbd_conn_wq); + ksmbd_conn_wq =3D NULL; + } +} + +/* + * __ksmbd_conn_release_work() - perform the final, once-per-struct cleanup + * of a ksmbd_conn whose refcount has just dropped to zero. + * + * This is the common release path used by ksmbd_conn_put() for the embedd= ed + * state that outlives the connection thread: async_ida and the attached + * transport (which owns the socket and iov for TCP). Called from a workq= ueue + * so that sleep-allowed teardown (sock_release -> tcp_close -> + * lock_sock_nested) never runs from an RCU softirq callback (free_opinfo_= rcu) + * or any other non-sleeping putter context. + */ +static void __ksmbd_conn_release_work(struct work_struct *work) +{ + struct ksmbd_conn *conn =3D + container_of(work, struct ksmbd_conn, release_work); + + ida_destroy(&conn->async_ida); + conn->transport->ops->free_transport(conn->transport); + kfree(conn); +} + +/** + * ksmbd_conn_get() - take a reference on @conn and return it. + * + * Returns @conn unchanged so callers can write + * "fp->conn =3D ksmbd_conn_get(work->conn);" in one expression. Returns = NULL + * if @conn is NULL. + */ +struct ksmbd_conn *ksmbd_conn_get(struct ksmbd_conn *conn) +{ + if (!conn) + return NULL; + + atomic_inc(&conn->refcnt); + return conn; +} + +/** + * ksmbd_conn_put() - drop a reference and, if it was the last, queue the + * release onto ksmbd_conn_wq so it runs from process context. + * + * Callable from any context including RCU softirq callbacks and non-sleep= ing + * locks; the actual release is deferred to the workqueue. ksmbd_conn_wq = is + * created in ksmbd_server_init() before any conn can be allocated and is + * destroyed in ksmbd_server_exit() after rcu_barrier(), so it is always + * non-NULL while a conn reference is held. + */ +void ksmbd_conn_put(struct ksmbd_conn *conn) +{ + if (!conn) + return; + + if (atomic_dec_and_test(&conn->refcnt)) + queue_work(ksmbd_conn_wq, &conn->release_work); +} + /** * ksmbd_conn_free() - free resources of the connection instance * @@ -93,23 +168,19 @@ void ksmbd_conn_free(struct ksmbd_conn *conn) hash_del(&conn->hlist); up_write(&conn_list_lock); =20 + /* + * request_buf / preauth_info / mechToken are only ever accessed by the + * connection handler thread that owns @conn. ksmbd_conn_free() is + * called from the transport free_transport() path when that thread is + * exiting, so it is safe to release them unconditionally even when + * ksmbd_conn_put() below is not the final putter (oplock / ksmbd_file + * holders only retain the conn pointer, not these per-thread buffers). + */ xa_destroy(&conn->sessions); kvfree(conn->request_buf); kfree(conn->preauth_info); kfree(conn->mechToken); - if (atomic_dec_and_test(&conn->refcnt)) { - /* - * async_ida is embedded in struct ksmbd_conn, so pair - * ida_destroy() with the final kfree() rather than with - * the unconditional field teardown above. This keeps - * the IDA valid for the entire lifetime of the struct, - * even while other refcount holders (oplock / vfs - * durable handles) still reference the connection. - */ - ida_destroy(&conn->async_ida); - conn->transport->ops->free_transport(conn->transport); - kfree(conn); - } + ksmbd_conn_put(conn); } =20 /** @@ -136,6 +207,7 @@ struct ksmbd_conn *ksmbd_conn_alloc(void) conn->um =3D ERR_PTR(-EOPNOTSUPP); if (IS_ERR(conn->um)) conn->um =3D NULL; + INIT_WORK(&conn->release_work, __ksmbd_conn_release_work); atomic_set(&conn->req_running, 0); atomic_set(&conn->r_count, 0); atomic_set(&conn->refcnt, 1); @@ -512,8 +584,7 @@ void ksmbd_conn_r_count_dec(struct ksmbd_conn *conn) if (!atomic_dec_return(&conn->r_count) && waitqueue_active(&conn->r_count= _q)) wake_up(&conn->r_count_q); =20 - if (atomic_dec_and_test(&conn->refcnt)) - kfree(conn); + ksmbd_conn_put(conn); } =20 int ksmbd_conn_transport_init(void) diff --git a/fs/smb/server/connection.h b/fs/smb/server/connection.h index ae21a1bd4c70..e95cb4ed0b3f 100644 --- a/fs/smb/server/connection.h +++ b/fs/smb/server/connection.h @@ -16,6 +16,7 @@ #include #include #include +#include =20 #include "smb_common.h" #include "ksmbd_work.h" @@ -119,6 +120,7 @@ struct ksmbd_conn { bool binding; atomic_t refcnt; bool is_aapl; + struct work_struct release_work; }; =20 struct ksmbd_conn_ops { @@ -163,6 +165,10 @@ void ksmbd_conn_wait_idle(struct ksmbd_conn *conn); int ksmbd_conn_wait_idle_sess_id(struct ksmbd_conn *curr_conn, u64 sess_id= ); struct ksmbd_conn *ksmbd_conn_alloc(void); void ksmbd_conn_free(struct ksmbd_conn *conn); +struct ksmbd_conn *ksmbd_conn_get(struct ksmbd_conn *conn); +void ksmbd_conn_put(struct ksmbd_conn *conn); +int ksmbd_conn_wq_init(void); +void ksmbd_conn_wq_destroy(void); bool ksmbd_conn_lookup_dialect(struct ksmbd_conn *c); int ksmbd_conn_write(struct ksmbd_work *work); int ksmbd_conn_rdma_read(struct ksmbd_conn *conn, diff --git a/fs/smb/server/oplock.c b/fs/smb/server/oplock.c index cd3f28b0e7cb..8feca02ddbf2 100644 --- a/fs/smb/server/oplock.c +++ b/fs/smb/server/oplock.c @@ -30,7 +30,6 @@ static DEFINE_RWLOCK(lease_list_lock); static struct oplock_info *alloc_opinfo(struct ksmbd_work *work, u64 id, __u16 Tid) { - struct ksmbd_conn *conn =3D work->conn; struct ksmbd_session *sess =3D work->sess; struct oplock_info *opinfo; =20 @@ -39,7 +38,7 @@ static struct oplock_info *alloc_opinfo(struct ksmbd_work= *work, return NULL; =20 opinfo->sess =3D sess; - opinfo->conn =3D conn; + opinfo->conn =3D ksmbd_conn_get(work->conn); opinfo->level =3D SMB2_OPLOCK_LEVEL_NONE; opinfo->op_state =3D OPLOCK_STATE_NONE; opinfo->pending_break =3D 0; @@ -50,7 +49,6 @@ static struct oplock_info *alloc_opinfo(struct ksmbd_work= *work, init_waitqueue_head(&opinfo->oplock_brk); atomic_set(&opinfo->refcount, 1); atomic_set(&opinfo->breaking_cnt, 0); - atomic_inc(&opinfo->conn->refcnt); =20 return opinfo; } @@ -132,8 +130,7 @@ static void __free_opinfo(struct oplock_info *opinfo) { if (opinfo->is_lease) free_lease(opinfo); - if (opinfo->conn && atomic_dec_and_test(&opinfo->conn->refcnt)) - kfree(opinfo->conn); + ksmbd_conn_put(opinfo->conn); kfree(opinfo); } =20 diff --git a/fs/smb/server/server.c b/fs/smb/server/server.c index 58ef02c423fc..5d799b2d4c62 100644 --- a/fs/smb/server/server.c +++ b/fs/smb/server/server.c @@ -596,8 +596,14 @@ static int __init ksmbd_server_init(void) if (ret) goto err_crypto_destroy; =20 + ret =3D ksmbd_conn_wq_init(); + if (ret) + goto err_workqueue_destroy; + return 0; =20 +err_workqueue_destroy: + ksmbd_workqueue_destroy(); err_crypto_destroy: ksmbd_crypto_destroy(); err_release_inode_hash: @@ -623,6 +629,12 @@ static void __exit ksmbd_server_exit(void) { ksmbd_server_shutdown(); rcu_barrier(); + /* + * ksmbd_conn_put() defers the final release onto ksmbd_conn_wq, + * so drain it after rcu_barrier() has fired any pending RCU + * callbacks that may have queued a release. + */ + ksmbd_conn_wq_destroy(); ksmbd_release_inode_hash(); } =20 diff --git a/fs/smb/server/vfs_cache.c b/fs/smb/server/vfs_cache.c index 3551f01a3fa0..4a18107937cc 100644 --- a/fs/smb/server/vfs_cache.c +++ b/fs/smb/server/vfs_cache.c @@ -475,6 +475,17 @@ static void __ksmbd_close_fd(struct ksmbd_file_table *= ft, struct ksmbd_file *fp) kfree(smb_lock); } =20 + /* + * Drop fp's strong reference on conn (taken in ksmbd_open_fd() / + * ksmbd_reopen_durable_fd()). Durable fps that reached the + * scavenger have already had fp->conn cleared by session_fd_check(), + * in which case there is nothing to drop here. + */ + if (fp->conn) { + ksmbd_conn_put(fp->conn); + fp->conn =3D NULL; + } + if (ksmbd_stream_fd(fp)) kfree(fp->stream.name); kfree(fp->owner.name); @@ -752,7 +763,14 @@ struct ksmbd_file *ksmbd_open_fd(struct ksmbd_work *wo= rk, struct file *filp) atomic_set(&fp->refcount, 1); =20 fp->filp =3D filp; - fp->conn =3D work->conn; + /* + * fp owns a strong reference on fp->conn for as long as fp->conn is + * non-NULL, so session_fd_check() and __ksmbd_close_fd() never + * dereference a dangling pointer. Paired with ksmbd_conn_put() in + * session_fd_check() (durable preserve), in __ksmbd_close_fd() + * (final close), and on the error paths below. + */ + fp->conn =3D ksmbd_conn_get(work->conn); fp->tcon =3D work->tcon; fp->volatile_id =3D KSMBD_NO_FID; fp->persistent_id =3D KSMBD_NO_FID; @@ -774,6 +792,8 @@ struct ksmbd_file *ksmbd_open_fd(struct ksmbd_work *wor= k, struct file *filp) return fp; =20 err_out: + /* fp->conn was set and refcounted before every branch here. */ + ksmbd_conn_put(fp->conn); kmem_cache_free(filp_cache, fp); return ERR_PTR(ret); } @@ -1062,25 +1082,32 @@ static bool session_fd_check(struct ksmbd_tree_conn= ect *tcon, if (!is_reconnectable(fp)) return false; =20 + if (WARN_ON_ONCE(!fp->conn)) + return false; + if (ksmbd_vfs_copy_durable_owner(fp, user)) return false; =20 + /* + * fp owns a strong reference on fp->conn (taken in ksmbd_open_fd() + * / ksmbd_reopen_durable_fd()), so conn stays valid for the whole + * body of this function regardless of any op->conn puts below. + */ conn =3D fp->conn; ci =3D fp->f_ci; down_write(&ci->m_lock); list_for_each_entry_rcu(op, &ci->m_op_list, op_entry) { if (op->conn !=3D conn) continue; - if (op->conn && atomic_dec_and_test(&op->conn->refcnt)) - kfree(op->conn); + ksmbd_conn_put(op->conn); op->conn =3D NULL; } up_write(&ci->m_lock); =20 list_for_each_entry_safe(smb_lock, tmp_lock, &fp->lock_list, flist) { - spin_lock(&fp->conn->llist_lock); + spin_lock(&conn->llist_lock); list_del_init(&smb_lock->clist); - spin_unlock(&fp->conn->llist_lock); + spin_unlock(&conn->llist_lock); } =20 fp->conn =3D NULL; @@ -1091,6 +1118,8 @@ static bool session_fd_check(struct ksmbd_tree_connec= t *tcon, fp->durable_scavenger_timeout =3D jiffies_to_msecs(jiffies) + fp->durable_timeout; =20 + /* Drop fp's own reference on conn. */ + ksmbd_conn_put(conn); return true; } =20 @@ -1178,15 +1207,27 @@ int ksmbd_reopen_durable_fd(struct ksmbd_work *work= , struct ksmbd_file *fp) =20 old_f_state =3D fp->f_state; fp->f_state =3D FP_NEW; + + /* + * Initialize fp's connection binding before publishing fp into the + * session's file table. If __open_id() is ordered first, a + * concurrent teardown that iterates the table can observe a valid + * volatile_id with fp->conn =3D=3D NULL and preserve a + * partially-initialized fp. fp owns a strong reference on the new + * conn (see ksmbd_open_fd()); undo it on __open_id() failure. + */ + fp->conn =3D ksmbd_conn_get(conn); + fp->tcon =3D work->tcon; + __open_id(&work->sess->file_table, fp, OPEN_ID_TYPE_VOLATILE_ID); if (!has_file_id(fp->volatile_id)) { + fp->conn =3D NULL; + fp->tcon =3D NULL; + ksmbd_conn_put(conn); fp->f_state =3D old_f_state; return -EBADF; } =20 - fp->conn =3D conn; - fp->tcon =3D work->tcon; - list_for_each_entry(smb_lock, &fp->lock_list, flist) { spin_lock(&conn->llist_lock); list_add_tail(&smb_lock->clist, &conn->lock_list); @@ -1198,8 +1239,7 @@ int ksmbd_reopen_durable_fd(struct ksmbd_work *work, = struct ksmbd_file *fp) list_for_each_entry_rcu(op, &ci->m_op_list, op_entry) { if (op->conn) continue; - op->conn =3D fp->conn; - atomic_inc(&op->conn->refcnt); + op->conn =3D ksmbd_conn_get(fp->conn); } up_write(&ci->m_lock); =20 --=20 2.43.0 From nobody Wed Jun 17 02:57:21 2026 Received: from mail-pf1-f182.google.com (mail-pf1-f182.google.com [209.85.210.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B5F6A42EEDF for ; Tue, 28 Apr 2026 14:09:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777385350; cv=none; b=dgHKeCdohA8VBH6HrhDLpaMh3KmGXT79eOQPYlcUnaAQjKUumz++iQk0Ex2Ul3E8PZaVo2XGwYJvg9rrdSt9p7+ba6UPDK0f16p9Cq3eDri7sFxvnQThI+2qt8zktcQrjUUzfaP6m+cqBEqP8qeSsvSyy7kLIWOu51vsXguuJdI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777385350; c=relaxed/simple; bh=7Y75CJc2/d6aHB2cQnB6UNXLLb91D24toWSFpWWyhQc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WjRwJ3APqLWoF4ymaVGJ1Cm0cPVWH7o7xQ5dPFE04ZLWVO+cRxN0BSFZRoPy6VYnt3EqHJWUHyuua6FPB928RBHbLn8nSNdm52JxsZuyXzn4DVZbi7rFM7j2TrWttGzZXpeFSfqV4U+wr5K5nv/T1327v6+sk0QLsDyC09csII0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=jFUrSbDg; arc=none smtp.client-ip=209.85.210.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="jFUrSbDg" Received: by mail-pf1-f182.google.com with SMTP id d2e1a72fcca58-826c4c6d95dso672933b3a.0 for ; Tue, 28 Apr 2026 07:09:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1777385347; x=1777990147; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=xlWHHqWJasISzgKRTEELiwHwHgmZ46NRjOQ3vo3IPy4=; b=jFUrSbDg/4oC2PkH4sl2GhXb7QY8/XCc0gAnSxpwC1ofvASx9wclP7c9dbcJ1w1goR zle6Uj5MGCAdyJZgOp+RUjObCE0EcJ6VHzHNhUbjw3nVnsA1UVVhiYlmVdhveF2G5bon cVVj8he/FVRD58nl+xOaGKJg/niRl7gsNpxaxH+g1nY8uxcqJ12l+118jIc/qZQC0sTG 3KBXWZOEl1HAj7akx3uXT8+lxUrfOXP9Uq7KfQ7s9Ou6z1g0n+uGjoi3FynXCTiL1+oF gw8SYTxM5a4BwCtgRPg8FPjiqPQHGYQng79zzJlWyWC+cQ3OmCOpEp7yiTi/wOkrPZEI +iHA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1777385347; x=1777990147; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=xlWHHqWJasISzgKRTEELiwHwHgmZ46NRjOQ3vo3IPy4=; b=lBAerj7ClNtFYs66xKzdUBG3CY7sT8PyZURiZxwtnDi7qVdjTLO37hGILafMDb8yPv +h8IToXkUydh/tgULxC4qP6ol9upVoN1cFIE5dy9802kH70dJ6fkuar/SuVe1B63bI5Q aBeFzS7y5j6aM07sHQzNSwcjBMHY9xn8I7cgeqcrp5tOodjSV07P/W+0dJp2EtOIuczm x4Mpm1kaOLZjMHdnY0GRjnpE71PNZpYYYipx/C2/1vYHhPIirynkZYwjVR6b5HqubGZR urpKEKH9UkbIRgx6T0ejuWJJHlfRFO9FvGOlmkOihTWhKDqKbL5Rc7W0PrzdKcGDFtRI 3RLw== X-Forwarded-Encrypted: i=1; AFNElJ+w87JHFeWai4815UFDDI/YJW1Zy8nR4X+75VElxf6FKFNUaFpJnu8FMckodtocvDKyWqRV9M6WBxsAbLY=@vger.kernel.org X-Gm-Message-State: AOJu0YyeWLyRFuQ8b8vF6hFrSoUNUzaqv6HPo5epfC3kijnHWVW/+LdB +BGPl4vbgE5Fv6sZ9SVnVvFXy2HOmmBCEwfx+kYqk5xROz7/UnC00yj6 X-Gm-Gg: AeBDieuuU9ZBWD5/GRZFd9hWh5s64E22wZOiP9p6ikttUW9L6Gvw9E1A4+4UMuxrP1R ifT3qEj1c07PkN1HCme+fyKpW/edEZvLb0/3u//yPbXNtICFJcC8XsmFQ1ozhksp9xIRvT/RifM 6kDOvkn06sdjdMDtb5z353QOfBhrbK8WUqvJCbKnL+K9RPNo9zVFqeoa8WWoglUeG3dLStpvR4d 9YlcVidfCDft191xRehppH9mGjPS2VK8Z5K6VAKyia2UnrDbpb5sDcg78QcGHROvxjYgI9EuUXa swqhWPuRPZCzqZJfxiVcsrrrEJHT20vcDkq3ClIUgQwYv60NaEc+W0V5rZ2WysS3DNeKOm1T4Ja WokTzAoBF94omGB5V10E4TjzqwYpRXpItZ9e7ZbTk48f75O/d+N6KjQ+7YLA7yeHVefogoQpYec ngGAEFwEZb1Cdag3xBq8KnKtolBW0= X-Received: by 2002:a05:6a21:10e:b0:3a2:c72c:745d with SMTP id adf61e73a8af0-3a398db06e1mr2710162637.4.1777385346798; Tue, 28 Apr 2026 07:09:06 -0700 (PDT) Received: from ser8.. ([221.156.231.192]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-c7fcade0f56sm2190526a12.21.2026.04.28.07.09.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Apr 2026 07:09:06 -0700 (PDT) From: DaeMyung Kang To: Namjae Jeon , Steve French Cc: Sergey Senozhatsky , Tom Talpey , Hyunchul Lee , Ronnie Sahlberg , linux-cifs@vger.kernel.org, linux-kernel@vger.kernel.org, DaeMyung Kang Subject: [PATCH v2 2/3] ksmbd: harden file lifetime during session teardown Date: Tue, 28 Apr 2026 23:08:55 +0900 Message-ID: <20260428140856.941847-3-charsyam@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260428140856.941847-1-charsyam@gmail.com> References: <20260428140856.941847-1-charsyam@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" __close_file_table_ids() is the per-session teardown that closes every fp belonging to a session (or to one tree connect on that session) by walking the session's volatile-id idr. The current loop has three related problems on busy or racing workloads: * Sleeping under ft->lock. The session-teardown skip callback, session_fd_check(), already sleeps in ksmbd_vfs_copy_durable_owner() -> kstrdup(GFP_KERNEL) and down_write(&fp->f_ci->m_lock) (a rw_semaphore). Running the callback inside write_lock(&ft->lock) trips CONFIG_DEBUG_ATOMIC_SLEEP / CONFIG_PROVE_LOCKING on a durable-fd workload. * Refcount accounting blind to f_state. The unconditional atomic_dec_and_test(&fp->refcount) does not distinguish FP_INITED (idr-owned reference still intact) from FP_CLOSED (an earlier ksmbd_close_fd() already consumed the idr-owned reference while leaving fp in the idr because a holder kept refcount non-zero). When the latter races with teardown the same path over-decrements into a holder reference and ksmbd_fd_put() later UAFs that holder. * FP_NEW window. Between __open_id() publishing fp into the session idr and ksmbd_update_fstate(..., FP_INITED) committing the transition at the end of smb2_open(), an fp is in FP_NEW and an intervening teardown that takes a transient reference and unpublishes the volatile id leaves the original idr-owned reference orphaned -- the opener is unaware that fp has been unpublished, returns success to the client, and the fp leaks at refcount =3D 1. Refactor __close_file_table_ids() to take a transient reference on fp and unpublish fp from the session idr *under ft->lock* before calling skip() outside the lock. A transient ref protects lifetime but not concurrent field mutation, so the idr_remove() is what keeps __ksmbd_lookup_fd() through this session's idr from granting a new ksmbd_fp_get() reference to an fp whose fp->conn / fp->tcon / fp->volatile_id / op->conn / lock_list links are about to be rewritten by session_fd_check(). Durable reconnect is unaffected because it reaches fp through the global durable table (ksmbd_lookup_durable_fd -> global_ft). Decide n_to_drop together with any FP_INITED -> FP_CLOSED transition under ft->lock so teardown and ksmbd_close_fd() never both consume the idr-owned reference. See ksmbd_mark_fp_closed() for the per-state accounting. For the FP_NEW path to be safe, the opener has to learn that fp was unpublished: ksmbd_update_fstate() now returns -ENOENT when an FP_NEW -> FP_INITED transition finds f_state already advanced or the volatile id cleared (both committed by teardown under ft->lock); smb2_open() propagates that as STATUS_OBJECT_NAME_INVALID and drops the original reference via ksmbd_fd_put(). The list removal cannot be left for a deferred final putter because fp->volatile_id has already been cleared and __ksmbd_remove_fd() will intentionally skip both idr_remove() and list_del_init(). Move the m_fp_list unlink in __ksmbd_remove_fd() above the volatile-id check so that an FP_NEW fp that happened to be added to m_fp_list (smb2_open() adds fp->node before ksmbd_update_fstate() runs) is still cleaned up on the deferred putter path; list_del_init() on an empty node is a no-op and remains safe for fps that were never added. Add a defensive guard in session_fd_check() that refuses non-FP_INITED fps so that even if a teardown reaches an FP_NEW fp it falls into the close branch (where the n_to_drop =3D 1 accounting keeps the opener's reference alive) instead of the durable-preserve branch (which mutates fp->conn / fp->tcon). Validation on a debug kernel additionally built with CONFIG_DEBUG_LIST and CONFIG_DEBUG_OBJECTS_WORK used a same-session two-tcon workload (open/write storm on one tcon, 50 tree disconnects on the other) and reported no list-corruption, work_struct ODEBUG, sleep-in-atomic, lockdep or kmemleak reports. Reverting only the __close_file_table_ids() hunk while keeping a forced-is_reconnectable() harness produced the expected sleep-in-atomic at vfs_cache.c:1095, confirming the ft->lock-out-of-sleepable-skip discipline. KASAN-enabled direct SMB2 coverage with durable handles enabled exercised ksmbd_close_tree_conn_fds(), ksmbd_close_session_fds(), the FP_NEW failure path, tree_conn_fd_check(), and a non-zero session_fd_check() durable-preserve return. This produced no KASAN, DEBUG_LIST, ODEBUG, or WARNING reports. Fixes: f44158485826 ("cifsd: add file operations") Signed-off-by: DaeMyung Kang --- fs/smb/server/smb2pdu.c | 6 +- fs/smb/server/vfs_cache.c | 179 +++++++++++++++++++++++++++++++++----- fs/smb/server/vfs_cache.h | 4 +- 3 files changed, 164 insertions(+), 25 deletions(-) diff --git a/fs/smb/server/smb2pdu.c b/fs/smb/server/smb2pdu.c index 21825a69c29a..c1240ca14253 100644 --- a/fs/smb/server/smb2pdu.c +++ b/fs/smb/server/smb2pdu.c @@ -3767,8 +3767,10 @@ int smb2_open(struct ksmbd_work *work) =20 err_out2: if (!rc) { - ksmbd_update_fstate(&work->sess->file_table, fp, FP_INITED); - rc =3D ksmbd_iov_pin_rsp(work, (void *)rsp, iov_len); + rc =3D ksmbd_update_fstate(&work->sess->file_table, fp, + FP_INITED); + if (!rc) + rc =3D ksmbd_iov_pin_rsp(work, (void *)rsp, iov_len); } if (rc) { if (rc =3D=3D -EINVAL) diff --git a/fs/smb/server/vfs_cache.c b/fs/smb/server/vfs_cache.c index 4a18107937cc..dc4037ef1834 100644 --- a/fs/smb/server/vfs_cache.c +++ b/fs/smb/server/vfs_cache.c @@ -431,13 +431,13 @@ static void ksmbd_remove_durable_fd(struct ksmbd_file= *fp) =20 static void __ksmbd_remove_fd(struct ksmbd_file_table *ft, struct ksmbd_fi= le *fp) { - if (!has_file_id(fp->volatile_id)) - return; - down_write(&fp->f_ci->m_lock); list_del_init(&fp->node); up_write(&fp->f_ci->m_lock); =20 + if (!has_file_id(fp->volatile_id)) + return; + write_lock(&ft->lock); idr_remove(ft->idr, fp->volatile_id); write_unlock(&ft->lock); @@ -798,15 +798,58 @@ struct ksmbd_file *ksmbd_open_fd(struct ksmbd_work *w= ork, struct file *filp) return ERR_PTR(ret); } =20 -void ksmbd_update_fstate(struct ksmbd_file_table *ft, struct ksmbd_file *f= p, - unsigned int state) +/** + * ksmbd_update_fstate() - update an fp state under the file-table lock + * @ft: file table that publishes @fp's volatile id + * @fp: file pointer to update + * @state: new state + * + * Return: 0 on success. The FP_NEW -> FP_INITED transition is special: + * -ENOENT if teardown already unpublished @fp by advancing the state or + * clearing the volatile id. Other state updates preserve the historical + * fire-and-forget behavior. + */ +int ksmbd_update_fstate(struct ksmbd_file_table *ft, struct ksmbd_file *fp, + unsigned int state) { + int ret; + if (!fp) - return; + return -ENOENT; =20 write_lock(&ft->lock); - fp->f_state =3D state; + if (state =3D=3D FP_INITED && + (fp->f_state !=3D FP_NEW || !has_file_id(fp->volatile_id))) { + ret =3D -ENOENT; + } else { + fp->f_state =3D state; + ret =3D 0; + } write_unlock(&ft->lock); + + return ret; +} + +/* + * ksmbd_mark_fp_closed() - mark fp closed under ft->lock and return how m= any + * refs the teardown path owns. + * + * FP_INITED has a normal idr-owned reference, so teardown owns both that + * reference and the transient lookup reference. FP_NEW is still owned by= the + * in-flight opener/reopener, which will drop the original reference after + * ksmbd_update_fstate(..., FP_INITED) observes the cleared volatile id. + * FP_CLOSED on entry means an earlier ksmbd_close_fd() already consumed t= he + * idr-owned ref. + */ +static int ksmbd_mark_fp_closed(struct ksmbd_file *fp) +{ + if (fp->f_state =3D=3D FP_INITED) { + set_close_state_blocked_works(fp); + fp->f_state =3D FP_CLOSED; + return 2; + } + + return 1; } =20 static int @@ -814,7 +857,8 @@ __close_file_table_ids(struct ksmbd_session *sess, struct ksmbd_tree_connect *tcon, bool (*skip)(struct ksmbd_tree_connect *tcon, struct ksmbd_file *fp, - struct ksmbd_user *user)) + struct ksmbd_user *user), + bool skip_preserves_fp) { struct ksmbd_file_table *ft =3D &sess->file_table; struct ksmbd_file *fp; @@ -822,32 +866,120 @@ __close_file_table_ids(struct ksmbd_session *sess, int num =3D 0; =20 while (1) { + int n_to_drop; + write_lock(&ft->lock); fp =3D idr_get_next(ft->idr, &id); if (!fp) { write_unlock(&ft->lock); break; } - - if (skip(tcon, fp, sess->user) || - !atomic_dec_and_test(&fp->refcount)) { + if (!atomic_inc_not_zero(&fp->refcount)) { id++; write_unlock(&ft->lock); continue; } =20 - set_close_state_blocked_works(fp); - idr_remove(ft->idr, fp->volatile_id); - fp->volatile_id =3D KSMBD_NO_FID; - write_unlock(&ft->lock); + if (skip_preserves_fp) { + /* + * Session teardown: skip() is session_fd_check(), + * which may sleep and mutates fp->conn / fp->tcon / + * fp->volatile_id when it chooses to preserve fp + * for durable reconnect. Unpublish fp from the + * session idr here, under ft->lock, so that + * __ksmbd_lookup_fd() through this session cannot + * grant a new ksmbd_fp_get() reference to an fp + * whose fields are about to be rewritten outside + * the lock. Durable reconnect still reaches fp via + * global_ft. + */ + idr_remove(ft->idr, id); + fp->volatile_id =3D KSMBD_NO_FID; + write_unlock(&ft->lock); + + if (skip(tcon, fp, sess->user)) { + /* + * session_fd_check() has converted fp to + * durable-preserve state and cleared its + * per-conn fields. fp is already unpublished + * above; the original idr-owned ref keeps it + * alive for the durable scavenger. Drop only + * the transient ref. atomic_dec() is safe -- + * atomic_inc_not_zero() succeeded on a + * positive value and we added one more, so + * refcount cannot be zero here. + */ + atomic_dec(&fp->refcount); + id++; + continue; + } + + /* + * Keep the close-state decision under the same lock + * observed by ksmbd_update_fstate(), which is how an + * in-flight FP_NEW opener learns that teardown has + * cleared its volatile id. + */ + write_lock(&ft->lock); + n_to_drop =3D ksmbd_mark_fp_closed(fp); + write_unlock(&ft->lock); + } else { + /* + * Tree teardown: skip() is tree_conn_fd_check(), a + * cheap pointer compare that doesn't sleep and has + * no side effects, so keep the skip decision plus + * the unpublish-and-mark-closed sequence atomic + * under ft->lock. fps belonging to other tree + * connects (skip() =3D=3D true) stay fully published in + * the session idr with no lock window. + */ + if (skip(tcon, fp, sess->user)) { + atomic_dec(&fp->refcount); + write_unlock(&ft->lock); + id++; + continue; + } + idr_remove(ft->idr, id); + fp->volatile_id =3D KSMBD_NO_FID; + n_to_drop =3D ksmbd_mark_fp_closed(fp); + write_unlock(&ft->lock); + } =20 + /* + * fp->volatile_id is already cleared to prevent stale idr + * removal from a deferred final close. Remove fp from + * m_fp_list here because __ksmbd_remove_fd() will skip the + * list unlink when volatile_id is KSMBD_NO_FID. + */ down_write(&fp->f_ci->m_lock); list_del_init(&fp->node); up_write(&fp->f_ci->m_lock); =20 - __ksmbd_close_fd(ft, fp); - - num++; + /* + * Drop the references this iteration owns: + * + * n_to_drop =3D=3D 2: we observed FP_INITED and committed + * the FP_CLOSED transition ourselves, so we own the + * transient (+1) and the still-intact idr-owned ref. + * + * n_to_drop =3D=3D 1: either a prior ksmbd_close_fd() + * already consumed the idr-owned ref, or fp was still + * FP_NEW and the in-flight opener/reopener must keep + * the original reference until ksmbd_update_fstate() + * observes the cleared volatile id. + * + * If we end up as the final putter, finalize fp and + * account the open_files_count decrement via the caller's + * atomic_sub(num, ...). Otherwise the remaining user's + * ksmbd_fd_put() reaches __put_fd_final(), which does its + * own atomic_dec(&open_files_count), so we must not count + * this fp here -- doing so would double-decrement the + * connection-wide counter. + */ + if (atomic_sub_and_test(n_to_drop, &fp->refcount)) { + __ksmbd_close_fd(NULL, fp); + num++; + } id++; } =20 @@ -1082,6 +1214,9 @@ static bool session_fd_check(struct ksmbd_tree_connec= t *tcon, if (!is_reconnectable(fp)) return false; =20 + if (fp->f_state !=3D FP_INITED) + return false; + if (WARN_ON_ONCE(!fp->conn)) return false; =20 @@ -1127,7 +1262,8 @@ void ksmbd_close_tree_conn_fds(struct ksmbd_work *wor= k) { int num =3D __close_file_table_ids(work->sess, work->tcon, - tree_conn_fd_check); + tree_conn_fd_check, + false); =20 atomic_sub(num, &work->conn->stats.open_files_count); } @@ -1136,7 +1272,8 @@ void ksmbd_close_session_fds(struct ksmbd_work *work) { int num =3D __close_file_table_ids(work->sess, work->tcon, - session_fd_check); + session_fd_check, + true); =20 atomic_sub(num, &work->conn->stats.open_files_count); } @@ -1268,7 +1405,7 @@ void ksmbd_destroy_file_table(struct ksmbd_session *s= ess) if (!ft->idr) return; =20 - __close_file_table_ids(sess, NULL, session_fd_check); + __close_file_table_ids(sess, NULL, session_fd_check, true); idr_destroy(ft->idr); kfree(ft->idr); ft->idr =3D NULL; diff --git a/fs/smb/server/vfs_cache.h b/fs/smb/server/vfs_cache.h index 866f32c10d4d..e6871266a94b 100644 --- a/fs/smb/server/vfs_cache.h +++ b/fs/smb/server/vfs_cache.h @@ -172,8 +172,8 @@ int ksmbd_close_inode_fds(struct ksmbd_work *work, stru= ct inode *inode); int ksmbd_init_global_file_table(void); void ksmbd_free_global_file_table(void); void ksmbd_set_fd_limit(unsigned long limit); -void ksmbd_update_fstate(struct ksmbd_file_table *ft, struct ksmbd_file *f= p, - unsigned int state); +int ksmbd_update_fstate(struct ksmbd_file_table *ft, struct ksmbd_file *fp, + unsigned int state); bool ksmbd_vfs_compare_durable_owner(struct ksmbd_file *fp, struct ksmbd_user *user); =20 --=20 2.43.0 From nobody Wed Jun 17 02:57:21 2026 Received: from mail-pf1-f178.google.com (mail-pf1-f178.google.com [209.85.210.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7A6EF43E9C4 for ; Tue, 28 Apr 2026 14:09:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.178 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777385354; cv=none; b=QRqYHqQo3ayCUYftnLAO+PMICf7Z71UJ149uS36stMbA6qL+mVOcUicn6u5sny//lCpv+YovfrdDQRKgohg3po81wpPvfosxqF/gDK6GM+/oAm1XWDGmYX4kXFHh2ZWxl0Sy+S3e/Nd9bErP1PK5UT/FUUaEnUGEySMM6HvtDeQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777385354; c=relaxed/simple; bh=4wj41cHrtYbOpRGT0t7PtOWHzkpMgKUVap56UoFLJIQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=rjhNpbsYoIi7kC3N5j5rJSnaDtmHTrAm5KV1g0iN+WwqcrTGJRz2E+Y5arD9yBimdGh0Pq/cgxvUh5pYY4gzpEmQ79mySiypiJh8JJ5dLDNgR3jlxlXimCQMLNUcYg7uWLs6HCXejgVzoDGpJIbWVpY1AQYO5ojz25hAsfrMe4A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=A+mQ76Ep; arc=none smtp.client-ip=209.85.210.178 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="A+mQ76Ep" Received: by mail-pf1-f178.google.com with SMTP id d2e1a72fcca58-8296e257735so443371b3a.1 for ; Tue, 28 Apr 2026 07:09:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1777385352; x=1777990152; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=tkJ08U6yZCRDKix+9y0PgVL7KcnbudlGjXHmuDGf9BQ=; b=A+mQ76EpoWEDPhZKKZtmBBKkp7SYakv6ALWIRj7PjT47LMCg8HAUXlsI0GS4lMswP1 GwthRNhxoxBJY2qo1dBdKh+QjHBp9gZ4Idp2G7lPur2S+XiUWENRogy/AiIOhI2JBcIi ERC08CX9acZ0XJ2n8yMJPhtOSI2/coKfsyJfTWJjPz84Xro5HxcIkljdaftpOni3UdcG 4omAK4542NKx/uYnj/p9Kkoll7OsRk0xEJzsLi/A/kPSATu1/kx2nu2WZ7dGhdt6CBhs 3uuwvwyCIr1lfPo217VOmi1E3NB6/kf9tMEnRcziVqPkVxH8FvRIZClhH6GhblYgjWnY oJeA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1777385352; x=1777990152; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=tkJ08U6yZCRDKix+9y0PgVL7KcnbudlGjXHmuDGf9BQ=; b=mIufs6qOR9k9+9wgHAa+5/PwHOnhvg0UXn2IiBG6bst1FE0EFDXJ0YZ1eRwgJqeZAo fekHknEQSzYe+89FQw/N8SWbrC6/sisb8c3ZSthR4N0sMxMKPfLQEYlB0mfUqWoPasKJ Bvpqx7Ys5leOGiBpKS4/WpL9DW9QlE8mhCg1UGLWn8I0JSuZXCM5985Dzmzz0hW3/VKY zMtkkpscbOrK53E6GK2FlBFAyfLE7QuPzbtQkmeNuzO5XCYY8tmUslHAudpx3GeJenDP hIudPj3um/586J/8+Ux2reDOwSSGDUmucqchuF2tKhA/hqGLkJ4DftRjUPPgTeOGle8p J5GQ== X-Forwarded-Encrypted: i=1; AFNElJ/OdGJi0B6JfOSigYZLDQGe/13pbO/8+quhScyEUa2Ez7e0XY+KVel+/IV1bTI5szo8hUM1Sq306oW1GY4=@vger.kernel.org X-Gm-Message-State: AOJu0YxO898Z3V8OIoyCUCn12Y9BK3ApZ1VNsCwByFkgTm54yEtnGBNt F9mtuRE0Mvsex41aOR5BmZ817FwQOsQlIqU+KNPsjiLQa7Y+lkjoHSE7 X-Gm-Gg: AeBDietlwd9LkmS3yvtvhMOwixagIWWgnd17RFSzlCzIn8HdXls/WqN48qKgfeapo25 FWfyTqbnnoU84fFAFgK4RCscA9pyjkqn6dSq5A0AG/shuxQrJxrXRb/mfiHdVvHRt5UBnm7UU1q Sx2gl41taIcxEglIGRxISfc5g4YAUwnZNMdIcfZeEetXCBdl/uJOXDXOYOmIMVI9ePer2WxulM8 C0eem1rDCe9MUMLRS+1doWaTVul+miLaAAl02iI/S0dW4Czj7xeEm5f9eJHcm3J6UdIGHPBzV8V bofsbj3fq1ueo6WutmM19sEQy1sM68azaxZpJffzaGk+xgwL3jULtZ1rNlBoTEULJmiSUYp5Fgu lkIhvFfe42B8am50kSmXUsvAAWJa0AgtcD6LaDx9Di2uTnvzq9raKOuFO7OcEYWuqFbplEry3Fm FP7xEwp8c4uQo6oAccvU5c4P8htaQ= X-Received: by 2002:a05:6a21:104:b0:3a2:d3c3:39a9 with SMTP id adf61e73a8af0-3a39889bc54mr2419585637.0.1777385351601; Tue, 28 Apr 2026 07:09:11 -0700 (PDT) Received: from ser8.. ([221.156.231.192]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-c7fcade0f56sm2190526a12.21.2026.04.28.07.09.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Apr 2026 07:09:11 -0700 (PDT) From: DaeMyung Kang To: Namjae Jeon , Steve French Cc: Sergey Senozhatsky , Tom Talpey , Hyunchul Lee , Ronnie Sahlberg , linux-cifs@vger.kernel.org, linux-kernel@vger.kernel.org, DaeMyung Kang Subject: [PATCH v2 3/3] ksmbd: close durable scavenger races against m_fp_list lookups Date: Tue, 28 Apr 2026 23:08:56 +0900 Message-ID: <20260428140856.941847-4-charsyam@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260428140856.941847-1-charsyam@gmail.com> References: <20260428140856.941847-1-charsyam@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" ksmbd_durable_scavenger() has two related races against any walker that iterates f_ci->m_fp_list, including ksmbd_lookup_fd_inode() (used by ksmbd_vfs_rename) and the share-mode checks in fs/smb/server/smb_common.c. (1) fp->node list-head reuse. Durable-preserved handles can remain linked on f_ci->m_fp_list after session teardown so share-mode checks still see them while the handle is reconnectable. The scavenger collected expired handles by adding fp->node to a local scavenger_list after removing them from the global durable idr. Because fp->node is the same list_head used by m_fp_list, list_add(&fp->node, &scavenger_list) overwrites the m_fp_list links and corrupts both lists. CONFIG_DEBUG_LIST can report this on the share-mode walk path. (2) Refcount race against m_fp_list walkers. The scavenger qualifies an expired durable handle with atomic_read(&fp->refcount) > 1 and fp->conn under global_ft.lock, removes fp from global_ft, then drops global_ft.lock before unlinking fp from m_fp_list and freeing it. During that gap fp is still linked on m_fp_list with f_state =3D=3D FP_INITED. ksmbd_lookup_fd_inode() under m_lock read calls ksmbd_fp_get() (atomic_inc_not_zero on refcount that is still 1) and takes a live reference; the scavenger then unlinks and frees fp while the holder owns a reference, leading to UAF on the holder's subsequent ksmbd_fd_put() and on any field reads performed by a concurrent share-mode walker that iterates m_fp_list without taking ksmbd_fp_get() (smb_check_perm_dleases-like paths). Fix both: * Stop reusing fp->node as a scavenger-private list node. Remove one expired handle from global_ft under global_ft.lock, take an explicit transient reference, drop the lock, unlink fp->node from m_fp_list under f_ci->m_lock, then drop both the durable lifetime and transient references with atomic_sub_and_test(2, &fp->refcount). If the scavenger is the last putter the close runs there; otherwise an in-flight holder that already raced through the m_fp_list lookup owns the final close via its ksmbd_fd_put() path. The one-at-a-time disposal can rescan the durable idr when multiple handles expire in the same pass, but durable scavenging is a background expiration path and the final full scan recomputes min_timeout before the next wait. * Clear fp->persistent_id inside __ksmbd_remove_durable_fd() right after idr_remove(), so a delayed final close from a holder that snatched fp does not re-issue idr_remove() on a persistent id that idr_alloc_cyclic() in ksmbd_open_durable_fd() may have already handed out to a brand-new durable handle. * Bypass the per-conn open_files_count decrement in __put_fd_final() when fp is detached from any session table (fp->conn cleared by session_fd_check() at durable preserve -- paired with the volatile_id clear at unpublish, so checking fp->conn alone is sufficient). The walker that owns the final close runs from an unrelated work->conn whose stats.open_files_count never tracked this durable fp; without this guard the holder would underflow that unrelated counter. The two races are folded into one patch because patch (1) alone cleans up the corrupted list but leaves a deterministic UAF window for m_fp_list walkers that the transient-reference and persistent_id discipline in (2) close; bisecting onto an intermediate state would land on a UAF that pre-patch chaos merely made less reproducible. Validation: * CONFIG_DEBUG_LIST coverage for the list_head reuse path. * KASAN-enabled direct SMB2 durable-handle coverage that exercised ksmbd_durable_scavenger() and non-NULL ksmbd_lookup_fd_inode() returns while durable handles expired under concurrent rename lookups, with no KASAN, UAF, list-corruption, ODEBUG, or WARNING reports. * checkpatch --strict * make -j$(nproc) M=3Dfs/smb/server Fixes: d484d621d40f ("ksmbd: add durable scavenger timer") Signed-off-by: DaeMyung Kang --- fs/smb/server/vfs_cache.c | 102 ++++++++++++++++++++++++++++---------- 1 file changed, 76 insertions(+), 26 deletions(-) diff --git a/fs/smb/server/vfs_cache.c b/fs/smb/server/vfs_cache.c index dc4037ef1834..354c4d8a1cfb 100644 --- a/fs/smb/server/vfs_cache.c +++ b/fs/smb/server/vfs_cache.c @@ -418,6 +418,14 @@ static void __ksmbd_remove_durable_fd(struct ksmbd_fil= e *fp) return; =20 idr_remove(global_ft.idr, fp->persistent_id); + /* + * Clear persistent_id so a later __ksmbd_close_fd() that runs from a + * delayed putter (e.g. when a concurrent ksmbd_lookup_fd_inode() + * walker held the final reference) does not re-issue idr_remove() on + * an id that idr_alloc_cyclic() may have already handed out to a new + * durable handle. + */ + fp->persistent_id =3D KSMBD_NO_FID; } =20 static void ksmbd_remove_durable_fd(struct ksmbd_file *fp) @@ -521,6 +529,20 @@ static struct ksmbd_file *__ksmbd_lookup_fd(struct ksm= bd_file_table *ft, =20 static void __put_fd_final(struct ksmbd_work *work, struct ksmbd_file *fp) { + /* + * Detached durable fp -- session_fd_check() cleared fp->conn at + * preserve, so this fp is no longer tracked by any conn's + * stats.open_files_count. This happens when + * ksmbd_scavenger_dispose_dh() hands the final close off to an + * m_fp_list walker (e.g. ksmbd_lookup_fd_inode()) whose work->conn + * is unrelated to the conn that originally opened the handle; close + * via the NULL-ft path so we do not underflow that unrelated + * counter. + */ + if (!fp->conn) { + __ksmbd_close_fd(NULL, fp); + return; + } __ksmbd_close_fd(&work->sess->file_table, fp); atomic_dec(&work->conn->stats.open_files_count); } @@ -1033,24 +1055,37 @@ static bool ksmbd_durable_scavenger_alive(void) return true; } =20 -static void ksmbd_scavenger_dispose_dh(struct list_head *head) +static void ksmbd_scavenger_dispose_dh(struct ksmbd_file *fp) { - while (!list_empty(head)) { - struct ksmbd_file *fp; + /* + * Durable-preserved fp can remain linked on f_ci->m_fp_list for + * share-mode checks. Unlink it before final close; fp->node is not + * available as a scavenger-private list node because re-adding it to + * another list corrupts m_fp_list. + */ + down_write(&fp->f_ci->m_lock); + list_del_init(&fp->node); + up_write(&fp->f_ci->m_lock); =20 - fp =3D list_first_entry(head, struct ksmbd_file, node); - list_del_init(&fp->node); + /* + * Drop both the durable lifetime reference and the transient reference + * taken by the scavenger under global_ft.lock. If a concurrent + * ksmbd_lookup_fd_inode() (or any other m_fp_list walker) snatched fp + * before the unlink above, that holder owns the final close via + * ksmbd_fd_put() -> __ksmbd_close_fd(). Otherwise the scavenger is + * the last putter and finalises fp here. + */ + if (atomic_sub_and_test(2, &fp->refcount)) __ksmbd_close_fd(NULL, fp); - } } =20 static int ksmbd_durable_scavenger(void *dummy) { struct ksmbd_file *fp =3D NULL; + struct ksmbd_file *expired_fp; unsigned int id; unsigned int min_timeout =3D 1; bool found_fp_timeout; - LIST_HEAD(scavenger_list); unsigned long remaining_jiffies; =20 __module_get(THIS_MODULE); @@ -1060,8 +1095,6 @@ static int ksmbd_durable_scavenger(void *dummy) if (try_to_freeze()) continue; =20 - found_fp_timeout =3D false; - remaining_jiffies =3D wait_event_timeout(dh_wq, ksmbd_durable_scavenger_alive() =3D=3D false, __msecs_to_jiffies(min_timeout)); @@ -1070,23 +1103,39 @@ static int ksmbd_durable_scavenger(void *dummy) else min_timeout =3D DURABLE_HANDLE_MAX_TIMEOUT; =20 - write_lock(&global_ft.lock); - idr_for_each_entry(global_ft.idr, fp, id) { - if (!fp->durable_timeout) - continue; + do { + expired_fp =3D NULL; + found_fp_timeout =3D false; =20 - if (atomic_read(&fp->refcount) > 1 || - fp->conn) - continue; - - found_fp_timeout =3D true; - if (fp->durable_scavenger_timeout <=3D - jiffies_to_msecs(jiffies)) { - __ksmbd_remove_durable_fd(fp); - list_add(&fp->node, &scavenger_list); - } else { + write_lock(&global_ft.lock); + idr_for_each_entry(global_ft.idr, fp, id) { unsigned long durable_timeout; =20 + if (!fp->durable_timeout) + continue; + + if (atomic_read(&fp->refcount) > 1 || + fp->conn) + continue; + + found_fp_timeout =3D true; + if (fp->durable_scavenger_timeout <=3D + jiffies_to_msecs(jiffies)) { + __ksmbd_remove_durable_fd(fp); + /* + * Take a transient reference so fp + * cannot be freed by an in-flight + * ksmbd_lookup_fd_inode() that found + * it through f_ci->m_fp_list while we + * drop global_ft.lock and reach the + * m_fp_list unlink in + * ksmbd_scavenger_dispose_dh(). + */ + atomic_inc(&fp->refcount); + expired_fp =3D fp; + break; + } + durable_timeout =3D fp->durable_scavenger_timeout - jiffies_to_msecs(jiffies); @@ -1094,10 +1143,11 @@ static int ksmbd_durable_scavenger(void *dummy) if (min_timeout > durable_timeout) min_timeout =3D durable_timeout; } - } - write_unlock(&global_ft.lock); + write_unlock(&global_ft.lock); =20 - ksmbd_scavenger_dispose_dh(&scavenger_list); + if (expired_fp) + ksmbd_scavenger_dispose_dh(expired_fp); + } while (expired_fp); =20 if (found_fp_timeout =3D=3D false) break; --=20 2.43.0