ksmbd: fix connection and durable handle teardown races

[PATCH v2 0/3] ksmbd: fix connection and durable handle teardown races

Posted by DaeMyung Kang 1 month, 2 weeks ago

This series fixes lifetime bugs around ksmbd connection shutdown,
session file-table teardown, and durable handle scavenging.

Patch 1 centralizes the final struct ksmbd_conn release so every last
putter runs ida_destroy() and transport cleanup.  The release is queued
to a dedicated workqueue because transport teardown can sleep, while
one known last-putter is an RCU callback.
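
For readers who want the shape of the patch 1 idea without the kernel
plumbing, here is a minimal userspace sketch. It is not the ksmbd code:
struct conn, conn_put(), release_queue and release_worker() are
hypothetical names, the queue stands in for the dedicated workqueue, and
the queue itself is single-threaded for brevity. The point it shows is
that the last putter only queues the object, and the sleepable cleanup
runs later from worker context.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ksmbd_conn; fields are illustrative. */
struct conn {
	atomic_int refcnt;
	bool transport_freed;      /* set by the sleepable cleanup */
	struct conn *next_release; /* link on the deferred-release queue */
};

/* Stand-in for a dedicated workqueue (single-threaded model, no locking). */
static struct conn *release_queue;

static void conn_init(struct conn *c)
{
	atomic_init(&c->refcnt, 1);
	c->transport_freed = false;
	c->next_release = NULL;
}

static void conn_get(struct conn *c)
{
	atomic_fetch_add(&c->refcnt, 1);
}

/*
 * Callable from a context that must not sleep: the last putter only
 * queues the object and never runs the cleanup itself.
 * Returns true if this call dropped the final reference.
 */
static bool conn_put(struct conn *c)
{
	if (atomic_fetch_sub(&c->refcnt, 1) != 1)
		return false;
	c->next_release = release_queue;
	release_queue = c;
	return true;
}

/* Models the worker, where sleeping is allowed. Returns objects released. */
static int release_worker(void)
{
	int n = 0;

	while (release_queue) {
		struct conn *c = release_queue;

		release_queue = c->next_release;
		c->transport_freed = true; /* models the final cleanup step */
		n++;
	}
	return n;
}

/* Two holders; the last put queues, the worker performs the release. */
static int demo_deferred_release(void)
{
	struct conn c;

	conn_init(&c);
	conn_get(&c);
	if (conn_put(&c))          /* not the last reference */
		return -1;
	if (!conn_put(&c))         /* last reference: must queue */
		return -2;
	if (c.transport_freed)     /* cleanup must not have run yet */
		return -3;
	if (release_worker() != 1)
		return -4;
	return c.transport_freed ? 0 : -5;
}
```

Calling demo_deferred_release() returns 0 when the cleanup ran only
from the worker and only once.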

Patch 2 hardens __close_file_table_ids() by taking a transient
ksmbd_file reference, unpublishing from the session idr under ft->lock,
and doing sleepable preserve/close work outside ft->lock.  It also
makes the FP_NEW window visible to the opener through
ksmbd_update_fstate().
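
The teardown ordering described above can be sketched in userspace as
follows. This is an illustration under stated assumptions, not the
actual __close_file_table_ids(): struct file, struct file_table,
sleepable_close() and close_all() are hypothetical, a fixed array
models the session idr, and an atomic_flag models ft->lock. The sketch
takes a transient reference and unpublishes the entry while the lock is
held, then does the work that may sleep only after the lock is dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TABLE_SLOTS 4

/* Hypothetical stand-in for struct ksmbd_file. */
struct file {
	atomic_int refcount;
	int freed;
};

/* Hypothetical session file table; slots[] models the session idr. */
struct file_table {
	atomic_flag lock;
	int lock_held;                   /* demo-only sanity check */
	struct file *slots[TABLE_SLOTS];
};

static void ft_lock(struct file_table *ft)
{
	while (atomic_flag_test_and_set(&ft->lock))
		;
	ft->lock_held = 1;
}

static void ft_unlock(struct file_table *ft)
{
	ft->lock_held = 0;
	atomic_flag_clear(&ft->lock);
}

static void file_put(struct file *fp)
{
	if (atomic_fetch_sub(&fp->refcount, 1) == 1)
		fp->freed = 1;
}

/* Models the preserve/close step that may sleep; fails if run locked. */
static int sleepable_close(struct file_table *ft, struct file *fp)
{
	(void)fp;
	return ft->lock_held ? -1 : 0;
}

/* Returns the number of files closed, or -1 if close ran under the lock. */
static int close_all(struct file_table *ft)
{
	int closed = 0;

	for (int id = 0; id < TABLE_SLOTS; id++) {
		struct file *fp;

		ft_lock(ft);
		fp = ft->slots[id];
		if (fp) {
			atomic_fetch_add(&fp->refcount, 1); /* transient ref */
			ft->slots[id] = NULL;               /* unpublish */
		}
		ft_unlock(ft);
		if (!fp)
			continue;

		if (sleepable_close(ft, fp))
			return -1;
		file_put(fp); /* drop the table's reference */
		file_put(fp); /* drop the transient reference */
		closed++;
	}
	return closed;
}

static int demo_teardown(void)
{
	struct file a = { .freed = 0 }, b = { .freed = 0 };
	struct file_table ft = { .lock = ATOMIC_FLAG_INIT };

	atomic_init(&a.refcount, 1);
	atomic_init(&b.refcount, 1);
	ft.slots[0] = &a;
	ft.slots[2] = &b;

	if (close_all(&ft) != 2)
		return -1;
	if (!a.freed || !b.freed)
		return -2;
	for (int i = 0; i < TABLE_SLOTS; i++)
		if (ft.slots[i])
			return -3;
	return 0;
}
```

Because the entry is already unpublished when the lock is dropped, a
concurrent lookup by id can no longer find the file while the close
work is in progress; the transient reference keeps the object alive
for the closer in the meantime.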

Patch 2 is scoped to file-table teardown and the FP_NEW publication
window.  It does not try to fix durable reconnect rollback on later
smb2_open() error paths (and the related post-FP_INITED reference
window in a fresh smb2_open()).  Both issues predate this series
and need either an explicit unpublish-on-error step or an extra
session-owned reference, so they are left as follow-up work.  The
FP_NEW -> FP_INITED failure reuses smb2_open()'s existing -ENOENT to
STATUS_OBJECT_NAME_INVALID mapping to avoid changing wire behavior in
this lifetime fix.

Patch 3 closes two related races in the durable scavenger against any
walker that iterates f_ci->m_fp_list (ksmbd_lookup_fd_inode() and the
share-mode checks).  The scavenger no longer reuses fp->node as a
scavenger-private collect-list node, takes an explicit transient
reference under global_ft.lock, and drops both the durable lifetime
and transient refs with atomic_sub_and_test(2, ...) after the
m_fp_list unlink so an in-flight m_fp_list walker that snatched fp
owns the final close cleanly.  fp->persistent_id is cleared inside
__ksmbd_remove_durable_fd() so a delayed final close cannot re-issue
idr_remove() on a slot that idr_alloc_cyclic() may have already
re-handed to a new durable handle.  __put_fd_final() bypasses the
per-conn open_files_count decrement when fp is detached from any
session table (fp->conn cleared by session_fd_check() at durable
preserve, paired with the volatile_id clear at unpublish), since the
walker that owns the final close in that case runs from an unrelated
work->conn whose counter never tracked this durable fp.
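
To make the two-reference drop concrete, here is a small userspace
model of the handoff. It is only a sketch: struct file,
ref_sub_and_test(), ref_get_unless_zero() and scavenge() are
hypothetical helpers built on C11 atomics, and the "race" is simulated
sequentially rather than with real threads. It shows that dropping the
durable lifetime reference and the transient reference in one step
leaves the final close to whichever walker still holds a reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum closer { CLOSER_NONE, CLOSER_SCAVENGER, CLOSER_WALKER };

/* Hypothetical stand-in for a durable ksmbd_file. */
struct file {
	atomic_int refcount;
	bool on_list;          /* models membership in m_fp_list */
	enum closer closed_by; /* who performed the final close */
};

/* Userspace analogue of dropping n references and testing for zero. */
static bool ref_sub_and_test(atomic_int *ref, int n)
{
	return atomic_fetch_sub(ref, n) == n;
}

/* Take a reference only if the object is still alive. */
static bool ref_get_unless_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

static void file_put(struct file *fp, int n, enum closer who)
{
	if (ref_sub_and_test(&fp->refcount, n))
		fp->closed_by = who;
}

/* Simulates one scavenger pass; returns who did the final close. */
static enum closer scavenge(bool walker_races)
{
	struct file fp = { .on_list = true, .closed_by = CLOSER_NONE };
	bool walker_has_ref = false;

	atomic_init(&fp.refcount, 1);      /* durable lifetime reference */
	atomic_fetch_add(&fp.refcount, 1); /* scavenger transient reference */

	/* A list walker finds fp before it is unlinked and pins it. */
	if (walker_races && fp.on_list)
		walker_has_ref = ref_get_unless_zero(&fp.refcount);

	fp.on_list = false;                /* unlink from the list */
	file_put(&fp, 2, CLOSER_SCAVENGER); /* drop both refs at once */

	if (walker_has_ref)
		file_put(&fp, 1, CLOSER_WALKER);
	return fp.closed_by;
}
```

In this model scavenge(false) ends with the scavenger performing the
final close, while scavenge(true) ends with the walker performing it,
so the object is closed exactly once in both cases.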

The series is intentionally scoped to lifetime/race fixes and the
walker-final-putter regression that the durable scavenger handoff in
patch 3 newly exposes.  Pre-existing reconnect rollback and per-conn
open_files_count accounting gaps are left as follow-up work so this
series does not have to claim a full durable-reconnect or accounting
cleanup.

Validation:
  * abrupt-disconnect kmemleak/kprobe A/B for connection release
  * same-session two-tcon DEBUG_LIST/DEBUG_OBJECTS_WORK stress
  * forced durable-preserve sleep-path harness for session teardown
  * KASAN-enabled direct SMB2 coverage for session/tree teardown and
    durable-preserve paths
  * KASAN-enabled direct SMB2 coverage for durable scavenger expiry
    racing with m_fp_list lookups
  * checkpatch --strict for all patches
  * make -j$(nproc) M=fs/smb/server

v1 -> v2:
  * Split the original change into bisectable patches: connection final
    release, session file-table teardown, and durable scavenger races.
  * Keep sleepable session preserve/close work out of ft->lock and make
    the FP_NEW publication race visible through a cleared volatile id.
  * Document that durable reconnect rollback on later smb2_open()
    error paths is a pre-existing follow-up item, and keep the
    existing -ENOENT to STATUS_OBJECT_NAME_INVALID wire mapping for
    this lifetime fix.
  * Avoid reusing fp->node as a temporary durable scavenger list node
    and take a transient reference in the durable scavenger so
    concurrent ksmbd_lookup_fd_inode() walkers cannot UAF on freed fp.
    These two fixes are folded into a single patch because the
    list-head-reuse fix alone leaves a deterministic UAF window for
    m_fp_list walkers; a bisect landing on that intermediate state
    would hit a use-after-free that the pre-patch code merely made
    harder to reproduce.
  * Clear fp->persistent_id in __ksmbd_remove_durable_fd() so a holder
    that owns the final close after a scavenger removal does not
    re-issue idr_remove() on a persistent id that may have already been
    handed out to a new durable handle.
  * Bypass the per-conn open_files_count decrement in __put_fd_final()
    when fp is detached from any session table, so an m_fp_list walker
    that owns the final close of a scavenged durable fp does not
    underflow an unrelated conn's stats counter.
  * Document the ksmbd_conn_wq lifetime invariant in ksmbd_conn_put()
    instead of guarding with WARN_ON_ONCE, so a violation surfaces as
    a NULL deref rather than a silent leak of the final release.
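
The persistent id item in the list above can be illustrated with a toy
id table. Everything here is an assumption for illustration: NO_ID,
id_alloc() and remove_durable() are hypothetical, and the allocator
simply hands out the lowest free slot, which reuses ids much sooner
than a cyclic allocator would. The sketch shows why clearing the stored
id at first removal makes a later, delayed removal harmless even after
the slot has been given to a new handle.

```c
#include <assert.h>
#include <stddef.h>

#define NO_ID    (-1) /* hypothetical "no persistent id" sentinel */
#define ID_SLOTS 2

/* Hypothetical stand-in for a durable handle. */
struct handle {
	int persistent_id;
};

/* Toy id table standing in for the global persistent-id allocator. */
static struct handle *id_table[ID_SLOTS];

/* Lowest-free-slot allocation (simpler than a cyclic allocator). */
static int id_alloc(struct handle *h)
{
	for (int i = 0; i < ID_SLOTS; i++) {
		if (!id_table[i]) {
			id_table[i] = h;
			h->persistent_id = i;
			return i;
		}
	}
	h->persistent_id = NO_ID;
	return NO_ID;
}

/*
 * Remove the handle from the table and clear its stored id, so a second
 * call for the same handle cannot touch a slot that was reallocated.
 */
static void remove_durable(struct handle *h)
{
	if (h->persistent_id == NO_ID)
		return;
	id_table[h->persistent_id] = NULL;
	h->persistent_id = NO_ID;
}

static int demo_id_reuse(void)
{
	struct handle old_h, new_h;
	int old_id, new_id, ret;

	old_id = id_alloc(&old_h);
	remove_durable(&old_h);        /* scavenger removes the handle */

	new_id = id_alloc(&new_h);     /* slot is handed out again */
	if (new_id != old_id)
		return -1;

	remove_durable(&old_h);        /* delayed final close: no-op */
	ret = (id_table[new_id] == &new_h) ? 0 : -2;

	remove_durable(&new_h);        /* leave the table empty */
	return ret;
}
```

Without the persistent_id reset in remove_durable(), the second call
for old_h would clear the slot now owned by new_h, which is the
double-remove the changelog item describes.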

DaeMyung Kang (3):
  ksmbd: centralize ksmbd_conn final release to plug transport leak
  ksmbd: harden file lifetime during session teardown
  ksmbd: close durable scavenger races against m_fp_list lookups

 fs/smb/server/connection.c | 101 +++++++++--
 fs/smb/server/connection.h |   6 +
 fs/smb/server/oplock.c     |   7 +-
 fs/smb/server/server.c     |  12 ++
 fs/smb/server/smb2pdu.c    |   6 +-
 fs/smb/server/vfs_cache.c  | 341 ++++++++++++++++++++++++++++++-------
 fs/smb/server/vfs_cache.h  |   4 +-
 7 files changed, 396 insertions(+), 81 deletions(-)

-- 
2.43.0

Re: [PATCH v2 0/3] ksmbd: fix connection and durable handle teardown races

Posted by Namjae Jeon 1 month, 2 weeks ago

On Tue, Apr 28, 2026 at 11:09 PM DaeMyung Kang <charsyam@gmail.com> wrote:
>
> This series fixes lifetime bugs around ksmbd connection shutdown,
> session file-table teardown, and durable handle scavenging.
>
> Patch 1 centralizes the final struct ksmbd_conn release so every last
> putter runs ida_destroy() and transport cleanup.  The release is queued
> to a dedicated workqueue because transport teardown can sleep, while
> one known last-putter is an RCU callback.
>
> Patch 2 hardens __close_file_table_ids() by taking a transient
> ksmbd_file reference, unpublishing from the session idr under ft->lock,
> and doing sleepable preserve/close work outside ft->lock.  It also
> makes the FP_NEW window visible to the opener through
> ksmbd_update_fstate().
>
> Patch 2 is scoped to file-table teardown and the FP_NEW publication
> window.  It does not try to fix durable reconnect rollback on later
> smb2_open() error paths (and the related post-FP_INITED reference
> window in a fresh smb2_open()).  Both issues predate this series
> and need either an explicit unpublish-on-error step or an extra
> session-owned reference, so they are left as follow-up work.  The
> FP_NEW -> FP_INITED failure reuses smb2_open()'s existing -ENOENT to
> STATUS_OBJECT_NAME_INVALID mapping to avoid changing wire behavior in
> this lifetime fix.
>
> Patch 3 closes two related races in the durable scavenger against any
> walker that iterates f_ci->m_fp_list (ksmbd_lookup_fd_inode() and the
> share-mode checks).  The scavenger no longer reuses fp->node as a
> scavenger-private collect-list node, takes an explicit transient
> reference under global_ft.lock, and drops both the durable lifetime
> and transient refs with atomic_sub_and_test(2, ...) after the
> m_fp_list unlink so an in-flight m_fp_list walker that snatched fp
> owns the final close cleanly.  fp->persistent_id is cleared inside
> __ksmbd_remove_durable_fd() so a delayed final close cannot re-issue
> idr_remove() on a slot that idr_alloc_cyclic() may have already
> re-handed to a new durable handle.  __put_fd_final() bypasses the
> per-conn open_files_count decrement when fp is detached from any
> session table (fp->conn cleared by session_fd_check() at durable
> preserve, paired with the volatile_id clear at unpublish), since the
> walker that owns the final close in that case runs from an unrelated
> work->conn whose counter never tracked this durable fp.
>
> The series is intentionally scoped to lifetime/race fixes and the
> walker-final-putter regression that the durable scavenger handoff in
> patch 3 newly exposes.  Pre-existing reconnect rollback and per-conn
> open_files_count accounting gaps are left as follow-up work so this
> series does not have to claim a full durable-reconnect or accounting
> cleanup.
>
> Validation:
>   * abrupt-disconnect kmemleak/kprobe A/B for connection release
>   * same-session two-tcon DEBUG_LIST/DEBUG_OBJECTS_WORK stress
>   * forced durable-preserve sleep-path harness for session teardown
>   * KASAN-enabled direct SMB2 coverage for session/tree teardown and
>     durable-preserve paths
>   * KASAN-enabled direct SMB2 coverage for durable scavenger expiry
>     racing with m_fp_list lookups
>   * checkpatch --strict for all patches
>   * make -j$(nproc) M=fs/smb/server
>
> v1 -> v2:
>   * Split the original change into bisectable patches: connection final
>     release, session file-table teardown, and durable scavenger races.
>   * Keep sleepable session preserve/close work out of ft->lock and make
>     the FP_NEW publication race visible through a cleared volatile id.
>   * Document that durable reconnect rollback on later smb2_open()
>     error paths is a pre-existing follow-up item, and keep the
>     existing -ENOENT to STATUS_OBJECT_NAME_INVALID wire mapping for
>     this lifetime fix.
>   * Avoid reusing fp->node as a temporary durable scavenger list node
>     and take a transient reference in the durable scavenger so
>     concurrent ksmbd_lookup_fd_inode() walkers cannot UAF on freed fp.
>     These two fixes are folded into a single patch because the
>     list-head-reuse fix alone leaves a deterministic UAF window for
>     m_fp_list walkers; a bisect landing on that intermediate state
>     would hit a use-after-free that the pre-patch code merely made
>     harder to reproduce.
>   * Clear fp->persistent_id in __ksmbd_remove_durable_fd() so a holder
>     that owns the final close after a scavenger removal does not
>     re-issue idr_remove() on a persistent id that may have already been
>     handed out to a new durable handle.
>   * Bypass the per-conn open_files_count decrement in __put_fd_final()
>     when fp is detached from any session table, so an m_fp_list walker
>     that owns the final close of a scavenged durable fp does not
>     underflow an unrelated conn's stats counter.
>   * Document the ksmbd_conn_wq lifetime invariant in ksmbd_conn_put()
>     instead of guarding with WARN_ON_ONCE, so a violation surfaces as
>     a NULL deref rather than a silent leak of the final release.
>
> DaeMyung Kang (3):
>   ksmbd: centralize ksmbd_conn final release to plug transport leak
>   ksmbd: harden file lifetime during session teardown
>   ksmbd: close durable scavenger races against m_fp_list lookups
Applied them to #ksmbd-for-next-next.
Thanks!