[PATCH 0/2] smb: client: fix unprivileged-local UAF in cifs_swn_notify

Michael Bommarito posted 2 patches 1 week ago
fs/smb/client/cifs_swn.c | 64 ++++++++++++++++++++++++++++++++--------
fs/smb/client/netlink.c  |  1 +
fs/smb/client/trace.h    |  2 ++
3 files changed, 54 insertions(+), 13 deletions(-)
This series fixes an unprivileged-local use-after-free in the cifs
witness notify path (fs/smb/client/cifs_swn.c +
fs/smb/client/netlink.c).  On any kernel built with
CONFIG_CIFS_SWN_UPCALL=y that has an active witness mount, a local
process of any uid can race cifs_swn_notify() against the reconnect
or umount path and trigger a slab-use-after-free on struct
cifs_tcon.  Patch 1 is the lifetime fix; patch 2 separately closes
the unprivileged-reach surface, because CIFS_GENL_CMD_SWN_NOTIFY
currently admits any uid.  Applies to v7.1-rc2 mainline.

Impact
======

Patch 1 fixes a use-after-free.  cifs_swn_notify() does an
idr_find() under cifs_swnreg_idr_mutex, drops the mutex, and then
dereferences both the returned cifs_swn_reg and swnreg->tcon.
Neither object is pinned across the mutex drop.  A concurrent
cifs_put_tcon() reaching tc_count == 0 calls cifs_swn_unregister()
which frees the swnreg, and tconInfoFree() which frees the tcon
itself.  The race fires both on unmount and from the
smb2_reconnect_server worker on any reconnect cycle.
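The general shape of "pin the object before dropping the lookup lock" can be modelled in userspace.  The following is a minimal sketch of that pattern only, not the code from patch 1: fake_tcon, fake_reg, the fixed-size table and every helper are hypothetical names, a pthread mutex stands in for cifs_swnreg_idr_mutex, and a plain counter under that mutex stands in for tc_count.  It also simplifies the teardown order: here the registration is unpublished before the last reference is dropped, whereas in the kernel cifs_swn_unregister() runs from inside cifs_put_tcon() once tc_count has already reached 0.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
struct fake_tcon {
	int refs;   /* models tc_count, protected by table_lock */
	int state;  /* payload the notify path writes to */
};

struct fake_reg {
	struct fake_tcon *tcon;
};

#define MAX_REGS 8
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct fake_reg *table[MAX_REGS];  /* models the registration idr */
static int tcons_freed;                   /* bookkeeping for the demo only */

/* Drop one reference and free the tcon when the last one goes away. */
static void fake_put_tcon(struct fake_tcon *t)
{
	int last;

	pthread_mutex_lock(&table_lock);
	last = (--t->refs == 0);
	if (last)
		tcons_freed++;
	pthread_mutex_unlock(&table_lock);
	if (last)
		free(t);
}

/* "Mount": create a tcon holding one reference plus its registration. */
static int fake_register(int id)
{
	struct fake_tcon *t;
	struct fake_reg *r;

	if (id < 0 || id >= MAX_REGS)
		return -1;
	t = calloc(1, sizeof(*t));
	r = calloc(1, sizeof(*r));
	if (!t || !r) {
		free(t);
		free(r);
		return -1;
	}
	t->refs = 1;
	r->tcon = t;
	pthread_mutex_lock(&table_lock);
	table[id] = r;
	pthread_mutex_unlock(&table_lock);
	return 0;
}

/* "Umount": unpublish and free the registration, drop the mount's ref. */
static void fake_umount(int id)
{
	struct fake_reg *r;

	if (id < 0 || id >= MAX_REGS)
		return;
	pthread_mutex_lock(&table_lock);
	r = table[id];
	table[id] = NULL;
	pthread_mutex_unlock(&table_lock);
	if (r) {
		fake_put_tcon(r->tcon);
		free(r);
	}
}

/* Notify side: look up the registration, pin its tcon before unlocking. */
static struct fake_tcon *fake_lookup_and_pin(int id)
{
	struct fake_tcon *t = NULL;

	if (id < 0 || id >= MAX_REGS)
		return NULL;
	pthread_mutex_lock(&table_lock);
	if (table[id]) {
		t = table[id]->tcon;
		t->refs++;  /* taken while the lock still keeps the tcon alive */
	}
	pthread_mutex_unlock(&table_lock);
	return t;  /* the registration may be freed from here on; t may not */
}
```

In this model a caller that obtained a tcon from fake_lookup_and_pin() can keep using it after a concurrent fake_umount(), and the memory is released only by the caller's own fake_put_tcon().  Without the refs++ under the lock, the same sequence would touch freed memory, which is the shape of the defect described above.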

KASAN under an SMP=4 KVM build with CONFIG_KASAN=y and
kasan.fault=panic_on_write reports a slab-use-after-free in
cifs_swn_notify -> cifs_swn_resource_state_changed (or
cifs_swn_client_move -> cifs_swn_reconnect) on the freed cifs_tcon
under a 400-iteration mount-with-witness/umount loop plus a
4-thread NETLINK_GENERIC notify spammer.  First splat appears in
40-51 seconds on natural timing without any artificial widener.
The freed object is reported as struct cifs_tcon (kmalloc-4k slab),
freed via cifs_put_tcon() from the smb2_reconnect_server worker.

Patch 2 closes the unprivileged surface itself.  The
CIFS_GENL_CMD_SWN_NOTIFY operation in cifs_genl_ops[] currently has
no .flags set, so generic netlink admits it from any uid.  The
intended sender is the cifs.witness userspace helper, which runs
as root via its systemd unit, so requiring CAP_NET_ADMIN in the
initial user namespace does not break any in-tree consumer.
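The effect of a per-operation permission flag can be illustrated with a small userspace model.  This is a sketch under stated assumptions, not generic netlink itself: model_op, MODEL_ADMIN_PERM, model_dispatch() and the sender_has_net_admin argument are hypothetical, with MODEL_ADMIN_PERM playing the role of GENL_ADMIN_PERM and the boolean standing in for the CAP_NET_ADMIN check.  The handler returns -EINVAL to mimic a notify that finds no matching registration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model; plays the role of GENL_ADMIN_PERM in an op's .flags. */
#define MODEL_ADMIN_PERM 0x01u

struct model_op {
	int cmd;
	unsigned int flags;
	int (*doit)(void);
};

static int handler_calls;  /* counts how often the handler actually ran */

/* Stand-in for the notify handler: "no matching registration". */
static int model_swn_notify(void)
{
	handler_calls++;
	return -EINVAL;
}

/*
 * Find the op for cmd and apply the permission gate before the handler.
 * A flagged op is refused with -EPERM unless the sender is privileged,
 * so the handler is never entered for an unprivileged sender.
 */
static int model_dispatch(const struct model_op *ops, size_t n, int cmd,
			  int sender_has_net_admin)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (ops[i].cmd != cmd)
			continue;
		if ((ops[i].flags & MODEL_ADMIN_PERM) && !sender_has_net_admin)
			return -EPERM;
		return ops[i].doit();
	}
	return -EOPNOTSUPP;
}
```

With flags left at 0 the model reaches the handler for any sender; with MODEL_ADMIN_PERM set, an unprivileged sender is turned away before the handler runs, while a privileged sender still reaches it.  That second case is why the lifetime fix is still needed independently of the gate.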

The two patches are orthogonal and both unconditionally beneficial.
Patch 1 fixes the lifetime defect even for an attacker with
CAP_NET_ADMIN; patch 2 prevents reaching the lifetime defect at
all from unprivileged userspace, even before patch 1 lands.

Reach surface
=============

Witness registrations are created when a CIFS client mounts an SMB3
share with the "witness" option against a server advertising
SMB2_SHARE_CAP_CLUSTER on tree-connect (clustered Samba+CTDB,
clustered ONTAP, Windows Scale-Out File Server, Azure SOFS, etc.).
The trigger for the bug is then any local process able to send a
NETLINK_GENERIC message, raced against the reconnect or umount
path.  Kernels without an active witness mount are not affected.

Validation
==========

On a stock kernel the same workload reproduces the splat in 40-51 s of
natural timing from an unprivileged sender (uid 65534).  The
patched kernel runs that workload clean across multiple
multi-minute campaigns:

  - 4 unprivileged-sender instances (200/50/500/100 iter x
    50/500/10/100 ms hold), ~185 s aggregate KASAN-clean.  These
    show patch 2 rejects the send at the genl layer before
    cifs_swn_notify() runs.
  - 4 root-sender instances (1000 iter x 20 ms hold each, the
    densest config tested), ~470 s aggregate KASAN-clean.  These
    deliberately bypass patch 2 and drive cifs_swn_notify()
    directly to validate that patch 1 closes the underlying
    lifetime defect on its own, independent of the genl gate.

A behavioural trace of CIFS_SWN_NOTIFICATION_CLIENT_MOVE confirms
that the cifs.witness daemon still observes the documented wire
sequence after patch 1:

  cifs_swn_client_move -> cifs_swn_reconnect ->
    cifs_swn_unregister -> cifs_swn_send_unregister_message
                           (for old IP, +6.009813s) ->
    cifs_swn_register -> cifs_swn_send_register_message
                         (for new IP, +6.011462s)

Unregister precedes register, matching stock behaviour.  A
broader fix that also pinned the swnreg across the handler
(rather than only the tcon) would have deferred
cifs_swn_reg_release() past the end of cifs_swn_notify(), and
the daemon would have observed the new-IP register before the
old-IP unregister.  Pinning only the tcon (which is all the
post-mutex code actually dereferences) avoids that ordering trap.

A separate single-process probe confirms patch 2 takes effect:
an unprivileged sendto(NETLINK_GENERIC) of CIFS_GENL_CMD_SWN_NOTIFY
receives -EPERM on the patched kernel and -EINVAL (the cifs handler
ran, no matching registration) on stock.
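For readers who want to build a similar probe, the request header of a generic netlink message can be laid out as below.  This is a hedged sketch, not the probe used for the results above: build_genl_probe() is a hypothetical helper, the family id has to be resolved at runtime through the genl controller (not shown), the command number must come from the cifs uapi header of the kernel under test, the version value of 1 is an assumption, and the attributes a real notify carries are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>

/*
 * Fill buf with a bare generic netlink request: nlmsghdr followed by
 * genlmsghdr, with no attributes.  Returns the message length, or 0 if
 * buf is too small.  buf must be suitably aligned for struct nlmsghdr.
 */
static size_t build_genl_probe(void *buf, size_t buflen, uint16_t family_id,
			       uint8_t cmd, uint32_t seq)
{
	struct nlmsghdr *nlh = buf;
	struct genlmsghdr *gh;
	size_t len = NLMSG_LENGTH(GENL_HDRLEN);

	if (buflen < len)
		return 0;
	memset(buf, 0, len);
	nlh->nlmsg_len = len;
	nlh->nlmsg_type = family_id;                  /* resolved family id */
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK; /* ask for an ack/error */
	nlh->nlmsg_seq = seq;
	gh = NLMSG_DATA(nlh);
	gh->cmd = cmd;
	gh->version = 1;                              /* assumed version */
	return len;
}
```

The buffer would then be sent on an AF_NETLINK socket opened with the NETLINK_GENERIC protocol.  Because NLM_F_ACK is set, the kernel's verdict comes back as an NLMSG_ERROR reply on the socket, so a probe has to read that reply to see the error code.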

A reproducer harness (race driver + initramfs glue + QEMU runner)
is available on request.

Patch 1 has a Fixes: tag pointing at fed979a7e082 ("cifs: Set
witness notification handler for messages from userspace daemon");
patch 2 references the same commit since that is where the genl_op
entry was added.

Michael Bommarito (2):
  smb: client: pin tcon across cifs_swn_notify() mutex drop
  smb: client: require GENL_ADMIN_PERM on CIFS_GENL_CMD_SWN_NOTIFY

 fs/smb/client/cifs_swn.c | 64 ++++++++++++++++++++++++++++++++--------
 fs/smb/client/netlink.c  |  1 +
 fs/smb/client/trace.h    |  2 ++
 3 files changed, 54 insertions(+), 13 deletions(-)

--
2.53.0