Message-ID: <40e3d522-dfcf-4fc1-9c55-b5e81f1536d5@vastdata.com>
Date: Sun, 3 May 2026 22:53:45 +0300
X-Mailing-List: linux-kernel@vger.kernel.org
To: linux-nfs@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Chuck Lever, Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust, Anna Schumaker, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Dan Aloni, roi.azarzar@vastdata.com, sagi.grimberg@vastdata.com
From: Michael Nemanov
Subject: [RFC PATCH] Possible use-after-free bug in mTLS connect

The NFS-over-TLS implementation seems to have a use-after-free bug
where a raw unrefcounted pointer to an rpc_clnt is stored in
xs_connect() and accessed by a delayed workqueue item after the client
has been freed. The issue manifests when an NFS mount uses incorrect
credentials (client cert is valid but does not match the server's)
during TLS setup, leading to the client being freed while a delayed
work item still holds a pointer to it.

The patch contains several debug traces that hopefully illustrate the
problem and a key msleep(100) that helps (though does not guarantee)
reproduction.
I had to use pr_debug as the bug often involves a kernel crash and logs
can only be collected from vmcore-dmesg.

Used with kernel v7.0-rc5.

Traces from vmcore-dmesg.txt showing the lifecycle of the RPC client
from creation to use-after-free:

[ 38.611952] New mount #5
[ 38.630156] @ rpc_new_client: New: 00000000f636d223, parent: 0000000000000000 << Client created
[ 38.630179] @ xs_connect: Queue connect work in 0 for clnt 00000000f636d223
[ 38.630209] @ xs_tcp_tls_setup_socket: Using clnt 00000000f636d223
[ 38.630258] @ rpc_new_client: New: 0000000004f8c0fe, parent: 0000000000000000
[ 38.630265] @ xs_connect: Queue connect work in 0 for clnt 0000000004f8c0fe
[ 38.752404] @ rpc_shutdown_client: 0000000004f8c0fe
[ 38.752474] @ rpc_free_client: put_cred 0000000004f8c0fe
[ 38.752489] @ xs_tcp_tls_setup_socket: Done with clnt 00000000f636d223. status=0
[ 38.752573] @ rpc_free_client_work: xprt_put done for 0000000004f8c0fe
[ 38.752955] @ xs_sock_process_cmsg: TLS alert -13, level 2
[ 38.753069] @ xs_tcp_state_change: sk_state=8
[ 38.857474] @ xs_reset_transport: Null transport->sock for xprt 0000000066fa6fda
[ 38.857558] @ xs_connect: Queue connect work in 0 for clnt 00000000f636d223 << Client used
[ 38.857574] @ xs_tcp_tls_setup_socket: Using clnt 00000000f636d223
[ 38.857677] @ rpc_new_client: New: 00000000d1455d7f, parent: 0000000000000000
[ 38.857684] @ xs_connect: Queue connect work in 0 for clnt 00000000d1455d7f
[ 38.975657] @ rpc_shutdown_client: 00000000d1455d7f
[ 38.975728] @ rpc_free_client: put_cred 00000000d1455d7f
[ 38.975742] @ xs_tcp_tls_setup_socket: Done with clnt 00000000f636d223. status=0 << Client used
[ 38.975836] @ rpc_free_client_work: xprt_put done for 00000000d1455d7f
[ 38.976220] @ xs_sock_process_cmsg: TLS alert -13, level 2
[ 38.976269] @ xs_tcp_state_change: sk_state=8
[ 38.976303] @ xs_connect: Queue connect work in 3000 for clnt 00000000f636d223 << Client used
[ 39.065470] @ rpc_shutdown_client: 00000000f636d223
[ 39.065580] @ rpc_free_client: put_cred 00000000f636d223 << Client being destroyed
[ 42.033267] @ xs_tcp_tls_setup_socket: Using clnt 00000000f636d223 << Client used
[ 42.033481] @ rpc_new_client: New: 00000000762dc139, parent: 0000000000000000
[ 42.033505] @ xs_connect: Queue connect work in 0 for clnt 00000000762dc139
[ 42.153240] @ rpc_shutdown_client: 00000000762dc139
[ 42.153274] @ rpc_free_client: put_cred 00000000762dc139
[ 42.153283] @ xs_tcp_tls_setup_socket: Done with clnt 00000000f636d223. status=0
[ 42.153297] @ xs_reset_transport: Null transport->sock for xprt 0000000066fa6fda
[ 42.153355] @ rpc_free_client_work: xprt_put done for 00000000f636d223
[ 42.153373] @ rpc_free_client_work: xprt_put done for 00000000762dc139
[ 42.153419] @ xs_reset_transport: Null transport->sock for xprt 0000000093fd6749
[ 42.164197] ------------[ cut here ]------------
[ 42.165779] refcount_t: underflow; use-after-free.
[ 42.166596] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x5e/0x90, CPU#5: swapper/5/0

And for reference, a non-crashing attempt:

[ 16.756822] New mount #1
[ 17.122657] NFS: Registering the id_resolver key type
[ 17.123591] Key type id_resolver registered
[ 17.124329] Key type id_legacy registered
[ 17.131019] @ rpc_new_client: New: 0000000074283ee9, parent: 0000000000000000
[ 17.131035] @ xs_connect: Queue connect work in 0 for clnt 0000000074283ee9
[ 17.131042] @ xs_tcp_tls_setup_socket: Using clnt 0000000074283ee9
[ 17.131108] @ rpc_new_client: New: 00000000b25a12a5, parent: 0000000000000000
[ 17.131114] @ xs_connect: Queue connect work in 0 for clnt 00000000b25a12a5
[ 17.254743] @ xs_tcp_state_change: sk_state=8
[ 17.262363] @ rpc_shutdown_client: 00000000b25a12a5
[ 17.262398] @ rpc_free_client: put_cred 00000000b25a12a5
[ 17.262404] @ xs_tcp_tls_setup_socket: Done with clnt 0000000074283ee9. status=0
[ 17.262448] @ rpc_free_client_work: xprt_put done for 00000000b25a12a5
[ 17.473543] @ xs_reset_transport: Null transport->sock for xprt 000000004c2083b6
[ 17.473597] @ xs_connect: Queue connect work in 0 for clnt 0000000074283ee9
[ 17.473606] @ xs_tcp_tls_setup_socket: Using clnt 0000000074283ee9
[ 17.473709] @ rpc_new_client: New: 00000000c549945c, parent: 0000000000000000
[ 17.473717] @ xs_connect: Queue connect work in 0 for clnt 00000000c549945c
[ 17.591937] @ rpc_shutdown_client: 00000000c549945c
[ 17.591977] @ rpc_free_client: put_cred 00000000c549945c
[ 17.591984] @ xs_tcp_tls_setup_socket: Done with clnt 0000000074283ee9. status=0
[ 17.592030] @ rpc_free_client_work: xprt_put done for 00000000c549945c
[ 17.592534] @ xs_sock_process_cmsg: TLS alert -13, level 2
[ 17.592564] @ xs_tcp_state_change: sk_state=8
[ 17.592569] @ rpc_shutdown_client: 0000000074283ee9
[ 17.592608] @ rpc_free_client: put_cred 0000000074283ee9
[ 17.593105] @ rpc_free_client_work: xprt_put done for 0000000074283ee9
[ 17.593148] @ xs_reset_transport: Null transport->sock for xprt 000000004c2083b6

Note that on some iterations, the queued work runs successfully despite
using a freed client. Not sure if this is of interest.

Reproduction script:

```bash
# The redirection must run as root, so pipe through "sudo tee"
# ("sudo echo ... > file" would open the file as the calling user).
echo 'file net/sunrpc/clnt.c +p' | sudo tee /sys/kernel/debug/dynamic_debug/control > /dev/null
echo 'file net/sunrpc/xprtsock.c +p' | sudo tee /sys/kernel/debug/dynamic_debug/control > /dev/null
sudo mkdir -p /mnt/export
for i in {1..1000}; do
    echo "Iteration $i"
    echo "New mount #$i" | sudo tee /dev/kmsg
    sudo mount -o vers=4.2,proto=tcp,xprtsec=mtls remote_addr:/opt/export /mnt/export
    sleep 5
done
```

What I understand:
The connecting task terminates, presumably due to the fatal TLS error,
as expected. The upper client is being destroyed: `rpc_shutdown_client`
down to `rpc_free_client`, which *puts the creds* that would be used by
the queued connect work.

Where I need help:
I don't understand whether the use-after-free is the root cause and
must be fixed, or is a symptom of a problem elsewhere. I am also not
sure what causes the bug to manifest. It *might* be related to the
timing of the server's FIN (sk_state=8 in the log) and the flags it
changes.

Any guidance or pointers on where to look next would be much
appreciated.

Thank you for reading,
Michael.
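
The suspected lifetime problem can be modeled in userspace C. This is a
sketch only, under stated assumptions: `model_clnt`, `model_xprt` and
every `model_*` helper are hypothetical names that do not exist in the
kernel, an `alive` flag stands in for kfree() so the model never touches
freed memory, and the `take_ref` variant only contrasts the two
lifetimes. Whether pinning the client across the queued work is the
right change for SUNRPC is exactly the open question above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rpc_clnt: a refcount plus a liveness
 * flag. Nothing is really freed, so the model can inspect it safely. */
struct model_clnt {
	int refcount;
	int alive;
};

/* Hypothetical stand-in for struct sock_xprt: remembers the client that
 * the queued connect work will use. */
struct model_xprt {
	struct model_clnt *clnt;
	int holds_ref;
};

static void model_clnt_put(struct model_clnt *c)
{
	assert(c->refcount > 0);
	if (--c->refcount == 0)
		c->alive = 0;	/* real code would free the object here */
}

/* Queue the "connect work". With take_ref == 0 only the raw pointer is
 * stored, which is the situation described for xs_connect(). */
static void model_queue_connect(struct model_xprt *x, struct model_clnt *c,
				int take_ref)
{
	if (take_ref)
		c->refcount++;
	x->clnt = c;
	x->holds_ref = take_ref;
}

/* Run the "delayed work". Returns 1 if the client was still alive when
 * the work looked at it, 0 if it had already been released. */
static int model_run_connect(struct model_xprt *x)
{
	struct model_clnt *c = x->clnt;
	int was_alive = c->alive;

	x->clnt = NULL;
	if (x->holds_ref)
		model_clnt_put(c);
	return was_alive;
}

/* Ordering seen in the crashing trace: work is queued, the owner drops
 * its last reference, and only then does the work run. */
int model_scenario(int take_ref)
{
	struct model_clnt c = { 1, 1 };
	struct model_xprt x = { NULL, 0 };

	model_queue_connect(&x, &c, take_ref);
	model_clnt_put(&c);	/* models the shutdown/free path */
	return model_run_connect(&x);
}
```

Calling model_scenario(0) models the raw-pointer case and reports that
the client was already released when the work ran; model_scenario(1)
models a worker that pins the client before it is queued.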

Signed-off-by: Michael Nemanov
---
 net/sunrpc/clnt.c     | 4 ++++
 net/sunrpc/xprtsock.c | 8 ++++++++
 2 files changed, 12 insertions(+)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index bc8ca470718b..dbc7d51f073d 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -429,6 +429,7 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args,
 		refcount_inc(&parent->cl_count);
 
 	trace_rpc_clnt_new(clnt, xprt, args);
+	pr_debug("@ %s: New: %p, parent: %p\n", __func__, clnt, parent);
 	return clnt;
 
 out_no_path:
@@ -949,6 +950,7 @@ void rpc_shutdown_client(struct rpc_clnt *clnt)
 	might_sleep();
 
 	trace_rpc_clnt_shutdown(clnt);
+	pr_debug("@ %s: %p\n", __func__, clnt);
 
 	clnt->cl_shutdown = 1;
 	while (!list_empty(&clnt->cl_tasks)) {
@@ -983,6 +985,7 @@ static void rpc_free_client_work(struct work_struct *work)
 	rpc_free_clid(clnt);
 	rpc_clnt_remove_pipedir(clnt);
 	xprt_put(rcu_dereference_raw(clnt->cl_xprt));
+	pr_debug("@ %s: xprt_put done for %p\n", __func__, clnt);
 
 	kfree(clnt);
 	rpciod_down();
@@ -1000,6 +1003,7 @@ rpc_free_client(struct rpc_clnt *clnt)
 	clnt->cl_metrics = NULL;
 	xprt_iter_destroy(&clnt->cl_xpi);
 	put_cred(clnt->cl_cred);
+	pr_debug("@ %s: put_cred %p\n", __func__, clnt);
 
 	INIT_WORK(&clnt->cl_work, rpc_free_client_work);
 	schedule_work(&clnt->cl_work);
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 2e1fe6013361..cc4275e0b276 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -377,6 +377,7 @@ xs_sock_process_cmsg(struct socket *sock, struct msghdr *msg,
 		tls_alert_recv(sock->sk, msg, &level, &description);
 		ret = (level == TLS_ALERT_LEVEL_FATAL) ?
 			-EACCES : -EAGAIN;
+		pr_debug("@ %s: TLS alert %d, level %d\n", __func__, ret, level);
 		break;
 	default:
 		/* discard this record type */
@@ -1297,6 +1298,7 @@ static void xs_reset_transport(struct sock_xprt *transport)
 	transport->inet = NULL;
 	transport->sock = NULL;
 	transport->file = NULL;
+	pr_debug("@ %s: Null transport->sock for xprt %p\n", __func__, xprt);
 
 	sk->sk_user_data = NULL;
 	sk->sk_sndtimeo = 0;
@@ -1589,6 +1591,7 @@ static void xs_tcp_state_change(struct sock *sk)
 		 */
 		if (xprt->reestablish_timeout < XS_TCP_INIT_REEST_TO)
 			xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO;
+		pr_debug("@ %s: sk_state=%d\n", __func__, sk->sk_state);
 		break;
 	case TCP_LAST_ACK:
 		set_bit(XPRT_CLOSING, &xprt->state);
@@ -2688,6 +2691,7 @@ static void xs_tcp_tls_setup_socket(struct work_struct *work)
 	struct sock_xprt *upper_transport =
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct rpc_clnt *upper_clnt = upper_transport->clnt;
+	pr_debug("@ %s: Using clnt %p\n", __func__, upper_clnt);
 	struct rpc_xprt *upper_xprt = &upper_transport->xprt;
 	struct rpc_create_args args = {
 		.net		= upper_xprt->xprt_net,
@@ -2759,6 +2763,7 @@ static void xs_tcp_tls_setup_socket(struct work_struct *work)
 	current_restore_flags(pflags, PF_MEMALLOC);
 	upper_transport->clnt = NULL;
 	xprt_unlock_connect(upper_xprt, upper_transport);
+	pr_debug("@ %s: Done with clnt %p. status=%d\n", __func__, upper_clnt, status);
 	return;
 
 out_close:
@@ -2806,6 +2811,7 @@ static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
 		dprintk("RPC:       xs_connect scheduled xprt %p\n", xprt);
 
 	transport->clnt = task->tk_client;
+	pr_debug("@ %s: Queue connect work in %lu for clnt %p\n", __func__, delay, transport->clnt);
 	queue_delayed_work(xprtiod_workqueue,
 			&transport->connect_worker,
 			delay);
@@ -2847,6 +2853,8 @@ static void xs_error_handle(struct work_struct *work)
 	struct sock_xprt *transport = container_of(work,
 			struct sock_xprt, error_worker);
 
+	msleep(100); // Improves reproducibility
+
 	xs_wake_disconnect(transport);
 	xs_wake_write(transport);
 	xs_wake_error(transport);

base-commit: c369299895a591d96745d6492d4888259b004a9e
-- 
2.43.7