From nobody Sat Oct 4 09:41:59 2025
Received: from lgeamrelo11.lge.com (lgeamrelo11.lge.com [156.147.23.51])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id B21ED30E0D1
	for ; Mon, 18 Aug 2025 09:31:07 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=156.147.23.51
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1755509470; cv=none;
	b=eMvpnbKas8ZNZNn20jJa81k91zNcQhHSZ/OA2s5SDDXvEXukqtrUMBbCFeBJTY3zQGuyS6QtlLkK2kphlxPW/T/EJWRO/NhlCDgDiMtOIcyHDli49af8FbhoLgiNt/0PheDCZPid3xxOWggLT0BvtL4bIUJOcRQC3NqIA3ry138=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1755509470; c=relaxed/simple;
	bh=G9LZ1tgerP7BCq/4zzhNbwknKEclbuQsJ9X23FInDmY=;
	h=From:To:Cc:Subject:Date:Message-Id;
	b=F4rD3t8XgB6iXdgW1tEI3VJOPyY1csppHrF0goAmdHwX2gVzlZrUS/95Q91vSCQEMzkTdpscr5nRH9sYGukCW/YODV5Gjpz+Fr4OSMQfEnNch6ERN0+tcT7mNSiFt4VrzpBVYLildfn2NExAedNbGXWGbx3pu7g/D4RtZlM/zbM=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dmarc=pass (p=none dis=none) header.from=lge.com;
	spf=pass smtp.mailfrom=lge.com; arc=none smtp.client-ip=156.147.23.51
Authentication-Results: smtp.subspace.kernel.org;
	dmarc=pass (p=none dis=none) header.from=lge.com
Authentication-Results: smtp.subspace.kernel.org;
	spf=pass smtp.mailfrom=lge.com
Received: from unknown (HELO lgeamrelo04.lge.com) (156.147.1.127)
	by 156.147.23.51 with ESMTP; 18 Aug 2025 18:31:05 +0900
X-Original-SENDERIP: 156.147.1.127
X-Original-MAILFROM: chanho.min@lge.com
Received: from unknown (HELO localhost.localdomain) (10.178.31.96)
	by 156.147.1.127 with ESMTP; 18 Aug 2025 18:31:05 +0900
X-Original-SENDERIP: 10.178.31.96
X-Original-MAILFROM: chanho.min@lge.com
From: Chanho Min
To: Trond Myklebust, Anna Schumaker
Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org, gunho.lee@lge.com, Sasha Levin,
	Greg Kroah-Hartman, Josef Bacik, Chanho Min
Subject: [PATCH] nfs: fix UAF in direct writes
Date: Mon, 18 Aug 2025 18:31:03 +0900
Message-Id: <20250818093103.39233-1-chanho.min@lge.com>
X-Mailer: git-send-email 2.17.1
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Josef Bacik

[ Upstream commit 17f46b803d4f23c66cacce81db35fef3adb8f2af ]

In production we have been hitting the following warning consistently

------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 17 PID: 1800359 at lib/refcount.c:28 refcount_warn_saturate+0x9c/0xe0
Workqueue: nfsiod nfs_direct_write_schedule_work [nfs]
RIP: 0010:refcount_warn_saturate+0x9c/0xe0
PKRU: 55555554
Call Trace:
 ? __warn+0x9f/0x130
 ? refcount_warn_saturate+0x9c/0xe0
 ? report_bug+0xcc/0x150
 ? handle_bug+0x3d/0x70
 ? exc_invalid_op+0x16/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? refcount_warn_saturate+0x9c/0xe0
 nfs_direct_write_schedule_work+0x237/0x250 [nfs]
 process_one_work+0x12f/0x4a0
 worker_thread+0x14e/0x3b0
 ? ZSTD_getCParams_internal+0x220/0x220
 kthread+0xdc/0x120
 ? __btf_name_valid+0xa0/0xa0
 ret_from_fork+0x1f/0x30

This is because we're completing the nfs_direct_request twice in a row.

The source of this is when we have our commit requests to submit, we
process them and send them off, and then in the completion path for the
commit requests we have

	if (nfs_commit_end(cinfo.mds))
		nfs_direct_write_complete(dreq);

However since we're submitting asynchronous requests we sometimes have
one that completes before we submit the next one, so we end up calling
complete on the nfs_direct_request twice.

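As a rough illustration of the double completion described above, the
following user-space C sketch models only the outstanding-RPC counter.
All names in it (commit_counter, counter_begin, counter_end,
simulate_commits) are hypothetical stand-ins, not kernel APIs, and the
loop assumes the worst case in which every commit completes before the
next one is submitted. It counts how often the "last one out" condition
fires with and without an outer begin/end bracket held by the submitter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rpcs_out counter in nfs_mds_commit_info. */
struct commit_counter {
	atomic_int rpcs_out;
};

/* Models nfs_commit_begin(): one more commit is outstanding. */
static void counter_begin(struct commit_counter *c)
{
	atomic_fetch_add(&c->rpcs_out, 1);
}

/* Models nfs_commit_end(): true if the caller dropped the last count. */
static bool counter_end(struct commit_counter *c)
{
	/* atomic_fetch_sub() returns the value held before the decrement. */
	return atomic_fetch_sub(&c->rpcs_out, 1) == 1;
}

/*
 * Assumption for this sketch: each commit finishes before the next one
 * is submitted. Returns how many times the completion would be invoked.
 * With bracket == true the submitter holds its own count for the whole
 * submission loop.
 */
int simulate_commits(int nr_commits, bool bracket)
{
	struct commit_counter c;
	int completions = 0;

	assert(nr_commits >= 0);
	atomic_init(&c.rpcs_out, 0);

	if (bracket)
		counter_begin(&c);		/* submitter's own count */

	for (int i = 0; i < nr_commits; i++) {
		counter_begin(&c);		/* commit submitted */
		if (counter_end(&c))		/* commit completes at once */
			completions++;
	}

	if (bracket && counter_end(&c))		/* submitter is done */
		completions++;

	return completions;
}
```

Under these assumptions, simulate_commits(2, false) reports two
completions, which corresponds to the double put behind the refcount
warning, while simulate_commits(2, true) reports exactly one.
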
The only other place we use nfs_generic_commit_list() is in
__nfs_commit_inode, which wraps this call in a

	nfs_commit_begin();
	nfs_commit_end();

Which is a common pattern for this style of completion handling, one
that is also repeated in the direct code with get_dreq()/put_dreq()
calls around where we process events as well as in the completion paths.

Fix this by using the same pattern for the commit requests.

Before with my 200 node rocksdb stress running this warning would pop
every 10ish minutes. With my patch the stress test has been running for
several hours without popping.

Signed-off-by: Josef Bacik
Cc: stable@vger.kernel.org # 5.4
Signed-off-by: Trond Myklebust
[ chanho : Backports v5.4.y, commit 133a48abf6ec (NFS: Fix up commit deadlocks) is needed to use nfs_commit_end ]
Signed-off-by: Chanho Min
Signed-off-by: Sasha Levin
---
 fs/nfs/direct.c        | 11 +++++++++--
 fs/nfs/write.c         |  2 +-
 include/linux/nfs_fs.h |  1 +
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 32dc176ea1aba..982f0eeac3dfa 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -727,10 +727,17 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq)
 	LIST_HEAD(mds_list);
 
 	nfs_init_cinfo_from_dreq(&cinfo, dreq);
+	nfs_commit_begin(cinfo.mds);
 	nfs_scan_commit(dreq->inode, &mds_list, &cinfo);
 	res = nfs_generic_commit_list(dreq->inode, &mds_list, 0, &cinfo);
-	if (res < 0) /* res == -ENOMEM */
-		nfs_direct_write_reschedule(dreq);
+	if (res < 0) { /* res == -ENOMEM */
+		spin_lock(&dreq->lock);
+		if (dreq->flags == 0)
+			dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
+		spin_unlock(&dreq->lock);
+	}
+	if (nfs_commit_end(cinfo.mds))
+		nfs_direct_write_complete(dreq);
 }
 
 static void nfs_direct_write_schedule_work(struct work_struct *work)
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index c9895316fc070..f3c672b11c4fc 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1653,7 +1653,7 @@ static int wait_on_commit(struct nfs_mds_commit_info *cinfo)
 				       !atomic_read(&cinfo->rpcs_out));
 }
 
-static void nfs_commit_begin(struct nfs_mds_commit_info *cinfo)
+void nfs_commit_begin(struct nfs_mds_commit_info *cinfo)
 {
 	atomic_inc(&cinfo->rpcs_out);
 }
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index a96b116cc9224..b2b441f3572be 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -549,6 +549,7 @@ extern int nfs_wb_page_cancel(struct inode *inode, struct page* page);
 extern int nfs_commit_inode(struct inode *, int);
 extern struct nfs_commit_data *nfs_commitdata_alloc(bool never_fail);
 extern void nfs_commit_free(struct nfs_commit_data *data);
+void nfs_commit_begin(struct nfs_mds_commit_info *cinfo);
 bool nfs_commit_end(struct nfs_mds_commit_info *cinfo);
 
 static inline int