From: Michael Bommarito
To: Trond Myklebust, Anna Schumaker, Chuck Lever, Jeff Layton
Cc: NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
    linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org,
    stable@vger.kernel.org
Subject: [PATCH] lockd: pin next file across nlm_inspect_file lock-drop
Date: Fri, 22 May 2026 21:42:03 -0400
Message-ID: <20260523014203.2462827-1-michael.bommarito@gmail.com>

nlm_traverse_files() walks each nlm_files[] hash bucket with
hlist_for_each_entry_safe(file, next, ...). For each matching file it
bumps f_count, drops nlm_file_mutex to run nlm_inspect_file() (which
may sleep walking blocks, shares, and the inode lock list), then
reacquires the mutex and decrements f_count before continuing to the
saved next.

The f_count bump pins the current file across the lock-drop, but
nothing pins next. Any nlmsvc thread that holds the last reference on
the file at next will, during that window, call nlm_release_file() ->
nlm_delete_file() under nlm_file_mutex, hlist_del() it from the
bucket, and kfree() it. When nlm_traverse_files() reacquires the mutex
and the macro reads the next entry's f_list.next on the following
iteration, the read lands in the freed slab.

A naive restart-on-action variant would deadlock-spin against an
nlm_release_file holder: nlm_inspect_file() does not always drain the
file (it can return 1 with an RPC still holding f_count above the
cleanup threshold), and the outer predicate is_failover_file() matches
static attributes of the file, so a restart can keep re-finding the
same un-cleanable file until the external RPC ref drops.

Pin the neighbour explicitly instead. Walk the bucket with two
locally-pinned cursors at a time: file (current, pinned by the prior
iteration's next bump) and next (one ahead). Drop file's pin at the
end of each iteration, then advance to next, which is still alive
because we hold its f_count above zero across the unlock. This bounds
the walk at O(N) per bucket and never observes a freed neighbour.

Factor the f_count/list/share/lock cleanup into a helper so the
no-match path also drops a stale empty file rather than leaving it in
the table.

Cc: stable@vger.kernel.org
Fixes: 01df9c5e918a ("LOCKD: Fix a deadlock in nlm_traverse_files()")
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito
---
 fs/lockd/svcsubs.c | 61 +++++++++++++++++++++++++++++++---------------
 1 file changed, 42 insertions(+), 19 deletions(-)

Reproduced under UML + KASAN with a loopback NFSv3 mount, 768
concurrent POSIX fcntl(F_SETLKW) holders, and parallel writes to
/proc/fs/nfsd/unlock_filesystem forcing nlmsvc_unlock_all_by_sb() to
walk the table while clients churn locks.

Stock kernel:

  BUG: KASAN: slab-use-after-free in nlm_traverse_files+0x71d/0x9d0
  Read of size 8 at addr 0000000070314800 by task nlm-init-...
  Allocated by: nlm_lookup_file via nlm4svc_proc_lock
  Freed by:     another nlm_traverse_files instance freeing a file
                whose f_count dropped to zero during the
                nlm_inspect_file() unlock window

Patched UML kernel ran the same harness silently.

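For reviewers who want to poke at the cursor logic outside the kernel,
here is a rough user-space sketch of the pin-next walk. It is not the
patch code: struct node, node_release(), hostile_inspect(),
walk_pinned() and pin_next_demo() are hypothetical names invented for
this illustration, a plain singly linked list stands in for the hlist
bucket, and the "unlock window" is simulated by a callback that tries
to release the neighbour of the node being inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nlm_file; field names are illustrative. */
struct node {
	struct node *next;
	int refs;	/* plays the role of f_count */
	int locks;	/* plays the role of f_locks */
};

/* Unlink and free n only when it is unreferenced and holds no locks. */
static void node_release(struct node **head, struct node *n)
{
	struct node **pp;

	if (n->refs || n->locks)
		return;
	for (pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == n) {
			*pp = n->next;
			free(n);
			return;
		}
	}
}

/*
 * Stands in for the window where the mutex is dropped: a competing
 * thread tries to release the neighbour of the node being inspected.
 */
static void hostile_inspect(struct node **head, struct node *cur)
{
	if (cur->next)
		node_release(head, cur->next);
}

/* Walk the list with the current node and its successor both pinned. */
static int walk_pinned(struct node **head)
{
	struct node *cur = *head, *next;
	int visited = 0;

	if (cur)
		cur->refs++;
	while (cur) {
		next = cur->next;
		if (next)
			next->refs++;		/* pin before "unlocking" */

		hostile_inspect(head, cur);	/* "unlocked" section */
		visited++;

		cur->refs--;			/* drop our own pin ... */
		node_release(head, cur);	/* ... and reap if idle */
		cur = next;			/* still alive: we pinned it */
	}
	return visited;
}

/* Build n nodes, odd-indexed ones holding a lock; walk; count survivors. */
int pin_next_demo(int n, int *remaining)
{
	struct node *head = NULL, *p;
	int i, visited;

	for (i = n - 1; i >= 0; i--) {
		p = calloc(1, sizeof(*p));
		assert(p);
		p->locks = i & 1;
		p->next = head;
		head = p;
	}
	visited = walk_pinned(&head);

	*remaining = 0;
	while (head) {			/* count and free the survivors */
		p = head;
		head = head->next;
		free(p);
		(*remaining)++;
	}
	return visited;
}
```

Calling pin_next_demo(5, &remaining) from a main() visits every node
and leaves only the lock-holding ones on the list. If the
next->refs++ line is removed, hostile_inspect() frees an idle
neighbour and the saved next pointer dangles, which is the shape of
the bug described above.
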
Pin-next was chosen over restart-on-action because the latter can
livelock when nlm_inspect_file() returns 1 with an RPC reference still
holding the file above the cleanup threshold and the outer
is_failover_file() predicate matching static attributes.

diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c
index dd0214dcb6950..2bfa32207f10c 100644
--- a/fs/lockd/svcsubs.c
+++ b/fs/lockd/svcsubs.c
@@ -295,36 +295,59 @@ static void nlm_close_files(struct nlm_file *file)
 /*
  * Loop over all files in the file table.
  */
+static void nlm_file_release(struct nlm_file *file)
+{
+	if (list_empty(&file->f_blocks) && !file->f_locks
+	 && !file->f_shares && !file->f_count) {
+		hlist_del(&file->f_list);
+		nlm_close_files(file);
+		kfree(file);
+	}
+}
+
 static int
 nlm_traverse_files(void *data, nlm_host_match_fn_t match,
 		int (*is_failover_file)(void *data, struct nlm_file *file))
 {
-	struct hlist_node *next;
-	struct nlm_file *file;
+	struct nlm_file *file, *next;
 	int i, ret = 0;
 
 	mutex_lock(&nlm_file_mutex);
 	for (i = 0; i < FILE_NRHASH; i++) {
-		hlist_for_each_entry_safe(file, next, &nlm_files[i], f_list) {
-			if (is_failover_file && !is_failover_file(data, file))
-				continue;
+		file = hlist_entry_safe(nlm_files[i].first,
+					struct nlm_file, f_list);
+		if (file)
 			file->f_count++;
-			mutex_unlock(&nlm_file_mutex);
-
-			/* Traverse locks, blocks and shares of this file
-			 * and update file->f_locks count */
-			if (nlm_inspect_file(data, file, match))
-				ret = 1;
+		while (file) {
+			/*
+			 * Pin the next neighbour before we drop the mutex
+			 * for nlm_inspect_file(); a concurrent
+			 * nlm_release_file() under the same mutex would
+			 * otherwise be free to unlink and kfree it during
+			 * the unlock window, leaving us to dereference a
+			 * freed slab when we walked to next afterwards.
+			 */
+			next = hlist_entry_safe(file->f_list.next,
+						struct nlm_file, f_list);
+			if (next)
+				next->f_count++;
+
+			if (!is_failover_file || is_failover_file(data, file)) {
+				mutex_unlock(&nlm_file_mutex);
+
+				/*
+				 * Traverse locks, blocks and shares of this
+				 * file and update file->f_locks count.
+				 */
+				if (nlm_inspect_file(data, file, match))
+					ret = 1;
+
+				mutex_lock(&nlm_file_mutex);
+			}
 
-			mutex_lock(&nlm_file_mutex);
 			file->f_count--;
-			/* No more references to this file. Let go of it. */
-			if (list_empty(&file->f_blocks) && !file->f_locks
-			 && !file->f_shares && !file->f_count) {
-				hlist_del(&file->f_list);
-				nlm_close_files(file);
-				kfree(file);
-			}
+			nlm_file_release(file);
+			file = next;
 		}
 	}
 	mutex_unlock(&nlm_file_mutex);
-- 
2.53.0