From: Mateusz Guzik
To: brauner@kernel.org
Cc: viro@zeniv.linux.org.uk, jack@suse.cz, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Mateusz Guzik
Subject: [PATCH v5 4/4] fs: allow lockless ->i_count bumps as long as it does not transition 0->1
Date: Tue, 31 Mar 2026 18:08:51 +0200
Message-ID: <20260331160851.3854954-5-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260331160851.3854954-1-mjguzik@gmail.com>
References: <20260331160851.3854954-1-mjguzik@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

With this change only 0->1 and 1->0 transitions need the lock.

I verified all places which look at the refcount either only care about
it staying 0 (and have the lock enforce it) or don't hold the inode lock
to begin with (making the above change irrelevant to their correctness or
lack thereof).

I also confirmed nfs and btrfs like to call into these a lot and now
avoid the lock in the common case, shaving off some atomics.

Signed-off-by: Mateusz Guzik
---
 fs/dcache.c        |  4 +++
 fs/inode.c         | 65 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |  4 +--
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 9ceab142896f..b63450ebb85c 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2033,6 +2033,10 @@ void d_instantiate_new(struct dentry *entry, struct inode *inode)
 	__d_instantiate(entry, inode);
 	spin_unlock(&entry->d_lock);
 	WARN_ON(!(inode_state_read(inode) & I_NEW));
+	/*
+	 * Paired with igrab_try_lockless()
+	 */
+	smp_wmb();
 	inode_state_clear(inode, I_NEW | I_CREATING);
 	inode_wake_up_bit(inode, __I_NEW);
 	spin_unlock(&inode->i_lock);
diff --git a/fs/inode.c b/fs/inode.c
index 013470e6d144..03472be4e1a9 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1029,6 +1029,7 @@ long prune_icache_sb(struct super_block *sb, struct shrink_control *sc)
 }
 
 static void __wait_on_freeing_inode(struct inode *inode, bool hash_locked, bool rcu_locked);
+static bool igrab_try_lockless(struct inode *inode);
 
 /*
  * Called with the inode lock held.
@@ -1053,6 +1054,11 @@ static struct inode *find_inode(struct super_block *sb,
 			continue;
 		if (!test(inode, data))
 			continue;
+		if (igrab_try_lockless(inode)) {
+			rcu_read_unlock();
+			*isnew = false;
+			return inode;
+		}
 		spin_lock(&inode->i_lock);
 		if (inode_state_read(inode) & (I_FREEING | I_WILL_FREE)) {
 			__wait_on_freeing_inode(inode, hash_locked, true);
@@ -1095,6 +1101,11 @@ static struct inode *find_inode_fast(struct super_block *sb,
 			continue;
 		if (inode->i_sb != sb)
 			continue;
+		if (igrab_try_lockless(inode)) {
+			rcu_read_unlock();
+			*isnew = false;
+			return inode;
+		}
 		spin_lock(&inode->i_lock);
 		if (inode_state_read(inode) & (I_FREEING | I_WILL_FREE)) {
 			__wait_on_freeing_inode(inode, hash_locked, true);
@@ -1212,6 +1223,10 @@ void unlock_new_inode(struct inode *inode)
 	lockdep_annotate_inode_mutex_key(inode);
 	spin_lock(&inode->i_lock);
 	WARN_ON(!(inode_state_read(inode) & I_NEW));
+	/*
+	 * Paired with igrab_try_lockless()
+	 */
+	smp_wmb();
 	inode_state_clear(inode, I_NEW | I_CREATING);
 	inode_wake_up_bit(inode, __I_NEW);
 	spin_unlock(&inode->i_lock);
@@ -1223,6 +1238,10 @@ void discard_new_inode(struct inode *inode)
 	lockdep_annotate_inode_mutex_key(inode);
 	spin_lock(&inode->i_lock);
 	WARN_ON(!(inode_state_read(inode) & I_NEW));
+	/*
+	 * Paired with igrab_try_lockless()
+	 */
+	smp_wmb();
 	inode_state_clear(inode, I_NEW);
 	inode_wake_up_bit(inode, __I_NEW);
 	spin_unlock(&inode->i_lock);
@@ -1582,6 +1601,14 @@ EXPORT_SYMBOL(ihold);
 
 struct inode *igrab(struct inode *inode)
 {
+	/*
+	 * Read commentary above igrab_try_lockless() for an explanation why this works.
+	 */
+	if (atomic_add_unless(&inode->i_count, 1, 0)) {
+		VFS_BUG_ON_INODE(inode_state_read_once(inode) & (I_FREEING | I_WILL_FREE), inode);
+		return inode;
+	}
+
 	spin_lock(&inode->i_lock);
 	if (!(inode_state_read(inode) & (I_FREEING | I_WILL_FREE))) {
 		__iget(inode);
@@ -1599,6 +1626,44 @@ struct inode *igrab(struct inode *inode)
 }
 EXPORT_SYMBOL(igrab);
 
+/*
+ * igrab_try_lockless - special inode refcount acquire primitive for the inode hash
+ * (don't use elsewhere!)
+ *
+ * It provides lockless refcount acquire in the common case of no problematic
+ * flags being set and the count being > 0.
+ *
+ * There are 4 state flags to worry about and the routine makes sure to not bump the
+ * ref if any of them is present.
+ *
+ * I_NEW and I_CREATING can only legally get set *before* the inode becomes visible
+ * during lookup. Thus if the flags are not spotted, they are guaranteed to not be
+ * a factor. However, we need an acquire fence before returning the inode just
+ * in case we raced against clearing the state to make sure our consumer picks up
+ * any other changes made prior. atomic_add_unless provides a full fence, which
+ * takes care of it.
+ *
+ * I_FREEING and I_WILL_FREE can only legally get set if ->i_count == 0 and it is
+ * illegal to bump the ref if either is present. Consequently if atomic_add_unless
+ * managed to replace a non-0 value with a bigger one, we have a guarantee neither
+ * of these flags is set. Note this means explicitly checking of these flags below
+ * is not necessary, it is only done because it does not cost anything on top of the
+ * load which already needs to be done to handle the other flags.
+ */
+static bool igrab_try_lockless(struct inode *inode)
+{
+	if (inode_state_read_once(inode) & (I_NEW | I_CREATING | I_FREEING | I_WILL_FREE))
+		return false;
+	/*
+	 * Paired with routines clearing I_NEW
+	 */
+	if (atomic_add_unless(&inode->i_count, 1, 0)) {
+		VFS_BUG_ON_INODE(inode_state_read_once(inode) & (I_FREEING | I_WILL_FREE), inode);
+		return true;
+	}
+	return false;
+}
+
 /**
  * ilookup5_nowait - search for an inode in the inode cache
  * @sb:		super block of file system to search
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 07363fce4406..119e0a3d2f42 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2234,8 +2234,8 @@ static inline int icount_read_once(const struct inode *inode)
 }
 
 /*
- * returns the refcount on the inode. The lock guarantees no new references
- * are added, but references can be dropped as long as the result is > 0.
+ * returns the refcount on the inode. The lock guarantees no 0->1 or 1->0 transitions
+ * of the count are going to take place, otherwise it changes arbitrarily.
  */
 static inline int icount_read(const struct inode *inode)
 {
-- 
2.48.1
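
For readers who want to poke at the "bump unless zero" rule outside the
kernel, here is a minimal user-space sketch in C11. It is not part of the
patch and only models the fast path: struct obj and obj_get_unless_zero()
are hypothetical names, and the compare-exchange loop is an assumed
stand-in for what atomic_add_unless(&inode->i_count, 1, 0) does. The state
flag checks and the locked slow path are deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode: only the reference count is modeled. */
struct obj {
	atomic_int count;
};

/*
 * User-space model of atomic_add_unless(&count, 1, 0): bump the count
 * unless it is currently 0. Returns true if a reference was taken.
 */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load_explicit(&o->count, memory_order_relaxed);

	while (old != 0) {
		/* Sanity check for this model: the count never goes negative. */
		assert(old > 0);
		/*
		 * Sequentially consistent compare-exchange; on failure 'old'
		 * is refreshed with the current value and the loop retries.
		 */
		if (atomic_compare_exchange_weak(&o->count, &old, old + 1))
			return true;
	}
	/* The count was 0: a 0->1 transition is left to the locked path. */
	return false;
}
```

A caller would try obj_get_unless_zero() first and fall back to taking
the lock only when it returns false, which mirrors the shape of the
igrab() change above.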