From: Mateusz Guzik
To: brauner@kernel.org
Cc: viro@zeniv.linux.org.uk, jack@suse.cz, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Mateusz Guzik
Subject: [PATCH] fs: rework I_NEW handling to operate without fences
Date: Sat, 11 Oct 2025 00:17:36 +0200
Message-ID: <20251010221737.1403539-1-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0

In the inode hash code grab the state while ->i_lock is held. If found
to be set, synchronize the sleep once more with the lock held.

In the real world the flag is not set most of the time.

Apart from being simpler to reason about, it comes with a minor speedup
as now clearing the flag does not require the smp_mb() fence.

While here rename wait_on_inode() to wait_on_new_inode() to line it up
with __wait_on_freeing_inode().

Signed-off-by: Mateusz Guzik
Reviewed-by: Jan Kara
---

This temporarily duplicates sleep code from inode_wait_for_lru_isolating().
This is going to get deduplicated later.

There is high repetition of:
	if (unlikely(isnew)) {
		wait_on_new_inode(old);
		if (unlikely(inode_unhashed(old))) {
			iput(old);
			goto again;
		}
	}

I expect this is going to go away after I post a patch to sanitize the
current APIs for the hash.

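For readers who want the idea outside the kernel, below is a minimal
userspace sketch of the scheme the commit message describes. It is an
analogy, not the kernel implementation: a pthread mutex stands in for
->i_lock, a condition variable stands in for the bit waitqueue, and
every name in it (struct obj, OBJ_NEW, obj_lookup(), obj_wait_new(),
obj_unlock_new()) is hypothetical. The point it illustrates is that the
flag is only ever read or cleared with the lock held, so the lock
provides all the ordering and no explicit fences are needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define OBJ_NEW 0x1u	/* hypothetical stand-in for I_NEW */

struct obj {
	pthread_mutex_t lock;	/* stands in for inode->i_lock */
	pthread_cond_t wake;	/* stands in for the bit waitqueue */
	unsigned int state;
};

static void obj_init(struct obj *o)
{
	pthread_mutex_init(&o->lock, NULL);
	pthread_cond_init(&o->wake, NULL);
	o->state = OBJ_NEW;
}

/* Lookup side: sample the flag while the lock is held anyway. */
static bool obj_lookup(struct obj *o)
{
	bool isnew;

	pthread_mutex_lock(&o->lock);
	isnew = (o->state & OBJ_NEW) != 0;
	pthread_mutex_unlock(&o->lock);
	return isnew;
}

/* Slow path: recheck under the lock and sleep until the flag is gone. */
static void obj_wait_new(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	while (o->state & OBJ_NEW)
		pthread_cond_wait(&o->wake, &o->lock);
	assert(!(o->state & OBJ_NEW));
	pthread_mutex_unlock(&o->lock);
}

/* Creator side: clear the flag and wake sleepers, all under the lock. */
static void obj_unlock_new(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	assert(o->state & OBJ_NEW);
	o->state &= ~OBJ_NEW;
	pthread_cond_broadcast(&o->wake);
	pthread_mutex_unlock(&o->lock);
}
```

A caller would only enter obj_wait_new() when obj_lookup() returned
true, which mirrors the "if (unlikely(isnew))" checks in the patch: the
common case, where the flag is already clear, skips the wait entirely.
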
 fs/afs/dir.c       |   4 +-
 fs/dcache.c        |  10 ----
 fs/gfs2/glock.c    |   2 +-
 fs/inode.c         | 146 +++++++++++++++++++++++++++------------------
 include/linux/fs.h |  12 +---
 5 files changed, 93 insertions(+), 81 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 89d36e3e5c79..f4e9e12373ac 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -779,7 +779,7 @@ static struct inode *afs_do_lookup(struct inode *dir, struct dentry *dentry)
 	struct afs_vnode *dvnode = AFS_FS_I(dir), *vnode;
 	struct inode *inode = NULL, *ti;
 	afs_dataversion_t data_version = READ_ONCE(dvnode->status.data_version);
-	bool supports_ibulk;
+	bool supports_ibulk, isnew;
 	long ret;
 	int i;
 
@@ -850,7 +850,7 @@ static struct inode *afs_do_lookup(struct inode *dir, struct dentry *dentry)
 			 * callback counters.
 			 */
 			ti = ilookup5_nowait(dir->i_sb, vp->fid.vnode,
-					     afs_ilookup5_test_by_fid, &vp->fid);
+					     afs_ilookup5_test_by_fid, &vp->fid, &isnew);
 			if (!IS_ERR_OR_NULL(ti)) {
 				vnode = AFS_FS_I(ti);
 				vp->dv_before = vnode->status.data_version;
diff --git a/fs/dcache.c b/fs/dcache.c
index 78ffa7b7e824..25131f105a60 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1981,17 +1981,7 @@ void d_instantiate_new(struct dentry *entry, struct inode *inode)
 	spin_lock(&inode->i_lock);
 	__d_instantiate(entry, inode);
 	WARN_ON(!(inode_state_read(inode) & I_NEW));
-	/*
-	 * Pairs with smp_rmb in wait_on_inode().
-	 */
-	smp_wmb();
 	inode_state_clear(inode, I_NEW | I_CREATING);
-	/*
-	 * Pairs with the barrier in prepare_to_wait_event() to make sure
-	 * ___wait_var_event() either sees the bit cleared or
-	 * waitqueue_active() check in wake_up_var() sees the waiter.
-	 */
-	smp_mb();
 	inode_wake_up_bit(inode, __I_NEW);
 	spin_unlock(&inode->i_lock);
 }
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index b677c0e6b9ab..c9712235e7a0 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -957,7 +957,7 @@ static struct gfs2_inode *gfs2_grab_existing_inode(struct gfs2_glock *gl)
 		ip = NULL;
 	spin_unlock(&gl->gl_lockref.lock);
 	if (ip) {
-		wait_on_inode(&ip->i_inode);
+		wait_on_new_inode(&ip->i_inode);
 		if (is_bad_inode(&ip->i_inode)) {
 			iput(&ip->i_inode);
 			ip = NULL;
diff --git a/fs/inode.c b/fs/inode.c
index 3153d725859c..1396f79b2551 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -558,6 +558,32 @@ struct wait_queue_head *inode_bit_waitqueue(struct wait_bit_queue_entry *wqe,
 }
 EXPORT_SYMBOL(inode_bit_waitqueue);
 
+void wait_on_new_inode(struct inode *inode)
+{
+	struct wait_bit_queue_entry wqe;
+	struct wait_queue_head *wq_head;
+
+	spin_lock(&inode->i_lock);
+	if (!(inode_state_read(inode) & I_NEW)) {
+		spin_unlock(&inode->i_lock);
+		return;
+	}
+
+	wq_head = inode_bit_waitqueue(&wqe, inode, __I_NEW);
+	for (;;) {
+		prepare_to_wait_event(wq_head, &wqe.wq_entry, TASK_UNINTERRUPTIBLE);
+		if (!(inode_state_read(inode) & I_NEW))
+			break;
+		spin_unlock(&inode->i_lock);
+		schedule();
+		spin_lock(&inode->i_lock);
+	}
+	finish_wait(wq_head, &wqe.wq_entry);
+	WARN_ON(inode_state_read(inode) & I_NEW);
+	spin_unlock(&inode->i_lock);
+}
+EXPORT_SYMBOL(wait_on_new_inode);
+
 /*
  * Add inode to LRU if needed (inode is unused and clean).
  *
@@ -1008,7 +1034,8 @@ static void __wait_on_freeing_inode(struct inode *inode, bool is_inode_hash_lock
 static struct inode *find_inode(struct super_block *sb,
 				struct hlist_head *head,
 				int (*test)(struct inode *, void *),
-				void *data, bool is_inode_hash_locked)
+				void *data, bool is_inode_hash_locked,
+				bool *isnew)
 {
 	struct inode *inode = NULL;
 
@@ -1035,6 +1062,7 @@ static struct inode *find_inode(struct super_block *sb,
 			return ERR_PTR(-ESTALE);
 		}
 		__iget(inode);
+		*isnew = !!(inode_state_read(inode) & I_NEW);
 		spin_unlock(&inode->i_lock);
 		rcu_read_unlock();
 		return inode;
@@ -1049,7 +1077,7 @@ static struct inode *find_inode(struct super_block *sb,
  */
 static struct inode *find_inode_fast(struct super_block *sb,
 				struct hlist_head *head, unsigned long ino,
-				bool is_inode_hash_locked)
+				bool is_inode_hash_locked, bool *isnew)
 {
 	struct inode *inode = NULL;
 
@@ -1076,6 +1104,7 @@ static struct inode *find_inode_fast(struct super_block *sb,
 			return ERR_PTR(-ESTALE);
 		}
 		__iget(inode);
+		*isnew = !!(inode_state_read(inode) & I_NEW);
 		spin_unlock(&inode->i_lock);
 		rcu_read_unlock();
 		return inode;
@@ -1181,17 +1210,7 @@ void unlock_new_inode(struct inode *inode)
 	lockdep_annotate_inode_mutex_key(inode);
 	spin_lock(&inode->i_lock);
 	WARN_ON(!(inode_state_read(inode) & I_NEW));
-	/*
-	 * Pairs with smp_rmb in wait_on_inode().
-	 */
-	smp_wmb();
 	inode_state_clear(inode, I_NEW | I_CREATING);
-	/*
-	 * Pairs with the barrier in prepare_to_wait_event() to make sure
-	 * ___wait_var_event() either sees the bit cleared or
-	 * waitqueue_active() check in wake_up_var() sees the waiter.
-	 */
-	smp_mb();
 	inode_wake_up_bit(inode, __I_NEW);
 	spin_unlock(&inode->i_lock);
 }
@@ -1202,17 +1221,7 @@ void discard_new_inode(struct inode *inode)
 	lockdep_annotate_inode_mutex_key(inode);
 	spin_lock(&inode->i_lock);
 	WARN_ON(!(inode_state_read(inode) & I_NEW));
-	/*
-	 * Pairs with smp_rmb in wait_on_inode().
-	 */
-	smp_wmb();
 	inode_state_clear(inode, I_NEW);
-	/*
-	 * Pairs with the barrier in prepare_to_wait_event() to make sure
-	 * ___wait_var_event() either sees the bit cleared or
-	 * waitqueue_active() check in wake_up_var() sees the waiter.
-	 */
-	smp_mb();
 	inode_wake_up_bit(inode, __I_NEW);
 	spin_unlock(&inode->i_lock);
 	iput(inode);
@@ -1286,12 +1295,13 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
 {
 	struct hlist_head *head = inode_hashtable + hash(inode->i_sb, hashval);
 	struct inode *old;
+	bool isnew;
 
 	might_sleep();
 
 again:
 	spin_lock(&inode_hash_lock);
-	old = find_inode(inode->i_sb, head, test, data, true);
+	old = find_inode(inode->i_sb, head, test, data, true, &isnew);
 	if (unlikely(old)) {
 		/*
 		 * Uhhuh, somebody else created the same inode under us.
@@ -1300,10 +1310,12 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
 		spin_unlock(&inode_hash_lock);
 		if (IS_ERR(old))
 			return NULL;
-		wait_on_inode(old);
-		if (unlikely(inode_unhashed(old))) {
-			iput(old);
-			goto again;
+		if (unlikely(isnew)) {
+			wait_on_new_inode(old);
+			if (unlikely(inode_unhashed(old))) {
+				iput(old);
+				goto again;
+			}
 		}
 		return old;
 	}
@@ -1391,18 +1403,21 @@ struct inode *iget5_locked_rcu(struct super_block *sb, unsigned long hashval,
 {
 	struct hlist_head *head = inode_hashtable + hash(sb, hashval);
 	struct inode *inode, *new;
+	bool isnew;
 
 	might_sleep();
 
 again:
-	inode = find_inode(sb, head, test, data, false);
+	inode = find_inode(sb, head, test, data, false, &isnew);
 	if (inode) {
 		if (IS_ERR(inode))
 			return NULL;
-		wait_on_inode(inode);
-		if (unlikely(inode_unhashed(inode))) {
-			iput(inode);
-			goto again;
+		if (unlikely(isnew)) {
+			wait_on_new_inode(inode);
+			if (unlikely(inode_unhashed(inode))) {
+				iput(inode);
+				goto again;
+			}
 		}
 		return inode;
 	}
@@ -1434,18 +1449,21 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino)
 {
 	struct hlist_head *head = inode_hashtable + hash(sb, ino);
 	struct inode *inode;
+	bool isnew;
 
 	might_sleep();
 
 again:
-	inode = find_inode_fast(sb, head, ino, false);
+	inode = find_inode_fast(sb, head, ino, false, &isnew);
 	if (inode) {
 		if (IS_ERR(inode))
 			return NULL;
-		wait_on_inode(inode);
-		if (unlikely(inode_unhashed(inode))) {
-			iput(inode);
-			goto again;
+		if (unlikely(isnew)) {
+			wait_on_new_inode(inode);
+			if (unlikely(inode_unhashed(inode))) {
+				iput(inode);
+				goto again;
+			}
 		}
 		return inode;
 	}
@@ -1456,7 +1474,7 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino)
 
 		spin_lock(&inode_hash_lock);
 		/* We released the lock, so.. */
-		old = find_inode_fast(sb, head, ino, true);
+		old = find_inode_fast(sb, head, ino, true, &isnew);
 		if (!old) {
 			inode->i_ino = ino;
 			spin_lock(&inode->i_lock);
@@ -1482,10 +1500,12 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino)
 		if (IS_ERR(old))
 			return NULL;
 		inode = old;
-		wait_on_inode(inode);
-		if (unlikely(inode_unhashed(inode))) {
-			iput(inode);
-			goto again;
+		if (unlikely(isnew)) {
+			wait_on_new_inode(inode);
+			if (unlikely(inode_unhashed(inode))) {
+				iput(inode);
+				goto again;
+			}
 		}
 	}
 	return inode;
@@ -1586,13 +1606,13 @@ EXPORT_SYMBOL(igrab);
  * Note2: @test is called with the inode_hash_lock held, so can't sleep.
  */
 struct inode *ilookup5_nowait(struct super_block *sb, unsigned long hashval,
-		int (*test)(struct inode *, void *), void *data)
+		int (*test)(struct inode *, void *), void *data, bool *isnew)
 {
 	struct hlist_head *head = inode_hashtable + hash(sb, hashval);
 	struct inode *inode;
 
 	spin_lock(&inode_hash_lock);
-	inode = find_inode(sb, head, test, data, true);
+	inode = find_inode(sb, head, test, data, true, isnew);
 	spin_unlock(&inode_hash_lock);
 
 	return IS_ERR(inode) ? NULL : inode;
@@ -1620,16 +1640,19 @@ struct inode *ilookup5(struct super_block *sb, unsigned long hashval,
 		int (*test)(struct inode *, void *), void *data)
 {
 	struct inode *inode;
+	bool isnew;
 
 	might_sleep();
 
 again:
-	inode = ilookup5_nowait(sb, hashval, test, data);
+	inode = ilookup5_nowait(sb, hashval, test, data, &isnew);
 	if (inode) {
-		wait_on_inode(inode);
-		if (unlikely(inode_unhashed(inode))) {
-			iput(inode);
-			goto again;
+		if (unlikely(isnew)) {
+			wait_on_new_inode(inode);
+			if (unlikely(inode_unhashed(inode))) {
+				iput(inode);
+				goto again;
+			}
 		}
 	}
 	return inode;
@@ -1648,19 +1671,22 @@ struct inode *ilookup(struct super_block *sb, unsigned long ino)
 {
 	struct hlist_head *head = inode_hashtable + hash(sb, ino);
 	struct inode *inode;
+	bool isnew;
 
 	might_sleep();
 
 again:
-	inode = find_inode_fast(sb, head, ino, false);
+	inode = find_inode_fast(sb, head, ino, false, &isnew);
 
 	if (inode) {
 		if (IS_ERR(inode))
 			return NULL;
-		wait_on_inode(inode);
-		if (unlikely(inode_unhashed(inode))) {
-			iput(inode);
-			goto again;
+		if (unlikely(isnew)) {
+			wait_on_new_inode(inode);
+			if (unlikely(inode_unhashed(inode))) {
+				iput(inode);
+				goto again;
+			}
 		}
 	}
 	return inode;
@@ -1800,6 +1826,7 @@ int insert_inode_locked(struct inode *inode)
 	struct super_block *sb = inode->i_sb;
 	ino_t ino = inode->i_ino;
 	struct hlist_head *head = inode_hashtable + hash(sb, ino);
+	bool isnew;
 
 	might_sleep();
 
@@ -1832,12 +1859,15 @@ int insert_inode_locked(struct inode *inode)
 			return -EBUSY;
 		}
 		__iget(old);
+		isnew = !!(inode_state_read(old) & I_NEW);
 		spin_unlock(&old->i_lock);
 		spin_unlock(&inode_hash_lock);
-		wait_on_inode(old);
-		if (unlikely(!inode_unhashed(old))) {
-			iput(old);
-			return -EBUSY;
+		if (isnew) {
+			wait_on_new_inode(old);
+			if (unlikely(!inode_unhashed(old))) {
+				iput(old);
+				return -EBUSY;
+			}
 		}
 		iput(old);
 	}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21c73df3ce75..a813abdcf218 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1030,15 +1030,7 @@ static inline void inode_fake_hash(struct inode *inode)
 	hlist_add_fake(&inode->i_hash);
 }
 
-static inline void wait_on_inode(struct inode *inode)
-{
-	wait_var_event(inode_state_wait_address(inode, __I_NEW),
-		       !(inode_state_read_once(inode) & I_NEW));
-	/*
-	 * Pairs with routines clearing I_NEW.
-	 */
-	smp_rmb();
-}
+void wait_on_new_inode(struct inode *inode);
 
 /*
  * inode->i_rwsem nesting subclasses for the lock validator:
@@ -3417,7 +3409,7 @@ extern void d_mark_dontcache(struct inode *inode);
 
 extern struct inode *ilookup5_nowait(struct super_block *sb,
 		unsigned long hashval, int (*test)(struct inode *, void *),
-		void *data);
+		void *data, bool *isnew);
 extern struct inode *ilookup5(struct super_block *sb, unsigned long hashval,
 		int (*test)(struct inode *, void *), void *data);
 extern struct inode *ilookup(struct super_block *sb, unsigned long ino);
-- 
2.34.1