From: Li Chen
To: "Theodore Ts'o", Andreas Dilger, linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Li Chen
Subject: [RFC PATCH v2 2/2] ext4: fast commit: fix s_fc_lock vs i_data_sem inversion
Date: Mon, 22 Dec 2025 23:19:06 +0800
Message-ID: <20251222151906.24607-3-me@linux.beauty>
In-Reply-To: <20251222151906.24607-1-me@linux.beauty>

lockdep reports a possible deadlock due to lock order inversion:

       CPU0                    CPU1
       ----                    ----
  lock(&sbi->s_fc_lock);
                               lock(&ei->i_data_sem);
                               lock(&sbi->s_fc_lock);
  rlock(&ei->i_data_sem);

ext4_fc_perform_commit() holds s_fc_lock while writing fast commit
blocks. Writing those blocks can require mapping the journal inode,
which calls ext4_map_blocks() and takes i_data_sem for reading. At the
same time, metadata update paths can hold i_data_sem and call
ext4_fc_track_inode(), which takes s_fc_lock.

Drop s_fc_lock before the log writing step. While the lock is dropped,
keep inode and dentry state stable by synchronizing on
EXT4_STATE_FC_COMMITTING instead: ext4_fc_del() now waits for
EXT4_STATE_FC_COMMITTING to clear, and inodes referenced only from
create dentry updates are also marked EXT4_STATE_FC_COMMITTING, with
ext4_fc_cleanup() clearing the bit and waking any waiters.

Signed-off-by: Li Chen
---
 fs/ext4/fast_commit.c | 79 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 60 insertions(+), 19 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 3bcdd4619de1..722952bea515 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -244,23 +244,26 @@ void ext4_fc_del(struct inode *inode)
 		return;
 	}
 
-	/*
-	 * Since ext4_fc_del is called from ext4_evict_inode while having a
-	 * handle open, there is no need for us to wait here even if a fast
-	 * commit is going on. That is because, if this inode is being
-	 * committed, ext4_mark_inode_dirty would have waited for inode commit
-	 * operation to finish before we come here. So, by the time we come
-	 * here, inode's EXT4_STATE_FC_COMMITTING would have been cleared. So,
-	 * we shouldn't see EXT4_STATE_FC_COMMITTING to be set on this inode
-	 * here.
-	 *
-	 * We may come here without any handles open in the "no_delete" case of
-	 * ext4_evict_inode as well. However, if that happens, we first mark the
-	 * file system as fast commit ineligible anyway. So, even in that case,
-	 * it is okay to remove the inode from the fc list.
-	 */
-	WARN_ON(ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)
-		&& !ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE));
+	/* Don't race with fast commit processing of this inode. */
+	while (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
+#if (BITS_PER_LONG < 64)
+		DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
+				EXT4_STATE_FC_COMMITTING);
+		wq = bit_waitqueue(&ei->i_state_flags,
+				   EXT4_STATE_FC_COMMITTING);
+#else
+		DEFINE_WAIT_BIT(wait, &ei->i_flags,
+				EXT4_STATE_FC_COMMITTING);
+		wq = bit_waitqueue(&ei->i_flags, EXT4_STATE_FC_COMMITTING);
+#endif
+		prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
+		if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
+			mutex_unlock(&sbi->s_fc_lock);
+			schedule();
+			mutex_lock(&sbi->s_fc_lock);
+		}
+		finish_wait(wq, &wait.wq_entry);
+	}
 	while (ext4_test_inode_state(inode, EXT4_STATE_FC_FLUSHING_DATA)) {
 #if (BITS_PER_LONG < 64)
 		DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
@@ -1107,6 +1110,27 @@ static int ext4_fc_perform_commit(journal_t *journal)
 		ext4_set_inode_state(&iter->vfs_inode,
 				     EXT4_STATE_FC_COMMITTING);
 	}
+	/*
+	 * Also mark inodes referenced by create dentry updates. These inodes are
+	 * tracked via i_fc_dilist and might not be on s_fc_q[MAIN].
+	 */
+	{
+		struct ext4_fc_dentry_update *fc_dentry;
+		struct ext4_inode_info *ei;
+
+		list_for_each_entry(fc_dentry, &sbi->s_fc_dentry_q[FC_Q_MAIN],
+				    fcd_list) {
+			if (fc_dentry->fcd_op != EXT4_FC_TAG_CREAT)
+				continue;
+			if (list_empty(&fc_dentry->fcd_dilist))
+				continue;
+			ei = list_first_entry(&fc_dentry->fcd_dilist,
+					      struct ext4_inode_info,
+					      i_fc_dilist);
+			ext4_set_inode_state(&ei->vfs_inode,
+					     EXT4_STATE_FC_COMMITTING);
+		}
+	}
 	mutex_unlock(&sbi->s_fc_lock);
 	jbd2_journal_unlock_updates(journal);
 
@@ -1135,7 +1159,6 @@ static int ext4_fc_perform_commit(journal_t *journal)
 	}
 
 	/* Step 6.2: Now write all the dentry updates. */
-	mutex_lock(&sbi->s_fc_lock);
 	ret = ext4_fc_commit_dentry_updates(journal, &crc);
 	if (ret)
 		goto out;
@@ -1157,7 +1180,6 @@ static int ext4_fc_perform_commit(journal_t *journal)
 	ret = ext4_fc_write_tail(sb, crc);
 
 out:
-	mutex_unlock(&sbi->s_fc_lock);
 	blk_finish_plug(&plug);
 	return ret;
 }
@@ -1339,6 +1361,25 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
 					     struct ext4_fc_dentry_update,
 					     fcd_list);
 		list_del_init(&fc_dentry->fcd_list);
+		if (fc_dentry->fcd_op == EXT4_FC_TAG_CREAT &&
+		    !list_empty(&fc_dentry->fcd_dilist)) {
+			ei = list_first_entry(&fc_dentry->fcd_dilist,
+					      struct ext4_inode_info,
+					      i_fc_dilist);
+			ext4_clear_inode_state(&ei->vfs_inode,
+					       EXT4_STATE_FC_COMMITTING);
+			/*
+			 * Make sure clearing of EXT4_STATE_FC_COMMITTING is
+			 * visible before we send the wakeup. Pairs with implicit
+			 * barrier in prepare_to_wait() in ext4_fc_track_inode().
+			 */
+			smp_mb();
+#if (BITS_PER_LONG < 64)
+			wake_up_bit(&ei->i_state_flags, EXT4_STATE_FC_COMMITTING);
+#else
+			wake_up_bit(&ei->i_flags, EXT4_STATE_FC_COMMITTING);
+#endif
+		}
 		list_del_init(&fc_dentry->fcd_dilist);
 
 		release_dentry_name_snapshot(&fc_dentry->fcd_name);
-- 
2.51.0
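The locking rule this patch adopts (mark the inode under s_fc_lock, drop the lock for the log writes, then clear the state bit and wake waiters during cleanup) can be illustrated outside the kernel. The sketch below is a hypothetical userspace model using POSIX threads, not ext4 code: fc_lock, data_sem, committing and commit_done are assumed stand-ins for s_fc_lock, i_data_sem, EXT4_STATE_FC_COMMITTING and the bit waitqueue, and a pthread condition variable replaces prepare_to_wait()/wake_up_bit(). It only shows why the two lock orders no longer form a cycle.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code:
 *   fc_lock     ~ sbi->s_fc_lock
 *   data_sem    ~ ei->i_data_sem
 *   committing  ~ EXT4_STATE_FC_COMMITTING
 *   commit_done ~ the bit waitqueue woken by wake_up_bit()
 */
static pthread_mutex_t fc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t data_sem = PTHREAD_RWLOCK_INITIALIZER;
static pthread_cond_t commit_done = PTHREAD_COND_INITIALIZER;
static bool committing;         /* protected by fc_lock */
static bool on_fc_list = true;  /* protected by fc_lock */
static int blocks_written;      /* only touched by the committer thread */

/* Writing a block maps the journal inode: data_sem taken for reading. */
static void write_fc_block(void)
{
	pthread_rwlock_rdlock(&data_sem);
	blocks_written++;
	pthread_rwlock_unlock(&data_sem);
}

/* Models ext4_fc_perform_commit() followed by ext4_fc_cleanup(). */
static void *committer(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&fc_lock);
	if (on_fc_list)
		committing = true;       /* mark while holding fc_lock */
	pthread_mutex_unlock(&fc_lock); /* drop it before the log writes */

	for (int i = 0; i < 3; i++)
		write_fc_block();        /* data_sem taken with fc_lock NOT held */

	pthread_mutex_lock(&fc_lock);
	committing = false;
	pthread_cond_broadcast(&commit_done); /* wake any waiting deleter */
	pthread_mutex_unlock(&fc_lock);
	return NULL;
}

/* Models a metadata update: data_sem held, then ext4_fc_track_inode(). */
static void *tracker(void *arg)
{
	(void)arg;
	pthread_rwlock_wrlock(&data_sem);
	pthread_mutex_lock(&fc_lock);
	pthread_mutex_unlock(&fc_lock);
	pthread_rwlock_unlock(&data_sem);
	return NULL;
}

/* Models ext4_fc_del(): wait for the commit to finish, then unlink. */
static void *deleter(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&fc_lock);
	while (committing) /* loop guards against spurious wakeups */
		pthread_cond_wait(&commit_done, &fc_lock); /* releases fc_lock */
	assert(!committing); /* never unlink an inode mid-commit */
	on_fc_list = false;
	pthread_mutex_unlock(&fc_lock);
	return NULL;
}

/* Runs all three paths concurrently and returns once every thread joined. */
static void run_fc_model(void)
{
	pthread_t t[3];

	pthread_create(&t[0], NULL, committer, NULL);
	pthread_create(&t[1], NULL, tracker, NULL);
	pthread_create(&t[2], NULL, deleter, NULL);
	for (int i = 0; i < 3; i++)
		pthread_join(t[i], NULL);
}
```

Calling run_fc_model() (linked with -pthread) should terminate for any interleaving: the committer never holds fc_lock while taking data_sem, so the tracker's data_sem-then-fc_lock order is the only nesting left and no cycle exists. A condition variable is used here because pthread_cond_wait() atomically releases fc_lock while sleeping, loosely mirroring how the patched ext4_fc_del() drops s_fc_lock around schedule().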