From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6ACBA2FFFA4; Wed, 22 Apr 2026 02:16:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824220; cv=none; b=KfJw4GFr3GoDVWr5X7PeKSACa/GtGTaFrVu4TRa6/WzXuLub0p0s9BQAv+LFZVU4Vhm1gS2chf6NNSYb/sZ8+65oIB50ZnNXtP3ivRQpY3R/Re5KOQOR1Y06dcCQkM/RwSP4e2goB5cWNBhIrktiOKGfL3HQJS4KhPta0sY6A9k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824220; c=relaxed/simple; bh=2GLqOrJzpWznwh/KQhRAwJU+dvoOKIfi+7HjpC9I4vQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=SVnBhKAxhuW+KELN/zeDecq4QwHVPIFLgBbdVy+bHfYkadFA1O6suiCFzDzCj4CkhegKR36qgDAdx8Yh3t3Z21hYclZuwicH71UvurtwfqE1LCoCnrAw/qYBTdXVNTXnWjM5Q5HW5ZXcD88O171tjCA+6lZQNCbMcKEoenb7FsY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.170]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4g0jWL3M0pzYQtpd; Wed, 22 Apr 2026 10:15:58 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id A6843405D6; Wed, 22 Apr 2026 10:16:55 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S5; Wed, 22 Apr 2026 10:16:55 +0800 (CST) From: Zhang Yi To: 
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com
Subject: [PATCH v3 01/22] ext4: simplify size updating in ext4_setattr()
Date: Wed, 22 Apr 2026 10:10:21 +0800
Message-ID: <20260422021042.4157510-2-yi.zhang@huaweicloud.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com>
References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S5
X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/
Content-Type: text/plain; charset="utf-8"

From: Zhang Yi

The logic for updating the file size in ext4_setattr() is more
convoluted than necessary. When ext4_orphan_add() fails, the code still
sets i_disksize to the new size and then restores it from old_disksize.
Jump straight to the error-handling path after a failed orphan add
instead, so that neither i_size nor i_disksize is touched and the
old_disksize save-and-restore can be dropped.

Signed-off-by: Zhang Yi
Reviewed-by: Jan Kara
---
fs/ext4/inode.c | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index c2c2d6ac7f3d..0751dc55e94f 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5953,7 +5953,6 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dent= ry *dentry, if (attr->ia_valid & ATTR_SIZE) { handle_t *handle; loff_t oldsize =3D inode->i_size; - loff_t old_disksize; int shrink =3D (attr->ia_size < inode->i_size); =20 if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) { @@ -6037,6 +6036,8 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dent= ry *dentry, if (ext4_handle_valid(handle) && shrink) { error =3D ext4_orphan_add(handle, inode); orphan =3D 1; + if (error) + goto out_handle; } =20 if (shrink) @@ -6052,23 +6053,18 @@ int ext4_setattr(struct mnt_idmap *idmap, struct de= ntry *dentry, (attr->ia_size > 0 ? attr->ia_size - 1 : 0) >> inode->i_sb->s_blocksize_bits); =20 - down_write(&EXT4_I(inode)->i_data_sem); - old_disksize =3D EXT4_I(inode)->i_disksize; - EXT4_I(inode)->i_disksize =3D attr->ia_size; - /* * We have to update i_size under i_data_sem together * with i_disksize to avoid races with writeback code - * running ext4_wb_update_i_disksize(). + * updating disksize in mpage_map_and_submit_extent().
*/ - if (!error) - i_size_write(inode, attr->ia_size); - else - EXT4_I(inode)->i_disksize =3D old_disksize; + down_write(&EXT4_I(inode)->i_data_sem); + i_size_write(inode, attr->ia_size); + EXT4_I(inode)->i_disksize =3D attr->ia_size; up_write(&EXT4_I(inode)->i_data_sem); - rc =3D ext4_mark_inode_dirty(handle, inode); - if (!error) - error =3D rc; + + error =3D ext4_mark_inode_dirty(handle, inode); +out_handle: ext4_journal_stop(handle); if (error) goto out_mmap_sem; --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B772E30E84B; Wed, 22 Apr 2026 02:16:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824221; cv=none; b=lGWpJYUoHM0nKPJqq4R/TC/CSMnzJtaCuOqTWdL//gS81JEjS7ZsSaKxIBZ4Ad4pJjvhM4l9Wn8IhjqaTeqJe1ZWwwWToOR/ax9axa1t7uWcpQkkjFjcOYx5P67NY7J0HXhD3Vtt7z5WlXyDu2ynhqloZaXFkUi6QhFeZnIDibc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824221; c=relaxed/simple; bh=ljs0LduNiaHg7/zUlPLWLVki8+86xSUFofFyrf+sCWw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=MWY+sdZ1sNBxNvkHni8fdJyUOO0w6zEDI/0fwMzjpEClH5SF+qNkIU1BiQBSM+p2xpyRypyUyPvb3WW/6KYtLPapu9uNQferXHLr96ou5OKvWcEHQZJOxJOZwItmq+1o+egOvX2w2supzhWMoiaeKqDZFIxwIBb4G8V3rM2A5Gk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from 
mail.maildlp.com (unknown [172.19.163.198]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4g0jWL49m0zYQtq0; Wed, 22 Apr 2026 10:15:58 +0800 (CST)
Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id C095140601; Wed, 22 Apr 2026 10:16:55 +0800 (CST)
Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S6; Wed, 22 Apr 2026 10:16:55 +0800 (CST)
From: Zhang Yi
To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com
Subject: [PATCH v3 02/22] ext4: factor out ext4_truncate_[up|down]()
Date: Wed, 22 Apr 2026 10:10:22 +0800
Message-ID: <20260422021042.4157510-3-yi.zhang@huaweicloud.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com>
References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S6
X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/
Content-Type: text/plain; charset="utf-8"

From: Zhang Yi

Refactor ext4_setattr() by introducing two helper functions,
ext4_truncate_up() and ext4_truncate_down(), to handle size changes.
The current ATTR_SIZE processing handles the shrinking and
non-shrinking cases in a single block with interleaved checks, which
clutters the code. Separating the two truncation paths improves
readability.

Signed-off-by: Zhang Yi
---
fs/ext4/ext4.h | 17 ++++++ fs/ext4/inode.c | 157 ++++++++++++++++++++++++++---------------------- 2 files changed, 101 insertions(+), 73 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 94283a991e5c..9e4353432325 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3501,6 +3501,23 @@ static inline int ext4_update_inode_size(struct inod= e *inode, loff_t newsize) return changed; } =20 +/* + * Set i_size and i_disksize to 'newsize'. + * + * Both i_rwsem and i_data_sem are required here to avoid races between + * generic append writeback and concurrent truncate that also modify + * i_size and i_disksize.
+ */ +static inline void ext4_set_inode_size(struct inode *inode, loff_t newsize) +{ + WARN_ON_ONCE(S_ISREG(inode->i_mode) && !inode_is_locked(inode)); + + down_write(&EXT4_I(inode)->i_data_sem); + i_size_write(inode, newsize); + EXT4_I(inode)->i_disksize =3D newsize; + up_write(&EXT4_I(inode)->i_data_sem); +} + int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset, loff_t len); =20 diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 0751dc55e94f..5e913aca6499 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5855,6 +5855,83 @@ static void ext4_wait_for_tail_page_commit(struct in= ode *inode) } } =20 +static int ext4_truncate_up(struct inode *inode, loff_t oldsize, loff_t ne= wsize) +{ + ext4_lblk_t old_lblk, new_lblk; + handle_t *handle; + int ret; + + if (!IS_ALIGNED(oldsize | newsize, i_blocksize(inode))) { + ret =3D ext4_inode_attach_jinode(inode); + if (ret) + return ret; + } + + inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode)); + if (oldsize & (i_blocksize(inode) - 1)) { + ret =3D ext4_block_zero_eof(inode, oldsize, LLONG_MAX); + if (ret) + return ret; + } + + handle =3D ext4_journal_start(inode, EXT4_HT_INODE, 3); + if (IS_ERR(handle)) + return PTR_ERR(handle); + + old_lblk =3D oldsize > 0 ? (oldsize - 1) >> inode->i_blkbits : 0; + new_lblk =3D newsize > 0 ? (newsize - 1) >> inode->i_blkbits : 0; + ext4_fc_track_range(handle, inode, old_lblk, new_lblk); + + ext4_set_inode_size(inode, newsize); + + ret =3D ext4_mark_inode_dirty(handle, inode); + ext4_journal_stop(handle); + if (ret) + return ret; + /* + * isize extend must be called outside an active handle due to + * the lock ordering of transaction start and folio lock in the + * iomap buffered I/O path (folio lock -> transaction start). 
+ */ + pagecache_isize_extended(inode, oldsize, newsize); + return 0; +} + +static int ext4_truncate_down(struct inode *inode, loff_t oldsize, + loff_t newsize, int *orphan) +{ + ext4_lblk_t start_lblk; + handle_t *handle; + int ret; + + handle =3D ext4_journal_start(inode, EXT4_HT_INODE, 3); + if (IS_ERR(handle)) + return PTR_ERR(handle); + + if (ext4_handle_valid(handle)) { + ret =3D ext4_orphan_add(handle, inode); + *orphan =3D 1; + if (ret) { + ext4_journal_stop(handle); + return ret; + } + } + + start_lblk =3D newsize > 0 ? (newsize - 1) >> inode->i_blkbits : 0; + ext4_fc_track_range(handle, inode, start_lblk, EXT_MAX_BLOCKS - 1); + + ext4_set_inode_size(inode, newsize); + + ret =3D ext4_mark_inode_dirty(handle, inode); + ext4_journal_stop(handle); + if (ret) + return ret; + + if (ext4_should_journal_data(inode)) + ext4_wait_for_tail_page_commit(inode); + return 0; +} + /* * ext4_setattr() * @@ -5951,7 +6028,6 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dent= ry *dentry, } =20 if (attr->ia_valid & ATTR_SIZE) { - handle_t *handle; loff_t oldsize =3D inode->i_size; int shrink =3D (attr->ia_size < inode->i_size); =20 @@ -6003,78 +6079,13 @@ int ext4_setattr(struct mnt_idmap *idmap, struct de= ntry *dentry, goto err_out; } =20 - if (attr->ia_size !=3D inode->i_size) { - /* attach jbd2 jinode for EOF folio tail zeroing */ - if (attr->ia_size & (inode->i_sb->s_blocksize - 1) || - oldsize & (inode->i_sb->s_blocksize - 1)) { - error =3D ext4_inode_attach_jinode(inode); - if (error) - goto out_mmap_sem; - } - - /* - * Update c/mtime and tail zero the EOF folio on - * truncate up. ext4_truncate() handles the shrink case - * below. 
- */ - if (!shrink) { - inode_set_mtime_to_ts(inode, - inode_set_ctime_current(inode)); - if (oldsize & (inode->i_sb->s_blocksize - 1)) { - error =3D ext4_block_zero_eof(inode, - oldsize, LLONG_MAX); - if (error) - goto out_mmap_sem; - } - } - - handle =3D ext4_journal_start(inode, EXT4_HT_INODE, 3); - if (IS_ERR(handle)) { - error =3D PTR_ERR(handle); - goto out_mmap_sem; - } - if (ext4_handle_valid(handle) && shrink) { - error =3D ext4_orphan_add(handle, inode); - orphan =3D 1; - if (error) - goto out_handle; - } - - if (shrink) - ext4_fc_track_range(handle, inode, - (attr->ia_size > 0 ? attr->ia_size - 1 : 0) >> - inode->i_sb->s_blocksize_bits, - EXT_MAX_BLOCKS - 1); - else - ext4_fc_track_range( - handle, inode, - (oldsize > 0 ? oldsize - 1 : oldsize) >> - inode->i_sb->s_blocksize_bits, - (attr->ia_size > 0 ? attr->ia_size - 1 : 0) >> - inode->i_sb->s_blocksize_bits); - - /* - * We have to update i_size under i_data_sem together - * with i_disksize to avoid races with writeback code - * updating disksize in mpage_map_and_submit_extent(). 
- */ - down_write(&EXT4_I(inode)->i_data_sem); - i_size_write(inode, attr->ia_size); - EXT4_I(inode)->i_disksize =3D attr->ia_size; - up_write(&EXT4_I(inode)->i_data_sem); - - error =3D ext4_mark_inode_dirty(handle, inode); -out_handle: - ext4_journal_stop(handle); - if (error) - goto out_mmap_sem; - if (!shrink) { - pagecache_isize_extended(inode, oldsize, - inode->i_size); - } else if (ext4_should_journal_data(inode)) { - ext4_wait_for_tail_page_commit(inode); - } - } + if (attr->ia_size > oldsize) + error =3D ext4_truncate_up(inode, oldsize, attr->ia_size); + else if (shrink) + error =3D ext4_truncate_down(inode, oldsize, + attr->ia_size, &orphan); + if (error) + goto out_mmap_sem; =20 /* * Truncate pagecache after we've waited for commit --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1A59320299B; Wed, 22 Apr 2026 02:16:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824220; cv=none; b=TreVKJCl9Ogge1iiiVHvHcFM1B3/e76acRFZFdWrcqCfrxv8UbM4DuQRgD83d43+4fd5/Wc4vmbaUiq4eCMZHxWfxBSuqxkVFADzgdBuX8XrgRTT/cmOpURzLz2tEtfRhA+D+xVMbXhLSAlBDnzhyaYwSvmYPHIONudVJl+wcCo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824220; c=relaxed/simple; bh=v0C9dOfxKQztbHv75QGxmgf12Sk8kWfhRM5Rh5+JIOQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kvWIkE9H+VtLTNvQLeMjCzh7mO4Dwrnu9miIfHKLlRODEWLxNnX0xi0mA+vGMojnotCVUtKwLs1bfnkuO9IaYrpYmfiysVEf4pLnQ6URKKTUsSaBmiOrzzhU7S8UmtcHbx0sC0xK5HwIRal/0GR8V6WSfVo7IXhFZlD9d/EAfGY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass 
smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51
Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com
Received: from mail.maildlp.com (unknown [172.19.163.198]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4g0jWL4cc0zYQtpy; Wed, 22 Apr 2026 10:15:58 +0800 (CST)
Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id D0B8F40604; Wed, 22 Apr 2026 10:16:55 +0800 (CST)
Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S7; Wed, 22 Apr 2026 10:16:55 +0800 (CST)
From: Zhang Yi
To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com
Subject: [PATCH v3 03/22] ext4: simplify error handling in ext4_setattr()
Date: Wed, 22 Apr 2026 10:10:23 +0800
Message-ID: <20260422021042.4157510-4-yi.zhang@huaweicloud.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com>
References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S7
X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/
Content-Type: text/plain; charset="utf-8"

From: Zhang Yi

Remove the redundant rc variable and consolidate error handling.
Failures are now reported through the single error variable and jump to
err_out, so the attribute copy and the posix_acl_chmod() call no longer
need 'if (!error)' guards.
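
For illustration only (this is not part of the patch), the single-variable pattern described above can be sketched in plain C. All names here (sketch_state, sketch_step, sketch_setattr) are hypothetical stand-ins, and the two "steps" merely simulate calls that may fail:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state used to observe which paths were taken. */
struct sketch_state {
	bool fail_step1;	/* simulate a failure in the first step */
	bool fail_step2;	/* simulate a failure in the second step */
	int commits;		/* counts success-only work */
	int cleanups;		/* counts cleanup shared by all paths */
};

/* Simulated operation: returns 0 or a negative errno value. */
static int sketch_step(bool fail, int err)
{
	return fail ? -err : 0;
}

int sketch_setattr(struct sketch_state *s)
{
	int error;	/* the only status variable; no secondary "rc" */

	assert(s != NULL);

	error = sketch_step(s->fail_step1, EIO);
	if (error)
		goto err_out;

	error = sketch_step(s->fail_step2, ENOSPC);
	if (error)
		goto err_out;

	/* Success-only work needs no "if (!error)" guard here. */
	s->commits++;

err_out:
	/* Cleanup shared by the success and the failure paths. */
	s->cleanups++;
	return error;
}
```

Because every failure leaves through err_out, the code between the last check and the label only ever runs on success.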
Signed-off-by: Zhang Yi --- fs/ext4/inode.c | 37 ++++++++++++++++--------------------- 1 file changed, 16 insertions(+), 21 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 5e913aca6499..59405a95ecfc 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5960,7 +5960,7 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dent= ry *dentry, struct iattr *attr) { struct inode *inode =3D d_inode(dentry); - int error, rc =3D 0; + int error; int orphan =3D 0; const unsigned int ia_valid =3D attr->ia_valid; bool inc_ivers =3D true; @@ -6073,8 +6073,8 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dent= ry *dentry, =20 filemap_invalidate_lock(inode->i_mapping); =20 - rc =3D ext4_break_layouts(inode); - if (rc) { + error =3D ext4_break_layouts(inode); + if (error) { filemap_invalidate_unlock(inode->i_mapping); goto err_out; } @@ -6096,22 +6096,23 @@ int ext4_setattr(struct mnt_idmap *idmap, struct de= ntry *dentry, * Call ext4_truncate() even if i_size didn't change to * truncate possible preallocated blocks. */ - if (attr->ia_size <=3D oldsize) { - rc =3D ext4_truncate(inode); - if (rc) - error =3D rc; - } + if (attr->ia_size <=3D oldsize) + error =3D ext4_truncate(inode); out_mmap_sem: filemap_invalidate_unlock(inode->i_mapping); + if (error) + goto err_out; } =20 - if (!error) { - if (inc_ivers) - inode_inc_iversion(inode); - setattr_copy(idmap, inode, attr); - mark_inode_dirty(inode); - } + if (inc_ivers) + inode_inc_iversion(inode); + setattr_copy(idmap, inode, attr); + mark_inode_dirty(inode); =20 + if (ia_valid & ATTR_MODE) + error =3D posix_acl_chmod(idmap, dentry, inode->i_mode); + +err_out: /* * If the call to ext4_truncate failed to get a transaction handle at * all, we need to clean up the in-core orphan list manually. 
@@ -6119,14 +6120,8 @@ int ext4_setattr(struct mnt_idmap *idmap, struct den= try *dentry, if (orphan && inode->i_nlink) ext4_orphan_del(NULL, inode); =20 - if (!error && (ia_valid & ATTR_MODE)) - rc =3D posix_acl_chmod(idmap, dentry, inode->i_mode); - -err_out: - if (error) + if (error) ext4_std_error(inode->i_sb, error); - if (!error) - error =3D rc; return error; } =20 --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E491B2AF00; Wed, 22 Apr 2026 02:16:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824220; cv=none; b=Kl7fYPQlsoJWfeXni/nAXejogHg1AngBECLPaWaKS9j1wPxGm0WIbFZPIxbTdkRMQbfKbwK64XFsmcq/VCMuWzxOLpq/IqkTsjjQQliDbBauWBdfM5RZlCxVYkiqOtgMMha6HIE5KKVNHulvTpwxFxlLBaVumLGgOK6zHw4fMRI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824220; c=relaxed/simple; bh=93eZt22IW3rOXyfSLbbHOlMyViJusbp8Qbt+GwEybTY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pFR94sXWvdHQm2vJIf/qmr6JfbXszlWNBrr8P9g1yDcMtOOeq3AvFlDqxTaEIRrqT108XsuSjBY28n0Vl+kyUzAoRNqt6OFfZJsD84dJQ0PncAi0W02L5dN4mJQSCh8A5ANGTvAPX9ASwOBJEfnf6WXrTPSu5B3CDbcpMqhOCsY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.177]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 
4g0jWL5jyrzYQtqG; Wed, 22 Apr 2026 10:15:58 +0800 (CST)
Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id EF765405FA; Wed, 22 Apr 2026 10:16:55 +0800 (CST)
Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S8; Wed, 22 Apr 2026 10:16:55 +0800 (CST)
From: Zhang Yi
To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com
Subject: [PATCH v3 04/22] ext4: add iomap address space operations for buffered I/O
Date: Wed, 22 Apr 2026 10:10:24 +0800
Message-ID: <20260422021042.4157510-5-yi.zhang@huaweicloud.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com>
References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S8
X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/
Content-Type: text/plain; charset="utf-8"

From: Zhang Yi

Introduce initial support for iomap in the buffered I/O path for
regular files on ext4.

- Add a new inode state flag, EXT4_STATE_BUFFERED_IOMAP, to indicate
  that the inode uses iomap instead of buffer_head for buffered I/O.
- Add a helper, ext4_inode_buffered_iomap(), to check the flag.
- Add new address space operations, ext4_iomap_aops, with callbacks
  that will use the generic iomap implementations.
- Select ext4_iomap_aops in ext4_set_aops() when the flag is set.

The read_folio(), readahead() and writepages() callbacks are provided
as placeholders and will be implemented in later patches.
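
For illustration only (this is not part of the patch), the precedence that the ext4_set_aops() hunk gives to the new flag can be modelled in user space. This is a minimal sketch: the sketch_* names are hypothetical, the three booleans stand in for IS_DAX(), the EXT4_STATE_BUFFERED_IOMAP state flag and the DELALLOC mount option, and only the if/else chain visible in the hunk is modelled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the address space operation tables. */
enum sketch_aops_kind {
	SKETCH_AOPS_DAX,
	SKETCH_AOPS_IOMAP,
	SKETCH_AOPS_DELALLOC,
	SKETCH_AOPS_DEFAULT,
};

/* Hypothetical, simplified view of the inputs to the decision. */
struct sketch_inode {
	bool is_dax;		/* models IS_DAX(inode) */
	bool buffered_iomap;	/* models EXT4_STATE_BUFFERED_IOMAP */
	bool delalloc;		/* models test_opt(sb, DELALLOC) */
};

/* DAX wins first, then the iomap flag, then delalloc, then the fallback. */
enum sketch_aops_kind sketch_pick_aops(const struct sketch_inode *inode)
{
	assert(inode != NULL);

	if (inode->is_dax)
		return SKETCH_AOPS_DAX;
	if (inode->buffered_iomap)
		return SKETCH_AOPS_IOMAP;
	if (inode->delalloc)
		return SKETCH_AOPS_DELALLOC;
	return SKETCH_AOPS_DEFAULT;
}
```

The point of the ordering is that an inode with the iomap flag set gets the iomap table even when delalloc is enabled, while DAX inodes are unaffected.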
Signed-off-by: Zhang Yi Reviewed-by: Jan Kara --- fs/ext4/ext4.h | 7 +++++++ fs/ext4/inode.c | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 39 insertions(+) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 9e4353432325..fe3491ad2129 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1972,6 +1972,7 @@ enum { EXT4_STATE_FC_COMMITTING, /* Fast commit ongoing */ EXT4_STATE_FC_FLUSHING_DATA, /* Fast commit flushing data */ EXT4_STATE_ORPHAN_FILE, /* Inode orphaned in orphan file */ + EXT4_STATE_BUFFERED_IOMAP, /* Inode uses iomap for buffered I/O */ }; =20 #define EXT4_INODE_BIT_FNS(name, field, offset) \ @@ -2040,6 +2041,12 @@ static inline bool ext4_inode_orphan_tracked(struct = inode *inode) !list_empty(&EXT4_I(inode)->i_orphan); } =20 +/* Whether the inode uses the iomap infrastructure for buffered I/O */ +static inline bool ext4_inode_buffered_iomap(struct inode *inode) +{ + return ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP); +} + /* * Codes for operating systems */ diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 59405a95ecfc..9e9f421888ed 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3908,6 +3908,22 @@ const struct iomap_ops ext4_iomap_report_ops =3D { .iomap_begin =3D ext4_iomap_begin_report, }; =20 +static int ext4_iomap_read_folio(struct file *file, struct folio *folio) +{ + return 0; +} + +static void ext4_iomap_readahead(struct readahead_control *rac) +{ + +} + +static int ext4_iomap_writepages(struct address_space *mapping, + struct writeback_control *wbc) +{ + return 0; +} + /* * For data=3Djournal mode, folio should be marked dirty only when it was * writeably mapped.
When that happens, it was already attached to the @@ -3994,6 +4010,20 @@ static const struct address_space_operations ext4_da= _aops =3D { .swap_activate =3D ext4_iomap_swap_activate, }; =20 +static const struct address_space_operations ext4_iomap_aops =3D { + .read_folio =3D ext4_iomap_read_folio, + .readahead =3D ext4_iomap_readahead, + .writepages =3D ext4_iomap_writepages, + .dirty_folio =3D iomap_dirty_folio, + .bmap =3D ext4_bmap, + .invalidate_folio =3D iomap_invalidate_folio, + .release_folio =3D iomap_release_folio, + .migrate_folio =3D filemap_migrate_folio, + .is_partially_uptodate =3D iomap_is_partially_uptodate, + .error_remove_folio =3D generic_error_remove_folio, + .swap_activate =3D ext4_iomap_swap_activate, +}; + static const struct address_space_operations ext4_dax_aops =3D { .writepages =3D ext4_dax_writepages, .dirty_folio =3D noop_dirty_folio, @@ -4015,6 +4045,8 @@ void ext4_set_aops(struct inode *inode) } if (IS_DAX(inode)) inode->i_mapping->a_ops =3D &ext4_dax_aops; + else if (ext4_inode_buffered_iomap(inode)) + inode->i_mapping->a_ops =3D &ext4_iomap_aops; else if (test_opt(inode->i_sb, DELALLOC)) inode->i_mapping->a_ops =3D &ext4_da_aops; else --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E3C3B3128B2; Wed, 22 Apr 2026 02:17:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824222; cv=none; b=kf7Q8Muv4nxlqMdiIlGZfBgBaA2Bju2TRLQUC8RRde/LBb5Dn5ukjJ3rv4Md813AW9wrhuE7waZIUHmrLB9bBZ0jf8s4ldGFnaVyQ6EAvuDzTgxVUS7lS8w/7+aKWGpVtSZhsGHqDFyaCWMq3S8ELjpv9IW+FnGL3n+xspldh3E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824222; c=relaxed/simple; 
bh=AxOhwkfgQS+PDWAJdieVpQIzyuhddjFIhmvSBwoE34k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Fxl5HUM890duuOJG+w7RM1KjkfzyBIRBk0Wa+kEASeVgUmfRGOYHlLzyAX1/wj1PxfTKGyek+v9efeFOSt+zghUsHNWMgKJpESnrL8vrhIrNJRrmnj/stHrN0ZzzLdQHMifcJuN5F0WsEVQD+HNl6+VutZBXSMkLFposcWRHLLU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.170]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTPS id 4g0jX40CmhzKHMPM; Wed, 22 Apr 2026 10:16:36 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id 144AA405D4; Wed, 22 Apr 2026 10:16:56 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S9; Wed, 22 Apr 2026 10:16:55 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH v3 05/22] ext4: implement buffered read path using iomap Date: Wed, 22 Apr 2026 10:10:25 +0800 Message-ID: <20260422021042.4157510-6-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: 
MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S9 X-Coremail-Antispam: 1UD129KBjvJXoWxGrW3tr4DXry8JFWktr1DJrb_yoW5Xr47pF 90kFy5Gr4UWrnF9F4SqFZrXr1Ykan7Ja1UWryfGwnxWF90krW2gayjgF1YvF15tw47Ar40 qF4jkry8Wr1UArDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUma14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Cr0_Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCw CI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6xkF7I0E14v26r4UJVWxJrUvcSsG vfC2KfnxnUUI43ZEXa7VU1zpBDUUUUU== X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Implement the iomap read path for ext4 by introducing a new ext4_iomap_buffered_read_ops instance. This provides the read_folio() and readahead() callbacks for ext4_iomap_aops. The implementation introduces: - ext4_iomap_map_blocks(): Helper function to query extent mappings for a given read range using ext4_map_blocks() and convert the mapping information to iomap type - ext4_iomap_buffered_read_begin(): The iomap_begin callback that maps blocks, validates filesystem state, and populates the iomap. It returns -ERANGE for inline data which is not yet supported. 
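The byte-range to block-range conversion described for ext4_iomap_map_blocks() can be illustrated with a small user-space sketch. It is not the kernel code: struct sketch_map, sketch_byte_range_to_blocks() and SKETCH_MAX_LOGICAL_BLOCK are hypothetical names, the limit value is assumed to match EXT4_MAX_LOGICAL_BLOCK, and the zero-length check is an addition made only to keep the arithmetic safe outside the kernel.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: stands in for the kernel's EXT4_MAX_LOGICAL_BLOCK. */
#define SKETCH_MAX_LOGICAL_BLOCK 0xFFFFFFFEULL

/* Hypothetical stand-in for the two ext4_map_blocks fields used here. */
struct sketch_map {
	uint64_t m_lblk;	/* first logical block of the range */
	uint64_t m_len;		/* number of blocks in the range */
};

/*
 * Convert a byte range [offset, offset + length) into a logical block
 * range, clamping the last block to the maximum logical block.
 */
static int sketch_byte_range_to_blocks(uint64_t offset, uint64_t length,
				       unsigned int blkbits,
				       struct sketch_map *map)
{
	uint64_t first = offset >> blkbits;
	uint64_t last;

	/* Not in the patch: reject empty ranges to avoid underflow below. */
	if (length == 0)
		return -EINVAL;
	if (first > SKETCH_MAX_LOGICAL_BLOCK)
		return -EINVAL;

	/* Last block touched by the range, clamped to the limit. */
	last = (offset + length - 1) >> blkbits;
	if (last > SKETCH_MAX_LOGICAL_BLOCK)
		last = SKETCH_MAX_LOGICAL_BLOCK;

	map->m_lblk = first;
	map->m_len = last - first + 1;
	return 0;
}
```

With 4 KiB blocks (blkbits of 12), a 10000-byte read starting at offset 5000 touches blocks 1 through 3, so the sketch reports a start block of 1 and a length of 3.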
Signed-off-by: Zhang Yi Reviewed-by: Jan Kara --- fs/ext4/inode.c | 45 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 44 insertions(+), 1 deletion(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 9e9f421888ed..df21f6870ec4 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3908,14 +3908,57 @@ const struct iomap_ops ext4_iomap_report_ops =3D { .iomap_begin =3D ext4_iomap_begin_report, }; =20 +static int ext4_iomap_map_blocks(struct inode *inode, loff_t offset, + loff_t length, struct ext4_map_blocks *map) +{ + u8 blkbits =3D inode->i_blkbits; + + if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) + return -EINVAL; + + /* Calculate the first and last logical blocks respectively. */ + map->m_lblk =3D offset >> blkbits; + map->m_len =3D min_t(loff_t, (offset + length - 1) >> blkbits, + EXT4_MAX_LOGICAL_BLOCK) - map->m_lblk + 1; + + return ext4_map_blocks(NULL, inode, map, 0); +} + +static int ext4_iomap_buffered_read_begin(struct inode *inode, loff_t offs= et, + loff_t length, unsigned int flags, struct iomap *iomap, + struct iomap *srcmap) +{ + struct ext4_map_blocks map; + int ret; + + if (unlikely(ext4_forced_shutdown(inode->i_sb))) + return -EIO; + + /* Inline data support is not yet available. 
*/ + if (WARN_ON_ONCE(ext4_has_inline_data(inode))) + return -ERANGE; + + ret =3D ext4_iomap_map_blocks(inode, offset, length, &map); + if (ret < 0) + return ret; + + ext4_set_iomap(inode, iomap, &map, offset, length, flags); + return 0; +} + +const struct iomap_ops ext4_iomap_buffered_read_ops =3D { + .iomap_begin =3D ext4_iomap_buffered_read_begin, +}; + static int ext4_iomap_read_folio(struct file *file, struct folio *folio) { + iomap_bio_read_folio(folio, &ext4_iomap_buffered_read_ops); return 0; } =20 static void ext4_iomap_readahead(struct readahead_control *rac) { - + iomap_bio_readahead(rac, &ext4_iomap_buffered_read_ops); } =20 static int ext4_iomap_writepages(struct address_space *mapping, --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DD0A7345721; Wed, 22 Apr 2026 02:17:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824228; cv=none; b=WZmrp9T/41keWanURNjehijoMOVvYkswkyQkPjwYbZ6n1I1EtNXZ5gcBjyCbTAZ4R4roKHL7BnqVX4jDKMQxTo0T2uTNU8PJiI0DiyVGtbgjFeyNmQC88HeM1yXQH/w+Xl2LrQtzvWUNWMaY3UcU73QIeEuBUO/VaP2HvIhCBsQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824228; c=relaxed/simple; bh=LSueVWoJnsscuVIREEDByNIXpfqsPJMkD635NnFcnwE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WoadY5RmwKCPcStbAGK21nEac/brzNx4CYwD9J6aNEveFqrYFQ/sB3Z6exfR9M0/sV4N9x4IxzwyTxb6DAO0SGWofv+ShmehAiQvB9qQuIlgwD2zlAKFqx7yZPImSrLRMqaqYNUMxjR1jAznOtJoenMk9U/uEqayFfi9w1h3Jak= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none 
smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.198]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTPS id 4g0jX40lJ0zKHMQS; Wed, 22 Apr 2026 10:16:36 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id 238D340601; Wed, 22 Apr 2026 10:16:56 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S10; Wed, 22 Apr 2026 10:16:55 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH v3 06/22] ext4: pass out extent seq counter when mapping da blocks Date: Wed, 22 Apr 2026 10:10:26 +0800 Message-ID: <20260422021042.4157510-7-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S10 X-Coremail-Antispam: 1UD129KBjvJXoW7tF1ftFWDJr48AF4xKw1kuFg_yoW8Cr15p3 9YkF1rGw18Zw1q9ay8X3W7ZFyrKan8ArW7GrWfXw1j9a4DWFySqF4jkF17AFykKr4xXr1F vF48CryUC3ySkFDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 
rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0JUWMKtUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi The iomap buffered write path does not hold any locks between querying inode extent mapping information and performing buffered writes. It relies on the sequence counter saved in the inode to detect stale mappings. Commit 07c440e8da8f ("ext4: pass out extent seq counter when mapping blocks") added the m_seq field to ext4_map_blocks to pass out extent sequence numbers, but it missed two callsites within ext4_da_map_blocks(). These callsites are on the delayed allocation path, which is also used in the iomap buffered write path. Pass out the sequence counter to ensure stale mappings can be detected. 
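The reason the lookup has to pass the sequence counter out can be shown with a simplified user-space model. This is only a sketch under stated assumptions: struct sketch_inode, struct sketch_mapping and the sketch_* helpers are hypothetical and merely imitate the roles of i_es_seq, m_seq and ext4_es_lookup_extent(); no locking or memory ordering is modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inode: es_seq is bumped whenever the extent state changes. */
struct sketch_inode {
	uint64_t es_seq;
};

/* Hypothetical mapping result carrying the counter seen at lookup time. */
struct sketch_mapping {
	uint64_t lblk;
	uint64_t seq;
};

/* Lookup that passes out the current counter only if the caller asks. */
static bool sketch_lookup(const struct sketch_inode *inode, uint64_t lblk,
			  uint64_t *seq)
{
	(void)lblk;		/* the sketch treats every block as found */
	if (seq)
		*seq = inode->es_seq;
	return true;
}

/* Any change to the extent state invalidates earlier mappings. */
static void sketch_change_extents(struct sketch_inode *inode)
{
	inode->es_seq++;
}

/* A mapping is still usable only if the counter has not moved. */
static bool sketch_mapping_valid(const struct sketch_inode *inode,
				 const struct sketch_mapping *map)
{
	return map->seq == inode->es_seq;
}
```

If the caller passes NULL for seq, as the two callsites did before this patch, the mapping keeps whatever value it held before the lookup, so a later validity check compares against a number that says nothing about the state the mapping was taken from.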
Signed-off-by: Zhang Yi Reviewed-by: Jan Kara --- fs/ext4/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index df21f6870ec4..5ffd6aeb3485 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1929,7 +1929,7 @@ static int ext4_da_map_blocks(struct inode *inode, st= ruct ext4_map_blocks *map) ext4_check_map_extents_env(inode); =20 /* Lookup extent status tree firstly */ - if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es, NULL)) { + if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es, &map->m_seq)) { map->m_len =3D min_t(unsigned int, map->m_len, es.es_len - (map->m_lblk - es.es_lblk)); =20 @@ -1982,7 +1982,7 @@ static int ext4_da_map_blocks(struct inode *inode, st= ruct ext4_map_blocks *map) * is held in write mode, before inserting a new da entry in * the extent status tree. */ - if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es, NULL)) { + if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es, &map->m_seq)) { map->m_len =3D min_t(unsigned int, map->m_len, es.es_len - (map->m_lblk - es.es_lblk)); =20 --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 10EB530F938; Wed, 22 Apr 2026 02:16:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824221; cv=none; b=Wd/OVerq5FH2tsCuEUJNROXBQZMSf/mVhyGoyAXskG74vrLjGM+s2DaLoRZQ6ZOzqhr2Hc93umN1xKojHv3oT06TMqrOLcAnWBrrfpNo+x2E6RfRCXKDSCD6oBkFF2UEdNRYVETPv42/C+o+5MLbMiv4eJ6HbeOVibQT/o4CuAw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824221; c=relaxed/simple; bh=J6TPc/ZuvZyAI7NvDmtzSyRLjPtrL2PL7n1prHF0gKo=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Zp+4RO4f3dX6gN4Jj0Wzj/khdZGKLxtruDQzClT3CSs8tTfZ6S2GLVhojU+s6jKAQY17wrDzsgy4IeQV2mxdquSbCWKmwnzL2gmjYflaSPq4JceQzhELL79XRRYusdJJWmToAfI20OU17H4/iFmiUHiKkn0+dWcNrpZ+/7gyg1w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.198]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTPS id 4g0jX41WdBzKHMS4; Wed, 22 Apr 2026 10:16:36 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id 3D57640605; Wed, 22 Apr 2026 10:16:56 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S11; Wed, 22 Apr 2026 10:16:55 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH v3 07/22] ext4: do not use data=ordered mode for inodes using buffered iomap path Date: Wed, 22 Apr 2026 10:10:27 +0800 Message-ID: <20260422021042.4157510-8-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 
Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S11 X-Coremail-Antispam: 1UD129KBjvJXoWxZr1fJFW8tw43CFWrury7GFg_yoW5CrWfpF W5K345JrWvva47Cr4xuF4Iqr1ay3yrJr47JFy7GFZrua98J3WIkFy8tF1Yya4UKrWfG3W2 qr4UGrWxWFs0yrJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0JUWMKtUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Do not use data=3Dordered mode for inodes using the buffered iomap path. There are two reasons: 1. The lock ordering of the folio lock and starting transactions conflicts with the data=3Dordered mode. In the writeback path of the iomap, it processes each folio one by one. It first holds the folio lock and then starts a transaction to create the block mapping. 
In data=ordered mode, if writeback is performed by the journal commit
process, the commit may try to acquire the lock of a folio that iomap has
already locked, while iomap, still holding that folio lock, starts a new
transaction that may in turn wait for the committing transaction to
finish. The result is a deadlock.

2. The iomap writeback path doesn't support partial folio submission. In
data=ordered mode, the journal process may be waiting for a folio to be
written back while that folio still contains unmapped blocks (when the
block size is smaller than the folio size). If the regular writeback
process has already started committing this folio (and set the writeback
flag), a deadlock may occur while mapping the remaining unmapped blocks,
because the writeback flag is cleared only after the entire folio is
processed and committed.

To support data=ordered mode, we would need to modify the iomap
infrastructure to grab the transaction handle before locking any folio
for writeback. In addition, we would need to add support for submitting
partial folios, which is complicated and tricky, and may also cause
performance regressions. Therefore, we need to get rid of data=ordered
mode when doing the conversion.

Currently, there are three scenarios where data=ordered mode is used:
- Append write
- Post-EOF partial block truncate up and append write
- Online defragmentation

For append write, we can get rid of it by always allocating unwritten
blocks, which retains the behavior of the current extents-type inode. For
post-EOF partial block truncate up and append write, we can get rid of it
by postponing the i_disksize update until after the zeroed partial block
is written back. Online defragmentation is not yet supported, so we can
find other solutions later.
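The resulting rule, that an inode on the buffered iomap path never uses ordered mode regardless of its journal mode, can be restated as a stand-alone predicate. This is a user-space sketch only: the enum values, struct sketch_inode and sketch_should_order_data() are hypothetical stand-ins for the ext4 inode state and for ext4_should_order_data().

```c
#include <stdbool.h>

/* Hypothetical journal-mode bits; the values are illustrative only. */
enum sketch_journal_mode {
	SKETCH_JOURNAL_DATA_MODE   = 0x01,
	SKETCH_ORDERED_DATA_MODE   = 0x02,
	SKETCH_WRITEBACK_DATA_MODE = 0x04,
};

/* Hypothetical inode carrying only the two facts the predicate needs. */
struct sketch_inode {
	enum sketch_journal_mode journal_mode;
	bool buffered_iomap;	/* inode uses the iomap buffered I/O path */
};

/* Ordered mode applies only to inodes that are not on the iomap path. */
static bool sketch_should_order_data(const struct sketch_inode *inode)
{
	return !inode->buffered_iomap &&
	       (inode->journal_mode & SKETCH_ORDERED_DATA_MODE);
}
```

Putting the iomap check first means that every caller asking whether data should be ordered gets a consistent answer without having to know about the iomap path.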
Signed-off-by: Zhang Yi
---
 fs/ext4/ext4_jbd2.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 63d17c5201b5..26999f173870 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -383,7 +383,12 @@ static inline int ext4_should_journal_data(struct inode *inode)
 
 static inline int ext4_should_order_data(struct inode *inode)
 {
-	return ext4_inode_journal_mode(inode) & EXT4_INODE_ORDERED_DATA_MODE;
+	/*
+	 * inodes using the iomap buffered I/O path do not use the
+	 * data=ordered mode.
+	 */
+	return !ext4_inode_buffered_iomap(inode) &&
+		(ext4_inode_journal_mode(inode) & EXT4_INODE_ORDERED_DATA_MODE);
 }
 
 static inline int ext4_should_writeback_data(struct inode *inode)
-- 
2.52.0

From nobody Wed Jun 17 02:57:53 2026
Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 80792310620; Wed, 22 Apr 2026 02:17:00 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824223; cv=none; b=lAC1zK0zRPASPDnnT+SOmIsv0h/TPSD82ReeYrtTHaKGubETvpPoHxt7OUMxILa7b3vN88EOMPHUMthLhwA/jqaCD25HWpk7K4y15hIzC37N2t33AuFI439XxRXSfmRdVSCkEasLEAUrknmUU3FJ5yh2g4D+n6OzlDSLlvAn3Cg=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824223; c=relaxed/simple; bh=6HObb2LAWjwIuKdbRXu5Ab0/G5LsBqg7PpPg8bLypfg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=uu/KeUSFUfg4O/RUZdGbTHqr/H7J6ZNm8xt9MVTzWPP/7RSmjcv1NBS2k9RUGVqWduCM/LjTtmG3rvkFP3W8DzrESpFsljdSu99E87A4LPtn9AdRBj7Xv+BWOWyEsj2+AbD2WwhkZeMDBCoy0MTRIzlNO0WR1ZNWHB4Zkq+k/b4=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass
smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.198]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTPS id 4g0jX421tVzKHMQS; Wed, 22 Apr 2026 10:16:36 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id 5152240604; Wed, 22 Apr 2026 10:16:56 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S12; Wed, 22 Apr 2026 10:16:56 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH v3 08/22] ext4: implement buffered write path using iomap Date: Wed, 22 Apr 2026 10:10:28 +0800 Message-ID: <20260422021042.4157510-9-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S12 X-Coremail-Antispam: 1UD129KBjvJXoW3Aw1kJFWrAw1fWFW7WF18Grg_yoWkuF4kpF 90kry5GFsrXr97uF4ftF47Zr1F93WxtrW7CrW3Wrn8XryqyrWIqF40gFyayF15trZ7Cr4j qF4Ykry8Wr4UCrDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 
rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0JUWMKtUUUUU=
X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/
Content-Type: text/plain; charset="utf-8"

From: Zhang Yi

Introduce two new iomap_ops instances, ext4_iomap_buffered_write_ops and
ext4_iomap_buffered_da_write_ops, to implement the iomap write paths for
ext4. ext4_iomap_buffered_da_write_begin() invokes ext4_da_map_blocks()
to map delayed allocation extents, and ext4_iomap_buffered_write_begin()
invokes ext4_iomap_get_blocks() to directly allocate blocks in
non-delayed allocation mode. Additionally, add ext4_iomap_valid() so that
the iomap infrastructure can check the validity of extent mappings.

Key changes:
- Since we don't use data=ordered mode to prevent exposing stale data in
  the non-delayed allocation path, we always allocate unwritten extents
  for new blocks.
- The iomap write path maps multiple blocks at a time in the
  iomap_begin() callbacks, so we must remove the stale delayed allocation
  range in case of short writes and write failures. Otherwise, this could
  result in a range of delayed extents being covered by a clean folio,
  which would lead to inaccurate space reservation.
- The lock ordering of the folio lock and transaction start is the opposite of that in the buffer_head buffered write path. So we have to stop journal handle in the iomap_begin() callbacks. The lock ordering documentation in super.c has been updated accordingly. Signed-off-by: Zhang Yi --- fs/ext4/ext4.h | 4 ++ fs/ext4/file.c | 20 +++++- fs/ext4/inode.c | 164 +++++++++++++++++++++++++++++++++++++++++++++++- fs/ext4/super.c | 10 ++- 4 files changed, 191 insertions(+), 7 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index fe3491ad2129..be92ff648362 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3057,6 +3057,7 @@ int ext4_walk_page_buffers(handle_t *handle, int do_journal_get_write_access(handle_t *handle, struct inode *inode, struct buffer_head *bh); void ext4_set_inode_mapping_order(struct inode *inode); +int ext4_nonda_switch(struct super_block *sb); #define FALL_BACK_TO_NONDELALLOC 1 #define CONVERT_INLINE_DATA 2 =20 @@ -3943,6 +3944,9 @@ static inline void ext4_clear_io_unwritten_flag(ext4_= io_end_t *io_end) =20 extern const struct iomap_ops ext4_iomap_ops; extern const struct iomap_ops ext4_iomap_report_ops; +extern const struct iomap_ops ext4_iomap_buffered_write_ops; +extern const struct iomap_ops ext4_iomap_buffered_da_write_ops; +extern const struct iomap_write_ops ext4_iomap_write_ops; =20 static inline int ext4_buffer_uptodate(struct buffer_head *bh) { diff --git a/fs/ext4/file.c b/fs/ext4/file.c index eb1a323962b1..7f9bfbbc4a4e 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -299,6 +299,21 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, s= truct iov_iter *from) return count; } =20 +static ssize_t ext4_iomap_buffered_write(struct kiocb *iocb, + struct iov_iter *from) +{ + struct inode *inode =3D file_inode(iocb->ki_filp); + const struct iomap_ops *iomap_ops; + + if (test_opt(inode->i_sb, DELALLOC) && !ext4_nonda_switch(inode->i_sb)) + iomap_ops =3D &ext4_iomap_buffered_da_write_ops; + else + iomap_ops =3D 
&ext4_iomap_buffered_write_ops; + + return iomap_file_buffered_write(iocb, from, iomap_ops, + &ext4_iomap_write_ops, NULL); +} + static ssize_t ext4_buffered_write_iter(struct kiocb *iocb, struct iov_iter *from) { @@ -313,7 +328,10 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *= iocb, if (ret <=3D 0) goto out; =20 - ret =3D generic_perform_write(iocb, from); + if (ext4_inode_buffered_iomap(inode)) + ret =3D ext4_iomap_buffered_write(iocb, from); + else + ret =3D generic_perform_write(iocb, from); =20 out: inode_unlock(inode); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 5ffd6aeb3485..0ca303a90249 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3097,7 +3097,7 @@ static int ext4_dax_writepages(struct address_space *= mapping, return ret; } =20 -static int ext4_nonda_switch(struct super_block *sb) +int ext4_nonda_switch(struct super_block *sb) { s64 free_clusters, dirty_clusters; struct ext4_sb_info *sbi =3D EXT4_SB(sb); @@ -3467,6 +3467,15 @@ static bool ext4_inode_datasync_dirty(struct inode *= inode) return inode_state_read_once(inode) & I_DIRTY_DATASYNC; } =20 +static bool ext4_iomap_valid(struct inode *inode, const struct iomap *ioma= p) +{ + return iomap->validity_cookie =3D=3D READ_ONCE(EXT4_I(inode)->i_es_seq); +} + +const struct iomap_write_ops ext4_iomap_write_ops =3D { + .iomap_valid =3D ext4_iomap_valid, +}; + static void ext4_set_iomap(struct inode *inode, struct iomap *iomap, struct ext4_map_blocks *map, loff_t offset, loff_t length, unsigned int flags) @@ -3501,6 +3510,8 @@ static void ext4_set_iomap(struct inode *inode, struc= t iomap *iomap, !ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) iomap->flags |=3D IOMAP_F_MERGED; =20 + iomap->validity_cookie =3D map->m_seq; + /* * Flags passed to ext4_map_blocks() for direct I/O writes can result * in m_flags having both EXT4_MAP_MAPPED and EXT4_MAP_UNWRITTEN bits @@ -3908,8 +3919,12 @@ const struct iomap_ops ext4_iomap_report_ops =3D { .iomap_begin =3D ext4_iomap_begin_report, }; 
=20 +/* Map blocks */ +typedef int (ext4_get_blocks_t)(struct inode *, struct ext4_map_blocks *); + static int ext4_iomap_map_blocks(struct inode *inode, loff_t offset, - loff_t length, struct ext4_map_blocks *map) + loff_t length, ext4_get_blocks_t get_blocks, + struct ext4_map_blocks *map) { u8 blkbits =3D inode->i_blkbits; =20 @@ -3921,6 +3936,9 @@ static int ext4_iomap_map_blocks(struct inode *inode,= loff_t offset, map->m_len =3D min_t(loff_t, (offset + length - 1) >> blkbits, EXT4_MAX_LOGICAL_BLOCK) - map->m_lblk + 1; =20 + if (get_blocks) + return get_blocks(inode, map); + return ext4_map_blocks(NULL, inode, map, 0); } =20 @@ -3938,7 +3956,7 @@ static int ext4_iomap_buffered_read_begin(struct inod= e *inode, loff_t offset, if (WARN_ON_ONCE(ext4_has_inline_data(inode))) return -ERANGE; =20 - ret =3D ext4_iomap_map_blocks(inode, offset, length, &map); + ret =3D ext4_iomap_map_blocks(inode, offset, length, NULL, &map); if (ret < 0) return ret; =20 @@ -3946,6 +3964,146 @@ static int ext4_iomap_buffered_read_begin(struct in= ode *inode, loff_t offset, return 0; } =20 +static int ext4_iomap_get_blocks(struct inode *inode, + struct ext4_map_blocks *map) +{ + loff_t i_size =3D i_size_read(inode); + handle_t *handle; + int ret, needed_blocks; + + /* + * Check if the blocks have already been allocated, this could + * avoid initiating a new journal transaction and return the + * mapping information directly. + */ + if ((map->m_lblk + map->m_len) <=3D + round_up(i_size, i_blocksize(inode)) >> inode->i_blkbits) { + ret =3D ext4_map_blocks(NULL, inode, map, 0); + if (ret < 0) + return ret; + if (map->m_flags & (EXT4_MAP_MAPPED | EXT4_MAP_UNWRITTEN | + EXT4_MAP_DELAYED)) + return 0; + } + + /* + * Reserve one block more for addition to orphan list in case + * we allocate blocks but write fails for some reason. 
+ */ + needed_blocks =3D ext4_chunk_trans_blocks(inode, map->m_len) + 1; + handle =3D ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, needed_blocks); + if (IS_ERR(handle)) + return PTR_ERR(handle); + + ret =3D ext4_map_blocks(handle, inode, map, + EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT); + /* + * Stop handle here following the lock ordering of the folio lock + * and the transaction start. + */ + ext4_journal_stop(handle); + + return ret; +} + +static int ext4_iomap_buffered_do_write_begin(struct inode *inode, + loff_t offset, loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap, bool delalloc) +{ + int ret, retries =3D 0; + struct ext4_map_blocks map; + ext4_get_blocks_t *get_blocks; + + ret =3D ext4_emergency_state(inode->i_sb); + if (unlikely(ret)) + return ret; + + /* Inline data support is not yet available. */ + if (WARN_ON_ONCE(ext4_has_inline_data(inode))) + return -ERANGE; + if (WARN_ON_ONCE(!(flags & IOMAP_WRITE))) + return -EINVAL; + + if (delalloc) + get_blocks =3D ext4_da_map_blocks; + else + get_blocks =3D ext4_iomap_get_blocks; +retry: + ret =3D ext4_iomap_map_blocks(inode, offset, length, get_blocks, &map); + if (ret =3D=3D -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) + goto retry; + if (ret < 0) + return ret; + + ext4_set_iomap(inode, iomap, &map, offset, length, flags); + return 0; +} + +static int ext4_iomap_buffered_write_begin(struct inode *inode, + loff_t offset, loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap) +{ + return ext4_iomap_buffered_do_write_begin(inode, offset, length, flags, + iomap, srcmap, false); +} + +static int ext4_iomap_buffered_da_write_begin(struct inode *inode, + loff_t offset, loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap) +{ + return ext4_iomap_buffered_do_write_begin(inode, offset, length, flags, + iomap, srcmap, true); +} + +/* + * Drop the staled delayed allocation range from the write failure, + * including both start and 
end blocks. If not, we could leave a range + * of delayed extents covered by a clean folio, it could lead to + * inaccurate space reservation. + */ +static void ext4_iomap_punch_delalloc(struct inode *inode, loff_t offset, + loff_t length, struct iomap *iomap) +{ + down_write(&EXT4_I(inode)->i_data_sem); + ext4_es_remove_extent(inode, offset >> inode->i_blkbits, + DIV_ROUND_UP_ULL(length, EXT4_BLOCK_SIZE(inode->i_sb))); + up_write(&EXT4_I(inode)->i_data_sem); +} + +static int ext4_iomap_buffered_da_write_end(struct inode *inode, loff_t of= fset, + loff_t length, ssize_t written, + unsigned int flags, + struct iomap *iomap) +{ + loff_t start_byte, end_byte; + + /* If we didn't reserve the blocks, we're not allowed to punch them. */ + if (iomap->type !=3D IOMAP_DELALLOC || !(iomap->flags & IOMAP_F_NEW)) + return 0; + + /* Nothing to do if we've written the entire delalloc extent */ + start_byte =3D iomap_last_written_block(inode, offset, written); + end_byte =3D round_up(offset + length, i_blocksize(inode)); + if (start_byte >=3D end_byte) + return 0; + + filemap_invalidate_lock(inode->i_mapping); + iomap_write_delalloc_release(inode, start_byte, end_byte, flags, + iomap, ext4_iomap_punch_delalloc); + filemap_invalidate_unlock(inode->i_mapping); + return 0; +} + + +const struct iomap_ops ext4_iomap_buffered_write_ops =3D { + .iomap_begin =3D ext4_iomap_buffered_write_begin, +}; + +const struct iomap_ops ext4_iomap_buffered_da_write_ops =3D { + .iomap_begin =3D ext4_iomap_buffered_da_write_begin, + .iomap_end =3D ext4_iomap_buffered_da_write_end, +}; + const struct iomap_ops ext4_iomap_buffered_read_ops =3D { .iomap_begin =3D ext4_iomap_buffered_read_begin, }; diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 6a77db4d3124..9bc294b769db 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -104,9 +104,13 @@ static const struct fs_parameter_spec ext4_param_specs= []; * -> page lock -> i_data_sem (rw) * * buffered write path: - * sb_start_write -> i_mutex -> mmap_lock 
- * sb_start_write -> i_mutex -> transaction start -> page lock -> - * i_data_sem (rw) + * sb_start_write -> i_rwsem (w) -> mmap_lock + * - buffer_head path: + * sb_start_write -> i_rwsem (w) -> transaction start -> folio lock -> + * i_data_sem (rw) + * - iomap path: + * sb_start_write -> i_rwsem (w) -> transaction start -> i_data_sem (rw) + * sb_start_write -> i_rwsem (w) -> folio lock (not under an active hand= le) * * truncate: * sb_start_write -> i_mutex -> invalidate_lock (w) -> i_mmap_rwsem (w) -> --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DD17B345CCE; Wed, 22 Apr 2026 02:17:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824226; cv=none; b=mku+I8Dm3IVdwa7EFgeE4ba7tMdePFBpc07FnmaJV/e2j+couafUoNmIb7ykpVk07hhlXqA+oRDfB1FdH35H9H0ATo6sQWYdFfpHESbZ0bxgtFYtTRnT8mkUPuoYEbutzrxb6etoIy4HNDP58MUsAxcEmY5l06lL3ITgUhciZ6o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824226; c=relaxed/simple; bh=4uuxHvJ0MnSuwgpqC0GALTHuP3B7gXA7d1fSBtbxNhM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=n8q3Vo/qsZvMcTWceqS35o7LxRely2hQ5toBiZ51BGnNb01jy3OtDNh8i+tV9d6w/5Hswvcpv1cFF7QXtSM28fmFVhxsn9en21ANkKbHeLmi/fuQ7FeyKrWUN4+DSuvogyzWgcTqM4uc3iSgii1aCxnfCxhk3DVrzzHklwadZZw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass 
smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.198]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4g0jWM1WtBzYQtqs; Wed, 22 Apr 2026 10:15:59 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id 6818340601; Wed, 22 Apr 2026 10:16:56 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S13; Wed, 22 Apr 2026 10:16:56 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH v3 09/22] ext4: implement writeback path using iomap Date: Wed, 22 Apr 2026 10:10:29 +0800 Message-ID: <20260422021042.4157510-10-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S13 X-Coremail-Antispam: 1UD129KBjvAXoW3ury7GFykuw18KFy7Gw4Uurg_yoW8Xw48Ao Waqa13Xr48Jry5t3yrCr1ftFyUuan7Gw4rJr45ursFvF9xJa4Yyw4xGw43W3W7Xw4FkFWf ZrWxJ3WrGr4xJF1rn29KB7ZKAUJUUUU8529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3 AaLaJ3UjIYCTnIWjp_UUUOV7AC8VAFwI0_Wr0E3s1l1xkIjI8I6I8E6xAIw20EY4v20xva j40_Wr0E3s1l1IIY67AEw4v_Jr0_Jr4l82xGYIkIc2x26280x7IE14v26r126s0DM28Irc Ia0xkI8VCY1x0267AKxVW5JVCq3wA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK021l 84ACjcxK6xIIjxv20xvE14v26w1j6s0DM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26r4UJV 
WxJr1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_GcCE 3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2IEw4CE5I8CrVC2j2WlYx0E2I x0cI8IcVAFwI0_JrI_JrylYx0Ex4A2jsIE14v26r1j6r4UMcvjeVCFs4IE7xkEbVWUJVW8 JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I648v4I1lFIxGxcIEc7CjxVA2Y2 ka0xkIwI1lc7CjxVAaw2AFwI0_Jw0_GFyl42xK82IYc2Ij64vIr41l4I8I3I0E4IkC6x0Y z7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x8GjcxK67AKxVWUGVWUWwC2zV AF1VAY17CE14v26r4a6rW5MIIYrxkI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI0_JFI_Gr1l IxAIcVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1lIxAIcVCF04k26cxKx2IYs7xG6r1j6r 1xMIIF0xvEx4A2jsIE14v26r4j6F4UMIIF0xvEx4A2jsIEc7CjxVAFwI0_Gr1j6F4UJbIY CTnIWIevJa73UjIFyTuYvjfUriihUUUUU X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Implement the iomap writeback path for ext4. This implements ext4_iomap_writepages(), introduces a new iomap_writeback_ops instance, ext4_writeback_ops, and adds a new end I/O worker that converts unwritten extents after the I/O has completed. In the ->writeback_range() callback, it first calls ext4_iomap_map_writeback_range() to query the longest range of existing mapped extents. For performance reasons, if the block range has not been allocated, it tries to allocate the longest possible range of blocks, based on the writeback length and the delalloc extent length, rather than allocating one folio's worth of blocks at a time. Then, it adds the folio to the iomap_ioend instance. In the ->writeback_submit() callback, it registers a special end bio callback, ext4_iomap_end_bio(), which starts a worker if, after the data has been written back, we need to convert unwritten extents or update i_disksize, or if we need to abort the journal because the I/O failed to write back.
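As a rough illustration of the length selection described above, the following standalone sketch picks a mapping length from the dirty length, the end of the writeback range, a per-request cap, and the remaining length of the delalloc extent. It is not the code from this patch: wb_map_len() and all of its parameter names are hypothetical, and the cap is passed in rather than taken from MAX_WRITEPAGES_EXTENT_LEN.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: choose how many blocks to
 * map in one request, starting at logical block 'index'.
 *
 *   dirty_blks    - blocks covered by the dirty range being written back
 *   range_end_blk - last block of the whole writeback range
 *   es_lblk/len   - the delalloc extent that contains 'index'
 *   max_len       - upper bound for a single mapping request
 */
static uint32_t wb_map_len(uint32_t index, uint32_t dirty_blks,
			   uint32_t range_end_blk, uint32_t es_lblk,
			   uint32_t es_len, uint32_t max_len)
{
	uint64_t len = dirty_blks;
	uint64_t es_left;

	/* The extent passed in must contain 'index'. */
	assert(index >= es_lblk && (uint64_t)index - es_lblk < es_len);

	/* Extend to the end of the writeback range if it lies further out. */
	if ((uint64_t)range_end_blk > (uint64_t)index + len)
		len = (uint64_t)range_end_blk - index + 1;

	/* Cap the size of a single mapping request. */
	if (len > max_len)
		len = max_len;

	/* Never map past the end of the delalloc extent. */
	es_left = (uint64_t)es_len - (index - es_lblk);
	if (len > es_left)
		len = es_left;

	return (uint32_t)len;
}
```

With a 4-block dirty range at block 10, a writeback range ending at block 100, and a delalloc extent covering blocks 8..57, the sketch asks for 48 blocks in one request instead of 4, which is the batching effect the message describes.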
Key changes: - Since we don't rely on data=ordered mode to avoid exposing stale data during append writeback, we always allocate unwritten extents for new blocks and postpone updating i_disksize until the I/O is done. In addition, the deadlock that the reserve handle is meant to resolve does not exist here. Therefore, we do not need the reserve handle when converting unwritten extents in the end I/O worker; we can start a normal journal handle instead. - Since ->writeback_range() is always executed under the folio lock, we need to start the handle under the folio lock as well. This is the opposite of the order in the buffer_head writeback path. The lock ordering documentation in super.c has been updated accordingly. Signed-off-by: Zhang Yi --- fs/ext4/ext4.h | 4 + fs/ext4/inode.c | 202 +++++++++++++++++++++++++++++++++++++++++++++- fs/ext4/page-io.c | 129 +++++++++++++++++++++++++++++ fs/ext4/super.c | 7 +- 4 files changed, 340 insertions(+), 2 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index be92ff648362..0ffa81f86bc5 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1173,6 +1173,8 @@ struct ext4_inode_info { */ struct list_head i_rsv_conversion_list; struct work_struct i_rsv_conversion_work; + struct list_head i_iomap_ioend_list; + struct work_struct i_iomap_ioend_work; =20 /* * Transactions that contain inode's metadata needed to complete @@ -3887,6 +3889,8 @@ int ext4_bio_write_folio(struct ext4_io_submit *io, s= truct folio *page, size_t len); extern struct ext4_io_end_vec *ext4_alloc_io_end_vec(ext4_io_end_t *io_end= ); extern struct ext4_io_end_vec *ext4_last_io_end_vec(ext4_io_end_t *io_end); +extern void ext4_iomap_end_io(struct work_struct *work); +extern void ext4_iomap_end_bio(struct bio *bio); =20 /* mmp.c */ extern int ext4_multi_mount_protect(struct super_block *, ext4_fsblk_t); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 0ca303a90249..76ce43c64c30 100644 ---
a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -44,6 +44,7 @@ #include =20 #include "ext4_jbd2.h" +#include "ext4_extents.h" #include "xattr.h" #include "acl.h" #include "truncate.h" @@ -4119,10 +4120,209 @@ static void ext4_iomap_readahead(struct readahead_= control *rac) iomap_bio_readahead(rac, &ext4_iomap_buffered_read_ops); } =20 +static int ext4_iomap_map_one_extent(struct inode *inode, + struct ext4_map_blocks *map) +{ + struct extent_status es; + handle_t *handle =3D NULL; + int credits, map_flags; + int retval; + + credits =3D ext4_chunk_trans_blocks(inode, map->m_len); + handle =3D ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, credits); + if (IS_ERR(handle)) + return PTR_ERR(handle); + + map->m_flags =3D 0; + /* + * It is necessary to look up extent and map blocks under i_data_sem + * in write mode, otherwise, the delalloc extent may become stale + * during concurrent truncate operations. + */ + ext4_fc_track_inode(handle, inode); + down_write(&EXT4_I(inode)->i_data_sem); + if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es, &map->m_seq)) { + retval =3D es.es_len - (map->m_lblk - es.es_lblk); + map->m_len =3D min_t(unsigned int, retval, map->m_len); + + if (ext4_es_is_delayed(&es)) { + map->m_flags |=3D EXT4_MAP_DELAYED; + trace_ext4_da_write_pages_extent(inode, map); + /* + * Call ext4_map_create_blocks() to allocate any + * delayed allocation blocks. It is possible that + * we're going to need more metadata blocks, however + * we must not fail because we're in writeback and + * there is nothing we can do so it might result in + * data loss. So use reserved blocks to allocate + * metadata if possible. 
+ */ + map_flags =3D EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT | + EXT4_GET_BLOCKS_METADATA_NOFAIL | + EXT4_EX_NOCACHE; + + retval =3D ext4_map_create_blocks(handle, inode, map, + map_flags); + if (retval > 0) + ext4_fc_track_range(handle, inode, map->m_lblk, + map->m_lblk + map->m_len - 1); + goto out; + } else if (unlikely(ext4_es_is_hole(&es))) + goto out; + + /* Found written or unwritten extent. */ + map->m_pblk =3D ext4_es_pblock(&es) + map->m_lblk - es.es_lblk; + map->m_flags =3D ext4_es_is_written(&es) ? + EXT4_MAP_MAPPED : EXT4_MAP_UNWRITTEN; + goto out; + } + + retval =3D ext4_map_query_blocks(handle, inode, map, EXT4_EX_NOCACHE); +out: + up_write(&EXT4_I(inode)->i_data_sem); + ext4_journal_stop(handle); + return retval < 0 ? retval : 0; +} + +static int ext4_iomap_map_writeback_range(struct iomap_writepage_ctx *wpc, + loff_t offset, unsigned int dirty_len) +{ + struct inode *inode =3D wpc->inode; + struct super_block *sb =3D inode->i_sb; + struct journal_s *journal =3D EXT4_SB(sb)->s_journal; + struct ext4_map_blocks map; + unsigned int blkbits =3D inode->i_blkbits; + unsigned int index =3D offset >> blkbits; + unsigned int blk_end, blk_len; + int ret; + + ret =3D ext4_emergency_state(sb); + if (unlikely(ret)) + return ret; + + /* Check validity of the cached writeback mapping. */ + if (offset >=3D wpc->iomap.offset && + offset < wpc->iomap.offset + wpc->iomap.length && + ext4_iomap_valid(inode, &wpc->iomap)) + return 0; + + blk_len =3D dirty_len >> blkbits; + blk_end =3D min_t(unsigned int, (wpc->wbc->range_end >> blkbits), + (UINT_MAX - 1)); + if (blk_end > index + blk_len) + blk_len =3D blk_end - index + 1; + +retry: + map.m_lblk =3D index; + map.m_len =3D min_t(unsigned int, MAX_WRITEPAGES_EXTENT_LEN, blk_len); + ret =3D ext4_map_blocks(NULL, inode, &map, + EXT4_GET_BLOCKS_IO_SUBMIT | EXT4_EX_NOCACHE); + if (ret < 0) + return ret; + + /* + * The map is not a delalloc extent, it must either be a hole + * or an extent which have already been allocated. 
+ */ + if (!(map.m_flags & EXT4_MAP_DELAYED)) + goto out; + + /* Map one delalloc extent. */ + ret =3D ext4_iomap_map_one_extent(inode, &map); + if (ret < 0) { + if (ext4_emergency_state(sb)) + return ret; + + /* + * Retry transient ENOSPC errors, if + * ext4_count_free_blocks() is non-zero, a commit + * should free up blocks. + */ + if (ret =3D=3D -ENOSPC && journal && ext4_count_free_clusters(sb)) { + jbd2_journal_force_commit_nested(journal); + goto retry; + } + + ext4_msg(sb, KERN_CRIT, + "Delayed block allocation failed for inode %llu at logical offset %llu= with max blocks %u with error %d", + inode->i_ino, (unsigned long long)map.m_lblk, + (unsigned int)map.m_len, -ret); + ext4_msg(sb, KERN_CRIT, + "This should not happen!! Data will be lost\n"); + if (ret =3D=3D -ENOSPC) + ext4_print_free_blocks(inode); + return ret; + } +out: + ext4_set_iomap(inode, &wpc->iomap, &map, offset, dirty_len, 0); + return 0; +} + +static void ext4_iomap_discard_folio(struct folio *folio, loff_t pos) +{ + struct inode *inode =3D folio->mapping->host; + loff_t length =3D folio_pos(folio) + folio_size(folio) - pos; + + ext4_iomap_punch_delalloc(inode, pos, length, NULL); +} + +static ssize_t ext4_iomap_writeback_range(struct iomap_writepage_ctx *wpc, + struct folio *folio, u64 offset, + unsigned int len, u64 end_pos) +{ + ssize_t ret; + + ret =3D ext4_iomap_map_writeback_range(wpc, offset, len); + if (!ret) + ret =3D iomap_add_to_ioend(wpc, folio, offset, end_pos, len); + if (ret < 0) + ext4_iomap_discard_folio(folio, offset); + return ret; +} + +static int ext4_iomap_writeback_submit(struct iomap_writepage_ctx *wpc, + int error) +{ + struct iomap_ioend *ioend =3D wpc->wb_ctx; + struct ext4_inode_info *ei =3D EXT4_I(ioend->io_inode); + + /* Need to convert unwritten extents when I/Os are completed. 
*/ + if ((ioend->io_flags & IOMAP_IOEND_UNWRITTEN) || + ioend->io_offset + ioend->io_size > READ_ONCE(ei->i_disksize)) + ioend->io_bio.bi_end_io =3D ext4_iomap_end_bio; + + return iomap_ioend_writeback_submit(wpc, error); +} + +static const struct iomap_writeback_ops ext4_writeback_ops =3D { + .writeback_range =3D ext4_iomap_writeback_range, + .writeback_submit =3D ext4_iomap_writeback_submit, +}; + static int ext4_iomap_writepages(struct address_space *mapping, struct writeback_control *wbc) { - return 0; + struct inode *inode =3D mapping->host; + struct super_block *sb =3D inode->i_sb; + long nr =3D wbc->nr_to_write; + int alloc_ctx, ret; + struct iomap_writepage_ctx wpc =3D { + .inode =3D inode, + .wbc =3D wbc, + .ops =3D &ext4_writeback_ops, + }; + + ret =3D ext4_emergency_state(sb); + if (unlikely(ret)) + return ret; + + alloc_ctx =3D ext4_writepages_down_read(sb); + trace_ext4_writepages(inode, wbc); + ret =3D iomap_writepages(&wpc); + trace_ext4_writepages_result(inode, wbc, ret, nr - wbc->nr_to_write); + ext4_writepages_up_read(sb, alloc_ctx); + + return ret; } =20 /* diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c index dc82e7b57e75..07978e2cd9c8 100644 --- a/fs/ext4/page-io.c +++ b/fs/ext4/page-io.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -611,3 +612,131 @@ int ext4_bio_write_folio(struct ext4_io_submit *io, s= truct folio *folio, =20 return 0; } + +static int ext4_iomap_wb_update_disksize(handle_t *handle, struct inode *i= node, + loff_t end) +{ + loff_t new_disksize =3D end; + struct ext4_inode_info *ei =3D EXT4_I(inode); + int ret; + + if (new_disksize <=3D READ_ONCE(ei->i_disksize)) + return 0; + + /* + * Update on-disk size after IO is completed. Races with truncate + * are avoided by checking i_size under i_data_sem. 
+ */ + down_write(&ei->i_data_sem); + new_disksize =3D min(new_disksize, i_size_read(inode)); + if (new_disksize > ei->i_disksize) + ei->i_disksize =3D new_disksize; + up_write(&ei->i_data_sem); + ret =3D ext4_mark_inode_dirty(handle, inode); + if (ret) + EXT4_ERROR_INODE_ERR(inode, -ret, "Failed to mark inode dirty"); + + return ret; +} + +static void ext4_iomap_finish_ioend(struct iomap_ioend *ioend) +{ + struct inode *inode =3D ioend->io_inode; + struct super_block *sb =3D inode->i_sb; + loff_t pos =3D ioend->io_offset; + size_t size =3D ioend->io_size; + handle_t *handle; + int credits; + int ret, err; + + ret =3D blk_status_to_errno(ioend->io_bio.bi_status); + if (unlikely(ret)) { + if (test_opt(sb, DATA_ERR_ABORT)) + jbd2_journal_abort(EXT4_SB(sb)->s_journal, ret); + goto out; + } + + /* We may need to convert one extent and dirty the inode. */ + credits =3D ext4_chunk_trans_blocks(inode, + EXT4_MAX_BLOCKS(size, pos, inode->i_blkbits)); + handle =3D ext4_journal_start(inode, EXT4_HT_EXT_CONVERT, credits); + if (IS_ERR(handle)) { + ret =3D PTR_ERR(handle); + goto out_err; + } + + if (ioend->io_flags & IOMAP_IOEND_UNWRITTEN) { + ret =3D ext4_convert_unwritten_extents(handle, inode, pos, size); + if (ret) + goto out_journal; + } + + ret =3D ext4_iomap_wb_update_disksize(handle, inode, pos + size); +out_journal: + err =3D ext4_journal_stop(handle); + if (!ret) + ret =3D err; +out_err: + if (ret < 0 && !ext4_emergency_state(sb)) { + ext4_msg(sb, KERN_EMERG, + "failed to convert unwritten extents to written extents or update inod= e size -- potential data loss! 
(inode %llu, error %d)", + inode->i_ino, ret); + } +out: + iomap_finish_ioends(ioend, ret); +} + +/* + * Work on buffered iomap completed IO, to convert unwritten extents to + * mapped extents + */ +void ext4_iomap_end_io(struct work_struct *work) +{ + struct ext4_inode_info *ei =3D container_of(work, struct ext4_inode_info, + i_iomap_ioend_work); + struct iomap_ioend *ioend; + struct list_head ioend_list; + unsigned long flags; + + spin_lock_irqsave(&ei->i_completed_io_lock, flags); + list_replace_init(&ei->i_iomap_ioend_list, &ioend_list); + spin_unlock_irqrestore(&ei->i_completed_io_lock, flags); + + iomap_sort_ioends(&ioend_list); + while (!list_empty(&ioend_list)) { + ioend =3D list_entry(ioend_list.next, struct iomap_ioend, io_list); + list_del_init(&ioend->io_list); + iomap_ioend_try_merge(ioend, &ioend_list); + ext4_iomap_finish_ioend(ioend); + } +} + +void ext4_iomap_end_bio(struct bio *bio) +{ + struct iomap_ioend *ioend =3D iomap_ioend_from_bio(bio); + struct inode *inode =3D ioend->io_inode; + struct ext4_inode_info *ei =3D EXT4_I(inode); + struct ext4_sb_info *sbi =3D EXT4_SB(inode->i_sb); + unsigned long flags; + int ret; + + /* Needs to convert unwritten extents or update the i_disksize. */ + if ((ioend->io_flags & IOMAP_IOEND_UNWRITTEN) || + ioend->io_offset + ioend->io_size > READ_ONCE(ei->i_disksize)) + goto defer; + + /* Needs to abort the journal on data_err=3Dabort. 
*/ + ret =3D blk_status_to_errno(ioend->io_bio.bi_status); + if (unlikely(ret) && test_opt(inode->i_sb, DATA_ERR_ABORT) && + !ext4_emergency_state(inode->i_sb)) + goto defer; + + iomap_finish_ioends(ioend, ret); + return; +defer: + spin_lock_irqsave(&ei->i_completed_io_lock, flags); + if (list_empty(&ei->i_iomap_ioend_list)) + queue_work(sbi->rsv_conversion_wq, &ei->i_iomap_ioend_work); + list_add_tail(&ioend->io_list, &ei->i_iomap_ioend_list); + spin_unlock_irqrestore(&ei->i_completed_io_lock, flags); +} diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 9bc294b769db..51d87db53543 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -123,7 +123,10 @@ static const struct fs_parameter_spec ext4_param_specs= []; * sb_start_write -> i_mutex -> transaction start -> i_data_sem (rw) * * writepages: - * transaction start -> page lock(s) -> i_data_sem (rw) + * - buffer_head path: + * transaction start -> folio lock(s) -> i_data_sem (rw) + * - iomap path: + * folio lock -> transaction start -> i_data_sem (rw) */ =20 static const struct fs_context_operations ext4_context_ops =3D { @@ -1428,10 +1431,12 @@ static struct inode *ext4_alloc_inode(struct super_= block *sb) #endif ei->jinode =3D NULL; INIT_LIST_HEAD(&ei->i_rsv_conversion_list); + INIT_LIST_HEAD(&ei->i_iomap_ioend_list); spin_lock_init(&ei->i_completed_io_lock); ei->i_sync_tid =3D 0; ei->i_datasync_tid =3D 0; INIT_WORK(&ei->i_rsv_conversion_work, ext4_end_io_rsv_work); + INIT_WORK(&ei->i_iomap_ioend_work, ext4_iomap_end_io); ext4_fc_init_inode(&ei->vfs_inode); spin_lock_init(&ei->i_fc_lock); mmb_init(&ei->i_metadata_bhs, &ei->vfs_inode.i_data); --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8D5B635AC04; Wed, 22 Apr 2026 02:17:05 +0000 (UTC) Authentication-Results: 
smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824227; cv=none; b=i/gaWQ5IcOEzp0OiU6bWpFtSCY1Bablakb1oVRN4s12sCi+QHFnNEql3V5RGAk+vjYKmfX0b8cC/Yo2HZNMXt0tjn6uurycN11jKdrFHPpozDAyPGWZgL9O5+cXA6zZBkxzicSAtr3+MFQtyAu0Mjuw+NmSFcEkNRXASKCKWFAQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824227; c=relaxed/simple; bh=bCJkj+SJLswXyGtg3E1eLMLyPv981Hk0+bNTPn2K5S4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fGf2JIRYK5/fckHiwTgG+tr0nyCn2LrtVzoCCm8+kjHco3rVQ0wH+0E7iTMkikjkOmTq0KG8QyRel/rSeP23rF2+9YSIWazV5gZphHX+BWfHihh4gs/zLMb2HcfdGwWqNETBAXBJ+C67I4M/hPDnjgU+GqhQpMevNvmkerQYfs4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.170]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4g0jWM2N3zzYQtr3; Wed, 22 Apr 2026 10:15:59 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id 82E8A405D6; Wed, 22 Apr 2026 10:16:56 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S14; Wed, 22 Apr 2026 10:16:56 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, 
yukuai@fnnas.com Subject: [PATCH v3 10/22] ext4: implement mmap path using iomap Date: Wed, 22 Apr 2026 10:10:30 +0800 Message-ID: <20260422021042.4157510-11-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S14 X-Coremail-Antispam: 1UD129KBjvJXoWxWFW3KryUCFW7CFWkZF1UWrg_yoW5Kw17pF 95K3yrGrsxXwnF9rs7WF4DZr1rKayxtrW7WrW3Wry5ZFy2y340ga10gF1YvF45J3yxAr42 qF4jkr18Ww13A37anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW8JVW5JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0JUWMKtUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Introduce ext4_iomap_page_mkwrite() to implement the mmap iomap path for ext4. 
Most of this work is delegated to iomap_page_mkwrite(), which only needs to be called with ext4_iomap_buffered_write_ops or ext4_iomap_buffered_da_write_ops as its ops argument to allocate and map the blocks. However, the lock ordering of the folio lock and transaction start is the opposite of that in the buffer_head buffered write path. The locking documentation in super.c has been updated accordingly. Signed-off-by: Zhang Yi --- fs/ext4/inode.c | 32 +++++++++++++++++++++++++++++++- fs/ext4/super.c | 8 ++++++-- 2 files changed, 37 insertions(+), 3 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 76ce43c64c30..26e1366b85fd 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4022,7 +4022,7 @@ static int ext4_iomap_buffered_do_write_begin(struct = inode *inode, /* Inline data support is not yet available. */ if (WARN_ON_ONCE(ext4_has_inline_data(inode))) return -ERANGE; - if (WARN_ON_ONCE(!(flags & IOMAP_WRITE))) + if (WARN_ON_ONCE(!(flags & (IOMAP_WRITE | IOMAP_FAULT)))) return -EINVAL; =20 if (delalloc) @@ -4082,6 +4082,14 @@ static int ext4_iomap_buffered_da_write_end(struct i= node *inode, loff_t offset, if (iomap->type !=3D IOMAP_DELALLOC || !(iomap->flags & IOMAP_F_NEW)) return 0; =20 + /* + * iomap_page_mkwrite() will never fail in a way that requires delalloc + * extents that it allocated to be revoked. Hence never try to release + * them here. 
+ */ + if (flags & IOMAP_FAULT) + return 0; + /* Nothing to do if we've written the entire delalloc extent */ start_byte =3D iomap_last_written_block(inode, offset, written); end_byte =3D round_up(offset + length, i_blocksize(inode)); @@ -7167,6 +7175,23 @@ static int ext4_block_page_mkwrite(struct inode *ino= de, struct folio *folio, return ret; } =20 +static vm_fault_t ext4_iomap_page_mkwrite(struct vm_fault *vmf) +{ + struct inode *inode =3D file_inode(vmf->vma->vm_file); + const struct iomap_ops *iomap_ops; + + /* + * ext4_nonda_switch() could writeback this folio, so have to + * call it before lock folio. + */ + if (test_opt(inode->i_sb, DELALLOC) && !ext4_nonda_switch(inode->i_sb)) + iomap_ops =3D &ext4_iomap_buffered_da_write_ops; + else + iomap_ops =3D &ext4_iomap_buffered_write_ops; + + return iomap_page_mkwrite(vmf, iomap_ops, NULL); +} + vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) { struct vm_area_struct *vma =3D vmf->vma; @@ -7189,6 +7214,11 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) =20 filemap_invalidate_lock_shared(mapping); =20 + if (ext4_inode_buffered_iomap(inode)) { + ret =3D ext4_iomap_page_mkwrite(vmf); + goto out; + } + err =3D ext4_convert_inline_data(inode); if (err) goto out_ret; diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 51d87db53543..62bfe05a64bc 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -100,8 +100,12 @@ static const struct fs_parameter_spec ext4_param_specs= []; * Lock ordering * * page fault path: - * mmap_lock -> sb_start_pagefault -> invalidate_lock (r) -> transaction s= tart - * -> page lock -> i_data_sem (rw) + * - buffer_head path: + * mmap_lock -> sb_start_pagefault -> invalidate_lock (r) -> + * transaction start -> folio lock -> i_data_sem (rw) + * - iomap path: + * mmap_lock -> sb_start_pagefault -> invalidate_lock (r) -> + * folio lock -> transaction start -> i_data_sem (rw) * * buffered write path: * sb_start_write -> i_rwsem (w) -> mmap_lock --=20 2.52.0 From nobody Wed Jun 17 02:57:53 
2026 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7CD49304BA3; Wed, 22 Apr 2026 02:16:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824220; cv=none; b=uW7d8ephP4IfrYxfyA6Ifg20jeEu9lS/4+2dDhaGLWUTLWdzU0KZir5+4BMLfe85SMbT2g7mP9vfwDuBE0rTjNqdG1x4PQOTX8a99V10P0X5ggLU66ViZWBcSs681I2+zN8op3npp6q/h0QRuaGUfEPFFCpEnYdVWaECQ3lwVoM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824220; c=relaxed/simple; bh=x1EARZ+iPskL6tZ0XIQegu+D7qkGMNZuUfvY3uTByYs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VlshnyqVOabJMOiOH5aI8baGHvmdDxC8BtkZWvqOb+3HZ8yS1aUA4lx3GuwGMvtbtcJC0ui5d9L9zY7pJvvTxBT5VDQDeSFYVb9XPO8isR3YFEeAxUw1OsPspma5goIX2kYWXrWrOgLvD9kHEorhC0uL/Yv6UrQl6RYaflJwU5I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=none smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.198]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTPS id 4g0jX44F0fzKHMSj; Wed, 22 Apr 2026 10:16:36 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id 9CDB240605; Wed, 22 Apr 2026 10:16:56 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S15; Wed, 22 Apr 2026 10:16:56 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, 
linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH v3 11/22] iomap: correct the range of a partial dirty clear Date: Wed, 22 Apr 2026 10:10:31 +0800 Message-ID: <20260422021042.4157510-12-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S15 X-Coremail-Antispam: 1UD129KBjvJXoW7ury5ArW5Wr4DKF1xuryDJrb_yoW8CFW8pF s3KFs8KrWDW34kuay8ZFWrXFnYka9rXF4xArW3W3s3Wa15AFyFgFn293y5uF92gr4xAF10 vF13KrWxCrWDAaDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW8JVW5JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 
4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0JUWMKtUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi The block range calculation in ifs_clear_range_dirty() is incorrect when partially clearing a range in a folio. We must not clear the dirty bit of the first block or the last block if the start or end offset is not blocksize-aligned. This has not yet caused any issues since we always clear a whole folio in iomap_writeback_folio(). Fix this by rounding the start offset up to a block boundary to get the first block, and rounding the end offset down (by truncation) to get the last block. Correct the nr_blks calculation accordingly. Signed-off-by: Zhang Yi --- This is modified from: https://lore.kernel.org/linux-fsdevel/20240812121159.3775074-2-yi.zhang@huaweicloud.com/ Changes: - Use round_up() instead of DIV_ROUND_UP() to avoid an unnecessary integer division. fs/iomap/buffered-io.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index d7b648421a70..7e7d5b776d35 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -176,11 +176,15 @@ static void ifs_clear_range_dirty(struct folio *folio, { struct inode *inode =3D folio->mapping->host; unsigned int blks_per_folio =3D i_blocks_per_folio(inode, folio); - unsigned int first_blk =3D (off >> inode->i_blkbits); - unsigned int last_blk =3D (off + len - 1) >> inode->i_blkbits; - unsigned int nr_blks =3D last_blk - first_blk + 1; + unsigned int first_blk =3D round_up(off, i_blocksize(inode)) >> + inode->i_blkbits; + unsigned int last_blk =3D (off + len) >> inode->i_blkbits; + unsigned int nr_blks =3D last_blk - first_blk; unsigned long flags; =20 + if (!nr_blks) + return; + spin_lock_irqsave(&ifs->state_lock, flags); bitmap_clear(ifs->state, first_blk + blks_per_folio, nr_blks); spin_unlock_irqrestore(&ifs->state_lock, flags); --=20 2.52.0 From nobody Wed Jun 17 
02:57:53 2026 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A036B2FFFA4; Wed, 22 Apr 2026 02:17:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824226; cv=none; b=uyK5HxVnbIlOViqiA/fWt2WWBMs8n9//YSzG4IgfpiqLW3kAby5FQ8IkOeqyqiGK9/Ih4tIOXDWgSmb3YfUH0DdkBFxNCOJKQBxYcoJn7kz03BCXeeC8CuhTfuZHFvpYqZz5WnZbhoJEyB1BfuIT7b9rEZ7WHIVlDl9tcKqCrjM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824226; c=relaxed/simple; bh=SGgAVzQKlalx4575A8xE0nJS4/9k31qQPrfyU/eUHtI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=aASOaVDemB56ndcIMtNG20u/LtrIQfPwcbqDVyFphpsRq/UHPnn6QLmGeJsmA0gxzFuMJuBkrIMkdLa0WiX54u47aYK8LXEdXsWebm2wze0t8aFYRJjZ33HBSoNlsbzK7PF7Ut+aRnF4EZ16oOBDD/30cbVMj5TiWlOxwoSozNk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.177]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4g0jWM44GWzYQtrC; Wed, 22 Apr 2026 10:15:59 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id BBBFF405F8; Wed, 22 Apr 2026 10:16:56 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S16; Wed, 22 Apr 2026 10:16:56 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, 
linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH v3 12/22] iomap: support invalidating partial folios Date: Wed, 22 Apr 2026 10:10:32 +0800 Message-ID: <20260422021042.4157510-13-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S16 X-Coremail-Antispam: 1UD129KBjvJXoW7uF1xtrW5Jw48tw18Kr4fAFb_yoW8CryrpF W3KrWDGryDGr17uw47Ca1fXF1j9a9xXFy7CFW3Gw1a9Fs8Jw1qgFy7Ka1YgayUJryxAF1S vrsFgFyvqF15A3DanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW8JVW5JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 
4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0JUWMKtUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Currently, iomap_invalidate_folio() can only invalidate an entire folio. If we truncate part of a folio on a filesystem where the block size is smaller than the folio size, the dirty bits for the truncated or punched blocks are left behind. During writeback, iomap will then attempt to map the invalid hole range. Fortunately, this has not caused any real problems so far because the ->writeback_range() function corrects the length. However, the implementation of FALLOC_FL_ZERO_RANGE in ext4 depends on support for invalidating partial folios. When ext4 partially zeroes out a dirty and unwritten folio, it does not perform a flush first as XFS does. Therefore, if the dirty bits of the corresponding area cannot be cleared, the zeroed area after writeback remains in the written state rather than reverting to the unwritten state. Fix this by supporting invalidation of partial folios. Signed-off-by: Zhang Yi Reviewed-by: Darrick J. Wong --- This is cherry-picked from: https://lore.kernel.org/linux-fsdevel/20240812121159.3775074-3-yi.zhang@huaweicloud.com/ No code changes; only the commit message is updated to explain why ext4 needs this.
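As an illustration of which dirty bits a partial invalidation is expected to clear, here is a minimal userspace sketch. It mirrors the rounding used by the ifs_clear_range_dirty() fix earlier in this series: the start is rounded up and the end is rounded down, so only blocks lying entirely inside [offset, offset + len) are affected. The struct blk_range type and the covered_blocks() helper are hypothetical names, a power-of-two block size is assumed, and clamping the count to zero for a range that sits inside a single block is a choice made by this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: which per-block dirty bits a partial clear touches. */
struct blk_range {
	unsigned int first_blk;	/* first block fully inside the byte range */
	unsigned int nr_blks;	/* number of fully covered blocks, may be 0 */
};

/*
 * Userspace model of the rounding: round the start up and the end down
 * to block granularity so that partially covered blocks keep their
 * dirty bit. blkbits is log2 of the block size.
 */
static struct blk_range covered_blocks(size_t off, size_t len,
				       unsigned int blkbits)
{
	struct blk_range r;
	unsigned int last_blk;
	size_t blksize;

	assert(blkbits < 8 * sizeof(size_t));
	blksize = (size_t)1 << blkbits;

	/* Same result as round_up(off, blksize) >> blkbits. */
	r.first_blk = (unsigned int)((off + blksize - 1) >> blkbits);
	/* Exclusive end block; the shift rounds down. */
	last_blk = (unsigned int)((off + len) >> blkbits);
	/* A range inside a single block covers no whole block. */
	r.nr_blks = last_blk > r.first_blk ? last_blk - r.first_blk : 0;
	return r;
}
```

With 1024-byte blocks, covered_blocks(512, 2048, 10) gives first_blk 1 and nr_blks 1: blocks 0 and 2 are only partially covered by the byte range [512, 2560), so their dirty bits are kept.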
fs/iomap/buffered-io.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 7e7d5b776d35..b17296b61a6e 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -761,6 +761,8 @@ void iomap_invalidate_folio(struct folio *folio, size_t= offset, size_t len) WARN_ON_ONCE(folio_test_writeback(folio)); folio_cancel_dirty(folio); ifs_free(folio); + } else { + iomap_clear_range_dirty(folio, offset, len); } } EXPORT_SYMBOL_GPL(iomap_invalidate_folio); --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C60793446B7; Wed, 22 Apr 2026 02:17:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824227; cv=none; b=Ff4cXRIQ7IkwaluHiI4SEa0ppTalEcw2oISNZ4zAgMRToCBln0Y5iiUmAkvKABr+ClsUG9oj/3iyJJ+caw0LBtb7lOeCP3ClyOMrYfMkeMtAO7QT7oLXS21euWF1q8I086xH3580iOxHHn/Uxqsq+2NLTpB2ItIRsezYRZWigM4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824227; c=relaxed/simple; bh=MgOzNx3i4g+Amum0Bg8TWTc4SNzjpmUBOxr9SspLvkM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=JpA1F4KeXFTHMFSq078bZZSDIfa1AZOpro+f1MuyjECdZ+au+TBTuNpEvBYRnqxlPAWmiTi75VPf/EsozgnFQUThIUhhvgh71hxvyo0apLZoSqyCq7KWptfSLqFLMU1riFIs1Xltg00+KQzzkv0gfIZvi1uN4yBgjJbqb7zVRpc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass 
smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.198]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4g0jWM4jN4zYQtrC; Wed, 22 Apr 2026 10:15:59 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id D3EB740604; Wed, 22 Apr 2026 10:16:56 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S17; Wed, 22 Apr 2026 10:16:56 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH v3 13/22] iomap: fix incorrect did_zero setting in iomap_zero_iter() Date: Wed, 22 Apr 2026 10:10:33 +0800 Message-ID: <20260422021042.4157510-14-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S17 X-Coremail-Antispam: 1UD129KBjvJXoWxJr1UJr4fuw1kXF18uw4kZwb_yoW8tr48p3 9xKayDCFn2qrW7uFn5JF9Ivr1Yyws5JrW7Wr4UGwn8ZF4qvr4YkF1FgayYvF1xJ34fA3Wa yF4jyas2qF4UCrDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 
4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW8JVW5JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0JUWMKtUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi The did_zero output parameter was unconditionally set after the loop, which is incorrect. It should only be set when the zeroing operation actually completes, not when IOMAP_F_STALE is set, when IOMAP_F_FOLIO_BATCH is set but a NULL folio causes the loop to break early, or when iomap_iter_advance() returns an error. This causes did_zero to be incorrectly set when zeroing a clean unwritten extent because the loop exits early without actually zeroing any data. Fix it by using a local variable to track whether any folio was actually zeroed, and only setting did_zero after the loop if zeroing happened. Signed-off-by: Zhang Yi Reviewed-by: "Darrick J. Wong" --- This is cherry-picked from: https://lore.kernel.org/linux-fsdevel/20260310082250.3535486-1-yi.zhang@huaweicloud.com/ No changes.
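The control flow described in the commit message can be modelled in userspace. The sketch below is not the kernel function: enum step, zero_iter_model() and its arguments are hypothetical stand-ins for what each loop iteration of iomap_zero_iter() may encounter. It only illustrates the intended rule, namely that *did_zero is set when at least one folio was zeroed and no error cut the loop short.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of one loop iteration in the model. */
enum step {
	STEP_ZERO,	/* a folio was found and zeroed */
	STEP_STALE,	/* mapping went stale, stop without zeroing */
	STEP_NO_FOLIO,	/* folio batch exhausted, stop without zeroing */
	STEP_ERROR,	/* advancing the iterator failed */
};

/*
 * Model of the fixed flow: track zeroing in a local flag and report it
 * through did_zero only on the non-error exit path.
 * Returns 0 on success or -1 for the modelled error.
 */
static int zero_iter_model(const enum step *steps, size_t nr_steps,
			   bool *did_zero)
{
	bool zeroed = false;
	size_t i;

	for (i = 0; i < nr_steps; i++) {
		if (steps[i] == STEP_ERROR)
			return -1;	/* error: did_zero is left untouched */
		if (steps[i] != STEP_ZERO)
			break;		/* early exit, nothing zeroed this round */
		zeroed = true;
	}

	/* did_zero may be NULL when the caller does not need the result. */
	if (did_zero && zeroed)
		*did_zero = true;
	return 0;
}
```

In this model a clean unwritten extent corresponds to a sequence that starts with STEP_NO_FOLIO, so the caller's flag stays false, which is the case the commit message says was previously misreported.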
fs/iomap/buffered-io.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index b17296b61a6e..0ffc2c3230af 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1542,6 +1542,7 @@ static int iomap_zero_iter(struct iomap_iter *iter, b= ool *did_zero, const struct iomap_write_ops *write_ops) { u64 bytes =3D iomap_length(iter); + bool zeroed =3D false; int status; =20 do { @@ -1560,6 +1561,8 @@ static int iomap_zero_iter(struct iomap_iter *iter, b= ool *did_zero, /* a NULL folio means we're done with a folio batch */ if (!folio) { status =3D iomap_iter_advance_full(iter); + if (status) + return status; break; } =20 @@ -1570,6 +1573,7 @@ static int iomap_zero_iter(struct iomap_iter *iter, b= ool *did_zero, bytes); =20 folio_zero_range(folio, offset, bytes); + zeroed =3D true; folio_mark_accessed(folio); =20 ret =3D iomap_write_end(iter, bytes, bytes, folio); @@ -1579,10 +1583,10 @@ static int iomap_zero_iter(struct iomap_iter *iter,= bool *did_zero, =20 status =3D iomap_iter_advance(iter, bytes); if (status) - break; + return status; } while ((bytes =3D iomap_length(iter)) > 0); =20 - if (did_zero) + if (did_zero && zeroed) *did_zero =3D true; return status; } --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8D4D535A3B8; Wed, 22 Apr 2026 02:17:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824229; cv=none; b=JbeJ1FJfBMS3D7Jty1J9Mf5ANtOtamE39c5nqI3pwPw889K5JP51sLhaKubOINGP05YsAhGmjzj562XCbETfztZUG71qXSKJgzPzcR+2KzR6AMvnxcBDpXdsKb6BkpF51oeM6Yxatnuy4SxPtFt+H5UPdPPTrBFyY2AcUFGOW0A= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1776824229; c=relaxed/simple; bh=xPnRnv2vLgfMgk2Egdli1mi1SjXfj3Qdi0qbTQSB6o0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ByxTK7/61RBHpja2hgTNU9ujgaUcTIRcKt/9UX/Nz7LrP4etbhgG1e3NtYMbFz1iComt33MSQwJQ5P2qGKiHfya+Lmar17LN6b2+1z9wXLLJ+XyZ68F5mi4EdhLCJMKWabqGX3jgBRDHkWrbjjG79GFLj+tRFLxHtgBduWE0vrM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=none smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.170]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4g0jWM5fs7zYQtrV; Wed, 22 Apr 2026 10:15:59 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id F22BB405D4; Wed, 22 Apr 2026 10:16:56 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S18; Wed, 22 Apr 2026 10:16:56 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH v3 14/22] ext4: implement partial block zero range path using iomap Date: Wed, 22 Apr 2026 10:10:34 +0800 Message-ID: <20260422021042.4157510-15-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk 
X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S18 X-Coremail-Antispam: 1UD129KBjvJXoW3WF13Kw1DKFWrXF48ZFyxKrg_yoW7WF1UpF WDK345Gr47Wry29w4ftFsrXr1Yk3WxtrW8Wry3Grn0v3s8XayxKF48GFyF93W5tw47Cw12 qF4UtryxGF1UAa7anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW8JVW5JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0JUWMKtUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Introduce a new iomap_ops instance, ext4_iomap_zero_ops, along with ext4_block_iomap_zero_range() to implement partial block zeroing through iomap for ext4. ext4_block_iomap_zero_range() invokes iomap_zero_range() with ext4_iomap_zero_ops, whose ext4_iomap_zero_begin() callback locates a mapped partial block or a dirty, unwritten partial block to zero out. Note that zeroing out under an active handle can cause deadlock since the order of acquiring the folio lock and starting a handle is inconsistent with the iomap writeback procedure.
Therefore, ext4_block_iomap_zero_range() cannot be called under an active handle, and we also cannot rely on data=3Dordered mode to ensure that zeroed data is written back before updating i_disksize when performing a post-EOF append write or truncating up. Signed-off-by: Zhang Yi --- fs/ext4/inode.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 91 insertions(+) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 26e1366b85fd..701b912db6fb 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4103,6 +4103,50 @@ static int ext4_iomap_buffered_da_write_end(struct inode *inode, loff_t offset, return 0; } =20 +static int ext4_iomap_zero_begin(struct inode *inode, + loff_t offset, loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap) +{ + struct iomap_iter *iter =3D container_of(iomap, struct iomap_iter, iomap); + struct ext4_map_blocks map; + u8 blkbits =3D inode->i_blkbits; + unsigned int iomap_flags =3D 0; + int ret; + + ret =3D ext4_emergency_state(inode->i_sb); + if (unlikely(ret)) + return ret; + + if (WARN_ON_ONCE(!(flags & IOMAP_ZERO))) + return -EINVAL; + + ret =3D ext4_iomap_map_blocks(inode, offset, length, NULL, &map); + if (ret < 0) + return ret; + + /* + * Look up dirty folios for unwritten mappings within EOF. Providing + * this bypasses the flush iomap uses to trigger extent conversion + * when unwritten mappings have dirty pagecache in need of zeroing.
+ */ + if (map.m_flags & EXT4_MAP_UNWRITTEN) { + loff_t offset =3D ((loff_t)map.m_lblk) << blkbits; + loff_t end =3D ((loff_t)map.m_lblk + map.m_len) << blkbits; + + iomap_fill_dirty_folios(iter, &offset, end, &iomap_flags); + if ((offset >> blkbits) < map.m_lblk + map.m_len) + map.m_len =3D (offset >> blkbits) - map.m_lblk; + } + + ext4_set_iomap(inode, iomap, &map, offset, length, flags); + iomap->flags |=3D iomap_flags; + + return 0; +} + +static const struct iomap_ops ext4_iomap_zero_ops =3D { + .iomap_begin =3D ext4_iomap_zero_begin, +}; =20 const struct iomap_ops ext4_iomap_buffered_write_ops =3D { .iomap_begin =3D ext4_iomap_buffered_write_begin, @@ -4609,6 +4653,47 @@ static int ext4_block_journalled_zero_range(struct i= node *inode, loff_t from, return err; } =20 +static int ext4_block_iomap_zero_range(struct inode *inode, loff_t from, + loff_t length, bool *did_zero, + bool *zero_written) +{ + int ret; + + /* + * Zeroing out under an active handle can cause deadlock since + * the order of acquiring the folio lock and starting a handle is + * inconsistent with the iomap writeback procedure. + */ + if (WARN_ON_ONCE(ext4_handle_valid(journal_current_handle()))) + return -EINVAL; + + /* The zeroing scope should not extend across a block. */ + if (WARN_ON_ONCE((from >> inode->i_blkbits) !=3D + ((from + length - 1) >> inode->i_blkbits))) + return -EINVAL; + + if (!(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ORPHAN_FS) && + !(inode_state_read_once(inode) & (I_NEW | I_FREEING))) + WARN_ON_ONCE(!inode_is_locked(inode) && + !rwsem_is_locked(&inode->i_mapping->invalidate_lock)); + + ret =3D iomap_zero_range(inode, from, length, did_zero, + &ext4_iomap_zero_ops, &ext4_iomap_write_ops, + NULL); + if (ret) + return ret; + + /* + * TODO: The iomap does not distinguish between different types of + * zeroing and always sets zero_written if a zeroing operation is + * performed, which may result in unnecessary order operations. 
+ */ + if (did_zero && zero_written) + *zero_written =3D *did_zero; + + return 0; +} + /* * Zeros out a mapping of length 'length' starting from file offset * 'from'. The range to be zero'd must be contained with in one block. @@ -4635,6 +4720,9 @@ static int ext4_block_zero_range(struct inode *inode, } else if (ext4_should_journal_data(inode)) { return ext4_block_journalled_zero_range(inode, from, length, did_zero); + } else if (ext4_inode_buffered_iomap(inode)) { + return ext4_block_iomap_zero_range(inode, from, length, + did_zero, zero_written); } return ext4_block_do_zero_range(inode, from, length, did_zero, zero_written); @@ -4675,6 +4763,9 @@ int ext4_block_zero_eof(struct inode *inode, loff_t f= rom, loff_t end) * truncating up or performing an append write, because there might be * exposing stale on-disk data which may caused by concurrent post-EOF * mmap write during folio writeback. + * + * TODO: In the iomap path, handle this by updating i_disksize to + * i_size after the zeroed data has been written back. 
*/ if (ext4_should_order_data(inode) && did_zero && zero_written && !IS_DAX(inode)) { --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 59086352FA5; Wed, 22 Apr 2026 02:17:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824230; cv=none; b=d83TBbtS8lswYqO/cWCeYsIme/0/6O+NQ/Fyu4v7KCi8+RavSAslaaRxCSEsVJcT+PSLRHxJCc4jX97gLi7325PKLimUj4bbIfxZnxD49J8RT1BcETSXLIOkQiaCKwmNPGZJaLu+OUxrL+J4KDsN3hC+vl2rZsOD2SLHg5YF4jY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824230; c=relaxed/simple; bh=2KLnNij5KNNVM2Cx1sKcD2NX2y5ocyh3EgNgT7sfJ0M=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=CmNNhkGrXrzF68lggifiIbnpBzmhLPmv0SEITHb5ozAQE7yH/ix4ygDvYoyj/adpxWNsXqqiWP/RGIrWle/78AxxAuRe+YsItUDN4yTI350hdk12rnWlrZ7nqDLKu+VRKc5hcqE9HSxFZb+2hbIq5vczqQDsqrLSLo+G+CqXB9c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.170]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4g0jWM6KkPzYQtrf; Wed, 22 Apr 2026 10:15:59 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id 1770C405D6; Wed, 22 Apr 2026 10:16:57 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id 
_Ch0CgB3JL6PL+hpqkgUBQ--.2635S19; Wed, 22 Apr 2026 10:16:56 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH v3 15/22] ext4: add block mapping tracepoints for iomap buffered I/O path Date: Wed, 22 Apr 2026 10:10:35 +0800 Message-ID: <20260422021042.4157510-16-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S19 X-Coremail-Antispam: 1UD129KBjvJXoWxurW7ZrWDAFWkXF17WF18uFg_yoWrGr4fpa 4vyFy5GF4fXrsF9w4fWrW3XF1Fva1xKr4UGry3Wry5AFWxtr42gF4UGFyjyFy5Jw4jkryf XF4Ykry8G3WUurDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 
AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW8JVW5JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0JUWMKtUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Add tracepoints for iomap buffered read, write, partial block zeroing, and writeback operations to help debug the iomap buffered I/O path. Signed-off-by: Zhang Yi --- fs/ext4/inode.c | 6 +++++ include/trace/events/ext4.h | 45 +++++++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 701b912db6fb..53fdcb50f3dd 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3961,6 +3961,8 @@ static int ext4_iomap_buffered_read_begin(struct inod= e *inode, loff_t offset, if (ret < 0) return ret; =20 + trace_ext4_iomap_buffered_read_begin(inode, &map, offset, length, + flags); ext4_set_iomap(inode, iomap, &map, offset, length, flags); return 0; } @@ -4036,6 +4038,8 @@ static int ext4_iomap_buffered_do_write_begin(struct = inode *inode, if (ret < 0) return ret; =20 + trace_ext4_iomap_buffered_write_begin(inode, &map, offset, length, + flags); ext4_set_iomap(inode, iomap, &map, offset, length, flags); return 0; } @@ -4138,6 +4142,7 @@ static int ext4_iomap_zero_begin(struct inode *inode, map.m_len =3D (offset >> blkbits) - map.m_lblk; } =20 + trace_ext4_iomap_zero_begin(inode, &map, offset, length, flags); ext4_set_iomap(inode, iomap, &map, offset, length, flags); iomap->flags |=3D iomap_flags; =20 @@ -4306,6 +4311,7 @@ static int ext4_iomap_map_writeback_range(struct ioma= p_writepage_ctx *wpc, return ret; } out: + trace_ext4_iomap_map_writeback_range(inode, &map, offset, dirty_len, 0); ext4_set_iomap(inode, &wpc->iomap, &map, offset, dirty_len, 0); return 0; } diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index f493642cf121..ebafa06cd191 
100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -3096,6 +3096,51 @@ TRACE_EVENT(ext4_move_extent_exit, __entry->ret) ); =20 +DECLARE_EVENT_CLASS(ext4_set_iomap_class, + TP_PROTO(struct inode *inode, struct ext4_map_blocks *map, + loff_t offset, loff_t length, unsigned int flags), + TP_ARGS(inode, map, offset, length, flags), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(u64, ino) + __field(ext4_lblk_t, m_lblk) + __field(unsigned int, m_len) + __field(unsigned int, m_flags) + __field(u64, m_seq) + __field(loff_t, offset) + __field(loff_t, length) + __field(unsigned int, iomap_flags) + ), + TP_fast_assign( + __entry->dev =3D inode->i_sb->s_dev; + __entry->ino =3D inode->i_ino; + __entry->m_lblk =3D map->m_lblk; + __entry->m_len =3D map->m_len; + __entry->m_flags =3D map->m_flags; + __entry->m_seq =3D map->m_seq; + __entry->offset =3D offset; + __entry->length =3D length; + __entry->iomap_flags =3D flags; + + ), + TP_printk("dev %d:%d ino %llu m_lblk %u m_len %u m_flags %s m_seq %llu or= ig_off 0x%llx orig_len 0x%llx iomap_flags 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ino, __entry->m_lblk, __entry->m_len, + show_mflags(__entry->m_flags), __entry->m_seq, + __entry->offset, __entry->length, __entry->iomap_flags) +) + +#define DEFINE_SET_IOMAP_EVENT(name) \ +DEFINE_EVENT(ext4_set_iomap_class, name, \ + TP_PROTO(struct inode *inode, struct ext4_map_blocks *map, \ + loff_t offset, loff_t length, unsigned int flags), \ + TP_ARGS(inode, map, offset, length, flags)) + +DEFINE_SET_IOMAP_EVENT(ext4_iomap_buffered_read_begin); +DEFINE_SET_IOMAP_EVENT(ext4_iomap_buffered_write_begin); +DEFINE_SET_IOMAP_EVENT(ext4_iomap_map_writeback_range); +DEFINE_SET_IOMAP_EVENT(ext4_iomap_zero_begin); + #endif /* _TRACE_EXT4_H */ =20 /* This part must be outside protection */ --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CFE9C34FF74; Wed, 22 Apr 2026 02:17:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824225; cv=none; b=YxqCN2cgFmXZ06RyWuvBEWqLFom2FiGACcSydY01Y2kKSfZoZ1kduveS5U1n1/O64PjN0xjtkoaiJuMkbUVEAUcygm9ho5PVe5s3VF2W+llkrwMKzrF9jWHSUhTf5gUFoF//ob5h5kMbKFNGvOq6KvNqbuLzRZieygT8LmOm4t8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824225; c=relaxed/simple; bh=TR3wXiUMaCqs+/Baqxnmc8l4eBzCYOhR1cr0EWSBFS0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=DR+Peeq1IwG1YKc4rHxUztkSssUvrOxI/dvuYWEAN3S644iczifHRg80nihO0dzB1CbqhsoESNQCyir3GlqsSKX1JnNr1uKTQJ/B4XLliV8ZbD444k1mpZRSyceLowgHZELWpG4jyPKXc6maopSvKxDg5/FoWWUU1L9wRvXWIZI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.170]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTPS id 4g0jX50jh6zKHMTL; Wed, 22 Apr 2026 10:16:37 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id 25618405D5; Wed, 22 Apr 2026 10:16:57 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S20; Wed, 22 Apr 2026 10:16:56 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, 
libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH v3 16/22] ext4: disable online defrag when inode using iomap buffered I/O path Date: Wed, 22 Apr 2026 10:10:36 +0800 Message-ID: <20260422021042.4157510-17-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S20 X-Coremail-Antispam: 1UD129KBjvdXoW7Jw47Gw4Uur15Ary8Jr4xJFb_yoWktwc_ta 97Jry8Ww1YyFsa9398Jas8KrnYkF48GFn5WFZ5Gr18uw1UZ395Gr1vkry2vr98Wr1jqrZ8 CFn7Jr1rKry2gjkaLaAFLSUrUUUUjb8apTn2vfkv8UJUUUU8Yxn0WfASr-VFAUDa7-sFnT 9fnUUIcSsGvfJTRUUUbvAFF20E14v26rWj6s0DM7CY07I20VC2zVCF04k26cxKx2IYs7xG 6rWj6s0DM7CIcVAFz4kK6r1j6r18M28IrcIa0xkI8VA2jI8067AKxVWUAVCq3wA2048vs2 IY020Ec7CjxVAFwI0_Xr0E3s1l8cAvFVAK0II2c7xJM28CjxkF64kEwVA0rcxSw2x7M28E F7xvwVC0I7IYx2IY67AKxVWDJVCq3wA2z4x0Y4vE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr 1UM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv6xkF7I0E14v26rxl6s0D M2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjx v20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_Jr0_Gr1l F7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20VAGYxC7M4IIrI8v6xkF7I0E8cxan2 IY04v7MxkF7I0En4kS14v26r1q6r43MxAIw28IcxkI7VAKI48JMxC20s026xCaFVCjc4AY 6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxVCjr7xvwVAFwI0_JrI_JrWlx4CE17 CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY6xIIjxv20xvE14v26r4j6ryUMIIF 0xvE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCw CI42IY6I8E87Iv67AKxVW8JVWxJwCI42IY6I8E87Iv6xkF7I0E14v26r4UJVWxJrUvcSsG vfC2KfnxnUUI43ZEXa7VU1zpBDUUUUU== X-CM-SenderInfo: 
d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Online defragmentation does not currently support inodes using the iomap buffered I/O path, as it still relies on buffer_head for the management of sub-folio blocks and on the data=3Dordered mode for data consistency. Signed-off-by: Zhang Yi --- fs/ext4/move_extent.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c index 3329b7ad5dbd..f707a1096544 100644 --- a/fs/ext4/move_extent.c +++ b/fs/ext4/move_extent.c @@ -476,6 +476,17 @@ static int mext_check_validity(struct inode *orig_inode, return -EOPNOTSUPP; } =20 + /* + * TODO: support online defrag for inodes that use the buffered + * I/O iomap path. + */ + if (ext4_inode_buffered_iomap(orig_inode) || + ext4_inode_buffered_iomap(donor_inode)) { + ext4_msg(sb, KERN_ERR, + "Online defrag not supported for inode with iomap buffered IO path"); + return -EOPNOTSUPP; + } + if (donor_inode->i_mode & (S_ISUID|S_ISGID)) { ext4_debug("ext4 move extent: suid or sgid is set to donor file [ino:orig %llu, donor %llu]\n", orig_inode->i_ino, donor_inode->i_ino); --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 64BEB3537F8; Wed, 22 Apr 2026 02:17:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824228; cv=none; b=JyGv0owC3Mx6Al77cKdTch6Iz3XSwC3qdt/fWubJfQxcygy+kSxu+3seeB4BzIZFvkpZv5kb7nWbwewhT5AG1TzmL1sGT0SI3rBa5A395eBSNk4GiIf5FbLKLuF+T0Oj6TdbcThERaRKfHw5Jj4pITveSyHzxlAPMrwD+VORkss= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824228; c=relaxed/simple;
bh=EvEDJ4oSgF5i0Y+bxfo/CO9bC+0K95e4uWWm3+SznEo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VJt6A/MYX40Ezwp4kR7Bke7aPGh6JEyMaxRAWUUK2Z1drZ/A4MqUkkWfZ6LJUa3Sw8GG3IMUa9J5lm21ZYMtzqOJZCuiReMAO8EMi4nrseuY5mHzC/knD9IHMMIuORD7pFgVlXva8KQtw6E9zRX8FC1a57uaVrTU3oOUAhPqZ2s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.177]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTPS id 4g0jX51kRXzKHMTT; Wed, 22 Apr 2026 10:16:37 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id 457E0405F8; Wed, 22 Apr 2026 10:16:57 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S21; Wed, 22 Apr 2026 10:16:56 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH v3 17/22] ext4: partially enable iomap for the buffered I/O path of regular files Date: Wed, 22 Apr 2026 10:10:37 +0800 Message-ID: <20260422021042.4157510-18-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: 
List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S21 X-Coremail-Antispam: 1UD129KBjvJXoWxCrW8CFykuryrJr1kAryUtrb_yoWrurWkpr 9xKryrGr4DX3s29w4ftr4UZr1Yv3WxG3yUGrWfurs8ZrWDJw1IqFyUtF1YyF15JrWrWw4Y qF40kr1UursxCrDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW8JVW5JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0JUWMKtUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Partially enable iomap for the buffered I/O path of regular files. We now support default filesystem features, mount options, and the bigalloc feature. However, inline data, fsverity, fscrypt, online defragmentation, and data=3Djournal mode are not yet supported. Some of these features are expected to be gradually supported in the future. The filesystem will automatically fall back to the original buffer_head path if these mount options or features are enabled. 
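As a rough illustration of the fallback rule described above, the following userspace sketch models the eligibility decision. The struct and its field names are hypothetical stand-ins for the superblock feature bits and inode flags, not kernel API. The real check is ext4_enable_buffered_iomap() in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the state consulted when deciding
 * whether an inode may use the iomap buffered I/O path. None of these
 * names exist in the kernel; they only mirror the checks in the patch.
 */
struct iomap_eligibility {
	bool is_regular_file;    /* S_ISREG(inode->i_mode) */
	bool is_ea_inode;        /* EXT4_INODE_EA_INODE set */
	bool sb_has_inline_data; /* filesystem-wide inline_data feature */
	bool sb_has_verity;      /* filesystem-wide verity feature */
	bool sb_has_encrypt;     /* filesystem-wide encrypt feature */
	bool data_journal;       /* data=journal mount or per-inode flag */
	bool uses_extents;       /* EXT4_INODE_EXTENTS set */
};

/* Return true if the modelled inode would take the iomap path. */
static bool can_use_buffered_iomap(const struct iomap_eligibility *s)
{
	if (!s->is_regular_file || s->is_ea_inode)
		return false;
	/* Unsupported features are tested per filesystem, not per inode. */
	if (s->sb_has_inline_data || s->sb_has_verity || s->sb_has_encrypt)
		return false;
	if (s->data_journal)
		return false;
	return s->uses_extents;
}
```

Note that in this model, as in the patch, the inline data, verity, and encrypt checks are made against the superblock feature flags, so a single unencrypted file on a filesystem with the encrypt feature still falls back to the buffer_head path.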
Signed-off-by: Zhang Yi --- fs/ext4/ext4.h | 1 + fs/ext4/ext4_jbd2.c | 1 + fs/ext4/ialloc.c | 1 + fs/ext4/inode.c | 36 ++++++++++++++++++++++++++++++++++++ 4 files changed, 39 insertions(+) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 0ffa81f86bc5..80d086d40990 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3059,6 +3059,7 @@ int ext4_walk_page_buffers(handle_t *handle, int do_journal_get_write_access(handle_t *handle, struct inode *inode, struct buffer_head *bh); void ext4_set_inode_mapping_order(struct inode *inode); +void ext4_enable_buffered_iomap(struct inode *inode); int ext4_nonda_switch(struct super_block *sb); #define FALL_BACK_TO_NONDELALLOC 1 #define CONVERT_INLINE_DATA 2 diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index 9a8c225f2753..9b25a1c414b9 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -16,6 +16,7 @@ int ext4_inode_journal_mode(struct inode *inode) ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE) || test_opt(inode->i_sb, DATA_FLAGS) =3D=3D EXT4_MOUNT_JOURNAL_DATA || (ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA) && + !ext4_inode_buffered_iomap(inode) && !test_opt(inode->i_sb, DELALLOC))) { /* We do not support data journalling for encrypted data */ if (S_ISREG(inode->i_mode) && IS_ENCRYPTED(inode)) diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 3fd8f0099852..ea64b9e9e382 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -1340,6 +1340,7 @@ struct inode *__ext4_new_inode(struct mnt_idmap *idma= p, } } =20 + ext4_enable_buffered_iomap(inode); ext4_set_inode_mapping_order(inode); =20 ext4_update_inode_fsync_trans(handle, inode, 1); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 53fdcb50f3dd..57b5708235cf 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -918,6 +918,9 @@ static int _ext4_get_block(struct inode *inode, sector_= t iblock, =20 if (ext4_has_inline_data(inode)) return -ERANGE; + /* inodes using the iomap buffered I/O path should not go here. 
*/ + if (WARN_ON_ONCE(ext4_inode_buffered_iomap(inode))) + return -EINVAL; =20 map.m_lblk =3D iblock; map.m_len =3D bh->b_size >> inode->i_blkbits; @@ -2797,6 +2800,12 @@ static int ext4_do_writepages(struct mpage_da_data *= mpd) if (!mapping->nrpages || !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) goto out_writepages; =20 + /* inodes using the iomap buffered I/O path should not go here. */ + if (WARN_ON_ONCE(ext4_inode_buffered_iomap(inode))) { + ret =3D -EINVAL; + goto out_writepages; + } + /* * If the filesystem has aborted, it is read-only, so return * right away instead of dumping stack traces later on that @@ -5737,6 +5746,31 @@ static int check_igot_inode(struct inode *inode, ext= 4_iget_flags flags, return -EFSCORRUPTED; } =20 +void ext4_enable_buffered_iomap(struct inode *inode) +{ + struct super_block *sb =3D inode->i_sb; + + if (!S_ISREG(inode->i_mode)) + return; + if (ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE)) + return; + + /* Unsupported Features */ + if (ext4_has_feature_inline_data(sb)) + return; + if (ext4_has_feature_verity(sb)) + return; + if (ext4_has_feature_encrypt(sb)) + return; + if (test_opt(sb, DATA_FLAGS) =3D=3D EXT4_MOUNT_JOURNAL_DATA || + ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA)) + return; + if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) + return; + + ext4_set_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP); +} + void ext4_set_inode_mapping_order(struct inode *inode) { struct super_block *sb =3D inode->i_sb; @@ -6022,6 +6056,8 @@ struct inode *__ext4_iget(struct super_block *sb, uns= igned long ino, if (ret) goto bad_inode; =20 + ext4_enable_buffered_iomap(inode); + if (S_ISREG(inode->i_mode)) { inode->i_op =3D &ext4_file_inode_operations; inode->i_fop =3D &ext4_file_operations; --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate 
requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2177D351C1F; Wed, 22 Apr 2026 02:17:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824228; cv=none; b=utZumcCcxKN5OJfm5Gvm5jepUgSu+5w8vzb0n8WDazO9kgxqDDN+9Bw/l2KDg8z5JZhiSqy3ugR9mJ6n8Ae8LhrkW+7Vvz6aHEOs85+6t1Ru5Ev58Ye1ilDzyQOYa6nKtYZRLBbAL6+2SqQjf5i+9Mi9UvjYceg3jEprh6g4Ebg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824228; c=relaxed/simple; bh=8p0Z9TxyTBGaGe+jUygLLGUtsOnSGgzFOKdWwjlKTR0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BCQ0VibC+r3TqYoTBdzaWG1jD07kDUkXjvjN3ABsskzXrcMZc99/4VWkGpwgXK3gTPGOUbigE/O2EcQWDzbij9E6b6JgKrWNqLOBvEf/IQAhGUXzdLCMUeTclyHYBhiHO8GwjKj4imxw4QFFO3VMkeraIQcCbQvskLIFdX5NCks= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.177]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTPS id 4g0jX52P44zKHMTs; Wed, 22 Apr 2026 10:16:37 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id 589CD405F9; Wed, 22 Apr 2026 10:16:57 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S22; Wed, 22 Apr 2026 10:16:57 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, 
djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH v3 18/22] ext4: introduce a mount option for iomap buffered I/O path Date: Wed, 22 Apr 2026 10:10:38 +0800 Message-ID: <20260422021042.4157510-19-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S22 X-Coremail-Antispam: 1UD129KBjvJXoWxXr48Cw45uw1ftF4UKFWfXwb_yoW5Cr1xpr 90kFy8Gr1kXryF93yxuF48Gr1Fy3Z09a1UCrWFgrsrWFZrAryxXFyfKFn5CFWagrW8X34I qF18Ww17WF43CrDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW8JVW5JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0JUWMKtUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Since the iomap 
buffered I/O path does not yet support all existing features, it cannot be enabled by default. Introduce 'buffered_iomap' and 'nobuffered_iomap' mount options to enable and disable the iomap buffered I/O path for regular files. Signed-off-by: Zhang Yi --- fs/ext4/ext4.h | 1 + fs/ext4/inode.c | 2 ++ fs/ext4/super.c | 7 +++++++ 3 files changed, 10 insertions(+) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 80d086d40990..60ba488b01c5 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1281,6 +1281,7 @@ struct ext4_inode_info { * scanning in mballoc */ #define EXT4_MOUNT2_ABORT 0x00000100 /* Abort filesystem */ +#define EXT4_MOUNT2_BUFFERED_IOMAP 0x00000200 /* Use iomap for buffered I/= O */ =20 #define clear_opt(sb, opt) EXT4_SB(sb)->s_mount_opt &=3D \ ~EXT4_MOUNT_##opt diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 57b5708235cf..d2f7af7922d7 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5750,6 +5750,8 @@ void ext4_enable_buffered_iomap(struct inode *inode) { struct super_block *sb =3D inode->i_sb; =20 + if (!test_opt2(sb, BUFFERED_IOMAP)) + return; if (!S_ISREG(inode->i_mode)) return; if (ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE)) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 62bfe05a64bc..b2da4834b6bb 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1722,6 +1722,7 @@ enum { Opt_discard, Opt_nodiscard, Opt_init_itable, Opt_noinit_itable, Opt_max_dir_size_kb, Opt_nojournal_checksum, Opt_nombcache, Opt_no_prefetch_block_bitmaps, Opt_mb_optimize_scan, + Opt_buffered_iomap, Opt_nobuffered_iomap, Opt_errors, Opt_data, Opt_data_err, Opt_jqfmt, Opt_dax_type, #ifdef CONFIG_EXT4_DEBUG Opt_fc_debug_max_replay, Opt_fc_debug_force @@ -1860,6 +1861,8 @@ static const struct fs_parameter_spec ext4_param_spec= s[] =3D { fsparam_flag ("no_prefetch_block_bitmaps", Opt_no_prefetch_block_bitmaps), fsparam_s32 ("mb_optimize_scan", Opt_mb_optimize_scan), + fsparam_flag ("buffered_iomap", Opt_buffered_iomap), + fsparam_flag ("nobuffered_iomap", 
Opt_nobuffered_iomap), fsparam_string ("check", Opt_removed), /* mount option from ext2/3 */ fsparam_flag ("nocheck", Opt_removed), /* mount option from ext2/3 */ fsparam_flag ("reservation", Opt_removed), /* mount option from ext2/3 */ @@ -1953,6 +1956,10 @@ static const struct mount_opts { {Opt_nombcache, EXT4_MOUNT_NO_MBCACHE, MOPT_SET}, {Opt_no_prefetch_block_bitmaps, EXT4_MOUNT_NO_PREFETCH_BLOCK_BITMAPS, MOPT_SET}, + {Opt_buffered_iomap, EXT4_MOUNT2_BUFFERED_IOMAP, + MOPT_SET | MOPT_2 | MOPT_EXT4_ONLY}, + {Opt_nobuffered_iomap, EXT4_MOUNT2_BUFFERED_IOMAP, + MOPT_CLEAR | MOPT_2 | MOPT_EXT4_ONLY}, #ifdef CONFIG_EXT4_DEBUG {Opt_fc_debug_force, EXT4_MOUNT2_JOURNAL_FAST_COMMIT, MOPT_SET | MOPT_2 | MOPT_EXT4_ONLY}, --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 596B43537D2; Wed, 22 Apr 2026 02:17:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824227; cv=none; b=TWrdYo7IJE/MAsRUOJzNceIJz5qz5kMdPMiVLUoyfqd5JRbIyCoYi3jKjG1xEqSCQ49IpPwNEhbHZbMOYZ/0v/RrChJymDxwSaE2ahSY9vGNktujOy0pOtYA9SrFvgrBgpHYOWjo4XOHqpVv/Y99ZYJw6RliRUmR2BvdSpxLaKY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824227; c=relaxed/simple; bh=YXZHG7P9OY/dRX5x+8FscTuosfa5/dTxnRRQxgnAInY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=H85vH6KuXpCQzZN4o7I2zmIhl6QjlOPpe1Opg6Q7W1rcHNj5yzmsiTWUyB9vYGBbb2FFnAUiSdylLNFCC3WRRRF07z1CZVo8W4YEoR+KK5J6qo5OFCS/xt4RioLlhygHMbRULDTWOu5cGCL/iLzpTB73HAgluyLpj5V5bYisdQQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none 
smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.198]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4g0jWN1YmtzYQts7; Wed, 22 Apr 2026 10:16:00 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id 677A540609; Wed, 22 Apr 2026 10:16:57 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S23; Wed, 22 Apr 2026 10:16:57 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH v3 19/22] ext4: submit zeroed post-EOF data immediately in the iomap buffered I/O path Date: Wed, 22 Apr 2026 10:10:39 +0800 Message-ID: <20260422021042.4157510-20-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S23 X-Coremail-Antispam: 1UD129KBjvJXoWxuF1UKr1rKr1fAry8uFyUJrb_yoW5ZrWxpr W3Kw1rAw4q9F9F9r4SqF17Xr1aka1rGw48GFWxWr40vay3X3WrKFy2k34rAFWUtr45Way2 qF45JFWDWF1UArJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 
rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW8JVW5JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0JUWMKtUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi In the generic buffer_head I/O path, we rely on the data=3Dordered mode to ensure that the zeroed EOF block data is written before updating i_disksize, thus preventing stale data from being exposed. However, the iomap buffered I/O path cannot use this mechanism. Instead, we issue the I/O immediately after performing the zero operation (without synchronous waiting). This can reduce the risk of exposing stale data, but it does not guarantee that the zeroed data will be flushed to disk before the metadata of i_disksize is updated. The subsequent patches will wait for this I/O to complete before updating i_disksize.
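The I/O issued here covers only the zeroed part of the EOF block. A minimal userspace sketch of that clamping arithmetic follows, assuming a power-of-two block size. The helper name clamp_to_eof_block() is hypothetical and only mirrors the "length > blocksize - offset" adjustment in the diff below.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: given a byte range [from, end), return the
 * exclusive end offset after clamping the range to the block that
 * contains 'from'. blocksize must be a power of two.
 */
static int64_t clamp_to_eof_block(int64_t from, int64_t end,
				  uint32_t blocksize)
{
	uint32_t offset;
	int64_t length;

	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);

	offset = (uint32_t)(from & (blocksize - 1)); /* offset in block */
	length = end - from;

	/* Never zero (or submit) past the end of the block. */
	if (length > (int64_t)(blocksize - offset))
		length = blocksize - offset;

	return from + length;
}
```

With a 4096-byte block, a range starting at byte 5000 and nominally ending at 10000 is clamped to end at 8192. Since filemap_fdatawrite_range() takes an inclusive end offset, the diff passes end - 1 after this adjustment.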
Suggested-by: Jan Kara Signed-off-by: Zhang Yi --- fs/ext4/inode.c | 58 +++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 47 insertions(+), 11 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index d2f7af7922d7..d55899c1ef4c 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4766,8 +4766,10 @@ int ext4_block_zero_eof(struct inode *inode, loff_t = from, loff_t end) if (IS_ENCRYPTED(inode) && !fscrypt_has_encryption_key(inode)) return 0; =20 - if (length > blocksize - offset) + if (length > blocksize - offset) { length =3D blocksize - offset; + end =3D from + length; + } =20 err =3D ext4_block_zero_range(inode, from, length, &did_zero, &zero_written); @@ -4782,18 +4784,52 @@ int ext4_block_zero_eof(struct inode *inode, loff_t= from, loff_t end) * TODO: In the iomap path, handle this by updating i_disksize to * i_size after the zeroed data has been written back. */ - if (ext4_should_order_data(inode) && - did_zero && zero_written && !IS_DAX(inode)) { - handle_t *handle; + if (did_zero && zero_written && !IS_DAX(inode)) { + if (ext4_should_order_data(inode)) { + handle_t *handle; =20 - handle =3D ext4_journal_start(inode, EXT4_HT_MISC, 1); - if (IS_ERR(handle)) - return PTR_ERR(handle); + handle =3D ext4_journal_start(inode, EXT4_HT_MISC, 1); + if (IS_ERR(handle)) + return PTR_ERR(handle); =20 - err =3D ext4_jbd2_inode_add_write(handle, inode, from, length); - ext4_journal_stop(handle); - if (err) - return err; + err =3D ext4_jbd2_inode_add_write(handle, inode, from, + length); + ext4_journal_stop(handle); + if (err) + return err; + /* + * inodes using the iomap buffered I/O path do not use the + * data=3Dordered mode. We submit zeroed range here. + * + * TODO: The end_io process needs to wait for I/O to completes + * before updating i_disksize. 
+ */ + } else if (ext4_inode_buffered_iomap(inode)) { + struct folio *folio; + bool do_submit =3D false; + + folio =3D filemap_lock_folio(inode->i_mapping, + from >> PAGE_SHIFT); + if (IS_ERR(folio)) + /* Already writeback and clear? */ + return PTR_ERR(folio) =3D=3D -ENOENT ? 0 : + PTR_ERR(folio); + + folio_wait_writeback(folio); + WARN_ON_ONCE(folio_test_writeback(folio)); + + if (likely(folio_test_dirty(folio))) + do_submit =3D true; + folio_unlock(folio); + folio_put(folio); + + if (do_submit) { + err =3D filemap_fdatawrite_range(inode->i_mapping, + from, end - 1); + if (err) + return err; + } + } } =20 return 0; --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BC47D3126BF; Wed, 22 Apr 2026 02:17:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824227; cv=none; b=ecDZi5DtIgD8grnjDRwQHiidDxr0kF7xJXMZIuNCwhQ6sMBpHM14UGmxdMq83KYAwQysTvxPQy7aXw3xDUDqC+ZIZ5/VzL1dKKZym6Fj4eCg86M9UTAXBNaTMs9RSHg2hXa2aizNec3JYKfs2VAZ55JfRrihtkLjoDtH/gL/+Zo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824227; c=relaxed/simple; bh=VuThvL7HFbKLHMEQCWYtRfLW/E7czcP9Vozi62Ou0Dg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=p1u73oLamVl4weVm2UDyM8NaXsT1rfs4ZDP8NfheSrrOmeRVSjy8h+gSF+xjgPBV27TaCyzBEoM+eb+lbfstvR4p3pwifNjd6fhI9bQ9rN4/o2bWhTizi8YrwaFA3mTnl4qyeem9a3fJWn3pEl+t0MUi4Fd/Zp3RG4tgwW/Gu10= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none 
dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.170]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4g0jWN2FztzYQts7; Wed, 22 Apr 2026 10:16:00 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id 7F521405D4; Wed, 22 Apr 2026 10:16:57 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S24; Wed, 22 Apr 2026 10:16:57 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH v3 20/22] ext4: wait for ordered I/O in the iomap buffered I/O path Date: Wed, 22 Apr 2026 10:10:40 +0800 Message-ID: <20260422021042.4157510-21-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S24 X-Coremail-Antispam: 1UD129KBjvJXoW3Jr4ktrW3ZrWUWrWrCFWxZwb_yoWfur1DpF W3GryrGw48ZF929rs3Xw48Zr1Fq3WxKayrJFWfWanIvayUGryIkF1FyF15ZFyUKrZrJrWI qF48Jr47Wr1DJrJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 
z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW5JVW7JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0JUWMKtUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Wait for ordered I/O to complete before updating i_disksize. This ensures zeroed data is flushed to disk before the i_disksize metadata is updated, preventing stale data exposure during unaligned post-EOF append writes. Suggested-by: Jan Kara Signed-off-by: Zhang Yi --- fs/ext4/ext4.h | 11 +++++++++ fs/ext4/inode.c | 62 ++++++++++++++++++++++++++++++++++++++++++----- fs/ext4/page-io.c | 53 ++++++++++++++++++++++++++++++++++++++++ fs/ext4/super.c | 23 +++++++++++++----- 4 files changed, 137 insertions(+), 12 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 60ba488b01c5..760400395cb7 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1195,6 +1195,15 @@ struct ext4_inode_info { #ifdef CONFIG_FS_ENCRYPTION struct fscrypt_inode_info *i_crypt_info; #endif + + /* + * Track ordered zeroed data during post-EOF append writes, fallocate, + * and truncate-up operations. These parameters are used only in the + * iomap buffered I/O path. 
+ */ + ext4_lblk_t i_ordered_lblk; + ext4_lblk_t i_ordered_len; + wait_queue_head_t i_ordered_wq; }; =20 /* @@ -3877,6 +3886,8 @@ extern int ext4_move_extents(struct file *o_filp, str= uct file *d_filp, __u64 len, __u64 *moved_len); =20 /* page-io.c */ +#define EXT4_IOMAP_IOEND_ORDER_IO 1UL /* This I/O is an ordered one */ + extern int __init ext4_init_pageio(void); extern void ext4_exit_pageio(void); extern ext4_io_end_t *ext4_init_io_end(struct inode *inode, gfp_t flags); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index d55899c1ef4c..17bd4403c782 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4352,12 +4352,37 @@ static int ext4_iomap_writeback_submit(struct iomap= _writepage_ctx *wpc, { struct iomap_ioend *ioend =3D wpc->wb_ctx; struct ext4_inode_info *ei =3D EXT4_I(ioend->io_inode); + ext4_lblk_t start, end, order_lblk, order_len; =20 /* Need to convert unwritten extents when I/Os are completed. */ if ((ioend->io_flags & IOMAP_IOEND_UNWRITTEN) || ioend->io_offset + ioend->io_size > READ_ONCE(ei->i_disksize)) ioend->io_bio.bi_end_io =3D ext4_iomap_end_bio; =20 + /* + * Mark the I/O as ordered. Ordered I/O requires separate endio + * handling and must not be merged with regular I/O operations. + */ + order_len =3D READ_ONCE(ei->i_ordered_len); + if (order_len) { + /* + * Pair with smp_store_release() in ext4_block_zero_eof(). + * Ensure we see the updated i_ordered_lblk that was written + * before the release store to i_ordered_len. 
+ */ + smp_rmb(); + order_lblk =3D READ_ONCE(ei->i_ordered_lblk); + start =3D ioend->io_offset >> ioend->io_inode->i_blkbits; + end =3D EXT4_B_TO_LBLK(ioend->io_inode, + ioend->io_offset + ioend->io_size); + + if (start <=3D order_lblk && end >=3D order_lblk + order_len) { + ioend->io_bio.bi_end_io =3D ext4_iomap_end_bio; + ioend->io_private =3D (void *)EXT4_IOMAP_IOEND_ORDER_IO; + ioend->io_flags |=3D IOMAP_IOEND_BOUNDARY; + } + } + return iomap_ioend_writeback_submit(wpc, error); } =20 @@ -4799,12 +4824,12 @@ int ext4_block_zero_eof(struct inode *inode, loff_t= from, loff_t end) return err; /* * inodes using the iomap buffered I/O path do not use the - * data=3Dordered mode. We submit zeroed range here. - * - * TODO: The end_io process needs to wait for I/O to completes - * before updating i_disksize. + * data=3Dordered mode. Submit zeroed range here. The end_io + * handler ext4_iomap_wb_ordered_wait() will wait for I/O + * completion before updating i_disksize. */ } else if (ext4_inode_buffered_iomap(inode)) { + struct ext4_inode_info *ei =3D EXT4_I(inode); struct folio *folio; bool do_submit =3D false; =20 @@ -4818,16 +4843,41 @@ int ext4_block_zero_eof(struct inode *inode, loff_t= from, loff_t end) folio_wait_writeback(folio); WARN_ON_ONCE(folio_test_writeback(folio)); =20 - if (likely(folio_test_dirty(folio))) + /* + * Mark the ordered range. It will be cleared upon + * I/O completion in ext4_iomap_end_bio(). + */ + if (likely(folio_test_dirty(folio)) && + READ_ONCE(ei->i_ordered_len) =3D=3D 0) { + WRITE_ONCE(ei->i_ordered_lblk, + from >> inode->i_blkbits); + /* + * Pairs with smp_rmb() in + * ext4_iomap_writeback_submit() and + * ext4_iomap_wb_ordered_wait(). Ensure the + * updated i_ordered_lblk is visible when + * i_ordered_len becomes non-zero. 
+ */ + smp_store_release(&ei->i_ordered_len, 1); do_submit =3D true; + } folio_unlock(folio); folio_put(folio); =20 if (do_submit) { err =3D filemap_fdatawrite_range(inode->i_mapping, from, end - 1); - if (err) + if (err) { + /* + * Pairs with wait_event() in + * ext4_iomap_wb_ordered_wait(). Ensure + * i_ordered_len =3D 0 is visible before + * waking up waiters. + */ + smp_store_release(&ei->i_ordered_len, 0); + wake_up_all(&ei->i_ordered_wq); return err; + } } } } diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c index 07978e2cd9c8..9c88671836fe 100644 --- a/fs/ext4/page-io.c +++ b/fs/ext4/page-io.c @@ -613,6 +613,39 @@ int ext4_bio_write_folio(struct ext4_io_submit *io, st= ruct folio *folio, return 0; } =20 +/* + * If the old disk size is not block size aligned and the current + * writeback range is entirely beyond the old EOF block, we should + * wait for the zeroed data written in ext4_block_zero_eof() to be + * written out, otherwise, it may expose stale data in that block. + */ +static void ext4_iomap_wb_ordered_wait(struct inode *inode, + loff_t pos, loff_t end) +{ + struct ext4_inode_info *ei =3D EXT4_I(inode); + unsigned int blocksize =3D i_blocksize(inode); + loff_t disksize =3D READ_ONCE(ei->i_disksize); + ext4_lblk_t order_lblk, order_len; + + if (!(disksize & (blocksize - 1)) || + pos <=3D round_up(disksize, blocksize)) + return; + + order_len =3D READ_ONCE(ei->i_ordered_len); + if (!order_len) + return; + + /* + * Pair with smp_store_release() in ext4_iomap_end_bio() and + * ext4_block_zero_eof(). Ensure we see the updated i_ordered_lblk + * that was written before the release store to i_ordered_len. 
+ */ + smp_rmb(); + order_lblk =3D READ_ONCE(ei->i_ordered_lblk); + if ((pos >> inode->i_blkbits) >=3D order_lblk + order_len) + wait_event(ei->i_ordered_wq, READ_ONCE(ei->i_ordered_len) =3D=3D 0); +} + static int ext4_iomap_wb_update_disksize(handle_t *handle, struct inode *i= node, loff_t end) { @@ -656,6 +689,9 @@ static void ext4_iomap_finish_ioend(struct iomap_ioend = *ioend) goto out; } =20 + /* Wait ordered zero data to be written out. */ + ext4_iomap_wb_ordered_wait(inode, pos, pos + size); + /* We may need to convert one extent and dirty the inode. */ credits =3D ext4_chunk_trans_blocks(inode, EXT4_MAX_BLOCKS(size, pos, inode->i_blkbits)); @@ -717,9 +753,26 @@ void ext4_iomap_end_bio(struct bio *bio) struct inode *inode =3D ioend->io_inode; struct ext4_inode_info *ei =3D EXT4_I(inode); struct ext4_sb_info *sbi =3D EXT4_SB(inode->i_sb); + unsigned long io_mode =3D (unsigned long)ioend->io_private; unsigned long flags; int ret; =20 + /* + * This is an ordered I/O, clear the ordered range set in + * ext4_block_zero_eof() and wake up all waiters that will update + * the inode i_disksize. + */ + if (io_mode =3D=3D EXT4_IOMAP_IOEND_ORDER_IO) { + /* + * Pairs with wait_event() in ext4_iomap_wb_ordered_wait(). + * Ensure i_ordered_len =3D 0 is visible before waking up + * waiters. + */ + smp_store_release(&ei->i_ordered_len, 0); + wake_up_all(&ei->i_ordered_wq); + goto defer; + } + /* Needs to convert unwritten extents or update the i_disksize. 
*/ if ((ioend->io_flags & IOMAP_IOEND_UNWRITTEN) || ioend->io_offset + ioend->io_size > READ_ONCE(ei->i_disksize)) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index b2da4834b6bb..2fc07739c9e8 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1444,6 +1444,9 @@ static struct inode *ext4_alloc_inode(struct super_bl= ock *sb) ext4_fc_init_inode(&ei->vfs_inode); spin_lock_init(&ei->i_fc_lock); mmb_init(&ei->i_metadata_bhs, &ei->vfs_inode.i_data); + ei->i_ordered_lblk =3D 0; + ei->i_ordered_len =3D 0; + init_waitqueue_head(&ei->i_ordered_wq); return &ei->vfs_inode; } =20 @@ -1480,12 +1483,20 @@ static void ext4_destroy_inode(struct inode *inode) dump_stack(); } =20 - if (!(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ERROR_FS) && - WARN_ON_ONCE(EXT4_I(inode)->i_reserved_data_blocks)) - ext4_msg(inode->i_sb, KERN_ERR, - "Inode %llu (%p): i_reserved_data_blocks (%u) not cleared!", - inode->i_ino, EXT4_I(inode), - EXT4_I(inode)->i_reserved_data_blocks); + if (!(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ERROR_FS)) { + if (WARN_ON_ONCE(EXT4_I(inode)->i_reserved_data_blocks)) + ext4_msg(inode->i_sb, KERN_ERR, + "Inode %llu (%p): i_reserved_data_blocks (%u) not cleared!", + inode->i_ino, EXT4_I(inode), + EXT4_I(inode)->i_reserved_data_blocks); + + if (WARN_ON_ONCE(EXT4_I(inode)->i_ordered_len)) + ext4_msg(inode->i_sb, KERN_ERR, + "Inode %llu (%p): i_ordered_lblk (%u) and i_ordered_len (%u) not clea= red!", + inode->i_ino, EXT4_I(inode), + EXT4_I(inode)->i_ordered_lblk, + EXT4_I(inode)->i_ordered_len); + } } =20 static void ext4_shutdown(struct super_block *sb) --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 07C7B318EE4; Wed, 22 Apr 2026 02:17:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none 
smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824229; cv=none; b=ry/cNvpNgRfdFENvBwi4YaRq1oU4rFl6zRkN1B+gVv+vTFg/2OZQPP1opkWBpZNE5t8dkSs6gUxMoeF4fS/QlrkDyilRC75yi4+nD9fTAh5A9w8dIF7xdGaYrGNZXz+Qv2zB8hmDp8j2C1xu9jaOi1BYAi99KqUXqvDHXNo8ihg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824229; c=relaxed/simple; bh=Az1p7Z0+js2vdk8ppJIgCWml4zi9NJHrRJulMljaR50=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fP4Z9+L83nqYOxnQmp4AFnWh/rYMAl+TtUxDXbH0wHK8UotEmQMbIqwSWeqvzH5Sz7q6BprgEukzHFGc+A/0WXKXI1+O0XpcagXArMCdF7YVhBjn9i2ijpUQjKkGF2mty/5qYyTzpNVtRwnuyrp0X9ZPu5pWcBpUb0ynbnfTmLY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.177]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTPS id 4g0jX53mNGzKHMVC; Wed, 22 Apr 2026 10:16:37 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id 8C63B4059E; Wed, 22 Apr 2026 10:16:57 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S25; Wed, 22 Apr 2026 10:16:57 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH v3 21/22] ext4: 
update i_disksize to i_size on ordered I/O completion Date: Wed, 22 Apr 2026 10:10:41 +0800 Message-ID: <20260422021042.4157510-22-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S25 X-Coremail-Antispam: 1UD129KBjvJXoW3WF13JFWxZw1kXr4kuF45KFg_yoWfXrWkpF W5K34rAw18XasF9rs2qryUXw1Fva18Gw48JFy7ur4vvFy5Awn2vFyxtryfCFW8trZ5Xw4j qFWktr48Wr1kAr7anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW5JVW7JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0JUWMKtUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Currently, i_disksize is updated after ordered data writeback to prevent exposing stale data in the post-EOF block. However, operations like fallocate and truncate update i_disksize directly. 
If the new i_disksize exceeds the original value, metadata may be written back before the zeroed data is persisted. To avoid this, we defer i_disksize updates when i_ordered_len is non-zero, only applying them after ordered I/O completes. But this deferral introduces a new problem: on ordered I/O completion, i_disksize is updated only to the end of that specific I/O, discarding any later updates (e.g., from fallocate) and causing filesystem inconsistency. A potential fix would involve scanning for dirty or writeback folios beyond the current position, then updating i_disksize to the start of the first such folio or to i_size. However, folio scanning is expensive and concurrency with operations like fallocate makes this approach prohibitively complex. Instead, update i_disksize directly to i_size upon ordered I/O completion. This may expose zeroed data if dirty data within the range is not yet written to disk after crash recovery, but it will never expose stale data. This is limited to unaligned append writes and is deemed acceptable. Suggested-by: Jan Kara Signed-off-by: Zhang Yi --- fs/ext4/ext4.h | 40 +++++++++++++++++++++++++++++++--------- fs/ext4/extents.c | 9 +++------ fs/ext4/inode.c | 3 --- fs/ext4/page-io.c | 23 ++++++++++++++++++----- 4 files changed, 52 insertions(+), 23 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 760400395cb7..59dcec47675f 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3495,13 +3495,21 @@ do { \ #define EXT4_FREECLUSTERS_WATERMARK 0 #endif =20 -/* Update i_disksize. Requires i_rwsem to avoid races with truncate */ +/* + * Update i_disksize. Requires i_rwsem to avoid races with truncate. + * + * In the iomap buffered I/O path, a non-zero i_ordered_len indicates that + * an ordered I/O (zeroing the EOF partial block) is still in progress. + * In that case, i_disksize will be updated after the ordered data has + * been written out. 
+ */ static inline void ext4_update_i_disksize(struct inode *inode, loff_t news= ize) { WARN_ON_ONCE(S_ISREG(inode->i_mode) && !inode_is_locked(inode)); down_write(&EXT4_I(inode)->i_data_sem); - if (newsize > EXT4_I(inode)->i_disksize) + if (newsize > EXT4_I(inode)->i_disksize && + READ_ONCE(EXT4_I(inode)->i_ordered_len) =3D=3D 0) WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize); up_write(&EXT4_I(inode)->i_data_sem); } @@ -3515,7 +3523,8 @@ static inline int ext4_update_inode_size(struct inode= *inode, loff_t newsize) i_size_write(inode, newsize); changed =3D 1; } - if (newsize > EXT4_I(inode)->i_disksize) { + if (newsize > EXT4_I(inode)->i_disksize && + READ_ONCE(EXT4_I(inode)->i_ordered_len) =3D=3D 0) { ext4_update_i_disksize(inode, newsize); changed |=3D 2; } @@ -3523,19 +3532,32 @@ static inline int ext4_update_inode_size(struct ino= de *inode, loff_t newsize) } =20 /* - * Set i_size and i_disksize to 'newsize'. + * Set i_size and i_disksize to 'newsize'. In the iomap buffered I/O path, + * if i_ordered_len is non-zero and newsize exceeds the current i_disksize, + * the actual i_disksize update is deferred until after the ordered data is + * written out. In that case, i_disksize will be set to i_size upon I/O + * completion. * * Both i_rwsem and i_data_sem are required here to avoid races between - * generic append writeback and concurrent truncate that also modify - * i_size and i_disksize. + * generic append writeback (or ordered I/O writeback) and concurrent + * operations like fallocate and truncate that also modify i_size and + * i_disksize. 
*/ -static inline void ext4_set_inode_size(struct inode *inode, loff_t newsize) +static inline void __ext4_set_inode_size(struct inode *inode, loff_t newsi= ze) { WARN_ON_ONCE(S_ISREG(inode->i_mode) && !inode_is_locked(inode)); + WARN_ON_ONCE(!rwsem_is_locked(&EXT4_I(inode)->i_data_sem)); =20 - down_write(&EXT4_I(inode)->i_data_sem); i_size_write(inode, newsize); - EXT4_I(inode)->i_disksize =3D newsize; + if (READ_ONCE(EXT4_I(inode)->i_ordered_len) =3D=3D 0 || + newsize < EXT4_I(inode)->i_disksize) + EXT4_I(inode)->i_disksize =3D newsize; +} + +static inline void ext4_set_inode_size(struct inode *inode, loff_t newsize) +{ + down_write(&EXT4_I(inode)->i_data_sem); + __ext4_set_inode_size(inode, newsize); up_write(&EXT4_I(inode)->i_data_sem); } =20 diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 125f628e738a..e0c36cd920bf 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -5531,7 +5531,7 @@ static int ext4_collapse_range(struct file *file, lof= f_t offset, loff_t len) ext4_lblk_t start_lblk, end_lblk; handle_t *handle; unsigned int credits; - loff_t start, new_size; + loff_t start; int ret; =20 trace_ext4_collapse_range(inode, offset, len); @@ -5597,9 +5597,7 @@ static int ext4_collapse_range(struct file *file, lof= f_t offset, loff_t len) goto out_handle; } =20 - new_size =3D inode->i_size - len; - i_size_write(inode, new_size); - EXT4_I(inode)->i_disksize =3D new_size; + __ext4_set_inode_size(inode, inode->i_size - len); =20 up_write(&EXT4_I(inode)->i_data_sem); ret =3D ext4_mark_inode_dirty(handle, inode); @@ -5671,8 +5669,7 @@ static int ext4_insert_range(struct file *file, loff_= t offset, loff_t len) ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE, handle); =20 /* Expand file to avoid data loss if there is error while shifting */ - inode->i_size +=3D len; - EXT4_I(inode)->i_disksize +=3D len; + ext4_set_inode_size(inode, inode->i_size + len); ret =3D ext4_mark_inode_dirty(handle, inode); if (ret) goto out_handle; diff --git 
a/fs/ext4/inode.c b/fs/ext4/inode.c index 17bd4403c782..d983336390c7 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4805,9 +4805,6 @@ int ext4_block_zero_eof(struct inode *inode, loff_t f= rom, loff_t end) * truncating up or performing an append write, because there might be * exposing stale on-disk data which may caused by concurrent post-EOF * mmap write during folio writeback. - * - * TODO: In the iomap path, handle this by updating i_disksize to - * i_size after the zeroed data has been written back. */ if (did_zero && zero_written && !IS_DAX(inode)) { if (ext4_should_order_data(inode)) { diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c index 9c88671836fe..589c74b9f8a3 100644 --- a/fs/ext4/page-io.c +++ b/fs/ext4/page-io.c @@ -647,13 +647,13 @@ static void ext4_iomap_wb_ordered_wait(struct inode *= inode, } =20 static int ext4_iomap_wb_update_disksize(handle_t *handle, struct inode *i= node, - loff_t end) + loff_t end, bool is_ordered) { - loff_t new_disksize =3D end; + loff_t new_disksize, i_size; struct ext4_inode_info *ei =3D EXT4_I(inode); int ret; =20 - if (new_disksize <=3D READ_ONCE(ei->i_disksize)) + if (end <=3D READ_ONCE(ei->i_disksize) && !is_ordered) return 0; =20 /* @@ -661,7 +661,18 @@ static int ext4_iomap_wb_update_disksize(handle_t *han= dle, struct inode *inode, * are avoided by checking i_size under i_data_sem. */ down_write(&ei->i_data_sem); - new_disksize =3D min(new_disksize, i_size_read(inode)); + i_size =3D i_size_read(inode); + + /* + * Update i_disksize to i_size when completing an ordered I/O that + * zeroes the old EOF partial block. This ensures i_disksize is + * correctly advanced during truncate-up on a blocksize-unaligned + * file, preventing it from remaining stale. A downside is that + * zeroed data may be exposed after crash recovery if the dirty + * data in this range is not yet on disk, but stale data will + * never be exposed. + */ + new_disksize =3D is_ordered ? 
i_size : min(end, i_size); if (new_disksize > ei->i_disksize) ei->i_disksize =3D new_disksize; up_write(&ei->i_data_sem); @@ -678,6 +689,7 @@ static void ext4_iomap_finish_ioend(struct iomap_ioend = *ioend) struct super_block *sb =3D inode->i_sb; loff_t pos =3D ioend->io_offset; size_t size =3D ioend->io_size; + unsigned long io_mode =3D (unsigned long)ioend->io_private; handle_t *handle; int credits; int ret, err; @@ -707,7 +719,8 @@ static void ext4_iomap_finish_ioend(struct iomap_ioend = *ioend) goto out_journal; } =20 - ret =3D ext4_iomap_wb_update_disksize(handle, inode, pos + size); + ret =3D ext4_iomap_wb_update_disksize(handle, inode, pos + size, + io_mode =3D=3D EXT4_IOMAP_IOEND_ORDER_IO); out_journal: err =3D ext4_journal_stop(handle); if (!ret) --=20 2.52.0 From nobody Wed Jun 17 02:57:53 2026 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AC4AD35838A; Wed, 22 Apr 2026 02:17:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824227; cv=none; b=Ts3OCMYw0i8y7IODhCUdn1UXlgSIA/Lvvad4pC2aCuMqpPB+l7pTDFhXEWE9RZhSgsgrZI6522C2lJwdyfR6ohrJ4puBTi5gwcCeKhrzVDfnR8B7FzhAAdwWSyW/JOBdpUK40H212DEo8llKOgZ7sIAkQyiBChpFlREpdtPxF3c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776824227; c=relaxed/simple; bh=cD0bCy3g9mJTd0A1fuNVh0uWNy9F36geLGfTMHjUsw0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Uf6iYW7O9A8GSDRNQKeE+VI0KQ8KBxcdTKDsFGlWPVjNSwB2sxH6kcNam6sta5uryHt2ZOSI708CV98t6nU7oFL0mxHCC2Ej7AMhxnoEBycl1Y2Fwoh3MMkjgm5Q3RhqV2YmhZPMf+mQ2dFYuQS+MZboPyJ2PsGjoawuJKyQhwM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; 
spf=none smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.198]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4g0jWN3CQQzYQtrW; Wed, 22 Apr 2026 10:16:00 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.252]) by mail.maildlp.com (Postfix) with ESMTP id A0CE940603; Wed, 22 Apr 2026 10:16:57 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP3 (Coremail) with SMTP id _Ch0CgB3JL6PL+hpqkgUBQ--.2635S26; Wed, 22 Apr 2026 10:16:57 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, djwong@kernel.org, hch@infradead.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH v3 22/22] ext4: add tracepoints for ordered I/O in the iomap buffered I/O path Date: Wed, 22 Apr 2026 10:10:42 +0800 Message-ID: <20260422021042.4157510-23-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> References: <20260422021042.4157510-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: _Ch0CgB3JL6PL+hpqkgUBQ--.2635S26 X-Coremail-Antispam: 1UD129KBjvJXoW3Ww1xury5uFyDuFW7Cw18uFg_yoWxCFyrpF 1DCFyrGw48Zrn09w4xXw4Iqr4YvF4rCa18try3WFyDZ3yxAr92kF47tF90vFy8tr4qkryI gF4DArWkKw1DXrJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 
rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW5JVW7JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0JUWMKtUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi To facilitate the tracing of ordered I/Os in the iomap buffered I/O path, add tracepoints to track the ordered I/O flow: - ext4_iomap_ordered_submit: trace when ordered I/O is being submitted; - ext4_iomap_ordered_complete: trace when ordered I/O completes; - ext4_iomap_disksize_update: trace when i_disksize is updated, either when appending I/O or when an ordered I/O completes; - ext4_block_zero_eof - trace zero EOF partial block. 
Signed-off-by: Zhang Yi --- fs/ext4/inode.c | 4 ++ fs/ext4/page-io.c | 8 +++ include/trace/events/ext4.h | 97 +++++++++++++++++++++++++++++++++++++ 3 files changed, 109 insertions(+) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index d983336390c7..ca4284da2a2b 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4377,6 +4377,9 @@ static int ext4_iomap_writeback_submit(struct iomap_w= ritepage_ctx *wpc, ioend->io_offset + ioend->io_size); =20 if (start <=3D order_lblk && end >=3D order_lblk + order_len) { + trace_ext4_iomap_ordered_submit(ioend->io_inode, + ioend->io_offset, ioend->io_size, + order_lblk, order_len); ioend->io_bio.bi_end_io =3D ext4_iomap_end_bio; ioend->io_private =3D (void *)EXT4_IOMAP_IOEND_ORDER_IO; ioend->io_flags |=3D IOMAP_IOEND_BOUNDARY; @@ -4879,6 +4882,7 @@ int ext4_block_zero_eof(struct inode *inode, loff_t f= rom, loff_t end) } } =20 + trace_ext4_block_zero_eof(inode, from, length, did_zero, zero_written); return 0; } =20 diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c index 589c74b9f8a3..979a88c38fff 100644 --- a/fs/ext4/page-io.c +++ b/fs/ext4/page-io.c @@ -31,6 +31,8 @@ #include "xattr.h" #include "acl.h" =20 +#include + static struct kmem_cache *io_end_cachep; static struct kmem_cache *io_end_vec_cachep; =20 @@ -673,6 +675,9 @@ static int ext4_iomap_wb_update_disksize(handle_t *hand= le, struct inode *inode, * never be exposed. */ new_disksize =3D is_ordered ? i_size : min(end, i_size); + trace_ext4_iomap_disksize_update(inode, end, i_size, ei->i_disksize, + new_disksize, is_ordered); + if (new_disksize > ei->i_disksize) ei->i_disksize =3D new_disksize; up_write(&ei->i_data_sem); @@ -782,6 +787,9 @@ void ext4_iomap_end_bio(struct bio *bio) * waiters. 
*/ smp_store_release(&ei->i_ordered_len, 0); + trace_ext4_iomap_ordered_complete(inode, ioend->io_offset, + ioend->io_size, READ_ONCE(ei->i_ordered_lblk), + READ_ONCE(ei->i_ordered_len)); wake_up_all(&ei->i_ordered_wq); goto defer; } diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index ebafa06cd191..423aec6d09d1 100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -3141,6 +3141,103 @@ DEFINE_SET_IOMAP_EVENT(ext4_iomap_buffered_write_be= gin); DEFINE_SET_IOMAP_EVENT(ext4_iomap_map_writeback_range); DEFINE_SET_IOMAP_EVENT(ext4_iomap_zero_begin); =20 +/* Ordered I/O tracepoints for iomap buffered I/O path */ +DECLARE_EVENT_CLASS(ext4_iomap_ordered_io, + TP_PROTO(struct inode *inode, loff_t io_offset, size_t io_size, + ext4_lblk_t i_ordered_lblk, unsigned int i_ordered_len), + TP_ARGS(inode, io_offset, io_size, i_ordered_lblk, i_ordered_len), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(u64, ino) + __field(loff_t, io_offset) + __field(size_t, io_size) + __field(ext4_lblk_t, i_ordered_lblk) + __field(unsigned int, i_ordered_len) + ), + TP_fast_assign( + __entry->dev =3D inode->i_sb->s_dev; + __entry->ino =3D inode->i_ino; + __entry->io_offset =3D io_offset; + __entry->io_size =3D io_size; + __entry->i_ordered_lblk =3D i_ordered_lblk; + __entry->i_ordered_len =3D i_ordered_len; + ), + TP_printk("dev %d:%d ino %llu io_offset %lld io_size %zu i_ordered_lblk %= u i_ordered_len %u", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ino, __entry->io_offset, __entry->io_size, + __entry->i_ordered_lblk, __entry->i_ordered_len) +); + +DEFINE_EVENT(ext4_iomap_ordered_io, ext4_iomap_ordered_submit, + TP_PROTO(struct inode *inode, loff_t io_offset, size_t io_size, + ext4_lblk_t i_ordered_lblk, unsigned int i_ordered_len), + TP_ARGS(inode, io_offset, io_size, i_ordered_lblk, i_ordered_len) +); + +DEFINE_EVENT(ext4_iomap_ordered_io, ext4_iomap_ordered_complete, + TP_PROTO(struct inode *inode, loff_t io_offset, size_t 
io_size, + ext4_lblk_t i_ordered_lblk, unsigned int i_ordered_len), + TP_ARGS(inode, io_offset, io_size, i_ordered_lblk, i_ordered_len) +); + + +/* i_disksize update tracepoint */ +TRACE_EVENT(ext4_iomap_disksize_update, + TP_PROTO(struct inode *inode, loff_t end, loff_t i_size, + loff_t i_disksize, loff_t new_disksize, bool is_ordered), + TP_ARGS(inode, end, i_size, i_disksize, new_disksize, is_ordered), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(u64, ino) + __field(loff_t, end) + __field(loff_t, i_size) + __field(loff_t, i_disksize) + __field(loff_t, new_disksize) + __field(bool, is_ordered) + ), + TP_fast_assign( + __entry->dev =3D inode->i_sb->s_dev; + __entry->ino =3D inode->i_ino; + __entry->end =3D end; + __entry->i_size =3D i_size; + __entry->i_disksize =3D i_disksize; + __entry->new_disksize =3D new_disksize; + __entry->is_ordered =3D is_ordered; + ), + TP_printk("dev %d:%d ino %llu end %lld i_size %lld i_disksize %lld new_di= sksize %lld is_ordered %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ino, __entry->end, __entry->i_size, + __entry->i_disksize, __entry->new_disksize, + __entry->is_ordered) +); + +/* Block zero EOF tracepoint */ +TRACE_EVENT(ext4_block_zero_eof, + TP_PROTO(struct inode *inode, loff_t from, loff_t length, + bool did_zero, bool zero_written), + TP_ARGS(inode, from, length, did_zero, zero_written), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(u64, ino) + __field(loff_t, from) + __field(loff_t, length) + __field(bool, did_zero) + __field(bool, zero_written) + ), + TP_fast_assign( + __entry->dev =3D inode->i_sb->s_dev; + __entry->ino =3D inode->i_ino; + __entry->from =3D from; + __entry->length =3D length; + __entry->did_zero =3D did_zero; + __entry->zero_written =3D zero_written; + ), + TP_printk("dev %d:%d ino %llu zero EOF from %lld length %lld did_zero %d = zero_written %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ino, __entry->from, __entry->length, + __entry->did_zero, 
__entry->zero_written) +); + #endif /* _TRACE_EXT4_H */ =20 /* This part must be outside protection */ --=20 2.52.0