From nobody Sun Feb 8 10:32:56 2026
From: Zhang Yi
To: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz,
    ojaswin@linux.ibm.com, ritesh.list@gmail.com, hch@infradead.org,
    djwong@kernel.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
    yizhang089@gmail.com, libaokun1@huawei.com, yangerkun@huawei.com,
    yukuai@fnnas.com
Subject: [PATCH -next v2 01/22] ext4: make ext4_block_zero_page_range() pass out did_zero
Date: Tue, 3 Feb 2026 14:25:01 +0800
Message-ID: <20260203062523.3869120-2-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>
References: <20260203062523.3869120-1-yi.zhang@huawei.com>

Modify ext4_block_zero_page_range() to pass out a did_zero parameter,
which is set to true once a partial block has been zeroed out. This
prepares for moving the ordered data handling out of
__ext4_block_zero_page_range(), which is being adapted for the
conversion of the block zeroing range to the iomap infrastructure.

Signed-off-by: Zhang Yi
---
 fs/ext4/inode.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index da96db5f2345..759a2a031a9d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4030,7 +4030,8 @@ void ext4_set_aops(struct inode *inode)
  * racing writeback can come later and flush the stale pagecache to disk.
  */
 static int __ext4_block_zero_page_range(handle_t *handle,
-		struct address_space *mapping, loff_t from, loff_t length)
+		struct address_space *mapping, loff_t from, loff_t length,
+		bool *did_zero)
 {
 	unsigned int offset, blocksize, pos;
 	ext4_lblk_t iblock;
@@ -4118,6 +4119,8 @@ static int __ext4_block_zero_page_range(handle_t *handle,
 			err = ext4_jbd2_inode_add_write(handle, inode, from,
 							length);
 	}
+	if (!err && did_zero)
+		*did_zero = true;
 
 unlock:
 	folio_unlock(folio);
@@ -4133,7 +4136,8 @@ static int __ext4_block_zero_page_range(handle_t *handle,
  * that corresponds to 'from'
  */
 static int ext4_block_zero_page_range(handle_t *handle,
-		struct address_space *mapping, loff_t from, loff_t length)
+		struct address_space *mapping, loff_t from, loff_t length,
+		bool *did_zero)
 {
 	struct inode *inode = mapping->host;
 	unsigned blocksize = inode->i_sb->s_blocksize;
@@ -4147,10 +4151,11 @@ static int ext4_block_zero_page_range(handle_t *handle,
 		length = max;
 
 	if (IS_DAX(inode)) {
-		return dax_zero_range(inode, from, length, NULL,
+		return dax_zero_range(inode, from, length, did_zero,
 				      &ext4_iomap_ops);
 	}
-	return __ext4_block_zero_page_range(handle, mapping, from, length);
+	return __ext4_block_zero_page_range(handle, mapping, from, length,
+					    did_zero);
 }
 
 /*
@@ -4173,7 +4178,7 @@ static int ext4_block_truncate_page(handle_t *handle,
 	blocksize = i_blocksize(inode);
 	length = blocksize - (from & (blocksize - 1));
 
-	return ext4_block_zero_page_range(handle, mapping, from, length);
+	return ext4_block_zero_page_range(handle, mapping, from, length, NULL);
 }
 
 int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
@@ -4196,13 +4201,13 @@ int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
 	if (start == end &&
 	    (partial_start || (partial_end != sb->s_blocksize - 1))) {
 		err = ext4_block_zero_page_range(handle, mapping,
-						 lstart, length);
+						 lstart, length, NULL);
 		return err;
 	}
 	/* Handle partial zero out on the start of the range */
 	if (partial_start) {
-		err = ext4_block_zero_page_range(handle, mapping,
-						 lstart, sb->s_blocksize);
+		err = ext4_block_zero_page_range(handle, mapping, lstart,
+						 sb->s_blocksize, NULL);
 		if (err)
 			return err;
 	}
@@ -4210,7 +4215,7 @@ int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
 	if (partial_end != sb->s_blocksize - 1)
 		err = ext4_block_zero_page_range(handle, mapping,
 						 byte_end - partial_end,
-						 partial_end + 1);
+						 partial_end + 1, NULL);
 	return err;
 }
 
-- 
2.52.0

From nobody Sun Feb 8 10:32:56 2026
From: Zhang Yi
To: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz,
    ojaswin@linux.ibm.com, ritesh.list@gmail.com, hch@infradead.org,
    djwong@kernel.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
    yizhang089@gmail.com, libaokun1@huawei.com, yangerkun@huawei.com,
    yukuai@fnnas.com
Subject: [PATCH -next v2 02/22] ext4: make ext4_block_truncate_page() return zeroed length
Date: Tue, 3 Feb 2026 14:25:02 +0800
Message-ID: <20260203062523.3869120-3-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>
References: <20260203062523.3869120-1-yi.zhang@huawei.com>

Modify ext4_block_truncate_page() to return the zeroed length on
success. This prepares for moving the ordered data handling out of
__ext4_block_zero_page_range(), which in turn prepares for converting
the block zeroing range to the iomap infrastructure.
Signed-off-by: Zhang Yi
---
 fs/ext4/inode.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 759a2a031a9d..f856ea015263 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4163,6 +4163,7 @@ static int ext4_block_zero_page_range(handle_t *handle,
  * up to the end of the block which corresponds to `from'.
  * This required during truncate. We need to physically zero the tail end
  * of that block so it doesn't yield old data if the file is later grown.
+ * Return the zeroed length on success.
  */
 static int ext4_block_truncate_page(handle_t *handle,
 		struct address_space *mapping, loff_t from)
@@ -4170,6 +4171,8 @@ static int ext4_block_truncate_page(handle_t *handle,
 	unsigned length;
 	unsigned blocksize;
 	struct inode *inode = mapping->host;
+	bool did_zero = false;
+	int err;
 
 	/* If we are processing an encrypted inode during orphan list handling */
 	if (IS_ENCRYPTED(inode) && !fscrypt_has_encryption_key(inode))
@@ -4178,7 +4181,12 @@ static int ext4_block_truncate_page(handle_t *handle,
 	blocksize = i_blocksize(inode);
 	length = blocksize - (from & (blocksize - 1));
 
-	return ext4_block_zero_page_range(handle, mapping, from, length, NULL);
+	err = ext4_block_zero_page_range(handle, mapping, from, length,
+					 &did_zero);
+	if (err)
+		return err;
+
+	return did_zero ? length : 0;
 }
 
 int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
-- 
2.52.0

From nobody Sun Feb 8 10:32:56 2026
From: Zhang Yi
To: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz,
    ojaswin@linux.ibm.com, ritesh.list@gmail.com, hch@infradead.org,
    djwong@kernel.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
    yizhang089@gmail.com, libaokun1@huawei.com, yangerkun@huawei.com,
    yukuai@fnnas.com
Subject: [PATCH -next v2 03/22] ext4: only order data when partially block truncating down
Date: Tue, 3 Feb 2026 14:25:03 +0800
Message-ID: <20260203062523.3869120-4-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>
References: <20260203062523.3869120-1-yi.zhang@huawei.com>

Currently, __ext4_block_zero_page_range() is called to zero out the
data in partial blocks in the following four cases:

1. Truncate down.
2. Truncate up.
3. Block allocation (e.g. fallocate) or append writes across a range
   extending beyond the end of the file (EOF).
4. Partial block punch hole.

In the default ordered data mode, __ext4_block_zero_page_range() writes
the zeroed data back to disk through an ordered journal operation after
zeroing out. Among cases 1, 2, and 3 above, only case 1 actually
requires this ordered write. Assuming no one intentionally bypasses the
file system to write directly to the disk, it is sufficient for a
truncate down operation to ensure that the data beyond EOF is zeroed
out before i_disksize is updated; this prevents old data from being
exposed when the file is later extended. In other words, as long as the
on-disk data in case 1 is properly zeroed out, only the data in memory
needs to be zeroed out in cases 2 and 3, without requiring ordered
data. Case 4 does not require ordered data either, because the punch
hole operation as a whole provides no atomicity guarantees.

Therefore, it is safe to move the ordered data handling from
__ext4_block_zero_page_range() to ext4_truncate(). Note that after this
change, whether to perform an ordered data operation can only be
determined by whether the target block has been zeroed, rather than by
the state of the buffer head. Consequently, an unnecessary ordered data
operation may occur when truncating an unwritten dirty block. However,
this scenario is relatively rare, so the overall impact is minimal.
This is prepared for the conversion to the iomap infrastructure since it doesn't use ordered data mode and requires active writeback, which reduces the complexity of the conversion. Signed-off-by: Zhang Yi --- fs/ext4/inode.c | 32 +++++++++++++++++++------------- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index f856ea015263..20b60abcf777 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4106,19 +4106,10 @@ static int __ext4_block_zero_page_range(handle_t *h= andle, folio_zero_range(folio, offset, length); BUFFER_TRACE(bh, "zeroed end of block"); =20 - if (ext4_should_journal_data(inode)) { + if (ext4_should_journal_data(inode)) err =3D ext4_dirty_journalled_data(handle, bh); - } else { + else mark_buffer_dirty(bh); - /* - * Only the written block requires ordered data to prevent - * exposing stale data. - */ - if (!buffer_unwritten(bh) && !buffer_delay(bh) && - ext4_should_order_data(inode)) - err =3D ext4_jbd2_inode_add_write(handle, inode, from, - length); - } if (!err && did_zero) *did_zero =3D true; =20 @@ -4578,8 +4569,23 @@ int ext4_truncate(struct inode *inode) goto out_trace; } =20 - if (inode->i_size & (inode->i_sb->s_blocksize - 1)) - ext4_block_truncate_page(handle, mapping, inode->i_size); + if (inode->i_size & (inode->i_sb->s_blocksize - 1)) { + unsigned int zero_len; + + zero_len =3D ext4_block_truncate_page(handle, mapping, + inode->i_size); + if (zero_len < 0) { + err =3D zero_len; + goto out_stop; + } + if (zero_len && !IS_DAX(inode) && + ext4_should_order_data(inode)) { + err =3D ext4_jbd2_inode_add_write(handle, inode, + inode->i_size, zero_len); + if (err) + goto out_stop; + } + } =20 /* * We add the inode to the orphan list, so that if this --=20 2.52.0 From nobody Sun Feb 8 10:32:56 2026 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
smtp.subspace.kernel.org (Postfix) with ESMTPS id D6CD723FC41; Tue, 3 Feb 2026 06:30:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770100220; cv=none; b=HP0VKHxGyaW8HZCx+YRB/rrjY3gT2GvXSCX3zIUClQDJSdr07UCah0UbWKCAHwC7kW7YiyjHMpOvnAr0lh0XWePT9OyuXnpvNk1xl2rmcwBSDkMMdD40N/Lun/A1OFi/ofCEhFeeozXW4FuUGzAU6wAC08FWzUnDvCi+dEN7RTI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770100220; c=relaxed/simple; bh=SDzKcbZvhkkzWPKHGZlcKBfdQ1Ey6Gz96qtbAlPSZfM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fVvxjCP6G2tZa3YNFlD+rSiK9bR4ax3hTDQSfUM/+IxhIHnWEP6G3K5hNCT7FkbUDdgGcP5ZV7rZkPnZCqO7ITZFCEjZX+L4nWY8PoBROF5Pod85QlhkI0MQdAMMvZleJ0P5/y8KSSU4OPooLTig6J79n/kUxPoDbM1avT6ZeFM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.170]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4f4tqq0KJdzYQtxj; Tue, 3 Feb 2026 14:29:27 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id F39984056F; Tue, 3 Feb 2026 14:30:13 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP4 (Coremail) with SMTP id gCh0CgAHaPjnlYFpiadbGA--.27803S8; Tue, 03 Feb 2026 14:30:13 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, 
yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, libaokun1@huawei.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH -next v2 04/22] ext4: factor out journalled block zeroing range Date: Tue, 3 Feb 2026 14:25:04 +0800 Message-ID: <20260203062523.3869120-5-yi.zhang@huawei.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com> References: <20260203062523.3869120-1-yi.zhang@huawei.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgAHaPjnlYFpiadbGA--.27803S8 X-Coremail-Antispam: 1UD129KBjvJXoWxXw15AF43Aw45AFyfuF1ftFb_yoWrWF4fpr y5K34DurW7ur9FgF4Sq3ZFqr1a934rWrW8WFyxGr93Za4YqF17KFyUK3WFqF45Kr47Ga40 qF4Yy347u3WUJ3DanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUHab4IE77IF4wAFF20E14v26rWj6s0DM7CY07I20VC2zVCF04k2 6cxKx2IYs7xG6rWj6s0DM7CIcVAFz4kK6r1j6r18M28IrcIa0xkI8VA2jI8067AKxVWUAV Cq3wA2048vs2IY020Ec7CjxVAFwI0_Xr0E3s1l8cAvFVAK0II2c7xJM28CjxkF64kEwVA0 rcxSw2x7M28EF7xvwVC0I7IYx2IY67AKxVWDJVCq3wA2z4x0Y4vE2Ix0cI8IcVCY1x0267 AKxVW8Jr0_Cr1UM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv6xkF7I0E 14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40Ex7 xfMcIj6xIIjxv20xvE14v26r1j6r18McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x0Y z7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7Iv64x0x7Aq67IIx4CEVc8vx2IErcIFxwACI4 02YVCY1x02628vn2kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCF 04k20xvEw4C26cxK6c8Ij28IcwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14 v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIxkG c2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY6xIIjxv20xvEc7CjxVAFwI 0_Cr0_Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8 JwCI42IY6I8E87Iv6xkF7I0E14v26r4j6r4UJbIYCTnIWIevJa73UjIFyTuYvjxUFPETDU UUU Sender: yi.zhang@huaweicloud.com X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; 
charset="utf-8" Refactor __ext4_block_zero_page_range() by separating the block zeroing operations for ordered data mode and journal data mode into two distinct functions. Additionally, extract a common helper, ext4_block_get_zero_range(), to identify the buffer that requires zeroing. Signed-off-by: Zhang Yi --- fs/ext4/inode.c | 84 ++++++++++++++++++++++++++++++++++++------------- 1 file changed, 63 insertions(+), 21 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 20b60abcf777..7990ad566e10 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4029,13 +4029,12 @@ void ext4_set_aops(struct inode *inode) * ext4_punch_hole, etc) which needs to be properly zeroed out. Otherwise a * racing writeback can come later and flush the stale pagecache to disk. */ -static int __ext4_block_zero_page_range(handle_t *handle, - struct address_space *mapping, loff_t from, loff_t length, - bool *did_zero) +static struct buffer_head *ext4_block_get_zero_range(struct inode *inode, + loff_t from, loff_t length) { unsigned int offset, blocksize, pos; ext4_lblk_t iblock; - struct inode *inode =3D mapping->host; + struct address_space *mapping =3D inode->i_mapping; struct buffer_head *bh; struct folio *folio; int err =3D 0; @@ -4044,7 +4043,7 @@ static int __ext4_block_zero_page_range(handle_t *han= dle, FGP_LOCK | FGP_ACCESSED | FGP_CREAT, mapping_gfp_constraint(mapping, ~__GFP_FS)); if (IS_ERR(folio)) - return PTR_ERR(folio); + return ERR_CAST(folio); =20 blocksize =3D inode->i_sb->s_blocksize; =20 @@ -4096,24 +4095,65 @@ static int __ext4_block_zero_page_range(handle_t *h= andle, } } } - if (ext4_should_journal_data(inode)) { - BUFFER_TRACE(bh, "get write access"); - err =3D ext4_journal_get_write_access(handle, inode->i_sb, bh, - EXT4_JTR_NONE); - if (err) - goto unlock; - } - folio_zero_range(folio, offset, length); + return bh; + +unlock: + folio_unlock(folio); + folio_put(folio); + return err ? 
ERR_PTR(err) : NULL; +} + +static int ext4_block_zero_range(struct inode *inode, loff_t from, + loff_t length, bool *did_zero) +{ + struct buffer_head *bh; + struct folio *folio; + + bh =3D ext4_block_get_zero_range(inode, from, length); + if (IS_ERR_OR_NULL(bh)) + return PTR_ERR_OR_ZERO(bh); + + folio =3D bh->b_folio; + folio_zero_range(folio, offset_in_folio(folio, from), length); BUFFER_TRACE(bh, "zeroed end of block"); =20 - if (ext4_should_journal_data(inode)) - err =3D ext4_dirty_journalled_data(handle, bh); - else - mark_buffer_dirty(bh); - if (!err && did_zero) + mark_buffer_dirty(bh); + if (did_zero) *did_zero =3D true; =20 -unlock: + folio_unlock(folio); + folio_put(folio); + return 0; +} + +static int ext4_journalled_block_zero_range(handle_t *handle, + struct inode *inode, loff_t from, loff_t length, bool *did_zero) +{ + struct buffer_head *bh; + struct folio *folio; + int err; + + bh =3D ext4_block_get_zero_range(inode, from, length); + if (IS_ERR_OR_NULL(bh)) + return PTR_ERR_OR_ZERO(bh); + folio =3D bh->b_folio; + + BUFFER_TRACE(bh, "get write access"); + err =3D ext4_journal_get_write_access(handle, inode->i_sb, bh, + EXT4_JTR_NONE); + if (err) + goto out; + + folio_zero_range(folio, offset_in_folio(folio, from), length); + BUFFER_TRACE(bh, "zeroed end of block"); + + err =3D ext4_dirty_journalled_data(handle, bh); + if (err) + goto out; + + if (did_zero) + *did_zero =3D true; +out: folio_unlock(folio); folio_put(folio); return err; @@ -4144,9 +4184,11 @@ static int ext4_block_zero_page_range(handle_t *hand= le, if (IS_DAX(inode)) { return dax_zero_range(inode, from, length, did_zero, &ext4_iomap_ops); + } else if (ext4_should_journal_data(inode)) { + return ext4_journalled_block_zero_range(handle, inode, from, + length, did_zero); } - return __ext4_block_zero_page_range(handle, mapping, from, length, - did_zero); + return ext4_block_zero_range(inode, from, length, did_zero); } =20 /* --=20 2.52.0 From nobody Sun Feb 8 10:32:56 2026 Received: from 
dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0AAB7280309; Tue, 3 Feb 2026 06:30:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770100220; cv=none; b=N2MJ1UjmiuJVWN3doM7zT3uxF+hnXNKOsdxusr4/XM2Rr6y8hHz2gu8k+1CHDK+n7VbX0KUccwHQNCU5scKPUYTRiUA+G95bLPMm96jCokdHYaIn6cVmDCw8zPIVAH0oEnHp5FvlcoexuRs94QSznhpCew/WTfO6hL/xYMiIoLQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770100220; c=relaxed/simple; bh=0H3W+6a71flSLzpqq6UEC3TaLlEPPuhZoaLQAw8asCU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=hZtPGrmHxpGGRV68+kBiEEjxDBzPG5UuBXsZZhUTWQkx5jibj9w6WS9JI0h9+3UixhQJvgqg5R8CqUTirco4NfU6Xl6RefpVBQXMLzvHikmvbETP0RmWX1R+iOC+poYcqJhZsFkV1LDmrTzKl8gCSFEV4Wjn7NKT6QEhvAEr12M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.198]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4f4tqq14jkzYQtxx; Tue, 3 Feb 2026 14:29:27 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 18EA440577; Tue, 3 Feb 2026 14:30:14 +0800 (CST) Received: from huaweicloud.com (unknown [10.50.85.155]) by APP4 (Coremail) with SMTP id gCh0CgAHaPjnlYFpiadbGA--.27803S9; Tue, 03 Feb 2026 14:30:13 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, 
linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, libaokun1@huawei.com, yangerkun@huawei.com, yukuai@fnnas.com Subject: [PATCH -next v2 05/22] ext4: stop passing handle to ext4_journalled_block_zero_range() Date: Tue, 3 Feb 2026 14:25:05 +0800 Message-ID: <20260203062523.3869120-6-yi.zhang@huawei.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com> References: <20260203062523.3869120-1-yi.zhang@huawei.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgAHaPjnlYFpiadbGA--.27803S9 X-Coremail-Antispam: 1UD129KBjvJXoW3GF15WFW7tw48WFW3WryxGrg_yoWfXr1Dpr yUAw1rCr43uryq9F4xKFsFvr4a93Z7GFW8Gry7Gr9YvasrXw1xKF1DK3WrtFWjqrW7Wa10 vF4Yy34jg3WUJ3DanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUHab4IE77IF4wAFF20E14v26rWj6s0DM7CY07I20VC2zVCF04k2 6cxKx2IYs7xG6rWj6s0DM7CIcVAFz4kK6r1j6r18M28IrcIa0xkI8VA2jI8067AKxVWUAV Cq3wA2048vs2IY020Ec7CjxVAFwI0_Xr0E3s1l8cAvFVAK0II2c7xJM28CjxkF64kEwVA0 rcxSw2x7M28EF7xvwVC0I7IYx2IY67AKxVWDJVCq3wA2z4x0Y4vE2Ix0cI8IcVCY1x0267 AKxVW8Jr0_Cr1UM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv6xkF7I0E 14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40Ex7 xfMcIj6xIIjxv20xvE14v26r1j6r18McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x0Y z7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7Iv64x0x7Aq67IIx4CEVc8vx2IErcIFxwACI4 02YVCY1x02628vn2kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0EwIxGrwCF 04k20xvEw4C26cxK6c8Ij28IcwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14 v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIxkG c2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI42IY6xIIjxv20xvEc7CjxVAFwI 0_Cr0_Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8 
When zeroing partial blocks, only the journal data mode requires an
active journal handle. Therefore, stop passing the handle to
ext4_zero_partial_blocks() and related functions, and make
ext4_journalled_block_zero_range() start a handle independently.
Currently, this change has no practical impact because all calls occur
within the context of an active handle.

This change prepares for moving ext4_block_truncate_page() out of an
active handle, which is a prerequisite for converting block zero range
operations to the iomap infrastructure because it requires active
writeback after truncate down.

Signed-off-by: Zhang Yi
---
 fs/ext4/ext4.h    |  4 ++--
 fs/ext4/extents.c |  6 +++---
 fs/ext4/inode.c   | 54 +++++++++++++++++++++++++----------------------
 3 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index e0ed273e2e8a..19d0b4917aea 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3103,8 +3103,8 @@ extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
 extern int ext4_chunk_trans_extent(struct inode *inode, int nrblocks);
 extern int ext4_meta_trans_blocks(struct inode *inode, int lblocks,
				  int pextents);
-extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
-				    loff_t lstart, loff_t lend);
+extern int ext4_zero_partial_blocks(struct inode *inode,
+				    loff_t lstart, loff_t lend);
 extern vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf);
 extern qsize_t *ext4_get_reserved_space(struct inode *inode);
 extern int ext4_get_projid(struct inode *inode, kprojid_t *projid);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 3630b27e4fd7..953bf8945bda 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4627,8 +4627,8 @@ static int ext4_alloc_file_blocks(struct file *file, ext4_lblk_t offset,
				      inode_get_ctime(inode));
 		if (epos > old_size) {
 			pagecache_isize_extended(inode, old_size, epos);
-			ext4_zero_partial_blocks(handle, inode,
-						 old_size, epos - old_size);
+			ext4_zero_partial_blocks(inode, old_size,
+						 epos - old_size);
 		}
 	}
 	ret2 = ext4_mark_inode_dirty(handle, inode);
@@ -4746,7 +4746,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 	}
 
 	/* Zero out partial block at the edges of the range */
-	ret = ext4_zero_partial_blocks(handle, inode, offset, len);
+	ret = ext4_zero_partial_blocks(inode, offset, len);
 	if (ret)
 		goto out_handle;
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 7990ad566e10..c05b1c0a1b45 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1458,7 +1458,7 @@ static int ext4_write_end(const struct kiocb *iocb,
 
 	if (old_size < pos && !verity) {
 		pagecache_isize_extended(inode, old_size, pos);
-		ext4_zero_partial_blocks(handle, inode, old_size, pos - old_size);
+		ext4_zero_partial_blocks(inode, old_size, pos - old_size);
 	}
 	/*
	 * Don't mark the inode dirty under folio lock. First, it unnecessarily
@@ -1576,7 +1576,7 @@ static int ext4_journalled_write_end(const struct kiocb *iocb,
 
 	if (old_size < pos && !verity) {
 		pagecache_isize_extended(inode, old_size, pos);
-		ext4_zero_partial_blocks(handle, inode, old_size, pos - old_size);
+		ext4_zero_partial_blocks(inode, old_size, pos - old_size);
 	}
 
 	if (size_changed) {
@@ -3252,7 +3252,7 @@ static int ext4_da_do_write_end(struct address_space *mapping,
 	if (IS_ERR(handle))
 		return PTR_ERR(handle);
 	if (zero_len)
-		ext4_zero_partial_blocks(handle, inode, old_size, zero_len);
+		ext4_zero_partial_blocks(inode, old_size, zero_len);
 	ext4_mark_inode_dirty(handle, inode);
 	ext4_journal_stop(handle);
 
@@ -4126,16 +4126,23 @@ static int ext4_block_zero_range(struct inode *inode, loff_t from,
 	return 0;
 }
 
-static int ext4_journalled_block_zero_range(handle_t *handle,
-		struct inode *inode, loff_t from, loff_t length, bool *did_zero)
+static int ext4_journalled_block_zero_range(struct inode *inode, loff_t from,
+					    loff_t length, bool *did_zero)
 {
 	struct buffer_head *bh;
 	struct folio *folio;
+	handle_t *handle;
 	int err;
 
+	handle = ext4_journal_start(inode, EXT4_HT_MISC, 1);
+	if (IS_ERR(handle))
+		return PTR_ERR(handle);
+
 	bh = ext4_block_get_zero_range(inode, from, length);
-	if (IS_ERR_OR_NULL(bh))
-		return PTR_ERR_OR_ZERO(bh);
+	if (IS_ERR_OR_NULL(bh)) {
+		err = PTR_ERR_OR_ZERO(bh);
+		goto out_handle;
+	}
 	folio = bh->b_folio;
 
 	BUFFER_TRACE(bh, "get write access");
@@ -4156,6 +4163,8 @@ static int ext4_journalled_block_zero_range(handle_t *handle,
 out:
 	folio_unlock(folio);
 	folio_put(folio);
+out_handle:
+	ext4_journal_stop(handle);
 	return err;
 }
 
@@ -4166,9 +4175,9 @@ static int ext4_journalled_block_zero_range(handle_t *handle,
  * the end of the block it will be shortened to end of the block
  * that corresponds to 'from'
  */
-static int ext4_block_zero_page_range(handle_t *handle,
-		struct address_space *mapping, loff_t from, loff_t length,
-		bool *did_zero)
+static int ext4_block_zero_page_range(struct address_space *mapping,
+				      loff_t from, loff_t length,
+				      bool *did_zero)
 {
 	struct inode *inode = mapping->host;
 	unsigned blocksize = inode->i_sb->s_blocksize;
@@ -4185,7 +4194,7 @@ static int ext4_block_zero_page_range(handle_t *handle,
 		return dax_zero_range(inode, from, length, did_zero,
				      &ext4_iomap_ops);
 	} else if (ext4_should_journal_data(inode)) {
-		return ext4_journalled_block_zero_range(handle, inode, from,
+		return ext4_journalled_block_zero_range(inode, from,
						length, did_zero);
 	}
 	return ext4_block_zero_range(inode, from, length, did_zero);
@@ -4198,8 +4207,7 @@ static int ext4_block_zero_page_range(handle_t *handle,
  * of that block so it doesn't yield old data if the file is later grown.
  * Return the zeroed length on success.
  */
-static int ext4_block_truncate_page(handle_t *handle,
-		struct address_space *mapping, loff_t from)
+static int ext4_block_truncate_page(struct address_space *mapping, loff_t from)
 {
 	unsigned length;
 	unsigned blocksize;
@@ -4214,16 +4222,14 @@ static int ext4_block_truncate_page(handle_t *handle,
 	blocksize = i_blocksize(inode);
 	length = blocksize - (from & (blocksize - 1));
 
-	err = ext4_block_zero_page_range(handle, mapping, from, length,
-					 &did_zero);
+	err = ext4_block_zero_page_range(mapping, from, length, &did_zero);
 	if (err)
 		return err;
 
 	return did_zero ? length : 0;
 }
 
-int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
-			     loff_t lstart, loff_t length)
+int ext4_zero_partial_blocks(struct inode *inode, loff_t lstart, loff_t length)
 {
 	struct super_block *sb = inode->i_sb;
 	struct address_space *mapping = inode->i_mapping;
@@ -4241,20 +4247,19 @@ int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
 	/* Handle partial zero within the single block */
 	if (start == end &&
	    (partial_start || (partial_end != sb->s_blocksize - 1))) {
-		err = ext4_block_zero_page_range(handle, mapping,
-						 lstart, length, NULL);
+		err = ext4_block_zero_page_range(mapping, lstart, length, NULL);
 		return err;
 	}
 	/* Handle partial zero out on the start of the range */
 	if (partial_start) {
-		err = ext4_block_zero_page_range(handle, mapping, lstart,
+		err = ext4_block_zero_page_range(mapping, lstart,
						 sb->s_blocksize, NULL);
 		if (err)
 			return err;
 	}
 	/* Handle partial zero out on the end of the range */
 	if (partial_end != sb->s_blocksize - 1)
-		err = ext4_block_zero_page_range(handle, mapping,
+		err = ext4_block_zero_page_range(mapping,
						 byte_end - partial_end,
						 partial_end + 1, NULL);
 	return err;
@@ -4462,7 +4467,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 		return ret;
 	}
 
-	ret = ext4_zero_partial_blocks(handle, inode, offset, length);
+	ret = ext4_zero_partial_blocks(inode, offset, length);
 	if (ret)
 		goto out_handle;
 
@@ -4614,8 +4619,7 @@ int ext4_truncate(struct inode *inode)
 	if (inode->i_size & (inode->i_sb->s_blocksize - 1)) {
 		unsigned int zero_len;
 
-		zero_len = ext4_block_truncate_page(handle, mapping,
-						    inode->i_size);
+		zero_len = ext4_block_truncate_page(mapping, inode->i_size);
 		if (zero_len < 0) {
 			err = zero_len;
 			goto out_stop;
@@ -5990,7 +5994,7 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 			inode_set_mtime_to_ts(inode,
					      inode_set_ctime_current(inode));
 			if (oldsize & (inode->i_sb->s_blocksize - 1))
-				ext4_block_truncate_page(handle,
+				ext4_block_truncate_page(
						inode->i_mapping, oldsize);
 		}
 
--
2.52.0
From nobody Sun Feb 8 10:32:56 2026
From: Zhang Yi
To: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz,
	ojaswin@linux.ibm.com, ritesh.list@gmail.com, hch@infradead.org,
	djwong@kernel.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
	yizhang089@gmail.com, libaokun1@huawei.com, yangerkun@huawei.com,
	yukuai@fnnas.com
Subject: [PATCH -next v2 06/22] ext4: don't zero partial block under an active handle when truncating down
Date: Tue, 3 Feb 2026 14:25:06 +0800
Message-ID: <20260203062523.3869120-7-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>

When truncating down, move the call to ext4_block_truncate_page()
before starting the handle. This change has no effect in non-journal
data mode because it doesn't require an active handle. However, in
journal data mode, it may cause the zeroing of partial blocks and the
release of subsequent full blocks to be distributed across different
transactions. This is safe as well because the transaction that zeroes
the blocks will always be committed first, and the entire truncate
operation does not require atomicity guarantee.

This change prepares for converting the block zero range to the iomap
infrastructure, which does not use ordered data mode and requires
active writeback to prevent exposing stale data. The writeback must be
completed before the transaction to remove the orphan is committed, and
it cannot be performed within an active handle.
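[Editor's note: the tail-block arithmetic this patch moves around can be checked in isolation. The sketch below is a user-space toy model, not ext4 code; tail_zero_len() is a hypothetical name mirroring the `blocksize - (i_size & (blocksize - 1))` computation that ext4_block_truncate_page() performs, and the `[i_size, i_size + len)` range it yields is what ordered mode later registers via ext4_jbd2_inode_add_write().]

```c
#include <assert.h>

/*
 * Toy model (editor's sketch, not kernel code): length of the post-EOF
 * tail that ext4_block_truncate_page() zeroes when the new i_size is
 * not block aligned. Returns 0 for an aligned size, where nothing
 * needs zeroing and no ordered-data range has to be registered.
 */
long long tail_zero_len(long long i_size, long long blocksize)
{
	long long in_block = i_size & (blocksize - 1);

	return in_block ? blocksize - in_block : 0;
}
```

For a 4096-byte block size, truncating to 4196 bytes leaves a 3996-byte tail to zero in the last block; truncating to a block multiple leaves none, which is why the patch can skip the ordered-data write entirely when zero_len is 0.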
Signed-off-by: Zhang Yi
---
 fs/ext4/inode.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c05b1c0a1b45..e67c750866a5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4570,7 +4570,7 @@ int ext4_inode_attach_jinode(struct inode *inode)
 int ext4_truncate(struct inode *inode)
 {
 	struct ext4_inode_info *ei = EXT4_I(inode);
-	unsigned int credits;
+	unsigned int credits, zero_len = 0;
 	int err = 0, err2;
 	handle_t *handle;
 	struct address_space *mapping = inode->i_mapping;
@@ -4603,6 +4603,12 @@ int ext4_truncate(struct inode *inode)
 		err = ext4_inode_attach_jinode(inode);
 		if (err)
 			goto out_trace;
+
+		zero_len = ext4_block_truncate_page(mapping, inode->i_size);
+		if (zero_len < 0) {
+			err = zero_len;
+			goto out_trace;
+		}
 	}
 
 	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
@@ -4616,21 +4622,12 @@ int ext4_truncate(struct inode *inode)
 		goto out_trace;
 	}
 
-	if (inode->i_size & (inode->i_sb->s_blocksize - 1)) {
-		unsigned int zero_len;
-
-		zero_len = ext4_block_truncate_page(mapping, inode->i_size);
-		if (zero_len < 0) {
-			err = zero_len;
+	/* Ordered zeroed data to prevent exposure of stale data. */
+	if (zero_len && !IS_DAX(inode) && ext4_should_order_data(inode)) {
+		err = ext4_jbd2_inode_add_write(handle, inode, inode->i_size,
+						zero_len);
+		if (err)
 			goto out_stop;
-		}
-		if (zero_len && !IS_DAX(inode) &&
-		    ext4_should_order_data(inode)) {
-			err = ext4_jbd2_inode_add_write(handle, inode,
-							inode->i_size, zero_len);
-			if (err)
-				goto out_stop;
-		}
 	}
 
 	/*
--
2.52.0
From nobody Sun Feb 8 10:32:56 2026
From: Zhang Yi
To: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz,
	ojaswin@linux.ibm.com, ritesh.list@gmail.com, hch@infradead.org,
	djwong@kernel.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
	yizhang089@gmail.com, libaokun1@huawei.com, yangerkun@huawei.com,
	yukuai@fnnas.com
Subject: [PATCH -next v2 07/22] ext4: move ext4_block_zero_page_range() out of an active handle
Date: Tue, 3 Feb 2026 14:25:07 +0800
Message-ID: <20260203062523.3869120-8-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>

In the cases of truncating up and beyond EOF with fallocate, since
truncating down, buffered writeback, and DIO write operations have
guaranteed that the on-disk data has been zeroed, only the data in
memory needs to be zeroed out. Therefore, it is safe to move the call
to ext4_block_zero_page_range() outside the active handle.

In the case of a partial zero range and a partial punch hole, the
entire operation does not require atomicity guarantees. Therefore, it
is also safe to move the ext4_block_zero_page_range() call outside the
active handle.

This change prepares for converting the block zero range to the iomap
infrastructure. The folio lock will be held during the zeroing process.
Since the iomap iteration process always holds the folio lock before
starting a new handle, we need to ensure that the folio lock is not
held while an active handle is in use; otherwise, a potential deadlock
may occur.
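[Editor's note: the work being hoisted out of the handle is the edge zeroing of ext4_zero_partial_blocks(), which splits a byte range into at most two partial-block calls. Below is a user-space toy model of that case analysis, not ext4 code; partial_zero_calls() is a hypothetical name, and plain division stands in for ext4's logical-block arithmetic.]

```c
#include <assert.h>

/*
 * Toy model (editor's sketch, not kernel code) of the case analysis in
 * ext4_zero_partial_blocks(). Returns how many
 * ext4_block_zero_page_range() calls the range needs: 0 when both
 * edges are block aligned, 1 when the range sits inside one block or
 * only one edge is unaligned, 2 when both head and tail are partial.
 */
int partial_zero_calls(long long lstart, long long length, long long bs)
{
	long long byte_end = lstart + length - 1;
	long long partial_start = lstart & (bs - 1);
	long long partial_end = byte_end & (bs - 1);
	int calls = 0;

	/* Both edges fall inside a single block: one call suffices. */
	if (lstart / bs == byte_end / bs)
		return (partial_start || partial_end != bs - 1) ? 1 : 0;
	if (partial_start)
		calls++;		/* head partial block */
	if (partial_end != bs - 1)
		calls++;		/* tail partial block */
	return calls;
}
```

Each of these calls may lock a folio, which is exactly why they must now run before ext4_journal_start() rather than inside the handle.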
Signed-off-by: Zhang Yi
---
 fs/ext4/extents.c | 31 ++++++++++++-------------------
 fs/ext4/inode.c   | 33 +++++++++++++++++----------------
 2 files changed, 29 insertions(+), 35 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 953bf8945bda..afe92e58ca8d 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4625,11 +4625,6 @@ static int ext4_alloc_file_blocks(struct file *file, ext4_lblk_t offset,
 		if (ext4_update_inode_size(inode, epos) & 0x1)
 			inode_set_mtime_to_ts(inode,
					      inode_get_ctime(inode));
-		if (epos > old_size) {
-			pagecache_isize_extended(inode, old_size, epos);
-			ext4_zero_partial_blocks(inode, old_size,
-						 epos - old_size);
-		}
 	}
 	ret2 = ext4_mark_inode_dirty(handle, inode);
 	ext4_update_inode_fsync_trans(handle, inode, 1);
@@ -4638,6 +4633,11 @@ static int ext4_alloc_file_blocks(struct file *file, ext4_lblk_t offset,
 	if (unlikely(ret2))
 		break;
 
+	if (new_size && epos > old_size) {
+		pagecache_isize_extended(inode, old_size, epos);
+		ext4_zero_partial_blocks(inode, old_size,
+					 epos - old_size);
+	}
 	if (alloc_zero &&
	    (map.m_flags & (EXT4_MAP_MAPPED | EXT4_MAP_UNWRITTEN))) {
 		ret2 = ext4_issue_zeroout(inode, map.m_lblk, map.m_pblk,
@@ -4673,7 +4673,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 	ext4_lblk_t start_lblk, end_lblk;
 	unsigned int blocksize = i_blocksize(inode);
 	unsigned int blkbits = inode->i_blkbits;
-	int ret, flags, credits;
+	int ret, flags;
 
 	trace_ext4_zero_range(inode, offset, len, mode);
 	WARN_ON_ONCE(!inode_is_locked(inode));
@@ -4731,25 +4731,18 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 	if (IS_ALIGNED(offset | end, blocksize))
 		return ret;
 
-	/*
-	 * In worst case we have to writeout two nonadjacent unwritten
-	 * blocks and update the inode
-	 */
-	credits = (2 * ext4_ext_index_trans_blocks(inode, 2)) + 1;
-	if (ext4_should_journal_data(inode))
-		credits += 2;
-	handle = ext4_journal_start(inode, EXT4_HT_MISC, credits);
+	/* Zero out partial block at the edges of the range */
+	ret = ext4_zero_partial_blocks(inode, offset, len);
+	if (ret)
+		return ret;
+
+	handle = ext4_journal_start(inode, EXT4_HT_MISC, 1);
 	if (IS_ERR(handle)) {
 		ret = PTR_ERR(handle);
 		ext4_std_error(inode->i_sb, ret);
 		return ret;
 	}
 
-	/* Zero out partial block at the edges of the range */
-	ret = ext4_zero_partial_blocks(inode, offset, len);
-	if (ret)
-		goto out_handle;
-
 	if (new_size)
 		ext4_update_inode_size(inode, new_size);
 	ret = ext4_mark_inode_dirty(handle, inode);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e67c750866a5..9c0e70256527 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4456,8 +4456,12 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 	if (ret)
 		return ret;
 
+	ret = ext4_zero_partial_blocks(inode, offset, length);
+	if (ret)
+		return ret;
+
 	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
-		credits = ext4_chunk_trans_extent(inode, 2);
+		credits = ext4_chunk_trans_extent(inode, 0);
 	else
 		credits = ext4_blocks_for_truncate(inode);
 	handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits);
@@ -4467,10 +4471,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 		return ret;
 	}
 
-	ret = ext4_zero_partial_blocks(inode, offset, length);
-	if (ret)
-		goto out_handle;
-
 	/* If there are blocks to remove, do it */
 	start_lblk = EXT4_B_TO_LBLK(inode, offset);
 	end_lblk = end >> inode->i_blkbits;
@@ -5973,15 +5973,6 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 			goto out_mmap_sem;
 		}
 
-		handle = ext4_journal_start(inode, EXT4_HT_INODE, 3);
-		if (IS_ERR(handle)) {
-			error = PTR_ERR(handle);
-			goto out_mmap_sem;
-		}
-		if (ext4_handle_valid(handle) && shrink) {
-			error = ext4_orphan_add(handle, inode);
-			orphan = 1;
-		}
 		/*
		 * Update c/mtime and tail zero the EOF folio on
		 * truncate up. ext4_truncate() handles the shrink case
@@ -5989,10 +5980,20 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
		 */
 		if (!shrink) {
 			inode_set_mtime_to_ts(inode,
-				inode_set_ctime_current(inode));
+					      inode_set_ctime_current(inode));
 			if (oldsize & (inode->i_sb->s_blocksize - 1))
 				ext4_block_truncate_page(
-					inode->i_mapping, oldsize);
+						inode->i_mapping, oldsize);
+		}
+
+		handle = ext4_journal_start(inode, EXT4_HT_INODE, 3);
+		if (IS_ERR(handle)) {
+			error = PTR_ERR(handle);
+			goto out_mmap_sem;
+		}
+		if (ext4_handle_valid(handle) && shrink) {
+			error = ext4_orphan_add(handle, inode);
+			orphan = 1;
 		}
 
 		if (shrink)
--
2.52.0
From nobody Sun Feb 8 10:32:56 2026
From: Zhang Yi
To: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz,
	ojaswin@linux.ibm.com, ritesh.list@gmail.com, hch@infradead.org,
	djwong@kernel.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
	yizhang089@gmail.com, libaokun1@huawei.com, yangerkun@huawei.com,
	yukuai@fnnas.com
Subject: [PATCH -next v2 08/22] ext4: zero post EOF partial block before appending write
Date: Tue, 3 Feb 2026 14:25:08 +0800
Message-ID: <20260203062523.3869120-9-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>

In cases of appending write beyond the end of file (EOF),
ext4_zero_partial_blocks() is called within ext4_*_write_end() to zero
out the partial block beyond the EOF. This prevents exposing stale data
that might be written through mmap. However, supporting only the
regular buffered write path is insufficient. It is also necessary to
support the DAX path as well as the upcoming iomap buffered write path.
Therefore, move this operation to ext4_write_checks().

In addition, the zero length is limited within the EOF block to prevent
ext4_zero_partial_blocks() from attempting to zero out the extra end
block (although it would not do anything anyway).
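[Editor's note: the clamp described in the last paragraph can be exercised in isolation. The sketch below is a user-space toy model under the same arithmetic as the ext4_write_checks() hunk in this patch, not ext4 code; post_eof_zero_len() is a hypothetical name.]

```c
#include <assert.h>

/*
 * Toy model (editor's sketch, not kernel code) of the clamp added to
 * ext4_write_checks(): zero the bytes between the old EOF and the
 * start of the append, but never past the end of the old EOF block,
 * since the blocks after it hold no stale data to hide.
 */
long long post_eof_zero_len(long long pos, long long old_size, long long bs)
{
	long long end;

	/* Not an append past EOF, or the old EOF is block aligned. */
	if (pos <= old_size || !(old_size & (bs - 1)))
		return 0;
	end = (old_size + bs - 1) & ~(bs - 1);	/* round_up(old_size, bs) */
	if (pos < end)
		end = pos;
	return end - old_size;
}
```

With a 4096-byte block, an append at offset 10000 over a 100-byte file zeroes only bytes 100..4095 of the first block; an append at offset 200 zeroes just bytes 100..199, since the write itself covers the rest.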
Signed-off-by: Zhang Yi
---
 fs/ext4/file.c  | 20 ++++++++++++++++++++
 fs/ext4/inode.c | 21 +++++++--------------
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 4320ebff74f3..3ecc09f286e4 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -271,6 +271,9 @@ static ssize_t ext4_generic_write_checks(struct kiocb *iocb,
 
 static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
 {
+	struct inode *inode = file_inode(iocb->ki_filp);
+	unsigned int blocksize = i_blocksize(inode);
+	loff_t old_size = i_size_read(inode);
 	ssize_t ret, count;
 
 	count = ext4_generic_write_checks(iocb, from);
@@ -280,6 +283,23 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
 	ret = file_modified(iocb->ki_filp);
 	if (ret)
 		return ret;
+
+	/*
+	 * If the position is beyond the EOF, it is necessary to zero out the
+	 * partial block that beyond the existing EOF, as it may contains
+	 * stale data written through mmap.
+	 */
+	if (iocb->ki_pos > old_size && (old_size & (blocksize - 1))) {
+		loff_t end = round_up(old_size, blocksize);
+
+		if (iocb->ki_pos < end)
+			end = iocb->ki_pos;
+
+		ret = ext4_zero_partial_blocks(inode, old_size, end - old_size);
+		if (ret)
+			return ret;
+	}
+
 	return count;
 }
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9c0e70256527..1ac93c39d21e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1456,10 +1456,9 @@ static int ext4_write_end(const struct kiocb *iocb,
 	folio_unlock(folio);
 	folio_put(folio);
 
-	if (old_size < pos && !verity) {
+	if (old_size < pos && !verity)
 		pagecache_isize_extended(inode, old_size, pos);
-		ext4_zero_partial_blocks(inode, old_size, pos - old_size);
-	}
+
 	/*
	 * Don't mark the inode dirty under folio lock. First, it unnecessarily
	 * makes the holding time of folio lock longer. Second, it forces lock
@@ -1574,10 +1573,8 @@ static int ext4_journalled_write_end(const struct kiocb *iocb,
 	folio_unlock(folio);
 	folio_put(folio);
 
-	if (old_size < pos && !verity) {
+	if (old_size < pos && !verity)
 		pagecache_isize_extended(inode, old_size, pos);
-		ext4_zero_partial_blocks(inode, old_size, pos - old_size);
-	}
 
 	if (size_changed) {
 		ret2 = ext4_mark_inode_dirty(handle, inode);
@@ -3196,7 +3193,7 @@ static int ext4_da_do_write_end(struct address_space *mapping,
 	struct inode *inode = mapping->host;
 	loff_t old_size = inode->i_size;
 	bool disksize_changed = false;
-	loff_t new_i_size, zero_len = 0;
+	loff_t new_i_size;
 	handle_t *handle;
 
 	if (unlikely(!folio_buffers(folio))) {
@@ -3240,19 +3237,15 @@ static int ext4_da_do_write_end(struct address_space *mapping,
 	folio_unlock(folio);
 	folio_put(folio);
 
-	if (pos > old_size) {
+	if (pos > old_size)
 		pagecache_isize_extended(inode, old_size, pos);
-		zero_len = pos - old_size;
-	}
 
-	if (!disksize_changed && !zero_len)
+	if (!disksize_changed)
 		return copied;
 
-	handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
+	handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
 	if (IS_ERR(handle))
 		return PTR_ERR(handle);
-	if (zero_len)
-		ext4_zero_partial_blocks(inode, old_size, zero_len);
 	ext4_mark_inode_dirty(handle, inode);
 	ext4_journal_stop(handle);
 
--
2.52.0
From: Zhang Yi
Subject: [PATCH -next v2 09/22] ext4: add a new iomap aops for regular file's buffered IO path
Date: Tue, 3 Feb 2026 14:25:09 +0800
Message-ID: <20260203062523.3869120-10-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>

This starts support for iomap in the buffered I/O path for regular files on ext4:

- Introduces a new iomap address space operation, ext4_iomap_aops.
- Adds an inode state flag, EXT4_STATE_BUFFERED_IOMAP, which indicates that the inode uses the iomap path instead of the original buffer_head path for buffered I/O.

Most callbacks of ext4_iomap_aops can directly use the generic iomap implementations; the remaining callbacks (read_folio(), readahead(), and writepages()) will be implemented in later patches.

Signed-off-by: Zhang Yi
---
 fs/ext4/ext4.h  |  7 +++++++
 fs/ext4/inode.c | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 19d0b4917aea..4930446cfec1 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1978,6 +1978,7 @@ enum {
     EXT4_STATE_FC_COMMITTING,    /* Fast commit ongoing */
     EXT4_STATE_FC_FLUSHING_DATA, /* Fast commit flushing data */
     EXT4_STATE_ORPHAN_FILE,      /* Inode orphaned in orphan file */
+    EXT4_STATE_BUFFERED_IOMAP,   /* Inode uses iomap for buffered IO */
 };
 
 #define EXT4_INODE_BIT_FNS(name, field, offset) \
@@ -2046,6 +2047,12 @@ static inline bool ext4_inode_orphan_tracked(struct inode *inode)
         !list_empty(&EXT4_I(inode)->i_orphan);
 }
 
+/* Whether the inode goes through the iomap infrastructure for buffered I/O */
+static inline bool ext4_inode_buffered_iomap(struct inode *inode)
+{
+    return ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP);
+}
+
 /*
  * Codes for operating systems
  */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1ac93c39d21e..fb7e75de2065 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3903,6 +3903,22 @@ const struct iomap_ops ext4_iomap_report_ops = {
     .iomap_begin = ext4_iomap_begin_report,
 };
 
+static int ext4_iomap_read_folio(struct file *file, struct folio *folio)
+{
+    return 0;
+}
+
+static void ext4_iomap_readahead(struct readahead_control *rac)
+{
+
+}
+
+static int ext4_iomap_writepages(struct address_space *mapping,
+                 struct writeback_control *wbc)
+{
+    return 0;
+}
+
 /*
  * For data=journal mode, folio should be marked dirty only when it was
  * writeably mapped. When that happens, it was already attached to the
@@ -3989,6 +4005,20 @@ static const struct address_space_operations ext4_da_aops = {
     .swap_activate          = ext4_iomap_swap_activate,
 };
 
+static const struct address_space_operations ext4_iomap_aops = {
+    .read_folio             = ext4_iomap_read_folio,
+    .readahead              = ext4_iomap_readahead,
+    .writepages             = ext4_iomap_writepages,
+    .dirty_folio            = iomap_dirty_folio,
+    .bmap                   = ext4_bmap,
+    .invalidate_folio       = iomap_invalidate_folio,
+    .release_folio          = iomap_release_folio,
+    .migrate_folio          = filemap_migrate_folio,
+    .is_partially_uptodate  = iomap_is_partially_uptodate,
+    .error_remove_folio     = generic_error_remove_folio,
+    .swap_activate          = ext4_iomap_swap_activate,
+};
+
 static const struct address_space_operations ext4_dax_aops = {
     .writepages             = ext4_dax_writepages,
     .dirty_folio            = noop_dirty_folio,
@@ -4010,6 +4040,8 @@ void ext4_set_aops(struct inode *inode)
     }
     if (IS_DAX(inode))
         inode->i_mapping->a_ops = &ext4_dax_aops;
+    else if (ext4_inode_buffered_iomap(inode))
+        inode->i_mapping->a_ops = &ext4_iomap_aops;
     else if (test_opt(inode->i_sb, DELALLOC))
         inode->i_mapping->a_ops = &ext4_da_aops;
     else
-- 
2.52.0

From nobody Sun Feb 8 10:32:56 2026
From: Zhang Yi
Subject: [PATCH -next v2 10/22] ext4: implement buffered read iomap path
Date: Tue, 3 Feb 2026 14:25:10 +0800
Message-ID: <20260203062523.3869120-11-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>

Introduce a new iomap_ops instance, ext4_iomap_buffered_read_ops, to implement the iomap read path for ext4, specifically the read_folio() and readahead() callbacks of ext4_iomap_aops. ext4_iomap_map_blocks() invokes ext4_map_blocks() to query the extent mapping status of the read range and then converts the mapping information to the iomap format.
Signed-off-by: Zhang Yi
---
 fs/ext4/inode.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fb7e75de2065..25d9462d2da7 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3903,14 +3903,57 @@ const struct iomap_ops ext4_iomap_report_ops = {
     .iomap_begin = ext4_iomap_begin_report,
 };
 
+static int ext4_iomap_map_blocks(struct inode *inode, loff_t offset,
+                 loff_t length, struct ext4_map_blocks *map)
+{
+    u8 blkbits = inode->i_blkbits;
+
+    if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK)
+        return -EINVAL;
+
+    /* Calculate the first and last logical blocks respectively. */
+    map->m_lblk = offset >> blkbits;
+    map->m_len = min_t(loff_t, (offset + length - 1) >> blkbits,
+               EXT4_MAX_LOGICAL_BLOCK) - map->m_lblk + 1;
+
+    return ext4_map_blocks(NULL, inode, map, 0);
+}
+
+static int ext4_iomap_buffered_read_begin(struct inode *inode, loff_t offset,
+        loff_t length, unsigned int flags, struct iomap *iomap,
+        struct iomap *srcmap)
+{
+    struct ext4_map_blocks map;
+    int ret;
+
+    if (unlikely(ext4_forced_shutdown(inode->i_sb)))
+        return -EIO;
+
+    /* Inline data support is not yet available. */
+    if (WARN_ON_ONCE(ext4_has_inline_data(inode)))
+        return -ERANGE;
+
+    ret = ext4_iomap_map_blocks(inode, offset, length, &map);
+    if (ret < 0)
+        return ret;
+
+    ext4_set_iomap(inode, iomap, &map, offset, length, flags);
+    return 0;
+}
+
+const struct iomap_ops ext4_iomap_buffered_read_ops = {
+    .iomap_begin = ext4_iomap_buffered_read_begin,
+};
+
 static int ext4_iomap_read_folio(struct file *file, struct folio *folio)
 {
+    iomap_bio_read_folio(folio, &ext4_iomap_buffered_read_ops);
     return 0;
 }
 
 static void ext4_iomap_readahead(struct readahead_control *rac)
 {
-
+    iomap_bio_readahead(rac, &ext4_iomap_buffered_read_ops);
 }
 
 static int ext4_iomap_writepages(struct address_space *mapping,
-- 
2.52.0

From nobody Sun Feb 8 10:32:56 2026
From: Zhang Yi
Subject: [PATCH -next v2 11/22] ext4: pass out extent seq counter when mapping da blocks
Date: Tue, 3 Feb 2026 14:25:11 +0800
Message-ID: <20260203062523.3869120-12-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>

The iomap buffered write path does not hold any locks between querying the inode extent mapping information and performing buffered writes. It relies on the sequence counter saved in the inode to determine whether the mapping information is stale. Commit 07c440e8da8f ("ext4: pass out extent seq counter when mapping blocks") passed out the sequence number when mapping blocks, but missed two places where it is needed later in the iomap buffered delayed write path; fill those in now.
Signed-off-by: Zhang Yi
---
 fs/ext4/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 25d9462d2da7..c9489978358e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1903,7 +1903,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map)
     ext4_check_map_extents_env(inode);
 
     /* Lookup extent status tree firstly */
-    if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es, NULL)) {
+    if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es, &map->m_seq)) {
         map->m_len = min_t(unsigned int, map->m_len,
                    es.es_len - (map->m_lblk - es.es_lblk));
 
@@ -1956,7 +1956,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map)
      * is held in write mode, before inserting a new da entry in
      * the extent status tree.
      */
-    if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es, NULL)) {
+    if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es, &map->m_seq)) {
         map->m_len = min_t(unsigned int, map->m_len,
                    es.es_len - (map->m_lblk - es.es_lblk));
 
-- 
2.52.0

From nobody Sun Feb 8 10:32:56 2026
From: Zhang Yi
Subject: [PATCH -next v2 12/22] ext4: implement buffered write iomap path
Date: Tue, 3 Feb 2026 14:25:12 +0800
Message-ID: <20260203062523.3869120-13-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>

Introduce two new iomap_ops instances, ext4_iomap_buffered_write_ops and ext4_iomap_buffered_da_write_ops, to implement the iomap write paths for ext4. ext4_iomap_buffered_da_write_begin() invokes ext4_da_map_blocks() to map delayed allocation extents, and ext4_iomap_buffered_write_begin() invokes ext4_iomap_get_blocks() to directly allocate blocks in non-delayed allocation mode. Additionally, add ext4_iomap_valid() so that the iomap infrastructure can check the validity of extents.
Key notes: - Since we don't use ordered data mode to prevent exposing stale data in the non-delayed allocation path, we ignore the dioread_nolock mount option and always allocate unwritten extents for new blocks. - The iomap write path maps multiple blocks at a time in the iomap_begin() callbacks, so we must remove the stale delayed allocation range in case of short writes and write failures. Otherwise, this could result in a range of delayed extents being covered by a clean folio, which would lead to inaccurate space reservation. - The lock ordering of the folio lock and transaction start is the opposite of that in the buffer_head buffered write path, update the locking document as well. Signed-off-by: Zhang Yi --- fs/ext4/ext4.h | 4 ++ fs/ext4/file.c | 20 +++++- fs/ext4/inode.c | 173 +++++++++++++++++++++++++++++++++++++++++++++++- fs/ext4/super.c | 10 ++- 4 files changed, 200 insertions(+), 7 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 4930446cfec1..89059b15ee5c 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3062,6 +3062,7 @@ int ext4_walk_page_buffers(handle_t *handle, int do_journal_get_write_access(handle_t *handle, struct inode *inode, struct buffer_head *bh); void ext4_set_inode_mapping_order(struct inode *inode); +int ext4_nonda_switch(struct super_block *sb); #define FALL_BACK_TO_NONDELALLOC 1 #define CONVERT_INLINE_DATA 2 =20 @@ -3930,6 +3931,9 @@ static inline void ext4_clear_io_unwritten_flag(ext4_= io_end_t *io_end) =20 extern const struct iomap_ops ext4_iomap_ops; extern const struct iomap_ops ext4_iomap_report_ops; +extern const struct iomap_ops ext4_iomap_buffered_write_ops; +extern const struct iomap_ops ext4_iomap_buffered_da_write_ops; +extern const struct iomap_write_ops ext4_iomap_write_ops; =20 static inline int ext4_buffer_uptodate(struct buffer_head *bh) { diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 3ecc09f286e4..11fbc607d332 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -303,6 +303,21 @@ static 
ssize_t ext4_write_checks(struct kiocb *iocb, s= truct iov_iter *from) return count; } =20 +static ssize_t ext4_iomap_buffered_write(struct kiocb *iocb, + struct iov_iter *from) +{ + struct inode *inode =3D file_inode(iocb->ki_filp); + const struct iomap_ops *iomap_ops; + + if (test_opt(inode->i_sb, DELALLOC) && !ext4_nonda_switch(inode->i_sb)) + iomap_ops =3D &ext4_iomap_buffered_da_write_ops; + else + iomap_ops =3D &ext4_iomap_buffered_write_ops; + + return iomap_file_buffered_write(iocb, from, iomap_ops, + &ext4_iomap_write_ops, NULL); +} + static ssize_t ext4_buffered_write_iter(struct kiocb *iocb, struct iov_iter *from) { @@ -317,7 +332,10 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *= iocb, if (ret <=3D 0) goto out; =20 - ret =3D generic_perform_write(iocb, from); + if (ext4_inode_buffered_iomap(inode)) + ret =3D ext4_iomap_buffered_write(iocb, from); + else + ret =3D generic_perform_write(iocb, from); =20 out: inode_unlock(inode); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index c9489978358e..da4fd62c6963 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3065,7 +3065,7 @@ static int ext4_dax_writepages(struct address_space *= mapping, return ret; } =20 -static int ext4_nonda_switch(struct super_block *sb) +int ext4_nonda_switch(struct super_block *sb) { s64 free_clusters, dirty_clusters; struct ext4_sb_info *sbi =3D EXT4_SB(sb); @@ -3462,6 +3462,15 @@ static bool ext4_inode_datasync_dirty(struct inode *= inode) return inode_state_read_once(inode) & I_DIRTY_DATASYNC; } =20 +static bool ext4_iomap_valid(struct inode *inode, const struct iomap *ioma= p) +{ + return iomap->validity_cookie =3D=3D READ_ONCE(EXT4_I(inode)->i_es_seq); +} + +const struct iomap_write_ops ext4_iomap_write_ops =3D { + .iomap_valid =3D ext4_iomap_valid, +}; + static void ext4_set_iomap(struct inode *inode, struct iomap *iomap, struct ext4_map_blocks *map, loff_t offset, loff_t length, unsigned int flags) @@ -3496,6 +3505,8 @@ static void ext4_set_iomap(struct inode 
*inode, struc= t iomap *iomap, !ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) iomap->flags |=3D IOMAP_F_MERGED; =20 + iomap->validity_cookie =3D map->m_seq; + /* * Flags passed to ext4_map_blocks() for direct I/O writes can result * in m_flags having both EXT4_MAP_MAPPED and EXT4_MAP_UNWRITTEN bits @@ -3903,8 +3914,12 @@ const struct iomap_ops ext4_iomap_report_ops =3D { .iomap_begin =3D ext4_iomap_begin_report, }; =20 +/* Map blocks */ +typedef int (ext4_get_blocks_t)(struct inode *, struct ext4_map_blocks *); + static int ext4_iomap_map_blocks(struct inode *inode, loff_t offset, - loff_t length, struct ext4_map_blocks *map) + loff_t length, ext4_get_blocks_t get_blocks, + struct ext4_map_blocks *map) { u8 blkbits =3D inode->i_blkbits; =20 @@ -3916,6 +3931,9 @@ static int ext4_iomap_map_blocks(struct inode *inode,= loff_t offset, map->m_len =3D min_t(loff_t, (offset + length - 1) >> blkbits, EXT4_MAX_LOGICAL_BLOCK) - map->m_lblk + 1; =20 + if (get_blocks) + return get_blocks(inode, map); + return ext4_map_blocks(NULL, inode, map, 0); } =20 @@ -3933,7 +3951,91 @@ static int ext4_iomap_buffered_read_begin(struct ino= de *inode, loff_t offset, if (WARN_ON_ONCE(ext4_has_inline_data(inode))) return -ERANGE; =20 - ret =3D ext4_iomap_map_blocks(inode, offset, length, &map); + ret =3D ext4_iomap_map_blocks(inode, offset, length, NULL, &map); + if (ret < 0) + return ret; + + ext4_set_iomap(inode, iomap, &map, offset, length, flags); + return 0; +} + +static int ext4_iomap_get_blocks(struct inode *inode, + struct ext4_map_blocks *map) +{ + loff_t i_size =3D i_size_read(inode); + handle_t *handle; + int ret, needed_blocks; + + /* + * Check if the blocks have already been allocated, this could + * avoid initiating a new journal transaction and return the + * mapping information directly. 
+ */ + if ((map->m_lblk + map->m_len) <=3D + round_up(i_size, i_blocksize(inode)) >> inode->i_blkbits) { + ret =3D ext4_map_blocks(NULL, inode, map, 0); + if (ret < 0) + return ret; + if (map->m_flags & (EXT4_MAP_MAPPED | EXT4_MAP_UNWRITTEN | + EXT4_MAP_DELAYED)) + return 0; + } + + /* + * Reserve one block more for addition to orphan list in case + * we allocate blocks but write fails for some reason. + */ + needed_blocks =3D ext4_chunk_trans_blocks(inode, map->m_len) + 1; + handle =3D ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, needed_blocks); + if (IS_ERR(handle)) + return PTR_ERR(handle); + + ret =3D ext4_map_blocks(handle, inode, map, + EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT); + /* + * We have to stop handle here for two reasons. + * + * - One is a potential deadlock caused by the subsequent call to + * balance_dirty_pages(). It may wait for the dirty pages to be + * written back, which could initiate another handle and cause it + * to wait for the current one to complete. + * + * - Another one is that we cannot hole lock folio under an active + * handle because the lock order of iomap is always acquires the + * folio lock before starting a new handle; otherwise, this could + * cause a potential deadlock. + */ + ext4_journal_stop(handle); + + return ret; +} + +static int ext4_iomap_buffered_do_write_begin(struct inode *inode, + loff_t offset, loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap, bool delalloc) +{ + int ret, retries =3D 0; + struct ext4_map_blocks map; + ext4_get_blocks_t *get_blocks; + + ret =3D ext4_emergency_state(inode->i_sb); + if (unlikely(ret)) + return ret; + + /* Inline data support is not yet available. 
*/ + if (WARN_ON_ONCE(ext4_has_inline_data(inode))) + return -ERANGE; + if (WARN_ON_ONCE(!(flags & IOMAP_WRITE))) + return -EINVAL; + + if (delalloc) + get_blocks =3D ext4_da_map_blocks; + else + get_blocks =3D ext4_iomap_get_blocks; +retry: + ret =3D ext4_iomap_map_blocks(inode, offset, length, get_blocks, &map); + if (ret =3D=3D -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) + goto retry; if (ret < 0) return ret; =20 @@ -3941,6 +4043,71 @@ static int ext4_iomap_buffered_read_begin(struct ino= de *inode, loff_t offset, return 0; } =20 +static int ext4_iomap_buffered_write_begin(struct inode *inode, + loff_t offset, loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap) +{ + return ext4_iomap_buffered_do_write_begin(inode, offset, length, flags, + iomap, srcmap, false); +} + +static int ext4_iomap_buffered_da_write_begin(struct inode *inode, + loff_t offset, loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap) +{ + return ext4_iomap_buffered_do_write_begin(inode, offset, length, flags, + iomap, srcmap, true); +} + +/* + * Drop the staled delayed allocation range from the write failure, + * including both start and end blocks. If not, we could leave a range + * of delayed extents covered by a clean folio, it could lead to + * inaccurate space reservation. + */ +static void ext4_iomap_punch_delalloc(struct inode *inode, loff_t offset, + loff_t length, struct iomap *iomap) +{ + down_write(&EXT4_I(inode)->i_data_sem); + ext4_es_remove_extent(inode, offset >> inode->i_blkbits, + DIV_ROUND_UP_ULL(length, EXT4_BLOCK_SIZE(inode->i_sb))); + up_write(&EXT4_I(inode)->i_data_sem); +} + +static int ext4_iomap_buffered_da_write_end(struct inode *inode, loff_t of= fset, + loff_t length, ssize_t written, + unsigned int flags, + struct iomap *iomap) +{ + loff_t start_byte, end_byte; + + /* If we didn't reserve the blocks, we're not allowed to punch them. 
*/ + if (iomap->type !=3D IOMAP_DELALLOC || !(iomap->flags & IOMAP_F_NEW)) + return 0; + + /* Nothing to do if we've written the entire delalloc extent */ + start_byte =3D iomap_last_written_block(inode, offset, written); + end_byte =3D round_up(offset + length, i_blocksize(inode)); + if (start_byte >=3D end_byte) + return 0; + + filemap_invalidate_lock(inode->i_mapping); + iomap_write_delalloc_release(inode, start_byte, end_byte, flags, + iomap, ext4_iomap_punch_delalloc); + filemap_invalidate_unlock(inode->i_mapping); + return 0; +} + + +const struct iomap_ops ext4_iomap_buffered_write_ops =3D { + .iomap_begin =3D ext4_iomap_buffered_write_begin, +}; + +const struct iomap_ops ext4_iomap_buffered_da_write_ops =3D { + .iomap_begin =3D ext4_iomap_buffered_da_write_begin, + .iomap_end =3D ext4_iomap_buffered_da_write_end, +}; + const struct iomap_ops ext4_iomap_buffered_read_ops =3D { .iomap_begin =3D ext4_iomap_buffered_read_begin, }; diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 69eb63dde983..b68509505558 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -104,9 +104,13 @@ static const struct fs_parameter_spec ext4_param_specs= []; * -> page lock -> i_data_sem (rw) * * buffered write path: - * sb_start_write -> i_mutex -> mmap_lock - * sb_start_write -> i_mutex -> transaction start -> page lock -> - * i_data_sem (rw) + * sb_start_write -> i_rwsem (w) -> mmap_lock + * - buffer_head path: + * sb_start_write -> i_rwsem (w) -> transaction start -> folio lock -> + * i_data_sem (rw) + * - iomap path: + * sb_start_write -> i_rwsem (w) -> transaction start -> i_data_sem (rw) + * sb_start_write -> i_rwsem (w) -> folio lock * * truncate: * sb_start_write -> i_mutex -> invalidate_lock (w) -> i_mmap_rwsem (w) -> --=20 2.52.0 From nobody Sun Feb 8 10:32:56 2026 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
smtp.subspace.kernel.org (Postfix) with ESMTPS id 3E6C03815D2; Tue, 3 Feb 2026 06:30:21 +0000 (UTC)
From: Zhang Yi
To: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, libaokun1@huawei.com, yangerkun@huawei.com, yukuai@fnnas.com
Subject: [PATCH -next v2 13/22] ext4: implement writeback iomap path
Date: Tue, 3 Feb 2026 14:25:13 +0800
Message-ID: <20260203062523.3869120-14-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>
References: <20260203062523.3869120-1-yi.zhang@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Implement the iomap writeback path for ext4: implement ext4_iomap_writepages(), introduce a new iomap_writeback_ops instance, ext4_writeback_ops, and create a new end-I/O extent conversion worker to convert unwritten extents after the I/O completes.

In the ->writeback_range() callback, first call ext4_iomap_map_writeback_range() to query the longest range of existing mapped extents. For performance, if the block range has not been allocated yet, attempt to allocate the longest possible range of blocks, based on the writeback length and the delalloc extent length, rather than allocating only a single folio's worth of blocks at a time. Then add the folio to the iomap_ioend instance.

In the ->writeback_submit() callback, register a special end-bio callback, ext4_iomap_end_bio(), which starts a worker if we need to convert unwritten extents or update i_disksize after the data has been written back, or if we need to abort the journal when writeback I/O fails.

Key notes:

- Since we aim to allocate as long a range of blocks as possible within the writeback length on each invocation of the ->writeback_range() callback, we may allocate a long range but write less in certain corner cases. Therefore, we have to ignore the dioread_nolock mount option and always allocate unwritten blocks. This is consistent with the non-delayed buffered write process.

- Since ->writeback_range() is always executed under the folio lock, we need to start the handle under the folio lock as well. This is the opposite of the order used in the buffer_head writeback path. Therefore, we cannot use the ordered data mode to write back data; otherwise it would cause a deadlock. Fortunately, since we always allocate unwritten extents when allocating blocks, the functionality of the ordered data mode is already quite limited and can be replaced by other methods.
- Since we don't use ordered data mode, the deadlock problem that was expected to be resolved through the reserve handle does not exists here. Therefore, we also do not need to use the reserve handle when converting the unwritten extent in the end I/O worker, we can start a normal journal handle instead. - Since we always allocate unwritten blocks, we also delay updating the i_disksize until the I/O is done, which could prevent the exposure of zero data that may occur during a system crash while performing buffer append writes. Signed-off-by: Zhang Yi --- fs/ext4/ext4.h | 4 + fs/ext4/inode.c | 213 +++++++++++++++++++++++++++++++++++++++++++++- fs/ext4/page-io.c | 119 ++++++++++++++++++++++++++ fs/ext4/super.c | 7 +- 4 files changed, 341 insertions(+), 2 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 89059b15ee5c..520f6d5dcdab 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1176,6 +1176,8 @@ struct ext4_inode_info { */ struct list_head i_rsv_conversion_list; struct work_struct i_rsv_conversion_work; + struct list_head i_iomap_ioend_list; + struct work_struct i_iomap_ioend_work; =20 /* * Transactions that contain inode's metadata needed to complete @@ -3874,6 +3876,8 @@ int ext4_bio_write_folio(struct ext4_io_submit *io, s= truct folio *page, size_t len); extern struct ext4_io_end_vec *ext4_alloc_io_end_vec(ext4_io_end_t *io_end= ); extern struct ext4_io_end_vec *ext4_last_io_end_vec(ext4_io_end_t *io_end); +extern void ext4_iomap_end_io(struct work_struct *work); +extern void ext4_iomap_end_bio(struct bio *bio); =20 /* mmp.c */ extern int ext4_multi_mount_protect(struct super_block *, ext4_fsblk_t); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index da4fd62c6963..4a7d18511c3f 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -44,6 +44,7 @@ #include =20 #include "ext4_jbd2.h" +#include "ext4_extents.h" #include "xattr.h" #include "acl.h" #include "truncate.h" @@ -4123,10 +4124,220 @@ static void ext4_iomap_readahead(struct readahead_= 
control *rac) iomap_bio_readahead(rac, &ext4_iomap_buffered_read_ops); } =20 +struct ext4_writeback_ctx { + struct iomap_writepage_ctx ctx; + unsigned int data_seq; +}; + +static int ext4_iomap_map_one_extent(struct inode *inode, + struct ext4_map_blocks *map) +{ + struct extent_status es; + handle_t *handle =3D NULL; + int credits, map_flags; + int retval; + + credits =3D ext4_chunk_trans_blocks(inode, map->m_len); + handle =3D ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, credits); + if (IS_ERR(handle)) + return PTR_ERR(handle); + + map->m_flags =3D 0; + /* + * It is necessary to look up extent and map blocks under i_data_sem + * in write mode, otherwise, the delalloc extent may become stale + * during concurrent truncate operations. + */ + ext4_fc_track_inode(handle, inode); + down_write(&EXT4_I(inode)->i_data_sem); + if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es, &map->m_seq)) { + retval =3D es.es_len - (map->m_lblk - es.es_lblk); + map->m_len =3D min_t(unsigned int, retval, map->m_len); + + if (ext4_es_is_delayed(&es)) { + map->m_flags |=3D EXT4_MAP_DELAYED; + trace_ext4_da_write_pages_extent(inode, map); + /* + * Call ext4_map_create_blocks() to allocate any + * delayed allocation blocks. It is possible that + * we're going to need more metadata blocks, however + * we must not fail because we're in writeback and + * there is nothing we can do so it might result in + * data loss. So use reserved blocks to allocate + * metadata if possible. + */ + map_flags =3D EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT | + EXT4_GET_BLOCKS_METADATA_NOFAIL | + EXT4_EX_NOCACHE; + + retval =3D ext4_map_create_blocks(handle, inode, map, + map_flags); + if (retval > 0) + ext4_fc_track_range(handle, inode, map->m_lblk, + map->m_lblk + map->m_len - 1); + goto out; + } else if (unlikely(ext4_es_is_hole(&es))) + goto out; + + /* Found written or unwritten extent. */ + map->m_pblk =3D ext4_es_pblock(&es) + map->m_lblk - es.es_lblk; + map->m_flags =3D ext4_es_is_written(&es) ? 
+ EXT4_MAP_MAPPED : EXT4_MAP_UNWRITTEN; + goto out; + } + + retval =3D ext4_map_query_blocks(handle, inode, map, EXT4_EX_NOCACHE); +out: + up_write(&EXT4_I(inode)->i_data_sem); + ext4_journal_stop(handle); + return retval < 0 ? retval : 0; +} + +static int ext4_iomap_map_writeback_range(struct iomap_writepage_ctx *wpc, + loff_t offset, unsigned int dirty_len) +{ + struct ext4_writeback_ctx *ewpc =3D + container_of(wpc, struct ext4_writeback_ctx, ctx); + struct inode *inode =3D wpc->inode; + struct super_block *sb =3D inode->i_sb; + struct journal_s *journal =3D EXT4_SB(sb)->s_journal; + struct ext4_inode_info *ei =3D EXT4_I(inode); + struct ext4_map_blocks map; + unsigned int blkbits =3D inode->i_blkbits; + unsigned int index =3D offset >> blkbits; + unsigned int blk_end, blk_len; + int ret; + + ret =3D ext4_emergency_state(sb); + if (unlikely(ret)) + return ret; + + /* Check validity of the cached writeback mapping. */ + if (offset >=3D wpc->iomap.offset && + offset < wpc->iomap.offset + wpc->iomap.length && + ewpc->data_seq =3D=3D READ_ONCE(ei->i_es_seq)) + return 0; + + blk_len =3D dirty_len >> blkbits; + blk_end =3D min_t(unsigned int, (wpc->wbc->range_end >> blkbits), + (UINT_MAX - 1)); + if (blk_end > index + blk_len) + blk_len =3D blk_end - index + 1; + +retry: + map.m_lblk =3D index; + map.m_len =3D min_t(unsigned int, MAX_WRITEPAGES_EXTENT_LEN, blk_len); + ret =3D ext4_map_blocks(NULL, inode, &map, + EXT4_GET_BLOCKS_IO_SUBMIT | EXT4_EX_NOCACHE); + if (ret < 0) + return ret; + + /* + * The map is not a delalloc extent, it must either be a hole + * or an extent which have already been allocated. + */ + if (!(map.m_flags & EXT4_MAP_DELAYED)) + goto out; + + /* Map one delalloc extent. */ + ret =3D ext4_iomap_map_one_extent(inode, &map); + if (ret < 0) { + if (ext4_emergency_state(sb)) + return ret; + + /* + * Retry transient ENOSPC errors, if + * ext4_count_free_blocks() is non-zero, a commit + * should free up blocks. 
+ */ + if (ret =3D=3D -ENOSPC && journal && ext4_count_free_clusters(sb)) { + jbd2_journal_force_commit_nested(journal); + goto retry; + } + + ext4_msg(sb, KERN_CRIT, + "Delayed block allocation failed for inode %lu at logical offset %llu = with max blocks %u with error %d", + inode->i_ino, (unsigned long long)map.m_lblk, + (unsigned int)map.m_len, -ret); + ext4_msg(sb, KERN_CRIT, + "This should not happen!! Data will be lost\n"); + if (ret =3D=3D -ENOSPC) + ext4_print_free_blocks(inode); + return ret; + } +out: + ewpc->data_seq =3D map.m_seq; + ext4_set_iomap(inode, &wpc->iomap, &map, offset, dirty_len, 0); + return 0; +} + +static void ext4_iomap_discard_folio(struct folio *folio, loff_t pos) +{ + struct inode *inode =3D folio->mapping->host; + loff_t length =3D folio_pos(folio) + folio_size(folio) - pos; + + ext4_iomap_punch_delalloc(inode, pos, length, NULL); +} + +static ssize_t ext4_iomap_writeback_range(struct iomap_writepage_ctx *wpc, + struct folio *folio, u64 offset, + unsigned int len, u64 end_pos) +{ + ssize_t ret; + + ret =3D ext4_iomap_map_writeback_range(wpc, offset, len); + if (!ret) + ret =3D iomap_add_to_ioend(wpc, folio, offset, end_pos, len); + if (ret < 0) + ext4_iomap_discard_folio(folio, offset); + return ret; +} + +static int ext4_iomap_writeback_submit(struct iomap_writepage_ctx *wpc, + int error) +{ + struct iomap_ioend *ioend =3D wpc->wb_ctx; + struct ext4_inode_info *ei =3D EXT4_I(ioend->io_inode); + + /* Need to convert unwritten extents when I/Os are completed. 
*/ + if ((ioend->io_flags & IOMAP_IOEND_UNWRITTEN) || + ioend->io_offset + ioend->io_size > READ_ONCE(ei->i_disksize)) + ioend->io_bio.bi_end_io =3D ext4_iomap_end_bio; + + return iomap_ioend_writeback_submit(wpc, error); +} + +static const struct iomap_writeback_ops ext4_writeback_ops =3D { + .writeback_range =3D ext4_iomap_writeback_range, + .writeback_submit =3D ext4_iomap_writeback_submit, +}; + static int ext4_iomap_writepages(struct address_space *mapping, struct writeback_control *wbc) { - return 0; + struct inode *inode =3D mapping->host; + struct super_block *sb =3D inode->i_sb; + long nr =3D wbc->nr_to_write; + int alloc_ctx, ret; + struct ext4_writeback_ctx ewpc =3D { + .ctx =3D { + .inode =3D inode, + .wbc =3D wbc, + .ops =3D &ext4_writeback_ops, + }, + }; + + ret =3D ext4_emergency_state(sb); + if (unlikely(ret)) + return ret; + + alloc_ctx =3D ext4_writepages_down_read(sb); + trace_ext4_writepages(inode, wbc); + ret =3D iomap_writepages(&ewpc.ctx); + trace_ext4_writepages_result(inode, wbc, ret, nr - wbc->nr_to_write); + ext4_writepages_up_read(sb, alloc_ctx); + + return ret; } =20 /* diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c index a8c95eee91b7..d74aa430636f 100644 --- a/fs/ext4/page-io.c +++ b/fs/ext4/page-io.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -592,3 +593,121 @@ int ext4_bio_write_folio(struct ext4_io_submit *io, s= truct folio *folio, =20 return 0; } + +static void ext4_iomap_finish_ioend(struct iomap_ioend *ioend) +{ + struct inode *inode =3D ioend->io_inode; + struct super_block *sb =3D inode->i_sb; + struct ext4_inode_info *ei =3D EXT4_I(inode); + loff_t pos =3D ioend->io_offset; + size_t size =3D ioend->io_size; + loff_t new_disksize; + handle_t *handle; + int credits; + int ret, err; + + ret =3D blk_status_to_errno(ioend->io_bio.bi_status); + if (unlikely(ret)) { + if (test_opt(sb, DATA_ERR_ABORT)) + jbd2_journal_abort(EXT4_SB(sb)->s_journal, ret); + goto out; + } + + /* We may need 
to convert one extent and dirty the inode. */ + credits =3D ext4_chunk_trans_blocks(inode, + EXT4_MAX_BLOCKS(size, pos, inode->i_blkbits)); + handle =3D ext4_journal_start(inode, EXT4_HT_EXT_CONVERT, credits); + if (IS_ERR(handle)) { + ret =3D PTR_ERR(handle); + goto out_err; + } + + if (ioend->io_flags & IOMAP_IOEND_UNWRITTEN) { + ret =3D ext4_convert_unwritten_extents(handle, inode, pos, size); + if (ret) + goto out_journal; + } + + /* + * Update on-disk size after IO is completed. Races with + * truncate are avoided by checking i_size under i_data_sem. + */ + new_disksize =3D pos + size; + if (new_disksize > READ_ONCE(ei->i_disksize)) { + down_write(&ei->i_data_sem); + new_disksize =3D min(new_disksize, i_size_read(inode)); + if (new_disksize > ei->i_disksize) + ei->i_disksize =3D new_disksize; + up_write(&ei->i_data_sem); + ret =3D ext4_mark_inode_dirty(handle, inode); + if (ret) + EXT4_ERROR_INODE_ERR(inode, -ret, + "Failed to mark inode dirty"); + } + +out_journal: + err =3D ext4_journal_stop(handle); + if (!ret) + ret =3D err; +out_err: + if (ret < 0 && !ext4_emergency_state(sb)) { + ext4_msg(sb, KERN_EMERG, + "failed to convert unwritten extents to written extents or update inod= e size -- potential data loss! 
(inode %lu, error %d)", + inode->i_ino, ret); + } +out: + iomap_finish_ioends(ioend, ret); +} + +/* + * Work on buffered iomap completed IO, to convert unwritten extents to + * mapped extents + */ +void ext4_iomap_end_io(struct work_struct *work) +{ + struct ext4_inode_info *ei =3D container_of(work, struct ext4_inode_info, + i_iomap_ioend_work); + struct iomap_ioend *ioend; + struct list_head ioend_list; + unsigned long flags; + + spin_lock_irqsave(&ei->i_completed_io_lock, flags); + list_replace_init(&ei->i_iomap_ioend_list, &ioend_list); + spin_unlock_irqrestore(&ei->i_completed_io_lock, flags); + + iomap_sort_ioends(&ioend_list); + while (!list_empty(&ioend_list)) { + ioend =3D list_entry(ioend_list.next, struct iomap_ioend, io_list); + list_del_init(&ioend->io_list); + iomap_ioend_try_merge(ioend, &ioend_list); + ext4_iomap_finish_ioend(ioend); + } +} + +void ext4_iomap_end_bio(struct bio *bio) +{ + struct iomap_ioend *ioend =3D iomap_ioend_from_bio(bio); + struct ext4_inode_info *ei =3D EXT4_I(ioend->io_inode); + struct ext4_sb_info *sbi =3D EXT4_SB(ioend->io_inode->i_sb); + unsigned long flags; + int ret; + + /* Needs to convert unwritten extents or update the i_disksize. */ + if ((ioend->io_flags & IOMAP_IOEND_UNWRITTEN) || + ioend->io_offset + ioend->io_size > READ_ONCE(ei->i_disksize)) + goto defer; + + /* Needs to abort the journal on data_err=3Dabort. 
*/ + ret =3D blk_status_to_errno(ioend->io_bio.bi_status); + if (unlikely(ret) && test_opt(ioend->io_inode->i_sb, DATA_ERR_ABORT)) + goto defer; + + iomap_finish_ioends(ioend, ret); + return; +defer: + spin_lock_irqsave(&ei->i_completed_io_lock, flags); + if (list_empty(&ei->i_iomap_ioend_list)) + queue_work(sbi->rsv_conversion_wq, &ei->i_iomap_ioend_work); + list_add_tail(&ioend->io_list, &ei->i_iomap_ioend_list); + spin_unlock_irqrestore(&ei->i_completed_io_lock, flags); +} diff --git a/fs/ext4/super.c b/fs/ext4/super.c index b68509505558..cffe63deba31 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -123,7 +123,10 @@ static const struct fs_parameter_spec ext4_param_specs= []; * sb_start_write -> i_mutex -> transaction start -> i_data_sem (rw) * * writepages: - * transaction start -> page lock(s) -> i_data_sem (rw) + * - buffer_head path: + * transaction start -> folio lock(s) -> i_data_sem (rw) + * - iomap path: + * folio lock -> transaction start -> i_data_sem (rw) */ =20 static const struct fs_context_operations ext4_context_ops =3D { @@ -1426,10 +1429,12 @@ static struct inode *ext4_alloc_inode(struct super_= block *sb) #endif ei->jinode =3D NULL; INIT_LIST_HEAD(&ei->i_rsv_conversion_list); + INIT_LIST_HEAD(&ei->i_iomap_ioend_list); spin_lock_init(&ei->i_completed_io_lock); ei->i_sync_tid =3D 0; ei->i_datasync_tid =3D 0; INIT_WORK(&ei->i_rsv_conversion_work, ext4_end_io_rsv_work); + INIT_WORK(&ei->i_iomap_ioend_work, ext4_iomap_end_io); ext4_fc_init_inode(&ei->vfs_inode); spin_lock_init(&ei->i_fc_lock); return &ei->vfs_inode; --=20 2.52.0 From nobody Sun Feb 8 10:32:56 2026 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 88EDA37F8D3; Tue, 3 Feb 2026 06:30:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 
From: Zhang Yi
To: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, libaokun1@huawei.com, yangerkun@huawei.com, yukuai@fnnas.com
Subject: [PATCH -next v2 14/22] ext4: implement mmap iomap path
Date: Tue, 3 Feb 2026 14:25:14 +0800
Message-ID: <20260203062523.3869120-15-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>
References: <20260203062523.3869120-1-yi.zhang@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Introduce ext4_iomap_page_mkwrite() to implement the mmap iomap path for ext4.
Most of this work is delegated to iomap_page_mkwrite(), which only needs to be called with ext4_iomap_buffered_write_ops or ext4_iomap_buffered_da_write_ops as the ops argument to allocate and map the blocks. However, the lock ordering of the folio lock and the transaction start is the opposite of that in the buffer_head buffered write path, so update the locking documentation accordingly.

Signed-off-by: Zhang Yi
---
 fs/ext4/inode.c | 32 +++++++++++++++++++++++++++++++-
 fs/ext4/super.c |  8 ++++++--
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 4a7d18511c3f..0d2852159fa3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4026,7 +4026,7 @@ static int ext4_iomap_buffered_do_write_begin(struct inode *inode,
 	/* Inline data support is not yet available. */
 	if (WARN_ON_ONCE(ext4_has_inline_data(inode)))
 		return -ERANGE;
-	if (WARN_ON_ONCE(!(flags & IOMAP_WRITE)))
+	if (WARN_ON_ONCE(!(flags & (IOMAP_WRITE | IOMAP_FAULT))))
 		return -EINVAL;
 
 	if (delalloc)
@@ -4086,6 +4086,14 @@ static int ext4_iomap_buffered_da_write_end(struct inode *inode, loff_t offset,
 	if (iomap->type != IOMAP_DELALLOC || !(iomap->flags & IOMAP_F_NEW))
 		return 0;
 
+	/*
+	 * iomap_page_mkwrite() will never fail in a way that requires delalloc
+	 * extents that it allocated to be revoked. Hence never try to release
+	 * them here.
+	 */
+	if (flags & IOMAP_FAULT)
+		return 0;
+
 	/* Nothing to do if we've written the entire delalloc extent */
 	start_byte = iomap_last_written_block(inode, offset, written);
 	end_byte = round_up(offset + length, i_blocksize(inode));
@@ -7135,6 +7143,23 @@ static int ext4_block_page_mkwrite(struct inode *inode, struct folio *folio,
 	return ret;
 }
 
+static vm_fault_t ext4_iomap_page_mkwrite(struct vm_fault *vmf)
+{
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	const struct iomap_ops *iomap_ops;
+
+	/*
+	 * ext4_nonda_switch() could write back this folio, so we have to
+	 * call it before locking the folio.
+ */ + if (test_opt(inode->i_sb, DELALLOC) && !ext4_nonda_switch(inode->i_sb)) + iomap_ops =3D &ext4_iomap_buffered_da_write_ops; + else + iomap_ops =3D &ext4_iomap_buffered_write_ops; + + return iomap_page_mkwrite(vmf, iomap_ops, NULL); +} + vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) { struct vm_area_struct *vma =3D vmf->vma; @@ -7157,6 +7182,11 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) =20 filemap_invalidate_lock_shared(mapping); =20 + if (ext4_inode_buffered_iomap(inode)) { + ret =3D ext4_iomap_page_mkwrite(vmf); + goto out; + } + err =3D ext4_convert_inline_data(inode); if (err) goto out_ret; diff --git a/fs/ext4/super.c b/fs/ext4/super.c index cffe63deba31..4bb77703ffe1 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -100,8 +100,12 @@ static const struct fs_parameter_spec ext4_param_specs= []; * Lock ordering * * page fault path: - * mmap_lock -> sb_start_pagefault -> invalidate_lock (r) -> transaction s= tart - * -> page lock -> i_data_sem (rw) + * - buffer_head path: + * mmap_lock -> sb_start_pagefault -> invalidate_lock (r) -> + * transaction start -> folio lock -> i_data_sem (rw) + * - iomap path: + * mmap_lock -> sb_start_pagefault -> invalidate_lock (r) -> + * folio lock -> transaction start -> i_data_sem (rw) * * buffered write path: * sb_start_write -> i_rwsem (w) -> mmap_lock --=20 2.52.0 From nobody Sun Feb 8 10:32:56 2026 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8C2B3387595; Tue, 3 Feb 2026 06:30:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770100224; cv=none; 
From: Zhang Yi
To: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, yizhang089@gmail.com, libaokun1@huawei.com, yangerkun@huawei.com, yukuai@fnnas.com
Subject: [PATCH -next v2 15/22] iomap: correct the range of a partial dirty clear
Date: Tue, 3 Feb 2026 14:25:15 +0800
Message-ID: <20260203062523.3869120-16-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>
References: <20260203062523.3869120-1-yi.zhang@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The block range calculation in ifs_clear_range_dirty() is incorrect when partially clearing a range within a folio.
We can't clear the dirty bit of the first or the last block if the start or end offset is not block-size aligned. This has not yet caused any issue, since we always clear a whole folio in iomap_writeback_folio(). Fix this by rounding the first block up and the last block down, and correct the calculation of nr_blks accordingly.

Signed-off-by: Zhang Yi
---
This is modified from:
https://lore.kernel.org/linux-fsdevel/20240812121159.3775074-2-yi.zhang@huaweicloud.com/

Changes:
- Use round_up() instead of DIV_ROUND_UP() to avoid a wasted integer division.

 fs/iomap/buffered-io.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 154456e39fe5..3c8e085e79cf 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -167,11 +167,15 @@ static void ifs_clear_range_dirty(struct folio *folio,
 {
 	struct inode *inode = folio->mapping->host;
 	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
-	unsigned int first_blk = (off >> inode->i_blkbits);
-	unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
-	unsigned int nr_blks = last_blk - first_blk + 1;
+	unsigned int first_blk = round_up(off, i_blocksize(inode)) >>
+				 inode->i_blkbits;
+	unsigned int last_blk = (off + len) >> inode->i_blkbits;
+	unsigned int nr_blks = last_blk - first_blk;
 	unsigned long flags;
 
+	if (!nr_blks)
+		return;
+
 	spin_lock_irqsave(&ifs->state_lock, flags);
 	bitmap_clear(ifs->state, first_blk + blks_per_folio, nr_blks);
 	spin_unlock_irqrestore(&ifs->state_lock, flags);
-- 
2.52.0
From: Zhang Yi
To: linux-ext4@vger.kernel.org
Subject: [PATCH -next v2 16/22] iomap: support invalidating partial folios
Date: Tue, 3 Feb 2026 14:25:16 +0800
Message-ID: <20260203062523.3869120-17-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>

Currently, iomap_invalidate_folio() can only invalidate an entire folio.
If we truncate a partial folio on a filesystem where the block size is smaller than the folio size, it leaves behind dirty bits for the truncated or punched blocks, and writeback then attempts to map the invalid hole range. Fortunately, this has not caused any real problems so far because the ->writeback_range() function corrects the length.

However, the implementation of FALLOC_FL_ZERO_RANGE in ext4 depends on support for invalidating partial folios. When ext4 partially zeroes out a dirty, unwritten folio, it does not perform a flush first as XFS does. Therefore, if the dirty bits of the corresponding area cannot be cleared, the zeroed area remains in the written state after writeback instead of reverting to the unwritten state. Fix this by supporting invalidation of partial folios.

Signed-off-by: Zhang Yi
Reviewed-by: Darrick J. Wong
---
This is cherry-picked from:
https://lore.kernel.org/linux-fsdevel/20240812121159.3775074-3-yi.zhang@huaweicloud.com/

No code changes; only the commit message is updated to explain why ext4 needs this.
 fs/iomap/buffered-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 3c8e085e79cf..d4dd1874a471 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -744,6 +744,8 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len)
 		WARN_ON_ONCE(folio_test_writeback(folio));
 		folio_cancel_dirty(folio);
 		ifs_free(folio);
+	} else {
+		iomap_clear_range_dirty(folio, offset, len);
 	}
 }
 EXPORT_SYMBOL_GPL(iomap_invalidate_folio);
-- 
2.52.0
From: Zhang Yi
To: linux-ext4@vger.kernel.org
Subject: [PATCH -next v2 17/22] ext4: implement partial block zero range iomap path
Date: Tue, 3 Feb 2026 14:25:17 +0800
Message-ID: <20260203062523.3869120-18-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>

Introduce a new iomap_ops instance, ext4_iomap_zero_ops, along with ext4_iomap_block_zero_range(), to implement iomap block range zeroing for ext4. ext4_iomap_block_zero_range() invokes iomap_zero_range() and passes ext4_iomap_zero_begin() to locate and zero out a mapped partial block or a dirty, unwritten partial block.

Note that zeroing out under an active handle can cause a deadlock, since the order of acquiring the folio lock and starting a handle is inconsistent with the iomap iteration procedure. Therefore, ext4_iomap_block_zero_range() cannot be called under an active handle.
Signed-off-by: Zhang Yi
---
 fs/ext4/inode.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0d2852159fa3..c59f3adba0f3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4107,6 +4107,50 @@ static int ext4_iomap_buffered_da_write_end(struct inode *inode, loff_t offset,
 	return 0;
 }
 
+static int ext4_iomap_zero_begin(struct inode *inode,
+		loff_t offset, loff_t length, unsigned int flags,
+		struct iomap *iomap, struct iomap *srcmap)
+{
+	struct iomap_iter *iter = container_of(iomap, struct iomap_iter, iomap);
+	struct ext4_map_blocks map;
+	u8 blkbits = inode->i_blkbits;
+	unsigned int iomap_flags = 0;
+	int ret;
+
+	ret = ext4_emergency_state(inode->i_sb);
+	if (unlikely(ret))
+		return ret;
+
+	if (WARN_ON_ONCE(!(flags & IOMAP_ZERO)))
+		return -EINVAL;
+
+	ret = ext4_iomap_map_blocks(inode, offset, length, NULL, &map);
+	if (ret < 0)
+		return ret;
+
+	/*
+	 * Look up dirty folios for unwritten mappings within EOF. Providing
+	 * this bypasses the flush iomap uses to trigger extent conversion
+	 * when unwritten mappings have dirty pagecache in need of zeroing.
+	 */
+	if (map.m_flags & EXT4_MAP_UNWRITTEN) {
+		loff_t offset = ((loff_t)map.m_lblk) << blkbits;
+		loff_t end = ((loff_t)map.m_lblk + map.m_len) << blkbits;
+
+		iomap_fill_dirty_folios(iter, &offset, end, &iomap_flags);
+		if ((offset >> blkbits) < map.m_lblk + map.m_len)
+			map.m_len = (offset >> blkbits) - map.m_lblk;
+	}
+
+	ext4_set_iomap(inode, iomap, &map, offset, length, flags);
+	iomap->flags |= iomap_flags;
+
+	return 0;
+}
+
+const struct iomap_ops ext4_iomap_zero_ops = {
+	.iomap_begin	= ext4_iomap_zero_begin,
+};
 
 const struct iomap_ops ext4_iomap_buffered_write_ops = {
 	.iomap_begin	= ext4_iomap_buffered_write_begin,
@@ -4622,6 +4666,32 @@ static int ext4_journalled_block_zero_range(struct inode *inode, loff_t from,
 	return err;
 }
 
+static int ext4_iomap_block_zero_range(struct inode *inode, loff_t from,
+				       loff_t length, bool *did_zero)
+{
+	/*
+	 * Zeroing out under an active handle can cause deadlock since
+	 * the order of acquiring the folio lock and starting a handle is
+	 * inconsistent with the iomap writeback procedure.
+	 */
+	if (WARN_ON_ONCE(ext4_handle_valid(journal_current_handle())))
+		return -EINVAL;
+
+	/* The zeroing scope should not extend across a block. */
+	if (WARN_ON_ONCE((from >> inode->i_blkbits) !=
+			 ((from + length - 1) >> inode->i_blkbits)))
+		return -EINVAL;
+
+	if (!(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ORPHAN_FS) &&
+	    !(inode_state_read_once(inode) & (I_NEW | I_FREEING)))
+		WARN_ON_ONCE(!inode_is_locked(inode) &&
+			     !rwsem_is_locked(&inode->i_mapping->invalidate_lock));
+
+	return iomap_zero_range(inode, from, length, did_zero,
+				&ext4_iomap_zero_ops,
+				&ext4_iomap_write_ops, NULL);
+}
+
 /*
  * ext4_block_zero_page_range() zeros out a mapping of length 'length'
  * starting from file offset 'from'. The range to be zero'd must
@@ -4650,6 +4720,9 @@ static int ext4_block_zero_page_range(struct address_space *mapping,
 	} else if (ext4_should_journal_data(inode)) {
 		return ext4_journalled_block_zero_range(inode, from, length,
 							did_zero);
+	} else if (ext4_inode_buffered_iomap(inode)) {
+		return ext4_iomap_block_zero_range(inode, from, length,
+						   did_zero);
 	}
 	return ext4_block_zero_range(inode, from, length, did_zero);
 }
@@ -5063,6 +5136,18 @@ int ext4_truncate(struct inode *inode)
 			err = zero_len;
 			goto out_trace;
 		}
+		/*
+		 * inodes using the iomap buffered I/O path do not use the
+		 * ordered data mode, it is necessary to write out zeroed data
+		 * before the updating i_disksize transaction is committed.
+		 */
+		if (zero_len > 0 && ext4_inode_buffered_iomap(inode)) {
+			err = filemap_write_and_wait_range(mapping,
+					inode->i_size,
+					inode->i_size + zero_len - 1);
+			if (err)
+				return err;
+		}
 	}
 
 	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
-- 
2.52.0
From: Zhang Yi
To: linux-ext4@vger.kernel.org
Subject: [PATCH -next v2 18/22] ext4: do not order data for inodes using buffered iomap path
Date: Tue, 3 Feb 2026 14:25:18 +0800
Message-ID: <20260203062523.3869120-19-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>

In the iomap buffered I/O path, we always allocate unwritten blocks for append writes and flush data when doing a partial-block truncate down. Therefore, there is no risk of exposing stale data; disable ordered data mode for the iomap buffered I/O path.
Signed-off-by: Zhang Yi
---
 fs/ext4/ext4_jbd2.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 63d17c5201b5..7061c7188053 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -383,7 +383,12 @@ static inline int ext4_should_journal_data(struct inode *inode)
 
 static inline int ext4_should_order_data(struct inode *inode)
 {
-	return ext4_inode_journal_mode(inode) & EXT4_INODE_ORDERED_DATA_MODE;
+	/*
+	 * inodes using the iomap buffered I/O path do not use the
+	 * ordered data mode.
+	 */
+	return !ext4_inode_buffered_iomap(inode) &&
+	       (ext4_inode_journal_mode(inode) & EXT4_INODE_ORDERED_DATA_MODE);
 }
 
 static inline int ext4_should_writeback_data(struct inode *inode)
-- 
2.52.0
From: Zhang Yi
To: linux-ext4@vger.kernel.org
Subject: [PATCH -next v2 19/22] ext4: add block mapping tracepoints for iomap buffered I/O path
Date: Tue, 3 Feb 2026 14:25:19 +0800
Message-ID: <20260203062523.3869120-20-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>

Add tracepoints for the iomap buffered read, write, partial block zeroing, and writeback operations.
Signed-off-by: Zhang Yi
---
 fs/ext4/inode.c             |  6 +++++
 include/trace/events/ext4.h | 45 +++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c59f3adba0f3..77dcca584153 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3956,6 +3956,8 @@ static int ext4_iomap_buffered_read_begin(struct inode *inode, loff_t offset,
 	if (ret < 0)
 		return ret;
 
+	trace_ext4_iomap_buffered_read_begin(inode, &map, offset, length,
+					     flags);
 	ext4_set_iomap(inode, iomap, &map, offset, length, flags);
 	return 0;
 }
@@ -4040,6 +4042,8 @@ static int ext4_iomap_buffered_do_write_begin(struct inode *inode,
 	if (ret < 0)
 		return ret;
 
+	trace_ext4_iomap_buffered_write_begin(inode, &map, offset, length,
+					      flags);
 	ext4_set_iomap(inode, iomap, &map, offset, length, flags);
 	return 0;
 }
@@ -4142,6 +4146,7 @@ static int ext4_iomap_zero_begin(struct inode *inode,
 		map.m_len = (offset >> blkbits) - map.m_lblk;
 	}
 
+	trace_ext4_iomap_zero_begin(inode, &map, offset, length, flags);
 	ext4_set_iomap(inode, iomap, &map, offset, length, flags);
 	iomap->flags |= iomap_flags;
 
@@ -4319,6 +4324,7 @@ static int ext4_iomap_map_writeback_range(struct iomap_writepage_ctx *wpc,
 	}
 out:
 	ewpc->data_seq = map.m_seq;
+	trace_ext4_iomap_map_writeback_range(inode, &map, offset, dirty_len, 0);
 	ext4_set_iomap(inode, &wpc->iomap, &map, offset, dirty_len, 0);
 	return 0;
 }
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index a3e8fe414df8..1922df4190e7 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -3096,6 +3096,51 @@ TRACE_EVENT(ext4_move_extent_exit,
 		  __entry->ret)
 );
 
+DECLARE_EVENT_CLASS(ext4_set_iomap_class,
+	TP_PROTO(struct inode *inode, struct ext4_map_blocks *map,
+		 loff_t offset, loff_t length, unsigned int flags),
+	TP_ARGS(inode, map, offset, length, flags),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(ino_t, ino)
+		__field(ext4_lblk_t, m_lblk)
+		__field(unsigned int, m_len)
+		__field(unsigned int, m_flags)
+		__field(u64, m_seq)
+		__field(loff_t, offset)
+		__field(loff_t, length)
+		__field(unsigned int, iomap_flags)
+	),
+	TP_fast_assign(
+		__entry->dev = inode->i_sb->s_dev;
+		__entry->ino = inode->i_ino;
+		__entry->m_lblk = map->m_lblk;
+		__entry->m_len = map->m_len;
+		__entry->m_flags = map->m_flags;
+		__entry->m_seq = map->m_seq;
+		__entry->offset = offset;
+		__entry->length = length;
+		__entry->iomap_flags = flags;
+
+	),
+	TP_printk("dev %d:%d ino %lu m_lblk %u m_len %u m_flags %s m_seq %llu orig_off 0x%llx orig_len 0x%llx iomap_flags 0x%x",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino, __entry->m_lblk, __entry->m_len,
+		  show_mflags(__entry->m_flags), __entry->m_seq,
+		  __entry->offset, __entry->length, __entry->iomap_flags)
+)
+
+#define DEFINE_SET_IOMAP_EVENT(name) \
+DEFINE_EVENT(ext4_set_iomap_class, name, \
+	TP_PROTO(struct inode *inode, struct ext4_map_blocks *map, \
+		 loff_t offset, loff_t length, unsigned int flags), \
+	TP_ARGS(inode, map, offset, length, flags))
+
+DEFINE_SET_IOMAP_EVENT(ext4_iomap_buffered_read_begin);
+DEFINE_SET_IOMAP_EVENT(ext4_iomap_buffered_write_begin);
+DEFINE_SET_IOMAP_EVENT(ext4_iomap_map_writeback_range);
+DEFINE_SET_IOMAP_EVENT(ext4_iomap_zero_begin);
+
 #endif /* _TRACE_EXT4_H */
 
 /* This part must be outside protection */
-- 
2.52.0
From: Zhang Yi
To: linux-ext4@vger.kernel.org
Subject: [PATCH -next v2 20/22] ext4: disable online defrag when inode using iomap buffered I/O path
Date: Tue, 3 Feb 2026 14:25:20 +0800
Message-ID: <20260203062523.3869120-21-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>

Online defragmentation does not currently support inodes that use the iomap buffered I/O path, as it still relies on buffer_head for the management of sub-folio blocks.
Signed-off-by: Zhang Yi
---
 fs/ext4/move_extent.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index ce1f738dff93..fd8dabdfd962 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -476,6 +476,17 @@ static int mext_check_validity(struct inode *orig_inode,
 		return -EOPNOTSUPP;
 	}
 
+	/*
+	 * TODO: support online defrag for inodes that using the buffered
+	 * I/O iomap path.
+	 */
+	if (ext4_inode_buffered_iomap(orig_inode) ||
+	    ext4_inode_buffered_iomap(donor_inode)) {
+		ext4_msg(sb, KERN_ERR,
+			 "Online defrag not supported for inode with iomap buffered IO path");
+		return -EOPNOTSUPP;
+	}
+
 	if (donor_inode->i_mode & (S_ISUID|S_ISGID)) {
 		ext4_debug("ext4 move extent: suid or sgid is set to donor file [ino:orig %lu, donor %lu]\n",
 			   orig_inode->i_ino, donor_inode->i_ino);
-- 
2.52.0
From: Zhang Yi
Subject: [PATCH -next v2 21/22] ext4: partially enable iomap for the buffered I/O path of regular files
Date: Tue, 3 Feb 2026 14:25:21 +0800
Message-ID: <20260203062523.3869120-22-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>

Partially enable iomap for the buffered I/O path of regular files. The
default filesystem features, the default mount options, and the bigalloc
feature are now supported. However, inline data, fs-verity, fscrypt,
online defragmentation, and data=journal mode are not yet supported;
some of these are expected to gain support gradually in the future. The
filesystem automatically falls back to the original buffer_head path
when any of these mount options or features is enabled.
Signed-off-by: Zhang Yi
---
 fs/ext4/ext4.h      |  1 +
 fs/ext4/ext4_jbd2.c |  1 +
 fs/ext4/ialloc.c    |  1 +
 fs/ext4/inode.c     | 36 ++++++++++++++++++++++++++++++++++++
 4 files changed, 39 insertions(+)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 520f6d5dcdab..259c6e780e65 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3064,6 +3064,7 @@ int ext4_walk_page_buffers(handle_t *handle,
 int do_journal_get_write_access(handle_t *handle, struct inode *inode,
 				struct buffer_head *bh);
 void ext4_set_inode_mapping_order(struct inode *inode);
+void ext4_enable_buffered_iomap(struct inode *inode);
 int ext4_nonda_switch(struct super_block *sb);
 #define FALL_BACK_TO_NONDELALLOC 1
 #define CONVERT_INLINE_DATA 2

diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 05e5946ed9b3..f587bfbe8423 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -16,6 +16,7 @@ int ext4_inode_journal_mode(struct inode *inode)
 	    ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE) ||
 	    test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
 	    (ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA) &&
+	     !ext4_inode_buffered_iomap(inode) &&
 	     !test_opt(inode->i_sb, DELALLOC))) {
 		/* We do not support data journalling for encrypted data */
 		if (S_ISREG(inode->i_mode) && IS_ENCRYPTED(inode))

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index b20a1bf866ab..dfa6f60f67b3 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -1334,6 +1334,7 @@ struct inode *__ext4_new_inode(struct mnt_idmap *idmap,
 		}
 	}
 
+	ext4_enable_buffered_iomap(inode);
 	ext4_set_inode_mapping_order(inode);
 
 	ext4_update_inode_fsync_trans(handle, inode, 1);

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 77dcca584153..bbdd0bb3bc8b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -903,6 +903,9 @@ static int _ext4_get_block(struct inode *inode, sector_t iblock,
 
 	if (ext4_has_inline_data(inode))
 		return -ERANGE;
+	/* inodes using the iomap buffered I/O path should not go here. */
+	if (WARN_ON_ONCE(ext4_inode_buffered_iomap(inode)))
+		return -EINVAL;
 
 	map.m_lblk = iblock;
 	map.m_len = bh->b_size >> inode->i_blkbits;
@@ -2771,6 +2774,12 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
 	if (!mapping->nrpages || !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
 		goto out_writepages;
 
+	/* inodes using the iomap buffered I/O path should not go here. */
+	if (WARN_ON_ONCE(ext4_inode_buffered_iomap(inode))) {
+		ret = -EINVAL;
+		goto out_writepages;
+	}
+
 	/*
 	 * If the filesystem has aborted, it is read-only, so return
 	 * right away instead of dumping stack traces later on that
@@ -5730,6 +5739,31 @@ static int check_igot_inode(struct inode *inode, ext4_iget_flags flags,
 	return -EFSCORRUPTED;
 }
 
+void ext4_enable_buffered_iomap(struct inode *inode)
+{
+	struct super_block *sb = inode->i_sb;
+
+	if (!S_ISREG(inode->i_mode))
+		return;
+	if (ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE))
+		return;
+
+	/* Unsupported Features */
+	if (ext4_has_feature_inline_data(sb))
+		return;
+	if (ext4_has_feature_verity(sb))
+		return;
+	if (ext4_has_feature_encrypt(sb))
+		return;
+	if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
+	    ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA))
+		return;
+	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)))
+		return;
+
+	ext4_set_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP);
+}
+
 void ext4_set_inode_mapping_order(struct inode *inode)
 {
 	struct super_block *sb = inode->i_sb;
@@ -6015,6 +6049,8 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
 	if (ret)
 		goto bad_inode;
 
+	ext4_enable_buffered_iomap(inode);
+
 	if (S_ISREG(inode->i_mode)) {
 		inode->i_op = &ext4_file_inode_operations;
 		inode->i_fop = &ext4_file_operations;
-- 
2.52.0
From: Zhang Yi
Subject: [PATCH -next v2 22/22] ext4: introduce a mount option for iomap buffered I/O path
Date: Tue, 3 Feb 2026 14:25:22 +0800
Message-ID: <20260203062523.3869120-23-yi.zhang@huawei.com>
In-Reply-To: <20260203062523.3869120-1-yi.zhang@huawei.com>
Since the iomap buffered I/O path does not yet support all existing
features, it cannot be enabled by default. Introduce the
'buffered_iomap' and 'nobuffered_iomap' mount options to enable and
disable the iomap buffered I/O path for regular files.

Signed-off-by: Zhang Yi
---
 fs/ext4/ext4.h  | 1 +
 fs/ext4/inode.c | 2 ++
 fs/ext4/super.c | 7 +++++++
 3 files changed, 10 insertions(+)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 259c6e780e65..4e209c14dab9 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1288,6 +1288,7 @@ struct ext4_inode_info {
						 * scanning in mballoc
						 */
 #define EXT4_MOUNT2_ABORT		0x00000100 /* Abort filesystem */
+#define EXT4_MOUNT2_BUFFERED_IOMAP	0x00000200 /* Use iomap for buffered I/O */
 
 #define clear_opt(sb, opt)		EXT4_SB(sb)->s_mount_opt &= \
						~EXT4_MOUNT_##opt

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index bbdd0bb3bc8b..a3d7c98309bb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5743,6 +5743,8 @@ void ext4_enable_buffered_iomap(struct inode *inode)
 {
 	struct super_block *sb = inode->i_sb;
 
+	if (!test_opt2(sb, BUFFERED_IOMAP))
+		return;
 	if (!S_ISREG(inode->i_mode))
 		return;
 	if (ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE))

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4bb77703ffe1..d967792c7cb1 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1701,6 +1701,7 @@ enum {
 	Opt_discard, Opt_nodiscard, Opt_init_itable, Opt_noinit_itable,
 	Opt_max_dir_size_kb, Opt_nojournal_checksum, Opt_nombcache,
 	Opt_no_prefetch_block_bitmaps, Opt_mb_optimize_scan,
+	Opt_buffered_iomap, Opt_nobuffered_iomap,
 	Opt_errors, Opt_data, Opt_data_err, Opt_jqfmt, Opt_dax_type,
 #ifdef CONFIG_EXT4_DEBUG
 	Opt_fc_debug_max_replay, Opt_fc_debug_force
@@ -1839,6 +1840,8 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
 	fsparam_flag	("no_prefetch_block_bitmaps",
			 Opt_no_prefetch_block_bitmaps),
 	fsparam_s32	("mb_optimize_scan",	Opt_mb_optimize_scan),
+	fsparam_flag	("buffered_iomap",	Opt_buffered_iomap),
+	fsparam_flag	("nobuffered_iomap",	Opt_nobuffered_iomap),
 	fsparam_string	("check",		Opt_removed),	/* mount option from ext2/3 */
 	fsparam_flag	("nocheck",		Opt_removed),	/* mount option from ext2/3 */
 	fsparam_flag	("reservation",		Opt_removed),	/* mount option from ext2/3 */
@@ -1932,6 +1935,10 @@ static const struct mount_opts {
 	{Opt_nombcache, EXT4_MOUNT_NO_MBCACHE, MOPT_SET},
 	{Opt_no_prefetch_block_bitmaps, EXT4_MOUNT_NO_PREFETCH_BLOCK_BITMAPS,
	 MOPT_SET},
+	{Opt_buffered_iomap, EXT4_MOUNT2_BUFFERED_IOMAP,
+	 MOPT_SET | MOPT_2 | MOPT_EXT4_ONLY},
+	{Opt_nobuffered_iomap, EXT4_MOUNT2_BUFFERED_IOMAP,
+	 MOPT_CLEAR | MOPT_2 | MOPT_EXT4_ONLY},
 #ifdef CONFIG_EXT4_DEBUG
 	{Opt_fc_debug_force, EXT4_MOUNT2_JOURNAL_FAST_COMMIT,
 	 MOPT_SET | MOPT_2 | MOPT_EXT4_ONLY},
-- 
2.52.0
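Assuming a kernel built with this series, the new option could be exercised roughly as follows. This is a usage sketch, not part of the patch: the image path, size, and mount point are illustrative, and the commands require root.

```shell
# Create a scratch ext4 image and mount it with the new option.
truncate -s 512M /tmp/ext4.img
mkfs.ext4 -q /tmp/ext4.img
mkdir -p /mnt/ext4test
mount -o loop,buffered_iomap /tmp/ext4.img /mnt/ext4test
# Regular files created from here on are candidates for the iomap
# buffered I/O path, subject to the per-inode checks in
# ext4_enable_buffered_iomap().
```

Note that ext4_enable_buffered_iomap() runs when an inode is created or first loaded, so toggling the option via remount would only affect inodes instantiated afterwards, not inodes already in memory.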