From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 61F1D13E03E; Tue, 22 Oct 2024 03:12:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566780; cv=none; b=dhEHvFWJ07qZFmQdLc8H38eysb9dNezr0icw2O01lRgjDSQ0M4DKqkpunrq4KLF/zELYsFb/1+bfv4yYe1MD/s32tkNLpdOfhLPiWn34E/b1FCpUKQ5rKuvHpL39bNog1pZecigPLSebOo4crJcuu13pNnVSqUBsvWo9jYNlB4s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566780; c=relaxed/simple; bh=Un7uRmLmnNBLObbxKEcGdntaorjVplO34POchkhMyuE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=e/mBwx0ddok5DlagbdJlg/uCL4qd6vkf0+NJDuPPWb3VGDWNLR+nP9rvzjdiotMyVH9hD6WOQYSVsMhUhTfBiokdW4DBtt+btRTHbIJ7T4RnCpEBVfoWEOak682A6G23DS+H3MuQIDKljGpwpnHtT4qvhQXXI7k+OVjtb3NvxXk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcg34f5Lz4f3lW3; Tue, 22 Oct 2024 11:12:31 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id CCB881A0359; Tue, 22 Oct 2024 11:12:49 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S5; Tue, 22 Oct 2024 11:12:49 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 01/27] ext4: remove writable userspace mappings before truncating page cache Date: Tue, 22 Oct 2024 19:10:32 +0800 Message-ID: <20241022111059.2566137-2-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S5 X-Coremail-Antispam: 1UD129KBjvJXoWxXFyrJF43Xw18CFy5Gry5XFb_yoWrAF1kpr 9rGFyfCrWrZasrWa1Sg3WUZw1rK3WkCF4UJ34fGr1UXFyrX3WkKF1Dtw1UAF4UKrW8Jw4j vF45trWjgF45A3DanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQI14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_Jr4l82xGYIkIc2x26xkF7I0E14v26r4j6ryUM28lY4IEw2IIxxk0rwA2 F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjx v20xvEc7CjxVAFwI0_Cr1j6rxdM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E 87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64 kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8JwAm 72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20VAGYx C7M4IIrI8v6xkF7I0E8cxan2IY04v7MxkF7I0En4kS14v26r1q6r43MxAIw28IcxkI7VAK I48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxVCjr7 xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY6xII jxv20xvE14v26r1j6r1xMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8JVWxJwCI42IY6xAIw2 0EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x02 67AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7sR_Q6LtUUUUU== X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi When zeroing a range of folios on the filesystem which block size is less than the page size, the file's mapped partial blocks within one page will be marked as unwritten, we should remove writable userspace mappings to ensure that ext4_page_mkwrite() can be called during subsequent write access to these folios. Otherwise, data written by subsequent mmap writes may not be saved to disk. $mkfs.ext4 -b 1024 /dev/vdb $mount /dev/vdb /mnt $xfs_io -t -f -c "pwrite -S 0x58 0 4096" -c "mmap -rw 0 4096" \ -c "mwrite -S 0x5a 2048 2048" -c "fzero 2048 2048" \ -c "mwrite -S 0x59 2048 2048" -c "close" /mnt/foo $od -Ax -t x1z /mnt/foo 000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 * 000800 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 * 001000 $umount /mnt && mount /dev/vdb /mnt $od -Ax -t x1z /mnt/foo 000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 * 000800 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 001000 Signed-off-by: Zhang Yi --- fs/ext4/ext4.h | 2 ++ fs/ext4/extents.c | 1 + fs/ext4/inode.c | 41 +++++++++++++++++++++++++++++++++++++++++ 3 files changed, 44 insertions(+) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 44b0d418143c..6d0267afd4c1 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3020,6 +3020,8 @@ extern int ext4_inode_attach_jinode(struct inode *ino= de); extern int ext4_can_truncate(struct inode *inode); extern int ext4_truncate(struct inode *); extern int ext4_break_layouts(struct inode *); +extern void ext4_truncate_folios_range(struct inode *inode, loff_t start, + loff_t end); extern int ext4_punch_hole(struct file *file, loff_t offset, loff_t length= ); extern void ext4_set_inode_flags(struct inode *, bool init); extern int ext4_alloc_da_blocks(struct inode *inode); diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 34e25eee6521..2a054c3689f0 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -4677,6 +4677,7 @@ static long ext4_zero_range(struct file *file, loff_t= offset, } =20 /* Now release the pages and zero block aligned part of pages */ + ext4_truncate_folios_range(inode, start, end); truncate_pagecache_range(inode, start, end - 1); inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode)); =20 diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 54bdd4884fe6..8b34e79112d5 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include #include @@ -3870,6 +3871,46 @@ int ext4_update_disksize_before_punch(struct inode *= inode, loff_t offset, return ret; } =20 +static inline void ext4_truncate_folio(struct inode *inode, + loff_t start, loff_t end) +{ + unsigned long blocksize =3D i_blocksize(inode); + struct folio *folio; + + if (round_up(start, blocksize) >=3D round_down(end, blocksize)) + return; + + folio =3D filemap_lock_folio(inode->i_mapping, start >> PAGE_SHIFT); + if (IS_ERR(folio)) + return; + + if (folio_mkclean(folio)) + folio_mark_dirty(folio); + folio_unlock(folio); + folio_put(folio); +} + +/* + * When truncating a range of folios, if the block size is less than the + * page size, the file's mapped partial blocks within one page could be + * freed or converted to unwritten. We should call this function to remove + * writable userspace mappings so that ext4_page_mkwrite() can be called + * during subsequent write access to these folios. + */ +void ext4_truncate_folios_range(struct inode *inode, loff_t start, loff_t = end) +{ + unsigned long blocksize =3D i_blocksize(inode); + + if (end > inode->i_size) + end =3D inode->i_size; + if (start >=3D end || blocksize >=3D PAGE_SIZE) + return; + + ext4_truncate_folio(inode, start, min(round_up(start, PAGE_SIZE), end)); + if (end > round_up(start, PAGE_SIZE)) + ext4_truncate_folio(inode, round_down(end, PAGE_SIZE), end); +} + static void ext4_wait_dax_page(struct inode *inode) { filemap_invalidate_unlock(inode->i_mapping); --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C7D64450E2; Tue, 22 Oct 2024 03:12:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566776; cv=none; b=lV/TBFX4G+ccfgZxBbjWWkcED+IGkwvRgAGjG32jEZMEWtilVjpuqgHVGbDCr04s8rGijtzIAkF1yDzVgF5Fxe5SKK1lo3TWzktd79kES9mR+UYFR+Pv9NucNnMBRF7DT47LcGiYjYFB5vwt8oAo/O0cgU/ILzqJB/IOvnDUKWU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566776; c=relaxed/simple; bh=eTE9FdAXMuVd28LY2UqYUhldzFq3CJH4Ou5K6L/lr0o=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=p1rrlLvVb5tsALTTOWBXDT5mCLJSHBDB04yBgfPTyRGhhT+KyrNhVEKNNgTbn/qDfaM7xI4y71tKuUaThhOsPRdYklsZPb0zkKfks6um0Nhxy3tg3eierw4W5FLdw6SianV3zAuuC1xK+qJ1qlcVNgjmrfvcJ7dDotWyo+z2QgU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4XXcg50frNz4f3jMX; Tue, 22 Oct 2024 11:12:33 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 732F71A058E; Tue, 22 Oct 2024 11:12:50 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S6; Tue, 22 Oct 2024 11:12:50 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 02/27] ext4: don't explicit update times in ext4_fallocate() Date: Tue, 22 Oct 2024 19:10:33 +0800 Message-ID: <20241022111059.2566137-3-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S6 X-Coremail-Antispam: 1UD129KBjvJXoWxZFyDZr13KFy8urW5KF48JFb_yoW5GFyrpr WrGa4rG3W8WFyq9rWFkr4UZrs7K3W7Gr48XrZ5u34xua4DJwnYgF1jyFySyF4rtrW8Zr4j vFyUKw1UWw1Uu37anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQI14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_Jryl82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2 F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjx v20xvEc7CjxVAFwI0_Cr1j6rxdM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E 87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64 kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8JwAm 72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20VAGYx C7M4IIrI8v6xkF7I0E8cxan2IY04v7MxkF7I0En4kS14v26r1q6r43MxAIw28IcxkI7VAK I48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxVCjr7 xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY6xII jxv20xvE14v26r1j6r1xMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8JVWxJwCI42IY6xAIw2 0EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x02 67AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7sRRBT53UUUUU== X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi After commit 'ad5cd4f4ee4d ("ext4: fix fallocate to use file_modified to update permissions consistently"), we can update mtime and ctime appropriately through file_modified() when doing zero range, collapse rage, insert range and punch hole, hence there is no need to explicit update times in those paths, just drop them. Signed-off-by: Zhang Yi Reviewed-by: Jan Kara --- fs/ext4/extents.c | 4 ---- fs/ext4/inode.c | 1 - 2 files changed, 5 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 2a054c3689f0..aa07b5ddaff8 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -4679,7 +4679,6 @@ static long ext4_zero_range(struct file *file, loff_t= offset, /* Now release the pages and zero block aligned part of pages */ ext4_truncate_folios_range(inode, start, end); truncate_pagecache_range(inode, start, end - 1); - inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode)); =20 ret =3D ext4_alloc_file_blocks(file, lblk, max_blocks, new_size, flags); @@ -4704,7 +4703,6 @@ static long ext4_zero_range(struct file *file, loff_t= offset, goto out_mutex; } =20 - inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode)); if (new_size) ext4_update_inode_size(inode, new_size); ret =3D ext4_mark_inode_dirty(handle, inode); @@ -5440,7 +5438,6 @@ static int ext4_collapse_range(struct file *file, lof= f_t offset, loff_t len) up_write(&EXT4_I(inode)->i_data_sem); if (IS_SYNC(inode)) ext4_handle_sync(handle); - inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode)); ret =3D ext4_mark_inode_dirty(handle, inode); ext4_update_inode_fsync_trans(handle, inode, 1); =20 @@ -5550,7 +5547,6 @@ static int ext4_insert_range(struct file *file, loff_= t offset, loff_t len) /* Expand file to avoid data loss if there is error while shifting */ inode->i_size +=3D len; EXT4_I(inode)->i_disksize +=3D len; - inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode)); ret =3D ext4_mark_inode_dirty(handle, inode); if (ret) goto out_stop; diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 8b34e79112d5..f8796f7b0f94 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4085,7 +4085,6 @@ int ext4_punch_hole(struct file *file, loff_t offset,= loff_t length) if (IS_SYNC(inode)) ext4_handle_sync(handle); =20 - inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode)); ret2 =3D ext4_mark_inode_dirty(handle, inode); if (unlikely(ret2)) ret =3D ret2; --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A0C3C13E04C; Tue, 22 Oct 2024 03:12:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566782; cv=none; b=dAmrxXxmz+1oQpXj/7EfIqSCChOc0TG7E9Lh7Sf6mU0SaOnWAmu53S3hWwGagE/PtZHbVG0btBTDEhAx0gVXdwy0zVj0BU3puJaK+0ZUF/8cS+KJ8UvYjNAx8q+B9pDGn04zqBfPdcosMJy+8Kr8QJ5sOHBULSmFhAt9jJZvo2w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566782; c=relaxed/simple; bh=6M3qTW3UtnW4iYeVI9qa0bpqFTev2GsxqOQy2F7zyjU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=H64IL7e+a83SW/AZclgxdhALX4AIbzDkrQuEt6MZ+yzB80uVG7ROVCkuSoB9ssKARZWiSW8nWPL4eeIpnSyyJeSgjKAxLfzK1P9KGtyRznsdsG9i2sXnaewGiB6odWBtMPODYtjBPnh7FpFkw09BfXg+XnjWUIFsIiWc/F3VBWs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcgB64frz4f3jkq; Tue, 22 Oct 2024 11:12:38 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 2020B1A07B6; Tue, 22 Oct 2024 11:12:51 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S7; Tue, 22 Oct 2024 11:12:50 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 03/27] ext4: don't write back data before punch hole in nojournal mode Date: Tue, 22 Oct 2024 19:10:34 +0800 Message-ID: <20241022111059.2566137-4-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S7 X-Coremail-Antispam: 1UD129KBjvJXoW7WF1DWr4UAry8Kr1rCF13twb_yoW8Zry5pr 9Ik3yrtF48WFZFkr4SqFsrXF1rKaykGrW8WFyxG3s7Way5Aw1xKF4q9F1rGayUt39xArWj qF40qa4xWFyUAFDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQa14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JrWl82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2 F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjx v20xvEc7CjxVAFwI0_Cr1j6rxdM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E 87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64 kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8JwAm 72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20VAGYx C7M4IIrI8v6xkF7I0E8cxan2IY04v7MxkF7I0En4kS14v26r1q6r43MxAIw28IcxkI7VAK I48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxVCjr7 xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY6xII jxv20xvE14v26r1j6r1xMIIF0xvE2Ix0cI8IcVCY1x0267AKxVWxJVW8Jr1lIxAIcVCF04 k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r1j6r4UMIIF0xvEx4A2jsIEc7Cj xVAFwI0_Gr0_Gr1UYxBIdaVFxhVjvjDU0xZFpf9x0pRMKZLUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi There is no need to write back all data before punching a hole in data=3Dordered|writeback mode since it will be dropped soon after removing space, so just remove the filemap_write_and_wait_range() in these modes. However, in data=3Djournal mode, we need to write dirty pages out before discarding page cache in case of crash before committing the freeing data transaction, which could expose old, stale data. Signed-off-by: Zhang Yi --- fs/ext4/inode.c | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index f8796f7b0f94..94b923afcd9c 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3965,17 +3965,6 @@ int ext4_punch_hole(struct file *file, loff_t offset= , loff_t length) =20 trace_ext4_punch_hole(inode, offset, length, 0); =20 - /* - * Write out all dirty pages to avoid race conditions - * Then release them. - */ - if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) { - ret =3D filemap_write_and_wait_range(mapping, offset, - offset + length - 1); - if (ret) - return ret; - } - inode_lock(inode); =20 /* No need to punch hole beyond i_size */ @@ -4037,6 +4026,21 @@ int ext4_punch_hole(struct file *file, loff_t offset= , loff_t length) ret =3D ext4_update_disksize_before_punch(inode, offset, length); if (ret) goto out_dio; + + /* + * For journalled data we need to write (and checkpoint) pages + * before discarding page cache to avoid inconsitent data on + * disk in case of crash before punching trans is committed. + */ + if (ext4_should_journal_data(inode)) { + ret =3D filemap_write_and_wait_range(mapping, + first_block_offset, last_block_offset); + if (ret) + goto out_dio; + } + + ext4_truncate_folios_range(inode, first_block_offset, + last_block_offset + 1); truncate_pagecache_range(inode, first_block_offset, last_block_offset); } --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B369D146A63; Tue, 22 Oct 2024 03:13:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566782; cv=none; b=tEq+r8OQciRoPwxazhWPircBofwzgBD/idqG36o0AfBCiOP7HXerDMhTOKZBjivr7CFnwZQcmM8awplj8L4AIcP6IIDpdwhVK1A58ksARMNFfgqEfh4aqM11C0Jso2G0Ec10K1IPRHIjWrRGoN4VTUGoVS87eYjX0U3sbq3XPe0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566782; c=relaxed/simple; bh=9OoTY+6/DPLRRyXqggoqwvpTYNignh7uw7u+WydgWiA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=C8jB1ZXpIdgMvErPAbKHQGF9DuagwgAdbM3FDCOE5rvNs0psENo0NG+TmZfpCgvzS5YbCXUXhNvjx7DFNVNk3RhPqJuePl0YjwbG5dyEWb4wm97miUWTu1I/zsw6pR8HKOG9ATFF7QBmPB5r25Oao7l9QW1o44uk88s4tAYwSoU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcg54Jh4z4f3lVc; Tue, 22 Oct 2024 11:12:33 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id C0CBC1A0359; Tue, 22 Oct 2024 11:12:51 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S8; Tue, 22 Oct 2024 11:12:51 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 04/27] ext4: refactor ext4_punch_hole() Date: Tue, 22 Oct 2024 19:10:35 +0800 Message-ID: <20241022111059.2566137-5-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S8 X-Coremail-Antispam: 1UD129KBjvJXoW3GF48tFWrGF1fAF17GFWfKrg_yoW3GFy7p3 y3Ary5Kr48WFyq9F4Iqr4DXF1Ik3WDKrWUWryxGr1fW34qyw1IgFs0kF1Fgay5trZrZw4j vF45t347W34UCFJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQv14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_Cr1j6rxdM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK 6I8E87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4 xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8 JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20V AGYxC7M4IIrI8v6xkF7I0E8cxan2IY04v7MxkF7I0En4kS14v26r1q6r43MxAIw28IcxkI 7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxV Cjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY 6xIIjxv20xvE14v26r1j6r1xMIIF0xvE2Ix0cI8IcVCY1x0267AKxVWxJVW8Jr1lIxAIcV CF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r1j6r4UMIIF0xvEx4A2jsIE c7CjxVAFwI0_Gr0_Gr1UYxBIdaVFxhVjvjDU0xZFpf9x0pRtl1hUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi The current implementation of ext4_punch_hole() contains complex position calculations and stale error tags. To improve the code's clarity and maintainability, it is essential to clean up the code and improve its readability, this can be achieved by: a) simplifying and renaming variables; b) eliminating unnecessary position calculations; c) writing back all data in data=3Djournal mode, and drop page cache from the original offset to the end, rather than using aligned blocks, d) renaming the stale error tags. Signed-off-by: Zhang Yi --- fs/ext4/inode.c | 140 +++++++++++++++++++++--------------------------- 1 file changed, 62 insertions(+), 78 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 94b923afcd9c..1d128333bd06 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3955,13 +3955,14 @@ int ext4_punch_hole(struct file *file, loff_t offse= t, loff_t length) { struct inode *inode =3D file_inode(file); struct super_block *sb =3D inode->i_sb; - ext4_lblk_t first_block, stop_block; + ext4_lblk_t start_lblk, end_lblk; struct address_space *mapping =3D inode->i_mapping; - loff_t first_block_offset, last_block_offset, max_length; - struct ext4_sb_info *sbi =3D EXT4_SB(inode->i_sb); + loff_t max_end =3D EXT4_SB(sb)->s_bitmap_maxbytes - sb->s_blocksize; + loff_t end =3D offset + length; + unsigned long blocksize =3D i_blocksize(inode); handle_t *handle; unsigned int credits; - int ret =3D 0, ret2 =3D 0; + int ret =3D 0; =20 trace_ext4_punch_hole(inode, offset, length, 0); =20 @@ -3969,36 +3970,27 @@ int ext4_punch_hole(struct file *file, loff_t offse= t, loff_t length) =20 /* No need to punch hole beyond i_size */ if (offset >=3D inode->i_size) - goto out_mutex; + goto out; =20 /* - * If the hole extends beyond i_size, set the hole - * to end after the page that contains i_size + * If the hole extends beyond i_size, set the hole to end after + * the page that contains i_size, and also make sure that the hole + * within one block before last range. */ - if (offset + length > inode->i_size) { - length =3D inode->i_size + - PAGE_SIZE - (inode->i_size & (PAGE_SIZE - 1)) - - offset; - } + if (end > inode->i_size) + end =3D round_up(inode->i_size, PAGE_SIZE); + if (end > max_end) + end =3D max_end; + length =3D end - offset; =20 /* - * For punch hole the length + offset needs to be within one block - * before last range. Adjust the length if it goes beyond that limit. + * Attach jinode to inode for jbd2 if we do any zeroing of partial + * block. */ - max_length =3D sbi->s_bitmap_maxbytes - inode->i_sb->s_blocksize; - if (offset + length > max_length) - length =3D max_length - offset; - - if (offset & (sb->s_blocksize - 1) || - (offset + length) & (sb->s_blocksize - 1)) { - /* - * Attach jinode to inode for jbd2 if we do any zeroing of - * partial block - */ + if (offset & (blocksize - 1) || end & (blocksize - 1)) { ret =3D ext4_inode_attach_jinode(inode); if (ret < 0) - goto out_mutex; - + goto out; } =20 /* Wait all existing dio workers, newcomers will block on i_rwsem */ @@ -4006,7 +3998,7 @@ int ext4_punch_hole(struct file *file, loff_t offset,= loff_t length) =20 ret =3D file_modified(file); if (ret) - goto out_mutex; + goto out; =20 /* * Prevent page faults from reinstantiating pages we have released from @@ -4016,34 +4008,24 @@ int ext4_punch_hole(struct file *file, loff_t offse= t, loff_t length) =20 ret =3D ext4_break_layouts(inode); if (ret) - goto out_dio; - - first_block_offset =3D round_up(offset, sb->s_blocksize); - last_block_offset =3D round_down((offset + length), sb->s_blocksize) - 1; + goto out_invalidate_lock; =20 - /* Now release the pages and zero block aligned part of pages*/ - if (last_block_offset > first_block_offset) { + /* + * For journalled data we need to write (and checkpoint) pages + * before discarding page cache to avoid inconsitent data on + * disk in case of crash before punching trans is committed. + */ + if (ext4_should_journal_data(inode)) { + ret =3D filemap_write_and_wait_range(mapping, offset, end - 1); + } else { ret =3D ext4_update_disksize_before_punch(inode, offset, length); - if (ret) - goto out_dio; - - /* - * For journalled data we need to write (and checkpoint) pages - * before discarding page cache to avoid inconsitent data on - * disk in case of crash before punching trans is committed. - */ - if (ext4_should_journal_data(inode)) { - ret =3D filemap_write_and_wait_range(mapping, - first_block_offset, last_block_offset); - if (ret) - goto out_dio; - } - - ext4_truncate_folios_range(inode, first_block_offset, - last_block_offset + 1); - truncate_pagecache_range(inode, first_block_offset, - last_block_offset); + ext4_truncate_folios_range(inode, offset, end); } + if (ret) + goto out_invalidate_lock; + + /* Now release the pages and zero block aligned part of pages*/ + truncate_pagecache_range(inode, offset, end - 1); =20 if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) credits =3D ext4_writepage_trans_blocks(inode); @@ -4053,52 +4035,54 @@ int ext4_punch_hole(struct file *file, loff_t offse= t, loff_t length) if (IS_ERR(handle)) { ret =3D PTR_ERR(handle); ext4_std_error(sb, ret); - goto out_dio; + goto out_invalidate_lock; } =20 - ret =3D ext4_zero_partial_blocks(handle, inode, offset, - length); + ret =3D ext4_zero_partial_blocks(handle, inode, offset, length); if (ret) - goto out_stop; - - first_block =3D (offset + sb->s_blocksize - 1) >> - EXT4_BLOCK_SIZE_BITS(sb); - stop_block =3D (offset + length) >> EXT4_BLOCK_SIZE_BITS(sb); + goto out_handle; =20 /* If there are blocks to remove, do it */ - if (stop_block > first_block) { - ext4_lblk_t hole_len =3D stop_block - first_block; + start_lblk =3D round_up(offset, blocksize) >> inode->i_blkbits; + end_lblk =3D end >> inode->i_blkbits; + + if (end_lblk > start_lblk) { + ext4_lblk_t hole_len =3D end_lblk - start_lblk; =20 down_write(&EXT4_I(inode)->i_data_sem); ext4_discard_preallocations(inode); =20 - ext4_es_remove_extent(inode, first_block, hole_len); + ext4_es_remove_extent(inode, start_lblk, hole_len); =20 if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) - ret =3D ext4_ext_remove_space(inode, first_block, - stop_block - 1); + ret =3D ext4_ext_remove_space(inode, start_lblk, + end_lblk - 1); else - ret =3D ext4_ind_remove_space(handle, inode, first_block, - stop_block); + ret =3D ext4_ind_remove_space(handle, inode, start_lblk, + end_lblk); + if (ret) { + up_write(&EXT4_I(inode)->i_data_sem); + goto out_handle; + } =20 - ext4_es_insert_extent(inode, first_block, hole_len, ~0, + ext4_es_insert_extent(inode, start_lblk, hole_len, ~0, EXTENT_STATUS_HOLE, 0); up_write(&EXT4_I(inode)->i_data_sem); } - ext4_fc_track_range(handle, inode, first_block, stop_block); + ext4_fc_track_range(handle, inode, start_lblk, end_lblk); + + ret =3D ext4_mark_inode_dirty(handle, inode); + if (unlikely(ret)) + goto out_handle; + + ext4_update_inode_fsync_trans(handle, inode, 1); if (IS_SYNC(inode)) ext4_handle_sync(handle); - - ret2 =3D ext4_mark_inode_dirty(handle, inode); - if (unlikely(ret2)) - ret =3D ret2; - if (ret >=3D 0) - ext4_update_inode_fsync_trans(handle, inode, 1); -out_stop: +out_handle: ext4_journal_stop(handle); -out_dio: +out_invalidate_lock: filemap_invalidate_unlock(mapping); -out_mutex: +out: inode_unlock(inode); return ret; } --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CCA44146A9F; Tue, 22 Oct 2024 03:13:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566783; cv=none; b=V4GopPnZVgwfThWtsJ41ZcEpfmJElKH+EGgrHf76YJw24Y1gbu2v9u5XvRDyyHQSu5jN1KAelfhrwRmZw5VdhuOzXItNDN92aEPKNQPrlv1FJPv2SKDN6eth3fOiOOY1gAFWTVAqv0g8VgauDmbL+dI4IekdN3F3Vox8u7sqRrA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566783; c=relaxed/simple; bh=SL3556zehzIj4Ho5hXlwQ/qb69LOIePNOS7/RN3+B9k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=B8vlb/NYadNu1VKUsbO0P2nKD1dEBUTd4Z6g9HSPd5c96Sj5l53VCLaGJWC/wxQFFUMK7aiXsYpUXAAjQtYoLes+ZKkhy23PJmG7SU7MY0rIxQSfACoeJRnlnqLFyZR5q6U4kgGk1BOEvLzLHFzMYNcYCLdW0o38Ye4LbFCx0T8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcg61f68z4f3lW1; Tue, 22 Oct 2024 11:12:34 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 64D5A1A058E; Tue, 22 Oct 2024 11:12:52 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S9; Tue, 22 Oct 2024 11:12:52 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 05/27] ext4: refactor ext4_zero_range() Date: Tue, 22 Oct 2024 19:10:36 +0800 Message-ID: <20241022111059.2566137-6-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S9 X-Coremail-Antispam: 1UD129KBjvJXoW3JFWrGr4DWF1UGr13Jr1fJFb_yoW3GF18pF ZIqrW5Kr48WFyq9r48KFsrZF40k3WkKrWUCFyxWrn5X3srtwn7Kan8KF95WFWIqrZ7Zw4Y vF4Yyry7GFWUWFJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQv14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_Cr1j6rxdM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK 6I8E87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4 xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8 JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20V AGYxC7M4IIrI8v6xkF7I0E8cxan2IY04v7MxkF7I0En4kS14v26r1q6r43MxAIw28IcxkI 7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxV Cjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY 6xIIjxv20xvE14v26r1I6r4UMIIF0xvE2Ix0cI8IcVCY1x0267AKxVWxJVW8Jr1lIxAIcV CF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r1j6r4UMIIF0xvEx4A2jsIE c7CjxVAFwI0_Gr0_Gr1UYxBIdaVFxhVjvjDU0xZFpf9x0pRtl1hUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi The current implementation of ext4_zero_range() contains complex position calculations and stale error tags. To improve the code's clarity and maintainability, it is essential to clean up the code and improve its readability, this can be achieved by: a) simplifying and renaming variables, making the style the same as ext4_punch_hole(); b) eliminating unnecessary position calculations, writing back all data in data=3Djournal mode, and drop page cache from the original offset to the end, rather than using aligned blocks; c) renaming the stale out_mutex tags. Signed-off-by: Zhang Yi --- fs/ext4/extents.c | 161 +++++++++++++++++++--------------------------- 1 file changed, 65 insertions(+), 96 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index aa07b5ddaff8..f843342e5164 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -4565,40 +4565,15 @@ static long ext4_zero_range(struct file *file, loff= _t offset, struct inode *inode =3D file_inode(file); struct address_space *mapping =3D file->f_mapping; handle_t *handle =3D NULL; - unsigned int max_blocks; loff_t new_size =3D 0; - int ret =3D 0; - int flags; - int credits; - int partial_begin, partial_end; - loff_t start, end; - ext4_lblk_t lblk; + loff_t end =3D offset + len; + ext4_lblk_t start_lblk, end_lblk; + unsigned int blocksize =3D i_blocksize(inode); unsigned int blkbits =3D inode->i_blkbits; + int ret, flags, credits; =20 trace_ext4_zero_range(inode, offset, len, mode); =20 - /* - * Round up offset. This is not fallocate, we need to zero out - * blocks, so convert interior block aligned part of the range to - * unwritten and possibly manually zero out unaligned parts of the - * range. Here, start and partial_begin are inclusive, end and - * partial_end are exclusive. - */ - start =3D round_up(offset, 1 << blkbits); - end =3D round_down((offset + len), 1 << blkbits); - - if (start < offset || end > offset + len) - return -EINVAL; - partial_begin =3D offset & ((1 << blkbits) - 1); - partial_end =3D (offset + len) & ((1 << blkbits) - 1); - - lblk =3D start >> blkbits; - max_blocks =3D (end >> blkbits); - if (max_blocks < lblk) - max_blocks =3D 0; - else - max_blocks -=3D lblk; - inode_lock(inode); =20 /* @@ -4606,88 +4581,78 @@ static long ext4_zero_range(struct file *file, loff= _t offset, */ if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) { ret =3D -EOPNOTSUPP; - goto out_mutex; + goto out; } =20 if (!(mode & FALLOC_FL_KEEP_SIZE) && - (offset + len > inode->i_size || - offset + len > EXT4_I(inode)->i_disksize)) { - new_size =3D offset + len; + (end > inode->i_size || end > EXT4_I(inode)->i_disksize)) { + new_size =3D end; ret =3D inode_newsize_ok(inode, new_size); if (ret) - goto out_mutex; + goto out; } =20 - flags =3D EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT; - /* Wait all existing dio workers, newcomers will block on i_rwsem */ inode_dio_wait(inode); =20 ret =3D file_modified(file); if (ret) - goto out_mutex; - - /* Preallocate the range including the unaligned edges */ - if (partial_begin || partial_end) { - ret =3D ext4_alloc_file_blocks(file, - round_down(offset, 1 << blkbits) >> blkbits, - (round_up((offset + len), 1 << blkbits) - - round_down(offset, 1 << blkbits)) >> blkbits, - new_size, flags); - if (ret) - goto out_mutex; - - } - - /* Zero range excluding the unaligned edges */ - if (max_blocks > 0) { - flags |=3D (EXT4_GET_BLOCKS_CONVERT_UNWRITTEN | - EXT4_EX_NOCACHE); + goto out; =20 - /* - * Prevent page faults from reinstantiating pages we have - * released from page cache. - */ - filemap_invalidate_lock(mapping); + /* + * Prevent page faults from reinstantiating pages we have released + * from page cache. + */ + filemap_invalidate_lock(mapping); =20 - ret =3D ext4_break_layouts(inode); - if (ret) { - filemap_invalidate_unlock(mapping); - goto out_mutex; - } + ret =3D ext4_break_layouts(inode); + if (ret) + goto out_invalidate_lock; =20 + /* + * For journalled data we need to write (and checkpoint) pages before + * discarding page cache to avoid inconsitent data on disk in case of + * crash before zeroing trans is committed. + */ + if (ext4_should_journal_data(inode)) { + ret =3D filemap_write_and_wait_range(mapping, offset, end - 1); + } else { ret =3D ext4_update_disksize_before_punch(inode, offset, len); - if (ret) { - filemap_invalidate_unlock(mapping); - goto out_mutex; - } + ext4_truncate_folios_range(inode, offset, end); + } + if (ret) + goto out_invalidate_lock; =20 - /* - * For journalled data we need to write (and checkpoint) pages - * before discarding page cache to avoid inconsitent data on - * disk in case of crash before zeroing trans is committed. - */ - if (ext4_should_journal_data(inode)) { - ret =3D filemap_write_and_wait_range(mapping, start, - end - 1); - if (ret) { - filemap_invalidate_unlock(mapping); - goto out_mutex; - } - } + /* Now release the pages and zero block aligned part of pages */ + truncate_pagecache_range(inode, offset, end - 1); =20 - /* Now release the pages and zero block aligned part of pages */ - ext4_truncate_folios_range(inode, start, end); - truncate_pagecache_range(inode, start, end - 1); + flags =3D EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT; + /* Preallocate the range including the unaligned edges */ + if (offset & (blocksize - 1) || end & (blocksize - 1)) { + ext4_lblk_t alloc_lblk =3D offset >> blkbits; + ext4_lblk_t len_lblk =3D EXT4_MAX_BLOCKS(len, offset, blkbits); =20 - ret =3D ext4_alloc_file_blocks(file, lblk, max_blocks, new_size, - flags); - filemap_invalidate_unlock(mapping); + ret =3D ext4_alloc_file_blocks(file, alloc_lblk, len_lblk, + new_size, flags); if (ret) - goto out_mutex; + goto out_invalidate_lock; } - if (!partial_begin && !partial_end) - goto out_mutex; + + /* Zero range excluding the unaligned edges */ + start_lblk =3D round_up(offset, blocksize) >> blkbits; + end_lblk =3D end >> blkbits; + if (end_lblk > start_lblk) { + ext4_lblk_t zero_blks =3D end_lblk - start_lblk; + + flags |=3D (EXT4_GET_BLOCKS_CONVERT_UNWRITTEN | EXT4_EX_NOCACHE); + ret =3D ext4_alloc_file_blocks(file, start_lblk, zero_blks, + new_size, flags); + if (ret) + goto out_invalidate_lock; + } + /* Finish zeroing out if it doesn't contain partial block */ + if (!(offset & (blocksize - 1)) && !(end & (blocksize - 1))) + goto out_invalidate_lock; =20 /* * In worst case we have to writeout two nonadjacent unwritten @@ -4700,25 +4665,29 @@ static long ext4_zero_range(struct file *file, loff= _t offset, if (IS_ERR(handle)) { ret =3D PTR_ERR(handle); ext4_std_error(inode->i_sb, ret); - goto out_mutex; + goto out_invalidate_lock; } =20 + /* Zero out partial block at the edges of the range */ + ret =3D ext4_zero_partial_blocks(handle, inode, offset, len); + if (ret) + goto out_handle; + if (new_size) ext4_update_inode_size(inode, new_size); ret =3D ext4_mark_inode_dirty(handle, inode); if (unlikely(ret)) goto out_handle; - /* Zero out partial block at the edges of the range */ - ret =3D ext4_zero_partial_blocks(handle, inode, offset, len); - if (ret >=3D 0) - ext4_update_inode_fsync_trans(handle, inode, 1); =20 + ext4_update_inode_fsync_trans(handle, inode, 1); if (file->f_flags & O_SYNC) ext4_handle_sync(handle); =20 out_handle: ext4_journal_stop(handle); -out_mutex: +out_invalidate_lock: + filemap_invalidate_unlock(mapping); +out: inode_unlock(inode); return ret; } --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 56E71148318; Tue, 22 Oct 2024 03:13:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566783; cv=none; b=Ull5qcXfeRkH0X9wgma3A1dAl0QjmM5ysOMwW3VmgHxai638bF0IhqLTGHUbpsBhyxuL2ZPgigKeYimjG5keUtzREWcoykPyM6Er9sPcylkesCUS8+o24RXlXv8fa4LNg8989O1UQ9m0qy9ShYsAzHBYGxDsPzjjo18oVsUdzSc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566783; c=relaxed/simple; bh=T/gNZrAi+yf3cn2sY8Slhlo+bj7bmazb0AjAtYqV5+g=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VVF5hsOMFar+hSWBTPDYwUTrGZSyuPCYdl7pz3EELUmK5U/j39U3VdsCEqlk9frpLjTaEfV4IW9LMcTuAtmWk0nYhmqO/q8rd1XdK0pha9WbBqIAEa1VnaqWXpIvf0hR+6ApWnt7GrHEq0iHXY5uA4U88Fs+Z/43rouNd21BXuU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcgD5QN1z4f3jkc; Tue, 22 Oct 2024 11:12:40 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 088B21A018D; Tue, 22 Oct 2024 11:12:53 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S10; Tue, 22 Oct 2024 11:12:52 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 06/27] ext4: refactor ext4_collapse_range() Date: Tue, 22 Oct 2024 19:10:37 +0800 Message-ID: <20241022111059.2566137-7-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S10 X-Coremail-Antispam: 1UD129KBjvJXoW3Gry8KryxGr4fWrW5Kw18uFg_yoWxJr4fpr ZxWry5Kr40ga4kWr48tF4DZF10y3W0g3yUWrWxGrnYqa4qyrnrKa4YyFyF9Fy8trWkZFWj qF40v34UWrW7Aa7anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQl14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_Cr1j6rxdM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK 6I8E87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4 xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8 JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20V AGYxC7M4IIrI8v6xkF7I0E8cxan2IY04v7MxkF7I0En4kS14v26r1q6r43MxAIw28IcxkI 7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxV Cjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY 6xIIjxv20xvE14v26r1I6r4UMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UMIIF0x vE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv 6xkF7I0E14v26r4UJVWxJrUvcSsGvfC2KfnxnUUI43ZEXa7sRRgAFtUUUUU== X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Simplify ext4_collapse_range() and align its code style with that of ext4_zero_range() and ext4_punch_hole(). Refactor it by: a) renaming variables, b) removing redundant input parameter checks and moving the remaining checks under i_rwsem in preparation for future refactoring, and c) renaming the three stale error tags. Signed-off-by: Zhang Yi --- fs/ext4/extents.c | 103 +++++++++++++++++++++------------------------- 1 file changed, 48 insertions(+), 55 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index f843342e5164..a4e95f3b5f09 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -5295,43 +5295,36 @@ static int ext4_collapse_range(struct file *file, l= off_t offset, loff_t len) struct inode *inode =3D file_inode(file); struct super_block *sb =3D inode->i_sb; struct address_space *mapping =3D inode->i_mapping; - ext4_lblk_t punch_start, punch_stop; + loff_t end =3D offset + len; + ext4_lblk_t start_lblk, end_lblk; handle_t *handle; unsigned int credits; - loff_t new_size, ioffset; + loff_t start, new_size; int ret; =20 - /* - * We need to test this early because xfstests assumes that a - * collapse range of (0, 1) will return EOPNOTSUPP if the file - * system does not support collapse range. - */ - if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) - return -EOPNOTSUPP; + trace_ext4_collapse_range(inode, offset, len); =20 - /* Collapse range works only on fs cluster size aligned regions. */ - if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) - return -EINVAL; + inode_lock(inode); =20 - trace_ext4_collapse_range(inode, offset, len); + /* Currently just for extent based files */ + if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) { + ret =3D -EOPNOTSUPP; + goto out; + } =20 - punch_start =3D offset >> EXT4_BLOCK_SIZE_BITS(sb); - punch_stop =3D (offset + len) >> EXT4_BLOCK_SIZE_BITS(sb); + /* Collapse range works only on fs cluster size aligned regions. */ + if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) { + ret =3D -EINVAL; + goto out; + } =20 - inode_lock(inode); /* * There is no need to overlap collapse range with EOF, in which case * it is effectively a truncate operation */ - if (offset + len >=3D inode->i_size) { + if (end >=3D inode->i_size) { ret =3D -EINVAL; - goto out_mutex; - } - - /* Currently just for extent based files */ - if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) { - ret =3D -EOPNOTSUPP; - goto out_mutex; + goto out; } =20 /* Wait for existing dio to complete */ @@ -5339,7 +5332,7 @@ static int ext4_collapse_range(struct file *file, lof= f_t offset, loff_t len) =20 ret =3D file_modified(file); if (ret) - goto out_mutex; + goto out; =20 /* * Prevent page faults from reinstantiating pages we have released from @@ -5349,55 +5342,52 @@ static int ext4_collapse_range(struct file *file, l= off_t offset, loff_t len) =20 ret =3D ext4_break_layouts(inode); if (ret) - goto out_mmap; + goto out_invalidate_lock; =20 /* + * Write tail of the last page before removed range and data that + * will be shifted since they will get removed from the page cache + * below. We are also protected from pages becoming dirty by + * i_rwsem and invalidate_lock. * Need to round down offset to be aligned with page size boundary * for page size > block size. */ - ioffset =3D round_down(offset, PAGE_SIZE); - /* - * Write tail of the last page before removed range since it will get - * removed from the page cache below. - */ - ret =3D filemap_write_and_wait_range(mapping, ioffset, offset); - if (ret) - goto out_mmap; - /* - * Write data that will be shifted to preserve them when discarding - * page cache below. We are also protected from pages becoming dirty - * by i_rwsem and invalidate_lock. - */ - ret =3D filemap_write_and_wait_range(mapping, offset + len, - LLONG_MAX); + start =3D round_down(offset, PAGE_SIZE); + ret =3D filemap_write_and_wait_range(mapping, start, offset); + if (!ret) + ret =3D filemap_write_and_wait_range(mapping, end, LLONG_MAX); if (ret) - goto out_mmap; - truncate_pagecache(inode, ioffset); + goto out_invalidate_lock; + + truncate_pagecache(inode, start); =20 credits =3D ext4_writepage_trans_blocks(inode); handle =3D ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits); if (IS_ERR(handle)) { ret =3D PTR_ERR(handle); - goto out_mmap; + goto out_invalidate_lock; } ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE, handle); =20 + start_lblk =3D offset >> inode->i_blkbits; + end_lblk =3D (offset + len) >> inode->i_blkbits; + down_write(&EXT4_I(inode)->i_data_sem); ext4_discard_preallocations(inode); - ext4_es_remove_extent(inode, punch_start, EXT_MAX_BLOCKS - punch_start); + ext4_es_remove_extent(inode, start_lblk, EXT_MAX_BLOCKS - start_lblk); =20 - ret =3D ext4_ext_remove_space(inode, punch_start, punch_stop - 1); + ret =3D ext4_ext_remove_space(inode, start_lblk, end_lblk - 1); if (ret) { up_write(&EXT4_I(inode)->i_data_sem); - goto out_stop; + goto out_handle; } ext4_discard_preallocations(inode); =20 - ret =3D ext4_ext_shift_extents(inode, handle, punch_stop, - punch_stop - punch_start, SHIFT_LEFT); + ret =3D ext4_ext_shift_extents(inode, handle, end_lblk, + end_lblk - start_lblk, SHIFT_LEFT); if (ret) { up_write(&EXT4_I(inode)->i_data_sem); - goto out_stop; + goto out_handle; } =20 new_size =3D inode->i_size - len; @@ -5405,16 +5395,19 @@ static int ext4_collapse_range(struct file *file, l= off_t offset, loff_t len) EXT4_I(inode)->i_disksize =3D new_size; =20 up_write(&EXT4_I(inode)->i_data_sem); - if (IS_SYNC(inode)) - ext4_handle_sync(handle); ret =3D ext4_mark_inode_dirty(handle, inode); + if (ret) + goto out_handle; + ext4_update_inode_fsync_trans(handle, inode, 1); + if (IS_SYNC(inode)) + ext4_handle_sync(handle); =20 -out_stop: +out_handle: ext4_journal_stop(handle); -out_mmap: +out_invalidate_lock: filemap_invalidate_unlock(mapping); -out_mutex: +out: inode_unlock(inode); return ret; } --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5EBFB1487D1; Tue, 22 Oct 2024 03:13:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566785; cv=none; b=i9sHGMGsF37YTirdUIx1bCVRkW/swwWH1eyWR4eRgP0szwgiX8CdabHWbjFYXz2v3WrqAqWpW4VWvZ+xUl915z5gFiwGRNE0Q+aaDJ+jQnyV0skXF3ULjZRnztrsp8tFrOHflgAXMpvr5MJMwKQAwFcOK+klVTBnKSx9sGqCJAM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566785; c=relaxed/simple; bh=QSoSH/PS71Cm1qttsP1bNXGF+E8W1iWagKD5oA18EkQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KUR3S9OxcAZVzp125lQEv9JGEyUVNt97irLCk9AeGBWlaM6hvUAPAdX59vuWBT5Y0URhGpWIXrnKyQG4Wn/BMlSslHSHH11lWjj878EE8uxPtQ6v7ddGORAx3Ejl9451gt98MW6J6U0kPI1CArbeZWFATYrlsn15hLIXMLIrLKg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcg73Vqkz4f3lVx; Tue, 22 Oct 2024 11:12:35 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id A5F681A0359; Tue, 22 Oct 2024 11:12:53 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S11; Tue, 22 Oct 2024 11:12:53 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 07/27] ext4: refactor ext4_insert_range() Date: Tue, 22 Oct 2024 19:10:38 +0800 Message-ID: <20241022111059.2566137-8-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S11 X-Coremail-Antispam: 1UD129KBjvJXoW3GryUCF13Wr1UJw45uryDWrg_yoWxAr48pr ZxWry5GrWFqa4v9rW8KF4DZF18K3WkG3y7GryxGrn3Xa4jvr9rK3WYkFyYgFy8KrWkZrWY vF4Fk345Way2ka7anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQl14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_Cr1j6rxdM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK 6I8E87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4 xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8 JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20V AGYxC7M4IIrI8v6xkF7I0E8cxan2IY04v7MxkF7I0En4kS14v26r1q6r43MxAIw28IcxkI 7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxV Cjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY 6xIIjxv20xvE14v26r1I6r4UMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UMIIF0x vE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv 6xkF7I0E14v26r4UJVWxJrUvcSsGvfC2KfnxnUUI43ZEXa7sRRgAFtUUUUU== X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Simplify ext4_insert_range() and align its code style with that of ext4_collapse_range(). Refactor it by: a) renaming variables, b) removing redundant input parameter checks and moving the remaining checks under i_rwsem in preparation for future refactoring, and c) renaming the three stale error tags. Signed-off-by: Zhang Yi --- fs/ext4/extents.c | 101 ++++++++++++++++++++++------------------------ 1 file changed, 48 insertions(+), 53 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index a4e95f3b5f09..4e35c2415e9b 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -5428,45 +5428,37 @@ static int ext4_insert_range(struct file *file, lof= f_t offset, loff_t len) handle_t *handle; struct ext4_ext_path *path; struct ext4_extent *extent; - ext4_lblk_t offset_lblk, len_lblk, ee_start_lblk =3D 0; + ext4_lblk_t start_lblk, len_lblk, ee_start_lblk =3D 0; unsigned int credits, ee_len; - int ret =3D 0, depth, split_flag =3D 0; - loff_t ioffset; - - /* - * We need to test this early because xfstests assumes that an - * insert range of (0, 1) will return EOPNOTSUPP if the file - * system does not support insert range. - */ - if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) - return -EOPNOTSUPP; - - /* Insert range works only on fs cluster size aligned regions. */ - if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) - return -EINVAL; + int ret, depth, split_flag =3D 0; + loff_t start; =20 trace_ext4_insert_range(inode, offset, len); =20 - offset_lblk =3D offset >> EXT4_BLOCK_SIZE_BITS(sb); - len_lblk =3D len >> EXT4_BLOCK_SIZE_BITS(sb); - inode_lock(inode); + /* Currently just for extent based files */ if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) { ret =3D -EOPNOTSUPP; - goto out_mutex; + goto out; } =20 - /* Check whether the maximum file size would be exceeded */ - if (len > inode->i_sb->s_maxbytes - inode->i_size) { - ret =3D -EFBIG; - goto out_mutex; + /* Insert range works only on fs cluster size aligned regions. */ + if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) { + ret =3D -EINVAL; + goto out; } =20 /* Offset must be less than i_size */ if (offset >=3D inode->i_size) { ret =3D -EINVAL; - goto out_mutex; + goto out; + } + + /* Check whether the maximum file size would be exceeded */ + if (len > inode->i_sb->s_maxbytes - inode->i_size) { + ret =3D -EFBIG; + goto out; } =20 /* Wait for existing dio to complete */ @@ -5474,7 +5466,7 @@ static int ext4_insert_range(struct file *file, loff_= t offset, loff_t len) =20 ret =3D file_modified(file); if (ret) - goto out_mutex; + goto out; =20 /* * Prevent page faults from reinstantiating pages we have released from @@ -5484,25 +5476,24 @@ static int ext4_insert_range(struct file *file, lof= f_t offset, loff_t len) =20 ret =3D ext4_break_layouts(inode); if (ret) - goto out_mmap; + goto out_invalidate_lock; =20 /* - * Need to round down to align start offset to page size boundary - * for page size > block size. + * Write out all dirty pages. Need to round down to align start offset + * to page size boundary for page size > block size. */ - ioffset =3D round_down(offset, PAGE_SIZE); - /* Write out all dirty pages */ - ret =3D filemap_write_and_wait_range(inode->i_mapping, ioffset, - LLONG_MAX); + start =3D round_down(offset, PAGE_SIZE); + ret =3D filemap_write_and_wait_range(mapping, start, LLONG_MAX); if (ret) - goto out_mmap; - truncate_pagecache(inode, ioffset); + goto out_invalidate_lock; + + truncate_pagecache(inode, start); =20 credits =3D ext4_writepage_trans_blocks(inode); handle =3D ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits); if (IS_ERR(handle)) { ret =3D PTR_ERR(handle); - goto out_mmap; + goto out_invalidate_lock; } ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE, handle); =20 @@ -5511,16 +5502,19 @@ static int ext4_insert_range(struct file *file, lof= f_t offset, loff_t len) EXT4_I(inode)->i_disksize +=3D len; ret =3D ext4_mark_inode_dirty(handle, inode); if (ret) - goto out_stop; + goto out_handle; + + start_lblk =3D offset >> inode->i_blkbits; + len_lblk =3D len >> inode->i_blkbits; =20 down_write(&EXT4_I(inode)->i_data_sem); ext4_discard_preallocations(inode); =20 - path =3D ext4_find_extent(inode, offset_lblk, NULL, 0); + path =3D ext4_find_extent(inode, start_lblk, NULL, 0); if (IS_ERR(path)) { up_write(&EXT4_I(inode)->i_data_sem); ret =3D PTR_ERR(path); - goto out_stop; + goto out_handle; } =20 depth =3D ext_depth(inode); @@ -5530,16 +5524,16 @@ static int ext4_insert_range(struct file *file, lof= f_t offset, loff_t len) ee_len =3D ext4_ext_get_actual_len(extent); =20 /* - * If offset_lblk is not the starting block of extent, split - * the extent @offset_lblk + * If start_lblk is not the starting block of extent, split + * the extent @start_lblk */ - if ((offset_lblk > ee_start_lblk) && - (offset_lblk < (ee_start_lblk + ee_len))) { + if ((start_lblk > ee_start_lblk) && + (start_lblk < (ee_start_lblk + ee_len))) { if (ext4_ext_is_unwritten(extent)) split_flag =3D EXT4_EXT_MARK_UNWRIT1 | EXT4_EXT_MARK_UNWRIT2; path =3D ext4_split_extent_at(handle, inode, path, - offset_lblk, split_flag, + start_lblk, split_flag, EXT4_EX_NOCACHE | EXT4_GET_BLOCKS_PRE_IO | EXT4_GET_BLOCKS_METADATA_NOFAIL); @@ -5548,31 +5542,32 @@ static int ext4_insert_range(struct file *file, lof= f_t offset, loff_t len) if (IS_ERR(path)) { up_write(&EXT4_I(inode)->i_data_sem); ret =3D PTR_ERR(path); - goto out_stop; + goto out_handle; } } =20 ext4_free_ext_path(path); - ext4_es_remove_extent(inode, offset_lblk, EXT_MAX_BLOCKS - offset_lblk); + ext4_es_remove_extent(inode, start_lblk, EXT_MAX_BLOCKS - start_lblk); =20 /* - * if offset_lblk lies in a hole which is at start of file, use + * if start_lblk lies in a hole which is at start of file, use * ee_start_lblk to shift extents */ ret =3D ext4_ext_shift_extents(inode, handle, - max(ee_start_lblk, offset_lblk), len_lblk, SHIFT_RIGHT); - + max(ee_start_lblk, start_lblk), len_lblk, SHIFT_RIGHT); up_write(&EXT4_I(inode)->i_data_sem); + if (ret) + goto out_handle; + + ext4_update_inode_fsync_trans(handle, inode, 1); if (IS_SYNC(inode)) ext4_handle_sync(handle); - if (ret >=3D 0) - ext4_update_inode_fsync_trans(handle, inode, 1); =20 -out_stop: +out_handle: ext4_journal_stop(handle); -out_mmap: +out_invalidate_lock: filemap_invalidate_unlock(mapping); -out_mutex: +out: inode_unlock(inode); return ret; } --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 98F541487DC; Tue, 22 Oct 2024 03:13:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566784; cv=none; b=f2oahD1vSlMaHn4raz9NCtIPg7lIgEitSZfQu+y42t786gj+kIVm9fthSmxtPx0NlWlGjuudaYj7emUjR+Whmrr/Dg3zrgfw27uIZh1eMZLGVeCSQV8hEupOxlY9NE2fU01fiD2E3RcPXpFM2DW7BgoHWN+O9iJhulshHkh74VY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566784; c=relaxed/simple; bh=5A6JmUMLg4zuQzawlkKnnalaCarkP+KrzfNz2a5uV8s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=IchPzrbzkZptwlz9i9Pv7QtaISg1ps5YM7SChjler2Vzne+N0bX5rveln+tzNGHDrBPqsUIkWS/NqIkASYs3BgQHsiweDTZy7WBlQqh4gmClxR/gTermu6CuOJm28QkzOram1IP8Yqj5G0lWAQDayJeXgOslTXH7pUTH6e3VBsU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcg812Jwz4f3lW6; Tue, 22 Oct 2024 11:12:36 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 502AB1A0568; Tue, 22 Oct 2024 11:12:54 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S12; Tue, 22 Oct 2024 11:12:54 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 08/27] ext4: factor out ext4_do_fallocate() Date: Tue, 22 Oct 2024 19:10:39 +0800 Message-ID: <20241022111059.2566137-9-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S12 X-Coremail-Antispam: 1UD129KBjvJXoWxuFy8JFW7Kr48tw1fJFyrJFb_yoWrKr4fpF Z8JryUGFWxXa4DWrW0qw4UXFn8ta1kKrWUWrWI9rnaq3s0y3sxKF1YkFyFgFWftrW8Ar4j vF4Yyry7CF17A3DanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQl14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_Cr1j6rxdM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK 6I8E87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4 xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8 JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20V AGYxC7M4IIrI8v6xkF7I0E8cxan2IY04v7MxkF7I0En4kS14v26r1q6r43MxAIw28IcxkI 7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxV Cjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY 6xIIjxv20xvE14v26r1I6r4UMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UMIIF0x vE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv 6xkF7I0E14v26r4UJVWxJrUvcSsGvfC2KfnxnUUI43ZEXa7sRRgAFtUUUUU== X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Now the real job of normal fallocate are open coded in ext4_fallocate(), factor out a new helper ext4_do_fallocate() to do the real job, like others functions (e.g. ext4_zero_range()) in ext4_fallocate() do, this can make the code more clear, no functional changes. Signed-off-by: Zhang Yi Reviewed-by: Jan Kara --- fs/ext4/extents.c | 125 ++++++++++++++++++++++------------------------ 1 file changed, 60 insertions(+), 65 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 4e35c2415e9b..2f727104f53d 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -4692,6 +4692,58 @@ static long ext4_zero_range(struct file *file, loff_= t offset, return ret; } =20 +static long ext4_do_fallocate(struct file *file, loff_t offset, + loff_t len, int mode) +{ + struct inode *inode =3D file_inode(file); + loff_t end =3D offset + len; + loff_t new_size =3D 0; + ext4_lblk_t start_lblk, len_lblk; + int ret; + + trace_ext4_fallocate_enter(inode, offset, len, mode); + + start_lblk =3D offset >> inode->i_blkbits; + len_lblk =3D EXT4_MAX_BLOCKS(len, offset, inode->i_blkbits); + + inode_lock(inode); + + /* We only support preallocation for extent-based files only. */ + if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) { + ret =3D -EOPNOTSUPP; + goto out; + } + + if (!(mode & FALLOC_FL_KEEP_SIZE) && + (end > inode->i_size || end > EXT4_I(inode)->i_disksize)) { + new_size =3D end; + ret =3D inode_newsize_ok(inode, new_size); + if (ret) + goto out; + } + + /* Wait all existing dio workers, newcomers will block on i_rwsem */ + inode_dio_wait(inode); + + ret =3D file_modified(file); + if (ret) + goto out; + + ret =3D ext4_alloc_file_blocks(file, start_lblk, len_lblk, new_size, + EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT); + if (ret) + goto out; + + if (file->f_flags & O_SYNC && EXT4_SB(inode->i_sb)->s_journal) { + ret =3D ext4_fc_commit(EXT4_SB(inode->i_sb)->s_journal, + EXT4_I(inode)->i_sync_tid); + } +out: + inode_unlock(inode); + trace_ext4_fallocate_exit(inode, offset, len_lblk, ret); + return ret; +} + /* * preallocate space for a file. This implements ext4's fallocate file * operation, which gets called from sys_fallocate system call. @@ -4702,12 +4754,7 @@ static long ext4_zero_range(struct file *file, loff_= t offset, long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len) { struct inode *inode =3D file_inode(file); - loff_t new_size =3D 0; - unsigned int max_blocks; - int ret =3D 0; - int flags; - ext4_lblk_t lblk; - unsigned int blkbits =3D inode->i_blkbits; + int ret; =20 /* * Encrypted inodes can't handle collapse range or insert @@ -4729,71 +4776,19 @@ long ext4_fallocate(struct file *file, int mode, lo= ff_t offset, loff_t len) ret =3D ext4_convert_inline_data(inode); inode_unlock(inode); if (ret) - goto exit; + return ret; =20 - if (mode & FALLOC_FL_PUNCH_HOLE) { + if (mode & FALLOC_FL_PUNCH_HOLE) ret =3D ext4_punch_hole(file, offset, len); - goto exit; - } - - if (mode & FALLOC_FL_COLLAPSE_RANGE) { + else if (mode & FALLOC_FL_COLLAPSE_RANGE) ret =3D ext4_collapse_range(file, offset, len); - goto exit; - } - - if (mode & FALLOC_FL_INSERT_RANGE) { + else if (mode & FALLOC_FL_INSERT_RANGE) ret =3D ext4_insert_range(file, offset, len); - goto exit; - } - - if (mode & FALLOC_FL_ZERO_RANGE) { + else if (mode & FALLOC_FL_ZERO_RANGE) ret =3D ext4_zero_range(file, offset, len, mode); - goto exit; - } - trace_ext4_fallocate_enter(inode, offset, len, mode); - lblk =3D offset >> blkbits; - - max_blocks =3D EXT4_MAX_BLOCKS(len, offset, blkbits); - flags =3D EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT; - - inode_lock(inode); - - /* - * We only support preallocation for extent-based files only - */ - if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) { - ret =3D -EOPNOTSUPP; - goto out; - } - - if (!(mode & FALLOC_FL_KEEP_SIZE) && - (offset + len > inode->i_size || - offset + len > EXT4_I(inode)->i_disksize)) { - new_size =3D offset + len; - ret =3D inode_newsize_ok(inode, new_size); - if (ret) - goto out; - } - - /* Wait all existing dio workers, newcomers will block on i_rwsem */ - inode_dio_wait(inode); - - ret =3D file_modified(file); - if (ret) - goto out; - - ret =3D ext4_alloc_file_blocks(file, lblk, max_blocks, new_size, flags); - if (ret) - goto out; + else + ret =3D ext4_do_fallocate(file, offset, len, mode); =20 - if (file->f_flags & O_SYNC && EXT4_SB(inode->i_sb)->s_journal) { - ret =3D ext4_fc_commit(EXT4_SB(inode->i_sb)->s_journal, - EXT4_I(inode)->i_sync_tid); - } -out: - inode_unlock(inode); - trace_ext4_fallocate_exit(inode, offset, max_blocks, ret); -exit: return ret; } =20 --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 226BD13DBBE; Tue, 22 Oct 2024 03:12:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566780; cv=none; b=D1ZGPNmajqzd9SmC1CpUTq9/Ct9uQImKA7n3JJOCTrClWv2Q7eMHWDj+JtQkXE4RuhTL8fNqc9Gy4xT1YejuupqRFoPWFy7RXPenN9T3xxjrbNAFy02g7CGSKbJfuEqsUP72wOTw2XPHqIa9uLpCh9Sn9s71lu7cWQCbNM28YgU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566780; c=relaxed/simple; bh=Hs642+YcBh6zGzG/cPCBStH/9nkc21Aq/HGV670zhPA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=GkNPKuH8nnFCMz7FalIc65DFMG1LhlH1D7sCvNyb6KkqTqL5Trm2kF+A7w23QTgwtk+IsZ3vAQAgV29n38AvBnu0OX44L2plwFHayTRczA6OsAuIyEv/6vUd7iiEauxhC6hgp/YWg48v7n+fh9mru2h+gdxjp7fjvTZqE59ssws= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=none smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcgG4jwxz4f3jkm; Tue, 22 Oct 2024 11:12:42 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id E790C1A0359; Tue, 22 Oct 2024 11:12:54 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S13; Tue, 22 Oct 2024 11:12:54 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 09/27] ext4: move out inode_lock into ext4_fallocate() Date: Tue, 22 Oct 2024 19:10:40 +0800 Message-ID: <20241022111059.2566137-10-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S13 X-Coremail-Antispam: 1UD129KBjvJXoW3GrWxtF47uw47tr13XF1rtFb_yoW3Xr4rpr Z8G3y5Jr4rXFykWrWvqa1DZF1jy3Z7KrWUWrW8urnFyasFy34fKF4YyFyF9FWrtrW8ZrWY vF4Utry7CFy7C3DanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQl14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_Cr1j6rxdM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK 6I8E87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4 xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8 JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20V AGYxC7M4IIrI8v6xkF7I0E8cxan2IY04v7MxkF7I0En4kS14v26r1q6r43MxAIw28IcxkI 7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxV Cjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY 6xIIjxv20xvE14v26r1I6r4UMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UMIIF0x vE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv 6xkF7I0E14v26r4UJVWxJrUvcSsGvfC2KfnxnUUI43ZEXa7sRRgAFtUUUUU== X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Currently, all five sub-functions of ext4_fallocate() acquire the inode's i_rwsem at the beginning and release it before exiting. This process can be simplified by factoring out the management of i_rwsem into the ext4_fallocate() function. Signed-off-by: Zhang Yi --- fs/ext4/extents.c | 90 +++++++++++++++-------------------------------- fs/ext4/inode.c | 13 +++---- 2 files changed, 33 insertions(+), 70 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 2f727104f53d..a2db4e85790f 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -4573,23 +4573,18 @@ static long ext4_zero_range(struct file *file, loff= _t offset, int ret, flags, credits; =20 trace_ext4_zero_range(inode, offset, len, mode); + WARN_ON_ONCE(!inode_is_locked(inode)); =20 - inode_lock(inode); - - /* - * Indirect files do not support unwritten extents - */ - if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) { - ret =3D -EOPNOTSUPP; - goto out; - } + /* Indirect files do not support unwritten extents */ + if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) + return -EOPNOTSUPP; =20 if (!(mode & FALLOC_FL_KEEP_SIZE) && (end > inode->i_size || end > EXT4_I(inode)->i_disksize)) { new_size =3D end; ret =3D inode_newsize_ok(inode, new_size); if (ret) - goto out; + return ret; } =20 /* Wait all existing dio workers, newcomers will block on i_rwsem */ @@ -4597,7 +4592,7 @@ static long ext4_zero_range(struct file *file, loff_t= offset, =20 ret =3D file_modified(file); if (ret) - goto out; + return ret; =20 /* * Prevent page faults from reinstantiating pages we have released @@ -4687,8 +4682,6 @@ static long ext4_zero_range(struct file *file, loff_t= offset, ext4_journal_stop(handle); out_invalidate_lock: filemap_invalidate_unlock(mapping); -out: - inode_unlock(inode); return ret; } =20 @@ -4702,12 +4695,11 @@ static long ext4_do_fallocate(struct file *file, lo= ff_t offset, int ret; =20 trace_ext4_fallocate_enter(inode, offset, len, mode); + WARN_ON_ONCE(!inode_is_locked(inode)); =20 start_lblk =3D offset >> inode->i_blkbits; len_lblk =3D EXT4_MAX_BLOCKS(len, offset, inode->i_blkbits); =20 - inode_lock(inode); - /* We only support preallocation for extent-based files only. */ if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) { ret =3D -EOPNOTSUPP; @@ -4739,7 +4731,6 @@ static long ext4_do_fallocate(struct file *file, loff= _t offset, EXT4_I(inode)->i_sync_tid); } out: - inode_unlock(inode); trace_ext4_fallocate_exit(inode, offset, len_lblk, ret); return ret; } @@ -4774,9 +4765,8 @@ long ext4_fallocate(struct file *file, int mode, loff= _t offset, loff_t len) =20 inode_lock(inode); ret =3D ext4_convert_inline_data(inode); - inode_unlock(inode); if (ret) - return ret; + goto out; =20 if (mode & FALLOC_FL_PUNCH_HOLE) ret =3D ext4_punch_hole(file, offset, len); @@ -4788,7 +4778,8 @@ long ext4_fallocate(struct file *file, int mode, loff= _t offset, loff_t len) ret =3D ext4_zero_range(file, offset, len, mode); else ret =3D ext4_do_fallocate(file, offset, len, mode); - +out: + inode_unlock(inode); return ret; } =20 @@ -5298,36 +5289,27 @@ static int ext4_collapse_range(struct file *file, l= off_t offset, loff_t len) int ret; =20 trace_ext4_collapse_range(inode, offset, len); - - inode_lock(inode); + WARN_ON_ONCE(!inode_is_locked(inode)); =20 /* Currently just for extent based files */ - if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) { - ret =3D -EOPNOTSUPP; - goto out; - } - + if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) + return -EOPNOTSUPP; /* Collapse range works only on fs cluster size aligned regions. */ - if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) { - ret =3D -EINVAL; - goto out; - } - + if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) + return -EINVAL; /* * There is no need to overlap collapse range with EOF, in which case * it is effectively a truncate operation */ - if (end >=3D inode->i_size) { - ret =3D -EINVAL; - goto out; - } + if (end >=3D inode->i_size) + return -EINVAL; =20 /* Wait for existing dio to complete */ inode_dio_wait(inode); =20 ret =3D file_modified(file); if (ret) - goto out; + return ret; =20 /* * Prevent page faults from reinstantiating pages we have released from @@ -5402,8 +5384,6 @@ static int ext4_collapse_range(struct file *file, lof= f_t offset, loff_t len) ext4_journal_stop(handle); out_invalidate_lock: filemap_invalidate_unlock(mapping); -out: - inode_unlock(inode); return ret; } =20 @@ -5429,39 +5409,27 @@ static int ext4_insert_range(struct file *file, lof= f_t offset, loff_t len) loff_t start; =20 trace_ext4_insert_range(inode, offset, len); - - inode_lock(inode); + WARN_ON_ONCE(!inode_is_locked(inode)); =20 /* Currently just for extent based files */ - if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) { - ret =3D -EOPNOTSUPP; - goto out; - } - + if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) + return -EOPNOTSUPP; /* Insert range works only on fs cluster size aligned regions. */ - if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) { - ret =3D -EINVAL; - goto out; - } - + if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) + return -EINVAL; /* Offset must be less than i_size */ - if (offset >=3D inode->i_size) { - ret =3D -EINVAL; - goto out; - } - + if (offset >=3D inode->i_size) + return -EINVAL; /* Check whether the maximum file size would be exceeded */ - if (len > inode->i_sb->s_maxbytes - inode->i_size) { - ret =3D -EFBIG; - goto out; - } + if (len > inode->i_sb->s_maxbytes - inode->i_size) + return -EFBIG; =20 /* Wait for existing dio to complete */ inode_dio_wait(inode); =20 ret =3D file_modified(file); if (ret) - goto out; + return ret; =20 /* * Prevent page faults from reinstantiating pages we have released from @@ -5562,8 +5530,6 @@ static int ext4_insert_range(struct file *file, loff_= t offset, loff_t len) ext4_journal_stop(handle); out_invalidate_lock: filemap_invalidate_unlock(mapping); -out: - inode_unlock(inode); return ret; } =20 diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 1d128333bd06..bea19cd6e676 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3962,15 +3962,14 @@ int ext4_punch_hole(struct file *file, loff_t offse= t, loff_t length) unsigned long blocksize =3D i_blocksize(inode); handle_t *handle; unsigned int credits; - int ret =3D 0; + int ret; =20 trace_ext4_punch_hole(inode, offset, length, 0); - - inode_lock(inode); + WARN_ON_ONCE(!inode_is_locked(inode)); =20 /* No need to punch hole beyond i_size */ if (offset >=3D inode->i_size) - goto out; + return 0; =20 /* * If the hole extends beyond i_size, set the hole to end after @@ -3990,7 +3989,7 @@ int ext4_punch_hole(struct file *file, loff_t offset,= loff_t length) if (offset & (blocksize - 1) || end & (blocksize - 1)) { ret =3D ext4_inode_attach_jinode(inode); if (ret < 0) - goto out; + return ret; } =20 /* Wait all existing dio workers, newcomers will block on i_rwsem */ @@ -3998,7 +3997,7 @@ int ext4_punch_hole(struct file *file, loff_t offset,= loff_t length) =20 ret =3D file_modified(file); if (ret) - goto out; + return ret; =20 /* * Prevent page faults from reinstantiating pages we have released from @@ -4082,8 +4081,6 @@ int ext4_punch_hole(struct file *file, loff_t offset,= loff_t length) ext4_journal_stop(handle); out_invalidate_lock: filemap_invalidate_unlock(mapping); -out: - inode_unlock(inode); return ret; } =20 --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B1C31487E9; Tue, 22 Oct 2024 03:13:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566784; cv=none; b=JHuU9eUpzyMq0OfXDwME50+AvNyimXEsVUc+bRLWG6WkN50cn7Lz/4iimgfKtI6hDjEqCpX3mLs12WdwwpsrqDU6+XQkEo96feYjlZO4cbiKrRCg4e76Srp3rVo4xgwIG6jP5kjzSoUrhv58HXqMMKhlc1VaxpSvZxL6QzuWDLw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566784; c=relaxed/simple; bh=3kLoIhWBgWrbTJmEHk0gx9jhaeHoiNhK86cFpt3kCaM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ZXJ6i7o7mKGXUoz6SYxfuHtGyB7N3OFKC+97eyd5Rs8pR6bk1EuDFxVEzjGUCbu9Lu45y0aL8i3Es0T15dQ0E/MTb3iY/hNit9PmwxfOC7NlcXOxGDiHzD+N6sfuhX+XJ8DhXrB7cDXl7BLdIEFk8ZyLr+1lpR0QW/XZS7mj/Jw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=none smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcg9330xz4f3lW6; Tue, 22 Oct 2024 11:12:37 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 943CB1A0568; Tue, 22 Oct 2024 11:12:55 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S14; Tue, 22 Oct 2024 11:12:55 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 10/27] ext4: move out common parts into ext4_fallocate() Date: Tue, 22 Oct 2024 19:10:41 +0800 Message-ID: <20241022111059.2566137-11-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S14 X-Coremail-Antispam: 1UD129KBjvJXoW3Jw1DCF1rGrWfKF4kZry8Grg_yoWftryfpF W5JrW5tFyxWFykWr4rAanrZF13twnFgrWUWrWxu34vvasIywnFka1YkFyFqFW3trW8Zr4j vF4jvr9rGFW7Z3DanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQl14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_Cr1j6rxdM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK 6I8E87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4 xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8 JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20V AGYxC7M4IIrI8v6xkF7I0E8cxan2IY04v7MxkF7I0En4kS14v26r1q6r43MxAIw28IcxkI 7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxV Cjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY 6xIIjxv20xvE14v26r4j6ryUMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UMIIF0x vE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVW8JVWxJwCI42IY6I8E87Iv 6xkF7I0E14v26r4UJVWxJrUvcSsGvfC2KfnxnUUI43ZEXa7sRRgAFtUUUUU== X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Currently, all zeroing ranges, punch holes, collapse ranges, and insert ranges first wait for all existing direct I/O workers to complete, and then they acquire the mapping's invalidate lock before performing the actual work. These common components are nearly identical, so we can simplify the code by factoring them out into the ext4_fallocate(). Signed-off-by: Zhang Yi --- fs/ext4/extents.c | 121 ++++++++++++++++------------------------------ fs/ext4/inode.c | 23 +-------- 2 files changed, 43 insertions(+), 101 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index a2db4e85790f..d5067d5aa449 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -4587,23 +4587,6 @@ static long ext4_zero_range(struct file *file, loff_= t offset, return ret; } =20 - /* Wait all existing dio workers, newcomers will block on i_rwsem */ - inode_dio_wait(inode); - - ret =3D file_modified(file); - if (ret) - return ret; - - /* - * Prevent page faults from reinstantiating pages we have released - * from page cache. - */ - filemap_invalidate_lock(mapping); - - ret =3D ext4_break_layouts(inode); - if (ret) - goto out_invalidate_lock; - /* * For journalled data we need to write (and checkpoint) pages before * discarding page cache to avoid inconsitent data on disk in case of @@ -4616,7 +4599,7 @@ static long ext4_zero_range(struct file *file, loff_t= offset, ext4_truncate_folios_range(inode, offset, end); } if (ret) - goto out_invalidate_lock; + return ret; =20 /* Now release the pages and zero block aligned part of pages */ truncate_pagecache_range(inode, offset, end - 1); @@ -4630,7 +4613,7 @@ static long ext4_zero_range(struct file *file, loff_t= offset, ret =3D ext4_alloc_file_blocks(file, alloc_lblk, len_lblk, new_size, flags); if (ret) - goto out_invalidate_lock; + return ret; } =20 /* Zero range excluding the unaligned edges */ @@ -4643,11 +4626,11 @@ static long ext4_zero_range(struct file *file, loff= _t offset, ret =3D ext4_alloc_file_blocks(file, start_lblk, zero_blks, new_size, flags); if (ret) - goto out_invalidate_lock; + return ret; } /* Finish zeroing out if it doesn't contain partial block */ if (!(offset & (blocksize - 1)) && !(end & (blocksize - 1))) - goto out_invalidate_lock; + return ret; =20 /* * In worst case we have to writeout two nonadjacent unwritten @@ -4660,7 +4643,7 @@ static long ext4_zero_range(struct file *file, loff_t= offset, if (IS_ERR(handle)) { ret =3D PTR_ERR(handle); ext4_std_error(inode->i_sb, ret); - goto out_invalidate_lock; + return ret; } =20 /* Zero out partial block at the edges of the range */ @@ -4680,8 +4663,6 @@ static long ext4_zero_range(struct file *file, loff_t= offset, =20 out_handle: ext4_journal_stop(handle); -out_invalidate_lock: - filemap_invalidate_unlock(mapping); return ret; } =20 @@ -4714,13 +4695,6 @@ static long ext4_do_fallocate(struct file *file, lof= f_t offset, goto out; } =20 - /* Wait all existing dio workers, newcomers will block on i_rwsem */ - inode_dio_wait(inode); - - ret =3D file_modified(file); - if (ret) - goto out; - ret =3D ext4_alloc_file_blocks(file, start_lblk, len_lblk, new_size, EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT); if (ret) @@ -4745,6 +4719,7 @@ static long ext4_do_fallocate(struct file *file, loff= _t offset, long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len) { struct inode *inode =3D file_inode(file); + struct address_space *mapping =3D file->f_mapping; int ret; =20 /* @@ -4768,6 +4743,29 @@ long ext4_fallocate(struct file *file, int mode, lof= f_t offset, loff_t len) if (ret) goto out; =20 + /* Wait all existing dio workers, newcomers will block on i_rwsem */ + inode_dio_wait(inode); + + ret =3D file_modified(file); + if (ret) + return ret; + + if ((mode & FALLOC_FL_MODE_MASK) =3D=3D FALLOC_FL_ALLOCATE_RANGE) { + ret =3D ext4_do_fallocate(file, offset, len, mode); + goto out; + } + + /* + * Follow-up operations will drop page cache, hold invalidate lock + * to prevent page faults from reinstantiating pages we have + * released from page cache. + */ + filemap_invalidate_lock(mapping); + + ret =3D ext4_break_layouts(inode); + if (ret) + goto out_invalidate_lock; + if (mode & FALLOC_FL_PUNCH_HOLE) ret =3D ext4_punch_hole(file, offset, len); else if (mode & FALLOC_FL_COLLAPSE_RANGE) @@ -4777,7 +4775,10 @@ long ext4_fallocate(struct file *file, int mode, lof= f_t offset, loff_t len) else if (mode & FALLOC_FL_ZERO_RANGE) ret =3D ext4_zero_range(file, offset, len, mode); else - ret =3D ext4_do_fallocate(file, offset, len, mode); + ret =3D -EOPNOTSUPP; + +out_invalidate_lock: + filemap_invalidate_unlock(mapping); out: inode_unlock(inode); return ret; @@ -5304,23 +5305,6 @@ static int ext4_collapse_range(struct file *file, lo= ff_t offset, loff_t len) if (end >=3D inode->i_size) return -EINVAL; =20 - /* Wait for existing dio to complete */ - inode_dio_wait(inode); - - ret =3D file_modified(file); - if (ret) - return ret; - - /* - * Prevent page faults from reinstantiating pages we have released from - * page cache. - */ - filemap_invalidate_lock(mapping); - - ret =3D ext4_break_layouts(inode); - if (ret) - goto out_invalidate_lock; - /* * Write tail of the last page before removed range and data that * will be shifted since they will get removed from the page cache @@ -5334,16 +5318,15 @@ static int ext4_collapse_range(struct file *file, l= off_t offset, loff_t len) if (!ret) ret =3D filemap_write_and_wait_range(mapping, end, LLONG_MAX); if (ret) - goto out_invalidate_lock; + return ret; =20 truncate_pagecache(inode, start); =20 credits =3D ext4_writepage_trans_blocks(inode); handle =3D ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits); - if (IS_ERR(handle)) { - ret =3D PTR_ERR(handle); - goto out_invalidate_lock; - } + if (IS_ERR(handle)) + return PTR_ERR(handle); + ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE, handle); =20 start_lblk =3D offset >> inode->i_blkbits; @@ -5382,8 +5365,6 @@ static int ext4_collapse_range(struct file *file, lof= f_t offset, loff_t len) =20 out_handle: ext4_journal_stop(handle); -out_invalidate_lock: - filemap_invalidate_unlock(mapping); return ret; } =20 @@ -5424,23 +5405,6 @@ static int ext4_insert_range(struct file *file, loff= _t offset, loff_t len) if (len > inode->i_sb->s_maxbytes - inode->i_size) return -EFBIG; =20 - /* Wait for existing dio to complete */ - inode_dio_wait(inode); - - ret =3D file_modified(file); - if (ret) - return ret; - - /* - * Prevent page faults from reinstantiating pages we have released from - * page cache. - */ - filemap_invalidate_lock(mapping); - - ret =3D ext4_break_layouts(inode); - if (ret) - goto out_invalidate_lock; - /* * Write out all dirty pages. Need to round down to align start offset * to page size boundary for page size > block size. @@ -5448,16 +5412,15 @@ static int ext4_insert_range(struct file *file, lof= f_t offset, loff_t len) start =3D round_down(offset, PAGE_SIZE); ret =3D filemap_write_and_wait_range(mapping, start, LLONG_MAX); if (ret) - goto out_invalidate_lock; + return ret; =20 truncate_pagecache(inode, start); =20 credits =3D ext4_writepage_trans_blocks(inode); handle =3D ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits); - if (IS_ERR(handle)) { - ret =3D PTR_ERR(handle); - goto out_invalidate_lock; - } + if (IS_ERR(handle)) + return PTR_ERR(handle); + ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE, handle); =20 /* Expand file to avoid data loss if there is error while shifting */ @@ -5528,8 +5491,6 @@ static int ext4_insert_range(struct file *file, loff_= t offset, loff_t len) =20 out_handle: ext4_journal_stop(handle); -out_invalidate_lock: - filemap_invalidate_unlock(mapping); return ret; } =20 diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index bea19cd6e676..1ccf84a64b7b 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3992,23 +3992,6 @@ int ext4_punch_hole(struct file *file, loff_t offset= , loff_t length) return ret; } =20 - /* Wait all existing dio workers, newcomers will block on i_rwsem */ - inode_dio_wait(inode); - - ret =3D file_modified(file); - if (ret) - return ret; - - /* - * Prevent page faults from reinstantiating pages we have released from - * page cache. - */ - filemap_invalidate_lock(mapping); - - ret =3D ext4_break_layouts(inode); - if (ret) - goto out_invalidate_lock; - /* * For journalled data we need to write (and checkpoint) pages * before discarding page cache to avoid inconsitent data on @@ -4021,7 +4004,7 @@ int ext4_punch_hole(struct file *file, loff_t offset,= loff_t length) ext4_truncate_folios_range(inode, offset, end); } if (ret) - goto out_invalidate_lock; + return ret; =20 /* Now release the pages and zero block aligned part of pages*/ truncate_pagecache_range(inode, offset, end - 1); @@ -4034,7 +4017,7 @@ int ext4_punch_hole(struct file *file, loff_t offset,= loff_t length) if (IS_ERR(handle)) { ret =3D PTR_ERR(handle); ext4_std_error(sb, ret); - goto out_invalidate_lock; + return ret; } =20 ret =3D ext4_zero_partial_blocks(handle, inode, offset, length); @@ -4079,8 +4062,6 @@ int ext4_punch_hole(struct file *file, loff_t offset,= loff_t length) ext4_handle_sync(handle); out_handle: ext4_journal_stop(handle); -out_invalidate_lock: - filemap_invalidate_unlock(mapping); return ret; } =20 --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A2AE613E88C; Tue, 22 Oct 2024 03:12:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566781; cv=none; b=l0DSry85iyEVbFY4V3Vh0j5/Rbz58/SMHMT4GT9RfIkThHxfhCmMZOj2llMTuRKxvuDSvlwvodvB8v0Zk3KZxsJEQ6WmI5niBaiSdJ8fOAtH5jk7ToOcSS5YmfT9KLaPmPHhVgTe31oh21+oU/nRYoxLUf3vz9/Ua+OUJQEPtlY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566781; c=relaxed/simple; bh=Q4ZjcXrFDZS2msa9ZwQ+vB5v3/8DlSJ9xp0ij3cEtWA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XeLJ4zNxDivNMVZhAfwptTEU452qFl5thQFkrLax8E1sZ+snEmjFe685Ct0Sh1MueBjGVAbIYR0Gu0Gjqyga6i1aN2n6nHFwygC583ZFZdkqkhL2iRQrD44LoVHox36Dg/dKevFMDHg/ET4UOs18LUXjVLaTJ5c9hwNujB607F0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcgH6bxHz4f3jkk; Tue, 22 Oct 2024 11:12:43 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 344111A058E; Tue, 22 Oct 2024 11:12:56 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S15; Tue, 22 Oct 2024 11:12:55 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 11/27] ext4: use reserved metadata blocks when splitting extent on endio Date: Tue, 22 Oct 2024 19:10:42 +0800 Message-ID: <20241022111059.2566137-12-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S15 X-Coremail-Antispam: 1UD129KBjvJXoW7Zw15Ary3ZF4fKF4UGrW8Xrb_yoW8ur1DpF yay3W8Gr40vw1Y9F92ya1UAr12k3W5GF47urZ8tayUuay7Ar1rXF12k34F9FyYvrWxJa1Y vr409w1UZwn5uaDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQl14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_Cr1j6rxdM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK 6I8E87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4 xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8 JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20V AGYxC7M4IIrI8v6xkF7I0E8cxan2IY04v7MxkF7I0En4kS14v26r1q6r43MxAIw28IcxkI 7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxV Cjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY 6xIIjxv20xvE14v26r4j6ryUMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UMIIF0x vE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVW8JVWxJwCI42IY6I8E87Iv 6xkF7I0E14v26r4UJVWxJrUvcSsGvfC2KfnxnUUI43ZEXa7sRRgAFtUUUUU== X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi When performing buffered writes, we may need to split and convert an unwritten extent into a written one during the end I/O process. However, we do not reserve space specifically for these metadata changes, we only reserve 2% of space or 4096 blocks. To address this, we use EXT4_GET_BLOCKS_PRE_IO to potentially split extents in advance and EXT4_GET_BLOCKS_METADATA_NOFAIL to utilize reserved space if necessary. These two approaches can reduce the likelihood of running out of space and losing data. However, these methods are merely best efforts, we could still run out of space, and there is not much difference between converting an extent during the writeback process and the end I/O process, it won't increase the rick of losing data if we postpone the conversion. Therefore, also use EXT4_GET_BLOCKS_METADATA_NOFAIL in ext4_convert_unwritten_extents_endio() to prepare for the buffered I/O iomap conversion, which may perform extent conversion during the end I/O process. Signed-off-by: Zhang Yi --- fs/ext4/extents.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index d5067d5aa449..33bc2cc5aff4 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3767,6 +3767,8 @@ ext4_convert_unwritten_extents_endio(handle_t *handle= , struct inode *inode, * illegal. */ if (ee_block !=3D map->m_lblk || ee_len > map->m_len) { + int flags =3D EXT4_GET_BLOCKS_CONVERT | + EXT4_GET_BLOCKS_METADATA_NOFAIL; #ifdef CONFIG_EXT4_DEBUG ext4_warning(inode->i_sb, "Inode (%ld) finished: extent logical block %l= lu," " len %u; IO logical block %llu, len %u", @@ -3774,7 +3776,7 @@ ext4_convert_unwritten_extents_endio(handle_t *handle= , struct inode *inode, (unsigned long long)map->m_lblk, map->m_len); #endif path =3D ext4_split_convert_extents(handle, inode, map, path, - EXT4_GET_BLOCKS_CONVERT, NULL); + flags, NULL); if (IS_ERR(path)) return path; =20 --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F2135142E78; Tue, 22 Oct 2024 03:12:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566785; cv=none; b=SONKffdL0qFhKUyjqwZi2C1EwvtRyzq6By6+oG2EQG5RV237UF72bpkuG1OdjN9qnfjuYNS1uzo9ov+nKW8a+sbHtmsFPw6pG2KiMwXXiL1w/Opn5tRiAyIWtQvYV6abVVFgNC/DV2GVRHPb8Fd2KNTLX/4MwLbLATygJ+8Uh+Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566785; c=relaxed/simple; bh=wA/+byRUE9HEVfAVPo0uMAiN7r+pgEmHZjo59/JZv7c=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=c9wdkTPijzBpGuONDJL2IK4oF9uqsdzH9Dt35ucD7XZcmu479iYM9SQ9hvdwEh7HcfmD8hW67oApdPZJTbX9FZzXabQ3tuK0AkUeXPSyJjYohXEXlP8RJ6Au+MhCrMseH7Ej39TK1/mvXIw/73XEuljRusOus/AE+j6vHRQeWys= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4XXcgC3PVqz4f3jY2; Tue, 22 Oct 2024 11:12:39 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id D0FE91A018D; Tue, 22 Oct 2024 11:12:56 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S16; Tue, 22 Oct 2024 11:12:56 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 12/27] ext4: introduce seq counter for the extent status entry Date: Tue, 22 Oct 2024 19:10:43 +0800 Message-ID: <20241022111059.2566137-13-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S16 X-Coremail-Antispam: 1UD129KBjvJXoWxtrW8WF4fuF1UCFWkuF15urg_yoW3GFWxpa 9rAr15GrWkXw4q93WxZw4rWr43Wa48CrW7Gr9IgFWFvFW8tr1DKF1UtF1jvF98tFW0yrnr XFWFy34DA3WUWa7anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQl14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_Cr1j6rxdM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK 6I8E87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4 xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8 JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20V AGYxC7M4IIrI8v6xkF7I0E8cxan2IY04v7MxkF7I0En4kS14v26r1q6r43MxAIw28IcxkI 7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxV Cjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY 6xIIjxv20xvE14v26r4j6ryUMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UMIIF0x vE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVW8JVWxJwCI42IY6I8E87Iv 6xkF7I0E14v26r4UJVWxJrUvcSsGvfC2KfnxnUUI43ZEXa7sRRgAFtUUUUU== X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi In the iomap_write_iter(), the iomap buffered write frame does not hold any locks between querying the inode extent mapping info and performing page cache writes. As a result, the extent mapping can be changed due to concurrent I/O in flight. Similarly, in the iomap_writepage_map(), the write-back process faces a similar problem: concurrent changes can invalidate the extent mapping before the I/O is submitted. Therefore, both of these processes must recheck the mapping info after acquiring the folio lock. To address this, similar to XFS, we propose introducing an extent sequence number to serve as a validity cookie for the extent. We will increment this number whenever the extent status tree changes, thereby preparing for the buffered write iomap conversion. Besides, it also changes the trace code style to make checkpatch.pl happy. Signed-off-by: Zhang Yi --- fs/ext4/ext4.h | 1 + fs/ext4/extents_status.c | 13 ++++++++- fs/ext4/super.c | 1 + include/trace/events/ext4.h | 57 +++++++++++++++++++++---------------- 4 files changed, 46 insertions(+), 26 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 6d0267afd4c1..44f6867d3037 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1123,6 +1123,7 @@ struct ext4_inode_info { ext4_lblk_t i_es_shrink_lblk; /* Offset where we start searching for extents to shrink. Protected by i_es_lock */ + unsigned int i_es_seq; /* Change counter for extents */ =20 /* ialloc */ ext4_group_t i_last_alloc_group; diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c index c786691dabd3..bea4f87db502 100644 --- a/fs/ext4/extents_status.c +++ b/fs/ext4/extents_status.c @@ -204,6 +204,13 @@ static inline ext4_lblk_t ext4_es_end(struct extent_st= atus *es) return es->es_lblk + es->es_len - 1; } =20 +static inline void ext4_es_inc_seq(struct inode *inode) +{ + struct ext4_inode_info *ei =3D EXT4_I(inode); + + WRITE_ONCE(ei->i_es_seq, READ_ONCE(ei->i_es_seq) + 1); +} + /* * search through the tree for an delayed extent with a given offset. If * it can't be found, try to find next extent. @@ -872,6 +879,7 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lb= lk_t lblk, BUG_ON(end < lblk); WARN_ON_ONCE(status & EXTENT_STATUS_DELAYED); =20 + ext4_es_inc_seq(inode); newes.es_lblk =3D lblk; newes.es_len =3D len; ext4_es_store_pblock_status(&newes, pblk, status); @@ -1519,13 +1527,15 @@ void ext4_es_remove_extent(struct inode *inode, ext= 4_lblk_t lblk, if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) return; =20 - trace_ext4_es_remove_extent(inode, lblk, len); es_debug("remove [%u/%u) from extent status tree of inode %lu\n", lblk, len, inode->i_ino); =20 if (!len) return; =20 + ext4_es_inc_seq(inode); + trace_ext4_es_remove_extent(inode, lblk, len); + end =3D lblk + len - 1; BUG_ON(end < lblk); =20 @@ -2107,6 +2117,7 @@ void ext4_es_insert_delayed_extent(struct inode *inod= e, ext4_lblk_t lblk, WARN_ON_ONCE((EXT4_B2C(sbi, lblk) =3D=3D EXT4_B2C(sbi, end)) && end_allocated); =20 + ext4_es_inc_seq(inode); newes.es_lblk =3D lblk; newes.es_len =3D len; ext4_es_store_pblock_status(&newes, ~0, EXTENT_STATUS_DELAYED); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 16a4ce704460..a01e0bbe57c8 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1409,6 +1409,7 @@ static struct inode *ext4_alloc_inode(struct super_bl= ock *sb) ei->i_es_all_nr =3D 0; ei->i_es_shk_nr =3D 0; ei->i_es_shrink_lblk =3D 0; + ei->i_es_seq =3D 0; ei->i_reserved_data_blocks =3D 0; spin_lock_init(&(ei->i_block_reservation_lock)); ext4_init_pending_tree(&ei->i_pending_tree); diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index 156908641e68..6f2bf9035216 100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -2176,12 +2176,13 @@ DECLARE_EVENT_CLASS(ext4__es_extent, TP_ARGS(inode, es), =20 TP_STRUCT__entry( - __field( dev_t, dev ) - __field( ino_t, ino ) - __field( ext4_lblk_t, lblk ) - __field( ext4_lblk_t, len ) - __field( ext4_fsblk_t, pblk ) - __field( char, status ) + __field(dev_t, dev) + __field(ino_t, ino) + __field(ext4_lblk_t, lblk) + __field(ext4_lblk_t, len) + __field(ext4_fsblk_t, pblk) + __field(char, status) + __field(unsigned int, seq) ), =20 TP_fast_assign( @@ -2191,13 +2192,15 @@ DECLARE_EVENT_CLASS(ext4__es_extent, __entry->len =3D es->es_len; __entry->pblk =3D ext4_es_show_pblock(es); __entry->status =3D ext4_es_status(es); + __entry->seq =3D EXT4_I(inode)->i_es_seq; ), =20 - TP_printk("dev %d,%d ino %lu es [%u/%u) mapped %llu status %s", + TP_printk("dev %d,%d ino %lu es [%u/%u) mapped %llu status %s seq %u", MAJOR(__entry->dev), MINOR(__entry->dev), (unsigned long) __entry->ino, __entry->lblk, __entry->len, - __entry->pblk, show_extent_status(__entry->status)) + __entry->pblk, show_extent_status(__entry->status), + __entry->seq) ); =20 DEFINE_EVENT(ext4__es_extent, ext4_es_insert_extent, @@ -2218,10 +2221,11 @@ TRACE_EVENT(ext4_es_remove_extent, TP_ARGS(inode, lblk, len), =20 TP_STRUCT__entry( - __field( dev_t, dev ) - __field( ino_t, ino ) - __field( loff_t, lblk ) - __field( loff_t, len ) + __field(dev_t, dev) + __field(ino_t, ino) + __field(loff_t, lblk) + __field(loff_t, len) + __field(unsigned int, seq) ), =20 TP_fast_assign( @@ -2229,12 +2233,13 @@ TRACE_EVENT(ext4_es_remove_extent, __entry->ino =3D inode->i_ino; __entry->lblk =3D lblk; __entry->len =3D len; + __entry->seq =3D EXT4_I(inode)->i_es_seq; ), =20 - TP_printk("dev %d,%d ino %lu es [%lld/%lld)", + TP_printk("dev %d,%d ino %lu es [%lld/%lld) seq %u", MAJOR(__entry->dev), MINOR(__entry->dev), (unsigned long) __entry->ino, - __entry->lblk, __entry->len) + __entry->lblk, __entry->len, __entry->seq) ); =20 TRACE_EVENT(ext4_es_find_extent_range_enter, @@ -2486,14 +2491,15 @@ TRACE_EVENT(ext4_es_insert_delayed_extent, TP_ARGS(inode, es, lclu_allocated, end_allocated), =20 TP_STRUCT__entry( - __field( dev_t, dev ) - __field( ino_t, ino ) - __field( ext4_lblk_t, lblk ) - __field( ext4_lblk_t, len ) - __field( ext4_fsblk_t, pblk ) - __field( char, status ) - __field( bool, lclu_allocated ) - __field( bool, end_allocated ) + __field(dev_t, dev) + __field(ino_t, ino) + __field(ext4_lblk_t, lblk) + __field(ext4_lblk_t, len) + __field(ext4_fsblk_t, pblk) + __field(char, status) + __field(bool, lclu_allocated) + __field(bool, end_allocated) + __field(unsigned int, seq) ), =20 TP_fast_assign( @@ -2505,15 +2511,16 @@ TRACE_EVENT(ext4_es_insert_delayed_extent, __entry->status =3D ext4_es_status(es); __entry->lclu_allocated =3D lclu_allocated; __entry->end_allocated =3D end_allocated; + __entry->seq =3D EXT4_I(inode)->i_es_seq; ), =20 - TP_printk("dev %d,%d ino %lu es [%u/%u) mapped %llu status %s " - "allocated %d %d", + TP_printk("dev %d,%d ino %lu es [%u/%u) mapped %llu status %s allocated %= d %d seq %u", MAJOR(__entry->dev), MINOR(__entry->dev), (unsigned long) __entry->ino, __entry->lblk, __entry->len, __entry->pblk, show_extent_status(__entry->status), - __entry->lclu_allocated, __entry->end_allocated) + __entry->lclu_allocated, __entry->end_allocated, + __entry->seq) ); =20 /* fsmap traces */ --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 364EE1494CF; Tue, 22 Oct 2024 03:13:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566784; cv=none; b=O7ShU/5f0jlEC7SX5YIvy3cV50ideJgpA80vJtQjT9gjJBrzE9r7GAIo49QIPKtTqZvRIIA1qqSZla2F3dUtzbGs8yIu4qiXejePI3s5UhJ9IJFR0t4FPPzPo8toC6aQe6T8FUmaXJXOx5U9lCm9qla8pQOqQJXtdZmP7JERFiw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566784; c=relaxed/simple; bh=L2PnKFl1ZCOWQSi7+wvxH9mUeWEOTVmtXzjuGxJ5G7M=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=PnvmN/uxyz9N5tqq7VO1g7BuVW6Oe88UvlG3GmNO/t71z2vmL6QgMByOMi7fmBmE1XaUOCNk/lj5g8xvFauGEyGhM5uSRSi+c0ocgntwT0qDBiERWnZgjDRJ62ghxtU0fj9w+ngWntKCCM7OPpk41cpTYFugB2+1sHje9iNCmvs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcgC27Rlz4f3lW5; Tue, 22 Oct 2024 11:12:39 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 77EE71A018D; Tue, 22 Oct 2024 11:12:57 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S17; Tue, 22 Oct 2024 11:12:57 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 13/27] ext4: add a new iomap aops for regular file's buffered IO path Date: Tue, 22 Oct 2024 19:10:44 +0800 Message-ID: <20241022111059.2566137-14-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S17 X-Coremail-Antispam: 1UD129KBjvJXoWxAFy8XFW5AF1kWFW5JFW5KFg_yoW5ZrykpF 98Kas3GF18Zr9rua1fXa9rAr4Yya4fJa1UKFW3G3Wa9r98GrW7KFWqka4jkFy5t3ykJr1I qr4j9ry7GF17CrDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQl14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_Cr1j6rxdM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK 6I8E87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4 xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8 JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20V AGYxC7M4IIrI8v6xkF7I0E8cxan2IY04v7MxkF7I0En4kS14v26r1q6r43MxAIw28IcxkI 7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxV Cjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY 6xIIjxv20xvE14v26r4j6ryUMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UMIIF0x vE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVW8JVWxJwCI42IY6I8E87Iv 6xkF7I0E14v26r4UJVWxJrUvcSsGvfC2KfnxnUUI43ZEXa7sRRgAFtUUUUU== X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi This patch starts support for iomap in the buffered I/O path of ext4 regular files. First, it introduces a new iomap address space operation, ext4_iomap_aops. Additionally, it adds an inode state flag, EXT4_STATE_BUFFERED_IOMAP, which indicates that the inode uses the iomap path instead of the original buffer_head path for buffered I/O. Most callbacks of ext4_iomap_aops can directly utilize generic implementations, the remaining functions .read_folio(), .readahead(), and .writepages() will be implemented in later patches. Signed-off-by: Zhang Yi --- fs/ext4/ext4.h | 1 + fs/ext4/inode.c | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 33 insertions(+) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 44f6867d3037..ee170196bfff 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1916,6 +1916,7 @@ enum { EXT4_STATE_VERITY_IN_PROGRESS, /* building fs-verity Merkle tree */ EXT4_STATE_FC_COMMITTING, /* Fast commit ongoing */ EXT4_STATE_ORPHAN_FILE, /* Inode orphaned in orphan file */ + EXT4_STATE_BUFFERED_IOMAP, /* Inode use iomap for buffered IO */ }; =20 #define EXT4_INODE_BIT_FNS(name, field, offset) \ diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 1ccf84a64b7b..b233f36efefa 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3526,6 +3526,22 @@ const struct iomap_ops ext4_iomap_report_ops =3D { .iomap_begin =3D ext4_iomap_begin_report, }; =20 +static int ext4_iomap_read_folio(struct file *file, struct folio *folio) +{ + return 0; +} + +static void ext4_iomap_readahead(struct readahead_control *rac) +{ + +} + +static int ext4_iomap_writepages(struct address_space *mapping, + struct writeback_control *wbc) +{ + return 0; +} + /* * For data=3Djournal mode, folio should be marked dirty only when it was * writeably mapped. When that happens, it was already attached to the @@ -3612,6 +3628,20 @@ static const struct address_space_operations ext4_da= _aops =3D { .swap_activate =3D ext4_iomap_swap_activate, }; =20 +static const struct address_space_operations ext4_iomap_aops =3D { + .read_folio =3D ext4_iomap_read_folio, + .readahead =3D ext4_iomap_readahead, + .writepages =3D ext4_iomap_writepages, + .dirty_folio =3D iomap_dirty_folio, + .bmap =3D ext4_bmap, + .invalidate_folio =3D iomap_invalidate_folio, + .release_folio =3D iomap_release_folio, + .migrate_folio =3D filemap_migrate_folio, + .is_partially_uptodate =3D iomap_is_partially_uptodate, + .error_remove_folio =3D generic_error_remove_folio, + .swap_activate =3D ext4_iomap_swap_activate, +}; + static const struct address_space_operations ext4_dax_aops =3D { .writepages =3D ext4_dax_writepages, .dirty_folio =3D noop_dirty_folio, @@ -3633,6 +3663,8 @@ void ext4_set_aops(struct inode *inode) } if (IS_DAX(inode)) inode->i_mapping->a_ops =3D &ext4_dax_aops; + else if (ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP)) + inode->i_mapping->a_ops =3D &ext4_iomap_aops; else if (test_opt(inode->i_sb, DELALLOC)) inode->i_mapping->a_ops =3D &ext4_da_aops; else --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F09351474BF; Tue, 22 Oct 2024 03:13:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566783; cv=none; b=epPpTKZe+sNEsb6iJtOXxNHKKlQR+Y16fEp/bvTQCkZB9oW6BebaNbypqr/gqVZWKpV76jkWiI0OOH34IyleTYXFWSv35HtbORtbszUZkES+xHFFToHGnP4hRKLrBjmPXpE2pZsRQFEaCE2qNgO+5+iCRIzy3jXHgbZB8ZbIt+w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566783; c=relaxed/simple; bh=mfpbSaLpTZWPmr7Qv9kGMuWgxyR1PGeQlGnSV9/uK1M=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TYhS0NATSfBhQ4zAs9wpB6ZbA7mya340rL5HIai0logD0bomUbUWtLc7vDPn/+uKGEbGdQYvXaq4kdzScU4wQcrU2/u1q5HpbXBsaDMtREZ+CFY51SBLY5Me8wq70gnn/YvXIVpePjfmOFRPyk+jnE3EBNFoVFEpqd2jikB+XGI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=none smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcgK66cyz4f3jkk; Tue, 22 Oct 2024 11:12:45 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 227051A018D; Tue, 22 Oct 2024 11:12:58 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S18; Tue, 22 Oct 2024 11:12:57 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 14/27] ext4: implement buffered read iomap path Date: Tue, 22 Oct 2024 19:10:45 +0800 Message-ID: <20241022111059.2566137-15-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S18 X-Coremail-Antispam: 1UD129KBjvJXoW7CFyDXF1UXrW3Zw4UurWkXrb_yoW8KrWrpF 98KFy5GF47XrnI9a1SgFZrJr1Fk3WxtF45ZrWfWasxuFyYkrW2gay0gFyYvF1Yq3yxAr10 qr4jkr1xWF1UArDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQl14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_Cr1j6rxdM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK 6I8E87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4 xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8 JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20V AGYxC7M4IIrI8v6xkF7I0E8cxan2IY04v7MxkF7I0En4kS14v26r1q6r43MxAIw28IcxkI 7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxV Cjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY 6xIIjxv20xvE14v26r4j6ryUMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UMIIF0x vE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVW8JVWxJwCI42IY6I8E87Iv 6xkF7I0E14v26r4UJVWxJrUvcSsGvfC2KfnxnUUI43ZEXa7sRRgAFtUUUUU== X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Introduce a new iomap_ops, ext4_iomap_buffered_read_ops to implement the iomap read paths, specifically .read_folio() and .readahead() of ext4_iomap_aops. This .iomap_begin() handle invokes ext4_map_blocks() to query the extent mapping status of the read range and then converts the mapping information to iomap. Signed-off-by: Zhang Yi --- fs/ext4/inode.c | 37 +++++++++++++++++++++++++++++++++++-- 1 file changed, 35 insertions(+), 2 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index b233f36efefa..f0bc4b58ac4f 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3526,14 +3526,47 @@ const struct iomap_ops ext4_iomap_report_ops =3D { .iomap_begin =3D ext4_iomap_begin_report, }; =20 -static int ext4_iomap_read_folio(struct file *file, struct folio *folio) +static int ext4_iomap_buffered_read_begin(struct inode *inode, loff_t offs= et, + loff_t length, unsigned int flags, struct iomap *iomap, + struct iomap *srcmap) { + int ret; + struct ext4_map_blocks map; + u8 blkbits =3D inode->i_blkbits; + + if (unlikely(ext4_forced_shutdown(inode->i_sb))) + return -EIO; + if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) + return -EINVAL; + /* Inline data support is not yet available. */ + if (WARN_ON_ONCE(ext4_has_inline_data(inode))) + return -ERANGE; + + /* Calculate the first and last logical blocks respectively. */ + map.m_lblk =3D offset >> blkbits; + map.m_len =3D min_t(loff_t, (offset + length - 1) >> blkbits, + EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; + + ret =3D ext4_map_blocks(NULL, inode, &map, 0); + if (ret < 0) + return ret; + + ext4_set_iomap(inode, iomap, &map, offset, length, flags); return 0; } =20 -static void ext4_iomap_readahead(struct readahead_control *rac) +const struct iomap_ops ext4_iomap_buffered_read_ops =3D { + .iomap_begin =3D ext4_iomap_buffered_read_begin, +}; + +static int ext4_iomap_read_folio(struct file *file, struct folio *folio) { + return iomap_read_folio(folio, &ext4_iomap_buffered_read_ops); +} =20 +static void ext4_iomap_readahead(struct readahead_control *rac) +{ + iomap_readahead(rac, &ext4_iomap_buffered_read_ops); } =20 static int ext4_iomap_writepages(struct address_space *mapping, --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B324A148850; Tue, 22 Oct 2024 03:13:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566785; cv=none; b=LlSx4piQqUSyfPPLHT9/HpnUc+76IG9ChlQr2EY7NhvlLEQwcJCDFPa0y+LKJUsQMotUtYpkuvEB8FPkI08EAI4QjfWIXs9qFHN0sjQyN2ZdF4M3rO830ZF466+eKx8iC8jhoro+fyL9ZReLHcAhS/oTMsxOKmg5vrwm4JGQnaM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566785; c=relaxed/simple; bh=mvYROxnhlgk2x4bYLtyr7uhZYNNSevG/8KgsQUSX+BU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ShlEKI1V48EycMcOIqEVPtgmCJV5ZsajI52AznSHcohfToHiFl6LDzEpLiYmrr2NRHOt5x5lSJaSORAKfxsPZoSxKAsvYdEEpdp26PzsZv6FRWbJFytfrJyYrIhRqRSw9vCR0Xp2MbRwLl4HbE+g507l5SbKrmEWYbYyYsZJj80= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.216]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4XXcgF31JNz4f3jXP; Tue, 22 Oct 2024 11:12:41 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id BF4F01A018C; Tue, 22 Oct 2024 11:12:58 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S19; Tue, 22 Oct 2024 11:12:58 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 15/27] ext4: implement buffered write iomap path Date: Tue, 22 Oct 2024 19:10:46 +0800 Message-ID: <20241022111059.2566137-16-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S19 X-Coremail-Antispam: 1UD129KBjvJXoW3ZFy8Zr1fJw1kGFy8Ar4rGrg_yoWDZF4kpF Z0kry5GF47Xr97uF4ftF47Zr1Fk3Wxtr4UCrW3Wrn8Xr9IyryIqF409FyayF15t3yxCr4j qF4Ykry8Wr4UCrDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQl14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_Cr1j6rxdM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK 6I8E87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4 xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8 JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20V AGYxC7M4IIrI8v6xkF7I0E8cxan2IY04v7MxkF7I0En4kS14v26r1q6r43MxAIw28IcxkI 7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxV Cjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY 6xIIjxv20xvE14v26r4j6ryUMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UMIIF0x vE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVW8JVWxJwCI42IY6I8E87Iv 6xkF7I0E14v26r4UJVWxJrUvcSsGvfC2KfnxnUUI43ZEXa7sRRgAFtUUUUU== X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Introduce two new iomap_ops: ext4_iomap_buffered_write_ops and ext4_iomap_buffered_da_write_ops to implement the iomap write path. These operations invoke ext4_da_map_blocks() to map delayed allocation extents and introduce ext4_iomap_get_blocks() to directly allocate blocks in non-delayed allocation mode. Additionally, implement ext4_iomap_valid() to check the validity of extent mapping. There are two key differences between the buffer_head write path and the iomap write path: 1) In the iomap write path, we always allocate unwritten extents for new blocks, which means we consistently enable dioread_nolock. Therefore, we do not need to truncate blocks for short writes and write failure. 2) The iomap write frame maps multi-blocks in the ->iomap_begin() function, so we must remove the stale delayed allocation range from the short writes and write failure. Otherwise, this could result in a range of delayed extents being covered by a clean folio, leading to inaccurate space reservation. Signed-off-by: Zhang Yi --- fs/ext4/ext4.h | 3 + fs/ext4/file.c | 19 +++++- fs/ext4/inode.c | 155 +++++++++++++++++++++++++++++++++++++++++++++--- 3 files changed, 169 insertions(+), 8 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index ee170196bfff..a09f96ef17d8 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2985,6 +2985,7 @@ int ext4_walk_page_buffers(handle_t *handle, struct buffer_head *bh)); int do_journal_get_write_access(handle_t *handle, struct inode *inode, struct buffer_head *bh); +int ext4_nonda_switch(struct super_block *sb); #define FALL_BACK_TO_NONDELALLOC 1 #define CONVERT_INLINE_DATA 2 =20 @@ -3845,6 +3846,8 @@ static inline void ext4_clear_io_unwritten_flag(ext4_= io_end_t *io_end) extern const struct iomap_ops ext4_iomap_ops; extern const struct iomap_ops ext4_iomap_overwrite_ops; extern const struct iomap_ops ext4_iomap_report_ops; +extern const struct iomap_ops ext4_iomap_buffered_write_ops; +extern const struct iomap_ops ext4_iomap_buffered_da_write_ops; =20 static inline int ext4_buffer_uptodate(struct buffer_head *bh) { diff --git a/fs/ext4/file.c b/fs/ext4/file.c index f14aed14b9cf..92471865b4e5 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -282,6 +282,20 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, s= truct iov_iter *from) return count; } =20 +static ssize_t ext4_iomap_buffered_write(struct kiocb *iocb, + struct iov_iter *from) +{ + struct inode *inode =3D file_inode(iocb->ki_filp); + const struct iomap_ops *iomap_ops; + + if (test_opt(inode->i_sb, DELALLOC) && !ext4_nonda_switch(inode->i_sb)) + iomap_ops =3D &ext4_iomap_buffered_da_write_ops; + else + iomap_ops =3D &ext4_iomap_buffered_write_ops; + + return iomap_file_buffered_write(iocb, from, iomap_ops, NULL); +} + static ssize_t ext4_buffered_write_iter(struct kiocb *iocb, struct iov_iter *from) { @@ -296,7 +310,10 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *= iocb, if (ret <=3D 0) goto out; =20 - ret =3D generic_perform_write(iocb, from); + if (ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP)) + ret =3D ext4_iomap_buffered_write(iocb, from); + else + ret =3D generic_perform_write(iocb, from); =20 out: inode_unlock(inode); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index f0bc4b58ac4f..23cbcaab0a56 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2862,7 +2862,7 @@ static int ext4_dax_writepages(struct address_space *= mapping, return ret; } =20 -static int ext4_nonda_switch(struct super_block *sb) +int ext4_nonda_switch(struct super_block *sb) { s64 free_clusters, dirty_clusters; struct ext4_sb_info *sbi =3D EXT4_SB(sb); @@ -3257,6 +3257,15 @@ static bool ext4_inode_datasync_dirty(struct inode *= inode) return inode->i_state & I_DIRTY_DATASYNC; } =20 +static bool ext4_iomap_valid(struct inode *inode, const struct iomap *ioma= p) +{ + return iomap->validity_cookie =3D=3D READ_ONCE(EXT4_I(inode)->i_es_seq); +} + +static const struct iomap_folio_ops ext4_iomap_folio_ops =3D { + .iomap_valid =3D ext4_iomap_valid, +}; + static void ext4_set_iomap(struct inode *inode, struct iomap *iomap, struct ext4_map_blocks *map, loff_t offset, loff_t length, unsigned int flags) @@ -3287,6 +3296,9 @@ static void ext4_set_iomap(struct inode *inode, struc= t iomap *iomap, !ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) iomap->flags |=3D IOMAP_F_MERGED; =20 + iomap->validity_cookie =3D READ_ONCE(EXT4_I(inode)->i_es_seq); + iomap->folio_ops =3D &ext4_iomap_folio_ops; + /* * Flags passed to ext4_map_blocks() for direct I/O writes can result * in m_flags having both EXT4_MAP_MAPPED and EXT4_MAP_UNWRITTEN bits @@ -3526,11 +3538,57 @@ const struct iomap_ops ext4_iomap_report_ops =3D { .iomap_begin =3D ext4_iomap_begin_report, }; =20 -static int ext4_iomap_buffered_read_begin(struct inode *inode, loff_t offs= et, - loff_t length, unsigned int flags, struct iomap *iomap, - struct iomap *srcmap) +static int ext4_iomap_get_blocks(struct inode *inode, + struct ext4_map_blocks *map) { - int ret; + loff_t i_size =3D i_size_read(inode); + handle_t *handle; + int ret, needed_blocks; + + /* + * Check if the blocks have already been allocated, this could + * avoid initiating a new journal transaction and return the + * mapping information directly. + */ + if ((map->m_lblk + map->m_len) <=3D + round_up(i_size, i_blocksize(inode)) >> inode->i_blkbits) { + ret =3D ext4_map_blocks(NULL, inode, map, 0); + if (ret < 0) + return ret; + if (map->m_flags & (EXT4_MAP_MAPPED | EXT4_MAP_UNWRITTEN | + EXT4_MAP_DELAYED)) + return 0; + } + + /* + * Reserve one block more for addition to orphan list in case + * we allocate blocks but write fails for some reason. + */ + needed_blocks =3D ext4_writepage_trans_blocks(inode) + 1; + handle =3D ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, needed_blocks); + if (IS_ERR(handle)) + return PTR_ERR(handle); + + ret =3D ext4_map_blocks(handle, inode, map, + EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT); + /* + * We need to stop handle here due to a potential deadlock caused + * by the subsequent call to balance_dirty_pages(). This function + * may wait for the dirty pages to be written back, which could + * initiate another handle and cause it to wait for the first + * handle to complete. + */ + ext4_journal_stop(handle); + + return ret; +} + +static int ext4_iomap_buffered_begin(struct inode *inode, loff_t offset, + loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap, + bool delalloc) +{ + int ret, retries =3D 0; struct ext4_map_blocks map; u8 blkbits =3D inode->i_blkbits; =20 @@ -3541,13 +3599,23 @@ static int ext4_iomap_buffered_read_begin(struct in= ode *inode, loff_t offset, /* Inline data support is not yet available. */ if (WARN_ON_ONCE(ext4_has_inline_data(inode))) return -ERANGE; - +retry: /* Calculate the first and last logical blocks respectively. */ map.m_lblk =3D offset >> blkbits; map.m_len =3D min_t(loff_t, (offset + length - 1) >> blkbits, EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; + if (flags & IOMAP_WRITE) { + if (delalloc) + ret =3D ext4_da_map_blocks(inode, &map); + else + ret =3D ext4_iomap_get_blocks(inode, &map); =20 - ret =3D ext4_map_blocks(NULL, inode, &map, 0); + if (ret =3D=3D -ENOSPC && + ext4_should_retry_alloc(inode->i_sb, &retries)) + goto retry; + } else { + ret =3D ext4_map_blocks(NULL, inode, &map, 0); + } if (ret < 0) return ret; =20 @@ -3555,6 +3623,79 @@ static int ext4_iomap_buffered_read_begin(struct ino= de *inode, loff_t offset, return 0; } =20 +static int ext4_iomap_buffered_read_begin(struct inode *inode, + loff_t offset, loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap) +{ + return ext4_iomap_buffered_begin(inode, offset, length, flags, + iomap, srcmap, false); +} + +static int ext4_iomap_buffered_write_begin(struct inode *inode, + loff_t offset, loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap) +{ + return ext4_iomap_buffered_begin(inode, offset, length, flags, + iomap, srcmap, false); +} + +static int ext4_iomap_buffered_da_write_begin(struct inode *inode, + loff_t offset, loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap) +{ + return ext4_iomap_buffered_begin(inode, offset, length, flags, + iomap, srcmap, true); +} + +/* + * Drop the staled delayed allocation range from the write failure, + * including both start and end blocks. If not, we could leave a range + * of delayed extents covered by a clean folio, it could lead to + * inaccurate space reservation. + */ +static void ext4_iomap_punch_delalloc(struct inode *inode, loff_t offset, + loff_t length, struct iomap *iomap) +{ + down_write(&EXT4_I(inode)->i_data_sem); + ext4_es_remove_extent(inode, offset >> inode->i_blkbits, + DIV_ROUND_UP_ULL(length, EXT4_BLOCK_SIZE(inode->i_sb))); + up_write(&EXT4_I(inode)->i_data_sem); +} + +static int ext4_iomap_buffered_da_write_end(struct inode *inode, loff_t of= fset, + loff_t length, ssize_t written, + unsigned int flags, + struct iomap *iomap) +{ + loff_t start_byte, end_byte; + + /* If we didn't reserve the blocks, we're not allowed to punch them. */ + if (iomap->type !=3D IOMAP_DELALLOC || !(iomap->flags & IOMAP_F_NEW)) + return 0; + + /* Nothing to do if we've written the entire delalloc extent */ + start_byte =3D iomap_last_written_block(inode, offset, written); + end_byte =3D round_up(offset + length, i_blocksize(inode)); + if (start_byte >=3D end_byte) + return 0; + + filemap_invalidate_lock(inode->i_mapping); + iomap_write_delalloc_release(inode, start_byte, end_byte, flags, + iomap, ext4_iomap_punch_delalloc); + filemap_invalidate_unlock(inode->i_mapping); + return 0; +} + + +const struct iomap_ops ext4_iomap_buffered_write_ops =3D { + .iomap_begin =3D ext4_iomap_buffered_write_begin, +}; + +const struct iomap_ops ext4_iomap_buffered_da_write_ops =3D { + .iomap_begin =3D ext4_iomap_buffered_da_write_begin, + .iomap_end =3D ext4_iomap_buffered_da_write_end, +}; + const struct iomap_ops ext4_iomap_buffered_read_ops =3D { .iomap_begin =3D ext4_iomap_buffered_read_begin, }; --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 47C2B1494D8; Tue, 22 Oct 2024 03:13:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566784; cv=none; b=CMcP0xUkL2PuEDiLi8VuKhXPrsb99eVrT5ZduHEx7vYpyAaSEahAmE8FP3vpQUJ+zYq9tMW5lQfqzAaqRpXbi1NwNZm8PLqFIO9+0G77XnGBlThwbNlVs/mlC1Wat3DBowNB21qZ7xU+PxszWHGu2o6vWEvDmIa+kzjhZ331X6s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566784; c=relaxed/simple; bh=JtRQRisxrHnV8IY4GtAev6/mcJf6zvVRjOwlQ8nLm5M=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Awj37Oi5zY/OASHTDzf8+f16e6tda/sPAmb58o8TYUKp7bL/i9MZQMB8C8w7ZiyP+zyZpzjUIhtboN5GzKOnF7oxzDitlK1Vn8EEf7GrEg1ZLdzPBYLyPJBwnpT3xDfMO7t/ZD0TSm1jleL6tNgY11qxaA3cCUwPImk1ZVe4eCw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=none smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.216]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcgM0yXXz4f3jkk; Tue, 22 Oct 2024 11:12:47 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 64E721A018C; Tue, 22 Oct 2024 11:12:59 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S20; Tue, 22 Oct 2024 11:12:59 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 16/27] ext4: don't order data for inode with EXT4_STATE_BUFFERED_IOMAP Date: Tue, 22 Oct 2024 19:10:47 +0800 Message-ID: <20241022111059.2566137-17-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S20 X-Coremail-Antispam: 1UD129KBjvdXoWrtr48GFy3AF4rGF4fCryfJFb_yoWkGrc_Ga s7Ar48K34SvF1xWFWFkr15Xas29r48Wr1rWFZ5tr4fuFyUJrs7Aw1kZrsrZry5Wr42kF9x Cry8Gr1rt3W2gjkaLaAFLSUrUUUUjb8apTn2vfkv8UJUUUU8Yxn0WfASr-VFAUDa7-sFnT 9fnUUIcSsGvfJTRUUUbQAFF20E14v26rWj6s0DM7CY07I20VC2zVCF04k26cxKx2IYs7xG 6rWj6s0DM7CIcVAFz4kK6r1j6r18M280x2IEY4vEnII2IxkI6r1a6r45M28IrcIa0xkI8V A2jI8067AKxVWUAVCq3wA2048vs2IY020Ec7CjxVAFwI0_Xr0E3s1l8cAvFVAK0II2c7xJ M28CjxkF64kEwVA0rcxSw2x7M28EF7xvwVC0I7IYx2IY67AKxVWDJVCq3wA2z4x0Y4vE2I x0cI8IcVCY1x0267AKxVWxJr0_GcWl84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vE x4A2jsIEc7CjxVAFwI0_GcCE3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2 IEw4CE5I8CrVC2j2WlYx0E2Ix0cI8IcVAFwI0_JrI_JrylYx0Ex4A2jsIE14v26r1j6r4U McvjeVCFs4IE7xkEbVWUJVW8JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I64 8v4I1lFIxGxcIEc7CjxVA2Y2ka0xkIwI1lc7CjxVAaw2AFwI0_Jw0_GFyl42xK82IYc2Ij 64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x 8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r4a6rW5MIIYrxkI7VAKI48JMIIF0xvE 2Ix0cI8IcVAFwI0_Gr0_Xr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1lIxAIcV CF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r4j6F4UMIIF0xvEx4A2jsIE c7CjxVAFwI0_Gr1j6F4UJbIYCTnIWIevJa73UjIFyTuYvjTRio7uDUUUU X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi In the iomap buffered I/O path, there is no risk of exposing stale data because we always allocate unwritten extents for new allocated blocks, the extent changes to written only when the I/O is completed. Therefore, we do not need to order data in this mode. Signed-off-by: Zhang Yi --- fs/ext4/ext4_jbd2.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index 0c77697d5e90..9dca10027032 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -467,6 +467,14 @@ static inline int ext4_should_journal_data(struct inod= e *inode) =20 static inline int ext4_should_order_data(struct inode *inode) { + /* + * There is no need to order data for inodes with iomap buffered I/O + * path since it always allocate unwritten extents for new allocated + * blocks and have no risk of stale data. + */ + if (ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP)) + return 0; + return ext4_inode_journal_mode(inode) & EXT4_INODE_ORDERED_DATA_MODE; } =20 --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D3FAF14AD02; Tue, 22 Oct 2024 03:13:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566786; cv=none; b=FaRq/XgTfj1wmQOXL3AfKro8N1gvYGLt2gSozswQnFYxcOgYgvJxilELrLTKR7aWv/EFm6hrcujROGFPw7cRoZRqfAM7kgZ6CaPvwIw3qH1m7NgYDPfuKUnq8sagW9pdd4e1EyvT4QM6DH3sV7DQXA1tjA8EWJsz9Rgwd0aAynk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566786; c=relaxed/simple; bh=4wNxr6z3hjdV+Va2/DAKywfCxx5XItXDt3rDGOdQbBs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tGXW4ccsm3uTUEbhkXX2slLh9If6BTMXCsosDgOUvHaJc6YPWjaN7FAEnGXAEMUzL+qV+6ZkHszgOmjC2Cf4igGResrj2Esaap9PuAGHhIqxJKH22YjsUDBg7d+0FNFGd5smRrIAXrzVCd3TXKeeyjgekrxaA2jyeJhhvpHieAc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcgM5bWvz4f3jkm; Tue, 22 Oct 2024 11:12:47 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 0C73F1A0568; Tue, 22 Oct 2024 11:13:00 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S21; Tue, 22 Oct 2024 11:12:59 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 17/27] ext4: implement writeback iomap path Date: Tue, 22 Oct 2024 19:10:48 +0800 Message-ID: <20241022111059.2566137-18-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S21 X-Coremail-Antispam: 1UD129KBjvAXoWfGFyruFW3Cry8JFy5ZF43Jrb_yoW8Xw4DCo WSva13XF48Jr98ta9Ykr1fJFyUuan7Ga1rAF15Zr40qa43JF1a9w4xGw43X3W7Ww4Fkryx ZryxJa15Gr4kJF4rn29KB7ZKAUJUUUU8529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3 AaLaJ3UjIYCTnIWjp_UUUOH7AC8VAFwI0_Wr0E3s1l1xkIjI8I6I8E6xAIw20EY4v20xva j40_Wr0E3s1l1IIY67AEw4v_Jr0_Jr4l87I20VAvwVAaII0Ic2I_JFv_Gryl82xGYIkIc2 x26280x7IE14v26r126s0DM28IrcIa0xkI8VCY1x0267AKxVW5JVCq3wA2ocxC64kIII0Y j41l84x0c7CEw4AK67xGY2AK021l84ACjcxK6xIIjxv20xvE14v26w1j6s0DM28EF7xvwV C0I7IYx2IY6xkF7I0E14v26F4UJVW0owA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xv wVC2z280aVCY1x0267AKxVW0oVCq3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFc xC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_ Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2 IErcIFxwACI402YVCY1x02628vn2kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY 0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I 0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAI cVC0I7IYx2IY67AKxVW8JVW5JwCI42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42 IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280 aVCY1x0267AKxVW8Jr0_Cr1UYxBIdaVFxhVjvjDU0xZFpf9x0pRtl1hUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Implement ext4_iomap_writepages(), introduce ext4_writeback_ops, and create an end I/O extent conversion worker to implement the iomap buffered write-back path. In the map_blocks() handler, we first query the longest range of existing mapped extents. If the block range has not already been allocated, we attempt to allocate a range of blocks that is as long as possible to minimize the number of block mappings. This allocation is based on the write-back length and the delalloc extent length, rather than allocating for a single folio at a time. In the ->prepare_ioend() handler, we register the end I/O worker to convert unwritten extents into written extents. There are three key differences between the buffer_head write-back path and the iomap write-back path: 1) Since we aim to allocate a range of blocks as long as possible within the writeback length for each invocation of ->map_blocks(), we may allocate a long range but write less in certain corner cases. Therefore, we cannot convert the extent to written in advance within ->map_blocks(). Fortunately, there is minimal risk of losing data between split extents during the write-back and the end I/O process. We defer this action to the end I/O worker, where we can accurately determine the actual written length. Besides, we should remove the warning in ext4_convert_unwritten_extents_endio(). 2) Since we do not order data, the journal thread is not required to write back data. Besides, we also do not need to use the reserve handle when converting the unwritten extent in the end I/O worker, we can start normal handle directly. 3) We can also delay updating the i_disksize until the end of the I/O, which could prevent the exposure of zero data that may occur during a system crash while performing buffer append writes in the buffer_head buffered write path. Signed-off-by: Zhang Yi --- fs/ext4/ext4.h | 4 + fs/ext4/extents.c | 22 +++--- fs/ext4/inode.c | 188 +++++++++++++++++++++++++++++++++++++++++++++- fs/ext4/page-io.c | 105 ++++++++++++++++++++++++++ fs/ext4/super.c | 2 + 5 files changed, 311 insertions(+), 10 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index a09f96ef17d8..d4d594d97634 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1151,6 +1151,8 @@ struct ext4_inode_info { */ struct list_head i_rsv_conversion_list; struct work_struct i_rsv_conversion_work; + struct list_head i_iomap_ioend_list; + struct work_struct i_iomap_ioend_work; =20 spinlock_t i_block_reservation_lock; =20 @@ -3773,6 +3775,8 @@ int ext4_bio_write_folio(struct ext4_io_submit *io, s= truct folio *page, size_t len); extern struct ext4_io_end_vec *ext4_alloc_io_end_vec(ext4_io_end_t *io_end= ); extern struct ext4_io_end_vec *ext4_last_io_end_vec(ext4_io_end_t *io_end); +extern void ext4_iomap_end_io(struct work_struct *work); +extern void ext4_iomap_end_bio(struct bio *bio); =20 /* mmp.c */ extern int ext4_multi_mount_protect(struct super_block *, ext4_fsblk_t); diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 33bc2cc5aff4..4b30e6f0a634 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3760,20 +3760,24 @@ ext4_convert_unwritten_extents_endio(handle_t *hand= le, struct inode *inode, ext_debug(inode, "logical block %llu, max_blocks %u\n", (unsigned long long)ee_block, ee_len); =20 - /* If extent is larger than requested it is a clear sign that we still - * have some extent state machine issues left. So extent_split is still - * required. - * TODO: Once all related issues will be fixed this situation should be - * illegal. + /* + * If the extent is larger than requested, we should split it here. + * For inodes using the iomap buffered I/O path, we do not split in + * advance during the write-back process. Therefore, we may need to + * perform the split during the end I/O process here. However, + * other inodes should not require this action. */ if (ee_block !=3D map->m_lblk || ee_len > map->m_len) { int flags =3D EXT4_GET_BLOCKS_CONVERT | EXT4_GET_BLOCKS_METADATA_NOFAIL; #ifdef CONFIG_EXT4_DEBUG - ext4_warning(inode->i_sb, "Inode (%ld) finished: extent logical block %l= lu," - " len %u; IO logical block %llu, len %u", - inode->i_ino, (unsigned long long)ee_block, ee_len, - (unsigned long long)map->m_lblk, map->m_len); + if (!ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP)) { + ext4_warning(inode->i_sb, + "Inode (%ld) finished: extent logical block %llu, len %u; IO logi= cal block %llu, len %u", + inode->i_ino, (unsigned long long)ee_block, + ee_len, (unsigned long long)map->m_lblk, + map->m_len); + } #endif path =3D ext4_split_convert_extents(handle, inode, map, path, flags, NULL); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 23cbcaab0a56..a260942fd2dd 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -44,6 +44,7 @@ #include =20 #include "ext4_jbd2.h" +#include "ext4_extents.h" #include "xattr.h" #include "acl.h" #include "truncate.h" @@ -3710,10 +3711,195 @@ static void ext4_iomap_readahead(struct readahead_= control *rac) iomap_readahead(rac, &ext4_iomap_buffered_read_ops); } =20 +struct ext4_writeback_ctx { + struct iomap_writepage_ctx ctx; + struct writeback_control *wbc; + unsigned int data_seq; +}; + +static int ext4_iomap_map_one_extent(struct inode *inode, + struct ext4_map_blocks *map) +{ + struct extent_status es; + handle_t *handle =3D NULL; + int credits, map_flags; + int retval; + + credits =3D ext4_da_writepages_trans_blocks(inode); + handle =3D ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, credits); + if (IS_ERR(handle)) + return PTR_ERR(handle); + + map->m_flags =3D 0; + /* + * It is necessary to look up extent and map blocks under i_data_sem + * in write mode, otherwise, the delalloc extent may become stale + * during concurrent truncate operations. + */ + down_write(&EXT4_I(inode)->i_data_sem); + if (likely(ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es))) { + retval =3D es.es_len - (map->m_lblk - es.es_lblk); + map->m_len =3D min_t(unsigned int, retval, map->m_len); + + if (ext4_es_is_delayed(&es)) { + map->m_flags |=3D EXT4_MAP_DELAYED; + trace_ext4_da_write_pages_extent(inode, map); + /* + * Call ext4_map_create_blocks() to allocate any + * delayed allocation blocks. It is possible that + * we're going to need more metadata blocks, however + * we must not fail because we're in writeback and + * there is nothing we can do so it might result in + * data loss. So use reserved blocks to allocate + * metadata if possible. + */ + map_flags =3D EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT | + EXT4_GET_BLOCKS_METADATA_NOFAIL; + + retval =3D ext4_map_create_blocks(handle, inode, map, + map_flags); + goto out; + } + if (unlikely(ext4_es_is_hole(&es))) + goto out; + + /* Found written or unwritten extent. */ + map->m_pblk =3D ext4_es_pblock(&es) + map->m_lblk - + es.es_lblk; + map->m_flags =3D ext4_es_is_written(&es) ? + EXT4_MAP_MAPPED : EXT4_MAP_UNWRITTEN; + goto out; + } + + retval =3D ext4_map_query_blocks(handle, inode, map); +out: + up_write(&EXT4_I(inode)->i_data_sem); + ext4_journal_stop(handle); + return retval < 0 ? retval : 0; +} + +static int ext4_iomap_map_blocks(struct iomap_writepage_ctx *wpc, + struct inode *inode, loff_t offset, + unsigned int dirty_len) +{ + struct ext4_writeback_ctx *ewpc =3D + container_of(wpc, struct ext4_writeback_ctx, ctx); + struct super_block *sb =3D inode->i_sb; + struct journal_s *journal =3D EXT4_SB(sb)->s_journal; + struct ext4_inode_info *ei =3D EXT4_I(inode); + struct ext4_map_blocks map; + unsigned int blkbits =3D inode->i_blkbits; + unsigned int index =3D offset >> blkbits; + unsigned int end, len; + int ret; + + if (unlikely(ext4_forced_shutdown(inode->i_sb))) + return -EIO; + + /* Check validity of the cached writeback mapping. */ + if (offset >=3D wpc->iomap.offset && + offset < wpc->iomap.offset + wpc->iomap.length && + ewpc->data_seq =3D=3D READ_ONCE(ei->i_es_seq)) + return 0; + + end =3D min_t(unsigned int, (ewpc->wbc->range_end >> blkbits), + (UINT_MAX - 1)); + len =3D (end > index + dirty_len) ? end - index + 1 : dirty_len; + +retry: + map.m_lblk =3D index; + map.m_len =3D min_t(unsigned int, MAX_WRITEPAGES_EXTENT_LEN, len); + ret =3D ext4_map_blocks(NULL, inode, &map, 0); + if (ret < 0) + return ret; + + /* + * The map is not a delalloc extent, it must either be a hole + * or an extent which have already been allocated. + */ + if (!(map.m_flags & EXT4_MAP_DELAYED)) + goto out; + + /* Map one delalloc extent. */ + ret =3D ext4_iomap_map_one_extent(inode, &map); + if (ret < 0) { + if (ext4_forced_shutdown(sb)) + return ret; + + /* + * Retry transient ENOSPC errors, if + * ext4_count_free_blocks() is non-zero, a commit + * should free up blocks. + */ + if (ret =3D=3D -ENOSPC && journal && ext4_count_free_clusters(sb)) { + jbd2_journal_force_commit_nested(journal); + goto retry; + } + + ext4_msg(sb, KERN_CRIT, + "Delayed block allocation failed for inode %lu at logical offset %llu = with max blocks %u with error %d", + inode->i_ino, (unsigned long long)map.m_lblk, + (unsigned int)map.m_len, -ret); + ext4_msg(sb, KERN_CRIT, + "This should not happen!! Data will be lost\n"); + if (ret =3D=3D -ENOSPC) + ext4_print_free_blocks(inode); + return ret; + } +out: + ewpc->data_seq =3D READ_ONCE(ei->i_es_seq); + ext4_set_iomap(inode, &wpc->iomap, &map, offset, + map.m_len << blkbits, 0); + return 0; +} + +static int ext4_iomap_prepare_ioend(struct iomap_ioend *ioend, int status) +{ + struct ext4_inode_info *ei =3D EXT4_I(ioend->io_inode); + + /* Need to convert unwritten extents when I/Os are completed. */ + if (ioend->io_type =3D=3D IOMAP_UNWRITTEN || + ioend->io_offset + ioend->io_size > READ_ONCE(ei->i_disksize)) + ioend->io_bio.bi_end_io =3D ext4_iomap_end_bio; + + return status; +} + +static void ext4_iomap_discard_folio(struct folio *folio, loff_t pos) +{ + struct inode *inode =3D folio->mapping->host; + loff_t length =3D folio_pos(folio) + folio_size(folio) - pos; + + ext4_iomap_punch_delalloc(inode, pos, length, NULL); +} + +static const struct iomap_writeback_ops ext4_writeback_ops =3D { + .map_blocks =3D ext4_iomap_map_blocks, + .prepare_ioend =3D ext4_iomap_prepare_ioend, + .discard_folio =3D ext4_iomap_discard_folio, +}; + static int ext4_iomap_writepages(struct address_space *mapping, struct writeback_control *wbc) { - return 0; + struct inode *inode =3D mapping->host; + struct super_block *sb =3D inode->i_sb; + long nr =3D wbc->nr_to_write; + int alloc_ctx, ret; + struct ext4_writeback_ctx ewpc =3D { + .wbc =3D wbc, + }; + + if (unlikely(ext4_forced_shutdown(sb))) + return -EIO; + + alloc_ctx =3D ext4_writepages_down_read(sb); + trace_ext4_writepages(inode, wbc); + ret =3D iomap_writepages(mapping, wbc, &ewpc.ctx, &ext4_writeback_ops); + trace_ext4_writepages_result(inode, wbc, ret, nr - wbc->nr_to_write); + ext4_writepages_up_read(sb, alloc_ctx); + + return ret; } =20 /* diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c index ad5543866d21..659ee0fb7cea 100644 --- a/fs/ext4/page-io.c +++ b/fs/ext4/page-io.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -562,3 +563,107 @@ int ext4_bio_write_folio(struct ext4_io_submit *io, s= truct folio *folio, =20 return 0; } + +static void ext4_iomap_finish_ioend(struct iomap_ioend *ioend) +{ + struct inode *inode =3D ioend->io_inode; + struct ext4_inode_info *ei =3D EXT4_I(inode); + loff_t pos =3D ioend->io_offset; + size_t size =3D ioend->io_size; + loff_t new_disksize; + handle_t *handle; + int credits; + int ret, err; + + ret =3D blk_status_to_errno(ioend->io_bio.bi_status); + if (unlikely(ret)) + goto out; + + /* + * We may need to convert up to one extent per block in + * the page and we may dirty the inode. + */ + credits =3D ext4_chunk_trans_blocks(inode, + EXT4_MAX_BLOCKS(size, pos, inode->i_blkbits)); + handle =3D ext4_journal_start(inode, EXT4_HT_EXT_CONVERT, credits); + if (IS_ERR(handle)) { + ret =3D PTR_ERR(handle); + goto out_err; + } + + if (ioend->io_type =3D=3D IOMAP_UNWRITTEN) { + ret =3D ext4_convert_unwritten_extents(handle, inode, pos, size); + if (ret) + goto out_journal; + } + + /* + * Update on-disk size after IO is completed. Races with + * truncate are avoided by checking i_size under i_data_sem. + */ + new_disksize =3D pos + size; + if (new_disksize > READ_ONCE(ei->i_disksize)) { + down_write(&ei->i_data_sem); + new_disksize =3D min(new_disksize, i_size_read(inode)); + if (new_disksize > ei->i_disksize) + ei->i_disksize =3D new_disksize; + up_write(&ei->i_data_sem); + ret =3D ext4_mark_inode_dirty(handle, inode); + if (ret) + EXT4_ERROR_INODE_ERR(inode, -ret, + "Failed to mark inode dirty"); + } + +out_journal: + err =3D ext4_journal_stop(handle); + if (!ret) + ret =3D err; +out_err: + if (ret < 0 && !ext4_forced_shutdown(inode->i_sb)) { + ext4_msg(inode->i_sb, KERN_EMERG, + "failed to convert unwritten extents to written extents or update inod= e size -- potential data loss! (inode %lu, error %d)", + inode->i_ino, ret); + } +out: + iomap_finish_ioends(ioend, ret); +} + +/* + * Work on buffered iomap completed IO, to convert unwritten extents to + * mapped extents + */ +void ext4_iomap_end_io(struct work_struct *work) +{ + struct ext4_inode_info *ei =3D container_of(work, struct ext4_inode_info, + i_iomap_ioend_work); + struct iomap_ioend *ioend; + struct list_head ioend_list; + unsigned long flags; + + spin_lock_irqsave(&ei->i_completed_io_lock, flags); + list_replace_init(&ei->i_iomap_ioend_list, &ioend_list); + spin_unlock_irqrestore(&ei->i_completed_io_lock, flags); + + iomap_sort_ioends(&ioend_list); + while (!list_empty(&ioend_list)) { + ioend =3D list_entry(ioend_list.next, struct iomap_ioend, io_list); + list_del_init(&ioend->io_list); + iomap_ioend_try_merge(ioend, &ioend_list); + ext4_iomap_finish_ioend(ioend); + } +} + +void ext4_iomap_end_bio(struct bio *bio) +{ + struct iomap_ioend *ioend =3D iomap_ioend_from_bio(bio); + struct ext4_inode_info *ei =3D EXT4_I(ioend->io_inode); + struct ext4_sb_info *sbi =3D EXT4_SB(ioend->io_inode->i_sb); + unsigned long flags; + + /* Only reserved conversions from writeback should enter here */ + spin_lock_irqsave(&ei->i_completed_io_lock, flags); + if (list_empty(&ei->i_iomap_ioend_list)) + queue_work(sbi->rsv_conversion_wq, &ei->i_iomap_ioend_work); + list_add_tail(&ioend->io_list, &ei->i_iomap_ioend_list); + spin_unlock_irqrestore(&ei->i_completed_io_lock, flags); +} diff --git a/fs/ext4/super.c b/fs/ext4/super.c index a01e0bbe57c8..56baadec27e0 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1419,11 +1419,13 @@ static struct inode *ext4_alloc_inode(struct super_= block *sb) #endif ei->jinode =3D NULL; INIT_LIST_HEAD(&ei->i_rsv_conversion_list); + INIT_LIST_HEAD(&ei->i_iomap_ioend_list); spin_lock_init(&ei->i_completed_io_lock); ei->i_sync_tid =3D 0; ei->i_datasync_tid =3D 0; atomic_set(&ei->i_unwritten, 0); INIT_WORK(&ei->i_rsv_conversion_work, ext4_end_io_rsv_work); + INIT_WORK(&ei->i_iomap_ioend_work, ext4_iomap_end_io); ext4_fc_init_inode(&ei->vfs_inode); mutex_init(&ei->i_fc_lock); return &ei->vfs_inode; --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 796F315534E; Tue, 22 Oct 2024 03:13:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566786; cv=none; b=sHviAgBmSQgTCb447XvrnoHPxRtjfxC0f+4Oap+/CjTLigB9neJk9aUrguMyN+gFMJy3eL5lfxVP5yIRh3x5QHlN2HhvX5DWOibpDbqzqCLbCM1ejv1KXQhzbao2TArFfY6raNoOf4FNaeK9xrl6CbA88zs/hLthAl1kUL19L6E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566786; c=relaxed/simple; bh=sl8nMZCfHahdvL0vfn1kF0RABscNeTDuKi83wBvnCRw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=m/SkHpo+40S1J7VQcYdOF5Pxx60cPRSFjpGR/Wdc1xejRfAWtPrfR3Bt0orAN7XsyPzytfQd3MUHJenUukbzr7/dNK8E3iU4uMj3jKK2n1KB0HrY/UNjIrEu0BHPDWyc9/h1/u2t936DamsTHf7bY+jN+yOv/uLTuzU8aEkNO3c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4XXcgH2HSYz4f3jMP; Tue, 22 Oct 2024 11:12:43 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id AA7E01A058E; Tue, 22 Oct 2024 11:13:00 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S22; Tue, 22 Oct 2024 11:13:00 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 18/27] ext4: implement mmap iomap path Date: Tue, 22 Oct 2024 19:10:49 +0800 Message-ID: <20241022111059.2566137-19-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S22 X-Coremail-Antispam: 1UD129KBjvJXoW7Kw1DArW3CrykXF1ftr1DJrb_yoW8WFW3pF 9IkrWrGr47XwnY9FsagF98ZF1Yya4rWr47Xry3Crn8Zrnru3y5ta18KFn5ZF45J3yxZr4r tF4Fkr1UWw1akrDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQl14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_Cr1j6rxdM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK 6I8E87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4 xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8 JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20V AGYxC7M4IIrI8v6xkF7I0E8cxan2IY04v7MxkF7I0En4kS14v26r1q6r43MxAIw28IcxkI 7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxV Cjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY 6xIIjxv20xvE14v26r4j6ryUMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UMIIF0x vE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVW8JVWxJwCI42IY6I8E87Iv 6xkF7I0E14v26r4UJVWxJrUvcSsGvfC2KfnxnUUI43ZEXa7sRRgAFtUUUUU== X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Introduce ext4_iomap_page_mkwrite() to implement the mmap iomap path. It invoke iomap_page_mkwrite() and passes ext4_iomap_buffered_[da_]write_ops to make the folio dirty and map blocks, Almost all other work are handled by iomap_page_mkwrite(). Signed-off-by: Zhang Yi --- fs/ext4/inode.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index a260942fd2dd..0a9b73534257 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -6478,6 +6478,23 @@ static int ext4_bh_unmapped(handle_t *handle, struct= inode *inode, return !buffer_mapped(bh); } =20 +static vm_fault_t ext4_iomap_page_mkwrite(struct vm_fault *vmf) +{ + struct inode *inode =3D file_inode(vmf->vma->vm_file); + const struct iomap_ops *iomap_ops; + + /* + * ext4_nonda_switch() could writeback this folio, so have to + * call it before lock folio. + */ + if (test_opt(inode->i_sb, DELALLOC) && !ext4_nonda_switch(inode->i_sb)) + iomap_ops =3D &ext4_iomap_buffered_da_write_ops; + else + iomap_ops =3D &ext4_iomap_buffered_write_ops; + + return iomap_page_mkwrite(vmf, iomap_ops); +} + vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) { struct vm_area_struct *vma =3D vmf->vma; @@ -6501,6 +6518,11 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) =20 filemap_invalidate_lock_shared(mapping); =20 + if (ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP)) { + ret =3D ext4_iomap_page_mkwrite(vmf); + goto out; + } + err =3D ext4_convert_inline_data(inode); if (err) goto out_ret; --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2476318EFF8; Tue, 22 Oct 2024 03:13:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566786; cv=none; b=DU7hX21GsFTxXGciC/BvnuumhvtkETkWsaki2yBqLy4BL642bMCYp84C/9q8VAMiSO3/YkwV0qYIXI4KvN55Pf/5R7JOnGwG5MuCAvD4YXsglcHCKsRotSYRVT/gEhYQ24XMh+4i6AKxaESFdSkVIzFkhcSoz5PsugTHipLOt0Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566786; c=relaxed/simple; bh=4NoQniKGHNdVBDky4VIHNSuSe3OHkt9IWIU1DQzi2tM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BpGIJaF3il8JrNppxXV4fFlnQFpmSYX4erSGAJmWH63IKFMKe/eeMd2CuCvLuFuBJzRynBEaIId8Vsm6Ck1nvy71uybDZ7pcJD6h/Euvh8Z0maBuYV6SoEzxRNoN1wZcFftgJseCfqXFQDdkQohUGKDvr95rf2tkjLlTBCpSYjo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcgH13tZz4f3lVs; Tue, 22 Oct 2024 11:12:43 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 50B7E1A058E; Tue, 22 Oct 2024 11:13:01 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S23; Tue, 22 Oct 2024 11:13:01 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 19/27] ext4: do not always order data when partial zeroing out a block Date: Tue, 22 Oct 2024 19:10:50 +0800 Message-ID: <20241022111059.2566137-20-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S23 X-Coremail-Antispam: 1UD129KBjvJXoW3Gr13WFW3JrWDKr4xAry5urg_yoW7Ar13pr y5tw45ur47ua4q9F4xWF1DXr1ak3Z3GFW8WrW7G3sYv3s3X3WxKFy5K3WFyF4jgw4xXay0 qF4YyrWjgw1UArJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQm14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vE x4A2jsIEc7CjxVAFwI0_GcCE3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2 IEw4CE5I8CrVC2j2WlYx0E2Ix0cI8IcVAFwI0_JrI_JrylYx0Ex4A2jsIE14v26r1j6r4U McvjeVCFs4IE7xkEbVWUJVW8JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I64 8v4I1lFIxGxcIEc7CjxVA2Y2ka0xkIwI1lc7CjxVAaw2AFwI0_Jw0_GFyl42xK82IYc2Ij 64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x 8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r4a6rW5MIIYrxkI7VAKI48JMIIF0xvE 2Ix0cI8IcVAFwI0_Gr0_Xr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1lIxAIcV CF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r4j6F4UMIIF0xvEx4A2jsIE c7CjxVAFwI0_Gr1j6F4UJbIYCTnIWIevJa73UjIFyTuYvjTRio7uDUUUU X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi When zeroing out a partial block during a partial truncate, zeroing range, or punching a hole, it is essential to order the data only during the partial truncate. This is necessary because there is a risk of exposing stale data. Consider a scenario in which a crash occurs just after the i_disksize transaction has been submitted but before the zeroed data is written out. In this case, the tail block will retain stale data, which could be exposed on the next expand truncate operation. However, partial zeroing range and punching hole don not have this risk. Therefore, we could move the ext4_jbd2_inode_add_write() out to ext4_truncate(), only order data for the partial truncate. Signed-off-by: Zhang Yi --- fs/ext4/inode.c | 50 +++++++++++++++++++++++++++++++++++++------------ 1 file changed, 38 insertions(+), 12 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 0a9b73534257..97be75cde481 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4038,7 +4038,9 @@ void ext4_set_aops(struct inode *inode) * racing writeback can come later and flush the stale pagecache to disk. */ static int __ext4_block_zero_page_range(handle_t *handle, - struct address_space *mapping, loff_t from, loff_t length) + struct address_space *mapping, + loff_t from, loff_t length, + bool *did_zero) { ext4_fsblk_t index =3D from >> PAGE_SHIFT; unsigned offset =3D from & (PAGE_SIZE-1); @@ -4116,14 +4118,16 @@ static int __ext4_block_zero_page_range(handle_t *h= andle, =20 if (ext4_should_journal_data(inode)) { err =3D ext4_dirty_journalled_data(handle, bh); + if (err) + goto unlock; } else { err =3D 0; mark_buffer_dirty(bh); - if (ext4_should_order_data(inode)) - err =3D ext4_jbd2_inode_add_write(handle, inode, from, - length); } =20 + if (did_zero) + *did_zero =3D true; + unlock: folio_unlock(folio); folio_put(folio); @@ -4138,7 +4142,9 @@ static int __ext4_block_zero_page_range(handle_t *han= dle, * that corresponds to 'from' */ static int ext4_block_zero_page_range(handle_t *handle, - struct address_space *mapping, loff_t from, loff_t length) + struct address_space *mapping, + loff_t from, loff_t length, + bool *did_zero) { struct inode *inode =3D mapping->host; unsigned offset =3D from & (PAGE_SIZE-1); @@ -4156,7 +4162,8 @@ static int ext4_block_zero_page_range(handle_t *handl= e, return dax_zero_range(inode, from, length, NULL, &ext4_iomap_ops); } - return __ext4_block_zero_page_range(handle, mapping, from, length); + return __ext4_block_zero_page_range(handle, mapping, from, length, + did_zero); } =20 /* @@ -4166,12 +4173,15 @@ static int ext4_block_zero_page_range(handle_t *han= dle, * of that block so it doesn't yield old data if the file is later grown. */ static int ext4_block_truncate_page(handle_t *handle, - struct address_space *mapping, loff_t from) + struct address_space *mapping, loff_t from, + loff_t *zero_len) { unsigned offset =3D from & (PAGE_SIZE-1); unsigned length; unsigned blocksize; struct inode *inode =3D mapping->host; + bool did_zero =3D false; + int ret; =20 /* If we are processing an encrypted inode during orphan list handling */ if (IS_ENCRYPTED(inode) && !fscrypt_has_encryption_key(inode)) @@ -4180,7 +4190,13 @@ static int ext4_block_truncate_page(handle_t *handle, blocksize =3D inode->i_sb->s_blocksize; length =3D blocksize - (offset & (blocksize - 1)); =20 - return ext4_block_zero_page_range(handle, mapping, from, length); + ret =3D ext4_block_zero_page_range(handle, mapping, from, length, + &did_zero); + if (ret) + return ret; + + *zero_len =3D length; + return 0; } =20 int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode, @@ -4203,13 +4219,14 @@ int ext4_zero_partial_blocks(handle_t *handle, stru= ct inode *inode, if (start =3D=3D end && (partial_start || (partial_end !=3D sb->s_blocksize - 1))) { err =3D ext4_block_zero_page_range(handle, mapping, - lstart, length); + lstart, length, NULL); return err; } /* Handle partial zero out on the start of the range */ if (partial_start) { err =3D ext4_block_zero_page_range(handle, mapping, - lstart, sb->s_blocksize); + lstart, sb->s_blocksize, + NULL); if (err) return err; } @@ -4217,7 +4234,7 @@ int ext4_zero_partial_blocks(handle_t *handle, struct= inode *inode, if (partial_end !=3D sb->s_blocksize - 1) err =3D ext4_block_zero_page_range(handle, mapping, byte_end - partial_end, - partial_end + 1); + partial_end + 1, NULL); return err; } =20 @@ -4517,6 +4534,7 @@ int ext4_truncate(struct inode *inode) int err =3D 0, err2; handle_t *handle; struct address_space *mapping =3D inode->i_mapping; + loff_t zero_len =3D 0; =20 /* * There is a possibility that we're either freeing the inode @@ -4560,7 +4578,15 @@ int ext4_truncate(struct inode *inode) } =20 if (inode->i_size & (inode->i_sb->s_blocksize - 1)) - ext4_block_truncate_page(handle, mapping, inode->i_size); + ext4_block_truncate_page(handle, mapping, inode->i_size, + &zero_len); + + if (zero_len && ext4_should_order_data(inode)) { + err =3D ext4_jbd2_inode_add_write(handle, inode, inode->i_size, + zero_len); + if (err) + goto out_stop; + } =20 /* * We add the inode to the orphan list, so that if this --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D57B319F424; Tue, 22 Oct 2024 03:13:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566787; cv=none; b=qSxt1bORvL3naEvPjwXcSBYIzqlxzWsM38Vv1ISi3Lfncff5HvgDqUis02fUWdVpptthgR2uc41ks11h/VGvj43Fo4YOdpnWaCIMqrsGjwE5lRE6+96GkECM5IxxDRfAvxfvnZS6NI/QLNOoTcd+WQtrXx014KggRSfGA9jwIWg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566787; c=relaxed/simple; bh=ZybBkTVx2KW71/Nmb2vUl1H2ZG3QiraM4QFypEzqLtg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=k0jPOL58QmFEpIt+i/Brb3zGVGm/fQbcpuyPSHxAvKPNmawoDDsQHDDuk20UrGkHXvukt2aPiqmEJHQONUkizRrvhL+nIOwIDg2YSt9xF2TPOS3p6tuXgufWxadrzJO11MI8vFGGBYKV9Q3g1prma1O3UaJoTxaOKwLNWUwMxKY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=none smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.216]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4XXcgJ4Dd7z4f3jXg; Tue, 22 Oct 2024 11:12:44 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id EBDD71A0194; Tue, 22 Oct 2024 11:13:01 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S24; Tue, 22 Oct 2024 11:13:01 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 20/27] ext4: do not start handle if unnecessary while partial zeroing out a block Date: Tue, 22 Oct 2024 19:10:51 +0800 Message-ID: <20241022111059.2566137-21-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S24 X-Coremail-Antispam: 1UD129KBjvJXoW3tFW7ZF48ur1fAF47Gw15Arb_yoWDZF45pr y5Gr15Cr47ua4q9F4IgF4DXr1Ik3Z3KFW8WrW7Gr9Yva93Xw1fKF98KFnYvF4YgrW7Xay0 vF4Yy347Ww4UJ3DanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQm14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vE x4A2jsIEc7CjxVAFwI0_GcCE3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2 IEw4CE5I8CrVC2j2WlYx0E2Ix0cI8IcVAFwI0_JrI_JrylYx0Ex4A2jsIE14v26r1j6r4U McvjeVCFs4IE7xkEbVWUJVW8JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I64 8v4I1lFIxGxcIEc7CjxVA2Y2ka0xkIwI1lc7CjxVAaw2AFwI0_Jw0_GFyl42xK82IYc2Ij 64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x 8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r4a6rW5MIIYrxkI7VAKI48JMIIF0xvE 2Ix0cI8IcVAFwI0_Xr0_Ar1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1lIxAIcV CF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r4j6F4UMIIF0xvEx4A2jsIE c7CjxVAFwI0_Gr1j6F4UJbIYCTnIWIevJa73UjIFyTuYvjTRio7uDUUUU X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi When zeroing out a partial block in __ext4_block_zero_page_range() during a partial truncate, zeroing range, or punching a hole, we only need to start a handle in data=3Djournal mode because we need to log the zeroed data block, we don't need this handle in other modes. Therefore, we can start a handle in ext4_block_zero_page_range() and avoid performing the zeroing process under a running handle if it is in data=3Dordered or writeback mode. This change is essential for the conversion to iomap buffered I/O, as it helps prevent a potential deadlock issue. After we switch to using iomap_zero_range() to zero out a partial block in the later patches, iomap_zero_range() may write out dirty folios and wait for I/O to complete before the zeroing out. However, we can't wait I/O to complete under running handle because the end I/O process may also wait this handle to stop if the running transaction has begun to commit or the journal is running out of space. Therefore, let's postpone the start of handle in the of the partial truncation, zeroing range, and hole punching, in preparation for the buffered write iomap conversion. Signed-off-by: Zhang Yi --- fs/ext4/ext4.h | 4 +-- fs/ext4/extents.c | 22 ++++++--------- fs/ext4/inode.c | 70 +++++++++++++++++++++++++---------------------- 3 files changed, 47 insertions(+), 49 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index d4d594d97634..e1b7f7024f07 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3034,8 +3034,8 @@ extern void ext4_set_aops(struct inode *inode); extern int ext4_writepage_trans_blocks(struct inode *); extern int ext4_normal_submit_inode_data_buffers(struct jbd2_inode *jinode= ); extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks); -extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode, - loff_t lstart, loff_t lend); +extern int ext4_zero_partial_blocks(struct inode *inode, + loff_t lstart, loff_t lend); extern vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf); extern qsize_t *ext4_get_reserved_space(struct inode *inode); extern int ext4_get_projid(struct inode *inode, kprojid_t *projid); diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 4b30e6f0a634..20e56cd17847 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -4576,7 +4576,7 @@ static long ext4_zero_range(struct file *file, loff_t= offset, ext4_lblk_t start_lblk, end_lblk; unsigned int blocksize =3D i_blocksize(inode); unsigned int blkbits =3D inode->i_blkbits; - int ret, flags, credits; + int ret, flags; =20 trace_ext4_zero_range(inode, offset, len, mode); WARN_ON_ONCE(!inode_is_locked(inode)); @@ -4638,27 +4638,21 @@ static long ext4_zero_range(struct file *file, loff= _t offset, if (!(offset & (blocksize - 1)) && !(end & (blocksize - 1))) return ret; =20 - /* - * In worst case we have to writeout two nonadjacent unwritten - * blocks and update the inode - */ - credits =3D (2 * ext4_ext_index_trans_blocks(inode, 2)) + 1; - if (ext4_should_journal_data(inode)) - credits +=3D 2; - handle =3D ext4_journal_start(inode, EXT4_HT_MISC, credits); + /* Zero out partial block at the edges of the range */ + ret =3D ext4_zero_partial_blocks(inode, offset, len); + if (ret) + return ret; + + handle =3D ext4_journal_start(inode, EXT4_HT_INODE, 2); if (IS_ERR(handle)) { ret =3D PTR_ERR(handle); ext4_std_error(inode->i_sb, ret); return ret; } =20 - /* Zero out partial block at the edges of the range */ - ret =3D ext4_zero_partial_blocks(handle, inode, offset, len); - if (ret) - goto out_handle; - if (new_size) ext4_update_inode_size(inode, new_size); + ret =3D ext4_mark_inode_dirty(handle, inode); if (unlikely(ret)) goto out_handle; diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 97be75cde481..34701afe61c2 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4037,8 +4037,7 @@ void ext4_set_aops(struct inode *inode) * ext4_punch_hole, etc) which needs to be properly zeroed out. Otherwise a * racing writeback can come later and flush the stale pagecache to disk. */ -static int __ext4_block_zero_page_range(handle_t *handle, - struct address_space *mapping, +static int __ext4_block_zero_page_range(struct address_space *mapping, loff_t from, loff_t length, bool *did_zero) { @@ -4046,16 +4045,25 @@ static int __ext4_block_zero_page_range(handle_t *h= andle, unsigned offset =3D from & (PAGE_SIZE-1); unsigned blocksize, pos; ext4_lblk_t iblock; + handle_t *handle; struct inode *inode =3D mapping->host; struct buffer_head *bh; struct folio *folio; int err =3D 0; =20 + if (ext4_should_journal_data(inode)) { + handle =3D ext4_journal_start(inode, EXT4_HT_MISC, 1); + if (IS_ERR(handle)) + return PTR_ERR(handle); + } + folio =3D __filemap_get_folio(mapping, from >> PAGE_SHIFT, FGP_LOCK | FGP_ACCESSED | FGP_CREAT, mapping_gfp_constraint(mapping, ~__GFP_FS)); - if (IS_ERR(folio)) - return PTR_ERR(folio); + if (IS_ERR(folio)) { + err =3D PTR_ERR(folio); + goto out; + } =20 blocksize =3D inode->i_sb->s_blocksize; =20 @@ -4106,22 +4114,24 @@ static int __ext4_block_zero_page_range(handle_t *h= andle, } } } + if (ext4_should_journal_data(inode)) { BUFFER_TRACE(bh, "get write access"); err =3D ext4_journal_get_write_access(handle, inode->i_sb, bh, EXT4_JTR_NONE); if (err) goto unlock; - } - folio_zero_range(folio, offset, length); - BUFFER_TRACE(bh, "zeroed end of block"); =20 - if (ext4_should_journal_data(inode)) { + folio_zero_range(folio, offset, length); + BUFFER_TRACE(bh, "zeroed end of block"); + err =3D ext4_dirty_journalled_data(handle, bh); if (err) goto unlock; } else { - err =3D 0; + folio_zero_range(folio, offset, length); + BUFFER_TRACE(bh, "zeroed end of block"); + mark_buffer_dirty(bh); } =20 @@ -4131,6 +4141,9 @@ static int __ext4_block_zero_page_range(handle_t *han= dle, unlock: folio_unlock(folio); folio_put(folio); +out: + if (ext4_should_journal_data(inode)) + ext4_journal_stop(handle); return err; } =20 @@ -4141,8 +4154,7 @@ static int __ext4_block_zero_page_range(handle_t *han= dle, * the end of the block it will be shortened to end of the block * that corresponds to 'from' */ -static int ext4_block_zero_page_range(handle_t *handle, - struct address_space *mapping, +static int ext4_block_zero_page_range(struct address_space *mapping, loff_t from, loff_t length, bool *did_zero) { @@ -4162,8 +4174,7 @@ static int ext4_block_zero_page_range(handle_t *handl= e, return dax_zero_range(inode, from, length, NULL, &ext4_iomap_ops); } - return __ext4_block_zero_page_range(handle, mapping, from, length, - did_zero); + return __ext4_block_zero_page_range(mapping, from, length, did_zero); } =20 /* @@ -4172,8 +4183,7 @@ static int ext4_block_zero_page_range(handle_t *handl= e, * This required during truncate. We need to physically zero the tail end * of that block so it doesn't yield old data if the file is later grown. */ -static int ext4_block_truncate_page(handle_t *handle, - struct address_space *mapping, loff_t from, +static int ext4_block_truncate_page(struct address_space *mapping, loff_t = from, loff_t *zero_len) { unsigned offset =3D from & (PAGE_SIZE-1); @@ -4190,8 +4200,7 @@ static int ext4_block_truncate_page(handle_t *handle, blocksize =3D inode->i_sb->s_blocksize; length =3D blocksize - (offset & (blocksize - 1)); =20 - ret =3D ext4_block_zero_page_range(handle, mapping, from, length, - &did_zero); + ret =3D ext4_block_zero_page_range(mapping, from, length, &did_zero); if (ret) return ret; =20 @@ -4199,8 +4208,7 @@ static int ext4_block_truncate_page(handle_t *handle, return 0; } =20 -int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode, - loff_t lstart, loff_t length) +int ext4_zero_partial_blocks(struct inode *inode, loff_t lstart, loff_t le= ngth) { struct super_block *sb =3D inode->i_sb; struct address_space *mapping =3D inode->i_mapping; @@ -4218,21 +4226,19 @@ int ext4_zero_partial_blocks(handle_t *handle, stru= ct inode *inode, /* Handle partial zero within the single block */ if (start =3D=3D end && (partial_start || (partial_end !=3D sb->s_blocksize - 1))) { - err =3D ext4_block_zero_page_range(handle, mapping, - lstart, length, NULL); + err =3D ext4_block_zero_page_range(mapping, lstart, length, NULL); return err; } /* Handle partial zero out on the start of the range */ if (partial_start) { - err =3D ext4_block_zero_page_range(handle, mapping, - lstart, sb->s_blocksize, - NULL); + err =3D ext4_block_zero_page_range(mapping, lstart, + sb->s_blocksize, NULL); if (err) return err; } /* Handle partial zero out on the end of the range */ if (partial_end !=3D sb->s_blocksize - 1) - err =3D ext4_block_zero_page_range(handle, mapping, + err =3D ext4_block_zero_page_range(mapping, byte_end - partial_end, partial_end + 1, NULL); return err; @@ -4418,6 +4424,10 @@ int ext4_punch_hole(struct file *file, loff_t offset= , loff_t length) /* Now release the pages and zero block aligned part of pages*/ truncate_pagecache_range(inode, offset, end - 1); =20 + ret =3D ext4_zero_partial_blocks(inode, offset, length); + if (ret) + return ret; + if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) credits =3D ext4_writepage_trans_blocks(inode); else @@ -4429,10 +4439,6 @@ int ext4_punch_hole(struct file *file, loff_t offset= , loff_t length) return ret; } =20 - ret =3D ext4_zero_partial_blocks(handle, inode, offset, length); - if (ret) - goto out_handle; - /* If there are blocks to remove, do it */ start_lblk =3D round_up(offset, blocksize) >> inode->i_blkbits; end_lblk =3D end >> inode->i_blkbits; @@ -4564,6 +4570,8 @@ int ext4_truncate(struct inode *inode) err =3D ext4_inode_attach_jinode(inode); if (err) goto out_trace; + + ext4_block_truncate_page(mapping, inode->i_size, &zero_len); } =20 if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) @@ -4577,10 +4585,6 @@ int ext4_truncate(struct inode *inode) goto out_trace; } =20 - if (inode->i_size & (inode->i_sb->s_blocksize - 1)) - ext4_block_truncate_page(handle, mapping, inode->i_size, - &zero_len); - if (zero_len && ext4_should_order_data(inode)) { err =3D ext4_jbd2_inode_add_write(handle, inode, inode->i_size, zero_len); --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6A9401C7B72; Tue, 22 Oct 2024 03:13:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566787; cv=none; b=J5ST+fbQJtIXz0Wx5bbqfihOneg4tvyGLM3qFfWIdlqYrFX8SDT80CEZWADGCbuxIkNJX5dxySeTRV7bERYPlJIfTnNTsRt/HL1M+QRkXM6xXm03Vy+M2e4G3kV7MQ2GwQy7PJ9+4hRfFWJlzDOaMpDCsdLxT9lJp/VduVoQO3w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566787; c=relaxed/simple; bh=elKdS8cCNg8nXPyquUp1cwsUe+GpytOnFehHkD0Zbvk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=umLtg3HQorzruYrFelXD6c3dN7zCTIVyGQqf/KQO32glx86y6c0JIMgYVcPogj7JTpZcJv58QXCWJUkDCH1ugdw2CDIHeXztUA87V1VJ7SMqQi7P1LvtcxA4qh1gHo7uf2nXxfyMgP+BNFrvpdafkCNU634xu/3/tS5M9T+3dAQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=none smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4XXcgK1cBMz4f3jJD; Tue, 22 Oct 2024 11:12:45 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 91FC31A058E; Tue, 22 Oct 2024 11:13:02 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S25; Tue, 22 Oct 2024 11:13:02 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 21/27] ext4: implement zero_range iomap path Date: Tue, 22 Oct 2024 19:10:52 +0800 Message-ID: <20241022111059.2566137-22-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S25 X-Coremail-Antispam: 1UD129KBjvJXoWxJFyftF4xAw15Zr15Kw4fuFg_yoW5GF43pr 4DK3yUWrsrW39F9w4SqFy7Xr1ay3WfGrW8Wry3Gr98Zr95Xa4xKFW5KFySkFyUtrW7A3yj qF4jyrW7Kr1UAFJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQm14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vE x4A2jsIEc7CjxVAFwI0_GcCE3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2 IEw4CE5I8CrVC2j2WlYx0E2Ix0cI8IcVAFwI0_JrI_JrylYx0Ex4A2jsIE14v26r1j6r4U McvjeVCFs4IE7xkEbVWUJVW8JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I64 8v4I1lFIxGxcIEc7CjxVA2Y2ka0xkIwI1lc7CjxVAaw2AFwI0_Jw0_GFyl42xK82IYc2Ij 64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x 8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r4a6rW5MIIYrxkI7VAKI48JMIIF0xvE 2Ix0cI8IcVAFwI0_Xr0_Ar1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1lIxAIcV CF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r4j6F4UMIIF0xvEx4A2jsIE c7CjxVAFwI0_Gr1j6F4UJbIYCTnIWIevJa73UjIFyTuYvjTRio7uDUUUU X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Introduce ext4_iomap_zero_range() to implement the zero_range iomap path. Currently, this function direct invokes iomap_zero_range() to zero out a mapped partial block during the truncate down, zeroing range and punching hole. Almost all operations are handled by iomap_zero_range(). One important aspect to consider is the truncate-down operation. Since we do not order the data, it is essential to write out zeroed data before the i_disksize update transaction is committed. Otherwise, stale data may left over in the last block, which could be exposed during the next expand truncate operation. Signed-off-by: Zhang Yi --- fs/ext4/inode.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 34701afe61c2..50e4afd17e93 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4147,6 +4147,13 @@ static int __ext4_block_zero_page_range(struct addre= ss_space *mapping, return err; } =20 +static int ext4_iomap_zero_range(struct inode *inode, loff_t from, + loff_t length, bool *did_zero) +{ + return iomap_zero_range(inode, from, length, did_zero, + &ext4_iomap_buffered_write_ops); +} + /* * ext4_block_zero_page_range() zeros out a mapping of length 'length' * starting from file offset 'from'. The range to be zero'd must @@ -4173,6 +4180,8 @@ static int ext4_block_zero_page_range(struct address_= space *mapping, if (IS_DAX(inode)) { return dax_zero_range(inode, from, length, NULL, &ext4_iomap_ops); + } else if (ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP)) { + return ext4_iomap_zero_range(inode, from, length, did_zero); } return __ext4_block_zero_page_range(mapping, from, length, did_zero); } @@ -4572,6 +4581,22 @@ int ext4_truncate(struct inode *inode) goto out_trace; =20 ext4_block_truncate_page(mapping, inode->i_size, &zero_len); + /* + * inode with an iomap buffered I/O path does not order data, + * so it is necessary to write out zeroed data before the + * updating i_disksize transaction is committed. Otherwise, + * stale data may remain in the last block, which could be + * exposed during the next expand truncate operation. + */ + if (zero_len && ext4_test_inode_state(inode, + EXT4_STATE_BUFFERED_IOMAP)) { + loff_t zero_end =3D inode->i_size + zero_len; + + err =3D filemap_write_and_wait_range(mapping, + inode->i_size, zero_end - 1); + if (err) + goto out_trace; + } } =20 if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 60C9E1F429E; Tue, 22 Oct 2024 03:13:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566789; cv=none; b=gKpwirXp9N4Z/4pwzKl5AiGo95Nv88AyMBEb5JB5ek+qieOOxvx9dOQxaOSTTj5X5XKu8qv1gGDQG9HRGw5jL1rnHVu2wr4f4G7Ms9+XMH0w5rIEONvjOWN9bz08gUyEqrpKaOXbJftqVyQM1amoEGE5+77HZXx1reT9eRZTkXc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566789; c=relaxed/simple; bh=ubP8dtgkjA6KMpBj7kQAXGPFgndkJ/y1ISEWHCdmxc4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XArNYAcLmZ36iknp3MHsUWAlzSrHEOt77K2t7Mx0yvH6XETlYWqPqqEfaaUQofyQ/Ld+KmA/pczOIZYgeuJUQW4Iasj+0d6rLAOIaoOOEmNgf/do0gRHbcJNNB7QlHA+CdIoDvMqJc5gY3Lm6ecPEtwCgOvWza/rQrEucFMNhFg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4XXcgK6Gs2z4f3jXy; Tue, 22 Oct 2024 11:12:45 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 3E8121A0568; Tue, 22 Oct 2024 11:13:03 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S26; Tue, 22 Oct 2024 11:13:02 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 22/27] ext4: disable online defrag when inode using iomap buffered I/O path Date: Tue, 22 Oct 2024 19:10:53 +0800 Message-ID: <20241022111059.2566137-23-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S26 X-Coremail-Antispam: 1UD129KBjvdXoW7GrW7Cr45WFy3Cw18Jw1kXwb_yoWDKwc_Ka 97XrWkWr4FvF4S9an8Ary5JrZ5CF48KF1fXFZ3KF4fuw42y395J34qkrZI9rykWr4jqrZx Crs7Jr1YgFy8WjkaLaAFLSUrUUUUjb8apTn2vfkv8UJUUUU8Yxn0WfASr-VFAUDa7-sFnT 9fnUUIcSsGvfJTRUUUbQkFF20E14v26rWj6s0DM7CY07I20VC2zVCF04k26cxKx2IYs7xG 6rWj6s0DM7CIcVAFz4kK6r1j6r18M280x2IEY4vEnII2IxkI6r1a6r45M28IrcIa0xkI8V A2jI8067AKxVWUAVCq3wA2048vs2IY020Ec7CjxVAFwI0_Xr0E3s1l8cAvFVAK0II2c7xJ M28CjxkF64kEwVA0rcxSw2x7M28EF7xvwVC0I7IYx2IY67AKxVWDJVCq3wA2z4x0Y4vE2I x0cI8IcVCY1x0267AKxVW0oVCq3wA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2 z280aVCY1x0267AKxVW0oVCq3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0V AKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1l Ox8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErc IFxwACI402YVCY1x02628vn2kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0E wIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E74 80Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0 I7IYx2IY67AKxVW5JVW7JwCI42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6x AIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY 1x0267AKxVW8Jr0_Cr1UYxBIdaVFxhVjvjDU0xZFpf9x0pRtl1hUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Online defragmentation does not yet support the inode using iomap buffered I/O path, as it still relies on ext4_get_block() to get blocks and copy data. Therefore, we must disable it for the time being. Signed-off-by: Zhang Yi --- fs/ext4/move_extent.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c index b64661ea6e0e..508e342b4a1d 100644 --- a/fs/ext4/move_extent.c +++ b/fs/ext4/move_extent.c @@ -610,6 +610,13 @@ ext4_move_extents(struct file *o_filp, struct file *d_= filp, __u64 orig_blk, return -EOPNOTSUPP; } =20 + if (ext4_test_inode_state(orig_inode, EXT4_STATE_BUFFERED_IOMAP) || + ext4_test_inode_state(donor_inode, EXT4_STATE_BUFFERED_IOMAP)) { + ext4_msg(orig_inode->i_sb, KERN_ERR, + "Online defrag not supported for inode with iomap buffered IO path"); + return -EOPNOTSUPP; + } + /* Protect orig and donor inodes against a truncate */ lock_two_nondirectories(orig_inode, donor_inode); =20 --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 60BF11F4270; Tue, 22 Oct 2024 03:13:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566789; cv=none; b=Vh7Q0Db8RQxqZWpMc69sJiV3MyukPWd7z6a3CrbIoLWrofsQZf7PEYPjX8dn45KuKJLm56Pa07tp3VAR4uvlTe3HnujeJSsk3kwDlmmJNGzQfGkX+i5hWlGYK6fh/6uh48hhoPLCAXmWJ3caq3wGlZ5pw8S7ybvjeDxo0BS65qU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566789; c=relaxed/simple; bh=Z95YRtgtA+FXYnOrYJV2Oe0WpWpGeIHunhhahRecsYg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=r8RXI7sBhHuiRM9MuaXExqFFgDjsiewF7P8/k57hiXLF6sq3ZUYo0apny+NbJJzxMZVD2qGdgq2PZKQgcvGChePJqO4r/Z8NRkqxgbzcSGR5YxierNuV2u7ez5KShC79ZBMz3VjBaBQFcjcRG89Yfu6IMLvMm+cNRsrKe7RtT9I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4XXcgL3WPpz4f3jY4; Tue, 22 Oct 2024 11:12:46 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id D47F61A058E; Tue, 22 Oct 2024 11:13:03 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S27; Tue, 22 Oct 2024 11:13:03 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 23/27] ext4: disable inode journal mode when using iomap buffered I/O path Date: Tue, 22 Oct 2024 19:10:54 +0800 Message-ID: <20241022111059.2566137-24-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S27 X-Coremail-Antispam: 1UD129KBjvdXoW7XrW7Jr4DJF1UJr1Duw1xKrg_yoWDKFb_XF 97tr40q34Svw1Skrs5Cr15tFnFkFyxKr10grWfGa4rWFWUtF4DAF9rZry2v34DWr4rKF9x Zr4kJrW8Ka4xXjkaLaAFLSUrUUUUjb8apTn2vfkv8UJUUUU8Yxn0WfASr-VFAUDa7-sFnT 9fnUUIcSsGvfJTRUUUbQkFF20E14v26rWj6s0DM7CY07I20VC2zVCF04k26cxKx2IYs7xG 6rWj6s0DM7CIcVAFz4kK6r1j6r18M280x2IEY4vEnII2IxkI6r1a6r45M28IrcIa0xkI8V A2jI8067AKxVWUAVCq3wA2048vs2IY020Ec7CjxVAFwI0_Xr0E3s1l8cAvFVAK0II2c7xJ M28CjxkF64kEwVA0rcxSw2x7M28EF7xvwVC0I7IYx2IY67AKxVWDJVCq3wA2z4x0Y4vE2I x0cI8IcVCY1x0267AKxVW0oVCq3wA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2 z280aVCY1x0267AKxVW0oVCq3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0V AKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1l Ox8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErc IFxwACI402YVCY1x02628vn2kIc2xKxwCY1x0262kKe7AKxVWUtVW8ZwCF04k20xvY0x0E wIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E74 80Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0 I7IYx2IY67AKxVW5JVW7JwCI42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6x AIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY 1x0267AKxVW8Jr0_Cr1UYxBIdaVFxhVjvjDU0xZFpf9x0pRtl1hUUUUU= X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi inode with data=3Djournal mode will not support the iomap buffered I/O path, so just disable this mode if EXT4_STATE_BUFFERED_IOMAP is set. Signed-off-by: Zhang Yi --- fs/ext4/ext4_jbd2.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index da4a82456383..367a29babe09 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -16,7 +16,8 @@ int ext4_inode_journal_mode(struct inode *inode) ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE) || test_opt(inode->i_sb, DATA_FLAGS) =3D=3D EXT4_MOUNT_JOURNAL_DATA || (ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA) && - !test_opt(inode->i_sb, DELALLOC))) { + !test_opt(inode->i_sb, DELALLOC) && + !ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP))) { /* We do not support data journalling for encrypted data */ if (S_ISREG(inode->i_mode) && IS_ENCRYPTED(inode)) return EXT4_INODE_ORDERED_DATA_MODE; /* ordered */ --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 52BA61EABBF; Tue, 22 Oct 2024 03:13:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566790; cv=none; b=ZZVlk1jGAWmoMc9Occ9s9XGUJKYxAekEuEsLOHpr3yS1p+brZ5IylKHr4hH2vNKdPzWEbBA8dX1Ew3+Oh4r3t457CX4y3WI7O0aCHz/aZnRJK5yOuhmoUV9XgvRtfYckqjsU4Qbb9HsByCjzcRb8PojW0ebO/3WFpfvI82F3WEM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566790; c=relaxed/simple; bh=UB+Gs3civCuEYlS4cLJVQslxb6/ZGgNvoifXv/YHRsA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=b0Q5RspfAisFoKguoHd3lfBFAUcipQV7H3H2uImEZM+puS12NRyq53wrZZPQ2Z4PmhL0raJXRheaTiczuPwyrW9M+uFL9Zcj8ncEb09evSU7jpa9Xe1Yg6v5bAleOD7uCKhPeEw+DXE1uziflAGbk72HGUmmqKQ6TKMIdRs9bDI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcgL2Rv3z4f3lVv; Tue, 22 Oct 2024 11:12:46 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 80E2A1A018D; Tue, 22 Oct 2024 11:13:04 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S28; Tue, 22 Oct 2024 11:13:04 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 24/27] ext4: partially enable iomap for the buffered I/O path of regular files Date: Tue, 22 Oct 2024 19:10:55 +0800 Message-ID: <20241022111059.2566137-25-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S28 X-Coremail-Antispam: 1UD129KBjvJXoWxZFy3uF1xur47Wr1UGrW8tFb_yoWrGF15pF ZxKF1rGr4v93s29r4ftF48Zr1av3WxKa1UWrWSgr95XFWUJw1SqF10yF15A3W5JrZ5u34a qF4jkr15uw43urDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQm14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vE x4A2jsIEc7CjxVAFwI0_GcCE3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2 IEw4CE5I8CrVC2j2WlYx0E2Ix0cI8IcVAFwI0_JrI_JrylYx0Ex4A2jsIE14v26r1j6r4U McvjeVCFs4IE7xkEbVWUJVW8JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I64 8v4I1lFIxGxcIEc7CjxVA2Y2ka0xkIwI1lc7CjxVAaw2AFwI0_Jw0_GFyl42xK82IYc2Ij 64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x 8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r4a6rW5MIIYrxkI7VAKI48JMIIF0xvE 2Ix0cI8IcVAFwI0_Xr0_Ar1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1lIxAIcV CF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r4j6F4UMIIF0xvEx4A2jsIE c7CjxVAFwI0_Gr1j6F4UJbIYCTnIWIevJa73UjIFyTuYvjTRio7uDUUUU X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Partially enable iomap for the buffered I/O path of regular files with the default mount option. This supports default filesystem features and the bigalloc feature. However, it does not yet support inline data, fs_verity, fs_crypt, online defrag and data=3Djournal mode. Some of these features should be supported gradually in the future. The filesystem will fallback to the buffered_head path automatically if these mount options or features are enabled. Signed-off-by: Zhang Yi --- fs/ext4/ext4.h | 1 + fs/ext4/ialloc.c | 3 +++ fs/ext4/inode.c | 32 ++++++++++++++++++++++++++++++++ 3 files changed, 36 insertions(+) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index e1b7f7024f07..0096191b454c 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2987,6 +2987,7 @@ int ext4_walk_page_buffers(handle_t *handle, struct buffer_head *bh)); int do_journal_get_write_access(handle_t *handle, struct inode *inode, struct buffer_head *bh); +bool ext4_should_use_buffered_iomap(struct inode *inode); int ext4_nonda_switch(struct super_block *sb); #define FALL_BACK_TO_NONDELALLOC 1 #define CONVERT_INLINE_DATA 2 diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 7f1a5f90dbbd..2e3e257b9808 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -1333,6 +1333,9 @@ struct inode *__ext4_new_inode(struct mnt_idmap *idma= p, } } =20 + if (ext4_should_use_buffered_iomap(inode)) + ext4_set_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP); + ext4_update_inode_fsync_trans(handle, inode, 1); =20 err =3D ext4_mark_inode_dirty(handle, inode); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 50e4afd17e93..512094dc4117 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -776,6 +776,8 @@ static int _ext4_get_block(struct inode *inode, sector_= t iblock, =20 if (ext4_has_inline_data(inode)) return -ERANGE; + if (WARN_ON(ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP))) + return -EINVAL; =20 map.m_lblk =3D iblock; map.m_len =3D bh->b_size >> inode->i_blkbits; @@ -2572,6 +2574,9 @@ static int ext4_do_writepages(struct mpage_da_data *m= pd) =20 trace_ext4_writepages(inode, wbc); =20 + if (WARN_ON(ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP))) + return -EINVAL; + /* * No pages to write? This is mainly a kludge to avoid starting * a transaction for special inodes like journal inode on last iput() @@ -5144,6 +5149,30 @@ static const char *check_igot_inode(struct inode *in= ode, ext4_iget_flags flags) return NULL; } =20 +bool ext4_should_use_buffered_iomap(struct inode *inode) +{ + struct super_block *sb =3D inode->i_sb; + + if (ext4_has_feature_inline_data(sb)) + return false; + if (ext4_has_feature_verity(sb)) + return false; + if (test_opt(sb, DATA_FLAGS) =3D=3D EXT4_MOUNT_JOURNAL_DATA) + return false; + if (!S_ISREG(inode->i_mode)) + return false; + if (IS_DAX(inode)) + return false; + if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) + return false; + if (ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE)) + return false; + if (ext4_test_inode_flag(inode, EXT4_INODE_ENCRYPT)) + return false; + + return true; +} + struct inode *__ext4_iget(struct super_block *sb, unsigned long ino, ext4_iget_flags flags, const char *function, unsigned int line) @@ -5408,6 +5437,9 @@ struct inode *__ext4_iget(struct super_block *sb, uns= igned long ino, if (ret) goto bad_inode; =20 + if (ext4_should_use_buffered_iomap(inode)) + ext4_set_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP); + if (S_ISREG(inode->i_mode)) { inode->i_op =3D &ext4_file_inode_operations; inode->i_fop =3D &ext4_file_operations; --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E7B071FEFC0; Tue, 22 Oct 2024 03:13:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566790; cv=none; b=nj+u32SIKCsD1RPAvIMIttvEMKkftKqmksjd2JVzV2OV46qlTQI84NV9aCE+FZ0IJllTsgTtnu2C0CtHw+Zpbj7yCkFxkDKZWGlkFvfL/kOlw7IEiI3UYAZMjZuI3+p+eWjFkA6bsJi02Hngy9lOyBPsLISK6N12h4ZC+fBF+jw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566790; c=relaxed/simple; bh=T7Ax1zmPeYMqNlopQKLb8lsCwVcBJZ9ZJijwkkRpwBM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=F0bPXJsbR0uh3ljwh0Dx/B6IYAE5Eo+fXaFD8QaL1xylCD6bvuqIf3Mwtz+A/0LD7oT9QmXxfFVkqLcRqh/1MLCeNkmFFJTO0iMG6vy8MAMApcuTORkUe3zwGOIJ8MaACR+yhjU/9D60EiwwMhfVS+V8ee/O5Qscj+fMUdwq1vU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.216]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcgS68Hhz4f3jks; Tue, 22 Oct 2024 11:12:52 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 235601A018C; Tue, 22 Oct 2024 11:13:05 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S29; Tue, 22 Oct 2024 11:13:04 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 25/27] ext4: enable large folio for regular file with iomap buffered I/O path Date: Tue, 22 Oct 2024 19:10:56 +0800 Message-ID: <20241022111059.2566137-26-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S29 X-Coremail-Antispam: 1UD129KBjvJXoW7tFyrCw1fZw1xWr1xWF17Jrb_yoW8XrW7pr sxK3W8GrWkX34q9ws3KryxZr1Uta1xGw4UuFWF9wn8WrWDJ34SqF4jkF1xAF48JrWrA3y2 qFyIkr13Z3WfC3DanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQm14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vE x4A2jsIEc7CjxVAFwI0_GcCE3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2 IEw4CE5I8CrVC2j2WlYx0E2Ix0cI8IcVAFwI0_JrI_JrylYx0Ex4A2jsIE14v26r1j6r4U McvjeVCFs4IE7xkEbVWUJVW8JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I64 8v4I1lFIxGxcIEc7CjxVA2Y2ka0xkIwI1lc7CjxVAaw2AFwI0_Jw0_GFyl42xK82IYc2Ij 64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x 8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r4a6rW5MIIYrxkI7VAKI48JMIIF0xvE 2Ix0cI8IcVAFwI0_Xr0_Ar1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1lIxAIcV CF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r4j6F4UMIIF0xvEx4A2jsIE c7CjxVAFwI0_Gr1j6F4UJbIYCTnIWIevJa73UjIFyTuYvjTRio7uDUUUU X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Since we have converted the buffered I/O path to iomap for regular files, we can enable large folio support as well. This should result in significant performance gains for large I/O operations. Signed-off-by: Zhang Yi --- fs/ext4/ialloc.c | 4 +++- fs/ext4/inode.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 2e3e257b9808..6ff03fb74867 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -1333,8 +1333,10 @@ struct inode *__ext4_new_inode(struct mnt_idmap *idm= ap, } } =20 - if (ext4_should_use_buffered_iomap(inode)) + if (ext4_should_use_buffered_iomap(inode)) { ext4_set_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP); + mapping_set_large_folios(inode->i_mapping); + } =20 ext4_update_inode_fsync_trans(handle, inode, 1); =20 diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 512094dc4117..97abc88e6658 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5437,8 +5437,10 @@ struct inode *__ext4_iget(struct super_block *sb, un= signed long ino, if (ret) goto bad_inode; =20 - if (ext4_should_use_buffered_iomap(inode)) + if (ext4_should_use_buffered_iomap(inode)) { ext4_set_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP); + mapping_set_large_folios(inode->i_mapping); + } =20 if (S_ISREG(inode->i_mode)) { inode->i_op =3D &ext4_file_inode_operations; --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B0AF31FF610; Tue, 22 Oct 2024 03:13:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566793; cv=none; b=LEY58rP5XRWV2NK18nV57dMPewcES4ZLFDaKSywcRejE2Zuu5R+qpZ/NS9CsRD95E+rY+Z6l/PYcUYNAkjOAmiN8E3f0Mqc14dm9qLLBjUvOJFGJs6laBH5NYXY87DFEdqDomlPc8ZOVHrvWH4TkmZRaRmB6qjL/dwW7MyVMX04= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566793; c=relaxed/simple; bh=uaagAkzcaigmaepALuH8W84E4x3TWY8c1iyfP0Yfel4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=cYlOyQ+u6NTrw10cm1rNGGoaHqPGEHVok8Y31AlRYH5ZV5dfhEmd0jGZMJm021tUMXeEK5itVEct6uiDMD4xI3HhsTaA0H9ujSXRcm0DYQkwpN3wHmQ3gQrLpyCoPO0zEB7dSM4vg8KPSv+zs0oqeiSVSnefcDFK3ajB/p3PLdU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.216]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcgM4RVxz4f3lVv; Tue, 22 Oct 2024 11:12:47 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id C054D1A018C; Tue, 22 Oct 2024 11:13:05 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S30; Tue, 22 Oct 2024 11:13:05 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 26/27] ext4: change mount options code style Date: Tue, 22 Oct 2024 19:10:57 +0800 Message-ID: <20241022111059.2566137-27-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S30 X-Coremail-Antispam: 1UD129KBjvJXoWfJw17urW3Wr1fAF1UXw13urg_yoWDAr1kpr 18JFW3u3Z8JrWku3W0kF48tw4FvrnY9r4vyrn8CF17Kw1qyry8XFyIgFyxA3W3t34rZ34r KFn0va45Ca17GrDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQm14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vE x4A2jsIEc7CjxVAFwI0_GcCE3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2 IEw4CE5I8CrVC2j2WlYx0E2Ix0cI8IcVAFwI0_JrI_JrylYx0Ex4A2jsIE14v26r1j6r4U McvjeVCFs4IE7xkEbVWUJVW8JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I64 8v4I1lFIxGxcIEc7CjxVA2Y2ka0xkIwI1lc7CjxVAaw2AFwI0_Jw0_GFyl42xK82IYc2Ij 64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x 8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r4a6rW5MIIYrxkI7VAKI48JMIIF0xvE 2Ix0cI8IcVAFwI0_Xr0_Ar1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1lIxAIcV CF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r4j6F4UMIIF0xvEx4A2jsIE c7CjxVAFwI0_Gr1j6F4UJbIYCTnIWIevJa73UjIFyTuYvjTRio7uDUUUU X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Just remove the space between the macro name and open parenthesis to satisfy the checkpatch.pl script and prevent it from complaining when we add new mount options in the subsequent patch. This will not result in any logical changes. Signed-off-by: Zhang Yi --- fs/ext4/super.c | 175 +++++++++++++++++++++++------------------------- 1 file changed, 84 insertions(+), 91 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 56baadec27e0..89955081c4fe 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1723,101 +1723,94 @@ static const struct constant_table ext4_param_dax[= ] =3D { * separate for now. */ static const struct fs_parameter_spec ext4_param_specs[] =3D { - fsparam_flag ("bsddf", Opt_bsd_df), - fsparam_flag ("minixdf", Opt_minix_df), - fsparam_flag ("grpid", Opt_grpid), - fsparam_flag ("bsdgroups", Opt_grpid), - fsparam_flag ("nogrpid", Opt_nogrpid), - fsparam_flag ("sysvgroups", Opt_nogrpid), - fsparam_gid ("resgid", Opt_resgid), - fsparam_uid ("resuid", Opt_resuid), - fsparam_u32 ("sb", Opt_sb), - fsparam_enum ("errors", Opt_errors, ext4_param_errors), - fsparam_flag ("nouid32", Opt_nouid32), - fsparam_flag ("debug", Opt_debug), - fsparam_flag ("oldalloc", Opt_removed), - fsparam_flag ("orlov", Opt_removed), - fsparam_flag ("user_xattr", Opt_user_xattr), - fsparam_flag ("acl", Opt_acl), - fsparam_flag ("norecovery", Opt_noload), - fsparam_flag ("noload", Opt_noload), - fsparam_flag ("bh", Opt_removed), - fsparam_flag ("nobh", Opt_removed), - fsparam_u32 ("commit", Opt_commit), - fsparam_u32 ("min_batch_time", Opt_min_batch_time), - fsparam_u32 ("max_batch_time", Opt_max_batch_time), - fsparam_u32 ("journal_dev", Opt_journal_dev), - fsparam_bdev ("journal_path", Opt_journal_path), - fsparam_flag ("journal_checksum", Opt_journal_checksum), - fsparam_flag ("nojournal_checksum", Opt_nojournal_checksum), - fsparam_flag ("journal_async_commit",Opt_journal_async_commit), - fsparam_flag ("abort", Opt_abort), - fsparam_enum ("data", Opt_data, ext4_param_data), - fsparam_enum ("data_err", Opt_data_err, + fsparam_flag("bsddf", Opt_bsd_df), + fsparam_flag("minixdf", Opt_minix_df), + fsparam_flag("grpid", Opt_grpid), + fsparam_flag("bsdgroups", Opt_grpid), + fsparam_flag("nogrpid", Opt_nogrpid), + fsparam_flag("sysvgroups", Opt_nogrpid), + fsparam_gid("resgid", Opt_resgid), + fsparam_uid("resuid", Opt_resuid), + fsparam_u32("sb", Opt_sb), + fsparam_enum("errors", Opt_errors, ext4_param_errors), + fsparam_flag("nouid32", Opt_nouid32), + fsparam_flag("debug", Opt_debug), + fsparam_flag("oldalloc", Opt_removed), + fsparam_flag("orlov", Opt_removed), + fsparam_flag("user_xattr", Opt_user_xattr), + fsparam_flag("acl", Opt_acl), + fsparam_flag("norecovery", Opt_noload), + fsparam_flag("noload", Opt_noload), + fsparam_flag("bh", Opt_removed), + fsparam_flag("nobh", Opt_removed), + fsparam_u32("commit", Opt_commit), + fsparam_u32("min_batch_time", Opt_min_batch_time), + fsparam_u32("max_batch_time", Opt_max_batch_time), + fsparam_u32("journal_dev", Opt_journal_dev), + fsparam_bdev("journal_path", Opt_journal_path), + fsparam_flag("journal_checksum", Opt_journal_checksum), + fsparam_flag("nojournal_checksum", Opt_nojournal_checksum), + fsparam_flag("journal_async_commit", Opt_journal_async_commit), + fsparam_flag("abort", Opt_abort), + fsparam_enum("data", Opt_data, ext4_param_data), + fsparam_enum("data_err", Opt_data_err, ext4_param_data_err), - fsparam_string_empty - ("usrjquota", Opt_usrjquota), - fsparam_string_empty - ("grpjquota", Opt_grpjquota), - fsparam_enum ("jqfmt", Opt_jqfmt, ext4_param_jqfmt), - fsparam_flag ("grpquota", Opt_grpquota), - fsparam_flag ("quota", Opt_quota), - fsparam_flag ("noquota", Opt_noquota), - fsparam_flag ("usrquota", Opt_usrquota), - fsparam_flag ("prjquota", Opt_prjquota), - fsparam_flag ("barrier", Opt_barrier), - fsparam_u32 ("barrier", Opt_barrier), - fsparam_flag ("nobarrier", Opt_nobarrier), - fsparam_flag ("i_version", Opt_removed), - fsparam_flag ("dax", Opt_dax), - fsparam_enum ("dax", Opt_dax_type, ext4_param_dax), - fsparam_u32 ("stripe", Opt_stripe), - fsparam_flag ("delalloc", Opt_delalloc), - fsparam_flag ("nodelalloc", Opt_nodelalloc), - fsparam_flag ("warn_on_error", Opt_warn_on_error), - fsparam_flag ("nowarn_on_error", Opt_nowarn_on_error), - fsparam_u32 ("debug_want_extra_isize", - Opt_debug_want_extra_isize), - fsparam_flag ("mblk_io_submit", Opt_removed), - fsparam_flag ("nomblk_io_submit", Opt_removed), - fsparam_flag ("block_validity", Opt_block_validity), - fsparam_flag ("noblock_validity", Opt_noblock_validity), - fsparam_u32 ("inode_readahead_blks", - Opt_inode_readahead_blks), - fsparam_u32 ("journal_ioprio", Opt_journal_ioprio), - fsparam_u32 ("auto_da_alloc", Opt_auto_da_alloc), - fsparam_flag ("auto_da_alloc", Opt_auto_da_alloc), - fsparam_flag ("noauto_da_alloc", Opt_noauto_da_alloc), - fsparam_flag ("dioread_nolock", Opt_dioread_nolock), - fsparam_flag ("nodioread_nolock", Opt_dioread_lock), - fsparam_flag ("dioread_lock", Opt_dioread_lock), - fsparam_flag ("discard", Opt_discard), - fsparam_flag ("nodiscard", Opt_nodiscard), - fsparam_u32 ("init_itable", Opt_init_itable), - fsparam_flag ("init_itable", Opt_init_itable), - fsparam_flag ("noinit_itable", Opt_noinit_itable), + fsparam_string_empty("usrjquota", Opt_usrjquota), + fsparam_string_empty("grpjquota", Opt_grpjquota), + fsparam_enum("jqfmt", Opt_jqfmt, ext4_param_jqfmt), + fsparam_flag("grpquota", Opt_grpquota), + fsparam_flag("quota", Opt_quota), + fsparam_flag("noquota", Opt_noquota), + fsparam_flag("usrquota", Opt_usrquota), + fsparam_flag("prjquota", Opt_prjquota), + fsparam_flag("barrier", Opt_barrier), + fsparam_u32("barrier", Opt_barrier), + fsparam_flag("nobarrier", Opt_nobarrier), + fsparam_flag("i_version", Opt_removed), + fsparam_flag("dax", Opt_dax), + fsparam_enum("dax", Opt_dax_type, ext4_param_dax), + fsparam_u32("stripe", Opt_stripe), + fsparam_flag("delalloc", Opt_delalloc), + fsparam_flag("nodelalloc", Opt_nodelalloc), + fsparam_flag("warn_on_error", Opt_warn_on_error), + fsparam_flag("nowarn_on_error", Opt_nowarn_on_error), + fsparam_u32("debug_want_extra_isize", Opt_debug_want_extra_isize), + fsparam_flag("mblk_io_submit", Opt_removed), + fsparam_flag("nomblk_io_submit", Opt_removed), + fsparam_flag("block_validity", Opt_block_validity), + fsparam_flag("noblock_validity", Opt_noblock_validity), + fsparam_u32("inode_readahead_blks", Opt_inode_readahead_blks), + fsparam_u32("journal_ioprio", Opt_journal_ioprio), + fsparam_u32("auto_da_alloc", Opt_auto_da_alloc), + fsparam_flag("auto_da_alloc", Opt_auto_da_alloc), + fsparam_flag("noauto_da_alloc", Opt_noauto_da_alloc), + fsparam_flag("dioread_nolock", Opt_dioread_nolock), + fsparam_flag("nodioread_nolock", Opt_dioread_lock), + fsparam_flag("dioread_lock", Opt_dioread_lock), + fsparam_flag("discard", Opt_discard), + fsparam_flag("nodiscard", Opt_nodiscard), + fsparam_u32("init_itable", Opt_init_itable), + fsparam_flag("init_itable", Opt_init_itable), + fsparam_flag("noinit_itable", Opt_noinit_itable), #ifdef CONFIG_EXT4_DEBUG - fsparam_flag ("fc_debug_force", Opt_fc_debug_force), - fsparam_u32 ("fc_debug_max_replay", Opt_fc_debug_max_replay), + fsparam_flag("fc_debug_force", Opt_fc_debug_force), + fsparam_u32("fc_debug_max_replay", Opt_fc_debug_max_replay), #endif - fsparam_u32 ("max_dir_size_kb", Opt_max_dir_size_kb), - fsparam_flag ("test_dummy_encryption", - Opt_test_dummy_encryption), - fsparam_string ("test_dummy_encryption", - Opt_test_dummy_encryption), - fsparam_flag ("inlinecrypt", Opt_inlinecrypt), - fsparam_flag ("nombcache", Opt_nombcache), - fsparam_flag ("no_mbcache", Opt_nombcache), /* for backward compatibilit= y */ - fsparam_flag ("prefetch_block_bitmaps", - Opt_removed), - fsparam_flag ("no_prefetch_block_bitmaps", + fsparam_u32("max_dir_size_kb", Opt_max_dir_size_kb), + fsparam_flag("test_dummy_encryption", Opt_test_dummy_encryption), + fsparam_string("test_dummy_encryption", Opt_test_dummy_encryption), + fsparam_flag("inlinecrypt", Opt_inlinecrypt), + fsparam_flag("nombcache", Opt_nombcache), + fsparam_flag("no_mbcache", Opt_nombcache), /* for backward compatibility= */ + fsparam_flag("prefetch_block_bitmaps", Opt_removed), + fsparam_flag("no_prefetch_block_bitmaps", Opt_no_prefetch_block_bitmaps), - fsparam_s32 ("mb_optimize_scan", Opt_mb_optimize_scan), - fsparam_string ("check", Opt_removed), /* mount option from ext2/3 */ - fsparam_flag ("nocheck", Opt_removed), /* mount option from ext2/3 */ - fsparam_flag ("reservation", Opt_removed), /* mount option from ext2/3 */ - fsparam_flag ("noreservation", Opt_removed), /* mount option from ext2/3 = */ - fsparam_u32 ("journal", Opt_removed), /* mount option from ext2/3 */ + fsparam_s32("mb_optimize_scan", Opt_mb_optimize_scan), + fsparam_string("check", Opt_removed), /* mount option from ext2/3 */ + fsparam_flag("nocheck", Opt_removed), /* mount option from ext2/3 */ + fsparam_flag("reservation", Opt_removed), /* mount option from ext2/3 */ + fsparam_flag("noreservation", Opt_removed), /* mount option from ext2/3 = */ + fsparam_u32("journal", Opt_removed), /* mount option from ext2/3 */ {} }; =20 --=20 2.46.1 From nobody Tue Nov 26 03:29:49 2024 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5570E200109; Tue, 22 Oct 2024 03:13:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566791; cv=none; b=RcRw68I1MAid/0YhzUg9G12DApj7fBrxGxqxAUT9ZbXZFq4B4FmoyJLQTzwgqF3wUD2zrkxmJ+rOnY9hyCHqdHVdru+9kvsnDKpfsZDDOAfB9w0mC/mgRoOrPBq6sIdqwO21NUD5Vq3RJvZb9vN+yZmG3CBrELCd0J5CQ0l9PUw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729566791; c=relaxed/simple; bh=OAzCCl3m2q23m1U+qcLkhExvUHjt8Ff1nVP57Z12V6I=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ISr+Zrdmrj/8Fcl4/paJCjDmHkYROINytjIcGVczotiiYIuVgDxxxFMMu9w+7sTG0yLRUUnK/F/81M2z+StNahAdRYW/WVwYDnow8hW4CrJLqCgGhdpkWfGAbymqfqlyc97KJliDmQWe2Zdk+gB41hLQ1MglNKbIKOHvhGNPqv4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=none smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4XXcgN1lZBz4f3lVx; Tue, 22 Oct 2024 11:12:48 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 67C5D1A058E; Tue, 22 Oct 2024 11:13:06 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.112.188]) by APP4 (Coremail) with SMTP id gCh0CgCXysYlGBdnPSwWEw--.716S31; Tue, 22 Oct 2024 11:13:06 +0800 (CST) From: Zhang Yi To: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, hch@infradead.org, djwong@kernel.org, david@fromorbit.com, zokeefe@google.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com, yangerkun@huawei.com Subject: [PATCH 27/27] ext4: introduce a mount option for iomap buffered I/O path Date: Tue, 22 Oct 2024 19:10:58 +0800 Message-ID: <20241022111059.2566137-28-yi.zhang@huaweicloud.com> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> References: <20241022111059.2566137-1-yi.zhang@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgCXysYlGBdnPSwWEw--.716S31 X-Coremail-Antispam: 1UD129KBjvJXoWxZrW3WFyrtryxWF4xKw1rWFg_yoW5Cr18p3 sYkFy8Gr1kZr1F93y8CF48Wr1FkF4Yka1UuFWYgw17WFZrAryxXFyfKF1rCF42qrW8XryI gF1rWw17WF43CrDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUQm14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JF0E3s1l82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0 rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6x IIjxv20xvEc7CjxVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vE x4A2jsIEc7CjxVAFwI0_GcCE3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2 IEw4CE5I8CrVC2j2WlYx0E2Ix0cI8IcVAFwI0_JrI_JrylYx0Ex4A2jsIE14v26r1j6r4U McvjeVCFs4IE7xkEbVWUJVW8JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I64 8v4I1lFIxGxcIEc7CjxVA2Y2ka0xkIwI1lc7CjxVAaw2AFwI0_Jw0_GFyl42xK82IYc2Ij 64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x 8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r4a6rW5MIIYrxkI7VAKI48JMIIF0xvE 2Ix0cI8IcVAFwI0_Xr0_Ar1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1lIxAIcV CF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r4j6F4UMIIF0xvEx4A2jsIE c7CjxVAFwI0_Gr1j6F4UJbIYCTnIWIevJa73UjIFyTuYvjTRio7uDUUUU X-CM-SenderInfo: d1lo6xhdqjqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Zhang Yi Introduce the buffered_iomap and the nobuffered_iomap mount options to enable the iomap buffered I/O path for regular files. This option is currently disabled by default until we can support more comprehensive features along this path. Signed-off-by: Zhang Yi --- fs/ext4/ext4.h | 1 + fs/ext4/inode.c | 2 ++ fs/ext4/super.c | 7 +++++++ 3 files changed, 10 insertions(+) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 0096191b454c..c2a44530e026 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1257,6 +1257,7 @@ struct ext4_inode_info { * scanning in mballoc */ #define EXT4_MOUNT2_ABORT 0x00000100 /* Abort filesystem */ +#define EXT4_MOUNT2_BUFFERED_IOMAP 0x00000200 /* Use iomap for buffered IO= */ =20 #define clear_opt(sb, opt) EXT4_SB(sb)->s_mount_opt &=3D \ ~EXT4_MOUNT_##opt diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 97abc88e6658..b6e041a423f9 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5153,6 +5153,8 @@ bool ext4_should_use_buffered_iomap(struct inode *ino= de) { struct super_block *sb =3D inode->i_sb; =20 + if (!test_opt2(sb, BUFFERED_IOMAP)) + return false; if (ext4_has_feature_inline_data(sb)) return false; if (ext4_has_feature_verity(sb)) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 89955081c4fe..435a866359d9 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1675,6 +1675,7 @@ enum { Opt_discard, Opt_nodiscard, Opt_init_itable, Opt_noinit_itable, Opt_max_dir_size_kb, Opt_nojournal_checksum, Opt_nombcache, Opt_no_prefetch_block_bitmaps, Opt_mb_optimize_scan, + Opt_buffered_iomap, Opt_nobuffered_iomap, Opt_errors, Opt_data, Opt_data_err, Opt_jqfmt, Opt_dax_type, #ifdef CONFIG_EXT4_DEBUG Opt_fc_debug_max_replay, Opt_fc_debug_force @@ -1806,6 +1807,8 @@ static const struct fs_parameter_spec ext4_param_spec= s[] =3D { fsparam_flag("no_prefetch_block_bitmaps", Opt_no_prefetch_block_bitmaps), fsparam_s32("mb_optimize_scan", Opt_mb_optimize_scan), + fsparam_flag("buffered_iomap", Opt_buffered_iomap), + fsparam_flag("nobuffered_iomap", Opt_nobuffered_iomap), fsparam_string("check", Opt_removed), /* mount option from ext2/3 */ fsparam_flag("nocheck", Opt_removed), /* mount option from ext2/3 */ fsparam_flag("reservation", Opt_removed), /* mount option from ext2/3 */ @@ -1900,6 +1903,10 @@ static const struct mount_opts { {Opt_nombcache, EXT4_MOUNT_NO_MBCACHE, MOPT_SET}, {Opt_no_prefetch_block_bitmaps, EXT4_MOUNT_NO_PREFETCH_BLOCK_BITMAPS, MOPT_SET}, + {Opt_buffered_iomap, EXT4_MOUNT2_BUFFERED_IOMAP, + MOPT_SET | MOPT_2 | MOPT_EXT4_ONLY}, + {Opt_nobuffered_iomap, EXT4_MOUNT2_BUFFERED_IOMAP, + MOPT_CLEAR | MOPT_2 | MOPT_EXT4_ONLY}, #ifdef CONFIG_EXT4_DEBUG {Opt_fc_debug_force, EXT4_MOUNT2_JOURNAL_FAST_COMMIT, MOPT_SET | MOPT_2 | MOPT_EXT4_ONLY}, --=20 2.46.1