From nobody Sat Feb 7 23:24:23 2026
From: Ojaswin Mujoo
To: Christian Brauner, djwong@kernel.org, ritesh.list@gmail.com, john.g.garry@oracle.com, tytso@mit.edu, willy@infradead.org, dchinner@redhat.com, hch@lst.de
Cc: linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, jack@suse.cz, nilay@linux.ibm.com, martin.petersen@oracle.com, rostedt@goodmis.org, axboe@kernel.dk, linux-block@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [RFC PATCH 1/8] fs: Rename STATX{_ATTR}_WRITE_ATOMIC -> STATX{_ATTR}_WRITE_ATOMIC_DIO
Date: Wed, 12 Nov 2025 16:36:04 +0530
From: John Garry

This is in preparation for adding atomic write support for buffered IO.
Since the limits reported by the FS for atomic write buffered IO could
differ from those for direct IO, rename STATX_WRITE_ATOMIC ->
STATX_WRITE_ATOMIC_DIO and STATX_ATTR_WRITE_ATOMIC ->
STATX_ATTR_WRITE_ATOMIC_DIO, to make it clear that they are only
relevant to direct IO. Later we will add a separate flag for reporting
atomic write support with buffered IO.

Co-developed-by: Ojaswin Mujoo
Signed-off-by: Ojaswin Mujoo
Signed-off-by: John Garry
---
 Documentation/filesystems/ext4/atomic_writes.rst  | 4 ++--
 block/bdev.c                                      | 4 ++--
 fs/ext4/inode.c                                   | 2 +-
 fs/stat.c                                         | 8 ++++----
 fs/xfs/xfs_iops.c                                 | 2 +-
 include/trace/misc/fs.h                           | 2 +-
 include/uapi/linux/stat.h                         | 8 ++++++--
 tools/include/uapi/linux/stat.h                   | 8 ++++++--
 tools/perf/trace/beauty/include/uapi/linux/stat.h | 8 ++++++--
 9 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/Documentation/filesystems/ext4/atomic_writes.rst b/Documentation/filesystems/ext4/atomic_writes.rst
index ae8995740aa8..108c9e9cb977 100644
--- a/Documentation/filesystems/ext4/atomic_writes.rst
+++ b/Documentation/filesystems/ext4/atomic_writes.rst
@@ -189,7 +189,7 @@ The write must be aligned to the filesystem's block size and not exceed the
 filesystem's maximum atomic write unit size.
 See ``generic_atomic_write_valid()`` for more details.
 
-``statx()`` system call with ``STATX_WRITE_ATOMIC`` flag can provide following
+``statx()`` system call with ``STATX_WRITE_ATOMIC_DIO`` flag can provide following
 details:
 
 * ``stx_atomic_write_unit_min``: Minimum size of an atomic write request.
@@ -198,7 +198,7 @@ details:
   separate memory buffers that can be gathered into a write operation
   (e.g., the iovcnt parameter for IOV_ITER). Currently, this is always set to one.
 
-The STATX_ATTR_WRITE_ATOMIC flag in ``statx->attributes`` is set if atomic
+The STATX_ATTR_WRITE_ATOMIC_DIO flag in ``statx->attributes`` is set if atomic
 writes are supported.
 
 .. _atomic_write_bdev_support:

diff --git a/block/bdev.c b/block/bdev.c
index 810707cca970..3bc90d5feb4c 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -1308,7 +1308,7 @@ void sync_bdevs(bool wait)
 }
 
 /*
- * Handle STATX_{DIOALIGN, WRITE_ATOMIC} for block devices.
+ * Handle STATX_{DIOALIGN, WRITE_ATOMIC_DIO} for block devices.
 */
 void bdev_statx(const struct path *path, struct kstat *stat, u32 request_mask)
 {
@@ -1330,7 +1330,7 @@ void bdev_statx(const struct path *path, struct kstat *stat, u32 request_mask)
 		stat->result_mask |= STATX_DIOALIGN;
 	}
 
-	if (request_mask & STATX_WRITE_ATOMIC && bdev_can_atomic_write(bdev)) {
+	if (request_mask & STATX_WRITE_ATOMIC_DIO && bdev_can_atomic_write(bdev)) {
 		struct request_queue *bd_queue = bdev->bd_queue;
 
 		generic_fill_statx_atomic_writes(stat,

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f9e4ac87211e..9555149a8ba6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -6097,7 +6097,7 @@ int ext4_getattr(struct mnt_idmap *idmap, const struct path *path,
 		}
 	}
 
-	if ((request_mask & STATX_WRITE_ATOMIC) && S_ISREG(inode->i_mode)) {
+	if ((request_mask & STATX_WRITE_ATOMIC_DIO) && S_ISREG(inode->i_mode)) {
 		struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 		unsigned int awu_min = 0, awu_max = 0;
 

diff --git a/fs/stat.c b/fs/stat.c
index 6c79661e1b96..7eb2a247ab67 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -138,7 +138,7 @@ EXPORT_SYMBOL(generic_fill_statx_attr);
 * @unit_max: Maximum supported atomic write length in bytes
 * @unit_max_opt: Optimised maximum supported atomic write length in bytes
 *
- * Fill in the STATX{_ATTR}_WRITE_ATOMIC flags in the kstat structure from
+ * Fill in the STATX{_ATTR}_WRITE_ATOMIC_DIO flags in the kstat structure from
 * atomic write unit_min and unit_max values.
 */
 void generic_fill_statx_atomic_writes(struct kstat *stat,
@@ -147,10 +147,10 @@ void generic_fill_statx_atomic_writes(struct kstat *stat,
 				      unsigned int unit_max_opt)
 {
 	/* Confirm that the request type is known */
-	stat->result_mask |= STATX_WRITE_ATOMIC;
+	stat->result_mask |= STATX_WRITE_ATOMIC_DIO;
 
 	/* Confirm that the file attribute type is known */
-	stat->attributes_mask |= STATX_ATTR_WRITE_ATOMIC;
+	stat->attributes_mask |= STATX_ATTR_WRITE_ATOMIC_DIO;
 
 	if (unit_min) {
 		stat->atomic_write_unit_min = unit_min;
@@ -160,7 +160,7 @@ void generic_fill_statx_atomic_writes(struct kstat *stat,
 		stat->atomic_write_segments_max = 1;
 
 		/* Confirm atomic writes are actually supported */
-		stat->attributes |= STATX_ATTR_WRITE_ATOMIC;
+		stat->attributes |= STATX_ATTR_WRITE_ATOMIC_DIO;
 	}
 }
 EXPORT_SYMBOL_GPL(generic_fill_statx_atomic_writes);

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index caff0125faea..f41fcdd3043b 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -741,7 +741,7 @@ xfs_vn_getattr(
 	case S_IFREG:
 		if (request_mask & (STATX_DIOALIGN | STATX_DIO_READ_ALIGN))
 			xfs_report_dioalign(ip, stat);
-		if (request_mask & STATX_WRITE_ATOMIC)
+		if (request_mask & STATX_WRITE_ATOMIC_DIO)
 			xfs_report_atomic_write(ip, stat);
 		fallthrough;
 	default:

diff --git a/include/trace/misc/fs.h b/include/trace/misc/fs.h
index 7ead1c61f0cb..19ea9339b9bd 100644
--- a/include/trace/misc/fs.h
+++ b/include/trace/misc/fs.h
@@ -161,5 +161,5 @@
 	{ STATX_DIOALIGN, "DIOALIGN" }, \
 	{ STATX_MNT_ID_UNIQUE, "MNT_ID_UNIQUE" }, \
 	{ STATX_SUBVOL, "SUBVOL" }, \
-	{ STATX_WRITE_ATOMIC, "WRITE_ATOMIC" }, \
+	{ STATX_WRITE_ATOMIC_DIO, "WRITE_ATOMIC_DIO" }, \
 	{ STATX_DIO_READ_ALIGN, "DIO_READ_ALIGN" })

diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 1686861aae20..57f558be933e 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -217,7 +217,9 @@ struct statx {
 #define STATX_DIOALIGN		0x00002000U	/* Want/got direct I/O alignment info */
 #define STATX_MNT_ID_UNIQUE	0x00004000U	/* Want/got extended stx_mount_id */
 #define STATX_SUBVOL		0x00008000U	/* Want/got stx_subvol */
-#define STATX_WRITE_ATOMIC	0x00010000U	/* Want/got atomic_write_* fields */
+#define STATX_WRITE_ATOMIC_DIO	0x00010000U	/* Want/got dio atomic_write_* fields */
+/* Old name kept for backward compatibility */
+#define STATX_WRITE_ATOMIC	STATX_WRITE_ATOMIC_DIO
 #define STATX_DIO_READ_ALIGN	0x00020000U	/* Want/got dio read alignment info */
 
 #define STATX__RESERVED		0x80000000U	/* Reserved for future struct statx expansion */
@@ -254,7 +256,9 @@ struct statx {
 #define STATX_ATTR_MOUNT_ROOT	0x00002000	/* Root of a mount */
 #define STATX_ATTR_VERITY	0x00100000	/* [I] Verity protected file */
 #define STATX_ATTR_DAX		0x00200000	/* File is currently in DAX state */
-#define STATX_ATTR_WRITE_ATOMIC	0x00400000	/* File supports atomic write operations */
+#define STATX_ATTR_WRITE_ATOMIC_DIO	0x00400000	/* File supports dio atomic write operations */
+/* Old name kept for backward compatibility */
+#define STATX_ATTR_WRITE_ATOMIC	STATX_ATTR_WRITE_ATOMIC_DIO
 
 
 #endif /* _UAPI_LINUX_STAT_H */

diff --git a/tools/include/uapi/linux/stat.h b/tools/include/uapi/linux/stat.h
index 1686861aae20..57f558be933e 100644
--- a/tools/include/uapi/linux/stat.h
+++ b/tools/include/uapi/linux/stat.h
@@ -217,7 +217,9 @@ struct statx {
 #define STATX_DIOALIGN		0x00002000U	/* Want/got direct I/O alignment info */
 #define STATX_MNT_ID_UNIQUE	0x00004000U	/* Want/got extended stx_mount_id */
 #define STATX_SUBVOL		0x00008000U	/* Want/got stx_subvol */
-#define STATX_WRITE_ATOMIC	0x00010000U	/* Want/got atomic_write_* fields */
+#define STATX_WRITE_ATOMIC_DIO	0x00010000U	/* Want/got dio atomic_write_* fields */
+/* Old name kept for backward compatibility */
+#define STATX_WRITE_ATOMIC	STATX_WRITE_ATOMIC_DIO
 #define STATX_DIO_READ_ALIGN	0x00020000U	/* Want/got dio read alignment info */
 
 #define STATX__RESERVED		0x80000000U	/* Reserved for future struct statx expansion */
@@ -254,7 +256,9 @@ struct statx {
 #define STATX_ATTR_MOUNT_ROOT	0x00002000	/* Root of a mount */
 #define STATX_ATTR_VERITY	0x00100000	/* [I] Verity protected file */
 #define STATX_ATTR_DAX		0x00200000	/* File is currently in DAX state */
-#define STATX_ATTR_WRITE_ATOMIC	0x00400000	/* File supports atomic write operations */
+#define STATX_ATTR_WRITE_ATOMIC_DIO	0x00400000	/* File supports dio atomic write operations */
+/* Old name kept for backward compatibility */
+#define STATX_ATTR_WRITE_ATOMIC	STATX_ATTR_WRITE_ATOMIC_DIO
 
 
 #endif /* _UAPI_LINUX_STAT_H */

diff --git a/tools/perf/trace/beauty/include/uapi/linux/stat.h b/tools/perf/trace/beauty/include/uapi/linux/stat.h
index 1686861aae20..57f558be933e 100644
--- a/tools/perf/trace/beauty/include/uapi/linux/stat.h
+++ b/tools/perf/trace/beauty/include/uapi/linux/stat.h
@@ -217,7 +217,9 @@ struct statx {
 #define STATX_DIOALIGN		0x00002000U	/* Want/got direct I/O alignment info */
 #define STATX_MNT_ID_UNIQUE	0x00004000U	/* Want/got extended stx_mount_id */
 #define STATX_SUBVOL		0x00008000U	/* Want/got stx_subvol */
-#define STATX_WRITE_ATOMIC	0x00010000U	/* Want/got atomic_write_* fields */
+#define STATX_WRITE_ATOMIC_DIO	0x00010000U	/* Want/got dio atomic_write_* fields */
+/* Old name kept for backward compatibility */
+#define STATX_WRITE_ATOMIC	STATX_WRITE_ATOMIC_DIO
 #define STATX_DIO_READ_ALIGN	0x00020000U	/* Want/got dio read alignment info */
 
 #define STATX__RESERVED		0x80000000U	/* Reserved for future struct statx expansion */
@@ -254,7 +256,9 @@ struct statx {
 #define STATX_ATTR_MOUNT_ROOT	0x00002000	/* Root of a mount */
 #define STATX_ATTR_VERITY	0x00100000	/* [I] Verity protected file */
 #define STATX_ATTR_DAX		0x00200000	/* File is currently in DAX state */
-#define STATX_ATTR_WRITE_ATOMIC	0x00400000	/* File supports atomic write operations */
+#define STATX_ATTR_WRITE_ATOMIC_DIO	0x00400000	/* File supports dio atomic write operations */
+/* Old name kept for backward compatibility */
+#define STATX_ATTR_WRITE_ATOMIC	STATX_ATTR_WRITE_ATOMIC_DIO
 
 
 #endif /* _UAPI_LINUX_STAT_H */
-- 
2.51.0

From nobody Sat Feb 7 23:24:23 2026
From: Ojaswin Mujoo
To: Christian Brauner, djwong@kernel.org, ritesh.list@gmail.com, john.g.garry@oracle.com, tytso@mit.edu, willy@infradead.org, dchinner@redhat.com, hch@lst.de
Cc: linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, jack@suse.cz, nilay@linux.ibm.com, martin.petersen@oracle.com, rostedt@goodmis.org, axboe@kernel.dk, linux-block@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [RFC PATCH 2/8] mm: Add PG_atomic
Date: Wed, 12 Nov 2025 16:36:05 +0530
Message-ID: <5f0a7c62a3c787f2011ada10abe3826a94f99e17.1762945505.git.ojaswin@linux.ibm.com>
From: John Garry

Add page flag PG_atomic, meaning that a folio needs to be written back
atomically. This will be used for handling RWF_ATOMIC buffered IO in
upcoming patches.

Co-developed-by: Ojaswin Mujoo
Signed-off-by: Ojaswin Mujoo
Signed-off-by: John Garry
---
 include/linux/page-flags.h     | 5 +++++
 include/trace/events/mmflags.h | 3 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0091ad1986bf..bdce0f58a77a 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -111,6 +111,7 @@ enum pageflags {
 	PG_swapbacked,		/* Page is backed by RAM/swap */
 	PG_unevictable,		/* Page is "unevictable" */
 	PG_dropbehind,		/* drop pages on IO completion */
+	PG_atomic,		/* Page is marked atomic for buffered atomic writes */
 #ifdef CONFIG_MMU
 	PG_mlocked,		/* Page is vma mlocked */
 #endif
@@ -644,6 +645,10 @@ FOLIO_FLAG(unevictable, FOLIO_HEAD_PAGE)
 	__FOLIO_CLEAR_FLAG(unevictable, FOLIO_HEAD_PAGE)
 	FOLIO_TEST_CLEAR_FLAG(unevictable, FOLIO_HEAD_PAGE)
 
+FOLIO_FLAG(atomic, FOLIO_HEAD_PAGE)
+	__FOLIO_CLEAR_FLAG(atomic, FOLIO_HEAD_PAGE)
+	FOLIO_TEST_CLEAR_FLAG(atomic, FOLIO_HEAD_PAGE)
+
 #ifdef CONFIG_MMU
 FOLIO_FLAG(mlocked, FOLIO_HEAD_PAGE)
 	__FOLIO_CLEAR_FLAG(mlocked, FOLIO_HEAD_PAGE)

diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index aa441f593e9a..a8294f6146a5 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -159,7 +159,8 @@ TRACE_DEFINE_ENUM(___GFP_LAST_BIT);
 	DEF_PAGEFLAG_NAME(reclaim),		\
 	DEF_PAGEFLAG_NAME(swapbacked),		\
 	DEF_PAGEFLAG_NAME(unevictable),		\
-	DEF_PAGEFLAG_NAME(dropbehind)		\
+	DEF_PAGEFLAG_NAME(dropbehind),		\
+	DEF_PAGEFLAG_NAME(atomic)		\
 IF_HAVE_PG_MLOCK(mlocked)			\
 IF_HAVE_PG_HWPOISON(hwpoison)			\
 IF_HAVE_PG_IDLE(idle)				\
-- 
2.51.0

From nobody Sat Feb 7 23:24:23 2026
From: Ojaswin Mujoo
To: Christian Brauner, djwong@kernel.org, ritesh.list@gmail.com, john.g.garry@oracle.com, tytso@mit.edu, willy@infradead.org, dchinner@redhat.com, hch@lst.de
Cc: linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, jack@suse.cz, nilay@linux.ibm.com, martin.petersen@oracle.com, rostedt@goodmis.org, axboe@kernel.dk, linux-block@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [RFC PATCH 3/8] fs: Add initial buffered atomic write support info to statx
Date: Wed, 12 Nov 2025 16:36:06 +0530
Extend the statx system call to return additional info about buffered
atomic write support for a file. Currently only direct IO is supported.

New flags STATX_WRITE_ATOMIC_BUF and STATX_ATTR_WRITE_ATOMIC_BUF
indicate whether the file supports buffered atomic writes. The statx
structure members stx_atomic_write_unit_{min, max, segments_max} will be
reused for buffered atomic writes.

Flags STATX_WRITE_ATOMIC_DIO and STATX_WRITE_ATOMIC_BUF are mutually
exclusive. With both flags set, statx will ignore the request and
neither flag in statx.result_mask will be set.

Also, make sure ext4 and xfs report an atomic write unit min and max of
0 when the new flag is passed.
Co-developed-by: John Garry Signed-off-by: John Garry Signed-off-by: Ojaswin Mujoo --- block/bdev.c | 3 +- fs/ext4/inode.c | 7 +- fs/stat.c | 33 +++-- fs/xfs/xfs_file.c | 9 +- fs/xfs/xfs_iops.c | 121 ++++++++++-------- fs/xfs/xfs_iops.h | 6 +- include/linux/fs.h | 3 +- include/trace/misc/fs.h | 1 + include/uapi/linux/stat.h | 2 + tools/include/uapi/linux/stat.h | 2 + .../trace/beauty/include/uapi/linux/stat.h | 2 + 11 files changed, 119 insertions(+), 70 deletions(-) diff --git a/block/bdev.c b/block/bdev.c index 3bc90d5feb4c..8f0eab0a1ecf 100644 --- a/block/bdev.c +++ b/block/bdev.c @@ -1335,8 +1335,7 @@ void bdev_statx(const struct path *path, struct kstat= *stat, u32 request_mask) =20 generic_fill_statx_atomic_writes(stat, queue_atomic_write_unit_min_bytes(bd_queue), - queue_atomic_write_unit_max_bytes(bd_queue), - 0); + queue_atomic_write_unit_max_bytes(bd_queue), 0, true); } =20 stat->blksize =3D bdev_io_min(bdev); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 9555149a8ba6..0d5013993fba 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -6106,8 +6106,11 @@ int ext4_getattr(struct mnt_idmap *idmap, const stru= ct path *path, awu_max =3D sbi->s_awu_max; } =20 - generic_fill_statx_atomic_writes(stat, awu_min, awu_max, 0); - } + generic_fill_statx_atomic_writes(stat, awu_min, awu_max, 0, + true); + } else if (request_mask & STATX_WRITE_ATOMIC_BUF) + /* Atomic writes for buferred IO not supported yet */ + generic_fill_statx_atomic_writes(stat, 0, 0, 0, false); =20 flags =3D ei->i_flags & EXT4_FL_USER_VISIBLE; if (flags & EXT4_APPEND_FL) diff --git a/fs/stat.c b/fs/stat.c index 7eb2a247ab67..8ba3993dcd09 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -137,20 +137,27 @@ EXPORT_SYMBOL(generic_fill_statx_attr); * @unit_min: Minimum supported atomic write length in bytes * @unit_max: Maximum supported atomic write length in bytes * @unit_max_opt: Optimised maximum supported atomic write length in bytes + * @is_dio: Is the stat request for dio * - * Fill in the 
STATX{_ATTR}_WRITE_ATOMIC_DIO flags in the kstat structure = from - * atomic write unit_min and unit_max values. + * Fill in the STATX{_ATTR}_WRITE_ATOMIC_{DIO,BUF} flags in the kstat stru= cture + * from atomic write unit_min and unit_max values. */ void generic_fill_statx_atomic_writes(struct kstat *stat, unsigned int unit_min, unsigned int unit_max, - unsigned int unit_max_opt) + unsigned int unit_max_opt, + bool is_dio) { - /* Confirm that the request type is known */ - stat->result_mask |=3D STATX_WRITE_ATOMIC_DIO; + if (is_dio) { + /* Confirm that the request type is known */ + stat->result_mask |=3D STATX_WRITE_ATOMIC_DIO; =20 - /* Confirm that the file attribute type is known */ - stat->attributes_mask |=3D STATX_ATTR_WRITE_ATOMIC_DIO; + /* Confirm that the file attribute type is known */ + stat->attributes_mask |=3D STATX_ATTR_WRITE_ATOMIC_DIO; + } else { + stat->result_mask |=3D STATX_WRITE_ATOMIC_BUF; + stat->attributes_mask |=3D STATX_ATTR_WRITE_ATOMIC_BUF; + } =20 if (unit_min) { stat->atomic_write_unit_min =3D unit_min; @@ -160,7 +167,10 @@ void generic_fill_statx_atomic_writes(struct kstat *st= at, stat->atomic_write_segments_max =3D 1; =20 /* Confirm atomic writes are actually supported */ - stat->attributes |=3D STATX_ATTR_WRITE_ATOMIC_DIO; + if (is_dio) + stat->attributes |=3D STATX_ATTR_WRITE_ATOMIC_DIO; + else + stat->attributes |=3D STATX_ATTR_WRITE_ATOMIC_BUF; } } EXPORT_SYMBOL_GPL(generic_fill_statx_atomic_writes); @@ -206,6 +216,13 @@ int vfs_getattr_nosec(const struct path *path, struct = kstat *stat, stat->attributes_mask |=3D (STATX_ATTR_AUTOMOUNT | STATX_ATTR_DAX); =20 + if (request_mask & STATX_WRITE_ATOMIC_BUF && + request_mask & STATX_WRITE_ATOMIC_DIO) { + /* Both are mutually exclusive, disable them */ + request_mask &=3D + ~(STATX_WRITE_ATOMIC_BUF | STATX_WRITE_ATOMIC_DIO); + } + idmap =3D mnt_idmap(path->mnt); if (inode->i_op->getattr) { int ret; diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 5b9864c8582e..3efa575570ed 
100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -1087,6 +1087,7 @@ xfs_file_write_iter( struct xfs_inode *ip =3D XFS_I(inode); ssize_t ret; size_t ocount =3D iov_iter_count(from); + bool is_dio =3D iocb->ki_flags & IOCB_DIRECT; =20 XFS_STATS_INC(ip->i_mount, xs_write_calls); =20 @@ -1097,10 +1098,10 @@ xfs_file_write_iter( return -EIO; =20 if (iocb->ki_flags & IOCB_ATOMIC) { - if (ocount < xfs_get_atomic_write_min(ip)) + if (ocount < xfs_get_atomic_write_min(ip, is_dio)) return -EINVAL; =20 - if (ocount > xfs_get_atomic_write_max(ip)) + if (ocount > xfs_get_atomic_write_max(ip, is_dio)) return -EINVAL; =20 ret =3D generic_atomic_write_valid(iocb, from); @@ -1111,7 +1112,7 @@ xfs_file_write_iter( if (IS_DAX(inode)) return xfs_file_dax_write(iocb, from); =20 - if (iocb->ki_flags & IOCB_DIRECT) { + if (is_dio) { /* * Allow a directio write to fall back to a buffered * write *only* in the case that we're doing a reflink @@ -1568,7 +1569,7 @@ xfs_file_open( if (xfs_is_shutdown(XFS_M(inode->i_sb))) return -EIO; file->f_mode |=3D FMODE_NOWAIT | FMODE_CAN_ODIRECT; - if (xfs_get_atomic_write_min(XFS_I(inode)) > 0) + if (xfs_get_atomic_write_min(XFS_I(inode), file->f_flags & O_DIRECT) > 0) file->f_mode |=3D FMODE_CAN_ATOMIC_WRITE; return generic_file_open(inode, file); } diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index f41fcdd3043b..f036c46b19c5 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -601,81 +601,99 @@ xfs_report_dioalign( =20 unsigned int xfs_get_atomic_write_min( - struct xfs_inode *ip) + struct xfs_inode *ip, + bool is_dio) { - struct xfs_mount *mp =3D ip->i_mount; + if (is_dio) { + struct xfs_mount *mp =3D ip->i_mount; =20 - /* - * If we can complete an atomic write via atomic out of place writes, - * then advertise a minimum size of one fsblock. Without this - * mechanism, we can only guarantee atomic writes up to a single LBA. 
- * - * If out of place writes are not available, we can guarantee an atomic - * write of exactly one single fsblock if the bdev will make that - * guarantee for us. - */ - if (xfs_inode_can_hw_atomic_write(ip) || - xfs_inode_can_sw_atomic_write(ip)) - return mp->m_sb.sb_blocksize; + /* + * If we can complete an atomic write via atomic out of place writes, + * then advertise a minimum size of one fsblock. Without this + * mechanism, we can only guarantee atomic writes up to a single LBA. + * + * If out of place writes are not available, we can guarantee an atomic + * write of exactly one single fsblock if the bdev will make that + * guarantee for us. + */ + if (xfs_inode_can_hw_atomic_write(ip) || + xfs_inode_can_sw_atomic_write(ip)) + return mp->m_sb.sb_blocksize; + } =20 + /* buffered IO not supported yet so return 0 right away */ return 0; } =20 unsigned int xfs_get_atomic_write_max( - struct xfs_inode *ip) + struct xfs_inode *ip, + bool is_dio) { struct xfs_mount *mp =3D ip->i_mount; =20 - /* - * If out of place writes are not available, we can guarantee an atomic - * write of exactly one single fsblock if the bdev will make that - * guarantee for us. - */ - if (!xfs_inode_can_sw_atomic_write(ip)) { - if (xfs_inode_can_hw_atomic_write(ip)) - return mp->m_sb.sb_blocksize; - return 0; + if (is_dio) { + /* + * If out of place writes are not available, we can guarantee an atomic + * write of exactly one single fsblock if the bdev will make that + * guarantee for us. + */ + if (!xfs_inode_can_sw_atomic_write(ip)) { + if (xfs_inode_can_hw_atomic_write(ip)) + return mp->m_sb.sb_blocksize; + return 0; + } + + /* + * If we can complete an atomic write via atomic out of place writes, + * then advertise a maximum size of whatever we can complete through + * that means. Hardware support is reported via max_opt, not here. 
+ */ + if (XFS_IS_REALTIME_INODE(ip)) + return XFS_FSB_TO_B(mp, mp->m_groups[XG_TYPE_RTG].awu_max); + return XFS_FSB_TO_B(mp, mp->m_groups[XG_TYPE_AG].awu_max); } =20 - /* - * If we can complete an atomic write via atomic out of place writes, - * then advertise a maximum size of whatever we can complete through - * that means. Hardware support is reported via max_opt, not here. - */ - if (XFS_IS_REALTIME_INODE(ip)) - return XFS_FSB_TO_B(mp, mp->m_groups[XG_TYPE_RTG].awu_max); - return XFS_FSB_TO_B(mp, mp->m_groups[XG_TYPE_AG].awu_max); + /* buffered IO not supported yet so return 0 right away */ + return 0; } =20 unsigned int xfs_get_atomic_write_max_opt( - struct xfs_inode *ip) + struct xfs_inode *ip, + bool is_dio) { - unsigned int awu_max =3D xfs_get_atomic_write_max(ip); + if (is_dio) { + unsigned int awu_max =3D xfs_get_atomic_write_max(ip, is_dio); =20 - /* if the max is 1x block, then just keep behaviour that opt is 0 */ - if (awu_max <=3D ip->i_mount->m_sb.sb_blocksize) - return 0; + /* if the max is 1x block, then just keep behaviour that opt is 0 */ + if (awu_max <=3D ip->i_mount->m_sb.sb_blocksize) + return 0; =20 - /* - * Advertise the maximum size of an atomic write that we can tell the - * block device to perform for us. In general the bdev limit will be - * less than our out of place write limit, but we don't want to exceed - * the awu_max. - */ - return min(awu_max, xfs_inode_buftarg(ip)->bt_awu_max); + /* + * Advertise the maximum size of an atomic write that we can tell the + * block device to perform for us. In general the bdev limit will be + * less than our out of place write limit, but we don't want to exceed + * the awu_max. 
+ */ + return min(awu_max, xfs_inode_buftarg(ip)->bt_awu_max); + } + + /* buffered IO not supported yet so return 0 right away */ + return 0; } =20 static void xfs_report_atomic_write( struct xfs_inode *ip, - struct kstat *stat) + struct kstat *stat, + bool is_dio) { generic_fill_statx_atomic_writes(stat, - xfs_get_atomic_write_min(ip), - xfs_get_atomic_write_max(ip), - xfs_get_atomic_write_max_opt(ip)); + xfs_get_atomic_write_min(ip, is_dio), + xfs_get_atomic_write_max(ip, is_dio), + xfs_get_atomic_write_max_opt(ip, is_dio), + is_dio); } =20 STATIC int @@ -741,8 +759,11 @@ xfs_vn_getattr( case S_IFREG: if (request_mask & (STATX_DIOALIGN | STATX_DIO_READ_ALIGN)) xfs_report_dioalign(ip, stat); - if (request_mask & STATX_WRITE_ATOMIC_DIO) - xfs_report_atomic_write(ip, stat); + if (request_mask & + (STATX_WRITE_ATOMIC_DIO | STATX_WRITE_ATOMIC_BUF)) + xfs_report_atomic_write(ip, stat, + (request_mask & + STATX_WRITE_ATOMIC_DIO)); fallthrough; default: stat->blksize =3D xfs_stat_blksize(ip); diff --git a/fs/xfs/xfs_iops.h b/fs/xfs/xfs_iops.h index 0896f6b8b3b8..09e79263add1 100644 --- a/fs/xfs/xfs_iops.h +++ b/fs/xfs/xfs_iops.h @@ -19,8 +19,8 @@ int xfs_inode_init_security(struct inode *inode, struct i= node *dir, extern void xfs_setup_inode(struct xfs_inode *ip); extern void xfs_setup_iops(struct xfs_inode *ip); extern void xfs_diflags_to_iflags(struct xfs_inode *ip, bool init); -unsigned int xfs_get_atomic_write_min(struct xfs_inode *ip); -unsigned int xfs_get_atomic_write_max(struct xfs_inode *ip); -unsigned int xfs_get_atomic_write_max_opt(struct xfs_inode *ip); +unsigned int xfs_get_atomic_write_min(struct xfs_inode *ip, bool is_dio); +unsigned int xfs_get_atomic_write_max(struct xfs_inode *ip, bool is_dio); +unsigned int xfs_get_atomic_write_max_opt(struct xfs_inode *ip, bool is_di= o); =20 #endif /* __XFS_IOPS_H__ */ diff --git a/include/linux/fs.h b/include/linux/fs.h index c895146c1444..2dec66913e97 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ 
-3563,7 +3563,8 @@ void generic_fill_statx_attr(struct inode *inode, str= uct kstat *stat); void generic_fill_statx_atomic_writes(struct kstat *stat, unsigned int unit_min, unsigned int unit_max, - unsigned int unit_max_opt); + unsigned int unit_max_opt, + bool is_dio); extern int vfs_getattr_nosec(const struct path *, struct kstat *, u32, uns= igned int); extern int vfs_getattr(const struct path *, struct kstat *, u32, unsigned = int); void __inode_add_bytes(struct inode *inode, loff_t bytes); diff --git a/include/trace/misc/fs.h b/include/trace/misc/fs.h index 19ea9339b9bd..3b69910a5998 100644 --- a/include/trace/misc/fs.h +++ b/include/trace/misc/fs.h @@ -162,4 +162,5 @@ { STATX_MNT_ID_UNIQUE, "MNT_ID_UNIQUE" }, \ { STATX_SUBVOL, "SUBVOL" }, \ { STATX_WRITE_ATOMIC_DIO, "WRITE_ATOMIC_DIO" }, \ + { STATX_WRITE_ATOMIC_BUF, "WRITE_ATOMIC_BUF" }, \ { STATX_DIO_READ_ALIGN, "DIO_READ_ALIGN" }) diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h index 57f558be933e..2d77da04df23 100644 --- a/include/uapi/linux/stat.h +++ b/include/uapi/linux/stat.h @@ -221,6 +221,7 @@ struct statx { /* Old name kept for backward compatibility */ #define STATX_WRITE_ATOMIC STATX_WRITE_ATOMIC_DIO #define STATX_DIO_READ_ALIGN 0x00020000U /* Want/got dio read alignment in= fo */ +#define STATX_WRITE_ATOMIC_BUF 0x00040000U /* Want/got buf-io atomic_write= _* fields */ =20 #define STATX__RESERVED 0x80000000U /* Reserved for future struct statx e= xpansion */ =20 @@ -259,6 +260,7 @@ struct statx { #define STATX_ATTR_WRITE_ATOMIC_DIO 0x00400000 /* File supports dio atomic= write operations */ /* Old name kept for backward compatibility */ #define STATX_ATTR_WRITE_ATOMIC STATX_ATTR_WRITE_ATOMIC_DIO +#define STATX_ATTR_WRITE_ATOMIC_BUF 0x00800000 /* File supports buf-io ato= mic write operations */ =20 =20 #endif /* _UAPI_LINUX_STAT_H */ diff --git a/tools/include/uapi/linux/stat.h b/tools/include/uapi/linux/sta= t.h index 57f558be933e..a7e0036669c2 100644 --- 
a/tools/include/uapi/linux/stat.h +++ b/tools/include/uapi/linux/stat.h @@ -221,6 +221,7 @@ struct statx { /* Old name kept for backward compatibility */ #define STATX_WRITE_ATOMIC STATX_WRITE_ATOMIC_DIO #define STATX_DIO_READ_ALIGN 0x00020000U /* Want/got dio read alignment in= fo */ +#define STATX_WRITE_ATOMIC_BUF 0x00040000U /* Want/got buf-io atomic_writ= e_* fields */ =20 #define STATX__RESERVED 0x80000000U /* Reserved for future struct statx e= xpansion */ =20 @@ -259,6 +260,7 @@ struct statx { #define STATX_ATTR_WRITE_ATOMIC_DIO 0x00400000 /* File supports dio atomic= write operations */ /* Old name kept for backward compatibility */ #define STATX_ATTR_WRITE_ATOMIC STATX_ATTR_WRITE_ATOMIC_DIO +#define STATX_ATTR_WRITE_ATOMIC_BUF 0x00800000 /* File supports buf-io ato= mic write operations */ =20 =20 #endif /* _UAPI_LINUX_STAT_H */ diff --git a/tools/perf/trace/beauty/include/uapi/linux/stat.h b/tools/perf= /trace/beauty/include/uapi/linux/stat.h index 57f558be933e..2d77da04df23 100644 --- a/tools/perf/trace/beauty/include/uapi/linux/stat.h +++ b/tools/perf/trace/beauty/include/uapi/linux/stat.h @@ -221,6 +221,7 @@ struct statx { /* Old name kept for backward compatibility */ #define STATX_WRITE_ATOMIC STATX_WRITE_ATOMIC_DIO #define STATX_DIO_READ_ALIGN 0x00020000U /* Want/got dio read alignment in= fo */ +#define STATX_WRITE_ATOMIC_BUF 0x00040000U /* Want/got buf-io atomic_write= _* fields */ =20 #define STATX__RESERVED 0x80000000U /* Reserved for future struct statx e= xpansion */ =20 @@ -259,6 +260,7 @@ struct statx { #define STATX_ATTR_WRITE_ATOMIC_DIO 0x00400000 /* File supports dio atomic= write operations */ /* Old name kept for backward compatibility */ #define STATX_ATTR_WRITE_ATOMIC STATX_ATTR_WRITE_ATOMIC_DIO +#define STATX_ATTR_WRITE_ATOMIC_BUF 0x00800000 /* File supports buf-io ato= mic write operations */ =20 =20 #endif /* _UAPI_LINUX_STAT_H */ --=20 2.51.0 From nobody Sat Feb 7 23:24:23 2026 Received: from mx0a-001b2d01.pphosted.com 
From: Ojaswin Mujoo
To: Christian Brauner, djwong@kernel.org, ritesh.list@gmail.com, john.g.garry@oracle.com, tytso@mit.edu, willy@infradead.org, dchinner@redhat.com, hch@lst.de
Cc: linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, jack@suse.cz, nilay@linux.ibm.com, martin.petersen@oracle.com, rostedt@goodmis.org, axboe@kernel.dk, linux-block@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [RFC PATCH 4/8] iomap: buffered atomic write support
Date: Wed, 12 Nov 2025 16:36:07 +0530
Message-ID: <8229fb9bcd2504b80caf0e763b1984d7ee6178b0.1762945505.git.ojaswin@linux.ibm.com>
Add special handling of the PG_atomic flag to the iomap buffered write
path. To flag an iomap iter for an atomic write, set IOMAP_ATOMIC. For a
folio associated with a write which has IOMAP_ATOMIC set, set PG_atomic;
otherwise, when IOMAP_ATOMIC is unset, clear PG_atomic.

This means that an "atomic" folio which has not yet been written back
loses its "atomicity". So if userspace issues a write with RWF_ATOMIC
set and another write with RWF_ATOMIC unset, that folio is not written
back atomically. Such a scenario is considered a userspace usage error.
To ensure that a buffered atomic write is written back atomically when
the write syscall returns, RWF_SYNC or similar needs to be used in
conjunction with RWF_ATOMIC.

Only a single BIO should ever be submitted for an atomic write, so
modify iomap_add_to_ioend() to ensure that we don't try to write back an
atomic folio as part of a larger mixed-atomicity BIO. In
iomap_alloc_ioend(), handle an atomic write by setting REQ_ATOMIC for
the allocated BIO. When a folio is written back, clear PG_atomic again,
as it is no longer required.

Currently, RWF_ATOMIC with buffered IO is limited to single-block-size
writes, and has 2 main restrictions: 1. Only blocksize == pagesize is
supported 2.
Writes where the user buffer is not aligned to PAGE_SIZE are not supported For more details, refer to the comment in generic_atomic_write_valid() Co-developed-by: John Garry Signed-off-by: John Garry Signed-off-by: Ojaswin Mujoo --- fs/iomap/buffered-io.c | 48 ++++++++++++++++++++++++++++++++++++------ fs/iomap/ioend.c | 18 ++++++++++++---- fs/read_write.c | 34 ++++++++++++++++++++++++++++-- include/linux/iomap.h | 2 ++ 4 files changed, 89 insertions(+), 13 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index f099c086cbe8..947c76c2688a 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -850,11 +850,13 @@ static int iomap_write_begin(struct iomap_iter *iter, { const struct iomap *srcmap =3D iomap_iter_srcmap(iter); loff_t pos; - u64 len =3D min_t(u64, SIZE_MAX, iomap_length(iter)); + u64 orig_len =3D min_t(u64, SIZE_MAX, iomap_length(iter)); + u64 len; struct folio *folio; int status =3D 0; + bool is_atomic =3D iter->flags & IOMAP_ATOMIC; =20 - len =3D min_not_zero(len, *plen); + len =3D min_not_zero(orig_len, *plen); *foliop =3D NULL; *plen =3D 0; =20 @@ -922,6 +924,11 @@ static int iomap_write_begin(struct iomap_iter *iter, if (unlikely(status)) goto out_unlock; =20 + if (is_atomic && (len !=3D orig_len)) { + status =3D -EINVAL; + goto out_unlock; + } + *foliop =3D folio; *plen =3D len; return 0; @@ -931,7 +938,7 @@ static int iomap_write_begin(struct iomap_iter *iter, return status; } =20 -static bool __iomap_write_end(struct inode *inode, loff_t pos, size_t len, +static bool __iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t = len, size_t copied, struct folio *folio) { flush_dcache_folio(folio); @@ -951,7 +958,27 @@ static bool __iomap_write_end(struct inode *inode, lof= f_t pos, size_t len, return false; iomap_set_range_uptodate(folio, offset_in_folio(folio, pos), len); iomap_set_range_dirty(folio, offset_in_folio(folio, pos), copied); - filemap_dirty_folio(inode->i_mapping, folio); + 
filemap_dirty_folio(iter->inode->i_mapping, folio); + + /* + * Policy: non atomic write over a previously atomic range makes the + * range non-atomic. Handle this here. + */ + if (iter->flags & IOMAP_ATOMIC) { + if (copied < len) { + /* + * A short atomic write is only okay as long as nothing + * is written at all. If we have a partial write, there + * is a bug in our code. + */ + WARN_ON_ONCE(copied !=3D 0); + + return false; + } + folio_set_atomic(folio); + } else + folio_clear_atomic(folio); + return true; } =20 @@ -997,7 +1024,7 @@ static bool iomap_write_end(struct iomap_iter *iter, s= ize_t len, size_t copied, return bh_written =3D=3D copied; } =20 - return __iomap_write_end(iter->inode, pos, len, copied, folio); + return __iomap_write_end(iter, pos, len, copied, folio); } =20 static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i, @@ -1124,6 +1151,8 @@ iomap_file_buffered_write(struct kiocb *iocb, struct = iov_iter *i, iter.flags |=3D IOMAP_NOWAIT; if (iocb->ki_flags & IOCB_DONTCACHE) iter.flags |=3D IOMAP_DONTCACHE; + if (iocb->ki_flags & IOCB_ATOMIC) + iter.flags |=3D IOMAP_ATOMIC; =20 while ((ret =3D iomap_iter(&iter, ops)) > 0) iter.status =3D iomap_write_iter(&iter, i, write_ops); @@ -1588,6 +1617,7 @@ static int iomap_folio_mkwrite_iter(struct iomap_iter= *iter, } else { WARN_ON_ONCE(!folio_test_uptodate(folio)); folio_mark_dirty(folio); + folio_clear_atomic(folio); } =20 return iomap_iter_advance(iter, length); @@ -1642,8 +1672,10 @@ void iomap_finish_folio_write(struct inode *inode, s= truct folio *folio, WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs); WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) <=3D 0); =20 - if (!ifs || atomic_sub_and_test(len, &ifs->write_bytes_pending)) + if (!ifs || atomic_sub_and_test(len, &ifs->write_bytes_pending)) { + folio_clear_atomic(folio); folio_end_writeback(folio); + } } EXPORT_SYMBOL_GPL(iomap_finish_folio_write); =20 @@ -1807,8 +1839,10 @@ int iomap_writeback_folio(struct 
iomap_writepage_ctx= *wpc, struct folio *folio) if (atomic_dec_and_test(&ifs->write_bytes_pending)) folio_end_writeback(folio); } else { - if (!wb_pending) + if (!wb_pending) { + folio_clear_atomic(folio); folio_end_writeback(folio); + } } mapping_set_error(inode->i_mapping, error); return error; diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c index b49fa75eab26..c129a695ceca 100644 --- a/fs/iomap/ioend.c +++ b/fs/iomap/ioend.c @@ -98,13 +98,17 @@ int iomap_ioend_writeback_submit(struct iomap_writepage= _ctx *wpc, int error) EXPORT_SYMBOL_GPL(iomap_ioend_writeback_submit); =20 static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *w= pc, - loff_t pos, u16 ioend_flags) + loff_t pos, u16 ioend_flags, + bool atomic) { struct bio *bio; + blk_opf_t opf =3D REQ_OP_WRITE | wbc_to_write_flags(wpc->wbc); + + if (atomic) + opf |=3D REQ_ATOMIC; =20 bio =3D bio_alloc_bioset(wpc->iomap.bdev, BIO_MAX_VECS, - REQ_OP_WRITE | wbc_to_write_flags(wpc->wbc), - GFP_NOFS, &iomap_ioend_bioset); + opf, GFP_NOFS, &iomap_ioend_bioset); bio->bi_iter.bi_sector =3D iomap_sector(&wpc->iomap, pos); bio->bi_write_hint =3D wpc->inode->i_write_hint; wbc_init_bio(wpc->wbc, bio); @@ -122,6 +126,9 @@ static bool iomap_can_add_to_ioend(struct iomap_writepa= ge_ctx *wpc, loff_t pos, if ((ioend_flags & IOMAP_IOEND_NOMERGE_FLAGS) !=3D (ioend->io_flags & IOMAP_IOEND_NOMERGE_FLAGS)) return false; + if ((ioend_flags & IOMAP_IOEND_ATOMIC) || + (ioend->io_flags & IOMAP_IOEND_ATOMIC)) + return false; if (pos !=3D ioend->io_offset + ioend->io_size) return false; if (!(wpc->iomap.flags & IOMAP_F_ANON_WRITE) && @@ -156,6 +163,7 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *= wpc, struct folio *folio, unsigned int ioend_flags =3D 0; unsigned int map_len =3D min_t(u64, dirty_len, wpc->iomap.offset + wpc->iomap.length - pos); + bool is_atomic =3D folio_test_atomic(folio); int error; =20 trace_iomap_add_to_ioend(wpc->inode, pos, dirty_len, &wpc->iomap); @@ -180,6 +188,8 @@ ssize_t 
iomap_add_to_ioend(struct iomap_writepage_ctx *= wpc, struct folio *folio, ioend_flags |=3D IOMAP_IOEND_DONTCACHE; if (pos =3D=3D wpc->iomap.offset && (wpc->iomap.flags & IOMAP_F_BOUNDARY)) ioend_flags |=3D IOMAP_IOEND_BOUNDARY; + if (is_atomic) + ioend_flags |=3D IOMAP_IOEND_ATOMIC; =20 if (!ioend || !iomap_can_add_to_ioend(wpc, pos, ioend_flags)) { new_ioend: @@ -188,7 +198,7 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *= wpc, struct folio *folio, if (error) return error; } - wpc->wb_ctx =3D ioend =3D iomap_alloc_ioend(wpc, pos, ioend_flags); + wpc->wb_ctx =3D ioend =3D iomap_alloc_ioend(wpc, pos, ioend_flags, is_at= omic); } =20 if (!bio_add_folio(&ioend->io_bio, folio, map_len, poff)) diff --git a/fs/read_write.c b/fs/read_write.c index 833bae068770..37546aa40f0d 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -1802,6 +1802,8 @@ int generic_file_rw_checks(struct file *file_in, stru= ct file *file_out) =20 int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter) { + struct super_block *sb =3D iocb->ki_filp->f_mapping->host->i_sb; + size_t len =3D iov_iter_count(iter); =20 if (!iter_is_ubuf(iter)) @@ -1813,8 +1815,36 @@ int generic_atomic_write_valid(struct kiocb *iocb, s= truct iov_iter *iter) if (!IS_ALIGNED(iocb->ki_pos, len)) return -EINVAL; =20 - if (!(iocb->ki_flags & IOCB_DIRECT)) - return -EOPNOTSUPP; + if (!(iocb->ki_flags & IOCB_DIRECT)) { + /* Some restrictions to buferred IO */ + + /* + * We only support block size =3D=3D page size + * right now. This is to avoid the following: + * 1. 4kb block atomic write marks the complete 64kb folio as + * atomic. + * 2. Other writes, dirty the whole 64kb folio. + * 3. Writeback sees the whole folio dirty and atomic and tries + * to send a 64kb atomic write, which might exceed the + * allowed size and fail. + * + * Once we support sub-page atomic write tracking, we can remove + * this restriction. 
+		 */
+		if (sb->s_blocksize != PAGE_SIZE)
+			return -EOPNOTSUPP;
+
+		/*
+		 * If the user buffer of atomic write crosses page boundary,
+		 * there's a possibility of short write, example if 1 user page
+		 * could not be faulted or got reclaimed before the copy
+		 * operation. For now don't allow such a scenario by ensuring
+		 * user buffer is page aligned.
+		 */
+		if (!PAGE_ALIGNED(iov_iter_alignment(iter)))
+			return -EOPNOTSUPP;
+
+	}
 
 	return 0;
 }
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 8b1ac08c7474..693f3e5ad03c 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -390,6 +390,8 @@ sector_t iomap_bmap(struct address_space *mapping, sector_t bno,
 #define IOMAP_IOEND_DIRECT	(1U << 3)
 /* is DONTCACHE I/O */
 #define IOMAP_IOEND_DONTCACHE	(1U << 4)
+/* is atomic I/O. These are never merged */
+#define IOMAP_IOEND_ATOMIC	(1U << 5)
 
 /*
  * Flags that if set on either ioend prevent the merge of two ioends.
-- 
2.51.0

From nobody Sat Feb 7 23:24:23 2026
From: Ojaswin Mujoo
To: Christian Brauner, djwong@kernel.org, ritesh.list@gmail.com, john.g.garry@oracle.com, tytso@mit.edu, willy@infradead.org, dchinner@redhat.com, hch@lst.de
Cc: linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, jack@suse.cz, nilay@linux.ibm.com, martin.petersen@oracle.com, rostedt@goodmis.org, axboe@kernel.dk, linux-block@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [RFC PATCH 5/8] iomap: pin pages for RWF_ATOMIC buffered write
Date: Wed, 12 Nov 2025 16:36:08 +0530

Currently, if the user buffer crosses a page boundary (even if it is a
single block write), we can end up with the following scenario:

1. We prefault the 2 user pages in iomap_write_iter.
2. Due to memory pressure, 1 page is reclaimed.
3.
copy_folio_from_iter_atomic() ends up doing a short copy.

This is unacceptable for RWF_ATOMIC writes since at this point our folio
is already dirty and we will be unable to recover the old data to
guarantee the atomic semantics.

Get past this issue by taking inspiration from the direct IO code and
performing the following steps for RWF_ATOMIC:

1. Pin all the user pages. This pins the physical pages, but the user
   space mapping can still be unmapped by the reclaim code, which can
   still cause a short write in copy_folio_from_iter_atomic().

2. To get past the user mapping getting unmapped, don't use the user
   iter anymore but rather create a bvec out of the pinned pages. This
   way we are safe from unmapping since we use the kernel's mapping
   directly. Having a bvec also allows us to directly reuse
   copy_folio_from_iter_atomic().

This ensures we should never see a short write, since we prefault and
pin the pages in the RWF_ATOMIC case.

Signed-off-by: Ojaswin Mujoo
---
 fs/iomap/buffered-io.c | 154 +++++++++++++++++++++++++++++++++++++----
 fs/read_write.c        |  11 ---
 2 files changed, 140 insertions(+), 25 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 947c76c2688a..e7dbe9bcb439 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1027,6 +1027,73 @@ static bool iomap_write_end(struct iomap_iter *iter, size_t len, size_t copied,
 	return __iomap_write_end(iter, pos, len, copied, folio);
 }
 
+/*
+ * Prepare an atomic write by pinning its pages and creating an ITER_BVEC
+ * out of them. This function also advances the original iter. In case we
+ * encounter any error later, we revert the progress.
+ */
+static int iomap_atomic_write_prep(struct iov_iter *i,
+				   struct iov_iter *atomic_iter,
+				   struct bio_vec **atomic_bvecs,
+				   struct page ***pages)
+{
+	size_t pg_off;
+	int bytes_pinned = 0;
+	int k = 0;
+	int len, total_len = 0, off;
+	int pinned_pgs = 0;
+	struct bio_vec *tmp_bvecs;
+
+	bytes_pinned = iov_iter_extract_pages(i, pages, iov_iter_count(i),
+					      UINT_MAX, 0, &pg_off);
+	/*
+	 * iov_iter_extract_pages advances the iter but we didn't
+	 * do any work yet, so revert.
+	 */
+	iov_iter_revert(i, bytes_pinned);
+
+	pinned_pgs = DIV_ROUND_UP(pg_off + bytes_pinned, PAGE_SIZE);
+
+	tmp_bvecs = kcalloc(pinned_pgs, sizeof(struct bio_vec), GFP_KERNEL);
+	if (unlikely(!tmp_bvecs))
+		return -ENOMEM;
+
+	for (struct page *p; k < pinned_pgs && iov_iter_count(i); k++) {
+		p = (*pages)[k];
+		off = (unsigned long)((char *)i->ubuf + i->iov_offset) %
+			PAGE_SIZE;
+		len = min(PAGE_SIZE - off, iov_iter_count(i));
+		bvec_set_page(&tmp_bvecs[k], p, len, off);
+		iov_iter_advance(i, len);
+		total_len += len;
+	}
+
+	iov_iter_bvec(atomic_iter, ITER_SOURCE, tmp_bvecs, k, total_len);
+
+	*atomic_bvecs = tmp_bvecs;
+	return pinned_pgs;
+}
+
+static void iomap_atomic_write_cleanup(struct page ***pages, int *pinned_pgs,
+				       struct bio_vec **atomic_bvecs)
+{
+	if (*pinned_pgs) {
+		unpin_user_pages(*pages, *pinned_pgs);
+		*pinned_pgs = 0;
+	}
+
+	if (*pages) {
+		kfree(*pages);
+		*pages = NULL;
+	}
+
+	if (*atomic_bvecs) {
+		kfree(*atomic_bvecs);
+		*atomic_bvecs = NULL;
+	}
+}
+
 static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
 		const struct iomap_write_ops *write_ops)
 {
@@ -1035,6 +1102,11 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
 	struct address_space *mapping = iter->inode->i_mapping;
 	size_t chunk = mapping_max_folio_size(mapping);
 	unsigned int bdp_flags = (iter->flags & IOMAP_NOWAIT) ?
 		BDP_ASYNC : 0;
+	bool is_atomic = iter->flags & IOMAP_ATOMIC;
+	struct page **pages = NULL;
+	int pinned_pgs;
+	struct iov_iter atomic_iter = {0};
+	struct bio_vec *atomic_bvecs = NULL;
 
 	do {
 		struct folio *folio;
@@ -1057,19 +1129,52 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
 		if (bytes > iomap_length(iter))
 			bytes = iomap_length(iter);
 
-		/*
-		 * Bring in the user page that we'll copy from _first_.
-		 * Otherwise there's a nasty deadlock on copying from the
-		 * same page as we're writing to, without it being marked
-		 * up-to-date.
-		 *
-		 * For async buffered writes the assumption is that the user
-		 * page has already been faulted in. This can be optimized by
-		 * faulting the user page.
-		 */
-		if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
-			status = -EFAULT;
-			break;
+		if (is_atomic) {
+			/*
+			 * If the user pages get reclaimed or unmapped, we could
+			 * end up faulting and doing a short copy in
+			 * copy_folio_from_iter_atomic(), which is undesirable
+			 * for RWF_ATOMIC. Hence:
+			 *
+			 * 1. Pin the pages to protect against reclaim
+			 *
+			 * 2. Iter's user page can still get unmapped from user
+			 *    page table leading to short copy. Protect against
+			 *    this by instead using an ITER_BVEC created out of
+			 *    the pinned pages.
+			 */
+			pinned_pgs = iomap_atomic_write_prep(i, &atomic_iter,
+							     &atomic_bvecs, &pages);
+			if (unlikely(pinned_pgs <= 0)) {
+				status = pinned_pgs;
+				break;
+			}
+
+			if (pinned_pgs << PAGE_SHIFT < bytes) {
+				WARN_RATELIMIT(true,
+					"Couldn't pin bytes for atomic write: pinned: %d, needed: %lld",
+					pinned_pgs << PAGE_SHIFT, bytes);
+				status = -EFAULT;
+				break;
+			}
+		} else {
+			/*
+			 * Bring in the user page that we'll copy from _first_.
+			 * Otherwise there's a nasty deadlock on copying from the
+			 * same page as we're writing to, without it being marked
+			 * up-to-date.
+			 *
+			 * For async buffered writes the assumption is that the user
+			 * page has already been faulted in.
+			 * This can be optimized by
+			 * faulting the user page.
+			 */
+			if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
+				status = -EFAULT;
+				break;
+			}
 		}
 
 		status = iomap_write_begin(iter, write_ops, &folio, &offset,
@@ -1086,9 +1191,27 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
 		if (mapping_writably_mapped(mapping))
 			flush_dcache_folio(folio);
 
-		copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
+		copied = copy_folio_from_iter_atomic(folio, offset, bytes,
+				is_atomic ? &atomic_iter : i);
 		written = iomap_write_end(iter, bytes, copied, folio) ?
 			  copied : 0;
+		if (is_atomic) {
+			if (written != bytes) {
+				/*
+				 * Short copy, so revert the iter accordingly.
+				 * Ideally this should never happen.
+				 */
+				WARN_RATELIMIT(1,
+					"Short atomic write: bytes_pinned:%d bytes:%lld written:%lld\n",
+					pinned_pgs << PAGE_SHIFT, bytes, written);
+				iov_iter_revert(i, iov_iter_count(&atomic_iter));
+			}
+			iomap_atomic_write_cleanup(&pages, &pinned_pgs,
+						   &atomic_bvecs);
+		}
 
 		/*
 		 * Update the in-memory inode size after copying the data into
@@ -1130,6 +1253,9 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
 		}
 	} while (iov_iter_count(i) && iomap_length(iter));
 
+	if (is_atomic)
+		iomap_atomic_write_cleanup(&pages, &pinned_pgs, &atomic_bvecs);
+
 	return total_written ? 0 : status;
 }
 
diff --git a/fs/read_write.c b/fs/read_write.c
index 37546aa40f0d..7e064561cc4b 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1833,17 +1833,6 @@ int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
 		 */
 		if (sb->s_blocksize != PAGE_SIZE)
 			return -EOPNOTSUPP;
-
-		/*
-		 * If the user buffer of atomic write crosses page boundary,
-		 * there's a possibility of short write, example if 1 user page
-		 * could not be faulted or got reclaimed before the copy
-		 * operation. For now don't allow such a scenario by ensuring
-		 * user buffer is page aligned.
-		 */
-		if (!PAGE_ALIGNED(iov_iter_alignment(iter)))
-			return -EOPNOTSUPP;
-
 	}
 
 	return 0;
-- 
2.51.0

From nobody Sat Feb 7 23:24:23 2026
From: Ojaswin Mujoo
To: Christian Brauner, djwong@kernel.org, ritesh.list@gmail.com, john.g.garry@oracle.com, tytso@mit.edu, willy@infradead.org, dchinner@redhat.com, hch@lst.de
Cc: linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, jack@suse.cz, nilay@linux.ibm.com, martin.petersen@oracle.com, rostedt@goodmis.org, axboe@kernel.dk, linux-block@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [RFC PATCH 6/8] xfs: Report atomic write min and max for buf io as well
Date: Wed, 12 Nov 2025 16:36:09 +0530
Message-ID: <9d0ec1039dd3fb40419cef56470ca508f36b8f51.1762945505.git.ojaswin@linux.ibm.com>

Now that we can reliably perform a HW based single block buffered atomic
write for page size == blocksize, start advertising it in XFS.

Signed-off-by: Ojaswin Mujoo
---
 fs/xfs/xfs_iops.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index f036c46b19c5..67d370947d95 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -604,9 +604,10 @@ xfs_get_atomic_write_min(
 	struct xfs_inode	*ip,
 	bool			is_dio)
 {
-	if (is_dio) {
-		struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_mount	*mp = ip->i_mount;
+	uint32_t		bs = mp->m_sb.sb_blocksize;
 
+	if (is_dio) {
 		/*
 		 * If we can complete an atomic write via atomic out of place writes,
 		 * then advertise a minimum size of one fsblock. Without this
@@ -618,10 +619,15 @@ xfs_get_atomic_write_min(
 		 */
 		if (xfs_inode_can_hw_atomic_write(ip) ||
 		    xfs_inode_can_sw_atomic_write(ip))
-			return mp->m_sb.sb_blocksize;
+			return bs;
 	}
+	/*
+	 * Buffered IO only supports hw single block atomic writes and bs == ps
+ */ + if (xfs_inode_can_hw_atomic_write(ip) && bs =3D=3D PAGE_SIZE) + return bs; =20 - /* buffered IO not supported yet so return 0 right away */ return 0; } =20 @@ -630,7 +636,8 @@ xfs_get_atomic_write_max( struct xfs_inode *ip, bool is_dio) { - struct xfs_mount *mp =3D ip->i_mount; + struct xfs_mount *mp =3D ip->i_mount; + uint32_t bs =3D mp->m_sb.sb_blocksize; =20 if (is_dio) { /* @@ -640,7 +647,7 @@ xfs_get_atomic_write_max( */ if (!xfs_inode_can_sw_atomic_write(ip)) { if (xfs_inode_can_hw_atomic_write(ip)) - return mp->m_sb.sb_blocksize; + return bs; return 0; } =20 @@ -653,8 +660,13 @@ xfs_get_atomic_write_max( return XFS_FSB_TO_B(mp, mp->m_groups[XG_TYPE_RTG].awu_max); return XFS_FSB_TO_B(mp, mp->m_groups[XG_TYPE_AG].awu_max); } + /* + * Buffered IO only supports hw single block atomic writes and bs =3D=3D = ps + * configurations. + */ + if (xfs_inode_can_hw_atomic_write(ip) && bs =3D=3D PAGE_SIZE) + return bs; =20 - /* buffered IO not supported yet so return 0 right away */ return 0; } =20 @@ -679,7 +691,7 @@ xfs_get_atomic_write_max_opt( return min(awu_max, xfs_inode_buftarg(ip)->bt_awu_max); } =20 - /* buffered IO not supported yet so return 0 right away */ + /* buffered IO for now only supports 1 filesyste block so max_opt is 0 */ return 0; } =20 --=20 2.51.0 From nobody Sat Feb 7 23:24:23 2026 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F011C223339; Wed, 12 Nov 2025 11:09:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.156.1 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762945795; cv=none; b=ecmiTv5MjObgYN7Jc5or6XnaeJUbUA2nTLujNiCO0u5bNUGZR0ddsu3mXWgulvLZPSGJoP7sIvUIzxqRIJB7TXfDFfdD8yP7iub92bzbSIf6IOqFL+Sx+qgByNvpg2WO3AGlfWgmX/Q90H6BQDaIDd9Re84my8OhC98J/TvLbqk= 
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762945795; c=relaxed/simple; bh=ze9JsHUSaVuZE/v3m1E2642h1kj/aSqb2oZegmCG9Kg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=dpYQsApnBex/36zm50oaA/xVZ3KOLW+4VQ7vvA/Tdt8HK39Cz8wlx1wtIoiHv92Ks4Wk0k+Dlm7QrBbP6Fk+khwzjTpne7x/Znqkm1PEFPtPhBxs6+VohRBHFS50bx+eW2MNL/wLU4TQZCFm6ilsdY6HsMjbeGbbxrjA2LProsQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com; spf=pass smtp.mailfrom=linux.ibm.com; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b=ft1H6IdC; arc=none smtp.client-ip=148.163.156.1 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b="ft1H6IdC" Received: from pps.filterd (m0353729.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 5AC6r5H3016772; Wed, 12 Nov 2025 11:07:24 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=pp1; bh=bsAaJzZ3vua7fqMSK QG3rRqEw9wiAuwDBo4+b/ze2yw=; b=ft1H6IdCKTjrSgAv2uyWMcYt1Qr1ODSVU KzOXKraGHsVey+i+h0nZYMMDbM7lS2xOsJf7i9TjknnbRHMU/dz0Shc29DMAi6xz N0zZfKGoPYKZgfO596fxbSagBP8lZkB2UgktpPOD+jPoF8463aQ4jhHY+vjkBt+8 Jnf+RpXgf8ZixrlSadBaWqAA8Gb1VlXrR7uBN9ZlO/wYM7+6VQeozo9dXEjIR8b3 4YQyRQ2F/NtZNa+7L2gjhvyPHAPHdGdxW9byewwAyJ+DZgBdjWTWoAlNODCRGvgV Eaut8nDMMUMn+9uOhngse9HC/m/Z62yNu/REz+KRZ4STuGyjG6Ong== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 4aa5cj8u95-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 12 Nov 2025 11:07:24 +0000 (GMT) Received: from 
m0353729.ppops.net (m0353729.ppops.net [127.0.0.1]) by pps.reinject (8.18.1.12/8.18.0.8) with ESMTP id 5ACB7NE0030193; Wed, 12 Nov 2025 11:07:23 GMT Received: from ppma12.dal12v.mail.ibm.com (dc.9e.1632.ip4.static.sl-reverse.com [50.22.158.220]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 4aa5cj8u92-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 12 Nov 2025 11:07:23 +0000 (GMT) Received: from pps.filterd (ppma12.dal12v.mail.ibm.com [127.0.0.1]) by ppma12.dal12v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id 5AC9lOPh028939; Wed, 12 Nov 2025 11:07:22 GMT Received: from smtprelay05.fra02v.mail.ibm.com ([9.218.2.225]) by ppma12.dal12v.mail.ibm.com (PPS) with ESMTPS id 4aag6sfwuu-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 12 Nov 2025 11:07:22 +0000 Received: from smtpav02.fra02v.mail.ibm.com (smtpav02.fra02v.mail.ibm.com [10.20.54.101]) by smtprelay05.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 5ACB7KKU38666618 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 12 Nov 2025 11:07:20 GMT Received: from smtpav02.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9635B20040; Wed, 12 Nov 2025 11:07:20 +0000 (GMT) Received: from smtpav02.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 1A7D62004B; Wed, 12 Nov 2025 11:07:16 +0000 (GMT) Received: from li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com (unknown [9.124.210.190]) by smtpav02.fra02v.mail.ibm.com (Postfix) with ESMTP; Wed, 12 Nov 2025 11:07:15 +0000 (GMT) From: Ojaswin Mujoo To: Christian Brauner , djwong@kernel.org, ritesh.list@gmail.com, john.g.garry@oracle.com, tytso@mit.edu, willy@infradead.org, dchinner@redhat.com, hch@lst.de Cc: linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, jack@suse.cz, nilay@linux.ibm.com, martin.petersen@oracle.com, 
rostedt@goodmis.org, axboe@kernel.dk, linux-block@vger.kernel.org, linux-trace-kernel@vger.kernel.org Subject: [RFC PATCH 7/8] iomap: Add bs X-Mailer: git-send-email 2.51.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-TM-AS-GCONF: 00 X-Authority-Analysis: v=2.4 cv=Ss+dKfO0 c=1 sm=1 tr=0 ts=69146a6c cx=c_pps a=bLidbwmWQ0KltjZqbj+ezA==:117 a=bLidbwmWQ0KltjZqbj+ezA==:17 a=6UeiqGixMTsA:10 a=VkNPw1HP01LnGYTKEx00:22 a=pGLkceISAAAA:8 a=VnNF1IyMAAAA:8 a=25xlhPYPnclPhp86LCAA:9 a=cPQSjfK2_nFv0Q5t_7PE:22 X-Proofpoint-Spam-Details-Enc: AW1haW4tMjUxMTA4MDA5NSBTYWx0ZWRfX3rFQ8pJuLs29 jLf5ByegZBHUkX+w8XbkxOOl7++SvznGlgHyGtstC8BVmF9m2mHNvDf/LTwhdocDp40ozfRTqU7 UtbTGv7a1j12j03JVnBdT75uyxCv99tpBbUc2/G/lbnDEb2dCSo/IQvjtQuBxTs2hPtA4Vi05T0 4G2XVylSJZtaWwy1IqpQInZJfOMJkoYxwcn3aGlF5nVsLkcBSd9+p7x9GeqajnvNfo6GnXYXD55 S6cCmlp09R2CZL/PuIilDQYcuIa4rVglz47BjB+SimQ9weKFvdJARUVuUUo4QonnoCDNp3vY1oy fgrgoHvjFAqp8yDfS1KgbjQele+Mlrx7g+tqrjrOSNg59szt4k2OWjx5tZWSeGXJQ0wNl/bbP/v KXGCppRHTj0k50hmULqLd9FJHF9Xhg== X-Proofpoint-GUID: QmcSEpiTEmEqJrMbbU2YOwO9U_ioD-Bk X-Proofpoint-ORIG-GUID: PLXcJglFIgT8Vty1Y9Nq9jokJ9oNFSGO X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1121,Hydra:6.1.9,FMLib:17.12.100.49 definitions=2025-11-12_03,2025-11-11_03,2025-10-01_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 suspectscore=0 impostorscore=0 bulkscore=0 phishscore=0 lowpriorityscore=0 adultscore=0 priorityscore=1501 spamscore=0 malwarescore=0 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.19.0-2510240000 definitions=main-2511080095 Content-Type: text/plain; charset="utf-8" Lift bs =3D=3D ps restriction for RWF_ATOMIC buffered writes by adding sub-page support. This is done by adding 2 more bitmaps in folio -> ifs. 
One bitmap tracks which blocks mark the start of an RWF_ATOMIC write and
the other tracks which blocks mark the end. For a single block atomic
write, both start and end are marked on the same block, but the design
is kept as such so we can easily extend it to multi-block writes later.

With the help of the 2 bitmaps, we are able to determine which blocks
need to go atomically together during writeback. This prevents the issue
where write amplification could cause an RWF_ATOMIC write that is bigger
than supported by the underlying device.

As with bs == ps support, if there is a non-atomic write that overlaps
an atomic marked block, we will clear the atomic state of that block in
ifs. Similarly, if the folio is mmap'd and written to, we will clear the
atomic bit from all blocks in the folio.

To illustrate some examples:

  A = Dirty, atomic block
  D = Dirty, non-atomic block

  Let pagesize = 4k, blocksize = 1k

  1) - Initial state of blocks in folio:  A A D D
     - Non atomic write from block 0 to 3
     - New state:                         D D D D

  2) - Initial state of blocks in folio:  A A A A
     - Non atomic write from block 1 to 2
     - New state:                         A D D A

  3) - Initial state of blocks in folio:  A A _ _
     - mmap write to any block in folio
     - New state:                         D D D D

Suggested-by: Ritesh Harjani (IBM)
Signed-off-by: Ojaswin Mujoo
---
 fs/iomap/buffered-io.c     | 207 ++++++++++++++++++++++++++++++++++---
 fs/iomap/ioend.c           |   9 +-
 fs/iomap/trace.h           |  12 ++-
 fs/read_write.c            |  22 ----
 include/linux/iomap.h      |   1 +
 include/linux/page-flags.h |   2 +-
 6 files changed, 207 insertions(+), 46 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e7dbe9bcb439..d86859728e3b 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -26,6 +26,10 @@ struct iomap_folio_state {
 	 * Each block has two bits in this bitmap:
 	 *	Bits [0..blocks_per_folio) has the uptodate status.
 	 *	Bits [b_p_f...(2*b_p_f)) has the dirty status.
+	 *	Bits [2*b_p_f..3*b_p_f) track whether the block marks the
+	 *	start of an RWF_ATOMIC write.
+	 *	Bits [3*b_p_f..4*b_p_f) track whether the block marks the
+	 *	end of an RWF_ATOMIC write.
 	 */
 	unsigned long state[];
 };
@@ -76,6 +80,25 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
 		folio_mark_uptodate(folio);
 }
 
+static inline bool ifs_block_is_atomic_start(struct folio *folio,
+		struct iomap_folio_state *ifs, int block)
+{
+	struct inode *inode = folio->mapping->host;
+	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+
+	return test_bit(block + (blks_per_folio * 2), ifs->state);
+}
+
+static inline bool ifs_block_is_atomic_end(struct folio *folio,
+		struct iomap_folio_state *ifs, int block)
+{
+	struct inode *inode = folio->mapping->host;
+	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+
+	return test_bit(block + (blks_per_folio * 3), ifs->state);
+}
+
 static inline bool ifs_block_is_dirty(struct folio *folio,
 		struct iomap_folio_state *ifs, int block)
 {
@@ -85,17 +108,42 @@ static inline bool ifs_block_is_dirty(struct folio *folio,
 	return test_bit(block + blks_per_folio, ifs->state);
 }
 
+/*
+ * Returns false if the folio has at least 1 atomic block, else true.
+ */
+static inline bool ifs_is_fully_non_atomic(struct folio *folio,
+					   struct iomap_folio_state *ifs)
+{
+	struct inode *inode = folio->mapping->host;
+	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+
+	for (int i = 0; i < blks_per_folio; i++) {
+		if (ifs_block_is_atomic_start(folio, ifs, i))
+			return false;
+	}
+
+	return true;
+}
+
 static unsigned ifs_find_dirty_range(struct folio *folio,
-		struct iomap_folio_state *ifs, u64 *range_start, u64 range_end)
+		struct iomap_folio_state *ifs,
+		u64 *range_start, u64 range_end,
+		bool *is_atomic_range)
 {
 	struct inode *inode = folio->mapping->host;
+	unsigned folio_nblks = i_blocks_per_folio(inode, folio);
 	unsigned start_blk = offset_in_folio(folio, *range_start) >>
inode->i_blkbits; unsigned end_blk =3D min_not_zero( offset_in_folio(folio, range_end) >> inode->i_blkbits, - i_blocks_per_folio(inode, folio)); + folio_nblks); unsigned nblks =3D 1; + bool is_atomic_folio =3D folio_test_atomic(folio); =20 + /* + * We need to be careful in not clubbing together atomic write ranges + * with other dirty blocks + */ while (!ifs_block_is_dirty(folio, ifs, start_blk)) if (++start_blk =3D=3D end_blk) return 0; @@ -106,12 +154,62 @@ static unsigned ifs_find_dirty_range(struct folio *fo= lio, nblks++; } =20 + *is_atomic_range =3D false; + + if (is_atomic_folio) { + unsigned int first_atomic; + unsigned int last =3D start_blk + nblks; + /* + * We now have the dirty range, however if the range has any + * RWF_ATOMIC blocks, we need to make sure to not club them with + * other dirty blocks. + */ + first_atomic =3D start_blk; + while (!ifs_block_is_atomic_start(folio, ifs, first_atomic)) { + if (++first_atomic =3D=3D start_blk + nblks) + break; + } + + if (first_atomic !=3D start_blk + nblks) { + /* RWF_ATOMIC blocks found in dirty range */ + if (first_atomic =3D=3D start_blk) { + /* + * range start is RWF_ATOMIC. Return only the + * atomic range. + */ + nblks =3D 0; + while (first_atomic + nblks < last) { + if (ifs_block_is_atomic_end( + folio, ifs, + first_atomic + nblks++)) + break; + } + + if (first_atomic + nblks > last) + /* + * RWF_ATOMIC range should + * always be contained in the + * dirty range + */ + WARN_ON(true); + + *is_atomic_range =3D true; + } else { + /* + * RWF_ATOMIC range is in middle of dirty range. 
Return only + * the starting non-RWF_ATOMIC range + */ + nblks =3D first_atomic - start_blk; + } + } + } + *range_start =3D folio_pos(folio) + (start_blk << inode->i_blkbits); return nblks << inode->i_blkbits; } =20 static unsigned iomap_find_dirty_range(struct folio *folio, u64 *range_sta= rt, - u64 range_end) + u64 range_end, bool *is_atomic_range) { struct iomap_folio_state *ifs =3D folio->private; =20 @@ -119,10 +217,33 @@ static unsigned iomap_find_dirty_range(struct folio *= folio, u64 *range_start, return 0; =20 if (ifs) - return ifs_find_dirty_range(folio, ifs, range_start, range_end); + return ifs_find_dirty_range(folio, ifs, range_start, range_end, + is_atomic_range); + + if (folio_test_atomic(folio)) + *is_atomic_range =3D true; + return range_end - *range_start; } =20 +static bool ifs_clear_range_atomic(struct folio *folio, + struct iomap_folio_state *ifs, size_t off, size_t len) +{ + struct inode *inode =3D folio->mapping->host; + unsigned int blks_per_folio =3D i_blocks_per_folio(inode, folio); + unsigned int first_blk =3D (off >> inode->i_blkbits); + unsigned int last_blk =3D (off + len - 1) >> inode->i_blkbits; + unsigned int nr_blks =3D last_blk - first_blk + 1; + unsigned long flags; + + spin_lock_irqsave(&ifs->state_lock, flags); + bitmap_clear(ifs->state, first_blk + (blks_per_folio * 2), nr_blks); + bitmap_clear(ifs->state, last_blk + (blks_per_folio * 3), nr_blks); + spin_unlock_irqrestore(&ifs->state_lock, flags); + + return ifs_is_fully_non_atomic(folio, ifs); +} + static void ifs_clear_range_dirty(struct folio *folio, struct iomap_folio_state *ifs, size_t off, size_t len) { @@ -138,6 +259,18 @@ static void ifs_clear_range_dirty(struct folio *folio, spin_unlock_irqrestore(&ifs->state_lock, flags); } =20 +static void iomap_clear_range_atomic(struct folio *folio, size_t off, size= _t len) +{ + struct iomap_folio_state *ifs =3D folio->private; + bool fully_non_atomic =3D true; + + if (ifs) + fully_non_atomic =3D ifs_clear_range_atomic(folio, 
ifs, off, len); + + if (fully_non_atomic) + folio_clear_atomic(folio); +} + static void iomap_clear_range_dirty(struct folio *folio, size_t off, size_= t len) { struct iomap_folio_state *ifs =3D folio->private; @@ -146,8 +279,34 @@ static void iomap_clear_range_dirty(struct folio *foli= o, size_t off, size_t len) ifs_clear_range_dirty(folio, ifs, off, len); } =20 -static void ifs_set_range_dirty(struct folio *folio, +static void ifs_set_range_atomic(struct folio *folio, struct iomap_folio_state *ifs, size_t off, size_t len) +{ + struct inode *inode =3D folio->mapping->host; + unsigned int blks_per_folio =3D i_blocks_per_folio(inode, folio); + unsigned int first_blk =3D (off >> inode->i_blkbits); + unsigned int last_blk =3D (off + len - 1) >> inode->i_blkbits; + unsigned long flags; + + spin_lock_irqsave(&ifs->state_lock, flags); + bitmap_set(ifs->state, first_blk + (blks_per_folio * 2), 1); + bitmap_set(ifs->state, last_blk + (blks_per_folio * 3), 1); + spin_unlock_irqrestore(&ifs->state_lock, flags); +} + +static void iomap_set_range_atomic(struct folio *folio, size_t off, size_t= len) +{ + struct iomap_folio_state *ifs =3D folio->private; + + if (ifs) + ifs_set_range_atomic(folio, ifs, off, len); + + folio_set_atomic(folio); +} + +static void ifs_set_range_dirty(struct folio *folio, + struct iomap_folio_state *ifs, size_t off, + size_t len) { struct inode *inode =3D folio->mapping->host; unsigned int blks_per_folio =3D i_blocks_per_folio(inode, folio); @@ -190,8 +349,12 @@ static struct iomap_folio_state *ifs_alloc(struct inod= e *inode, * The first state tracks per-block uptodate and the * second tracks per-block dirty state. */ + + /* + * TODO: How can we only selectively allocate atomic bitmaps for ifs? 
+	 */
 	ifs = kzalloc(struct_size(ifs, state,
-			BITS_TO_LONGS(2 * nr_blocks)), gfp);
+			BITS_TO_LONGS(4 * nr_blocks)), gfp);
 	if (!ifs)
 		return ifs;
 
@@ -941,6 +1104,8 @@ static int iomap_write_begin(struct iomap_iter *iter,
 static bool __iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
 		size_t copied, struct folio *folio)
 {
+	struct inode *inode = iter->inode;
+
 	flush_dcache_folio(folio);
 
 	/*
@@ -975,9 +1140,12 @@ static bool __iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
 
 			return false;
 		}
-		folio_set_atomic(folio);
-	} else
-		folio_clear_atomic(folio);
+		iomap_set_range_atomic(folio, offset_in_folio(folio, pos), len);
+	} else {
+		if (folio_test_atomic(folio))
+			iomap_clear_range_atomic(
+				folio, offset_in_folio(folio, pos), len);
+	}
 
 	return true;
 }
@@ -1208,7 +1376,11 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
 					written);
 			iov_iter_revert(i, iov_iter_count(&atomic_iter));
-		}
+		} else
+			iomap_set_range_atomic(
+				folio, offset_in_folio(folio, pos),
+				written);
+
 		iomap_atomic_write_cleanup(&pages, &pinned_pgs, &atomic_bvecs);
 	}
@@ -1743,7 +1915,7 @@ static int iomap_folio_mkwrite_iter(struct iomap_iter *iter,
 	} else {
 		WARN_ON_ONCE(!folio_test_uptodate(folio));
 		folio_mark_dirty(folio);
-		folio_clear_atomic(folio);
+		iomap_clear_range_atomic(folio, 0, folio_size(folio));
 	}
 
 	return iomap_iter_advance(iter, length);
@@ -1799,7 +1971,7 @@ void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
 	WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) <= 0);
 
 	if (!ifs || atomic_sub_and_test(len, &ifs->write_bytes_pending)) {
-		folio_clear_atomic(folio);
+		iomap_clear_range_atomic(folio, 0, folio_size(folio));
 		folio_end_writeback(folio);
 	}
 }
@@ -1914,6 +2086,8 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 	if (!ifs) {
 		ifs = ifs_alloc(inode, folio, 0);
 		iomap_set_range_dirty(folio, 0, end_pos - pos);
+		if (folio_test_atomic(folio))
+			iomap_set_range_atomic(folio, 0, end_pos - pos);
 	}
 
 	/*
@@ -1936,7 +2110,8 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 	 * Walk through the folio to find dirty areas to write back.
 	 */
 	end_aligned = round_up(end_pos, i_blocksize(inode));
-	while ((rlen = iomap_find_dirty_range(folio, &pos, end_aligned))) {
+	while ((rlen = iomap_find_dirty_range(folio, &pos, end_aligned,
+					      &wpc->is_atomic_range))) {
 		error = iomap_writeback_range(wpc, folio, pos, rlen, end_pos,
 				&wb_pending);
 		if (error)
@@ -1962,11 +2137,13 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 	 * bit ourselves right after unlocking the page.
 	 */
 	if (ifs) {
-		if (atomic_dec_and_test(&ifs->write_bytes_pending))
+		if (atomic_dec_and_test(&ifs->write_bytes_pending)) {
+			iomap_clear_range_atomic(folio, 0, folio_size(folio));
 			folio_end_writeback(folio);
+		}
 	} else {
 		if (!wb_pending) {
-			folio_clear_atomic(folio);
+			iomap_clear_range_atomic(folio, 0, folio_size(folio));
 			folio_end_writeback(folio);
 		}
 	}
diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c
index c129a695ceca..678c052c6443 100644
--- a/fs/iomap/ioend.c
+++ b/fs/iomap/ioend.c
@@ -163,10 +163,10 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
 	unsigned int ioend_flags = 0;
 	unsigned int map_len = min_t(u64, dirty_len,
 			wpc->iomap.offset + wpc->iomap.length - pos);
-	bool is_atomic = folio_test_atomic(folio);
 	int error;
 
-	trace_iomap_add_to_ioend(wpc->inode, pos, dirty_len, &wpc->iomap);
+	trace_iomap_add_to_ioend(wpc->inode, pos, dirty_len, &wpc->iomap,
+				 wpc->is_atomic_range);
 
 	WARN_ON_ONCE(!folio->private && map_len < dirty_len);
 
@@ -188,7 +188,7 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
 		ioend_flags |= IOMAP_IOEND_DONTCACHE;
 	if (pos == wpc->iomap.offset && (wpc->iomap.flags & IOMAP_F_BOUNDARY))
 		ioend_flags |= IOMAP_IOEND_BOUNDARY;
-	if (is_atomic)
+	if (wpc->is_atomic_range)
 		ioend_flags |= IOMAP_IOEND_ATOMIC;
 
 	if (!ioend || !iomap_can_add_to_ioend(wpc, pos, ioend_flags)) {
@@ -198,7 +198,8 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
 			if (error)
 				return error;
 		}
-		wpc->wb_ctx = ioend = iomap_alloc_ioend(wpc, pos, ioend_flags, is_atomic);
+		wpc->wb_ctx = ioend = iomap_alloc_ioend(wpc, pos, ioend_flags,
+							wpc->is_atomic_range);
 	}
 
 	if (!bio_add_folio(&ioend->io_bio, folio, map_len, poff))
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index a61c1dae4742..14ad280c03fe 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -172,8 +172,8 @@ DEFINE_IOMAP_EVENT(iomap_iter_srcmap);
 
 TRACE_EVENT(iomap_add_to_ioend,
 	TP_PROTO(struct inode *inode, u64 pos, unsigned int dirty_len,
-		 struct iomap *iomap),
-	TP_ARGS(inode, pos, dirty_len, iomap),
+		 struct iomap *iomap, bool is_atomic),
+	TP_ARGS(inode, pos, dirty_len, iomap, is_atomic),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(u64, ino)
@@ -185,6 +185,7 @@ TRACE_EVENT(iomap_add_to_ioend,
 		__field(u16, type)
 		__field(u16, flags)
 		__field(dev_t, bdev)
+		__field(bool, is_atomic)
 	),
 	TP_fast_assign(
 		__entry->dev = inode->i_sb->s_dev;
@@ -197,9 +198,11 @@ TRACE_EVENT(iomap_add_to_ioend,
 		__entry->type = iomap->type;
 		__entry->flags = iomap->flags;
 		__entry->bdev = iomap->bdev ? iomap->bdev->bd_dev : 0;
+		__entry->is_atomic = is_atomic;
 	),
 	TP_printk("dev %d:%d ino 0x%llx bdev %d:%d pos 0x%llx dirty len 0x%llx "
-		  "addr 0x%llx offset 0x%llx length 0x%llx type %s (0x%x) flags %s (0x%x)",
+		  "addr 0x%llx offset 0x%llx length 0x%llx type %s (0x%x) flags %s (0x%x) "
+		  "is_atomic=%d",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  MAJOR(__entry->bdev), MINOR(__entry->bdev),
@@ -211,7 +214,8 @@ TRACE_EVENT(iomap_add_to_ioend,
 		  __print_symbolic(__entry->type, IOMAP_TYPE_STRINGS),
 		  __entry->type,
 		  __print_flags(__entry->flags, "|", IOMAP_F_FLAGS_STRINGS),
-		  __entry->flags)
+		  __entry->flags,
+		  __entry->is_atomic)
 );
 
 TRACE_EVENT(iomap_iter,
diff --git a/fs/read_write.c b/fs/read_write.c
index 7e064561cc4b..ab5d8e17d86d 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1802,8 +1802,6 @@ int generic_file_rw_checks(struct file *file_in, struct file *file_out)
 
 int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
 {
-	struct super_block *sb = iocb->ki_filp->f_mapping->host->i_sb;
-
 	size_t len = iov_iter_count(iter);
 
 	if (!iter_is_ubuf(iter))
@@ -1815,26 +1813,6 @@ int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
 	if (!IS_ALIGNED(iocb->ki_pos, len))
 		return -EINVAL;
 
-	if (!(iocb->ki_flags & IOCB_DIRECT)) {
-		/* Some restrictions to buferred IO */
-
-		/*
-		 * We only support block size == page size
-		 * right now. This is to avoid the following:
-		 * 1. 4kb block atomic write marks the complete 64kb folio as
-		 * atomic.
-		 * 2. Other writes, dirty the whole 64kb folio.
-		 * 3. Writeback sees the whole folio dirty and atomic and tries
-		 * to send a 64kb atomic write, which might exceed the
-		 * allowed size and fail.
-		 *
-		 * Once we support sub-page atomic write tracking, we can remove
-		 * this restriction.
-		 */
-		if (sb->s_blocksize != PAGE_SIZE)
-			return -EOPNOTSUPP;
-	}
-
 	return 0;
 }
 EXPORT_SYMBOL_GPL(generic_atomic_write_valid);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 693f3e5ad03c..033e0ba49f85 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -460,6 +460,7 @@ struct iomap_writepage_ctx {
 	const struct iomap_writeback_ops *ops;
 	u32 nr_folios;	/* folios added to the ioend */
 	void *wb_ctx;	/* pending writeback context */
+	bool is_atomic_range;
 };
 
 struct iomap_ioend *iomap_init_ioend(struct inode *inode, struct bio *bio,
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index bdce0f58a77a..542e7db6b21b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -111,7 +111,7 @@ enum pageflags {
 	PG_swapbacked,		/* Page is backed by RAM/swap */
 	PG_unevictable,		/* Page is "unevictable" */
 	PG_dropbehind,		/* drop pages on IO completion */
-	PG_atomic,	/* Page is marked atomic for buffered atomic writes */
+	PG_atomic,	/* At least one block in page is marked atomic for buffered atomic writes */
 #ifdef CONFIG_MMU
 	PG_mlocked,	/* Page is vma mlocked */
 #endif
-- 
2.51.0

From nobody Sat Feb 7 23:24:23 2026
From: Ojaswin Mujoo
To: Christian Brauner, djwong@kernel.org, ritesh.list@gmail.com,
	john.g.garry@oracle.com, tytso@mit.edu, willy@infradead.org,
	dchinner@redhat.com, hch@lst.de
Cc: linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, jack@suse.cz, nilay@linux.ibm.com,
	martin.petersen@oracle.com, rostedt@goodmis.org, axboe@kernel.dk,
	linux-block@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: [RFC PATCH 8/8] xfs: Lift the bs == ps restriction for HW buffered atomic writes
Date: Wed, 12 Nov 2025 16:36:11 +0530
Message-ID: <0f1f53d6fad8c25118b0348b0cb91dc2e4ecf456.1762945505.git.ojaswin@linux.ibm.com>

Now that we support bs < ps for HW atomic writes, lift this restriction
from the XFS statx reporting.

Signed-off-by: Ojaswin Mujoo
---
 fs/xfs/xfs_iops.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 67d370947d95..5bd31aacf514 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -622,10 +622,9 @@ xfs_get_atomic_write_min(
 		return bs;
 	}
 	/*
-	 * Buffered IO only supports hw single block atomic writes and bs == ps
-	 * configurations.
+	 * Buffered IO only supports hw single block atomic writes
 	 */
-	if (xfs_inode_can_hw_atomic_write(ip) && bs == PAGE_SIZE)
+	if (xfs_inode_can_hw_atomic_write(ip))
 		return bs;
 
 	return 0;
@@ -661,10 +660,9 @@ xfs_get_atomic_write_max(
 		return XFS_FSB_TO_B(mp, mp->m_groups[XG_TYPE_AG].awu_max);
 	}
 	/*
-	 * Buffered IO only supports hw single block atomic writes and bs == ps
-	 * configurations.
+	 * Buffered IO only supports hw single block atomic writes
 	 */
-	if (xfs_inode_can_hw_atomic_write(ip) && bs == PAGE_SIZE)
+	if (xfs_inode_can_hw_atomic_write(ip))
 		return bs;
 
 	return 0;
-- 
2.51.0