[RFC PATCH 2/8] mm: Add PG_atomic

Ojaswin Mujoo posted 8 patches 2 months, 3 weeks ago
[RFC PATCH 2/8] mm: Add PG_atomic
Posted by Ojaswin Mujoo 2 months, 3 weeks ago
From: John Garry <john.g.garry@oracle.com>

Add page flag PG_atomic, meaning that a folio needs to be written back
atomically. This will be used for handling RWF_ATOMIC buffered IO
in upcoming patches.

Co-developed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 include/linux/page-flags.h     | 5 +++++
 include/trace/events/mmflags.h | 3 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0091ad1986bf..bdce0f58a77a 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -111,6 +111,7 @@ enum pageflags {
 	PG_swapbacked,		/* Page is backed by RAM/swap */
 	PG_unevictable,		/* Page is "unevictable"  */
 	PG_dropbehind,		/* drop pages on IO completion */
+	PG_atomic,		/* Page is marked atomic for buffered atomic writes */
 #ifdef CONFIG_MMU
 	PG_mlocked,		/* Page is vma mlocked */
 #endif
@@ -644,6 +645,10 @@ FOLIO_FLAG(unevictable, FOLIO_HEAD_PAGE)
 	__FOLIO_CLEAR_FLAG(unevictable, FOLIO_HEAD_PAGE)
 	FOLIO_TEST_CLEAR_FLAG(unevictable, FOLIO_HEAD_PAGE)
 
+FOLIO_FLAG(atomic, FOLIO_HEAD_PAGE)
+	__FOLIO_CLEAR_FLAG(atomic, FOLIO_HEAD_PAGE)
+	FOLIO_TEST_CLEAR_FLAG(atomic, FOLIO_HEAD_PAGE)
+
 #ifdef CONFIG_MMU
 FOLIO_FLAG(mlocked, FOLIO_HEAD_PAGE)
 	__FOLIO_CLEAR_FLAG(mlocked, FOLIO_HEAD_PAGE)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index aa441f593e9a..a8294f6146a5 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -159,7 +159,8 @@ TRACE_DEFINE_ENUM(___GFP_LAST_BIT);
 	DEF_PAGEFLAG_NAME(reclaim),					\
 	DEF_PAGEFLAG_NAME(swapbacked),					\
 	DEF_PAGEFLAG_NAME(unevictable),					\
-	DEF_PAGEFLAG_NAME(dropbehind)					\
+	DEF_PAGEFLAG_NAME(dropbehind),					\
+	DEF_PAGEFLAG_NAME(atomic)					\
 IF_HAVE_PG_MLOCK(mlocked)						\
 IF_HAVE_PG_HWPOISON(hwpoison)						\
 IF_HAVE_PG_IDLE(idle)							\
-- 
2.51.0
Re: [RFC PATCH 2/8] mm: Add PG_atomic
Posted by Matthew Wilcox 2 months, 3 weeks ago
On Wed, Nov 12, 2025 at 04:36:05PM +0530, Ojaswin Mujoo wrote:
> From: John Garry <john.g.garry@oracle.com>
> 
> Add page flag PG_atomic, meaning that a folio needs to be written back
> atomically. This will be used for handling RWF_ATOMIC buffered IO
> in upcoming patches.

Page flags are a precious resource.  I'm not thrilled about allocating one
to this rather niche usecase.  Wouldn't this be more aptly a flag on the
address_space rather than the folio?  ie if we're doing this kind of write
to a file, aren't most/all of the writes to the file going to be atomic?
Re: [RFC PATCH 2/8] mm: Add PG_atomic
Posted by Ritesh Harjani (IBM) 2 months, 3 weeks ago
Matthew Wilcox <willy@infradead.org> writes:

> On Wed, Nov 12, 2025 at 04:36:05PM +0530, Ojaswin Mujoo wrote:
>> From: John Garry <john.g.garry@oracle.com>
>> 
>> Add page flag PG_atomic, meaning that a folio needs to be written back
>> atomically. This will be used for handling RWF_ATOMIC buffered IO
>> in upcoming patches.
>
> Page flags are a precious resource.  I'm not thrilled about allocating one
> to this rather niche usecase.  Wouldn't this be more aptly a flag on the
> address_space rather than the folio?  ie if we're doing this kind of write
> to a file, aren't most/all of the writes to the file going to be atomic?

As of today, the atomic write functionality works on a per-write
basis (given it's a per-write characteristic).

So, we can have two types of dirty folios sitting in the page cache of
an inode: ones written using the atomic buffered I/O flag (RWF_ATOMIC)
and ones written by non-atomic writes. Hence the need for a folio flag
to distinguish between the two.

-ritesh
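
The distinction described above can be modelled in userspace. What
follows is a minimal sketch, not kernel code: toy_folio, toy_mapping and
the TOY_* bits are hypothetical stand-ins for struct folio, struct
address_space and their flag words. It only shows that a per-folio bit
lets writeback tell atomic dirty folios from ordinary ones within one
inode, whereas a single per-mapping bit cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_FOLIOS 4

/* Hypothetical flag bits; the real kernel enums differ. */
enum { TOY_PG_DIRTY = 1u << 0, TOY_PG_ATOMIC = 1u << 1 };
enum { TOY_AS_ATOMIC = 1u << 0 };

struct toy_folio {
	unsigned int flags;
};

struct toy_mapping {
	unsigned int flags;                     /* one word for the whole file */
	struct toy_folio folios[TOY_NR_FOLIOS]; /* toy page cache */
};

/* Model of a buffered write dirtying one folio. */
static void toy_write(struct toy_mapping *m, size_t idx, bool atomic)
{
	assert(idx < TOY_NR_FOLIOS);
	m->folios[idx].flags |= TOY_PG_DIRTY;
	if (atomic) {
		m->folios[idx].flags |= TOY_PG_ATOMIC; /* per-folio record */
		m->flags |= TOY_AS_ATOMIC;             /* per-file record */
	}
}

/* Dirty folios that writeback must treat atomically, per-folio view. */
static size_t toy_count_atomic_by_folio(const struct toy_mapping *m)
{
	const unsigned int want = TOY_PG_DIRTY | TOY_PG_ATOMIC;
	size_t n = 0;

	for (size_t i = 0; i < TOY_NR_FOLIOS; i++)
		if ((m->folios[i].flags & want) == want)
			n++;
	return n;
}

/* Same question with only the per-mapping bit: all dirty folios qualify. */
static size_t toy_count_atomic_by_mapping(const struct toy_mapping *m)
{
	size_t n = 0;

	if (!(m->flags & TOY_AS_ATOMIC))
		return 0;
	for (size_t i = 0; i < TOY_NR_FOLIOS; i++)
		if (m->folios[i].flags & TOY_PG_DIRTY)
			n++;
	return n;
}
```

With one atomic and two non-atomic writes to the same toy mapping, the
per-folio count is 1 while the per-mapping count is 3. Whether that loss
of precision matters is exactly what the rest of the thread debates.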
Re: [RFC PATCH 2/8] mm: Add PG_atomic
Posted by Matthew Wilcox 2 months, 3 weeks ago
On Fri, Nov 14, 2025 at 10:30:09AM +0530, Ritesh Harjani wrote:
> Matthew Wilcox <willy@infradead.org> writes:
> 
> > On Wed, Nov 12, 2025 at 04:36:05PM +0530, Ojaswin Mujoo wrote:
> >> From: John Garry <john.g.garry@oracle.com>
> >> 
> >> Add page flag PG_atomic, meaning that a folio needs to be written back
> >> atomically. This will be used for handling RWF_ATOMIC buffered IO
> >> in upcoming patches.
> >
> > Page flags are a precious resource.  I'm not thrilled about allocating one
> > to this rather niche usecase.  Wouldn't this be more aptly a flag on the
> > address_space rather than the folio?  ie if we're doing this kind of write
> > to a file, aren't most/all of the writes to the file going to be atomic?
> 
> As of today, the atomic write functionality works on a per-write
> basis (given it's a per-write characteristic).
> 
> So, we can have two types of dirty folios sitting in the page cache of
> an inode: ones written using the atomic buffered I/O flag (RWF_ATOMIC)
> and ones written by non-atomic writes. Hence the need for a folio flag
> to distinguish between the two.

I know, but is this useful?  AFAIK, the files where Postgres wants to
use this functionality are the log files, and all writes to the log
files will want to use the atomic functionality.  What's the usecase
for "I want to mix atomic and non-atomic buffered writes to this file"?
Re: [RFC PATCH 2/8] mm: Add PG_atomic
Posted by Ritesh Harjani (IBM) 2 months, 2 weeks ago
Matthew Wilcox <willy@infradead.org> writes:

> On Fri, Nov 14, 2025 at 10:30:09AM +0530, Ritesh Harjani wrote:
>> Matthew Wilcox <willy@infradead.org> writes:
>> 
>> > On Wed, Nov 12, 2025 at 04:36:05PM +0530, Ojaswin Mujoo wrote:
>> >> From: John Garry <john.g.garry@oracle.com>
>> >> 
>> >> Add page flag PG_atomic, meaning that a folio needs to be written back
>> >> atomically. This will be used for handling RWF_ATOMIC buffered IO
>> >> in upcoming patches.
>> >
>> > Page flags are a precious resource.  I'm not thrilled about allocating one
>> > to this rather niche usecase.  Wouldn't this be more aptly a flag on the
>> > address_space rather than the folio?  ie if we're doing this kind of write
>> > to a file, aren't most/all of the writes to the file going to be atomic?
>> 
>> As of today, the atomic write functionality works on a per-write
>> basis (given it's a per-write characteristic).
>> 
>> So, we can have two types of dirty folios sitting in the page cache of
>> an inode: ones written using the atomic buffered I/O flag (RWF_ATOMIC)
>> and ones written by non-atomic writes. Hence the need for a folio flag
>> to distinguish between the two.
>
> I know, but is this useful?  AFAIK, the files where Postgres wants to
> use this functionality are the log files, and all writes to the log
> files will want to use the atomic functionality.  What's the usecase
> for "I want to mix atomic and non-atomic buffered writes to this file"?

Actually this goes back to the design of how we added support for atomic
writes with DIO. During the initial design phase we decided that this
need not be a per-inode attribute or an open flag; rather, it is a
per-write I/O characteristic.

So as per the current design, we don't have any open flag or persistent
inode attribute which says the kernel should permit _only_ atomic write
I/O to this file. Instead, what we support today is DIO atomic writes
using the RWF_ATOMIC flag in the pwritev2 syscall.
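
As a rough userspace illustration of "per-write", the sketch below
chooses atomicity on each pwritev2() call rather than per file
descriptor. The helper names write_once() and atomic_write_shape_ok()
are made up for this example. The shape check reflects the documented
constraints (power-of-two length within the unit limits reported by
statx(STATX_WRITE_ATOMIC), naturally aligned offset); the fallback
RWF_ATOMIC definition assumes the value in the Linux UAPI headers and is
only there for older libc headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef RWF_ATOMIC
#define RWF_ATOMIC 0x00000040 /* assumed from the Linux UAPI headers */
#endif

/*
 * Check the shape of an atomic write: power-of-two length within
 * [unit_min, unit_max], offset naturally aligned to the length.
 * unit_min/unit_max would come from statx(STATX_WRITE_ATOMIC).
 */
static bool atomic_write_shape_ok(uint64_t off, uint64_t len,
				  uint64_t unit_min, uint64_t unit_max)
{
	assert(unit_min <= unit_max);
	if (len == 0 || (len & (len - 1)) != 0)
		return false;
	if (len < unit_min || len > unit_max)
		return false;
	return (off % len) == 0;
}

/* Issue one write; atomicity is requested per call, not per fd. */
static ssize_t write_once(int fd, const void *buf, size_t len, off_t off,
			  bool atomic)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

	return pwritev2(fd, &iov, 1, off, atomic ? RWF_ATOMIC : 0);
}
```

The same fd can be passed to write_once() with atomic set to true for
one call and false for the next, which is what makes mixed dirty data in
one file possible in the first place.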

Having said that, there are several policy decisions that could still be
discussed, e.g. making sure any previous dirty data is flushed to disk
when a buffered atomic write request is made to an inode.
Maybe that would allow us to just keep a flag at the address space level,
because we would never have a mix of atomic and non-atomic page cache
pages.
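
One way to picture that policy is the toy model below. It is only a
sketch under assumptions: toy_mapping, toy_flush() and
toy_buffered_write() are hypothetical names, and flushing on a switch in
either direction (atomic to non-atomic as well) is an assumption made
here to keep the "never mixed" invariant, not something the series
specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_FOLIOS 4

struct toy_mapping {
	bool atomic_mode;          /* hypothetical address-space-level flag */
	bool dirty[TOY_NR_FOLIOS]; /* toy page cache dirty state */
	unsigned int flushes;      /* number of forced writebacks */
};

/* Write back everything that is dirty (stand-in for a full flush). */
static void toy_flush(struct toy_mapping *m)
{
	bool any = false;

	for (size_t i = 0; i < TOY_NR_FOLIOS; i++) {
		any |= m->dirty[i];
		m->dirty[i] = false;
	}
	if (any)
		m->flushes++;
}

/* Buffered write: flush first if the write kind differs from the cache. */
static void toy_buffered_write(struct toy_mapping *m, size_t idx, bool atomic)
{
	assert(idx < TOY_NR_FOLIOS);
	if (m->atomic_mode != atomic) {
		toy_flush(m);
		m->atomic_mode = atomic;
	}
	m->dirty[idx] = true;
}
```

In this model every dirty folio in the mapping shares the mapping's
mode, so no per-folio bit is needed. The cost is a forced flush whenever
the kind of write changes.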

I agree that folio flags are a scarce resource, but I guess the
initial goal of this patch series is mainly to discuss the initial
design of the core feature, i.e. how buffered atomic writes should look
in the Linux kernel. Point taken that we should be careful with
using folio flags, but let's see how the design shapes up first? That
will help us understand whether a folio flag is really required or
whether an address space flag would do.

-ritesh
Re: [RFC PATCH 2/8] mm: Add PG_atomic
Posted by Dave Chinner 2 months, 2 weeks ago
On Tue, Nov 18, 2025 at 09:47:42PM +0530, Ritesh Harjani wrote:
> Matthew Wilcox <willy@infradead.org> writes:
> 
> > On Fri, Nov 14, 2025 at 10:30:09AM +0530, Ritesh Harjani wrote:
> >> Matthew Wilcox <willy@infradead.org> writes:
> >> 
> >> > On Wed, Nov 12, 2025 at 04:36:05PM +0530, Ojaswin Mujoo wrote:
> >> >> From: John Garry <john.g.garry@oracle.com>
> >> >> 
> >> >> Add page flag PG_atomic, meaning that a folio needs to be written back
> >> >> atomically. This will be used for handling RWF_ATOMIC buffered IO
> >> >> in upcoming patches.
> >> >
> >> > Page flags are a precious resource.  I'm not thrilled about allocating one
> >> > to this rather niche usecase.  Wouldn't this be more aptly a flag on the
> >> > address_space rather than the folio?  ie if we're doing this kind of write
> >> > to a file, aren't most/all of the writes to the file going to be atomic?
> >> 
> >> As of today, the atomic write functionality works on a per-write
> >> basis (given it's a per-write characteristic).
> >> 
> >> So, we can have two types of dirty folios sitting in the page cache of
> >> an inode: ones written using the atomic buffered I/O flag (RWF_ATOMIC)
> >> and ones written by non-atomic writes. Hence the need for a folio flag
> >> to distinguish between the two.
> >
> > I know, but is this useful?  AFAIK, the files where Postgres wants to
> > use this functionality are the log files, and all writes to the log
> > files will want to use the atomic functionality.  What's the usecase
> > for "I want to mix atomic and non-atomic buffered writes to this file"?
> 
> Actually this goes back to the design of how we added support for atomic
> writes with DIO. During the initial design phase we decided that this
> need not be a per-inode attribute or an open flag; rather, it is a
> per-write I/O characteristic.
> 
> So as per the current design, we don't have any open flag or persistent
> inode attribute which says the kernel should permit _only_ atomic write
> I/O to this file. Instead, what we support today is DIO atomic writes
> using the RWF_ATOMIC flag in the pwritev2 syscall.

Which, if we can't do with REQ_ATOMIC IO, we fall back to the
filesystem COW IO path to provide RWF_ATOMIC semantics without
needing to involve the page cache.

IOWs, DIO REQ_ATOMIC writes are simply a fast path for the atomic
COW IO path inherent in COW-capable filesystems.

This is no different for buffered RWF_ATOMIC writes. We need to
ingest the data into the page cache as a COW operation, then at
writeback time we optimise away the COW operations if REQ_ATOMIC IO
can be performed instead.

Using COW for buffered RWF_ATOMIC writes means we don't need to involve
the page cache at all - this can all be implemented at the
filesystem extent mapping and iomap layers....
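
The COW idea can be sketched with a toy in-memory model. This is not
XFS or iomap code: toy_fs, toy_cow_stage() and toy_cow_commit() are
hypothetical names, and the model covers only the COW path, not the
REQ_ATOMIC fast path. New data is staged in freshly allocated blocks and
becomes visible only when the file's block map is switched over in one
step, so a reader sees either all of the old data or all of the new.

```c
#include <assert.h>
#include <string.h>

#define TOY_BLKSZ 4
#define TOY_NR_BLOCKS 8

struct toy_fs {
	char blocks[TOY_NR_BLOCKS][TOY_BLKSZ]; /* "disk" blocks */
	int next_free;                          /* trivial bump allocator */
	int map[2];                             /* file block -> disk block */
};

/* Stage new contents for both file blocks in freshly allocated blocks. */
static int toy_cow_stage(struct toy_fs *fs, const char *data, int staged[2])
{
	if (fs->next_free + 2 > TOY_NR_BLOCKS)
		return -1;
	for (int i = 0; i < 2; i++) {
		staged[i] = fs->next_free++;
		memcpy(fs->blocks[staged[i]], data + i * TOY_BLKSZ, TOY_BLKSZ);
	}
	return 0;
}

/* Commit: switch the mapping in one step; old blocks were never touched. */
static void toy_cow_commit(struct toy_fs *fs, const int staged[2])
{
	assert(staged[0] < TOY_NR_BLOCKS && staged[1] < TOY_NR_BLOCKS);
	memcpy(fs->map, staged, sizeof(fs->map));
}
```

Until toy_cow_commit() runs, lookups through map[] still return the old
blocks, which is why the staging step needs no special marking of the
cached data in this model.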

> Having said that, there are several policy decisions that could still be
> discussed, e.g. making sure any previous dirty data is flushed to disk
> when a buffered atomic write request is made to an inode.

We don't need to care about mixed dirty non-atomic/atomic data on the
same file if REQ_ATOMIC is used as an optimisation for COW-based
atomic IO.  Filesystems like XFS naturally separate COW and non-COW
extents. If we combine non-atomic and atomic data into a single
atomic update at writeback (be it COW or REQ_ATOMIC IO), then we
have still honoured the requested atomic semantics required to
persist the data. It just doesn't matter.

IMO, trying to hack atomic physical IO semantics through the page
cache creates all sorts of issues that simply don't exist when we
use the atomic overwrite paths present in modern COW capable
filesystems....

-Dave.
-- 
Dave Chinner
david@fromorbit.com
Re: [RFC PATCH 2/8] mm: Add PG_atomic
Posted by David Hildenbrand (Red Hat) 2 months, 3 weeks ago
On 12.11.25 16:56, Matthew Wilcox wrote:
> On Wed, Nov 12, 2025 at 04:36:05PM +0530, Ojaswin Mujoo wrote:
>> From: John Garry <john.g.garry@oracle.com>
>>
>> Add page flag PG_atomic, meaning that a folio needs to be written back
>> atomically. This will be used for handling RWF_ATOMIC buffered IO
>> in upcoming patches.
> 
> Page flags are a precious resource.  I'm not thrilled about allocating one
> to this rather niche usecase.

Fully agreed.

-- 
Cheers

David