[RFC PATCH 0/8] xfs: single block atomic writes for buffered IO

Ojaswin Mujoo posted 8 patches 2 months, 3 weeks ago
[RFC PATCH 0/8] xfs: single block atomic writes for buffered IO
Posted by Ojaswin Mujoo 2 months, 3 weeks ago
This patch series adds support for performing single block RWF_ATOMIC
writes for iomap XFS buffered IO. It builds upon the initial RFC shared
by John Garry last year [1]. Most of the details are in the respective
commit messages, but I'll mention some of the design points below:

1. The first 4 patches introduce the statx and iomap plumbing and page
flags to add basic atomic write support to buffered IO. However, there
are still 2 key restrictions that apply:

FIRST: If the user buffer of an atomic write crosses a page boundary, there is a
possibility of a short write, for example if one user page could not be faulted
in or got reclaimed before the copy operation. For now, don't allow such a
scenario by requiring the user buffer to be page aligned. This way either the
full write goes through or nothing does. This is also discussed in Matthew
Wilcox's comment here [2]

This restriction is lifted in patch 5. The approach we took was to:
 1. Pin the user pages.
 2. Create a BVEC from the pinned struct pages to pass to
    copy_folio_from_iter_atomic() rather than the user-backed iter. We
    don't use the user iter directly because the pinned user pages could
    still get unmapped from the process, leading to short writes.

This approach lets us proceed only when we are sure we will not have a
short copy.
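
A rough sketch of the idea (illustrative only, not the actual patch; the
function name, fixed-size arrays and error handling are simplified, and it
assumes a page-aligned user buffer spanning only a handful of pages):

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/bvec.h>
#include <linux/uio.h>

/* Copy 'len' bytes at user_addr into 'folio' at offset 'poff' without
 * risking a short copy due to a fault on the user mapping. */
static ssize_t atomic_copy_to_folio(struct folio *folio, size_t poff,
				    unsigned long user_addr, size_t len)
{
	unsigned int nr = DIV_ROUND_UP(len, PAGE_SIZE);
	struct page *pages[8];
	struct bio_vec bvecs[8];
	struct iov_iter iter;
	size_t copied;
	long pinned;
	int i;

	if (nr > ARRAY_SIZE(pages))
		return -EINVAL;

	/* 1. Pin the user pages so they cannot go away under us. */
	pinned = pin_user_pages_fast(user_addr, nr, 0, pages);
	if (pinned < nr) {
		if (pinned > 0)
			unpin_user_pages(pages, pinned);
		return -EFAULT;		/* no short atomic writes */
	}

	/* 2. Build an ITER_BVEC over the pinned pages instead of using
	 *    the user-backed iter, so the copy below cannot fault. */
	for (i = 0; i < nr; i++)
		bvec_set_page(&bvecs[i], pages[i], PAGE_SIZE, 0);
	iov_iter_bvec(&iter, ITER_SOURCE, bvecs, nr, len);

	copied = copy_folio_from_iter_atomic(folio, poff, len, &iter);
	unpin_user_pages(pages, nr);

	if (copied != len)
		return -EFAULT;
	return copied;
}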

SECOND: We only support buffered atomic writes with block size == page size.
This is to avoid the following scenario:
 1. A 4kB single-block atomic write marks the complete 64kB folio as
    atomic.
 2. Other writes dirty the rest of the 64kB folio.
 3. Writeback sees the whole folio dirty and atomic and tries
    to send a 64kB atomic write, which might exceed the
    allowed atomic write size and fail.

Patch 7 adds support for sub-page atomic write tracking to remove this
restriction. We do this by adding 2 more bitmaps to the ifs
(iomap_folio_state) to track the atomic write start and end.
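
Conceptually, the tracking looks something like the sketch below (field
and helper names are illustrative and the real patch's layout differs;
locking is elided):

/*
 * Illustrative sketch only: the per-folio iomap state (ifs) grows two
 * extra per-block bitmaps so writeback can reconstruct the exact
 * boundaries of each RWF_ATOMIC write inside a large folio.
 */
struct iomap_folio_state {
	spinlock_t	state_lock;
	unsigned int	read_bytes_pending;
	atomic_t	write_bytes_pending;
	/*
	 * [0, nblks)         uptodate bits
	 * [nblks, 2*nblks)   dirty bits
	 * [2*nblks, 3*nblks) atomic-start bits   (new, illustrative)
	 * [3*nblks, 4*nblks) atomic-end bits     (new, illustrative)
	 */
	unsigned long	state[];
};

static void ifs_set_range_atomic(struct folio *folio,
				 struct iomap_folio_state *ifs,
				 size_t off, size_t len)
{
	struct inode *inode = folio->mapping->host;
	unsigned int nblks = i_blocks_per_folio(inode, folio);
	unsigned int first_blk = off >> inode->i_blkbits;
	unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;

	set_bit(2 * nblks + first_blk, ifs->state);	/* range starts here */
	set_bit(3 * nblks + last_blk, ifs->state);	/* range ends here */
}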

Lastly, a non-atomic write over an atomic write will remove the atomic
guarantee. Userspace is expected to sync the data to disk after an
atomic write before performing any overwrite.
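
For example, the expected userspace pattern would be roughly the following
(illustrative only; buf is assumed to be page aligned per the restriction
above, fd is on a filesystem/device supporting atomic writes, and the
kernel headers are new enough to define RWF_ATOMIC):

#define _GNU_SOURCE
#include <sys/uio.h>
#include <unistd.h>
#include <linux/fs.h>		/* RWF_ATOMIC */

static void atomic_then_overwrite(int fd, char *buf, char *newbuf, off_t off)
{
	struct iovec iov = { .iov_base = buf, .iov_len = 4096 };

	pwritev2(fd, &iov, 1, off, RWF_ATOMIC);	/* untorn 4k write */
	fdatasync(fd);				/* persist it before any overwrite */
	pwrite(fd, newbuf, 4096, off);		/* plain write, no untorn guarantee */
}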

This series has survived the xfstests -g quick group and I'll be
continuing to test it. I wanted to put out the RFC to get some reviews
on the design and suggestions on any better approaches.

[1] https://lore.kernel.org/all/20240422143923.3927601-1-john.g.garry@oracle.com/
[2] https://lore.kernel.org/all/ZiZ8XGZz46D3PRKr@casper.infradead.org/

Thanks,
Ojaswin

John Garry (2):
  fs: Rename STATX{_ATTR}_WRITE_ATOMIC -> STATX{_ATTR}_WRITE_ATOMIC_DIO
  mm: Add PG_atomic

Ojaswin Mujoo (6):
  fs: Add initial buffered atomic write support info to statx
  iomap: buffered atomic write support
  iomap: pin pages for RWF_ATOMIC buffered write
  xfs: Report atomic write min and max for buf io as well
  iomap: Add bs<ps buffered atomic writes support
  xfs: Lift the bs == ps restriction for HW buffered atomic writes

 .../filesystems/ext4/atomic_writes.rst        |   4 +-
 block/bdev.c                                  |   7 +-
 fs/ext4/inode.c                               |   9 +-
 fs/iomap/buffered-io.c                        | 395 ++++++++++++++++--
 fs/iomap/ioend.c                              |  21 +-
 fs/iomap/trace.h                              |  12 +-
 fs/read_write.c                               |   3 -
 fs/stat.c                                     |  33 +-
 fs/xfs/xfs_file.c                             |   9 +-
 fs/xfs/xfs_iops.c                             | 127 +++---
 fs/xfs/xfs_iops.h                             |   6 +-
 include/linux/fs.h                            |   3 +-
 include/linux/iomap.h                         |   3 +
 include/linux/page-flags.h                    |   5 +
 include/trace/events/mmflags.h                |   3 +-
 include/trace/misc/fs.h                       |   3 +-
 include/uapi/linux/stat.h                     |  10 +-
 tools/include/uapi/linux/stat.h               |  10 +-
 .../trace/beauty/include/uapi/linux/stat.h    |  10 +-
 19 files changed, 551 insertions(+), 122 deletions(-)

-- 
2.51.0
Re: [RFC PATCH 0/8] xfs: single block atomic writes for buffered IO
Posted by Dave Chinner 2 months, 3 weeks ago
On Wed, Nov 12, 2025 at 04:36:03PM +0530, Ojaswin Mujoo wrote:
> This patch adds support to perform single block RWF_ATOMIC writes for
> iomap xfs buffered IO. This builds upon the inital RFC shared by John
> Garry last year [1]. Most of the details are present in the respective 
> commit messages but I'd mention some of the design points below:

What is the use case for this functionality? i.e. what is the
reason for adding all this complexity?

-Dave.
-- 
Dave Chinner
david@fromorbit.com
Re: [RFC PATCH 0/8] xfs: single block atomic writes for buffered IO
Posted by Christoph Hellwig 2 months, 3 weeks ago
On Thu, Nov 13, 2025 at 08:56:56AM +1100, Dave Chinner wrote:
> On Wed, Nov 12, 2025 at 04:36:03PM +0530, Ojaswin Mujoo wrote:
> > This patch adds support to perform single block RWF_ATOMIC writes for
> > iomap xfs buffered IO. This builds upon the inital RFC shared by John
> > Garry last year [1]. Most of the details are present in the respective 
> > commit messages but I'd mention some of the design points below:
> 
> What is the use case for this functionality? i.e. what is the
> reason for adding all this complexity?

Seconded.  The atomic code has a lot of complexity, and further mixing
it with buffered I/O makes this even worse.  We'd need a really important
use case to even consider it.
Re: [RFC PATCH 0/8] xfs: single block atomic writes for buffered IO
Posted by Ritesh Harjani (IBM) 2 months, 3 weeks ago
Christoph Hellwig <hch@lst.de> writes:

> On Thu, Nov 13, 2025 at 08:56:56AM +1100, Dave Chinner wrote:
>> On Wed, Nov 12, 2025 at 04:36:03PM +0530, Ojaswin Mujoo wrote:
>> > This patch adds support to perform single block RWF_ATOMIC writes for
>> > iomap xfs buffered IO. This builds upon the inital RFC shared by John
>> > Garry last year [1]. Most of the details are present in the respective 
>> > commit messages but I'd mention some of the design points below:
>> 
>> What is the use case for this functionality? i.e. what is the
>> reason for adding all this complexity?
>
> Seconded.  The atomic code has a lot of complexity, and further mixing
> it with buffered I/O makes this even worse.  We'd need a really important
> use case to even consider it.

I agree this should have been in the cover letter itself. 

I believe the reason for adding this functionality was also discussed at
LSFMM...

For example, https://lwn.net/Articles/974578/ goes in depth and talks about
Postgres folks looking for this, since PostgreSQL uses buffered I/O for
its database writes.

-ritesh
Re: [RFC PATCH 0/8] xfs: single block atomic writes for buffered IO
Posted by Dave Chinner 2 months, 3 weeks ago
On Thu, Nov 13, 2025 at 11:12:49AM +0530, Ritesh Harjani wrote:
> Christoph Hellwig <hch@lst.de> writes:
> 
> > On Thu, Nov 13, 2025 at 08:56:56AM +1100, Dave Chinner wrote:
> >> On Wed, Nov 12, 2025 at 04:36:03PM +0530, Ojaswin Mujoo wrote:
> >> > This patch adds support to perform single block RWF_ATOMIC writes for
> >> > iomap xfs buffered IO. This builds upon the inital RFC shared by John
> >> > Garry last year [1]. Most of the details are present in the respective 
> >> > commit messages but I'd mention some of the design points below:
> >> 
> >> What is the use case for this functionality? i.e. what is the
> >> reason for adding all this complexity?
> >
> > Seconded.  The atomic code has a lot of complexity, and further mixing
> > it with buffered I/O makes this even worse.  We'd need a really important
> > use case to even consider it.
> 
> I agree this should have been in the cover letter itself. 
> 
> I believe the reason for adding this functionality was also discussed at
> LSFMM too...  
> 
> For e.g. https://lwn.net/Articles/974578/ goes in depth and talks about
> Postgres folks looking for this, since PostgreSQL databases uses
> buffered I/O for their database writes.

Pointing at a discussion about how "this application has some ideas
on how it can maybe use it someday in the future" isn't a
particularly good justification. This still sounds more like a
research project than something a production system needs right now.

Why didn't you use the existing COW buffered write IO path to
implement atomic semantics for buffered writes? The XFS
functionality is already all there, and it doesn't require any
changes to the page cache or iomap to support...

-Dave.
-- 
Dave Chinner
david@fromorbit.com
Re: [RFC PATCH 0/8] xfs: single block atomic writes for buffered IO
Posted by Ojaswin Mujoo 2 months, 3 weeks ago
On Thu, Nov 13, 2025 at 09:32:11PM +1100, Dave Chinner wrote:
> On Thu, Nov 13, 2025 at 11:12:49AM +0530, Ritesh Harjani wrote:
> > Christoph Hellwig <hch@lst.de> writes:
> > 
> > > On Thu, Nov 13, 2025 at 08:56:56AM +1100, Dave Chinner wrote:
> > >> On Wed, Nov 12, 2025 at 04:36:03PM +0530, Ojaswin Mujoo wrote:
> > >> > This patch adds support to perform single block RWF_ATOMIC writes for
> > >> > iomap xfs buffered IO. This builds upon the inital RFC shared by John
> > >> > Garry last year [1]. Most of the details are present in the respective 
> > >> > commit messages but I'd mention some of the design points below:
> > >> 
> > >> What is the use case for this functionality? i.e. what is the
> > >> reason for adding all this complexity?
> > >
> > > Seconded.  The atomic code has a lot of complexity, and further mixing
> > > it with buffered I/O makes this even worse.  We'd need a really important
> > > use case to even consider it.
> > 
> > I agree this should have been in the cover letter itself. 
> > 
> > I believe the reason for adding this functionality was also discussed at
> > LSFMM too...  
> > 
> > For e.g. https://lwn.net/Articles/974578/ goes in depth and talks about
> > Postgres folks looking for this, since PostgreSQL databases uses
> > buffered I/O for their database writes.
> 
> Pointing at a discussion about how "this application has some ideas
> on how it can maybe use it someday in the future" isn't a
> particularly good justification. This still sounds more like a
> research project than something a production system needs right now.

Hi Dave, Christoph,

There were some discussions around use cases for buffered atomic writes
at the previous LSFMM, covered by LWN here [1]. AFAIK, there are
databases that recommend/prefer buffered IO over direct IO. As mentioned
in the article, MongoDB is one that supports both but recommends
buffered IO. Further, many DBs support both direct IO and buffered IO
well, and it may not be fair to force them to stick to direct IO to get
the benefits of atomic writes.

[1] https://lwn.net/Articles/1016015/
> 
> Why didn't you use the existing COW buffered write IO path to
> implement atomic semantics for buffered writes? The XFS
> functionality is already all there, and it doesn't require any
> changes to the page cache or iomap to support...

This patch set focuses on HW-accelerated single-block atomic writes with
buffered IO, to get some early reviews on the core design.

Just like we did for direct IO atomic writes, the software fallback with
COW and multi-block support can be added eventually.

Regards,
ojaswin

> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Re: [RFC PATCH 0/8] xfs: single block atomic writes for buffered IO
Posted by Dave Chinner 2 months, 3 weeks ago
On Fri, Nov 14, 2025 at 02:50:25PM +0530, Ojaswin Mujoo wrote:
> On Thu, Nov 13, 2025 at 09:32:11PM +1100, Dave Chinner wrote:
> > On Thu, Nov 13, 2025 at 11:12:49AM +0530, Ritesh Harjani wrote:
> > > Christoph Hellwig <hch@lst.de> writes:
> > > 
> > > > On Thu, Nov 13, 2025 at 08:56:56AM +1100, Dave Chinner wrote:
> > > >> On Wed, Nov 12, 2025 at 04:36:03PM +0530, Ojaswin Mujoo wrote:
> > > >> > This patch adds support to perform single block RWF_ATOMIC writes for
> > > >> > iomap xfs buffered IO. This builds upon the inital RFC shared by John
> > > >> > Garry last year [1]. Most of the details are present in the respective 
> > > >> > commit messages but I'd mention some of the design points below:
> > > >> 
> > > >> What is the use case for this functionality? i.e. what is the
> > > >> reason for adding all this complexity?
> > > >
> > > > Seconded.  The atomic code has a lot of complexity, and further mixing
> > > > it with buffered I/O makes this even worse.  We'd need a really important
> > > > use case to even consider it.
> > > 
> > > I agree this should have been in the cover letter itself. 
> > > 
> > > I believe the reason for adding this functionality was also discussed at
> > > LSFMM too...  
> > > 
> > > For e.g. https://lwn.net/Articles/974578/ goes in depth and talks about
> > > Postgres folks looking for this, since PostgreSQL databases uses
> > > buffered I/O for their database writes.
> > 
> > Pointing at a discussion about how "this application has some ideas
> > on how it can maybe use it someday in the future" isn't a
> > particularly good justification. This still sounds more like a
> > research project than something a production system needs right now.
> 
> Hi Dave, Christoph,
> 
> There were some discussions around use cases for buffered atomic writes
> in the previous LSFMM covered by LWN here [1]. AFAIK, there are 
> databases that recommend/prefer buffered IO over direct IO. As mentioned
> in the article, MongoDB being one that supports both but recommends
> buffered IO. Further, many DBs support both direct IO and buffered IO
> well and it may not be fair to force them to stick to direct IO to get
> the benefits of atomic writes.
> 
> [1] https://lwn.net/Articles/1016015/

You are quoting a discussion about atomic writes that was
held without any XFS developers present. Given how XFS has driven
atomic write functionality so far, XFS developers might have some
..... opinions about how buffered atomic writes should work in XFS...

Indeed, go back to the 2024 buffered atomic IO LSFMM discussion,
where there were XFS developers present. That's the discussion that
Ritesh referenced, so you should be aware of it.

https://lwn.net/Articles/974578/

Back then I talked about how atomic writes made no sense as
-writeback IO- given the massive window for anything else to modify
the data in the page cache. There is no guarantee that what the
application wrote in the syscall is what gets written to disk with
writeback IO. i.e. anything that can access the page cache can
"tear" application data that is staged as "atomic data" for later
writeback.

IOWs, the concept of atomic writes for writeback IO makes almost no
sense at all - dirty data at rest in the page cache is not protected
against 3rd party access or modification. The "atomic data IO"
semantics can only exist in the submitting IO context where
exclusive access to the user data can be guaranteed.

IMO, the only semantics that make sense for buffered atomic
writes through the page cache are write-through IO. The "atomic"
context is related directly to the user data provided at IO submission,
and so the submitted IO must guarantee that exactly that data is being
written to disk in that IO.

IOWs, we have to guarantee exclusive access between the data copy-in
and the pages being marked for writeback. The mapping needs to be
marked as using stable pages to prevent anyone else changing the
cached data whilst it has an atomic IO pending on it.

That means folios covering atomic IO ranges do not sit in the page
cache in a dirty state - they *must* immediately transition to the
writeback state before the folio is unlocked so that *nothing else
can modify them* before the physical REQ_ATOMIC IO is submitted and
completed.

If we've got the folios marked as writeback, we can pack them
immediately into a bio and submit the IO (e.g. via the iomap DIO
code). There is no need to involve the buffered IO writeback path
here; we've already got the folios at hand and in the right state
for IO. Once the IO is done, we end writeback on them and they
remain clean in the page cache for anyone else to access and
modify...
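
i.e. on the submission side, conceptually something like the fragment
below (hand-waving the extent lookup; folio, bdev, sector, pos and len
are assumed to come from the surrounding write path, and async
completion would end writeback from bi_end_io instead):

	/* folio is locked and the user data has just been copied in */
	folio_start_writeback(folio);	/* keep it stable: nobody else can dirty it */
	folio_unlock(folio);

	bio = bio_alloc(bdev, 1, REQ_OP_WRITE | REQ_ATOMIC | REQ_SYNC, GFP_NOFS);
	bio->bi_iter.bi_sector = sector;	/* from the pre-allocated mapping */
	bio_add_folio_nofail(bio, folio, len, offset_in_folio(folio, pos));
	submit_bio_wait(bio);
	bio_put(bio);

	folio_end_writeback(folio);	/* data stays clean in the page cache */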

This gives us the same physical IO semantics for buffered and direct
atomic IO, and it allows the same software fallbacks for larger IO
to be used as well.

> > Why didn't you use the existing COW buffered write IO path to
> > implement atomic semantics for buffered writes? The XFS
> > functionality is already all there, and it doesn't require any
> > changes to the page cache or iomap to support...
> 
> This patch set focuses on HW accelerated single block atomic writes with
> buffered IO, to get some early reviews on the core design.

What hardware acceleration? Hardware atomic writes do not make
IO faster; they only change IO failure semantics in certain corner
cases. Making buffered writeback IO use REQ_ATOMIC does not change
the failure semantics of buffered writeback from the point of view
of an application; the application still has no idea just how much
data or which files lost data when the system crashes.

Further, writeback does not retain application write ordering, so
the application also has no control over the order that structured
data is updated on physical media.  Hence if the application needs
specific IO ordering for crash recovery (e.g. to avoid using a WAL)
it cannot use background buffered writeback for atomic writes
because that does not guarantee ordering.

What happens when you do two atomic buffered writes to the same file
range? The second one hits the page cache, so now the crash recovery
semantic is no longer "old or new", it's "some random older version
or new". If the application rewrites a range frequently enough,
on-disk updates could skip dozens of versions between "old" and
"new", whilst other ranges of the file move one version at a time.
The application has -zero control- of this behaviour because it is
background writeback that determines when something gets written to
disk, not the application.

IOWs, the only way to guarantee single version "old or new" atomic
buffered overwrites for any given write would be to force flushing
of the data post-write() completion.  That means either O_DSYNC,
fdatasync() or sync_file_range(). And this turns the atomic writes
into -write-through- IO, not write back IO...
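
From the application side that pattern is simply something like the
following (illustrative fragment; fd, buf, len and offset assumed):

	struct iovec iov = { .iov_base = buf, .iov_len = len };

	/* atomic write that is durable before the syscall returns */
	pwritev2(fd, &iov, 1, offset, RWF_ATOMIC | RWF_DSYNC);

	/* or: pwritev2(fd, &iov, 1, offset, RWF_ATOMIC); fdatasync(fd); */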

> Just like we did for direct IO atomic writes, the software fallback with
> COW and multi block support can be added eventually.

If the reason for this functionality is "maybe someone
can use it in future", then you're not implementing this
functionality to optimise an existing workload. It's a research
project looking for a user.

Work with the database engineers to build a buffered atomic write
based engine that implements atomic writes with RWF_DSYNC.
Make it work, and optimise it to be competitive with existing
database engines, and then show how much faster it is using
RWF_ATOMIC buffered writes.

Alternatively - write an algorithm that assumes the filesystem is
using COW for overwrites, and optimise the data integrity algorithm
based on this knowledge. e.g. use always-cow mode on XFS, or just
optimise for normal bcachefs or btrfs buffered writes. Use O_DSYNC
when completion to submission ordering is required. Now you have
an application algorithm that is optimised for old-or-new behaviour,
and that can then be accelerated on overwrite-in-place capable
filesystems by using a direct-to-hw REQ_ATOMIC overwrite to provide
old-or-new semantics instead of using COW.

Yes, there are corner cases - partial writeback, fragmented files,
etc - where data will be a mix of old and new when using COW without
RWF_DSYNC.  Those are the cases that RWF_ATOMIC needs to
mitigate, but we don't need wacky page cache and writeback stuff to
implement RWF_ATOMIC semantics in COW capable filesystems.

i.e. enhance the applications to take advantage of native COW
old-or-new data semantics for buffered writes, then we can look at
direct-to-hw fast paths to optimise those algorithms.

Trying to go direct-to-hw first without having any clue of how
applications are going to use such functionality is backwards.
Design the application-level code that needs highly performant
old-or-new buffered write guarantees, then we can optimise the data
paths for it...

-Dave.
-- 
Dave Chinner
david@fromorbit.com
Re: [RFC PATCH 0/8] xfs: single block atomic writes for buffered IO
Posted by Ojaswin Mujoo 2 months, 2 weeks ago
On Sun, Nov 16, 2025 at 07:11:50PM +1100, Dave Chinner wrote:
> On Fri, Nov 14, 2025 at 02:50:25PM +0530, Ojaswin Mujoo wrote:
> > On Thu, Nov 13, 2025 at 09:32:11PM +1100, Dave Chinner wrote:
> > > On Thu, Nov 13, 2025 at 11:12:49AM +0530, Ritesh Harjani wrote:
> > > > Christoph Hellwig <hch@lst.de> writes:
> > > > 
> > > > > On Thu, Nov 13, 2025 at 08:56:56AM +1100, Dave Chinner wrote:
> > > > >> On Wed, Nov 12, 2025 at 04:36:03PM +0530, Ojaswin Mujoo wrote:
> > > > >> > This patch adds support to perform single block RWF_ATOMIC writes for
> > > > >> > iomap xfs buffered IO. This builds upon the inital RFC shared by John
> > > > >> > Garry last year [1]. Most of the details are present in the respective 
> > > > >> > commit messages but I'd mention some of the design points below:
> > > > >> 
> > > > >> What is the use case for this functionality? i.e. what is the
> > > > >> reason for adding all this complexity?
> > > > >
> > > > > Seconded.  The atomic code has a lot of complexity, and further mixing
> > > > > it with buffered I/O makes this even worse.  We'd need a really important
> > > > > use case to even consider it.
> > > > 
> > > > I agree this should have been in the cover letter itself. 
> > > > 
> > > > I believe the reason for adding this functionality was also discussed at
> > > > LSFMM too...  
> > > > 
> > > > For e.g. https://lwn.net/Articles/974578/ goes in depth and talks about
> > > > Postgres folks looking for this, since PostgreSQL databases uses
> > > > buffered I/O for their database writes.
> > > 
> > > Pointing at a discussion about how "this application has some ideas
> > > on how it can maybe use it someday in the future" isn't a
> > > particularly good justification. This still sounds more like a
> > > research project than something a production system needs right now.
> > 
> > Hi Dave, Christoph,
> > 
> > There were some discussions around use cases for buffered atomic writes
> > in the previous LSFMM covered by LWN here [1]. AFAIK, there are 
> > databases that recommend/prefer buffered IO over direct IO. As mentioned
> > in the article, MongoDB being one that supports both but recommends
> > buffered IO. Further, many DBs support both direct IO and buffered IO
> > well and it may not be fair to force them to stick to direct IO to get
> > the benefits of atomic writes.
> > 
> > [1] https://lwn.net/Articles/1016015/
> 
> You are quoting a discussion about atomic writes that was
> held without any XFS developers present. Given how XFS has driven
> atomic write functionality so far, XFS developers might have some
> ..... opinions about how buffered atomic writes in XFS...
> 
> Indeed, go back to the 2024 buffered atomic IO LSFMM discussion,
> where there were XFS developers present. That's the discussion that
> Ritesh referenced, so you should be aware of it.
> 
> https://lwn.net/Articles/974578/
> 
> Back then I talked about how atomic writes made no sense as
> -writeback IO- given the massive window for anything else to modify
> the data in the page cache. There is no guarantee that what the
> application wrote in the syscall is what gets written to disk with
> writeback IO. i.e. anything that can access the page cache can
> "tear" application data that is staged as "atomic data" for later
> writeback.
> 
> IOWs, the concept of atomic writes for writeback IO makes almost no
> sense at all - dirty data at rest in the page cache is not protected
> against 3rd party access or modification. The "atomic data IO"
> semantics can only exist in the submitting IO context where
> exclusive access to the user data can be guaranteed.
> 
> IMO, the only way semantics that makes sense for buffered atomic
> writes through the page cache is write-through IO. The "atomic"
> context is related directly to user data provided at IO submission,
> and so IO submitted must guarantee exactly that data is being
> written to disk in that IO.
> 
> IOWs, we have to guarantee exclusive access between the data copy-in
> and the pages being marked for writeback. The mapping needs to be
> marked as using stable pages to prevent anyone else changing the
> cached data whilst it has an atomic IO pending on it.
> 
> That means folios covering atomic IO ranges do not sit in the page
> cache in a dirty state - they *must* immediately transition to the
> writeback state before the folio is unlocked so that *nothing else
> can modify them* before the physical REQ_ATOMIC IO is submitted and
> completed.
> 
> If we've got the folios marked as writeback, we can pack them
> immediately into a bio and submit the IO (e.g. via the iomap DIO
> code). There is no need to involve the buffered IO writeback path
> here; we've already got the folios at hand and in the right state
> for IO. Once the IO is done, we end writeback on them and they
> remain clean in the page caceh for anyone else to access and
> modify...

Hi Dave,

I believe the essence of your comment is that the data in the page
cache can be modified between write and writeback time, and hence
it makes sense to have write-through-only semantics for RWF_ATOMIC
buffered IO.

However, as per various discussions around this on the mailing list, my
understanding is that protecting against tearing caused by an application
modifying a data range that was previously written atomically is
something that falls outside the scope of RWF_ATOMIC.

As John pointed out in [1], even with DIO, RWF_ATOMIC writes can be torn
if the application issues parallel overlapping writes. The only thing we
guarantee is that the data doesn't tear when the actual IO happens, and
from there it is userspace's responsibility not to change the data until
the IO completes [2]. I believe userspace changing data between write and
writeback time falls in the same category.


[1] https://lore.kernel.org/fstests/0af205d9-6093-4931-abe9-f236acae8d44@oracle.com/
[2] https://lore.kernel.org/fstests/20250729144526.GB2672049@frogsfrogsfrogs/

> 
> This gives us the same physical IO semantics for buffered and direct
> atomic IO, and it allows the same software fallbacks for larger IO
> to be used as well.
> 
> > > Why didn't you use the existing COW buffered write IO path to
> > > implement atomic semantics for buffered writes? The XFS
> > > functionality is already all there, and it doesn't require any
> > > changes to the page cache or iomap to support...
> > 
> > This patch set focuses on HW accelerated single block atomic writes with
> > buffered IO, to get some early reviews on the core design.
> 
> What hardware acceleration? Hardware atomic writes are do not make
> IO faster; they only change IO failure semantics in certain corner
> cases. Making buffered writeback IO use REQ_ATOMIC does not change
> the failure semantics of buffered writeback from the point of view
> of an application; the applicaiton still has no idea just how much
> data or what files lost data whent eh system crashes.
> 
> Further, writeback does not retain application write ordering, so
> the application also has no control over the order that structured
> data is updated on physical media.  Hence if the application needs
> specific IO ordering for crash recovery (e.g. to avoid using a WAL)
> it cannot use background buffered writeback for atomic writes
> because that does not guarantee ordering.
> 
> What happens when you do two atomic buffered writes to the same file
> range? The second on hits the page cache, so now the crash recovery
> semantic is no longer "old or new", it's "some random older version
> or new". If the application rewrites a range frequently enough,
> on-disk updates could skip dozens of versions between "old" and
> "new", whilst other ranges of the file move one version at a time.
> The application has -zero control- of this behaviour because it is
> background writeback that determines when something gets written to
> disk, not the application.
> 
> IOWs, the only way to guarantee single version "old or new" atomic
> buffered overwrites for any given write would be to force flushing
> of the data post-write() completion.  That means either O_DSYNC,
> fdatasync() or sync_file_range(). And this turns the atomic writes
> into -write-through- IO, not write back IO...

I agree that there is no ordering guarantee without calls to sync and
friends, but as with all other IO paths, it has always been the
application that needs to enforce the ordering. Applications like DBs
are well aware of this; however, there are still areas where they can
benefit from unordered atomic IO, e.g. background writes of a bunch of
dirty buffers, which only need to be synced once during a checkpoint.

> 
> > Just like we did for direct IO atomic writes, the software fallback with
> > COW and multi block support can be added eventually.
> 
> If the reason for this functionality is "maybe someone
> can use it in future", then you're not implementing this
> functionality to optimise an existing workload. It's a research
> project looking for a user.
> 
> Work with the database engineers to build a buffered atomic write
> based engine that implements atomic writes with RWF_DSYNC.
> Make it work, and optimise it to be competitive with existing
> database engines, than then show how much faster it is using
> RWF_ATOMIC buffered writes.
> 
> Alternatively - write an algorithm that assumes the filesystem is
> using COW for overwrites, and optimise the data integrity algorithm
> based on this knowledge. e.g. use always-cow mode on XFS, or just
> optimise for normal bcachefs or btrfs buffered writes. Use O_DSYNC
> when completion to submission ordering is required. Now you have
> an application algorithm that is optimised for old-or-new behaviour,
> and that can then be acclerated on overwrite-in-place capable
> filesystems by using a direct-to-hw REQ_ATOMIC overwrite to provide
> old-or-new semantics instead of using COW.
> 
> Yes, there are corner cases - partial writeback, fragmented files,
> etc - where data will a mix of old and new when using COW without
> RWF_DSYNC.  Those are the the cases that RWF_ATOMIC needs to
> mitigate, but we don't need whacky page cache and writeback stuff to
> implement RWF_ATOMIC semantics in COW capable filesystems.
> 
> i.e. enhance the applicaitons to take advantage of native COW
> old-or-new data semantics for buffered writes, then we can look at
> direct-to-hw fast paths to optimise those algorithms.
> 
> Trying to go direct-to-hw first without having any clue of how
> applications are going to use such functionality is backwards.
> Design the applicaiton level code that needs highly performant
> old-or-new buffered write guarantees, then we can optimise the data
> paths for it...

Got it, thanks for the pointers Dave, we will look into this.

Regards,
ojaswin

> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Re: [RFC PATCH 0/8] xfs: single block atomic writes for buffered IO
Posted by John Garry 2 months, 3 weeks ago
On 16/11/2025 08:11, Dave Chinner wrote:
>> This patch set focuses on HW accelerated single block atomic writes with
>> buffered IO, to get some early reviews on the core design.
> What hardware acceleration? Hardware atomic writes are do not make
> IO faster; they only change IO failure semantics in certain corner
> cases.

I think that he is referring to using a REQ_ATOMIC-based bio vs xfs
software-based atomic writes (which reuse the CoW infrastructure). And
the former is considerably faster in my testing (for DIO, obviously). But
the latter has not been optimized.
Re: [RFC PATCH 0/8] xfs: single block atomic writes for buffered IO
Posted by Dave Chinner 2 months, 3 weeks ago
On Mon, Nov 17, 2025 at 10:59:55AM +0000, John Garry wrote:
> On 16/11/2025 08:11, Dave Chinner wrote:
> > > This patch set focuses on HW accelerated single block atomic writes with
> > > buffered IO, to get some early reviews on the core design.
> > What hardware acceleration? Hardware atomic writes are do not make
> > IO faster; they only change IO failure semantics in certain corner
> > cases.
> 
> I think that he references using REQ_ATOMIC-based bio vs xfs software-based
> atomic writes (which reuse the CoW infrastructure). And the former is
> considerably faster from my testing (for DIO, obvs). But the latter has not
> been optimized.

For DIO, REQ_ATOMIC IO will generally be faster than the software
fallback because no page cache interactions or data copy is required
by the DIO REQ_ATOMIC fast path.

But we are considering buffered writes, which *must* do a data copy,
and so the behaviour and performance differential of doing a COW vs
trying to force writeback to do REQ_ATOMIC IO is going to be much
different.

Consider the way atomic buffered writes have been implemented
in writeback - by turning off all folio and IO merging.  This means
the writeback efficiency of atomic writes is going to be horrendous
compared to COW writes that don't use REQ_ATOMIC.

Further, REQ_ATOMIC buffered writes need to turn off delayed
allocation because if you can't allocate aligned extents then the
atomic write can *never* be performed. Hence we have to allocate up
front where we can return errors to userspace immediately, rather
than just reserve space and punt allocation to writeback. i.e. we
have to avoid the situation where we have dirty "atomic" data in the
page cache that cannot be written because physical allocation fails.

The likely outcome of turning off delalloc is that it further
degrades buffered atomic write writeback efficiency because it
removes the ability for the filesystem to optimise physical locality
of writeback IO. e.g. adjacent allocation across multiple small
files or packing of random writes in a single file to allow them to
merge at the block layer into one big IO...

REQ_ATOMIC is a natural fit for DIO because DIO is largely a "one
write syscall, one physical IO" style interface. Buffered writes,
OTOH, completely decouple application IO from physical IO, and so
there is no real "atomic" connection between the data being written
into the page cache and the physical IO that is performed at some
later time.

This decoupling of physical IO is what brings all the problems and
inefficiencies. The filesystem being able to mark the RWF_ATOMIC
write range as a COW range at submission time creates a natural
"atomic IO" behaviour without requiring the page cache or writeback
to even care that the data needs to be written atomically.

From there, we optimise the COW IO path to record that
the new COW extent was created for the purpose of an atomic write.
Then when we go to write back data over that extent, the filesystem
can choose to do a REQ_ATOMIC write to do an atomic overwrite instead
of allocating a new extent and swapping the BMBT extent pointers at
IO completion time.

We really don't care if 4x16kB adjacent RWF_ATOMIC writes are
submitted as 1x64kB REQ_ATOMIC IO or 4 individual 16kB REQ_ATOMIC
IOs. The former is much more efficient from an IO perspective, and
the COW path can actually optimise for this because it can track the
atomic write ranges in cache exactly. If the range is larger (or
unaligned) than what REQ_ATOMIC can handle, we use COW writeback to
optimise for maximum writeback bandwidth, otherwise we use
REQ_ATOMIC to optimise for minimum writeback submission and
completion overhead...

IOWs, I think that for XFS (and other COW-capable filesystems) we
should be looking at optimising the COW IO path to use REQ_ATOMIC
where appropriate to create a direct overwrite fast path for
RWF_ATOMIC buffered writes. This seems more natural and a lot less
intrusive than trying to blast through the page cache abstractions
to directly couple userspace IO boundaries to physical writeback IO
boundaries...

-Dave.
-- 
Dave Chinner
david@fromorbit.com
Re: [RFC PATCH 0/8] xfs: single block atomic writes for buffered IO
Posted by Ojaswin Mujoo 2 months, 2 weeks ago
On Tue, Nov 18, 2025 at 07:51:27AM +1100, Dave Chinner wrote:
> On Mon, Nov 17, 2025 at 10:59:55AM +0000, John Garry wrote:
> > On 16/11/2025 08:11, Dave Chinner wrote:
> > > > This patch set focuses on HW accelerated single block atomic writes with
> > > > buffered IO, to get some early reviews on the core design.
> > > What hardware acceleration? Hardware atomic writes are do not make
> > > IO faster; they only change IO failure semantics in certain corner
> > > cases.
> > 
> > I think that he references using REQ_ATOMIC-based bio vs xfs software-based
> > atomic writes (which reuse the CoW infrastructure). And the former is
> > considerably faster from my testing (for DIO, obvs). But the latter has not
> > been optimized.
> 

Hi Dave,
Thanks for the review and insights.

Going through the discussions in previous emails and this email, I
understand that there are 2 main points/approaches that you've
mentioned:

1. Using COW extents to track atomic ranges
   - Discussed inline below.

2. Using write-through for RWF_ATOMIC buffered IO (suggested in [1])
   [1] https://lore.kernel.org/linux-ext4/aRmHRk7FGD4nCT0s@dread.disaster.area/
   - I will respond inline in the above thread.

> For DIO, REQ_ATOMIC IO will generally be faster than the software
> fallback because no page cache interactions or data copy is required
> by the DIO REQ_ATOMIC fast path.
> 
> But we are considering buffered writes, which *must* do a data copy,
> and so the behaviour and performance differential of doing a COW vs
> trying to force writeback to do REQ_ATOMIC IO is going to be much
> different.
> 
> Consider that the way atomic buffered writes have been implemented
> in writeback - turning off all folio and IO merging.  This means
> writeback efficiency of atomic writes is going to be horrendous
> compared to COW writes that don't use REQ_ATOMIC.

Yes, I agree that it is a bit of overkill.

> 
> Further, REQ_ATOMIC buffered writes need to turn off delayed
> allocation because if you can't allocate aligned extents then the
> atomic write can *never* be performed. Hence we have to allocate up
> front where we can return errors to userspace immediately, rather
> than just reserve space and punt allocation to writeback. i.e. we
> have to avoid the situation where we have dirty "atomic" data in the
> page cache that cannot be written because physical allocation fails.
> 
> The likely outcome of turning off delalloc is that it further
> degrades buffered atomic write writeback efficiency because it
> removes the ability for the filesystem to optimise physical locality
> of writeback IO. e.g. adjacent allocation across multiple small
> files or packing of random writes in a single file to allow them to
> merge at the block layer into one big IO...
> 
> REQ_ATOMIC is a natural fit for DIO because DIO is largely a "one
> write syscall, one physical IO" style interface. Buffered writes,
> OTOH, completely decouples application IO from physical IO, and so
> there is no real "atomic" connection between the data being written
> into the page caceh and the physical IO that is performed at some
> time later.
> 
> This decoupling of physical IO is what brings all the problems and
> inefficiencies. The filesystem being able to mark the RWF_ATOMIC
> write range as a COW range at submission time creates a natural
> "atomic IO" behaviour without requiring the page cache or writeback
> to even care that the data needs to be written atomically.
> 
> From there, we optimise the COW IO path to record that
> the new COW extent was created for the purpose of an atomic write.
> Then when we go to write back data over that extent, the filesystem
> can chose to do a REQ_ATOMIC write to do an atomic overwrite instead
> of allocating a new extent and swapping the BMBT extent pointers at
> IO completion time.
> 
> We really don't care if 4x16kB adjacent RWF_ATOMIC writes are
> submitted as 1x64kB REQ_ATOMIC IO or 4 individual 16kB REQ_ATOMIC
> IOs. The former is much more efficient from an IO perspective, and
> the COW path can actually optimise for this because it can track the
> atomic write ranges in cache exactly. If the range is larger (or
> unaligned) than what REQ_ATOMIC can handle, we use COW writeback to
> optimise for maximum writeback bandwidth, otherwise we use
> REQ_ATOMIC to optimise for minimum writeback submission and
> completion overhead...

Okay, IIUC you are suggesting that, instead of tracking the atomic
ranges in the page cache and ifs, we move that to the filesystem. For
example, in XFS we can:

1. In the write iomap_begin path, for RWF_ATOMIC, create a COW extent
   and mark it as atomic.

2. Carry on with the memcpy to the folio and finish the write path.

3. During writeback, XFS can detect that there is an atomic COW extent.
   It can then:
   3.1 See that it is an overwrite that can be done with REQ_ATOMIC
       directly, or
   3.2 Else, finish the atomic IO in a software-emulated way, just like
       we do for direct IO currently.

I believe the above approach for XFS can also be extended to an FS like
ext4 without needing a COW range, as long as we can ensure that we always
meet the conditions for REQ_ATOMIC during writeback (for example by using
bigalloc for aligned extents and being careful not to cross the atomic
write limits).
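
Roughly, the step-3 decision could look something like the sketch below
(purely illustrative; the helper and the fallback name are made up, and
the real checks would come from the bdev/FS atomic write geometry):

/* Can this range be written back as a single REQ_ATOMIC IO? */
static bool can_use_req_atomic(loff_t pos, u64 len, u64 awu_min, u64 awu_max)
{
	if (len < awu_min || len > awu_max)
		return false;
	if (!is_power_of_2(len))
		return false;
	if (pos & (len - 1))			/* naturally aligned to its size */
		return false;
	return true;
}

	/* in writeback, once an extent marked "atomic" is found: */
	if (can_use_req_atomic(pos, len, awu_min, awu_max))
		bio->bi_opf |= REQ_ATOMIC;	/* 3.1: direct REQ_ATOMIC overwrite */
	else
		/* 3.2: software fallback - write new blocks via COW and remap
		 * the extent at IO completion, as the DIO path does today */
		xfs_atomic_cow_fallback();	/* hypothetical helper */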

> 
> IOWs, I think that for XFS (and other COW-capable filesystems) we
> should be looking at optimising the COW IO path to use REQ_ATOMIC
> where appropriate to create a direct overwrite fast path for
> RWF_ATOMIC buffered writes. This seems a more natural and a lot less
> intrusive than trying to blast through the page caceh abstractions
> to directly couple userspace IO boundaries to physical writeback IO
> boundaries...

I agree that this approach avoids bloating the page cache and ifs layers
with RWF_ATOMIC implementation details. That said, the task of managing
the atomic ranges is now pushed down to the FS and is no longer generic,
which might introduce friction when onboarding new FSes in the future.
Regardless, from the discussion, I believe at this point we are okay to
make that trade-off.

Let me take some time to look into the XFS COW paths and try to implement
this approach. Thanks for the suggestion!

Regards,
ojaswin

> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Re: [RFC PATCH 0/8] xfs: single block atomic writes for buffered IO
Posted by Matthew Wilcox 2 months, 3 weeks ago
On Fri, Nov 14, 2025 at 02:50:25PM +0530, Ojaswin Mujoo wrote:
> buffered IO. Further, many DBs support both direct IO and buffered IO
> well and it may not be fair to force them to stick to direct IO to get
> the benefits of atomic writes.

It may not be fair to force kernel developers to support a feature that
has no users.
Re: [RFC PATCH 0/8] xfs: single block atomic writes for buffered IO
Posted by Christoph Hellwig 2 months, 3 weeks ago
On Thu, Nov 13, 2025 at 11:12:49AM +0530, Ritesh Harjani wrote:
> For e.g. https://lwn.net/Articles/974578/ goes in depth and talks about
> Postgres folks looking for this, since PostgreSQL databases uses
> buffered I/O for their database writes.

Honestly, a database stubbornly using the wrong I/O path should not be
a reason for adding this complexity.

[syzbot ci] Re: xfs: single block atomic writes for buffered IO
Posted by syzbot ci 2 months, 3 weeks ago
syzbot ci has tested the following series

[v1] xfs: single block atomic writes for buffered IO
https://lore.kernel.org/all/cover.1762945505.git.ojaswin@linux.ibm.com
* [RFC PATCH 1/8] fs: Rename STATX{_ATTR}_WRITE_ATOMIC -> STATX{_ATTR}_WRITE_ATOMIC_DIO
* [RFC PATCH 2/8] mm: Add PG_atomic
* [RFC PATCH 3/8] fs: Add initial buffered atomic write support info to statx
* [RFC PATCH 4/8] iomap: buffered atomic write support
* [RFC PATCH 5/8] iomap: pin pages for RWF_ATOMIC buffered write
* [RFC PATCH 6/8] xfs: Report atomic write min and max for buf io as well
* [RFC PATCH 7/8] iomap: Add bs<ps buffered atomic writes support
* [RFC PATCH 8/8] xfs: Lift the bs == ps restriction for HW buffered atomic writes

and found the following issue:
KASAN: slab-out-of-bounds Read in __bitmap_clear

Full report is available here:
https://ci.syzbot.org/series/430a088a-50e2-46d3-87ff-a1f0fa67b66c

***

KASAN: slab-out-of-bounds Read in __bitmap_clear

tree:      linux-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base:      ab40c92c74c6b0c611c89516794502b3a3173966
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/02d3e137-5d7e-4c95-8f32-43b8663d95df/config
C repro:   https://ci.syzbot.org/findings/92a3582f-40a6-4936-8fcd-dc55c447a432/c_repro
syz repro: https://ci.syzbot.org/findings/92a3582f-40a6-4936-8fcd-dc55c447a432/syz_repro

==================================================================
BUG: KASAN: slab-out-of-bounds in __bitmap_clear+0x155/0x180 lib/bitmap.c:395
Read of size 8 at addr ffff88816ced7cd0 by task kworker/0:1/10

CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Workqueue: xfs-conv/loop0 xfs_end_io
Call Trace:
 <TASK>
 dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xca/0x240 mm/kasan/report.c:482
 kasan_report+0x118/0x150 mm/kasan/report.c:595
 __bitmap_clear+0x155/0x180 lib/bitmap.c:395
 bitmap_clear include/linux/bitmap.h:496 [inline]
 ifs_clear_range_atomic fs/iomap/buffered-io.c:241 [inline]
 iomap_clear_range_atomic+0x25c/0x630 fs/iomap/buffered-io.c:268
 iomap_finish_folio_write+0x2f0/0x410 fs/iomap/buffered-io.c:1971
 iomap_finish_ioend_buffered+0x223/0x5e0 fs/iomap/ioend.c:58
 iomap_finish_ioends+0x116/0x2b0 fs/iomap/ioend.c:295
 xfs_end_ioend+0x50b/0x690 fs/xfs/xfs_aops.c:168
 xfs_end_io+0x253/0x2d0 fs/xfs/xfs_aops.c:205
 process_one_work+0x94a/0x15d0 kernel/workqueue.c:3267
 process_scheduled_works kernel/workqueue.c:3350 [inline]
 worker_thread+0x9b0/0xee0 kernel/workqueue.c:3431
 kthread+0x711/0x8a0 kernel/kthread.c:463
 ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>

Allocated by task 5952:
 kasan_save_stack mm/kasan/common.c:56 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
 poison_kmalloc_redzone mm/kasan/common.c:397 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:414
 kasan_kmalloc include/linux/kasan.h:262 [inline]
 __do_kmalloc_node mm/slub.c:5672 [inline]
 __kmalloc_noprof+0x41d/0x800 mm/slub.c:5684
 kmalloc_noprof include/linux/slab.h:961 [inline]
 kzalloc_noprof include/linux/slab.h:1094 [inline]
 ifs_alloc+0x1e4/0x530 fs/iomap/buffered-io.c:356
 iomap_writeback_folio+0x81c/0x26a0 fs/iomap/buffered-io.c:2084
 iomap_writepages+0x162/0x2d0 fs/iomap/buffered-io.c:2168
 xfs_vm_writepages+0x28a/0x300 fs/xfs/xfs_aops.c:701
 do_writepages+0x32e/0x550 mm/page-writeback.c:2598
 filemap_writeback mm/filemap.c:387 [inline]
 filemap_fdatawrite_range mm/filemap.c:412 [inline]
 file_write_and_wait_range+0x23e/0x340 mm/filemap.c:786
 xfs_file_fsync+0x195/0x800 fs/xfs/xfs_file.c:137
 generic_write_sync include/linux/fs.h:2639 [inline]
 xfs_file_buffered_write+0x723/0x8a0 fs/xfs/xfs_file.c:1015
 do_iter_readv_writev+0x623/0x8c0 fs/read_write.c:-1
 vfs_writev+0x31a/0x960 fs/read_write.c:1057
 do_pwritev fs/read_write.c:1153 [inline]
 __do_sys_pwritev2 fs/read_write.c:1211 [inline]
 __se_sys_pwritev2+0x179/0x290 fs/read_write.c:1202
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff88816ced7c80
 which belongs to the cache kmalloc-96 of size 96
The buggy address is located 0 bytes to the right of
 allocated 80-byte region [ffff88816ced7c80, ffff88816ced7cd0)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x16ced7
flags: 0x57ff00000000000(node=1|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 057ff00000000000 ffff888100041280 dead000000000100 dead000000000122
raw: 0000000000000000 0000000080200020 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x252800(GFP_NOWAIT|__GFP_NORETRY|__GFP_COMP|__GFP_THISNODE), pid 1, tgid 1 (swapper/0), ts 12041529441, free_ts 0
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x240/0x2a0 mm/page_alloc.c:1851
 prep_new_page mm/page_alloc.c:1859 [inline]
 get_page_from_freelist+0x2365/0x2440 mm/page_alloc.c:3920
 __alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5209
 alloc_slab_page mm/slub.c:3086 [inline]
 allocate_slab+0x71/0x350 mm/slub.c:3257
 new_slab mm/slub.c:3311 [inline]
 ___slab_alloc+0xf56/0x1990 mm/slub.c:4671
 __slab_alloc+0x65/0x100 mm/slub.c:4794
 __slab_alloc_node mm/slub.c:4870 [inline]
 slab_alloc_node mm/slub.c:5266 [inline]
 __kmalloc_cache_node_noprof+0x4b7/0x6f0 mm/slub.c:5799
 kmalloc_node_noprof include/linux/slab.h:983 [inline]
 alloc_node_nr_active kernel/workqueue.c:4908 [inline]
 __alloc_workqueue+0x6a9/0x1b80 kernel/workqueue.c:5762
 alloc_workqueue_noprof+0xd4/0x210 kernel/workqueue.c:5822
 nbd_dev_add+0x4f1/0xae0 drivers/block/nbd.c:1961
 nbd_init+0x168/0x1f0 drivers/block/nbd.c:2691
 do_one_initcall+0x25a/0x860 init/main.c:1378
 do_initcall_level+0x104/0x190 init/main.c:1440
 do_initcalls+0x59/0xa0 init/main.c:1456
 kernel_init_freeable+0x334/0x4b0 init/main.c:1688
 kernel_init+0x1d/0x1d0 init/main.c:1578
page_owner free stack trace missing

Memory state around the buggy address:
 ffff88816ced7b80: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
 ffff88816ced7c00: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
>ffff88816ced7c80: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
                                                 ^
 ffff88816ced7d00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff88816ced7d80: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
==================================================================


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.