[PATCH 0/2] fsdax,xfs: fix warning messages

Shiyang Ruan posted 2 patches 1 week, 3 days ago
There is a newer version of this series
fs/dax.c             | 166 ++++++++++++++++++++++++++++++-------------
fs/xfs/xfs_iomap.c   |   6 +-
fs/xfs/xfs_reflink.c |   8 ++-
include/linux/dax.h  |   2 +
4 files changed, 129 insertions(+), 53 deletions(-)
[PATCH 0/2] fsdax,xfs: fix warning messages
Posted by Shiyang Ruan 1 week, 3 days ago
Many testcases failed in dax+reflink mode with warning message in dmesg.
This also effects dax+noreflink mode if we run the test after a
dax+reflink test.  So, the most urgent thing is solving the warning
messages.

Patch 1 fixes some mistakes and adds handling of CoW cases not
previously considered (srcmap is HOLE or UNWRITTEN).
Patch 2 adds the implementation of unshare for fsdax.

With these fixes, most warning messages in dax_associate_entry() are
gone.  But honestly, generic/388 will randomly failed with the warning.
The case shutdown the xfs when fsstress is running, and do it for many
times.  I think the reason is that dax pages in use are not able to be
invalidated in time when fs is shutdown.  The next time dax page to be
associated, it still remains the mapping value set last time.  I'll keep
on solving it.

The warning message in dax_writeback_one() can also be fixed because of
the dax unshare.


Shiyang Ruan (2):
  fsdax,xfs: fix warning messages at dax_[dis]associate_entry()
  fsdax,xfs: port unshare to fsdax

 fs/dax.c             | 166 ++++++++++++++++++++++++++++++-------------
 fs/xfs/xfs_iomap.c   |   6 +-
 fs/xfs/xfs_reflink.c |   8 ++-
 include/linux/dax.h  |   2 +
 4 files changed, 129 insertions(+), 53 deletions(-)

-- 
2.38.1
RE: [PATCH 0/2] fsdax,xfs: fix warning messages
Posted by Dan Williams 4 days, 13 hours ago
[ add Andrew ]

Shiyang Ruan wrote:
> Many testcases failed in dax+reflink mode with warning message in dmesg.
> This also effects dax+noreflink mode if we run the test after a
> dax+reflink test.  So, the most urgent thing is solving the warning
> messages.
> 
> Patch 1 fixes some mistakes and adds handling of CoW cases not
> previously considered (srcmap is HOLE or UNWRITTEN).
> Patch 2 adds the implementation of unshare for fsdax.
> 
> With these fixes, most warning messages in dax_associate_entry() are
> gone.  But honestly, generic/388 will randomly failed with the warning.
> The case shutdown the xfs when fsstress is running, and do it for many
> times.  I think the reason is that dax pages in use are not able to be
> invalidated in time when fs is shutdown.  The next time dax page to be
> associated, it still remains the mapping value set last time.  I'll keep
> on solving it.
> 
> The warning message in dax_writeback_one() can also be fixed because of
> the dax unshare.

Thank you for digging in on this, I had been pinned down on CXL tasks
and worried that we would need to mark FS_DAX broken for a cycle, so
this is timely.

My only concern is that these patches look to have significant collisions with
the fsdax page reference counting reworks pending in linux-next. Although,
those are still sitting in mm-unstable:

http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org

My preference would be to move ahead with both in which case I can help
rebase these fixes on top. In that scenario everything would go through
Andrew.

However, if we are getting too late in the cycle for that path I think
these dax-fixes take precedence, and one more cycle to let the page
reference count reworks sit is ok.

> Shiyang Ruan (2):
>   fsdax,xfs: fix warning messages at dax_[dis]associate_entry()
>   fsdax,xfs: port unshare to fsdax
> 
>  fs/dax.c             | 166 ++++++++++++++++++++++++++++++-------------
>  fs/xfs/xfs_iomap.c   |   6 +-
>  fs/xfs/xfs_reflink.c |   8 ++-
>  include/linux/dax.h  |   2 +
>  4 files changed, 129 insertions(+), 53 deletions(-)
> 
> -- 
> 2.38.1
Re: [PATCH 0/2] fsdax,xfs: fix warning messages
Posted by Andrew Morton 3 days, 20 hours ago
On Tue, 29 Nov 2022 19:59:14 -0800 Dan Williams <dan.j.williams@intel.com> wrote:

> [ add Andrew ]
> 
> Shiyang Ruan wrote:
> > Many testcases failed in dax+reflink mode with warning message in dmesg.
> > This also effects dax+noreflink mode if we run the test after a
> > dax+reflink test.  So, the most urgent thing is solving the warning
> > messages.
> > 
> > Patch 1 fixes some mistakes and adds handling of CoW cases not
> > previously considered (srcmap is HOLE or UNWRITTEN).
> > Patch 2 adds the implementation of unshare for fsdax.
> > 
> > With these fixes, most warning messages in dax_associate_entry() are
> > gone.  But honestly, generic/388 will randomly failed with the warning.
> > The case shutdown the xfs when fsstress is running, and do it for many
> > times.  I think the reason is that dax pages in use are not able to be
> > invalidated in time when fs is shutdown.  The next time dax page to be
> > associated, it still remains the mapping value set last time.  I'll keep
> > on solving it.
> > 
> > The warning message in dax_writeback_one() can also be fixed because of
> > the dax unshare.
> 
> Thank you for digging in on this, I had been pinned down on CXL tasks
> and worried that we would need to mark FS_DAX broken for a cycle, so
> this is timely.
> 
> My only concern is that these patches look to have significant collisions with
> the fsdax page reference counting reworks pending in linux-next. Although,
> those are still sitting in mm-unstable:
> 
> http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org

As far as I know, Dan's "Fix the DAX-gup mistake" series is somewhat
stuck.  Jan pointed out:

https://lore.kernel.org/all/20221109113849.p7pwob533ijgrytu@quack3/T/#u

or have Jason's issues since been addressed?

> My preference would be to move ahead with both in which case I can help
> rebase these fixes on top. In that scenario everything would go through
> Andrew.
> 
> However, if we are getting too late in the cycle for that path I think
> these dax-fixes take precedence, and one more cycle to let the page
> reference count reworks sit is ok.

That sounds a decent approach.  So we go with this series ("fsdax,xfs:
fix warning messages") and aim at 6.3-rc1 with "Fix the DAX-gup
mistake"?
Re: [PATCH 0/2] fsdax,xfs: fix warning messages
Posted by Dan Williams 3 days, 19 hours ago
Andrew Morton wrote:
> On Tue, 29 Nov 2022 19:59:14 -0800 Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > [ add Andrew ]
> > 
> > Shiyang Ruan wrote:
> > > Many testcases failed in dax+reflink mode with warning message in dmesg.
> > > This also effects dax+noreflink mode if we run the test after a
> > > dax+reflink test.  So, the most urgent thing is solving the warning
> > > messages.
> > > 
> > > Patch 1 fixes some mistakes and adds handling of CoW cases not
> > > previously considered (srcmap is HOLE or UNWRITTEN).
> > > Patch 2 adds the implementation of unshare for fsdax.
> > > 
> > > With these fixes, most warning messages in dax_associate_entry() are
> > > gone.  But honestly, generic/388 will randomly failed with the warning.
> > > The case shutdown the xfs when fsstress is running, and do it for many
> > > times.  I think the reason is that dax pages in use are not able to be
> > > invalidated in time when fs is shutdown.  The next time dax page to be
> > > associated, it still remains the mapping value set last time.  I'll keep
> > > on solving it.
> > > 
> > > The warning message in dax_writeback_one() can also be fixed because of
> > > the dax unshare.
> > 
> > Thank you for digging in on this, I had been pinned down on CXL tasks
> > and worried that we would need to mark FS_DAX broken for a cycle, so
> > this is timely.
> > 
> > My only concern is that these patches look to have significant collisions with
> > the fsdax page reference counting reworks pending in linux-next. Although,
> > those are still sitting in mm-unstable:
> > 
> > http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org
> 
> As far as I know, Dan's "Fix the DAX-gup mistake" series is somewhat
> stuck.  Jan pointed out:
> 
> https://lore.kernel.org/all/20221109113849.p7pwob533ijgrytu@quack3/T/#u
> 
> or have Jason's issues since been addressed?

No, they have not. I do think the current series is a step forward, but
given the urgency remains low for the time being (CXL hotplug use case
further out, no known collisions with ongoing folio work, and no
MEMORY_DEVICE_PRIVATE users looking to build any conversions on top for
6.2) I am ok to circle back for 6.3 for that follow on work to be
integrated.

> > My preference would be to move ahead with both in which case I can help
> > rebase these fixes on top. In that scenario everything would go through
> > Andrew.
> > 
> > However, if we are getting too late in the cycle for that path I think
> > these dax-fixes take precedence, and one more cycle to let the page
> > reference count reworks sit is ok.
> 
> That sounds a decent approach.  So we go with this series ("fsdax,xfs:
> fix warning messages") and aim at 6.3-rc1 with "Fix the DAX-gup
> mistake"?
> 

Yeah, that's the path of least hassle.
Re: [PATCH 0/2] fsdax,xfs: fix warning messages
Posted by Darrick J. Wong 3 days, 18 hours ago
On Wed, Nov 30, 2022 at 01:48:59PM -0800, Dan Williams wrote:
> Andrew Morton wrote:
> > On Tue, 29 Nov 2022 19:59:14 -0800 Dan Williams <dan.j.williams@intel.com> wrote:
> > 
> > > [ add Andrew ]
> > > 
> > > Shiyang Ruan wrote:
> > > > Many testcases failed in dax+reflink mode with warning message in dmesg.
> > > > This also effects dax+noreflink mode if we run the test after a
> > > > dax+reflink test.  So, the most urgent thing is solving the warning
> > > > messages.
> > > > 
> > > > Patch 1 fixes some mistakes and adds handling of CoW cases not
> > > > previously considered (srcmap is HOLE or UNWRITTEN).
> > > > Patch 2 adds the implementation of unshare for fsdax.
> > > > 
> > > > With these fixes, most warning messages in dax_associate_entry() are
> > > > gone.  But honestly, generic/388 will randomly failed with the warning.
> > > > The case shutdown the xfs when fsstress is running, and do it for many
> > > > times.  I think the reason is that dax pages in use are not able to be
> > > > invalidated in time when fs is shutdown.  The next time dax page to be
> > > > associated, it still remains the mapping value set last time.  I'll keep
> > > > on solving it.
> > > > 
> > > > The warning message in dax_writeback_one() can also be fixed because of
> > > > the dax unshare.
> > > 
> > > Thank you for digging in on this, I had been pinned down on CXL tasks
> > > and worried that we would need to mark FS_DAX broken for a cycle, so
> > > this is timely.
> > > 
> > > My only concern is that these patches look to have significant collisions with
> > > the fsdax page reference counting reworks pending in linux-next. Although,
> > > those are still sitting in mm-unstable:
> > > 
> > > http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org
> > 
> > As far as I know, Dan's "Fix the DAX-gup mistake" series is somewhat
> > stuck.  Jan pointed out:
> > 
> > https://lore.kernel.org/all/20221109113849.p7pwob533ijgrytu@quack3/T/#u
> > 
> > or have Jason's issues since been addressed?
> 
> No, they have not. I do think the current series is a step forward, but
> given the urgency remains low for the time being (CXL hotplug use case
> further out, no known collisions with ongoing folio work, and no
> MEMORY_DEVICE_PRIVATE users looking to build any conversions on top for
> 6.2) I am ok to circle back for 6.3 for that follow on work to be
> integrated.
> 
> > > My preference would be to move ahead with both in which case I can help
> > > rebase these fixes on top. In that scenario everything would go through
> > > Andrew.
> > > 
> > > However, if we are getting too late in the cycle for that path I think
> > > these dax-fixes take precedence, and one more cycle to let the page
> > > reference count reworks sit is ok.
> > 
> > That sounds a decent approach.  So we go with this series ("fsdax,xfs:
> > fix warning messages") and aim at 6.3-rc1 with "Fix the DAX-gup
> > mistake"?
> > 
> 
> Yeah, that's the path of least hassle.

Sounds good.  I still want to see patch 1 of this series broken up into
smaller pieces though.  Once the series goes through review, do you want
me to push the fixes to Linus, seeing as xfs is the only user of this
functionality?

--D
Re: [PATCH 0/2] fsdax,xfs: fix warning messages
Posted by Dan Williams 3 days, 17 hours ago
Darrick J. Wong wrote:
> On Wed, Nov 30, 2022 at 01:48:59PM -0800, Dan Williams wrote:
> > Andrew Morton wrote:
> > > On Tue, 29 Nov 2022 19:59:14 -0800 Dan Williams <dan.j.williams@intel.com> wrote:
> > > 
> > > > [ add Andrew ]
> > > > 
> > > > Shiyang Ruan wrote:
> > > > > Many testcases failed in dax+reflink mode with warning message in dmesg.
> > > > > This also effects dax+noreflink mode if we run the test after a
> > > > > dax+reflink test.  So, the most urgent thing is solving the warning
> > > > > messages.
> > > > > 
> > > > > Patch 1 fixes some mistakes and adds handling of CoW cases not
> > > > > previously considered (srcmap is HOLE or UNWRITTEN).
> > > > > Patch 2 adds the implementation of unshare for fsdax.
> > > > > 
> > > > > With these fixes, most warning messages in dax_associate_entry() are
> > > > > gone.  But honestly, generic/388 will randomly failed with the warning.
> > > > > The case shutdown the xfs when fsstress is running, and do it for many
> > > > > times.  I think the reason is that dax pages in use are not able to be
> > > > > invalidated in time when fs is shutdown.  The next time dax page to be
> > > > > associated, it still remains the mapping value set last time.  I'll keep
> > > > > on solving it.
> > > > > 
> > > > > The warning message in dax_writeback_one() can also be fixed because of
> > > > > the dax unshare.
> > > > 
> > > > Thank you for digging in on this, I had been pinned down on CXL tasks
> > > > and worried that we would need to mark FS_DAX broken for a cycle, so
> > > > this is timely.
> > > > 
> > > > My only concern is that these patches look to have significant collisions with
> > > > the fsdax page reference counting reworks pending in linux-next. Although,
> > > > those are still sitting in mm-unstable:
> > > > 
> > > > http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org
> > > 
> > > As far as I know, Dan's "Fix the DAX-gup mistake" series is somewhat
> > > stuck.  Jan pointed out:
> > > 
> > > https://lore.kernel.org/all/20221109113849.p7pwob533ijgrytu@quack3/T/#u
> > > 
> > > or have Jason's issues since been addressed?
> > 
> > No, they have not. I do think the current series is a step forward, but
> > given the urgency remains low for the time being (CXL hotplug use case
> > further out, no known collisions with ongoing folio work, and no
> > MEMORY_DEVICE_PRIVATE users looking to build any conversions on top for
> > 6.2) I am ok to circle back for 6.3 for that follow on work to be
> > integrated.
> > 
> > > > My preference would be to move ahead with both in which case I can help
> > > > rebase these fixes on top. In that scenario everything would go through
> > > > Andrew.
> > > > 
> > > > However, if we are getting too late in the cycle for that path I think
> > > > these dax-fixes take precedence, and one more cycle to let the page
> > > > reference count reworks sit is ok.
> > > 
> > > That sounds a decent approach.  So we go with this series ("fsdax,xfs:
> > > fix warning messages") and aim at 6.3-rc1 with "Fix the DAX-gup
> > > mistake"?
> > > 
> > 
> > Yeah, that's the path of least hassle.
> 
> Sounds good.  I still want to see patch 1 of this series broken up into
> smaller pieces though.  Once the series goes through review, do you want
> me to push the fixes to Linus, seeing as xfs is the only user of this
> functionality?

Yes, that was my primary feedback as well, and merging through xfs makes
sense to me.
Re: [PATCH 0/2] fsdax,xfs: fix warning messages
Posted by Darrick J. Wong 4 days, 13 hours ago
On Tue, Nov 29, 2022 at 07:59:14PM -0800, Dan Williams wrote:
> [ add Andrew ]
> 
> Shiyang Ruan wrote:
> > Many testcases failed in dax+reflink mode with warning message in dmesg.
> > This also effects dax+noreflink mode if we run the test after a
> > dax+reflink test.  So, the most urgent thing is solving the warning
> > messages.
> > 
> > Patch 1 fixes some mistakes and adds handling of CoW cases not
> > previously considered (srcmap is HOLE or UNWRITTEN).
> > Patch 2 adds the implementation of unshare for fsdax.
> > 
> > With these fixes, most warning messages in dax_associate_entry() are
> > gone.  But honestly, generic/388 will randomly failed with the warning.
> > The case shutdown the xfs when fsstress is running, and do it for many
> > times.  I think the reason is that dax pages in use are not able to be
> > invalidated in time when fs is shutdown.  The next time dax page to be
> > associated, it still remains the mapping value set last time.  I'll keep
> > on solving it.
> > 
> > The warning message in dax_writeback_one() can also be fixed because of
> > the dax unshare.
> 
> Thank you for digging in on this, I had been pinned down on CXL tasks
> and worried that we would need to mark FS_DAX broken for a cycle, so
> this is timely.
> 
> My only concern is that these patches look to have significant collisions with
> the fsdax page reference counting reworks pending in linux-next. Although,
> those are still sitting in mm-unstable:
> 
> http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org
> 
> My preference would be to move ahead with both in which case I can help
> rebase these fixes on top. In that scenario everything would go through
> Andrew.
> 
> However, if we are getting too late in the cycle for that path I think
> these dax-fixes take precedence, and one more cycle to let the page
> reference count reworks sit is ok.

Well now that raises some interesting questions -- dax and reflink are
totally broken on 6.1.  I was thinking about cramming them into 6.2 as a
data corruption fix on the grounds that is not an acceptable state of
affairs.

OTOH we're past -rc7, which is **really late** to be changing core code.
Then again, there aren't so many fsdax users and nobody's complained
about 6.0/6.1 being busted, so perhaps the risk of regression isn't so
bad?  Then again, that could be a sign that this could wait, if you and
Andrew are really eager to merge the reworks.

Just looking at the stuff that's still broken with dax+reflink -- I
noticed that xfs/550-552 (aka the dax poison tests) are still regressing
on reflink filesystems.

So, uh, what would this patchset need to change if the "fsdax page
reference counting reworks" were applied?  Would it be changing the page
refcount instead of stashing that in page->index?

--D

> > Shiyang Ruan (2):
> >   fsdax,xfs: fix warning messages at dax_[dis]associate_entry()
> >   fsdax,xfs: port unshare to fsdax
> > 
> >  fs/dax.c             | 166 ++++++++++++++++++++++++++++++-------------
> >  fs/xfs/xfs_iomap.c   |   6 +-
> >  fs/xfs/xfs_reflink.c |   8 ++-
> >  include/linux/dax.h  |   2 +
> >  4 files changed, 129 insertions(+), 53 deletions(-)
> > 
> > -- 
> > 2.38.1
Re: [PATCH 0/2] fsdax,xfs: fix warning messages
Posted by Dan Williams 4 days, 10 hours ago
Darrick J. Wong wrote:
> On Tue, Nov 29, 2022 at 07:59:14PM -0800, Dan Williams wrote:
> > [ add Andrew ]
> > 
> > Shiyang Ruan wrote:
> > > Many testcases failed in dax+reflink mode with warning message in dmesg.
> > > This also effects dax+noreflink mode if we run the test after a
> > > dax+reflink test.  So, the most urgent thing is solving the warning
> > > messages.
> > > 
> > > Patch 1 fixes some mistakes and adds handling of CoW cases not
> > > previously considered (srcmap is HOLE or UNWRITTEN).
> > > Patch 2 adds the implementation of unshare for fsdax.
> > > 
> > > With these fixes, most warning messages in dax_associate_entry() are
> > > gone.  But honestly, generic/388 will randomly failed with the warning.
> > > The case shutdown the xfs when fsstress is running, and do it for many
> > > times.  I think the reason is that dax pages in use are not able to be
> > > invalidated in time when fs is shutdown.  The next time dax page to be
> > > associated, it still remains the mapping value set last time.  I'll keep
> > > on solving it.
> > > 
> > > The warning message in dax_writeback_one() can also be fixed because of
> > > the dax unshare.
> > 
> > Thank you for digging in on this, I had been pinned down on CXL tasks
> > and worried that we would need to mark FS_DAX broken for a cycle, so
> > this is timely.
> > 
> > My only concern is that these patches look to have significant collisions with
> > the fsdax page reference counting reworks pending in linux-next. Although,
> > those are still sitting in mm-unstable:
> > 
> > http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org
> > 
> > My preference would be to move ahead with both in which case I can help
> > rebase these fixes on top. In that scenario everything would go through
> > Andrew.
> > 
> > However, if we are getting too late in the cycle for that path I think
> > these dax-fixes take precedence, and one more cycle to let the page
> > reference count reworks sit is ok.
> 
> Well now that raises some interesting questions -- dax and reflink are
> totally broken on 6.1.  I was thinking about cramming them into 6.2 as a
> data corruption fix on the grounds that is not an acceptable state of
> affairs.

I agree it's not an acceptable state of affairs, but for 6.1 the answer
may be to just revert to dax+reflink being forbidden again. The fact
that no end user has noticed is probably a good sign that we can disable
that without any one screaming. That may be the easy answer for 6.2 as
well given how late this all is.

> OTOH we're past -rc7, which is **really late** to be changing core code.
> Then again, there aren't so many fsdax users and nobody's complained
> about 6.0/6.1 being busted, so perhaps the risk of regression isn't so
> bad?  Then again, that could be a sign that this could wait, if you and
> Andrew are really eager to merge the reworks.

The page reference counting has also been languishing for a long time. A
6.2 merge would be nice, it relieves maintenance burden, but they do not
start to have real end user implications until CXL memory hotplug
platforms arrive and the warts in the reference counting start to show
real problems in production.

> Just looking at the stuff that's still broken with dax+reflink -- I
> noticed that xfs/550-552 (aka the dax poison tests) are still regressing
> on reflink filesystems.

That's worrying because the whole point of reworking dax, xfs, and
mm/memory-failure all at once was to handle the collision of poison and
reflink'd dax files.

> So, uh, what would this patchset need to change if the "fsdax page
> reference counting reworks" were applied?  Would it be changing the page
> refcount instead of stashing that in page->index?

Nah, it's things like switching from pages to folios and shifting how
dax goes from pfns to pages.

https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=cca48ba3196

Ideally fsdax would never deal in pfns at all and do everything in terms
of offsets relative to a 'struct dax_device'.

My gut is saying these patches, the refcount reworks, and the
dax+reflink fixes, are important but not end user critical. One more
status quo release does not hurt, and we can circle back to get this all
straightened early in v6.3.

I.e. just revert:

35fcd75af3ed xfs: fail dax mount if reflink is enabled on a partition

...for v6.1-rc8 and get back to this early in the New Year.

> 
> --D
> 
> > > Shiyang Ruan (2):
> > >   fsdax,xfs: fix warning messages at dax_[dis]associate_entry()
> > >   fsdax,xfs: port unshare to fsdax
> > > 
> > >  fs/dax.c             | 166 ++++++++++++++++++++++++++++++-------------
> > >  fs/xfs/xfs_iomap.c   |   6 +-
> > >  fs/xfs/xfs_reflink.c |   8 ++-
> > >  include/linux/dax.h  |   2 +
> > >  4 files changed, 129 insertions(+), 53 deletions(-)
> > > 
> > > -- 
> > > 2.38.1
Re: [PATCH 0/2] fsdax,xfs: fix warning messages
Posted by Darrick J. Wong 3 days, 20 hours ago
On Tue, Nov 29, 2022 at 11:05:30PM -0800, Dan Williams wrote:
> Darrick J. Wong wrote:
> > On Tue, Nov 29, 2022 at 07:59:14PM -0800, Dan Williams wrote:
> > > [ add Andrew ]
> > > 
> > > Shiyang Ruan wrote:
> > > > Many testcases failed in dax+reflink mode with warning message in dmesg.
> > > > This also effects dax+noreflink mode if we run the test after a
> > > > dax+reflink test.  So, the most urgent thing is solving the warning
> > > > messages.
> > > > 
> > > > Patch 1 fixes some mistakes and adds handling of CoW cases not
> > > > previously considered (srcmap is HOLE or UNWRITTEN).
> > > > Patch 2 adds the implementation of unshare for fsdax.
> > > > 
> > > > With these fixes, most warning messages in dax_associate_entry() are
> > > > gone.  But honestly, generic/388 will randomly failed with the warning.
> > > > The case shutdown the xfs when fsstress is running, and do it for many
> > > > times.  I think the reason is that dax pages in use are not able to be
> > > > invalidated in time when fs is shutdown.  The next time dax page to be
> > > > associated, it still remains the mapping value set last time.  I'll keep
> > > > on solving it.
> > > > 
> > > > The warning message in dax_writeback_one() can also be fixed because of
> > > > the dax unshare.
> > > 
> > > Thank you for digging in on this, I had been pinned down on CXL tasks
> > > and worried that we would need to mark FS_DAX broken for a cycle, so
> > > this is timely.
> > > 
> > > My only concern is that these patches look to have significant collisions with
> > > the fsdax page reference counting reworks pending in linux-next. Although,
> > > those are still sitting in mm-unstable:
> > > 
> > > http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org
> > > 
> > > My preference would be to move ahead with both in which case I can help
> > > rebase these fixes on top. In that scenario everything would go through
> > > Andrew.
> > > 
> > > However, if we are getting too late in the cycle for that path I think
> > > these dax-fixes take precedence, and one more cycle to let the page
> > > reference count reworks sit is ok.
> > 
> > Well now that raises some interesting questions -- dax and reflink are
> > totally broken on 6.1.  I was thinking about cramming them into 6.2 as a
> > data corruption fix on the grounds that is not an acceptable state of
> > affairs.
> 
> I agree it's not an acceptable state of affairs, but for 6.1 the answer
> may be to just revert to dax+reflink being forbidden again. The fact
> that no end user has noticed is probably a good sign that we can disable
> that without any one screaming. That may be the easy answer for 6.2 as
> well given how late this all is.
> 
> > OTOH we're past -rc7, which is **really late** to be changing core code.
> > Then again, there aren't so many fsdax users and nobody's complained
> > about 6.0/6.1 being busted, so perhaps the risk of regression isn't so
> > bad?  Then again, that could be a sign that this could wait, if you and
> > Andrew are really eager to merge the reworks.
> 
> The page reference counting has also been languishing for a long time. A
> 6.2 merge would be nice, it relieves maintenance burden, but they do not
> start to have real end user implications until CXL memory hotplug
> platforms arrive and the warts in the reference counting start to show
> real problems in production.

Hm.  How bad *would* it be to rebase that patchset atop this one?

After overnight testing on -rc7 it looks like Ruan's patchset fixes all
the problems AFAICT.  Most of the remaining regressions are to mask off
fragmentation testing because fsdax cow (like the directio write paths)
doesn't make much use of extent size hints.

> > Just looking at the stuff that's still broken with dax+reflink -- I
> > noticed that xfs/550-552 (aka the dax poison tests) are still regressing
> > on reflink filesystems.
> 
> That's worrying because the whole point of reworking dax, xfs, and
> mm/memory-failure all at once was to handle the collision of poison and
> reflink'd dax files.

I just tried out -rc7 and all three pass, so disregard this please.

> > So, uh, what would this patchset need to change if the "fsdax page
> > reference counting reworks" were applied?  Would it be changing the page
> > refcount instead of stashing that in page->index?
> 
> Nah, it's things like switching from pages to folios and shifting how
> dax goes from pfns to pages.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=cca48ba3196
> 
> Ideally fsdax would never deal in pfns at all and do everything in terms
> of offsets relative to a 'struct dax_device'.
> 
> My gut is saying these patches, the refcount reworks, and the
> dax+reflink fixes, are important but not end user critical. One more
> status quo release does not hurt, and we can circle back to get this all
> straightened early in v6.3.

Being a data corruption fix, I don't see why we shouldn't revisit this
during the 6.2 cycle, even if it comes after merging the refcounting
stuff.

Question for Ruan: Would it be terribly difficult to push out a v2 with
the review comments applied so that we have something we can backport to
6.1; and then rebase the series atop 6.2-rc1 so we can apply it to
upstream (and then apply the 6.1 version to the LTS)?  Or is this too
convoluted...?

> I.e. just revert:
> 
> 35fcd75af3ed xfs: fail dax mount if reflink is enabled on a partition
> 
> ...for v6.1-rc8 and get back to this early in the New Year.

Hm.  Tempting.

--D

> > 
> > --D
> > 
> > > > Shiyang Ruan (2):
> > > >   fsdax,xfs: fix warning messages at dax_[dis]associate_entry()
> > > >   fsdax,xfs: port unshare to fsdax
> > > > 
> > > >  fs/dax.c             | 166 ++++++++++++++++++++++++++++++-------------
> > > >  fs/xfs/xfs_iomap.c   |   6 +-
> > > >  fs/xfs/xfs_reflink.c |   8 ++-
> > > >  include/linux/dax.h  |   2 +
> > > >  4 files changed, 129 insertions(+), 53 deletions(-)
> > > > 
> > > > -- 
> > > > 2.38.1
> 
>
Re: [PATCH 0/2] fsdax,xfs: fix warning messages
Posted by Shiyang Ruan 3 days, 2 hours ago

在 2022/12/1 5:08, Darrick J. Wong 写道:
> On Tue, Nov 29, 2022 at 11:05:30PM -0800, Dan Williams wrote:
>> Darrick J. Wong wrote:
>>> On Tue, Nov 29, 2022 at 07:59:14PM -0800, Dan Williams wrote:
>>>> [ add Andrew ]
>>>>
>>>> Shiyang Ruan wrote:
>>>>> Many testcases failed in dax+reflink mode with warning message in dmesg.
>>>>> This also effects dax+noreflink mode if we run the test after a
>>>>> dax+reflink test.  So, the most urgent thing is solving the warning
>>>>> messages.
>>>>>
>>>>> Patch 1 fixes some mistakes and adds handling of CoW cases not
>>>>> previously considered (srcmap is HOLE or UNWRITTEN).
>>>>> Patch 2 adds the implementation of unshare for fsdax.
>>>>>
>>>>> With these fixes, most warning messages in dax_associate_entry() are
>>>>> gone.  But honestly, generic/388 will randomly failed with the warning.
>>>>> The case shutdown the xfs when fsstress is running, and do it for many
>>>>> times.  I think the reason is that dax pages in use are not able to be
>>>>> invalidated in time when fs is shutdown.  The next time dax page to be
>>>>> associated, it still remains the mapping value set last time.  I'll keep
>>>>> on solving it.
>>>>>
>>>>> The warning message in dax_writeback_one() can also be fixed because of
>>>>> the dax unshare.
>>>>
>>>> Thank you for digging in on this, I had been pinned down on CXL tasks
>>>> and worried that we would need to mark FS_DAX broken for a cycle, so
>>>> this is timely.
>>>>
>>>> My only concern is that these patches look to have significant collisions with
>>>> the fsdax page reference counting reworks pending in linux-next. Although,
>>>> those are still sitting in mm-unstable:
>>>>
>>>> http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org
>>>>
>>>> My preference would be to move ahead with both in which case I can help
>>>> rebase these fixes on top. In that scenario everything would go through
>>>> Andrew.
>>>>
>>>> However, if we are getting too late in the cycle for that path I think
>>>> these dax-fixes take precedence, and one more cycle to let the page
>>>> reference count reworks sit is ok.
>>>
>>> Well now that raises some interesting questions -- dax and reflink are
>>> totally broken on 6.1.  I was thinking about cramming them into 6.2 as a
>>> data corruption fix on the grounds that is not an acceptable state of
>>> affairs.
>>
>> I agree it's not an acceptable state of affairs, but for 6.1 the answer
>> may be to just revert to dax+reflink being forbidden again. The fact
>> that no end user has noticed is probably a good sign that we can disable
>> that without any one screaming. That may be the easy answer for 6.2 as
>> well given how late this all is.
>>
>>> OTOH we're past -rc7, which is **really late** to be changing core code.
>>> Then again, there aren't so many fsdax users and nobody's complained
>>> about 6.0/6.1 being busted, so perhaps the risk of regression isn't so
>>> bad?  Then again, that could be a sign that this could wait, if you and
>>> Andrew are really eager to merge the reworks.
>>
>> The page reference counting has also been languishing for a long time. A
>> 6.2 merge would be nice, it relieves maintenance burden, but they do not
>> start to have real end user implications until CXL memory hotplug
>> platforms arrive and the warts in the reference counting start to show
>> real problems in production.
> 
> Hm.  How bad *would* it be to rebase that patchset atop this one?
> 
> After overnight testing on -rc7 it looks like Ruan's patchset fixes all
> the problems AFAICT.  Most of the remaining regressions are to mask off
> fragmentation testing because fsdax cow (like the directio write paths)
> doesn't make much use of extent size hints.
> 
>>> Just looking at the stuff that's still broken with dax+reflink -- I
>>> noticed that xfs/550-552 (aka the dax poison tests) are still regressing
>>> on reflink filesystems.
>>
>> That's worrying because the whole point of reworking dax, xfs, and
>> mm/memory-failure all at once was to handle the collision of poison and
>> reflink'd dax files.
> 
> I just tried out -rc7 and all three pass, so disregard this please.
> 
>>> So, uh, what would this patchset need to change if the "fsdax page
>>> reference counting reworks" were applied?  Would it be changing the page
>>> refcount instead of stashing that in page->index?
>>
>> Nah, it's things like switching from pages to folios and shifting how
>> dax goes from pfns to pages.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=cca48ba3196
>>
>> Ideally fsdax would never deal in pfns at all and do everything in terms
>> of offsets relative to a 'struct dax_device'.
>>
>> My gut is saying these patches, the refcount reworks, and the
>> dax+reflink fixes, are important but not end user critical. One more
>> status quo release does not hurt, and we can circle back to get this all
>> straightened early in v6.3.
> 
> Being a data corruption fix, I don't see why we shouldn't revisit this
> during the 6.2 cycle, even if it comes after merging the refcounting
> stuff.
> 
> Question for Ruan: Would it be terribly difficult to push out a v2 with
> the review comments applied so that we have something we can backport to
> 6.1; and then rebase the series atop 6.2-rc1 so we can apply it to
> upstream (and then apply the 6.1 version to the LTS)?  Or is this too
> convoluted...?

It's fine to me.  V2 has been posted just now.  The big patch has been 
separated.


--
Thanks,
Ruan.

> 
>> I.e. just revert:
>>
>> 35fcd75af3ed xfs: fail dax mount if reflink is enabled on a partition
>>
>> ...for v6.1-rc8 and get back to this early in the New Year.
> 
> Hm.  Tempting.
> 
> --D
> 
>>>
>>> --D
>>>
>>>>> Shiyang Ruan (2):
>>>>>    fsdax,xfs: fix warning messages at dax_[dis]associate_entry()
>>>>>    fsdax,xfs: port unshare to fsdax
>>>>>
>>>>>   fs/dax.c             | 166 ++++++++++++++++++++++++++++++-------------
>>>>>   fs/xfs/xfs_iomap.c   |   6 +-
>>>>>   fs/xfs/xfs_reflink.c |   8 ++-
>>>>>   include/linux/dax.h  |   2 +
>>>>>   4 files changed, 129 insertions(+), 53 deletions(-)
>>>>>
>>>>> -- 
>>>>> 2.38.1
>>
>>
Re: [PATCH 0/2] fsdax,xfs: fix warning messages
Posted by Darrick J. Wong 2 days, 23 hours ago
On Thu, Dec 01, 2022 at 11:39:12PM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2022/12/1 5:08, Darrick J. Wong 写道:
> > On Tue, Nov 29, 2022 at 11:05:30PM -0800, Dan Williams wrote:
> > > Darrick J. Wong wrote:
> > > > On Tue, Nov 29, 2022 at 07:59:14PM -0800, Dan Williams wrote:
> > > > > [ add Andrew ]
> > > > > 
> > > > > Shiyang Ruan wrote:
> > > > > > Many testcases failed in dax+reflink mode with warning message in dmesg.
> > > > > > This also effects dax+noreflink mode if we run the test after a
> > > > > > dax+reflink test.  So, the most urgent thing is solving the warning
> > > > > > messages.
> > > > > > 
> > > > > > Patch 1 fixes some mistakes and adds handling of CoW cases not
> > > > > > previously considered (srcmap is HOLE or UNWRITTEN).
> > > > > > Patch 2 adds the implementation of unshare for fsdax.
> > > > > > 
> > > > > > With these fixes, most warning messages in dax_associate_entry() are
> > > > > > gone.  But honestly, generic/388 will randomly failed with the warning.
> > > > > > The case shutdown the xfs when fsstress is running, and do it for many
> > > > > > times.  I think the reason is that dax pages in use are not able to be
> > > > > > invalidated in time when fs is shutdown.  The next time dax page to be
> > > > > > associated, it still remains the mapping value set last time.  I'll keep
> > > > > > on solving it.
> > > > > > 
> > > > > > The warning message in dax_writeback_one() can also be fixed because of
> > > > > > the dax unshare.
> > > > > 
> > > > > Thank you for digging in on this, I had been pinned down on CXL tasks
> > > > > and worried that we would need to mark FS_DAX broken for a cycle, so
> > > > > this is timely.
> > > > > 
> > > > > My only concern is that these patches look to have significant collisions with
> > > > > the fsdax page reference counting reworks pending in linux-next. Although,
> > > > > those are still sitting in mm-unstable:
> > > > > 
> > > > > http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org
> > > > > 
> > > > > My preference would be to move ahead with both in which case I can help
> > > > > rebase these fixes on top. In that scenario everything would go through
> > > > > Andrew.
> > > > > 
> > > > > However, if we are getting too late in the cycle for that path I think
> > > > > these dax-fixes take precedence, and one more cycle to let the page
> > > > > reference count reworks sit is ok.
> > > > 
> > > > Well now that raises some interesting questions -- dax and reflink are
> > > > totally broken on 6.1.  I was thinking about cramming them into 6.2 as a
> > > > data corruption fix on the grounds that is not an acceptable state of
> > > > affairs.
> > > 
> > > I agree it's not an acceptable state of affairs, but for 6.1 the answer
> > > may be to just revert to dax+reflink being forbidden again. The fact
> > > that no end user has noticed is probably a good sign that we can disable
> > > that without any one screaming. That may be the easy answer for 6.2 as
> > > well given how late this all is.
> > > 
> > > > OTOH we're past -rc7, which is **really late** to be changing core code.
> > > > Then again, there aren't so many fsdax users and nobody's complained
> > > > about 6.0/6.1 being busted, so perhaps the risk of regression isn't so
> > > > bad?  Then again, that could be a sign that this could wait, if you and
> > > > Andrew are really eager to merge the reworks.
> > > 
> > > The page reference counting has also been languishing for a long time. A
> > > 6.2 merge would be nice, it relieves maintenance burden, but they do not
> > > start to have real end user implications until CXL memory hotplug
> > > platforms arrive and the warts in the reference counting start to show
> > > real problems in production.
> > 
> > Hm.  How bad *would* it be to rebase that patchset atop this one?
> > 
> > After overnight testing on -rc7 it looks like Ruan's patchset fixes all
> > the problems AFAICT.  Most of the remaining regressions are to mask off
> > fragmentation testing because fsdax cow (like the directio write paths)
> > doesn't make much use of extent size hints.
> > 
> > > > Just looking at the stuff that's still broken with dax+reflink -- I
> > > > noticed that xfs/550-552 (aka the dax poison tests) are still regressing
> > > > on reflink filesystems.
> > > 
> > > That's worrying because the whole point of reworking dax, xfs, and
> > > mm/memory-failure all at once was to handle the collision of poison and
> > > reflink'd dax files.
> > 
> > I just tried out -rc7 and all three pass, so disregard this please.
> > 
> > > > So, uh, what would this patchset need to change if the "fsdax page
> > > > reference counting reworks" were applied?  Would it be changing the page
> > > > refcount instead of stashing that in page->index?
> > > 
> > > Nah, it's things like switching from pages to folios and shifting how
> > > dax goes from pfns to pages.
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=cca48ba3196
> > > 
> > > Ideally fsdax would never deal in pfns at all and do everything in terms
> > > of offsets relative to a 'struct dax_device'.
> > > 
> > > My gut is saying these patches, the refcount reworks, and the
> > > dax+reflink fixes, are important but not end user critical. One more
> > > status quo release does not hurt, and we can circle back to get this all
> > > straightened early in v6.3.
> > 
> > Being a data corruption fix, I don't see why we shouldn't revisit this
> > during the 6.2 cycle, even if it comes after merging the refcounting
> > stuff.
> > 
> > Question for Ruan: Would it be terribly difficult to push out a v2 with
> > the review comments applied so that we have something we can backport to
> > 6.1; and then rebase the series atop 6.2-rc1 so we can apply it to
> > upstream (and then apply the 6.1 version to the LTS)?  Or is this too
> > convoluted...?
> 
> It's fine to me.  V2 has been posted just now.  The big patch has been
> separated.

Ok, thank you.

Since akpm/Dan aren't moving forward with the page refcounting changes
for 6.2, I think I'll try to merge these fixes for 6.2-rc1 without so
much rebasing. :)

--D

> 
> --
> Thanks,
> Ruan.
> 
> > 
> > > I.e. just revert:
> > > 
> > > 35fcd75af3ed xfs: fail dax mount if reflink is enabled on a partition
> > > 
> > > ...for v6.1-rc8 and get back to this early in the New Year.
> > 
> > Hm.  Tempting.
> > 
> > --D
> > 
> > > > 
> > > > --D
> > > > 
> > > > > > Shiyang Ruan (2):
> > > > > >    fsdax,xfs: fix warning messages at dax_[dis]associate_entry()
> > > > > >    fsdax,xfs: port unshare to fsdax
> > > > > > 
> > > > > >   fs/dax.c             | 166 ++++++++++++++++++++++++++++++-------------
> > > > > >   fs/xfs/xfs_iomap.c   |   6 +-
> > > > > >   fs/xfs/xfs_reflink.c |   8 ++-
> > > > > >   include/linux/dax.h  |   2 +
> > > > > >   4 files changed, 129 insertions(+), 53 deletions(-)
> > > > > > 
> > > > > > -- 
> > > > > > 2.38.1
> > > 
> > > 
Re: [PATCH 0/2] fsdax,xfs: fix warning messages #forregzbot
Posted by Thorsten Leemhuis 4 days, 7 hours ago
[Note: this mail is primarily send for documentation purposes and/or for
regzbot, my Linux kernel regression tracking bot. That's why I removed
most or all folks from the list of recipients, but left any that looked
like a mailing lists. These mails usually contain '#forregzbot' in the
subject, to make them easy to spot and filter out.]

On 24.11.22 15:54, Shiyang Ruan wrote:
> Many testcases failed in dax+reflink mode with warning message in dmesg.
> This also effects dax+noreflink mode if we run the test after a
> dax+reflink test.  So, the most urgent thing is solving the warning
> messages.

Darrick in https://lore.kernel.org/all/Y4bZGvP8Ozp+4De%2F@magnolia/
wrote "dax and reflink are totally broken on 6.1". Hence, add this to
the tracking to be sure it's not forgotten.

#regzbot ^introduced 35fcd75af3ed
#regzbot title xfs/dax/reflink are totally broken on 6.1
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.
Re: [PATCH 0/2] fsdax,xfs: fix warning messages
Posted by Darrick J. Wong 6 days, 23 hours ago
On Thu, Nov 24, 2022 at 02:54:52PM +0000, Shiyang Ruan wrote:
> Many testcases failed in dax+reflink mode with warning message in dmesg.
> This also effects dax+noreflink mode if we run the test after a
> dax+reflink test.  So, the most urgent thing is solving the warning
> messages.
> 
> Patch 1 fixes some mistakes and adds handling of CoW cases not
> previously considered (srcmap is HOLE or UNWRITTEN).
> Patch 2 adds the implementation of unshare for fsdax.
> 
> With these fixes, most warning messages in dax_associate_entry() are
> gone.  But honestly, generic/388 will randomly failed with the warning.
> The case shutdown the xfs when fsstress is running, and do it for many
> times.  I think the reason is that dax pages in use are not able to be
> invalidated in time when fs is shutdown.  The next time dax page to be
> associated, it still remains the mapping value set last time.  I'll keep
> on solving it.
> 
> The warning message in dax_writeback_one() can also be fixed because of
> the dax unshare.

This cuts down the amount of test failures quite a bit, but I think
you're still missing a piece or two -- namely the part that refuses to
enable S_DAX mode on a reflinked file when the inode is being loaded
from disk.  However, thank you for fixing dax.c, because that was the
part I couldn't figure out at all. :)

--D

> 
> Shiyang Ruan (2):
>   fsdax,xfs: fix warning messages at dax_[dis]associate_entry()
>   fsdax,xfs: port unshare to fsdax
> 
>  fs/dax.c             | 166 ++++++++++++++++++++++++++++++-------------
>  fs/xfs/xfs_iomap.c   |   6 +-
>  fs/xfs/xfs_reflink.c |   8 ++-
>  include/linux/dax.h  |   2 +
>  4 files changed, 129 insertions(+), 53 deletions(-)
> 
> -- 
> 2.38.1
>
Re: [PATCH 0/2] fsdax,xfs: fix warning messages
Posted by Shiyang Ruan 6 days, 15 hours ago

在 2022/11/28 2:38, Darrick J. Wong 写道:
> On Thu, Nov 24, 2022 at 02:54:52PM +0000, Shiyang Ruan wrote:
>> Many testcases failed in dax+reflink mode with warning message in dmesg.
>> This also effects dax+noreflink mode if we run the test after a
>> dax+reflink test.  So, the most urgent thing is solving the warning
>> messages.
>>
>> Patch 1 fixes some mistakes and adds handling of CoW cases not
>> previously considered (srcmap is HOLE or UNWRITTEN).
>> Patch 2 adds the implementation of unshare for fsdax.
>>
>> With these fixes, most warning messages in dax_associate_entry() are
>> gone.  But honestly, generic/388 will randomly failed with the warning.
>> The case shutdown the xfs when fsstress is running, and do it for many
>> times.  I think the reason is that dax pages in use are not able to be
>> invalidated in time when fs is shutdown.  The next time dax page to be
>> associated, it still remains the mapping value set last time.  I'll keep
>> on solving it.
>>
>> The warning message in dax_writeback_one() can also be fixed because of
>> the dax unshare.
> 
> This cuts down the amount of test failures quite a bit, but I think
> you're still missing a piece or two -- namely the part that refuses to
> enable S_DAX mode on a reflinked file when the inode is being loaded
> from disk.  However, thank you for fixing dax.c, because that was the
> part I couldn't figure out at all. :)

I didn't include it[1] in this patchset...

[1] 
https://lore.kernel.org/linux-xfs/1663234002-17-1-git-send-email-ruansy.fnst@fujitsu.com/


--
Thanks,
Ruan.

> 
> --D
> 
>>
>> Shiyang Ruan (2):
>>    fsdax,xfs: fix warning messages at dax_[dis]associate_entry()
>>    fsdax,xfs: port unshare to fsdax
>>
>>   fs/dax.c             | 166 ++++++++++++++++++++++++++++++-------------
>>   fs/xfs/xfs_iomap.c   |   6 +-
>>   fs/xfs/xfs_reflink.c |   8 ++-
>>   include/linux/dax.h  |   2 +
>>   4 files changed, 129 insertions(+), 53 deletions(-)
>>
>> -- 
>> 2.38.1
>>
Re: [PATCH 0/2] fsdax,xfs: fix warning messages
Posted by Darrick J. Wong 5 days, 18 hours ago
On Mon, Nov 28, 2022 at 10:16:23AM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2022/11/28 2:38, Darrick J. Wong 写道:
> > On Thu, Nov 24, 2022 at 02:54:52PM +0000, Shiyang Ruan wrote:
> > > Many testcases failed in dax+reflink mode with warning message in dmesg.
> > > This also effects dax+noreflink mode if we run the test after a
> > > dax+reflink test.  So, the most urgent thing is solving the warning
> > > messages.
> > > 
> > > Patch 1 fixes some mistakes and adds handling of CoW cases not
> > > previously considered (srcmap is HOLE or UNWRITTEN).
> > > Patch 2 adds the implementation of unshare for fsdax.
> > > 
> > > With these fixes, most warning messages in dax_associate_entry() are
> > > gone.  But honestly, generic/388 will randomly failed with the warning.
> > > The case shutdown the xfs when fsstress is running, and do it for many
> > > times.  I think the reason is that dax pages in use are not able to be
> > > invalidated in time when fs is shutdown.  The next time dax page to be
> > > associated, it still remains the mapping value set last time.  I'll keep
> > > on solving it.
> > > 
> > > The warning message in dax_writeback_one() can also be fixed because of
> > > the dax unshare.
> > 
> > This cuts down the amount of test failures quite a bit, but I think
> > you're still missing a piece or two -- namely the part that refuses to
> > enable S_DAX mode on a reflinked file when the inode is being loaded
> > from disk.  However, thank you for fixing dax.c, because that was the
> > part I couldn't figure out at all. :)
> 
> I didn't include it[1] in this patchset...
> 
> [1] https://lore.kernel.org/linux-xfs/1663234002-17-1-git-send-email-ruansy.fnst@fujitsu.com/

Oh, ok.  I'll pull that one in.  All the remaining test failures seem to
be related to inode flag states or tests that trip over the lack of
delalloc on dax+reflink files.

--D

> 
> --
> Thanks,
> Ruan.
> 
> > 
> > --D
> > 
> > > 
> > > Shiyang Ruan (2):
> > >    fsdax,xfs: fix warning messages at dax_[dis]associate_entry()
> > >    fsdax,xfs: port unshare to fsdax
> > > 
> > >   fs/dax.c             | 166 ++++++++++++++++++++++++++++++-------------
> > >   fs/xfs/xfs_iomap.c   |   6 +-
> > >   fs/xfs/xfs_reflink.c |   8 ++-
> > >   include/linux/dax.h  |   2 +
> > >   4 files changed, 129 insertions(+), 53 deletions(-)
> > > 
> > > -- 
> > > 2.38.1
> > >