[PATCH] block/psi: make PSI annotations of submit_bio only work for file pages

cgel.zte@gmail.com posted 1 patch 4 years, 3 months ago
block/bio.c               | 9 +++++----
block/blk-core.c          | 2 +-
fs/direct-io.c            | 2 +-
include/linux/blk_types.h | 2 +-
4 files changed, 8 insertions(+), 7 deletions(-)
[PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
Posted by cgel.zte@gmail.com 4 years, 3 months ago
From: Yang Yang <yang.yang29@zte.com.cn>

psi tracks the time spent on submitting the IO of refaulting file pages
and anonymous pages[1]. But after we tracks refaulting anonymous pages
in swap_readpage[2][3], there is no need to track refaulting anonymous
pages in submit_bio.

So this patch can reduce redundant calling of psi_memstall_enter. And
make it easier to track refaulting file pages and anonymous pages
separately.

[1] commit b8e24a9300b0 ("block: annotate refault stalls from IO submission")
[2] commit 937790699be9 ("mm/page_io.c: annotate refault stalls from swap_readpage")
[3] commit 2b413a1a728f ("mm: page_io: fix psi memory pressure error on cold swapins")

Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
---
 block/bio.c               | 9 +++++----
 block/blk-core.c          | 2 +-
 fs/direct-io.c            | 2 +-
 include/linux/blk_types.h | 2 +-
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 3c57b3ba727d..54b60be4f3b0 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1035,8 +1035,9 @@ void __bio_add_page(struct bio *bio, struct page *page,
 	bio->bi_iter.bi_size += len;
 	bio->bi_vcnt++;
 
-	if (!bio_flagged(bio, BIO_WORKINGSET) && unlikely(PageWorkingset(page)))
-		bio_set_flag(bio, BIO_WORKINGSET);
+	if (!bio_flagged(bio, BIO_WORKINGSET_FILE) &&
+	    unlikely(PageWorkingset(page)) && !PageSwapBacked(page))
+		bio_set_flag(bio, BIO_WORKINGSET_FILE);
 }
 EXPORT_SYMBOL_GPL(__bio_add_page);
 
@@ -1254,7 +1255,7 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
  * is returned only if 0 pages could be pinned.
  *
  * It's intended for direct IO, so doesn't do PSI tracking, the caller is
- * responsible for setting BIO_WORKINGSET if necessary.
+ * responsible for setting BIO_WORKINGSET_FILE if necessary.
  */
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
@@ -1274,7 +1275,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
 
 	/* don't account direct I/O as memory stall */
-	bio_clear_flag(bio, BIO_WORKINGSET);
+	bio_clear_flag(bio, BIO_WORKINGSET_FILE);
 	return bio->bi_vcnt ? 0 : ret;
 }
 EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
diff --git a/block/blk-core.c b/block/blk-core.c
index ddac62aebc55..9a955323734b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -918,7 +918,7 @@ void submit_bio(struct bio *bio)
 	 * part of overall IO time.
 	 */
 	if (unlikely(bio_op(bio) == REQ_OP_READ &&
-	    bio_flagged(bio, BIO_WORKINGSET))) {
+	    bio_flagged(bio, BIO_WORKINGSET_FILE))) {
 		unsigned long pflags;
 
 		psi_memstall_enter(&pflags);
diff --git a/fs/direct-io.c b/fs/direct-io.c
index aef06e607b40..7cdec50fb27b 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -420,7 +420,7 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
 
 	bio->bi_private = dio;
 	/* don't account direct I/O as memory stall */
-	bio_clear_flag(bio, BIO_WORKINGSET);
+	bio_clear_flag(bio, BIO_WORKINGSET_FILE);
 
 	spin_lock_irqsave(&dio->bio_lock, flags);
 	dio->refcount++;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 0c7a9a1f06c8..a1aaba4767e9 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -314,7 +314,7 @@ enum {
 	BIO_NO_PAGE_REF,	/* don't put release vec pages */
 	BIO_CLONED,		/* doesn't own data */
 	BIO_BOUNCED,		/* bio is a bounce bio */
-	BIO_WORKINGSET,		/* contains userspace workingset pages */
+	BIO_WORKINGSET_FILE,	/* contains userspace workingset file pages */
 	BIO_QUIET,		/* Make BIO Quiet */
 	BIO_CHAIN,		/* chained bio, ->bi_remaining in effect */
 	BIO_REFFED,		/* bio has elevated ->bi_cnt */
-- 
2.25.1
Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
Posted by Christoph Hellwig 4 years, 3 months ago
> @@ -1035,8 +1035,9 @@ void __bio_add_page(struct bio *bio, struct page *page,
>  	bio->bi_iter.bi_size += len;
>  	bio->bi_vcnt++;
>  
> -	if (!bio_flagged(bio, BIO_WORKINGSET) && unlikely(PageWorkingset(page)))
> -		bio_set_flag(bio, BIO_WORKINGSET);
> +	if (!bio_flagged(bio, BIO_WORKINGSET_FILE) &&
> +	    unlikely(PageWorkingset(page)) && !PageSwapBacked(page))
> +		bio_set_flag(bio, BIO_WORKINGSET_FILE);

This needs to go out of the block I/O fast path, not grow even more
checks.
Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
Posted by CGEL 4 years, 3 months ago
On Wed, Mar 16, 2022 at 01:48:05AM -0700, Christoph Hellwig wrote:
> > @@ -1035,8 +1035,9 @@ void __bio_add_page(struct bio *bio, struct page *page,
> >  	bio->bi_iter.bi_size += len;
> >  	bio->bi_vcnt++;
> >  
> > -	if (!bio_flagged(bio, BIO_WORKINGSET) && unlikely(PageWorkingset(page)))
> > -		bio_set_flag(bio, BIO_WORKINGSET);
> > +	if (!bio_flagged(bio, BIO_WORKINGSET_FILE) &&
> > +	    unlikely(PageWorkingset(page)) && !PageSwapBacked(page))
> > +		bio_set_flag(bio, BIO_WORKINGSET_FILE);
> 
> This needs to go out of the block I/O fast path, not grow even more
> checks.

Thanks for your replying.

First, Johannes Weiner had made his state, see:
https://lore.kernel.org/all/Yio17pXawRuuVJFO@cmpxchg.org/

Second, I understand your concern, but actually there seems no better
way to do file pages workingset delay accounting, because this is no
unique function to track file pages submitting, kernel do this in
multiple sub-system.

Thirdly, this patch doesn't make it worse, indeed it reduce unnecessary
psi accounting in submit_bio.
Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
Posted by Johannes Weiner 4 years, 3 months ago
On Wed, Mar 16, 2022 at 06:39:28AM +0000, cgel.zte@gmail.com wrote:
> From: Yang Yang <yang.yang29@zte.com.cn>
> 
> psi tracks the time spent on submitting the IO of refaulting file pages
> and anonymous pages[1]. But after we tracks refaulting anonymous pages
> in swap_readpage[2][3], there is no need to track refaulting anonymous
> pages in submit_bio.
> 
> So this patch can reduce redundant calling of psi_memstall_enter. And
> make it easier to track refaulting file pages and anonymous pages
> separately.

I don't think this is an improvement.

psi_memstall_enter() will check current->in_memstall once, detect the
nested call, and bail. Your patch checks PageSwapBacked for every page
being added. It's more branches for less robust code.
Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
Posted by CGEL 4 years, 3 months ago
On Mon, Mar 21, 2022 at 10:33:20AM -0400, Johannes Weiner wrote:
> On Wed, Mar 16, 2022 at 06:39:28AM +0000, cgel.zte@gmail.com wrote:
> > From: Yang Yang <yang.yang29@zte.com.cn>
> > 
> > psi tracks the time spent on submitting the IO of refaulting file pages
> > and anonymous pages[1]. But after we tracks refaulting anonymous pages
> > in swap_readpage[2][3], there is no need to track refaulting anonymous
> > pages in submit_bio.
> > 
> > So this patch can reduce redundant calling of psi_memstall_enter. And
> > make it easier to track refaulting file pages and anonymous pages
> > separately.
> 
> I don't think this is an improvement.
> 
> psi_memstall_enter() will check current->in_memstall once, detect the
> nested call, and bail. Your patch checks PageSwapBacked for every page
> being added. It's more branches for less robust code.

We are also working for a new patch to classify different reasons cause
psi_memstall_enter(): reclaim, thrashing, compact, etc. This will help
user to tuning sysctl, for example, if user see high compact delay, he
may try do adjust THP sysctl to reduce the compact delay.

To support that, we should distinguish what's the reason cause psi in
submit_io(), this patch does the job.
Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
Posted by Johannes Weiner 4 years, 3 months ago
On Tue, Mar 22, 2022 at 02:47:42AM +0000, CGEL wrote:
> On Mon, Mar 21, 2022 at 10:33:20AM -0400, Johannes Weiner wrote:
> > On Wed, Mar 16, 2022 at 06:39:28AM +0000, cgel.zte@gmail.com wrote:
> > > From: Yang Yang <yang.yang29@zte.com.cn>
> > > 
> > > psi tracks the time spent on submitting the IO of refaulting file pages
> > > and anonymous pages[1]. But after we tracks refaulting anonymous pages
> > > in swap_readpage[2][3], there is no need to track refaulting anonymous
> > > pages in submit_bio.
> > > 
> > > So this patch can reduce redundant calling of psi_memstall_enter. And
> > > make it easier to track refaulting file pages and anonymous pages
> > > separately.
> > 
> > I don't think this is an improvement.
> > 
> > psi_memstall_enter() will check current->in_memstall once, detect the
> > nested call, and bail. Your patch checks PageSwapBacked for every page
> > being added. It's more branches for less robust code.
> 
> We are also working for a new patch to classify different reasons cause
> psi_memstall_enter(): reclaim, thrashing, compact, etc. This will help
> user to tuning sysctl, for example, if user see high compact delay, he
> may try do adjust THP sysctl to reduce the compact delay.
> 
> To support that, we should distinguish what's the reason cause psi in
> submit_io(), this patch does the job.

Please submit these patches together then. On its own, this patch
isn't desirable.
Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
Posted by CGEL 4 years, 3 months ago
On Tue, Mar 22, 2022 at 09:27:58AM -0400, Johannes Weiner wrote:
> On Tue, Mar 22, 2022 at 02:47:42AM +0000, CGEL wrote:
> > On Mon, Mar 21, 2022 at 10:33:20AM -0400, Johannes Weiner wrote:
> > > On Wed, Mar 16, 2022 at 06:39:28AM +0000, cgel.zte@gmail.com wrote:
> > > > From: Yang Yang <yang.yang29@zte.com.cn>
> > > > 
> > > > psi tracks the time spent on submitting the IO of refaulting file pages
> > > > and anonymous pages[1]. But after we tracks refaulting anonymous pages
> > > > in swap_readpage[2][3], there is no need to track refaulting anonymous
> > > > pages in submit_bio.
> > > > 
> > > > So this patch can reduce redundant calling of psi_memstall_enter. And
> > > > make it easier to track refaulting file pages and anonymous pages
> > > > separately.
> > > 
> > > I don't think this is an improvement.
> > > 
> > > psi_memstall_enter() will check current->in_memstall once, detect the
> > > nested call, and bail. Your patch checks PageSwapBacked for every page
> > > being added. It's more branches for less robust code.
> > 
> > We are also working for a new patch to classify different reasons cause
> > psi_memstall_enter(): reclaim, thrashing, compact, etc. This will help
> > user to tuning sysctl, for example, if user see high compact delay, he
> > may try do adjust THP sysctl to reduce the compact delay.
> > 
> > To support that, we should distinguish what's the reason cause psi in
> > submit_io(), this patch does the job.
> 
> Please submit these patches together then. On its own, this patch
> isn't desirable.
I think this patch has it's independent value, I try to make a better
explain.

1) This patch doesn't work it worse or even better
After this patch, swap workingset handle is simpler, file workingset
handle just has one more check, as below.
Before this patch handling swap workingset:
	a) in swap_readpage() call psi_memstall_enter() ->
	b) in __bio_add_page() test if should call bio_set_flag(), true ->
	c) in __bio_add_page() call bio_set_flag()
	d) in submit_bio() test if should call psi_memstall_enter(), true ->
	e) call psi_memstall_enter, detect the nested call, and bail.
	f) call bio_clear_flag if needed.
Before this patch handling file page workingset:
	a) in __bio_add_page() test if should call bio_set_flag(), true ->
	...
	b) call bio_clear_flag if needed.
After this patch handling swap workingset:
	a) in swap_readpage() call psi_memstall_enter() ->
	b) in __bio_add_page() test if should call bio_set_flag(), one more check, false and return.
	c) in submit_bio() test if should call psi_memstall_enter(), false and return.
After this patch handling file pages workingset:
	a) in __bio_add_page() test if should call bio_set_flag(), one more check, true ->
	...
	b) call bio_clear_flag if needed.

2) This patch help tools like kprobe to trace different workingset
After this patch we know workingset in submit_io() is only cause by file pages.

3) This patch will help code evolution
Such as psi classify, getdelays supports counting file pages workingset submit.
Re: [PATCH] block/psi: make PSI annotations of submit_bio only work for file pages
Posted by CGEL 4 years, 3 months ago
On Mon, Mar 21, 2022 at 10:33:20AM -0400, Johannes Weiner wrote:
> On Wed, Mar 16, 2022 at 06:39:28AM +0000, cgel.zte@gmail.com wrote:
> > From: Yang Yang <yang.yang29@zte.com.cn>
> > 
> > psi tracks the time spent on submitting the IO of refaulting file pages
> > and anonymous pages[1]. But after we tracks refaulting anonymous pages
> > in swap_readpage[2][3], there is no need to track refaulting anonymous
> > pages in submit_bio.
> > 
> > So this patch can reduce redundant calling of psi_memstall_enter. And
> > make it easier to track refaulting file pages and anonymous pages
> > separately.
> 
> I don't think this is an improvement.
> 
> psi_memstall_enter() will check current->in_memstall once, detect the
> nested call, and bail. Your patch checks PageSwapBacked for every page
> being added. It's more branches for less robust code.

And PageSwapBacked checking is after unlikely(PageWorkingset(page), so I think
the impact is little.