[PATCH] mm/vmscan.c: no need to double-check if free pages are under high-watermark

skseofh@gmail.com posted 1 patch 4 years, 5 months ago
mm/vmscan.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
[PATCH] mm/vmscan.c: no need to double-check if free pages are under high-watermark
Posted by skseofh@gmail.com 4 years, 5 months ago
From: Daero Lee <skseofh@gmail.com>

In kswapd_try_to_sleep function, to check whether kswapd can sleep,
the prepare_kswapd_sleep function is called twice.

If free pages are below high-watermark in the first call,
the @remaining variable is not updated at 0 and the
prepare_kswapd_sleep function is called for the second time.

I think it is necessary to set the initial value of the
@remaining to a non-zero value to prevent consecutive calls
to the same function.

Signed-off-by: Daero Lee <skseofh@gmail.com>
---
 mm/vmscan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 700434db5735..1217ecec5bbb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4331,7 +4331,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
 	/*
 	 * Return the order kswapd stopped reclaiming at as
 	 * prepare_kswapd_sleep() takes it into account. If another caller
-	 * entered the allocator slow path while kswapd was awake, order will
+	 * entered the allqocator slow path while kswapd was awake, order will
 	 * remain at the higher level.
 	 */
 	return sc.order;
@@ -4355,7 +4355,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
 static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
 				unsigned int highest_zoneidx)
 {
-	long remaining = 0;
+	long remaining = ~0;
 	DEFINE_WAIT(wait);
 
 	if (freezing(current) || kthread_should_stop())
-- 
2.25.1

Re: [PATCH] mm/vmscan.c: no need to double-check if free pages are under high-watermark
Posted by Andrew Morton 4 years, 5 months ago
On Sun,  2 Jan 2022 12:31:29 +0900 skseofh@gmail.com wrote:

> From: Daero Lee <skseofh@gmail.com>
> 
> In kswapd_try_to_sleep function, to check whether kswapd can sleep,
> the prepare_kswapd_sleep function is called twice.
> 
> If free pages are below high-watermark in the first call,
> the @remaining variable is not updated at 0 and the
> prepare_kswapd_sleep function is called for the second time.
> 
> I think it is necessary to set the initial value of the
> @remaining to a non-zero value to prevent consecutive calls
> to the same function.

Seems right.  Mel & Joonsoo, could you please check?

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4331,7 +4331,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
>  	/*
>  	 * Return the order kswapd stopped reclaiming at as
>  	 * prepare_kswapd_sleep() takes it into account. If another caller
> -	 * entered the allocator slow path while kswapd was awake, order will
> +	 * entered the allqocator slow path while kswapd was awake, order will

whoops.

>  	 * remain at the higher level.
>  	 */
>  	return sc.order;
> @@ -4355,7 +4355,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
>  static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
>  				unsigned int highest_zoneidx)
>  {
> -	long remaining = 0;
> +	long remaining = ~0;
>  	DEFINE_WAIT(wait);
>  
>  	if (freezing(current) || kthread_should_stop())

Re: [PATCH] mm/vmscan.c: no need to double-check if free pages are under high-watermark
Posted by Mel Gorman 4 years, 5 months ago
On Sun, Jan 02, 2022 at 12:31:29PM +0900, skseofh@gmail.com wrote:
> From: Daero Lee <skseofh@gmail.com>
> 
> In kswapd_try_to_sleep function, to check whether kswapd can sleep,
> the prepare_kswapd_sleep function is called twice.
> 
> If free pages are below high-watermark in the first call,
> the @remaining variable is not updated at 0 and the
> prepare_kswapd_sleep function is called for the second time.
> 
> I think it is necessary to set the initial value of the
> @remaining to a non-zero value to prevent consecutive calls
> to the same function.
> 
> Signed-off-by: Daero Lee <skseofh@gmail.com>
> ---
>  mm/vmscan.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 700434db5735..1217ecec5bbb 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4331,7 +4331,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
>  	/*
>  	 * Return the order kswapd stopped reclaiming at as
>  	 * prepare_kswapd_sleep() takes it into account. If another caller
> -	 * entered the allocator slow path while kswapd was awake, order will
> +	 * entered the allqocator slow path while kswapd was awake, order will
>  	 * remain at the higher level.
>  	 */
>  	return sc.order;

This hunk just adds a typo, drop it.

> @@ -4355,7 +4355,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
>  static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
>  				unsigned int highest_zoneidx)
>  {
> -	long remaining = 0;
> +	long remaining = ~0;
>  	DEFINE_WAIT(wait);
>  
>  	if (freezing(current) || kthread_should_stop())

While this does avoid calling prepare_kswapd_sleep() twice if the pgdat
is balanced on the first try, it then does not restore the vmstat
thresholds and doesn't call schedul() for kswapd to go to sleep.

I think you did spot a problem but I suspect you want something like
the following untested patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 700434db5735..40784693c840 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4355,7 +4355,8 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
 static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
 				unsigned int highest_zoneidx)
 {
-	long remaining = 0;
+	long remaining;
+	bool balanced;
 	DEFINE_WAIT(wait);
 
 	if (freezing(current) || kthread_should_stop())
@@ -4370,7 +4371,8 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
 	 * eligible zone balanced that it's also unlikely that compaction will
 	 * succeed.
 	 */
-	if (prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
+	balanced = prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx);
+	if (balanced) {
 		/*
 		 * Compaction records what page blocks it recently failed to
 		 * isolate pages from and skips them in the future scanning.
@@ -4387,6 +4389,10 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
 
 		remaining = schedule_timeout(HZ/10);
 
+		/* Is pgdat balanced after a short sleep? */
+		balanced = prepare_kswapd_sleep(pgdat, reclaim_order,
+							highest_zoneidx);
+
 		/*
 		 * If woken prematurely then reset kswapd_highest_zoneidx and
 		 * order. The values will either be from a wakeup request or
@@ -4406,11 +4412,11 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
 	}
 
 	/*
-	 * After a short sleep, check if it was a premature sleep. If not, then
-	 * go fully to sleep until explicitly woken up.
+	 * If balanced to the high watermark, restore vmstat thresholds and
+	 * kswapd goes to sleep. If kswapd remains awake, account whether
+	 * the low or high watermark was hit quickly.
 	 */
-	if (!remaining &&
-	    prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
+	if (balanced) {
 		trace_mm_vmscan_kswapd_sleep(pgdat->node_id);
 
 		/*
Re: [PATCH] mm/vmscan.c: no need to double-check if free pages are under high-watermark
Posted by DaeRo Lee 4 years, 5 months ago
> > @@ -4355,7 +4355,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
> >  static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
> >                               unsigned int highest_zoneidx)
> >  {
> > -     long remaining = 0;
> > +     long remaining = ~0;
> >       DEFINE_WAIT(wait);
> >
> >       if (freezing(current) || kthread_should_stop())
>
> While this does avoid calling prepare_kswapd_sleep() twice if the pgdat
> is balanced on the first try, it then does not restore the vmstat
> thresholds and doesn't call schedul() for kswapd to go to sleep.

I intended not to call prepare_kswapd_sleep() twice when the pgdat is NOT
balanced on the first try:)

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 700434db5735..40784693c840 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4355,7 +4355,8 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
>  static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
>                                 unsigned int highest_zoneidx)
>  {
> -       long remaining = 0;
> +       long remaining;
> +       bool balanced;
>         DEFINE_WAIT(wait);
>
>         if (freezing(current) || kthread_should_stop())
> @@ -4370,7 +4371,8 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
>          * eligible zone balanced that it's also unlikely that compaction will
>          * succeed.
>          */
> -       if (prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
> +       balanced = prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx);
> +       if (balanced) {
>                 /*
>                  * Compaction records what page blocks it recently failed to
>                  * isolate pages from and skips them in the future scanning.
> @@ -4387,6 +4389,10 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
>
>                 remaining = schedule_timeout(HZ/10);
>
> +               /* Is pgdat balanced after a short sleep? */
> +               balanced = prepare_kswapd_sleep(pgdat, reclaim_order,
> +                                                       highest_zoneidx);
> +
>                 /*
>                  * If woken prematurely then reset kswapd_highest_zoneidx and
>                  * order. The values will either be from a wakeup request or
> @@ -4406,11 +4412,11 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
>         }
>
>         /*
> -        * After a short sleep, check if it was a premature sleep. If not, then
> -        * go fully to sleep until explicitly woken up.
> +        * If balanced to the high watermark, restore vmstat thresholds and
> +        * kswapd goes to sleep. If kswapd remains awake, account whether
> +        * the low or high watermark was hit quickly.
>          */
> -       if (!remaining &&
> -           prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
> +       if (balanced) {
>                 trace_mm_vmscan_kswapd_sleep(pgdat->node_id);
>
>                 /*

But, I think what you did is more readable and nice.
Thanks!

> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 700434db5735..1217ecec5bbb 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4331,7 +4331,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
> >       /*
> >        * Return the order kswapd stopped reclaiming at as
> >        * prepare_kswapd_sleep() takes it into account. If another caller
> > -      * entered the allocator slow path while kswapd was awake, order will
> > +      * entered the allqocator slow path while kswapd was awake, order will
> >        * remain at the higher level.
> >        */
> >       return sc.order;
>
> This hunk just adds a typo, drop it.

Sorry about that;;
Re: [PATCH] mm/vmscan.c: no need to double-check if free pages are under high-watermark
Posted by Mel Gorman 4 years, 5 months ago
On Thu, Jan 06, 2022 at 09:03:34PM +0900, DaeRo Lee wrote:
> > > @@ -4355,7 +4355,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
> > >  static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
> > >                               unsigned int highest_zoneidx)
> > >  {
> > > -     long remaining = 0;
> > > +     long remaining = ~0;
> > >       DEFINE_WAIT(wait);
> > >
> > >       if (freezing(current) || kthread_should_stop())
> >
> > While this does avoid calling prepare_kswapd_sleep() twice if the pgdat
> > is balanced on the first try, it then does not restore the vmstat
> > thresholds and doesn't call schedul() for kswapd to go to sleep.
> 
> I intended not to call prepare_kswapd_sleep() twice when the pgdat is NOT
> balanced on the first try:)
> 

Stupid typo on my part.

> > @@ -4406,11 +4412,11 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
> >         }
> >
> >         /*
> > -        * After a short sleep, check if it was a premature sleep. If not, then
> > -        * go fully to sleep until explicitly woken up.
> > +        * If balanced to the high watermark, restore vmstat thresholds and
> > +        * kswapd goes to sleep. If kswapd remains awake, account whether
> > +        * the low or high watermark was hit quickly.
> >          */
> > -       if (!remaining &&
> > -           prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
> > +       if (balanced) {
> >                 trace_mm_vmscan_kswapd_sleep(pgdat->node_id);
> >
> >                 /*
> 
> But, I think what you did is more readable and nice.
> Thanks!
> 

Feel free to pick it up, rerun your tests to ensure it's behaving as
expected and resend! Include something in the changelog about user-visible
effects if any (or a note saying that it reduces unnecssary overhead)
and resend with me added to the cc.

> > > @@ -4331,7 +4331,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
> > >       /*
> > >        * Return the order kswapd stopped reclaiming at as
> > >        * prepare_kswapd_sleep() takes it into account. If another caller
> > > -      * entered the allocator slow path while kswapd was awake, order will
> > > +      * entered the allqocator slow path while kswapd was awake, order will
> > >        * remain at the higher level.
> > >        */
> > >       return sc.order;
> >
> > This hunk just adds a typo, drop it.
> 
> Sorry about that;;

No need to be sorry, it happens :)

Thanks!

-- 
Mel Gorman
SUSE Labs
Re: [PATCH] mm/vmscan.c: no need to double-check if free pages are under high-watermark
Posted by Wei Yang 4 years, 3 months ago
On Thu, Jan 06, 2022 at 12:57:58PM +0000, Mel Gorman wrote:
>On Thu, Jan 06, 2022 at 09:03:34PM +0900, DaeRo Lee wrote:
>> > > @@ -4355,7 +4355,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
>> > >  static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
>> > >                               unsigned int highest_zoneidx)
>> > >  {
>> > > -     long remaining = 0;
>> > > +     long remaining = ~0;
>> > >       DEFINE_WAIT(wait);
>> > >
>> > >       if (freezing(current) || kthread_should_stop())
>> >
>> > While this does avoid calling prepare_kswapd_sleep() twice if the pgdat
>> > is balanced on the first try, it then does not restore the vmstat
>> > thresholds and doesn't call schedul() for kswapd to go to sleep.
>> 
>> I intended not to call prepare_kswapd_sleep() twice when the pgdat is NOT
>> balanced on the first try:)
>> 
>
>Stupid typo on my part.
>
>> > @@ -4406,11 +4412,11 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
>> >         }
>> >
>> >         /*
>> > -        * After a short sleep, check if it was a premature sleep. If not, then
>> > -        * go fully to sleep until explicitly woken up.
>> > +        * If balanced to the high watermark, restore vmstat thresholds and
>> > +        * kswapd goes to sleep. If kswapd remains awake, account whether
>> > +        * the low or high watermark was hit quickly.
>> >          */
>> > -       if (!remaining &&
>> > -           prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
>> > +       if (balanced) {
>> >                 trace_mm_vmscan_kswapd_sleep(pgdat->node_id);
>> >
>> >                 /*
>> 
>> But, I think what you did is more readable and nice.
>> Thanks!
>> 
>
>Feel free to pick it up, rerun your tests to ensure it's behaving as
>expected and resend! Include something in the changelog about user-visible
>effects if any (or a note saying that it reduces unnecssary overhead)
>and resend with me added to the cc.
>

Hi, All

Seems this thread stops here. I don't see following patch and current upstream
doesn't include this change.

May I continue this? Of course, with author-ship from DaeRo Lee <skseofh@gmail.com>.

Mel,

Would you mind suggesting some cases that I could do to see the effects from
this change? Such as the overhead or throughput? Or what cases you expect?


-- 
Wei Yang
Help you, Help me
Re: [PATCH] mm/vmscan.c: no need to double-check if free pages are under high-watermark
Posted by Mel Gorman 4 years, 2 months ago
On Sat, Mar 26, 2022 at 03:50:22PM +0000, Wei Yang wrote:
> On Thu, Jan 06, 2022 at 12:57:58PM +0000, Mel Gorman wrote:
> >On Thu, Jan 06, 2022 at 09:03:34PM +0900, DaeRo Lee wrote:
> >> > > @@ -4355,7 +4355,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
> >> > >  static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
> >> > >                               unsigned int highest_zoneidx)
> >> > >  {
> >> > > -     long remaining = 0;
> >> > > +     long remaining = ~0;
> >> > >       DEFINE_WAIT(wait);
> >> > >
> >> > >       if (freezing(current) || kthread_should_stop())
> >> >
> >> > While this does avoid calling prepare_kswapd_sleep() twice if the pgdat
> >> > is balanced on the first try, it then does not restore the vmstat
> >> > thresholds and doesn't call schedul() for kswapd to go to sleep.
> >> 
> >> I intended not to call prepare_kswapd_sleep() twice when the pgdat is NOT
> >> balanced on the first try:)
> >> 
> >
> >Stupid typo on my part.
> >
> >> > @@ -4406,11 +4412,11 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
> >> >         }
> >> >
> >> >         /*
> >> > -        * After a short sleep, check if it was a premature sleep. If not, then
> >> > -        * go fully to sleep until explicitly woken up.
> >> > +        * If balanced to the high watermark, restore vmstat thresholds and
> >> > +        * kswapd goes to sleep. If kswapd remains awake, account whether
> >> > +        * the low or high watermark was hit quickly.
> >> >          */
> >> > -       if (!remaining &&
> >> > -           prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
> >> > +       if (balanced) {
> >> >                 trace_mm_vmscan_kswapd_sleep(pgdat->node_id);
> >> >
> >> >                 /*
> >> 
> >> But, I think what you did is more readable and nice.
> >> Thanks!
> >> 
> >
> >Feel free to pick it up, rerun your tests to ensure it's behaving as
> >expected and resend! Include something in the changelog about user-visible
> >effects if any (or a note saying that it reduces unnecssary overhead)
> >and resend with me added to the cc.
> >
> 
> Hi, All
> 
> Seems this thread stops here. I don't see following patch and current upstream
> doesn't include this change.
> 
> May I continue this? Of course, with author-ship from DaeRo Lee <skseofh@gmail.com>.
> 

I've no objections. When I said "Feel free to pick it up", I meant that
I was ok with you taking the patch and putting your team on it.

> Mel,
> 
> Would you mind suggesting some cases that I could do to see the effects from
> this change? Such as the overhead or throughput? Or what cases you expect?
> 

I don't have any suggestions on artificially triggering it. I had assumed
you had encountered the bug in practice and had a test case but it would
be ok to note that the patch is a theoretical fix based on code review.

-- 
Mel Gorman
SUSE Labs