[PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers

Johannes Weiner posted 5 patches 9 months, 1 week ago
[PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers
Posted by Johannes Weiner 9 months, 1 week ago
compaction_suitable() hardcodes the min watermark, with a boost to the
low watermark for costly orders. However, compaction_ready() requires
order-0 at the high watermark. It currently checks the marks twice.

Make the watermark a parameter to compaction_suitable() and have the
callers pass in what they require:

- compaction_zonelist_suitable() is used by the direct reclaim path,
  so use the min watermark.

- compact_suit_allocation_order() has a watermark in context derived
  from cc->alloc_flags.

  The only quirk is that kcompactd doesn't initialize cc->alloc_flags
  explicitly. There is a direct check in kcompactd_do_work() that
  passes ALLOC_WMARK_MIN, but there is another check downstack in
  compact_zone() that ends up passing the unset alloc_flags. Since
  they default to 0, and that coincides with ALLOC_WMARK_MIN, it is
  correct. But it's subtle. Set cc->alloc_flags explicitly.

- should_continue_reclaim() is direct reclaim, use the min watermark.

- Finally, consolidate the two checks in compaction_ready() to a
  single compaction_suitable() call passing the high watermark.

  There is a tiny change in behavior: before, compaction_suitable()
  would check order-0 against min or low, depending on costly
  order. Then there'd be another high watermark check.

  Now, the high watermark is passed to compaction_suitable(), and the
  costly order-boost (low - min) is added on top. This means
  compaction_ready() sets a marginally higher target for free pages.

  In a kernelbuild + THP pressure test, though, this didn't show any
  measurable negative effects on memory pressure or reclaim rates. As
  the comment above the check says, reclaim is usually stopped short
  on should_continue_reclaim(), and this just defines the worst-case
  reclaim cutoff in case compaction is not making any headway.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/compaction.h |  5 ++--
 mm/compaction.c            | 52 ++++++++++++++++++++------------------
 mm/vmscan.c                | 26 ++++++++++---------
 3 files changed, 45 insertions(+), 38 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 7bf0c521db63..173d9c07a895 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -95,7 +95,7 @@ extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 		struct page **page);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern bool compaction_suitable(struct zone *zone, int order,
-					       int highest_zoneidx);
+				unsigned long watermark, int highest_zoneidx);
 
 extern void compaction_defer_reset(struct zone *zone, int order,
 				bool alloc_success);
@@ -113,7 +113,8 @@ static inline void reset_isolation_suitable(pg_data_t *pgdat)
 }
 
 static inline bool compaction_suitable(struct zone *zone, int order,
-						      int highest_zoneidx)
+				       unsigned long watermark,
+				       int highest_zoneidx)
 {
 	return false;
 }
diff --git a/mm/compaction.c b/mm/compaction.c
index 550ce5021807..036353ef1878 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2382,40 +2382,42 @@ static enum compact_result compact_finished(struct compact_control *cc)
 }
 
 static bool __compaction_suitable(struct zone *zone, int order,
-				  int highest_zoneidx,
-				  unsigned long wmark_target)
+				  unsigned long watermark, int highest_zoneidx,
+				  unsigned long free_pages)
 {
-	unsigned long watermark;
 	/*
 	 * Watermarks for order-0 must be met for compaction to be able to
 	 * isolate free pages for migration targets. This means that the
-	 * watermark and alloc_flags have to match, or be more pessimistic than
-	 * the check in __isolate_free_page(). We don't use the direct
-	 * compactor's alloc_flags, as they are not relevant for freepage
-	 * isolation. We however do use the direct compactor's highest_zoneidx
-	 * to skip over zones where lowmem reserves would prevent allocation
-	 * even if compaction succeeds.
-	 * For costly orders, we require low watermark instead of min for
-	 * compaction to proceed to increase its chances.
+	 * watermark have to match, or be more pessimistic than the check in
+	 * __isolate_free_page().
+	 *
+	 * For costly orders, we require a higher watermark for compaction to
+	 * proceed to increase its chances.
+	 *
+	 * We use the direct compactor's highest_zoneidx to skip over zones
+	 * where lowmem reserves would prevent allocation even if compaction
+	 * succeeds.
+	 *
 	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
-	 * suitable migration targets
+	 * suitable migration targets.
 	 */
-	watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
-				low_wmark_pages(zone) : min_wmark_pages(zone);
 	watermark += compact_gap(order);
+	if (order > PAGE_ALLOC_COSTLY_ORDER)
+		watermark += low_wmark_pages(zone) - min_wmark_pages(zone);
 	return __zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
-				   ALLOC_CMA, wmark_target);
+				   ALLOC_CMA, free_pages);
 }
 
 /*
  * compaction_suitable: Is this suitable to run compaction on this zone now?
  */
-bool compaction_suitable(struct zone *zone, int order, int highest_zoneidx)
+bool compaction_suitable(struct zone *zone, int order, unsigned long watermark,
+			 int highest_zoneidx)
 {
 	enum compact_result compact_result;
 	bool suitable;
 
-	suitable = __compaction_suitable(zone, order, highest_zoneidx,
+	suitable = __compaction_suitable(zone, order, highest_zoneidx, watermark,
 					 zone_page_state(zone, NR_FREE_PAGES));
 	/*
 	 * fragmentation index determines if allocation failures are due to
@@ -2453,6 +2455,7 @@ bool compaction_suitable(struct zone *zone, int order, int highest_zoneidx)
 	return suitable;
 }
 
+/* Used by direct reclaimers */
 bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 		int alloc_flags)
 {
@@ -2475,8 +2478,8 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 		 */
 		available = zone_reclaimable_pages(zone) / order;
 		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
-		if (__compaction_suitable(zone, order, ac->highest_zoneidx,
-					  available))
+		if (__compaction_suitable(zone, order, min_wmark_pages(zone),
+					  ac->highest_zoneidx, available))
 			return true;
 	}
 
@@ -2513,13 +2516,13 @@ compaction_suit_allocation_order(struct zone *zone, unsigned int order,
 	 */
 	if (order > PAGE_ALLOC_COSTLY_ORDER && async &&
 	    !(alloc_flags & ALLOC_CMA)) {
-		watermark = low_wmark_pages(zone) + compact_gap(order);
-		if (!__zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
-					   0, zone_page_state(zone, NR_FREE_PAGES)))
+		if (!__zone_watermark_ok(zone, 0, watermark + compact_gap(order),
+					 highest_zoneidx, 0,
+					 zone_page_state(zone, NR_FREE_PAGES)))
 			return COMPACT_SKIPPED;
 	}
 
-	if (!compaction_suitable(zone, order, highest_zoneidx))
+	if (!compaction_suitable(zone, order, watermark, highest_zoneidx))
 		return COMPACT_SKIPPED;
 
 	return COMPACT_CONTINUE;
@@ -3082,6 +3085,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 		.mode = MIGRATE_SYNC_LIGHT,
 		.ignore_skip_hint = false,
 		.gfp_mask = GFP_KERNEL,
+		.alloc_flags = ALLOC_WMARK_MIN,
 	};
 	enum compact_result ret;
 
@@ -3100,7 +3104,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 			continue;
 
 		ret = compaction_suit_allocation_order(zone,
-				cc.order, zoneid, ALLOC_WMARK_MIN,
+				cc.order, zoneid, cc.alloc_flags,
 				false);
 		if (ret != COMPACT_CONTINUE)
 			continue;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2bc740637a6c..3370bdca6868 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5890,12 +5890,15 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 
 	/* If compaction would go ahead or the allocation would succeed, stop */
 	for_each_managed_zone_pgdat(zone, pgdat, z, sc->reclaim_idx) {
+		unsigned long watermark = min_wmark_pages(zone);
+
 		/* Allocation can already succeed, nothing to do */
-		if (zone_watermark_ok(zone, sc->order, min_wmark_pages(zone),
+		if (zone_watermark_ok(zone, sc->order, watermark,
 				      sc->reclaim_idx, 0))
 			return false;
 
-		if (compaction_suitable(zone, sc->order, sc->reclaim_idx))
+		if (compaction_suitable(zone, sc->order, watermark,
+					sc->reclaim_idx))
 			return false;
 	}
 
@@ -6122,22 +6125,21 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
 			      sc->reclaim_idx, 0))
 		return true;
 
-	/* Compaction cannot yet proceed. Do reclaim. */
-	if (!compaction_suitable(zone, sc->order, sc->reclaim_idx))
-		return false;
-
 	/*
-	 * Compaction is already possible, but it takes time to run and there
-	 * are potentially other callers using the pages just freed. So proceed
-	 * with reclaim to make a buffer of free pages available to give
-	 * compaction a reasonable chance of completing and allocating the page.
+	 * Direct reclaim usually targets the min watermark, but compaction
+	 * takes time to run and there are potentially other callers using the
+	 * pages just freed. So target a higher buffer to give compaction a
+	 * reasonable chance of completing and allocating the pages.
+	 *
 	 * Note that we won't actually reclaim the whole buffer in one attempt
 	 * as the target watermark in should_continue_reclaim() is lower. But if
 	 * we are already above the high+gap watermark, don't reclaim at all.
 	 */
-	watermark = high_wmark_pages(zone) + compact_gap(sc->order);
+	watermark = high_wmark_pages(zone);
+	if (compaction_suitable(zone, sc->order, watermark, sc->reclaim_idx))
+		return true;
 
-	return zone_watermark_ok_safe(zone, 0, watermark, sc->reclaim_idx);
+	return false;
 }
 
 static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc)
-- 
2.48.1
Re: [PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers
Posted by Vlastimil Babka 8 months, 1 week ago
On 3/13/25 22:05, Johannes Weiner wrote:
> compaction_suitable() hardcodes the min watermark, with a boost to the
> low watermark for costly orders. However, compaction_ready() requires
> order-0 at the high watermark. It currently checks the marks twice.
> 
> Make the watermark a parameter to compaction_suitable() and have the
> callers pass in what they require:
> 
> - compaction_zonelist_suitable() is used by the direct reclaim path,
>   so use the min watermark.
> 
> - compact_suit_allocation_order() has a watermark in context derived
>   from cc->alloc_flags.
> 
>   The only quirk is that kcompactd doesn't initialize cc->alloc_flags
>   explicitly. There is a direct check in kcompactd_do_work() that
>   passes ALLOC_WMARK_MIN, but there is another check downstack in
>   compact_zone() that ends up passing the unset alloc_flags. Since
>   they default to 0, and that coincides with ALLOC_WMARK_MIN, it is
>   correct. But it's subtle. Set cc->alloc_flags explicitly.
> 
> - should_continue_reclaim() is direct reclaim, use the min watermark.
> 
> - Finally, consolidate the two checks in compaction_ready() to a
>   single compaction_suitable() call passing the high watermark.
> 
>   There is a tiny change in behavior: before, compaction_suitable()
>   would check order-0 against min or low, depending on costly
>   order. Then there'd be another high watermark check.
> 
>   Now, the high watermark is passed to compaction_suitable(), and the
>   costly order-boost (low - min) is added on top. This means
>   compaction_ready() sets a marginally higher target for free pages.
> 
>   In a kernelbuild + THP pressure test, though, this didn't show any
>   measurable negative effects on memory pressure or reclaim rates. As
>   the comment above the check says, reclaim is usually stopped short
>   on should_continue_reclaim(), and this just defines the worst-case
>   reclaim cutoff in case compaction is not making any headway.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

<snip>

> @@ -2513,13 +2516,13 @@ compaction_suit_allocation_order(struct zone *zone, unsigned int order,
>  	 */
>  	if (order > PAGE_ALLOC_COSTLY_ORDER && async &&
>  	    !(alloc_flags & ALLOC_CMA)) {
> -		watermark = low_wmark_pages(zone) + compact_gap(order);
> -		if (!__zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
> -					   0, zone_page_state(zone, NR_FREE_PAGES)))
> +		if (!__zone_watermark_ok(zone, 0, watermark + compact_gap(order),
> +					 highest_zoneidx, 0,
> +					 zone_page_state(zone, NR_FREE_PAGES)))
>  			return COMPACT_SKIPPED;

The watermark here is no longer recalculated as low_wmark_pages() but the
value from above based on alloc_flags is reused.
It's probably ok, maybe it's even more correct, just wasn't mentioned in the
changelog as another tiny change of behavior so I wanted to point it out.
Re: [PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers
Posted by Johannes Weiner 8 months, 1 week ago
On Thu, Apr 10, 2025 at 05:19:06PM +0200, Vlastimil Babka wrote:
> On 3/13/25 22:05, Johannes Weiner wrote:
> > compaction_suitable() hardcodes the min watermark, with a boost to the
> > low watermark for costly orders. However, compaction_ready() requires
> > order-0 at the high watermark. It currently checks the marks twice.
> > 
> > Make the watermark a parameter to compaction_suitable() and have the
> > callers pass in what they require:
> > 
> > - compaction_zonelist_suitable() is used by the direct reclaim path,
> >   so use the min watermark.
> > 
> > - compact_suit_allocation_order() has a watermark in context derived
> >   from cc->alloc_flags.
> > 
> >   The only quirk is that kcompactd doesn't initialize cc->alloc_flags
> >   explicitly. There is a direct check in kcompactd_do_work() that
> >   passes ALLOC_WMARK_MIN, but there is another check downstack in
> >   compact_zone() that ends up passing the unset alloc_flags. Since
> >   they default to 0, and that coincides with ALLOC_WMARK_MIN, it is
> >   correct. But it's subtle. Set cc->alloc_flags explicitly.
> > 
> > - should_continue_reclaim() is direct reclaim, use the min watermark.
> > 
> > - Finally, consolidate the two checks in compaction_ready() to a
> >   single compaction_suitable() call passing the high watermark.
> > 
> >   There is a tiny change in behavior: before, compaction_suitable()
> >   would check order-0 against min or low, depending on costly
> >   order. Then there'd be another high watermark check.
> > 
> >   Now, the high watermark is passed to compaction_suitable(), and the
> >   costly order-boost (low - min) is added on top. This means
> >   compaction_ready() sets a marginally higher target for free pages.
> > 
> >   In a kernelbuild + THP pressure test, though, this didn't show any
> >   measurable negative effects on memory pressure or reclaim rates. As
> >   the comment above the check says, reclaim is usually stopped short
> >   on should_continue_reclaim(), and this just defines the worst-case
> >   reclaim cutoff in case compaction is not making any headway.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> <snip>
> 
> > @@ -2513,13 +2516,13 @@ compaction_suit_allocation_order(struct zone *zone, unsigned int order,
> >  	 */
> >  	if (order > PAGE_ALLOC_COSTLY_ORDER && async &&
> >  	    !(alloc_flags & ALLOC_CMA)) {
> > -		watermark = low_wmark_pages(zone) + compact_gap(order);
> > -		if (!__zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
> > -					   0, zone_page_state(zone, NR_FREE_PAGES)))
> > +		if (!__zone_watermark_ok(zone, 0, watermark + compact_gap(order),
> > +					 highest_zoneidx, 0,
> > +					 zone_page_state(zone, NR_FREE_PAGES)))
> >  			return COMPACT_SKIPPED;
> 
> The watermark here is no longer recalculated as low_wmark_pages() but the
> value from above based on alloc_flags is reused.
> It's probably ok, maybe it's even more correct, just wasn't mentioned in the
> changelog as another tiny change of behavior so I wanted to point it out.

Ah yes, it would have made sense to point out.

I was wondering about this check. It was introduced to bail on
compaction if there are not enough free non-CMA pages. But if there
are, we still fall through and check the superset of regular + CMA
pages against the watermarks as well. We know this will succeed, so
this seems moot.

It's also a little odd that compaction_suitable() hardcodes ALLOC_CMA
with the explanation that "CMA are migration targets", but then this
check says "actually, it doesn't help us if blocks are formed in CMA".

Does it make more sense to plumb alloc_flags to compaction_suitable()?

There is more head-scratching, though. The check is meant to test
whether compaction has a chance of forming non-CMA blocks. But free
pages are targets. You could have plenty of non-contiguous, free
non-CMA memory - compaction will then form blocks in CMA by moving CMA
pages into those non-CMA targets.

The longer I look at this, the more I feel like this just hard-coded
the very specific scenario the patch author had a problem with: CMA is
massive. The page allocator fills up regular memory first. Once
regular memory is full, non-CMA requests stall on compaction making
CMA blocks. So just bail on compaction then.

It's a valid problem, but I don't see how this code makes any general
sense outside of this exact sequence of events. Especially once
compaction has moved stuff around between regular and CMA memory, the
issue will be back, and the check does nothing to prevent it.
Re: [PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers
Posted by Vlastimil Babka 8 months, 1 week ago
On 4/10/25 22:17, Johannes Weiner wrote:
> On Thu, Apr 10, 2025 at 05:19:06PM +0200, Vlastimil Babka wrote:
>> On 3/13/25 22:05, Johannes Weiner wrote:
> 
> Ah yes, it would have made sense to point out.
> 
> I was wondering about this check. It was introduced to bail on
> compaction if there are not enough free non-CMA pages. But if there
> are, we still fall through and check the superset of regular + CMA
> pages against the watermarks as well. We know this will succeed, so
> this seems moot.

I guess we didn't want to avoid the fragindex part of compaction_suitable(),
which in theory may not succeed?

> It's also a little odd that compaction_suitable() hardcodes ALLOC_CMA
> with the explanation that "CMA are migration targets", but then this
> check says "actually, it doesn't help us if blocks are formed in CMA".

Hm yes.

> Does it make more sense to plumb alloc_flags to compaction_suitable()?

Possibly.

> There is more head-scratching, though. The check is meant to test
> whether compaction has a chance of forming non-CMA blocks. But free
> pages are targets. You could have plenty of non-contiguous, free
> non-CMA memory - compaction will then form blocks in CMA by moving CMA
> pages into those non-CMA targets.
> 
> The longer I look at this, the more I feel like this just hard-coded
> the very specific scenario the patch author had a problem with: CMA is
> massive. The page allocator fills up regular memory first. Once
> regular memory is full, non-CMA requests stall on compaction making
> CMA blocks. So just bail on compaction then.

Right.

> It's a valid problem, but I don't see how this code makes any general
> sense outside of this exact sequence of events. Especially once
> compaction has moved stuff around between regular and CMA memory, the
> issue will be back, and the check does nothing to prevent it.

Yeah, it seemed to fix a real problem and we both acked it :) but it's not
ideal.

Maybe the true solution (or a step towards it) would be for compaction for
!ALLOC_CMA only use non-CMA pageblocks as migration sources.
IMHO it's just another symptom of the general problem that CMA pageblocks
exist as part of a zone that's not otherwise ZONE_MOVABLE and suddenly the
watermarks have to depend on the allocation type. And of course for the
high-order allocations it doesn't just matter the amount of memory in cma vs
non-cma parts of the zone, but also its contiguity.
Re: [PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers
Posted by kernel test robot 9 months ago

Hello,

kernel test robot noticed "BUG:unable_to_handle_page_fault_for_address" on:

commit: 6304be90cf5460f33b031e1e19cbe7ffdcbc9f66 ("[PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers")
url: https://github.com/intel-lab-lkp/linux/commits/Johannes-Weiner/mm-compaction-push-watermark-into-compaction_suitable-callers/20250314-050839
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/all/20250313210647.1314586-2-hannes@cmpxchg.org/
patch subject: [PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers

in testcase: trinity
version: trinity-x86_64-ba2360ed-1_20241228
with following parameters:

	runtime: 300s
	group: group-03
	nr_groups: 5



config: x86_64-kexec
compiler: clang-20
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+---------------------------------------------+------------+------------+
|                                             | 0174ed04ed | 6304be90cf |
+---------------------------------------------+------------+------------+
| BUG:unable_to_handle_page_fault_for_address | 0          | 5          |
| Oops                                        | 0          | 5          |
| RIP:__zone_watermark_ok                     | 0          | 5          |
+---------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202503201604.a3aa6a95-lkp@intel.com


[   24.321289][   T36] BUG: unable to handle page fault for address: ffff88844000c5f8
[   24.322631][   T36] #PF: supervisor read access in kernel mode
[   24.323577][   T36] #PF: error_code(0x0000) - not-present page
[   24.324482][   T36] PGD 3a01067 P4D 3a01067 PUD 0
[   24.325301][   T36] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
[   24.326157][   T36] CPU: 1 UID: 0 PID: 36 Comm: kcompactd0 Not tainted 6.14.0-rc6-00559-g6304be90cf54 #1
[   24.327631][   T36] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 24.329194][ T36] RIP: 0010:__zone_watermark_ok (mm/page_alloc.c:3256) 
[ 24.330125][ T36] Code: 84 c0 78 14 4c 8b 97 48 06 00 00 45 31 db 4d 85 d2 4d 0f 4f da 4c 01 de 49 29 f1 41 f7 c0 38 02 00 00 0f 85 92 00 00 00 48 98 <48> 03 54 c7 38 49 39 d1 7e 7e b0 01 85 c9 74 7a 83 f9 0a 7f 73 48
All code
========
   0:	84 c0                	test   %al,%al
   2:	78 14                	js     0x18
   4:	4c 8b 97 48 06 00 00 	mov    0x648(%rdi),%r10
   b:	45 31 db             	xor    %r11d,%r11d
   e:	4d 85 d2             	test   %r10,%r10
  11:	4d 0f 4f da          	cmovg  %r10,%r11
  15:	4c 01 de             	add    %r11,%rsi
  18:	49 29 f1             	sub    %rsi,%r9
  1b:	41 f7 c0 38 02 00 00 	test   $0x238,%r8d
  22:	0f 85 92 00 00 00    	jne    0xba
  28:	48 98                	cltq
  2a:*	48 03 54 c7 38       	add    0x38(%rdi,%rax,8),%rdx		<-- trapping instruction
  2f:	49 39 d1             	cmp    %rdx,%r9
  32:	7e 7e                	jle    0xb2
  34:	b0 01                	mov    $0x1,%al
  36:	85 c9                	test   %ecx,%ecx
  38:	74 7a                	je     0xb4
  3a:	83 f9 0a             	cmp    $0xa,%ecx
  3d:	7f 73                	jg     0xb2
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 03 54 c7 38       	add    0x38(%rdi,%rax,8),%rdx
   5:	49 39 d1             	cmp    %rdx,%r9
   8:	7e 7e                	jle    0x88
   a:	b0 01                	mov    $0x1,%al
   c:	85 c9                	test   %ecx,%ecx
   e:	74 7a                	je     0x8a
  10:	83 f9 0a             	cmp    $0xa,%ecx
  13:	7f 73                	jg     0x88
  15:	48                   	rex.W
[   24.333001][   T36] RSP: 0018:ffffc90000137cd0 EFLAGS: 00010246
[   24.334003][   T36] RAX: 00000000000036a8 RBX: 0000000000000001 RCX: 0000000000000000
[   24.335270][   T36] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88843fff1080
[   24.336551][   T36] RBP: 0000000000000001 R08: 0000000000000080 R09: 0000000000003b14
[   24.337799][   T36] R10: 00000000000018b0 R11: 00000000000018b0 R12: 0000000000000001
[   24.339130][   T36] R13: 0000000000000000 R14: ffff88843fff1080 R15: 00000000000036a8
[   24.340412][   T36] FS:  0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
[   24.341739][   T36] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   24.342448][   T36] CR2: ffff88844000c5f8 CR3: 00000001bceba000 CR4: 00000000000406f0
[   24.343331][   T36] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   24.347498][   T36] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   24.348260][   T36] Call Trace:
[   24.348621][   T36]  <TASK>
[ 24.348958][ T36] ? __die_body (arch/x86/kernel/dumpstack.c:421) 
[ 24.349447][ T36] ? page_fault_oops (arch/x86/mm/fault.c:710) 
[ 24.350008][ T36] ? do_kern_addr_fault (arch/x86/mm/fault.c:1198) 
[ 24.350582][ T36] ? exc_page_fault (arch/x86/mm/fault.c:1479) 
[ 24.351065][ T36] ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:623) 
[ 24.351550][ T36] ? __zone_watermark_ok (mm/page_alloc.c:3256) 
[ 24.352049][ T36] compaction_suitable (mm/compaction.c:2407) 
[ 24.352532][ T36] compaction_suit_allocation_order (mm/compaction.c:?) 
[ 24.353127][ T36] kcompactd (mm/compaction.c:3109) 
[ 24.353618][ T36] kthread (kernel/kthread.c:466) 
[ 24.354105][ T36] ? __pfx_kcompactd (mm/compaction.c:3184) 
[ 24.354658][ T36] ? __pfx_kthread (kernel/kthread.c:413) 
[ 24.355121][ T36] ret_from_fork (arch/x86/kernel/process.c:154) 
[ 24.355567][ T36] ? __pfx_kthread (kernel/kthread.c:413) 
[ 24.356032][ T36] ret_from_fork_asm (arch/x86/entry/entry_64.S:257) 
[   24.356505][   T36]  </TASK>
[   24.356837][   T36] Modules linked in: can_bcm can_raw can cn scsi_transport_iscsi sr_mod ipmi_msghandler cdrom sg ata_generic dm_mod fuse
[   24.358098][   T36] CR2: ffff88844000c5f8
[   24.358662][   T36] ---[ end trace 0000000000000000 ]---
[ 24.359178][ T36] RIP: 0010:__zone_watermark_ok (mm/page_alloc.c:3256) 
[ 24.359726][ T36] Code: 84 c0 78 14 4c 8b 97 48 06 00 00 45 31 db 4d 85 d2 4d 0f 4f da 4c 01 de 49 29 f1 41 f7 c0 38 02 00 00 0f 85 92 00 00 00 48 98 <48> 03 54 c7 38 49 39 d1 7e 7e b0 01 85 c9 74 7a 83 f9 0a 7f 73 48
All code
========
   0:	84 c0                	test   %al,%al
   2:	78 14                	js     0x18
   4:	4c 8b 97 48 06 00 00 	mov    0x648(%rdi),%r10
   b:	45 31 db             	xor    %r11d,%r11d
   e:	4d 85 d2             	test   %r10,%r10
  11:	4d 0f 4f da          	cmovg  %r10,%r11
  15:	4c 01 de             	add    %r11,%rsi
  18:	49 29 f1             	sub    %rsi,%r9
  1b:	41 f7 c0 38 02 00 00 	test   $0x238,%r8d
  22:	0f 85 92 00 00 00    	jne    0xba
  28:	48 98                	cltq
  2a:*	48 03 54 c7 38       	add    0x38(%rdi,%rax,8),%rdx		<-- trapping instruction
  2f:	49 39 d1             	cmp    %rdx,%r9
  32:	7e 7e                	jle    0xb2
  34:	b0 01                	mov    $0x1,%al
  36:	85 c9                	test   %ecx,%ecx
  38:	74 7a                	je     0xb4
  3a:	83 f9 0a             	cmp    $0xa,%ecx
  3d:	7f 73                	jg     0xb2
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 03 54 c7 38       	add    0x38(%rdi,%rax,8),%rdx
   5:	49 39 d1             	cmp    %rdx,%r9
   8:	7e 7e                	jle    0x88
   a:	b0 01                	mov    $0x1,%al
   c:	85 c9                	test   %ecx,%ecx
   e:	74 7a                	je     0x8a
  10:	83 f9 0a             	cmp    $0xa,%ecx
  13:	7f 73                	jg     0x88
  15:	48                   	rex.W


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250320/202503201604.a3aa6a95-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers
Posted by Johannes Weiner 9 months ago
On Fri, Mar 21, 2025 at 02:21:20PM +0800, kernel test robot wrote:
> commit: 6304be90cf5460f33b031e1e19cbe7ffdcbc9f66 ("[PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers")
> url: https://github.com/intel-lab-lkp/linux/commits/Johannes-Weiner/mm-compaction-push-watermark-into-compaction_suitable-callers/20250314-050839
> base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/all/20250313210647.1314586-2-hannes@cmpxchg.org/
> patch subject: [PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers

> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

> [   24.321289][   T36] BUG: unable to handle page fault for address: ffff88844000c5f8
> [   24.322631][   T36] #PF: supervisor read access in kernel mode
> [   24.323577][   T36] #PF: error_code(0x0000) - not-present page
> [   24.324482][   T36] PGD 3a01067 P4D 3a01067 PUD 0
> [   24.325301][   T36] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> [   24.326157][   T36] CPU: 1 UID: 0 PID: 36 Comm: kcompactd0 Not tainted 6.14.0-rc6-00559-g6304be90cf54 #1
> [   24.327631][   T36] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 24.329194][ T36] RIP: 0010:__zone_watermark_ok (mm/page_alloc.c:3256) 
> [ 24.330125][ T36] Code: 84 c0 78 14 4c 8b 97 48 06 00 00 45 31 db 4d 85 d2 4d 0f 4f da 4c 01 de 49 29 f1 41 f7 c0 38 02 00 00 0f 85 92 00 00 00 48 98 <48> 03 54 c7 38 49 39 d1 7e 7e b0 01 85 c9 74 7a 83 f9 0a 7f 73 48
> All code
> ========
>    0:	84 c0                	test   %al,%al
>    2:	78 14                	js     0x18
>    4:	4c 8b 97 48 06 00 00 	mov    0x648(%rdi),%r10
>    b:	45 31 db             	xor    %r11d,%r11d
>    e:	4d 85 d2             	test   %r10,%r10
>   11:	4d 0f 4f da          	cmovg  %r10,%r11
>   15:	4c 01 de             	add    %r11,%rsi
>   18:	49 29 f1             	sub    %rsi,%r9
>   1b:	41 f7 c0 38 02 00 00 	test   $0x238,%r8d
>   22:	0f 85 92 00 00 00    	jne    0xba
>   28:	48 98                	cltq
>   2a:*	48 03 54 c7 38       	add    0x38(%rdi,%rax,8),%rdx		<-- trapping instruction

That would be the zone->lowmem_reserve[highest_zoneidx] deref:

        long int                   lowmem_reserve[4];    /*  0x38  0x20 */

>   2f:	49 39 d1             	cmp    %rdx,%r9
>   32:	7e 7e                	jle    0xb2
>   34:	b0 01                	mov    $0x1,%al
>   36:	85 c9                	test   %ecx,%ecx
>   38:	74 7a                	je     0xb4
>   3a:	83 f9 0a             	cmp    $0xa,%ecx
>   3d:	7f 73                	jg     0xb2
>   3f:	48                   	rex.W
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	48 03 54 c7 38       	add    0x38(%rdi,%rax,8),%rdx
>    5:	49 39 d1             	cmp    %rdx,%r9
>    8:	7e 7e                	jle    0x88
>    a:	b0 01                	mov    $0x1,%al
>    c:	85 c9                	test   %ecx,%ecx
>    e:	74 7a                	je     0x8a
>   10:	83 f9 0a             	cmp    $0xa,%ecx
>   13:	7f 73                	jg     0x88
>   15:	48                   	rex.W
> [   24.333001][   T36] RSP: 0018:ffffc90000137cd0 EFLAGS: 00010246
> [   24.334003][   T36] RAX: 00000000000036a8 RBX: 0000000000000001 RCX: 0000000000000000
> [   24.335270][   T36] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88843fff1080

and %rax and %rdx look like the swapped watermark and zoneidx (36a8 is
14k pages, or 54M, which matches a min watermark on a 16G system).

So this is the bug that Hugh fixed here:

https://lore.kernel.org/all/005ace8b-07fa-01d4-b54b-394a3e029c07@google.com/

It's resolved in the latest version of the patch in -mm.
Re: [PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers
Posted by Hugh Dickins 9 months ago
[PATCH] mm: compaction: push watermark into compaction_suitable() callers fix

Stop oops on out-of-range highest_zoneidx: compaction_suitable() pass
args to __compaction_suitable() in the order which it is expecting.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/compaction.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 4a2ccb82d0b2..b5c9e8fd39f1 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2433,7 +2433,7 @@ bool compaction_suitable(struct zone *zone, int order, unsigned long watermark,
 	enum compact_result compact_result;
 	bool suitable;
 
-	suitable = __compaction_suitable(zone, order, highest_zoneidx, watermark,
+	suitable = __compaction_suitable(zone, order, watermark, highest_zoneidx,
 					 zone_page_state(zone, NR_FREE_PAGES));
 	/*
 	 * fragmentation index determines if allocation failures are due to
-- 
2.43.0
Re: [PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers
Posted by Johannes Weiner 9 months ago
On Sat, Mar 15, 2025 at 09:28:16PM -0700, Hugh Dickins wrote:
> [PATCH] mm: compaction: push watermark into compaction_suitable() callers fix
> 
> Stop oops on out-of-range highest_zoneidx: compaction_suitable() pass
> args to __compaction_suitable() in the order which it is expecting.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  mm/compaction.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 4a2ccb82d0b2..b5c9e8fd39f1 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2433,7 +2433,7 @@ bool compaction_suitable(struct zone *zone, int order, unsigned long watermark,
>  	enum compact_result compact_result;
>  	bool suitable;
>  
> -	suitable = __compaction_suitable(zone, order, highest_zoneidx, watermark,
> +	suitable = __compaction_suitable(zone, order, watermark, highest_zoneidx,
>  					 zone_page_state(zone, NR_FREE_PAGES));

Ouch, thanks for the fix Hugh.

This obviously didn't crash for me, but I re-ran the benchmarks with
your fix in my test environment.

This affects the direct compaction path, and I indeed see a minor
uptick in direct compaction, with a larger reduction in daemon work.

Compact daemon scanned migrate        2455570.93 (    +0.00%)   1770142.33 (   -27.91%)
Compact daemon scanned free           2429309.20 (    +0.00%)   1604744.00 (   -33.94%)
Compact direct scanned migrate          40136.60 (    +0.00%)     58326.67 (   +45.32%)
Compact direct scanned free             22127.13 (    +0.00%)     52216.93 (  +135.98%)
Compact total migrate scanned         2495707.53 (    +0.00%)   1828469.00 (   -26.74%)
Compact total free scanned            2451436.33 (    +0.00%)   1656960.93 (   -32.41%)

It doesn't change the overall A/B picture between baseline and the
series, so I'm comfortable keeping the current changelog results.
Re: [PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers
Posted by Zi Yan 9 months, 1 week ago
On 13 Mar 2025, at 17:05, Johannes Weiner wrote:

> compaction_suitable() hardcodes the min watermark, with a boost to the
> low watermark for costly orders. However, compaction_ready() requires
> order-0 at the high watermark. It currently checks the marks twice.
>
> Make the watermark a parameter to compaction_suitable() and have the
> callers pass in what they require:
>
> - compaction_zonelist_suitable() is used by the direct reclaim path,
>   so use the min watermark.
>
> - compact_suit_allocation_order() has a watermark in context derived
>   from cc->alloc_flags.
>
>   The only quirk is that kcompactd doesn't initialize cc->alloc_flags
>   explicitly. There is a direct check in kcompactd_do_work() that
>   passes ALLOC_WMARK_MIN, but there is another check downstack in
>   compact_zone() that ends up passing the unset alloc_flags. Since
>   they default to 0, and that coincides with ALLOC_WMARK_MIN, it is
>   correct. But it's subtle. Set cc->alloc_flags explicitly.
>
> - should_continue_reclaim() is direct reclaim, use the min watermark.
>
> - Finally, consolidate the two checks in compaction_ready() to a
>   single compaction_suitable() call passing the high watermark.
>
>   There is a tiny change in behavior: before, compaction_suitable()
>   would check order-0 against min or low, depending on costly
>   order. Then there'd be another high watermark check.
>
>   Now, the high watermark is passed to compaction_suitable(), and the
>   costly order-boost (low - min) is added on top. This means
>   compaction_ready() sets a marginally higher target for free pages.
>
>   In a kernelbuild + THP pressure test, though, this didn't show any
>   measurable negative effects on memory pressure or reclaim rates. As
>   the comment above the check says, reclaim is usually stopped short
>   on should_continue_reclaim(), and this just defines the worst-case
>   reclaim cutoff in case compaction is not making any headway.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/compaction.h |  5 ++--
>  mm/compaction.c            | 52 ++++++++++++++++++++------------------
>  mm/vmscan.c                | 26 ++++++++++---------
>  3 files changed, 45 insertions(+), 38 deletions(-)
>

The changes look good to me. Acked-by: Zi Yan <ziy@nvidia.com>

Best Regards,
Yan, Zi