From nobody Sat Jun 13 19:02:40 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 702B626AC3 for ; Wed, 6 May 2026 03:33:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.21 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778038391; cv=none; b=YpUF7UbjQsVr+VTVsU0Ep06DbN8eU2sdK+sQSO1MGGnc/si7ikQPPf2EsN5SLh+4XErCRth3Np6+4WvRqprI6sIpx0BfzehKxUMzyeK5sFfG3LjZr0drPSy0JwZC66XRPeSJK3cTeqjljvHu+CDAOg0+40iBbm9Is5PnjqpJkgc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778038391; c=relaxed/simple; bh=81vapQm0G2+s7Q9rjkTUxp8N8diOM+t3xvAi5oamm8g=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version:Content-Type; b=QdkIXpgs0VU+Xe1GkAUeW5N8JYnjC/1VyyI1jIJgRRLuqfJwmhx9AYW4v9IS3cw9/degaFHOF2AQG06xqW4zfod35VM4qqhcO6Wvz5IQ4gwU7MgkEerMQlocxAMeCAZGJBN/Xw3imELB/MGtO6aq/S4IsvTk4ph4t7Lt48rJvLI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=XCyn4vWL; arc=none smtp.client-ip=198.175.65.21 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="XCyn4vWL" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778038389; x=1809574389; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=81vapQm0G2+s7Q9rjkTUxp8N8diOM+t3xvAi5oamm8g=; 
b=XCyn4vWLXae1sALQMOQtlVuVtQjoOTnnXp7pAFUDYLytddPtaeV2EPLv bm+e8Jm60GSCLkdz7u44VXXvBFmmGUCHsUSY/f+kdQXileJ4p1zv+rqo3 vVPOs8KX0iZJsVnRYGDBDCB5S/EvFtAPY1dE4pWo7BBqfWR1IpTz7Od55 lsz/0qTDIfanAsMfceetB6Fi+t3VkYBmQciW7g0jIqdjz3uMgwtQZc0Ak vUVWDvb31cyOfdhH/GqfqwPjD0KMS/1RQbB3lEuPj72svHZxqx6qypaxD EofcjuPMOvdOt5i7M+sr1tynqxyUENMjlztRB2d+i7MXvmsKl/GtbWWnt w==; X-CSE-ConnectionGUID: EH2nrLaNQHiPzBXYiOOzuA== X-CSE-MsgGUID: vEX/f4pHR0KyjcooGqiZpg== X-IronPort-AV: E=McAfee;i="6800,10657,11777"; a="78829027" X-IronPort-AV: E=Sophos;i="6.23,218,1770624000"; d="scan'208";a="78829027" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by orvoesa113.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2026 20:33:07 -0700 X-CSE-ConnectionGUID: nIPvi29BS/e4PKxK3VvyMQ== X-CSE-MsgGUID: 7ksHOWx2SLuT43UQAcKJtw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,218,1770624000"; d="scan'208";a="266342146" Received: from gsse-cloud1.jf.intel.com ([10.54.39.91]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2026 20:33:06 -0700 From: Matthew Brost To: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org Cc: Andrew Morton , Dave Chinner , Qi Zheng , Roman Gushchin , Muchun Song , David Hildenbrand , Lorenzo Stoakes , "Liam R. 
Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Johannes Weiner , Shakeel Butt , Kairui Song , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , linux-mm@kvack.org, linux-kernel@vger.kernel.org, =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= Subject: [PATCH v5 1/5] mm: Wire up order in shrink_control Date: Tue, 5 May 2026 20:32:56 -0700 Message-Id: <20260506033300.3534883-2-matthew.brost@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260506033300.3534883-1-matthew.brost@intel.com> References: <20260506033300.3534883-1-matthew.brost@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Pass the allocation order through shrink_control so shrinkers have visibility into the order that triggered reclaim. This allows shrinkers to implement better heuristics, such as detecting high-order allocation pressure or fragmentation and avoiding eviction of working sets when reclaim is invoked from kswapd. Cc: Andrew Morton Cc: Dave Chinner Cc: Qi Zheng Cc: Roman Gushchin Cc: Muchun Song Cc: David Hildenbrand Cc: Lorenzo Stoakes Cc: "Liam R. 
Howlett" Cc: Vlastimil Babka Cc: Mike Rapoport Cc: Suren Baghdasaryan Cc: Michal Hocko Cc: Johannes Weiner Cc: Shakeel Butt Cc: Kairui Song Cc: Barry Song Cc: Axel Rasmussen Cc: Yuanchu Xie Cc: Wei Xu Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Suggested-by: Thomas Hellstr=C3=B6m Signed-off-by: Matthew Brost --- include/linux/shrinker.h | 3 +++ mm/internal.h | 4 ++-- mm/shrinker.c | 13 ++++++++----- mm/vmscan.c | 7 ++++--- 4 files changed, 17 insertions(+), 10 deletions(-) diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h index 1a00be90d93a..7072f693b9be 100644 --- a/include/linux/shrinker.h +++ b/include/linux/shrinker.h @@ -37,6 +37,9 @@ struct shrink_control { /* current node being shrunk (for NUMA aware shrinkers) */ int nid; =20 + /* Allocation order we are currently trying to fulfil. */ + s8 order; + /* * How many objects scan_objects should scan and try to reclaim. * This is reset before every call, so it is safe for callees diff --git a/mm/internal.h b/mm/internal.h index 5a2ddcf68e0b..ff8671dccf7b 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1759,8 +1759,8 @@ void __meminit __init_single_page(struct page *page, = unsigned long pfn, void __meminit __init_page_from_nid(unsigned long pfn, int nid); =20 /* shrinker related functions */ -unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memc= g, - int priority); +unsigned long shrink_slab(gfp_t gfp_mask, int nid, s8 order, + struct mem_cgroup *memcg, int priority); =20 int shmem_add_to_page_cache(struct folio *folio, struct address_space *mapping, diff --git a/mm/shrinker.c b/mm/shrinker.c index 76b3f750cf65..c83f3b3daa08 100644 --- a/mm/shrinker.c +++ b/mm/shrinker.c @@ -466,7 +466,7 @@ static unsigned long do_shrink_slab(struct shrink_contr= ol *shrinkctl, } =20 #ifdef CONFIG_MEMCG -static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, +static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, s8 order, struct mem_cgroup *memcg, int 
priority) { struct shrinker_info *info; @@ -528,6 +528,7 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, = int nid, struct shrink_control sc =3D { .gfp_mask =3D gfp_mask, .nid =3D nid, + .order =3D order, .memcg =3D memcg, }; struct shrinker *shrinker; @@ -587,7 +588,7 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, = int nid, return freed; } #else /* !CONFIG_MEMCG */ -static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, +static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, s8 order, struct mem_cgroup *memcg, int priority) { return 0; @@ -598,6 +599,7 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, = int nid, * shrink_slab - shrink slab caches * @gfp_mask: allocation context * @nid: node whose slab caches to target + * @order: order of allocation * @memcg: memory cgroup whose slab caches to target * @priority: the reclaim priority * @@ -614,8 +616,8 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, = int nid, * * Returns the number of reclaimed slab objects. */ -unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memc= g, - int priority) +unsigned long shrink_slab(gfp_t gfp_mask, int nid, s8 order, + struct mem_cgroup *memcg, int priority) { unsigned long ret, freed =3D 0; struct shrinker *shrinker; @@ -628,7 +630,7 @@ unsigned long shrink_slab(gfp_t gfp_mask, int nid, stru= ct mem_cgroup *memcg, * oom. */ if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg)) - return shrink_slab_memcg(gfp_mask, nid, memcg, priority); + return shrink_slab_memcg(gfp_mask, nid, order, memcg, priority); =20 /* * lockless algorithm of global shrink. 
@@ -656,6 +658,7 @@ unsigned long shrink_slab(gfp_t gfp_mask, int nid, stru= ct mem_cgroup *memcg, struct shrink_control sc =3D { .gfp_mask =3D gfp_mask, .nid =3D nid, + .order =3D order, .memcg =3D memcg, }; =20 diff --git a/mm/vmscan.c b/mm/vmscan.c index bd1b1aa12581..a54d14ecad25 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -412,7 +412,7 @@ static unsigned long drop_slab_node(int nid) =20 memcg =3D mem_cgroup_iter(NULL, NULL, NULL); do { - freed +=3D shrink_slab(GFP_KERNEL, nid, memcg, 0); + freed +=3D shrink_slab(GFP_KERNEL, nid, 0, memcg, 0); } while ((memcg =3D mem_cgroup_iter(NULL, memcg, NULL)) !=3D NULL); =20 return freed; @@ -5068,7 +5068,8 @@ static int shrink_one(struct lruvec *lruvec, struct s= can_control *sc) =20 success =3D try_to_shrink_lruvec(lruvec, sc); =20 - shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority); + shrink_slab(sc->gfp_mask, pgdat->node_id, sc->order, memcg, + sc->priority); =20 if (!sc->proactive) vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned, @@ -6170,7 +6171,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, stru= ct scan_control *sc) =20 shrink_lruvec(lruvec, sc); =20 - shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, + shrink_slab(sc->gfp_mask, pgdat->node_id, sc->order, memcg, sc->priority); =20 /* Record the group's reclaim efficiency */ --=20 2.34.1 From nobody Sat Jun 13 19:02:40 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F319C28B4FA for ; Wed, 6 May 2026 03:33:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.21 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778038391; cv=none; 
b=DRfuM6iXi4iv/C/LXoyv1BH115tLNVwCYCAicuVNyNGIpdlYE2VpxCceBFsNAP2YAXrJHhzU/2kQ3LBIodTtQ4rPq/8ip5UmQ1+/ChY8+lvtSIuJjh7froFusniCZ7aYWsVihHaOBmhaDATVa25wm1LxnoSWkE5Oia2ybKUJEzQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778038391; c=relaxed/simple; bh=aHVNymQuQK1hQF9n5owlycgqWuDIDx54Dow9YV/hQrc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Mq/TZ16zswTnJZBG2v3ca2kni5nBin/FKKWvMnI2v/EATbewYqqtJIIjdqpBUFANxZJhd8rzOQv9Zw99lEh176bu3sHEcqt2zwS9e+txOceYcGXhpsnmtuIRqUg/uucze+ZOmmcCzmBSLBbCm2oZfTrII/Sc55JRelzRf4pJa94= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=j5npjMKK; arc=none smtp.client-ip=198.175.65.21 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="j5npjMKK" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778038390; x=1809574390; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=aHVNymQuQK1hQF9n5owlycgqWuDIDx54Dow9YV/hQrc=; b=j5npjMKKNqr6n9SVpYzUtRlqXamieL7S9jKJ32gyu6EgR1p9tgM4/lt6 ++BhpDouZ1OODY1k89w47un6HsrWArlVqmChQv3o/16L2dYk3vSB1tGW1 +1sfeSNj6iZzCQyTzhwipXCGUs841hRiGZs49yacYaHARDWtSak24aBiV n+QPJD4ihT1gNssRfya5MvTnbIfFSkrZJNiiy+Lz8qWmIUkCjC8mGtcAU TC0zkZ1ieWM57FwuUpLxWKyrFtOtyb3QdC5IFoMlJ9x36FaPswkZOw/nd LFUAToKHuHJsbSQ2+BaU0M1htsd8n/MtFfhyIToOHmCqWdXoyliP77QtT Q==; X-CSE-ConnectionGUID: f1XppeBeR6yny2e6EgjLoQ== X-CSE-MsgGUID: k8pWeelwTmmmXJzBWsqwZw== X-IronPort-AV: E=McAfee;i="6800,10657,11777"; a="78829041" X-IronPort-AV: E=Sophos;i="6.23,218,1770624000"; 
d="scan'208";a="78829041" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by orvoesa113.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2026 20:33:07 -0700 X-CSE-ConnectionGUID: qPgFxjMzSgKQBpZL9lINlg== X-CSE-MsgGUID: VGh6WgVBRTmNHKrHDo5xFw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,218,1770624000"; d="scan'208";a="266342147" Received: from gsse-cloud1.jf.intel.com ([10.54.39.91]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2026 20:33:06 -0700 From: Matthew Brost To: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org Cc: Andrew Morton , Dave Chinner , Qi Zheng , Roman Gushchin , Muchun Song , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Johannes Weiner , Shakeel Butt , Kairui Song , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v5 2/5] mm: Introduce opportunistic_compaction concept to vmscan and shrinkers Date: Tue, 5 May 2026 20:32:57 -0700 Message-Id: <20260506033300.3534883-3-matthew.brost@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260506033300.3534883-1-matthew.brost@intel.com> References: <20260506033300.3534883-1-matthew.brost@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" High-order allocations using __GFP_NORETRY or __GFP_RETRY_MAYFAIL are often opportunistic attempts to satisfy fragmentation-sensitive allocations rather than indications of severe memory pressure. In these cases, kswapd reclaim may invoke shrinkers that aggressively destroy working sets even though reclaim is unlikely to materially improve the allocation outcome. 
Some shrinkers manage expensive backing or migration operations where reclaim can result in substantial working set disruption despite the system having sufficient free memory overall. This is particularly visible in fragmentation-heavy workloads where reclaim repeatedly tears down active state while kswapd attempts to satisfy higher-order allocations. Introduce an opportunistic_compaction hint in shrink_control that allows kswapd to communicate when reclaim originates from a high-order allocation context that may be fragmentation-driven rather than true memory pressure. Shrinkers may use this hint to avoid destructive working set reclaim while still participating normally during order-0 or stronger reclaim conditions. The hint is propagated through shrink_slab() and derived from high-order kswapd wakeups associated with failable allocation contexts (__GFP_NORETRY or __GFP_RETRY_MAYFAIL, without __GFP_NOFAIL). No functional changes are introduced for existing shrinkers. Cc: Andrew Morton Cc: Dave Chinner Cc: Qi Zheng Cc: Roman Gushchin Cc: Muchun Song Cc: David Hildenbrand Cc: Lorenzo Stoakes Cc: "Liam R. Howlett" Cc: Vlastimil Babka Cc: Mike Rapoport Cc: Suren Baghdasaryan Cc: Michal Hocko Cc: Johannes Weiner Cc: Shakeel Butt Cc: Kairui Song Cc: Barry Song Cc: Axel Rasmussen Cc: Yuanchu Xie Cc: Wei Xu Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Assisted-by: Claude:claude-opus-4.6 Signed-off-by: Matthew Brost --- include/linux/mmzone.h | 40 +++++++++++++++++++++++ include/linux/shrinker.h | 20 ++++++++++++ mm/internal.h | 3 +- mm/shrinker.c | 14 +++++--- mm/vmscan.c | 70 +++++++++++++++++++++++++++++++++++++--- 5 files changed, 137 insertions(+), 10 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 9adb2ad21da5..1554e8058e4b 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1461,6 +1461,39 @@ struct memory_failure_stats { }; #endif =20 +/* + * Per-pgdat state machine for the kswapd "opportunistic compaction" hint. 
+ * + * wakeup_kswapd() collapses the gfp flags of all wakers that arrive betwe= en + * two kswapd runs into a single tri-state, which kswapd then forwards to = the + * shrinkers via shrink_control::opportunistic_compaction: + * + * KSWAPD_UNSET_OPPORTUNISTIC_COMPACTION + * Initial state after kswapd consumes the previous value. No waker has + * been observed yet for the upcoming run. + * + * KSWAPD_NO_OPPORTUNISTIC_COMPACTION + * At least one waker is an order-0 allocation, or a high-order + * allocation that cannot tolerate failure (i.e., not eligible for + * opportunistic behaviour). Shrinkers must do their normal best-effort + * work; the hint is cleared. + * + * KSWAPD_OPPORTUNISTIC_COMPACTION + * All wakers seen so far are high-order allocations that may fail + * (__GFP_NORETRY or __GFP_RETRY_MAYFAIL, without __GFP_NOFAIL). Shrinkers + * may skip work that is unlikely to produce a contiguous high-order + * block (e.g., evicting working-set pages). + * + * The state is sticky in the "NO" direction within a single kswapd run: o= nce + * any non-eligible waker is observed, subsequent eligible wakers cannot + * upgrade it back to KSWAPD_OPPORTUNISTIC_COMPACTION. + */ +enum kswapd_opportunistic_compaction_type { + KSWAPD_UNSET_OPPORTUNISTIC_COMPACTION =3D 0, + KSWAPD_NO_OPPORTUNISTIC_COMPACTION, + KSWAPD_OPPORTUNISTIC_COMPACTION, +}; + /* * On NUMA machines, each NUMA node would have a pg_data_t to describe * it's memory layout. On UMA machines there is a single pglist_data which @@ -1525,6 +1558,13 @@ typedef struct pglist_data { #endif struct task_struct *kswapd; /* Protected by kswapd_lock */ int kswapd_order; + /* + * Aggregated opportunistic-compaction hint for the next kswapd run. + * Updated by wakeup_kswapd() based on the gfp flags / order of each + * waker, and consumed (and reset) by kswapd before balance_pgdat(). + * See enum kswapd_opportunistic_compaction_type for the state machine. 
+ */ + enum kswapd_opportunistic_compaction_type kswapd_opportunistic_compaction; enum zone_type kswapd_highest_zoneidx; =20 atomic_t kswapd_failures; /* Number of 'reclaimed =3D=3D 0' runs */ diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h index 7072f693b9be..c1a69536bcdc 100644 --- a/include/linux/shrinker.h +++ b/include/linux/shrinker.h @@ -40,6 +40,26 @@ struct shrink_control { /* Allocation order we are currently trying to fulfil. */ s8 order; =20 + /* + * Opportunistic compaction hint. + * + * Set by the reclaim path to tell shrinkers that this pass is + * driven by an order > 0 allocation that the caller is willing to + * have fail (e.g., __GFP_NORETRY / __GFP_RETRY_MAYFAIL without + * __GFP_NOFAIL). Such allocations only really benefit from + * shrinking when doing so frees up a contiguous, high-order block; + * thrashing working sets in the hope of producing one is typically + * counter-productive. + * + * Shrinkers that can produce naturally-aligned high-order folios + * (see shrink_control::order) should treat this as a hint to skip + * costly work that is unlikely to help compaction (for example, + * evicting hot/working-set pages just to free single pages). + * + * Only meaningful when @order > 0; ignored otherwise. + */ + bool opportunistic_compaction; + /* * How many objects scan_objects should scan and try to reclaim. 
* This is reset before every call, so it is safe for callees diff --git a/mm/internal.h b/mm/internal.h index ff8671dccf7b..a822ddfc7e5d 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1760,7 +1760,8 @@ void __meminit __init_page_from_nid(unsigned long pfn= , int nid); =20 /* shrinker related functions */ unsigned long shrink_slab(gfp_t gfp_mask, int nid, s8 order, - struct mem_cgroup *memcg, int priority); + struct mem_cgroup *memcg, int priority, + bool opportunistic_compaction); =20 int shmem_add_to_page_cache(struct folio *folio, struct address_space *mapping, diff --git a/mm/shrinker.c b/mm/shrinker.c index c83f3b3daa08..bdc331e8a344 100644 --- a/mm/shrinker.c +++ b/mm/shrinker.c @@ -467,7 +467,7 @@ static unsigned long do_shrink_slab(struct shrink_contr= ol *shrinkctl, =20 #ifdef CONFIG_MEMCG static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, s8 order, - struct mem_cgroup *memcg, int priority) + struct mem_cgroup *memcg, int priority, bool opportunistic_compaction) { struct shrinker_info *info; unsigned long ret, freed =3D 0; @@ -530,6 +530,7 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, = int nid, s8 order, .nid =3D nid, .order =3D order, .memcg =3D memcg, + .opportunistic_compaction =3D opportunistic_compaction, }; struct shrinker *shrinker; int shrinker_id =3D calc_shrinker_id(index, offset); @@ -589,7 +590,8 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, = int nid, s8 order, } #else /* !CONFIG_MEMCG */ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, s8 order, - struct mem_cgroup *memcg, int priority) + struct mem_cgroup *memcg, int priority, + bool opportunistic_compaction) { return 0; } @@ -602,6 +604,7 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, = int nid, s8 order, * @order: order of allocation * @memcg: memory cgroup whose slab caches to target * @priority: the reclaim priority + * @opportunistic_compaction: do compaction opportunistically (e.g., do no= t swap working sets) * * Call 
the shrink functions to age shrinkable caches. * @@ -617,7 +620,8 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, = int nid, s8 order, * Returns the number of reclaimed slab objects. */ unsigned long shrink_slab(gfp_t gfp_mask, int nid, s8 order, - struct mem_cgroup *memcg, int priority) + struct mem_cgroup *memcg, int priority, + bool opportunistic_compaction) { unsigned long ret, freed =3D 0; struct shrinker *shrinker; @@ -630,7 +634,8 @@ unsigned long shrink_slab(gfp_t gfp_mask, int nid, s8 o= rder, * oom. */ if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg)) - return shrink_slab_memcg(gfp_mask, nid, order, memcg, priority); + return shrink_slab_memcg(gfp_mask, nid, order, memcg, priority, + opportunistic_compaction); =20 /* * lockless algorithm of global shrink. @@ -660,6 +665,7 @@ unsigned long shrink_slab(gfp_t gfp_mask, int nid, s8 o= rder, .nid =3D nid, .order =3D order, .memcg =3D memcg, + .opportunistic_compaction =3D opportunistic_compaction, }; =20 if (!shrinker_try_get(shrinker)) diff --git a/mm/vmscan.c b/mm/vmscan.c index a54d14ecad25..57b8e1af6300 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -96,6 +96,14 @@ struct scan_control { /* Swappiness value for proactive reclaim. Always use sc_swappiness()! */ int *proactive_swappiness; =20 + /* + * Opportunistic compaction hint snapshotted from the pgdat at the + * start of this reclaim pass. Forwarded to shrinkers through + * shrink_control::opportunistic_compaction so they can skip + * non-productive work for failable high-order allocations. + */ + enum kswapd_opportunistic_compaction_type kswapd_opportunistic_compaction; + /* Can active folios be deactivated as part of reclaim? 
*/ #define DEACTIVATE_ANON 1 #define DEACTIVATE_FILE 2 @@ -412,7 +420,7 @@ static unsigned long drop_slab_node(int nid) =20 memcg =3D mem_cgroup_iter(NULL, NULL, NULL); do { - freed +=3D shrink_slab(GFP_KERNEL, nid, 0, memcg, 0); + freed +=3D shrink_slab(GFP_KERNEL, nid, 0, memcg, 0, false); } while ((memcg =3D mem_cgroup_iter(NULL, memcg, NULL)) !=3D NULL); =20 return freed; @@ -5069,7 +5077,8 @@ static int shrink_one(struct lruvec *lruvec, struct s= can_control *sc) success =3D try_to_shrink_lruvec(lruvec, sc); =20 shrink_slab(sc->gfp_mask, pgdat->node_id, sc->order, memcg, - sc->priority); + sc->priority, sc->kswapd_opportunistic_compaction =3D=3D + KSWAPD_OPPORTUNISTIC_COMPACTION); =20 if (!sc->proactive) vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned, @@ -6172,7 +6181,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat, stru= ct scan_control *sc) shrink_lruvec(lruvec, sc); =20 shrink_slab(sc->gfp_mask, pgdat->node_id, sc->order, memcg, - sc->priority); + sc->priority, sc->kswapd_opportunistic_compaction =3D=3D + KSWAPD_OPPORTUNISTIC_COMPACTION); =20 /* Record the group's reclaim efficiency */ if (!sc->proactive) @@ -7105,8 +7115,14 @@ clear_reclaim_active(pg_data_t *pgdat, int highest_z= oneidx) * found to have free_pages <=3D high_wmark_pages(zone), any page in that = zone * or lower is eligible for reclaim until at least one usable zone is * balanced. + * + * @kswapd_opportunistic_compaction is the aggregated hint produced by + * wakeup_kswapd() for this run; it is propagated into scan_control so that + * shrinkers can skip costly work that is unlikely to help compaction when + * all wakers are failable high-order allocations. 
*/ -static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx) +static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx, + enum kswapd_opportunistic_compaction_type kswapd_opportunistic_compact= ion) { int i; unsigned long nr_soft_reclaimed; @@ -7120,6 +7136,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order,= int highest_zoneidx) .gfp_mask =3D GFP_KERNEL, .order =3D order, .may_unmap =3D 1, + .kswapd_opportunistic_compaction =3D kswapd_opportunistic_compaction, }; =20 set_task_reclaim_state(current, &sc.reclaim_state); @@ -7442,6 +7459,7 @@ static int kswapd(void *p) unsigned int highest_zoneidx =3D MAX_NR_ZONES - 1; pg_data_t *pgdat =3D (pg_data_t *)p; struct task_struct *tsk =3D current; + enum kswapd_opportunistic_compaction_type kswapd_opportunistic_compaction; =20 /* * Tell the memory management that we're a "memory allocator", @@ -7459,6 +7477,7 @@ static int kswapd(void *p) set_freezable(); =20 WRITE_ONCE(pgdat->kswapd_order, 0); + WRITE_ONCE(pgdat->kswapd_opportunistic_compaction, KSWAPD_UNSET_OPPORTUNI= STIC_COMPACTION); WRITE_ONCE(pgdat->kswapd_highest_zoneidx, MAX_NR_ZONES); atomic_set(&pgdat->nr_writeback_throttled, 0); for ( ; ; ) { @@ -7474,10 +7493,13 @@ static int kswapd(void *p) =20 /* Read the new order and highest_zoneidx */ alloc_order =3D READ_ONCE(pgdat->kswapd_order); + kswapd_opportunistic_compaction =3D READ_ONCE(pgdat->kswapd_opportunisti= c_compaction); highest_zoneidx =3D kswapd_highest_zoneidx(pgdat, highest_zoneidx); WRITE_ONCE(pgdat->kswapd_order, 0); WRITE_ONCE(pgdat->kswapd_highest_zoneidx, MAX_NR_ZONES); + WRITE_ONCE(pgdat->kswapd_opportunistic_compaction, + KSWAPD_UNSET_OPPORTUNISTIC_COMPACTION); =20 if (kthread_freezable_should_stop(&was_frozen)) break; @@ -7500,7 +7522,8 @@ static int kswapd(void *p) trace_mm_vmscan_kswapd_wake(pgdat->node_id, highest_zoneidx, alloc_order); reclaim_order =3D balance_pgdat(pgdat, alloc_order, - highest_zoneidx); + highest_zoneidx, + 
kswapd_opportunistic_compaction); if (reclaim_order < alloc_order) goto kswapd_try_sleep; } @@ -7510,6 +7533,22 @@ static int kswapd(void *p) return 0; } =20 +/* + * Is @gfp_flags a high-order allocation that is eligible for the + * "opportunistic compaction" treatment in kswapd / shrinkers? + * + * The caller must be willing to tolerate failure (__GFP_NORETRY or + * __GFP_RETRY_MAYFAIL) and must not have set __GFP_NOFAIL. For such + * allocations there is little value in burning working-set pages just to + * scrape together a single high-order block: if compaction can't easily + * succeed, the caller would rather see the allocation fail. + */ +static bool gfp_kswapd_opportunistic_compaction(gfp_t gfp_flags) +{ + return (gfp_flags & (__GFP_NORETRY | __GFP_RETRY_MAYFAIL)) && + !(gfp_flags & __GFP_NOFAIL); +} + /* * A zone is low on free memory or too fragmented for high-order memory. = If * kswapd should reclaim (direct reclaim is deferred), wake it up for the = zone's @@ -7538,6 +7577,27 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flag= s, int order, if (READ_ONCE(pgdat->kswapd_order) < order) WRITE_ONCE(pgdat->kswapd_order, order); =20 + /* + * Fold this waker into the per-pgdat opportunistic-compaction hint + * that kswapd will pick up at the start of its next run. + * + * The state is sticky in the "NO" direction: once any waker in this + * batch is order-0 or a non-failable high-order allocation, the hint + * stays cleared until kswapd consumes it. Only when every waker so + * far is a failable high-order allocation do we set + * KSWAPD_OPPORTUNISTIC_COMPACTION, asking shrinkers to skip work + * that won't realistically help compaction. 
+ */ + if (READ_ONCE(pgdat->kswapd_opportunistic_compaction) !=3D + KSWAPD_NO_OPPORTUNISTIC_COMPACTION) { + if (!order || !gfp_kswapd_opportunistic_compaction(gfp_flags)) + WRITE_ONCE(pgdat->kswapd_opportunistic_compaction, + KSWAPD_NO_OPPORTUNISTIC_COMPACTION); + else if (order && gfp_kswapd_opportunistic_compaction(gfp_flags)) + WRITE_ONCE(pgdat->kswapd_opportunistic_compaction, + KSWAPD_OPPORTUNISTIC_COMPACTION); + } + if (!waitqueue_active(&pgdat->kswapd_wait)) return; =20 --=20 2.34.1 From nobody Sat Jun 13 19:02:40 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 38DB12C11DF for ; Wed, 6 May 2026 03:33:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.21 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778038391; cv=none; b=aSOrtFMjccoprLA+N/7KmteKF1RwP3G9WVHIC+/DAIC4ZjrVTBAqPuzXRypaNsLtTEOmR5oNqT6umHW+3Bf8EG8DzDO4PBKdIiIzWBNy6zEdkfInVHHSSioPWjSWuf14A0E4JlK4NyKS6HPw7Gm+SnvOVVoqyKXAHpTY9+pmJlU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778038391; c=relaxed/simple; bh=fhvCwpOOerNDSxVfaQALugyDzdK6DreUrRK4L1ox3PU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version:Content-Type; b=D+kkPqwKCzRt5v7pFKd7qNz83jYb55Y2hAatnOARqvGp3S3yXoSamXMPX0fUkJ8PTy2JdEfksCZ+nk/yLzyttKfroIi7SAKdFvD8AzRgJ0BE4n/fvCflOVrvfn5F1AuA0suveKyRbzMq3R0GmbtjbySyWcjyE0WRU8XZ6cjCJRQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=N67e8UzN; arc=none smtp.client-ip=198.175.65.21 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: 
smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="N67e8UzN" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778038391; x=1809574391; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fhvCwpOOerNDSxVfaQALugyDzdK6DreUrRK4L1ox3PU=; b=N67e8UzNlhi6aoASuAT8BU3nYchfwCYXpDaIe59qLr4V/vYXFgj8DWLO bDThoaoAEUKcJnxE9r732sbmX6Z+71eAEt/NlQRKtPCrbqFi///7yGsMu nMq6rOtYsE0j6tibEimXHxDrX83L6ETzPn+AAcjLjzcBU61+oWcNFe6ZZ AX7XBGUq+v60RTXYpOWW/f53Oor3hcg3BT7JEnmJx5bEkFsO99w2icPKo Vlx2gN/pQuhhGXx1e+YbUHyL3AZx6jkwo0CGNaSWPf0lCfkc8vrd4Asi+ To6+2PjUewd7ateB1KYcW8BvEdUOMSdkHfs6ivC/oN9D5zdTnHycwNapu w==; X-CSE-ConnectionGUID: eFynZYYTRkWhGz8PvSNSuQ== X-CSE-MsgGUID: kWo18VTLTzSa0+MGgSvi1w== X-IronPort-AV: E=McAfee;i="6800,10657,11777"; a="78829060" X-IronPort-AV: E=Sophos;i="6.23,218,1770624000"; d="scan'208";a="78829060" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by orvoesa113.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2026 20:33:07 -0700 X-CSE-ConnectionGUID: lHxy+04KTjG2WxqjseVy6w== X-CSE-MsgGUID: ukmuUaJjRm62am4oFugUjA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,218,1770624000"; d="scan'208";a="266342150" Received: from gsse-cloud1.jf.intel.com ([10.54.39.91]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2026 20:33:07 -0700 From: Matthew Brost To: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org Cc: Andrew Morton , Dave Chinner , Qi Zheng , Roman Gushchin , Muchun Song , David Hildenbrand , Lorenzo Stoakes , "Liam R. 
Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Johannes Weiner , Shakeel Butt , Kairui Song , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Tvrtko Ursulin , =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= , Carlos Santa , Christian Koenig , Huang Rui , Matthew Auld , Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , David Airlie , Simona Vetter , Daniel Colascione , Andi Shyti Subject: [PATCH v5 3/5] drm/ttm: Issue direct reclaim at beneficial_order Date: Tue, 5 May 2026 20:32:58 -0700 Message-Id: <20260506033300.3534883-4-matthew.brost@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260506033300.3534883-1-matthew.brost@intel.com> References: <20260506033300.3534883-1-matthew.brost@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Triggering kswapd at an order higher than beneficial_order makes little sense, as the driver has already indicated the optimal order at which reclaim is effective. Similarly, issuing direct reclaim or triggering kswapd at a lower order than beneficial_order is ineffective, since the driver does not benefit from reclaiming lower-order pages. As a result, direct reclaim should only be issued with __GFP_NORETRY at exactly beneficial_order, or as a fallback, direct reclaim without __GFP_NORETRY at order 0 when failure is not an option. Cc: Andrew Morton Cc: Dave Chinner Cc: Qi Zheng Cc: Roman Gushchin Cc: Muchun Song Cc: David Hildenbrand Cc: Lorenzo Stoakes Cc: "Liam R. 
Howlett" Cc: Vlastimil Babka Cc: Mike Rapoport Cc: Suren Baghdasaryan Cc: Michal Hocko Cc: Johannes Weiner Cc: Shakeel Butt Cc: Kairui Song Cc: Barry Song Cc: Axel Rasmussen Cc: Yuanchu Xie Cc: Wei Xu Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Cc: Tvrtko Ursulin Cc: Thomas Hellstr=C3=B6m Cc: Carlos Santa Cc: Christian Koenig Cc: Huang Rui Cc: Matthew Auld Cc: Matthew Brost Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Thomas Zimmermann Cc: David Airlie Cc: Simona Vetter CC: dri-devel@lists.freedesktop.org Cc: Daniel Colascione Signed-off-by: Matthew Brost Reviewed-by: Christian Koenig Reviewed-by: Andi Shyti --- drivers/gpu/drm/ttm/ttm_pool.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c index 278bbe7a11ad..e76c3a5c67bd 100644 --- a/drivers/gpu/drm/ttm/ttm_pool.c +++ b/drivers/gpu/drm/ttm/ttm_pool.c @@ -165,8 +165,8 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool= *pool, gfp_t gfp_flags, * Do not add latency to the allocation path for allocations orders * device tolds us do not bring them additional performance gains. 
*/ - if (beneficial_order && order > beneficial_order) - gfp_flags &=3D ~__GFP_DIRECT_RECLAIM; + if (order && beneficial_order && order !=3D beneficial_order) + gfp_flags &=3D ~__GFP_RECLAIM; =20 if (!ttm_pool_uses_dma_alloc(pool)) { p =3D alloc_pages_node(pool->nid, gfp_flags, order); --=20 2.34.1 From nobody Sat Jun 13 19:02:40 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9164E2E2850 for ; Wed, 6 May 2026 03:33:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.21 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778038393; cv=none; b=eBppOfTaGK4KANxFD3ClxsX56v6dD0EzGbItSDkpYPiXGoXleKajFj9KdNV6pezsNb5xnw3WdXKrNlbfOQ+tKU5D0hyTg1F8QPZu07ChMdZS0RmIuQ/KPs5M6pw0K7dXhHMKt5Kg728GGtAdDwVohe7UpYW1VV3VhNayxsWVoBs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778038393; c=relaxed/simple; bh=D1dzh1JcLACqd1t8TArobnoPQTlNfPM6sRXopWPpBeA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version:Content-Type; b=RMb0zX1DLM6HbdWNW8vFuW9EKQIIKJnTfA/hNdAb/Yi0G4/K68vTFYSNSCs7v6RfWAiodcMlY4WVlPaIYzf993TgP5MPEXxyb7jaDNo6OvlPtmEoA2fL6zat3194qIawB9kvVlqrtdwmjvslQR75oHX435LqJ3nJsRPE0XdMr4M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=aTs3ax5b; arc=none smtp.client-ip=198.175.65.21 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="aTs3ax5b" DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778038392; x=1809574392; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=D1dzh1JcLACqd1t8TArobnoPQTlNfPM6sRXopWPpBeA=; b=aTs3ax5bSe7fdV15w+bMIaxYSKqSjltrIAbJ1Yf29Zk27P8XqyhDLQYp DwCqRGVWZVJdXS3ZxJ9DgXV6lO0xYvKzWtgZ7siD32iC5TU/CEDC0xZur w61VazJD4pRTP7vF0C7Zh1omkYJA6Vpy2uPYDNEDE4nOFkX9rVygQqifJ 5Ylg4FUN8DJpY4bgRJwUMcabBhu7ArKV0aH1+o/yyBh0IKwgCddrRQmCu Md60TxE28MrVukG/lg+2EK/EOSsw6cLpNc5ub2fd/AKVn050rr4TMWKAN YxmC0le2f5i8lP5rkaQfcaz4Y4+Qmt96jEeZlyMHHQ0PeOgTeoh+Yhz0g g==; X-CSE-ConnectionGUID: 4y/JsEWGQ027LDVEBeLi1g== X-CSE-MsgGUID: Y7C430vfRQ2YZp9hVnSc5g== X-IronPort-AV: E=McAfee;i="6800,10657,11777"; a="78829069" X-IronPort-AV: E=Sophos;i="6.23,218,1770624000"; d="scan'208";a="78829069" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by orvoesa113.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2026 20:33:07 -0700 X-CSE-ConnectionGUID: bOHXU/EHQYKUuqtDmh0h9A== X-CSE-MsgGUID: VN7DFizKTnCFe1wgehyRvw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,218,1770624000"; d="scan'208";a="266342153" Received: from gsse-cloud1.jf.intel.com ([10.54.39.91]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2026 20:33:07 -0700 From: Matthew Brost To: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org Cc: Andrew Morton , Dave Chinner , Qi Zheng , Roman Gushchin , Muchun Song , David Hildenbrand , Lorenzo Stoakes , "Liam R. 
Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Johannes Weiner , Shakeel Butt , Kairui Song , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , linux-mm@kvack.org, linux-kernel@vger.kernel.org, =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= , Carlos Santa , Matthew Auld , Andi Shyti Subject: [PATCH v5 4/5] drm/xe: Set TTM device beneficial_order to 9 (2M) Date: Tue, 5 May 2026 20:32:59 -0700 Message-Id: <20260506033300.3534883-5-matthew.brost@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260506033300.3534883-1-matthew.brost@intel.com> References: <20260506033300.3534883-1-matthew.brost@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Set the TTM device beneficial_order to 9 (2M), which is the sweet spot for Xe when attempting reclaim on system memory BOs, as it matches the large GPU page size. This ensures reclaim is attempted at the most effective order for the driver. Cc: Andrew Morton Cc: Dave Chinner Cc: Qi Zheng Cc: Roman Gushchin Cc: Muchun Song Cc: David Hildenbrand Cc: Lorenzo Stoakes Cc: "Liam R. 
Howlett" Cc: Vlastimil Babka Cc: Mike Rapoport Cc: Suren Baghdasaryan Cc: Michal Hocko Cc: Johannes Weiner Cc: Shakeel Butt Cc: Kairui Song Cc: Barry Song Cc: Axel Rasmussen Cc: Yuanchu Xie Cc: Wei Xu Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Cc: Thomas Hellstr=C3=B6m Cc: Carlos Santa Cc: Matthew Auld Signed-off-by: Matthew Brost Reviewed-by: Andi Shyti Reviewed-by: Thomas Hellstr=C3=B6m --- drivers/gpu/drm/xe/xe_device.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index 4b45b617a039..3f719ab08d1c 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -500,7 +500,8 @@ struct xe_device *xe_device_create(struct pci_dev *pdev, =20 err =3D ttm_device_init(&xe->ttm, &xe_ttm_funcs, xe->drm.dev, xe->drm.anon_inode->i_mapping, - xe->drm.vma_offset_manager, 0); + xe->drm.vma_offset_manager, + TTM_ALLOCATION_POOL_BENEFICIAL_ORDER(get_order(SZ_2M))); if (WARN_ON(err)) return ERR_PTR(err); =20 --=20 2.34.1 From nobody Sat Jun 13 19:02:40 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D37602E762C for ; Wed, 6 May 2026 03:33:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.21 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778038393; cv=none; b=LNnO/bonxU+6bTXiULd9aF6gqAbMvC1jIWoLJoQhxtOujvMk3uFJdVddY4YiKT32p2cqBWUbtJMyl5x2Lx5Xmd7Ec/Zl1Hhz1cWOi2gIgRuO9YFM0y6dFpI6+HXihJ4RTOCxq4dnc9IrKsLcWG31WyihrILNQ4cxfJNkyONBNAE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778038393; c=relaxed/simple; bh=hKs2r/SZ8fBLPzoHgcugkqcKmbZtBIMZ80SyreZA8i4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
b=vD6/JL12AHrGMZZMCS4rxSUo3bzI/G/yx9gaPcvuiYH+a6vsz+0WIbmLF2FdekDHnSUT/757QclAWim47H/E0tE254L7/ExyUZSxF6GaKrHYhZ39YZGD45RrsIZB6961Q5Y2fDtNsBXsAlGVbTgMVqtoipEwSQTaTBTrS9+Pyg8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=iiexS1Px; arc=none smtp.client-ip=198.175.65.21 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="iiexS1Px" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778038392; x=1809574392; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=hKs2r/SZ8fBLPzoHgcugkqcKmbZtBIMZ80SyreZA8i4=; b=iiexS1PxxvVn70NoFHGYA/t6B05GZwe9swz7LkuHqzjXmSonMQctCz3W 4/qtNzompBMVFnzt0KhjYJGGdUpDD1hx/o9t8H6diSrF7ZopKetCusDWw mCivbHivzkp7gl/BddKzpVUFlIvLFMPPMkPlhWYMNFQg/iUl0jLJaYlAW Eg4SZtSkbLABx2DKJEUsBiRsPF7WwLyBmAYDvgwv9H+f7kIW51o0EIWj5 RLIRW+ytow+s5bv8e/Xi84vF5YcWzsTY+/KGtvI7L/VIBv0NmK1pJqrQT BSO/6LPoOkk3X6LBkSnFHjGDiI9oQJ6K9bXZXkleWfo5invSRl+bxrCc/ Q==; X-CSE-ConnectionGUID: erd8KU3sRbKVE56oXddZUQ== X-CSE-MsgGUID: HTOssimGRqGXNyWlLNJ98g== X-IronPort-AV: E=McAfee;i="6800,10657,11777"; a="78829082" X-IronPort-AV: E=Sophos;i="6.23,218,1770624000"; d="scan'208";a="78829082" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by orvoesa113.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2026 20:33:07 -0700 X-CSE-ConnectionGUID: 7i4b2G9rRAKaRcC7Cv6/qw== X-CSE-MsgGUID: El8a7KdFSaqIsFbWl9ZtuQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,218,1770624000"; d="scan'208";a="266342154" Received: from gsse-cloud1.jf.intel.com ([10.54.39.91]) 
by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2026 20:33:07 -0700 From: Matthew Brost To: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org Cc: Andrew Morton , Dave Chinner , Qi Zheng , Roman Gushchin , Muchun Song , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Johannes Weiner , Shakeel Butt , Kairui Song , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v5 5/5] drm/xe: Make use of shrink_control::opportunistic_compaction hint Date: Tue, 5 May 2026 20:33:00 -0700 Message-Id: <20260506033300.3534883-6-matthew.brost@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260506033300.3534883-1-matthew.brost@intel.com> References: <20260506033300.3534883-1-matthew.brost@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Xe/TTM backup reclaim can be extremely expensive under fragmentation pressure as reclaim may migrate or destroy actively used GPU working sets despite the system still having substantial free memory available. Under high-order opportunistic reclaim, repeatedly backing up GPU memory can lead to reclaim/rebind ping-pong behavior where active GPU working sets are continuously torn down and reconstructed without materially improving allocation success. Use the new shrink_control::opportunistic_compaction hint to avoid Xe backup reclaim during fragmentation-driven high-order reclaim attempts. In this mode the shrinker skips advertising backup-backed reclaimable memory and avoids initiating backup operations entirely. Order-0 and non-opportunistic reclaim behavior remain unchanged, so Xe backup reclaim still participates normally during genuine memory pressure. 
Cc: Andrew Morton Cc: Dave Chinner Cc: Qi Zheng Cc: Roman Gushchin Cc: Muchun Song Cc: David Hildenbrand Cc: Lorenzo Stoakes Cc: "Liam R. Howlett" Cc: Vlastimil Babka Cc: Mike Rapoport Cc: Suren Baghdasaryan Cc: Michal Hocko Cc: Johannes Weiner Cc: Shakeel Butt Cc: Kairui Song Cc: Barry Song Cc: Axel Rasmussen Cc: Yuanchu Xie Cc: Wei Xu Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Assisted-by: Claude:claude-opus-4.6 Signed-off-by: Matthew Brost Reviewed-by: Thomas Hellstr=C3=B6m --- drivers/gpu/drm/xe/xe_shrinker.c | 20 +++++++++++++++++--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_shrinker.c b/drivers/gpu/drm/xe/xe_shrin= ker.c index 83374cd57660..4646b0f5b82b 100644 --- a/drivers/gpu/drm/xe/xe_shrinker.c +++ b/drivers/gpu/drm/xe/xe_shrinker.c @@ -139,10 +139,17 @@ static unsigned long xe_shrinker_count(struct shrinker *shrink, struct shrink_control *sc) { struct xe_shrinker *shrinker =3D to_xe_shrinker(shrink); - unsigned long num_pages; + unsigned long num_pages =3D 0; bool can_backup =3D !!(sc->gfp_mask & __GFP_FS); =20 - num_pages =3D ttm_backup_bytes_avail() >> PAGE_SHIFT; + /* + * Skip accounting backup-able pages when this is an opportunistic + * high-order pass: TTM backup work shrinks at native page granularity + * and is unlikely to produce the contiguous block the caller wants, + * so don't advertise it as reclaimable for this hint. 
+ */ + if (!sc->order || !sc->opportunistic_compaction) + num_pages =3D ttm_backup_bytes_avail() >> PAGE_SHIFT; read_lock(&shrinker->lock); =20 if (can_backup) @@ -233,7 +240,14 @@ static unsigned long xe_shrinker_scan(struct shrinker = *shrink, struct shrink_con } =20 sc->nr_scanned =3D nr_scanned; - if (nr_scanned >=3D nr_to_scan || !can_backup) + /* + * Stop after the purge pass for opportunistic high-order reclaim: + * the subsequent backup/writeback pass works at native page order + * and is unlikely to free a contiguous high-order block, so doing + * it here would just churn working sets for no compaction benefit. + */ + if (nr_scanned >=3D nr_to_scan || !can_backup || + (sc->order && sc->opportunistic_compaction)) goto out; =20 /* If we didn't wake before, try to do it now if needed. */ --=20 2.34.1
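For readers following the series, the gating that patch 5/5 adds to xe_shrinker_count() and xe_shrinker_scan() can be modelled outside the kernel. The sketch below is a user-space illustration only, not driver code: struct model_shrink_control and the model_* helpers are hypothetical names invented here, the field types are assumptions, and only the two fields the patch reads (order and opportunistic_compaction) are mirrored.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the two shrink_control fields the patch reads.
 * The real kernel structure has more members; the types here are assumed.
 */
struct model_shrink_control {
	unsigned int order;             /* allocation order driving reclaim */
	bool opportunistic_compaction;  /* hint consumed by patch 5/5 */
};

/* True when backup reclaim is skipped: high-order AND opportunistic. */
static bool model_skip_backup(const struct model_shrink_control *sc)
{
	return sc->order && sc->opportunistic_compaction;
}

/* Mirrors the count path: advertise backup-able pages only when not skipping. */
static unsigned long model_backup_pages(const struct model_shrink_control *sc,
					unsigned long backup_pages_avail)
{
	return model_skip_backup(sc) ? 0 : backup_pages_avail;
}

/* Walks the four order/hint combinations described in the commit message. */
static void model_self_check(void)
{
	struct model_shrink_control sc = { .order = 9, .opportunistic_compaction = true };

	assert(model_backup_pages(&sc, 1024) == 0);    /* high-order, opportunistic */

	sc.opportunistic_compaction = false;
	assert(model_backup_pages(&sc, 1024) == 1024); /* high-order, genuine pressure */

	sc.order = 0;
	assert(model_backup_pages(&sc, 1024) == 1024); /* order 0, no hint */

	sc.opportunistic_compaction = true;
	assert(model_backup_pages(&sc, 1024) == 1024); /* order 0 is never skipped */
}
```

Calling model_self_check() from a small test main() exercises all four combinations. Only the high-order opportunistic case suppresses backup accounting, which matches the commit message's statement that order-0 and non-opportunistic reclaim remain unchanged.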