From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton
Cc: Vlastimil Babka, Brendan Jackman, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 1/3] mm: page_alloc: don't steal single pages from biggest buddy
Date: Mon, 24 Feb 2025 19:08:24 -0500
Message-ID: <20250225001023.1494422-2-hannes@cmpxchg.org>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250225001023.1494422-1-hannes@cmpxchg.org>
References: <20250225001023.1494422-1-hannes@cmpxchg.org>

The fallback code searches for the biggest buddy first in an attempt
to steal the whole block and encourage type grouping down the line.

The approach used to be this:

- Non-movable requests will split the largest buddy and steal the
  remainder. This splits up contiguity, but it allows subsequent
  requests of this type to fall back into adjacent space.

- Movable requests go and look for the smallest buddy instead. The
  thinking is that movable requests can be compacted, so grouping is
  less important than retaining contiguity.

c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block
conversion") enforces freelist type hygiene, which restricts stealing
to either claiming the whole block or just taking the requested chunk;
no additional pages or buddy remainders can be stolen any more.

The patch mishandled when to switch to finding the smallest buddy in
that new reality. As a result, it may steal the exact request size,
but from the biggest buddy. This causes fracturing for no good reason.

Fix this by committing to the new behavior: either steal the whole
block, or fall back to the smallest buddy.

Remove single-page stealing from steal_suitable_fallback(). Rename it
to try_to_steal_block() to make the intentions clear. If this fails,
always fall back to the smallest buddy.
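To illustrate the decision change in isolation, here is a toy model
(not kernel code: the freelists are reduced to per-order counts, the
whole-block claim is stubbed to fail so the fallback path runs, and
the helper names are invented; only the control flow mirrors the
patch):

	/* toy_fallback.c - standalone sketch, NOT mm/page_alloc.c */
	#include <stdbool.h>
	#include <stdio.h>

	#define NR_PAGE_ORDERS 11

	static int nr_free[NR_PAGE_ORDERS];	/* buddies per order */

	/* Stubbed to fail: models a block that cannot be claimed whole */
	static bool try_to_steal_block(int order)
	{
		(void)order;
		return false;
	}

	/* Buggy flow after c0cd6f557b90: search biggest-first; even when
	 * the whole-block claim fails, carve the request out of that
	 * biggest buddy. */
	static int fallback_buggy(int order)
	{
		for (int co = NR_PAGE_ORDERS - 1; co >= order; co--) {
			if (nr_free[co]) {
				try_to_steal_block(co);	/* may fail - ignored */
				return co;	/* fractures the big buddy */
			}
		}
		return -1;
	}

	/* Fixed flow: claim a whole block, or split the SMALLEST buddy */
	static int fallback_fixed(int order)
	{
		/* Pass 1: biggest-first, but only for whole-block claims */
		for (int co = NR_PAGE_ORDERS - 1; co >= order; co--) {
			if (nr_free[co] && try_to_steal_block(co))
				return co;	/* block converted wholesale */
		}
		/* Pass 2: split the smallest buddy that fits the request */
		for (int co = order; co < NR_PAGE_ORDERS; co++) {
			if (nr_free[co])
				return co;	/* least contiguity lost */
		}
		return -1;
	}

	int main(void)
	{
		nr_free[3] = 1;		/* one order-3 buddy */
		nr_free[9] = 1;		/* one order-9 buddy */
		printf("buggy: order-0 splits order-%d\n", fallback_buggy(0));
		printf("fixed: order-0 splits order-%d\n", fallback_fixed(0));
		return 0;
	}

With one order-3 and one order-9 buddy on the lists, the buggy flow
splits the order-9 buddy for an order-0 request, while the fixed flow
takes the order-3 buddy and leaves the big contiguous chunk intact.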
The following is from 4 runs of mmtests' thpchallenge. "Pollute" is
single page fallback, "steal" is conversion of a partially used block.
The numbers for free block conversions (omitted) are comparable.

                                      vanilla   patched

@pollute[unmovable from reclaimable]:      27       106
@pollute[unmovable from movable]:          82        46
@pollute[reclaimable from unmovable]:     256        83
@pollute[reclaimable from movable]:        46         8
@pollute[movable from unmovable]:        4841       868
@pollute[movable from reclaimable]:      5278     12568

@steal[unmovable from reclaimable]:        11        12
@steal[unmovable from movable]:           113        49
@steal[reclaimable from unmovable]:        19        34
@steal[reclaimable from movable]:          47        21
@steal[movable from unmovable]:           250       183
@steal[movable from reclaimable]:          81        93

The allocator appears to do a better job at keeping stealing and
polluting to the first fallback preference. As a result, the numbers
for "from movable" - the least preferred fallback option, and most
detrimental to compactability - are down across the board.

Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
Suggested-by: Vlastimil Babka
Signed-off-by: Johannes Weiner
Reviewed-by: Brendan Jackman
Reviewed-by: Vlastimil Babka
---
 mm/page_alloc.c | 80 +++++++++++++++++++++----------------------------
 1 file changed, 34 insertions(+), 46 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 16dfcf7ade74..9ea14ec52449 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1986,13 +1986,12 @@ static inline bool boost_watermark(struct zone *zone)
  * can claim the whole pageblock for the requested migratetype. If not, we check
  * the pageblock for constituent pages; if at least half of the pages are free
  * or compatible, we can still claim the whole block, so pages freed in the
- * future will be put on the correct free list. Otherwise, we isolate exactly
- * the order we need from the fallback block and leave its migratetype alone.
+ * future will be put on the correct free list.
  */
 static struct page *
-steal_suitable_fallback(struct zone *zone, struct page *page,
-			int current_order, int order, int start_type,
-			unsigned int alloc_flags, bool whole_block)
+try_to_steal_block(struct zone *zone, struct page *page,
+		   int current_order, int order, int start_type,
+		   unsigned int alloc_flags)
 {
 	int free_pages, movable_pages, alike_pages;
 	unsigned long start_pfn;
@@ -2005,7 +2004,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
 	 * highatomic accounting.
 	 */
 	if (is_migrate_highatomic(block_type))
-		goto single_page;
+		return NULL;
 
 	/* Take ownership for orders >= pageblock_order */
 	if (current_order >= pageblock_order) {
@@ -2026,14 +2025,10 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
 	if (boost_watermark(zone) && (alloc_flags & ALLOC_KSWAPD))
 		set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
 
-	/* We are not allowed to try stealing from the whole block */
-	if (!whole_block)
-		goto single_page;
-
 	/* moving whole block can fail due to zone boundary conditions */
 	if (!prep_move_freepages_block(zone, page, &start_pfn, &free_pages,
 				       &movable_pages))
-		goto single_page;
+		return NULL;
 
 	/*
 	 * Determine how many pages are compatible with our allocation.
@@ -2066,9 +2061,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
 		return __rmqueue_smallest(zone, order, start_type);
 	}
 
-single_page:
-	page_del_and_expand(zone, page, order, current_order, block_type);
-	return page;
+	return NULL;
 }
 
 /*
@@ -2250,14 +2243,19 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
 }
 
 /*
- * Try finding a free buddy page on the fallback list and put it on the free
- * list of requested migratetype, possibly along with other pages from the same
- * block, depending on fragmentation avoidance heuristics. Returns true if
- * fallback was found so that __rmqueue_smallest() can grab it.
+ * Try finding a free buddy page on the fallback list.
+ *
+ * This will attempt to steal a whole pageblock for the requested type
+ * to ensure grouping of such requests in the future.
+ *
+ * If a whole block cannot be stolen, regress to __rmqueue_smallest()
+ * logic to at least break up as little contiguity as possible.
  *
  * The use of signed ints for order and current_order is a deliberate
  * deviation from the rest of this file, to make the for loop
  * condition simpler.
+ *
+ * Return the stolen page, or NULL if none can be found.
  */
 static __always_inline struct page *
 __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
@@ -2291,45 +2289,35 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 		if (fallback_mt == -1)
 			continue;
 
-		/*
-		 * We cannot steal all free pages from the pageblock and the
-		 * requested migratetype is movable. In that case it's better to
-		 * steal and split the smallest available page instead of the
-		 * largest available page, because even if the next movable
-		 * allocation falls back into a different pageblock than this
-		 * one, it won't cause permanent fragmentation.
-		 */
-		if (!can_steal && start_migratetype == MIGRATE_MOVABLE
-					&& current_order > order)
-			goto find_smallest;
+		if (!can_steal)
+			break;
 
-		goto do_steal;
+		page = get_page_from_free_area(area, fallback_mt);
+		page = try_to_steal_block(zone, page, current_order, order,
+					  start_migratetype, alloc_flags);
+		if (page)
+			goto got_one;
 	}
 
-	return NULL;
+	if (alloc_flags & ALLOC_NOFRAGMENT)
+		return NULL;
 
-find_smallest:
+	/* No luck stealing blocks. Find the smallest fallback page */
 	for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
 		area = &(zone->free_area[current_order]);
 		fallback_mt = find_suitable_fallback(area, current_order,
 				start_migratetype, false, &can_steal);
-		if (fallback_mt != -1)
-			break;
-	}
-
-	/*
-	 * This should not happen - we already found a suitable fallback
-	 * when looking for the largest page.
-	 */
-	VM_BUG_ON(current_order > MAX_PAGE_ORDER);
+		if (fallback_mt == -1)
+			continue;
 
-do_steal:
-	page = get_page_from_free_area(area, fallback_mt);
+		page = get_page_from_free_area(area, fallback_mt);
+		page_del_and_expand(zone, page, order, current_order, fallback_mt);
+		goto got_one;
+	}
 
-	/* take off list, maybe claim block, expand remainder */
-	page = steal_suitable_fallback(zone, page, current_order, order,
-				       start_migratetype, alloc_flags, can_steal);
+	return NULL;
 
+got_one:
 	trace_mm_page_alloc_extfrag(page, order, current_order,
 		start_migratetype, fallback_mt);
 
-- 
2.48.1
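A footnote on the claim heuristic that try_to_steal_block() keeps: as
the updated comment above describes, a fallback block may still be
converted wholesale when at least half of its pages are free or of a
compatible type. A minimal standalone sketch of just that threshold
test, with a made-up block size and none of the real migratetype
accounting:

	/* claim_sketch.c - standalone sketch of the claim test only */
	#include <stdbool.h>
	#include <stdio.h>

	#define PAGES_PER_BLOCK 512	/* e.g. 2M pageblock of 4k pages */

	/* Claim the block when free + same-type pages cover half of it */
	static bool can_claim_block(int free_pages, int alike_pages)
	{
		return free_pages + alike_pages >= PAGES_PER_BLOCK / 2;
	}

	int main(void)
	{
		printf("%d\n", can_claim_block(200, 100));	/* 1: claim */
		printf("%d\n", can_claim_block(100, 50));	/* 0: give up */
		return 0;
	}

In the kernel the page counts come from prep_move_freepages_block();
here they are plain parameters.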