Date: Mon, 30 Dec 2024 21:35:38 -0700
In-Reply-To: <20241231043538.4075764-1-yuzhao@google.com>
References: <20241231043538.4075764-1-yuzhao@google.com>
Message-ID: <20241231043538.4075764-8-yuzhao@google.com>
Subject: [PATCH mm-unstable v4 7/7] mm/mglru: fix PTE-mapped large folios
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao, Barry Song, Kalesh Singh

Count the accessed bits from PTEs mapping the same large folio as one
access rather than multiple accesses.

The previous patch changed how folios accessed through page tables are
promoted: rather than being promoted the first time the accessed bit is
cleared, a folio is now only promoted on subsequent accesses. Counting
the accessed bits from the same large folio as multiple accesses can
therefore cause that folio to be promoted prematurely, which in turn can
cause overprotection of single-use large folios.

On an Altra M128-30 with 3GB DRAM, 12GB zram, 16KB THPs and -j32, this
patch reduced the sys time of kernel compilation by [2, 5]% (95% CI).

Reported-by: Barry Song
Signed-off-by: Yu Zhao <yuzhao@google.com>
Tested-by: Kalesh Singh
---
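A note on the mechanism, with a minimal, self-contained userspace sketch
of the batching pattern the diff below introduces. The types and helpers
here (fake_folio, fake_pte, flush_update, scan) are made up for
illustration and are not kernel APIs; the kernel-side equivalent is
walk_update_folio() plus the last/dirty tracking in the walkers below.
While scanning PTEs, accessed bits from consecutive entries mapping the
same folio are coalesced into one pending update, which is flushed
whenever the current folio changes and once more after the loop:

#include <stdbool.h>
#include <stdio.h>

struct fake_folio {
	int id;
	int accesses;	/* how many times this folio was counted as accessed */
	bool dirty;
};

struct fake_pte {
	struct fake_folio *folio;
	bool young;	/* accessed bit */
	bool dirty;	/* dirty bit */
};

/* Flush one batched update: one access (and one dirty mark) per folio. */
static void flush_update(struct fake_folio *last, bool dirty)
{
	if (!last)
		return;

	last->accesses++;	/* counted once, not once per PTE */
	if (dirty)
		last->dirty = true;
}

static void scan(struct fake_pte *pte, int n)
{
	struct fake_folio *last = NULL;
	bool dirty = false;

	for (int i = 0; i < n; i++) {
		if (!pte[i].young)
			continue;

		if (pte[i].folio != last) {
			/* folio changed: flush the previous pending update */
			flush_update(last, dirty);
			last = pte[i].folio;
			dirty = false;
		}

		if (pte[i].dirty)
			dirty = true;
	}
	/* flush the final pending folio */
	flush_update(last, dirty);
}

int main(void)
{
	struct fake_folio large = { .id = 1 };	/* "large" folio, 4 PTEs */
	struct fake_folio small = { .id = 2 };	/* "small" folio, 1 PTE */
	struct fake_pte ptes[] = {
		{ &large, true, false }, { &large, true, true },
		{ &large, true, false }, { &large, false, false },
		{ &small, true, false },
	};

	scan(ptes, 5);
	printf("large: %d access(es), dirty=%d\n", large.accesses, large.dirty);
	printf("small: %d access(es), dirty=%d\n", small.accesses, small.dirty);
	return 0;
}

Built with any C99 compiler, this should report one access for each
folio, even though the large folio has three young PTEs; without the
batching, the large folio would have been counted three times.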
 mm/vmscan.c | 110 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 72 insertions(+), 38 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 74bc85fc7cdf..a099876fa029 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3431,29 +3431,55 @@ static bool suitable_to_scan(int total, int young)
 	return young * n >= total;
 }
 
+static void walk_update_folio(struct lru_gen_mm_walk *walk, struct folio *folio,
+			      int new_gen, bool dirty)
+{
+	int old_gen;
+
+	if (!folio)
+		return;
+
+	if (dirty && !folio_test_dirty(folio) &&
+	    !(folio_test_anon(folio) && folio_test_swapbacked(folio) &&
+	      !folio_test_swapcache(folio)))
+		folio_mark_dirty(folio);
+
+	if (walk) {
+		old_gen = folio_update_gen(folio, new_gen);
+		if (old_gen >= 0 && old_gen != new_gen)
+			update_batch_size(walk, folio, old_gen, new_gen);
+	} else if (lru_gen_set_refs(folio)) {
+		old_gen = folio_lru_gen(folio);
+		if (old_gen >= 0 && old_gen != new_gen)
+			folio_activate(folio);
+	}
+}
+
 static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 			   struct mm_walk *args)
 {
 	int i;
+	bool dirty;
 	pte_t *pte;
 	spinlock_t *ptl;
 	unsigned long addr;
 	int total = 0;
 	int young = 0;
+	struct folio *last = NULL;
 	struct lru_gen_mm_walk *walk = args->private;
 	struct mem_cgroup *memcg = lruvec_memcg(walk->lruvec);
 	struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec);
 	DEFINE_MAX_SEQ(walk->lruvec);
-	int old_gen, new_gen = lru_gen_from_seq(max_seq);
+	int gen = lru_gen_from_seq(max_seq);
 	pmd_t pmdval;
 
-	pte = pte_offset_map_rw_nolock(args->mm, pmd, start & PMD_MASK, &pmdval,
-				       &ptl);
+	pte = pte_offset_map_rw_nolock(args->mm, pmd, start & PMD_MASK, &pmdval, &ptl);
 	if (!pte)
 		return false;
+
 	if (!spin_trylock(ptl)) {
 		pte_unmap(pte);
-		return false;
+		return true;
 	}
 
 	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
@@ -3482,19 +3508,23 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 		if (!ptep_clear_young_notify(args->vma, addr, pte + i))
 			continue;
 
+		if (last != folio) {
+			walk_update_folio(walk, last, gen, dirty);
+
+			last = folio;
+			dirty = false;
+		}
+
+		if (pte_dirty(ptent))
+			dirty = true;
+
 		young++;
 		walk->mm_stats[MM_LEAF_YOUNG]++;
-
-		if (pte_dirty(ptent) && !folio_test_dirty(folio) &&
-		    !(folio_test_anon(folio) && folio_test_swapbacked(folio) &&
-		      !folio_test_swapcache(folio)))
-			folio_mark_dirty(folio);
-
-		old_gen = folio_update_gen(folio, new_gen);
-		if (old_gen >= 0 && old_gen != new_gen)
-			update_batch_size(walk, folio, old_gen, new_gen);
 	}
 
+	walk_update_folio(walk, last, gen, dirty);
+	last = NULL;
+
 	if (i < PTRS_PER_PTE && get_next_vma(PMD_MASK, PAGE_SIZE, args, &start, &end))
 		goto restart;
 
@@ -3508,13 +3538,15 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area
 				  struct mm_walk *args, unsigned long *bitmap, unsigned long *first)
 {
 	int i;
+	bool dirty;
 	pmd_t *pmd;
 	spinlock_t *ptl;
+	struct folio *last = NULL;
 	struct lru_gen_mm_walk *walk = args->private;
 	struct mem_cgroup *memcg = lruvec_memcg(walk->lruvec);
 	struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec);
 	DEFINE_MAX_SEQ(walk->lruvec);
-	int old_gen, new_gen = lru_gen_from_seq(max_seq);
+	int gen = lru_gen_from_seq(max_seq);
 
 	VM_WARN_ON_ONCE(pud_leaf(*pud));
 
@@ -3567,20 +3599,23 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area
 		if (!pmdp_clear_young_notify(vma, addr, pmd + i))
 			goto next;
 
+		if (last != folio) {
+			walk_update_folio(walk, last, gen, dirty);
+
+			last = folio;
+			dirty = false;
+		}
+
+		if (pmd_dirty(pmd[i]))
+			dirty = true;
+
 		walk->mm_stats[MM_LEAF_YOUNG]++;
-
-		if (pmd_dirty(pmd[i]) && !folio_test_dirty(folio) &&
-		    !(folio_test_anon(folio) && folio_test_swapbacked(folio) &&
-		      !folio_test_swapcache(folio)))
-			folio_mark_dirty(folio);
-
-		old_gen = folio_update_gen(folio, new_gen);
-		if (old_gen >= 0 && old_gen != new_gen)
-			update_batch_size(walk, folio, old_gen, new_gen);
 next:
 		i = i > MIN_LRU_BATCH ? 0 : find_next_bit(bitmap, MIN_LRU_BATCH, i) + 1;
 	} while (i <= MIN_LRU_BATCH);
 
+	walk_update_folio(walk, last, gen, dirty);
+
 	arch_leave_lazy_mmu_mode();
 	spin_unlock(ptl);
done:
@@ -4115,9 +4150,11 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
 bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 {
 	int i;
+	bool dirty;
 	unsigned long start;
 	unsigned long end;
 	struct lru_gen_mm_walk *walk;
+	struct folio *last = NULL;
 	int young = 1;
 	pte_t *pte = pvmw->pte;
 	unsigned long addr = pvmw->address;
@@ -4128,7 +4165,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
 	struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
 	DEFINE_MAX_SEQ(lruvec);
-	int old_gen, new_gen = lru_gen_from_seq(max_seq);
+	int gen = lru_gen_from_seq(max_seq);
 
 	lockdep_assert_held(pvmw->ptl);
 	VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio);
@@ -4182,24 +4219,21 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 		if (!ptep_clear_young_notify(vma, addr, pte + i))
 			continue;
 
-		young++;
+		if (last != folio) {
+			walk_update_folio(walk, last, gen, dirty);
 
-		if (pte_dirty(ptent) && !folio_test_dirty(folio) &&
-		    !(folio_test_anon(folio) && folio_test_swapbacked(folio) &&
-		      !folio_test_swapcache(folio)))
-			folio_mark_dirty(folio);
-
-		if (walk) {
-			old_gen = folio_update_gen(folio, new_gen);
-			if (old_gen >= 0 && old_gen != new_gen)
-				update_batch_size(walk, folio, old_gen, new_gen);
-		} else if (lru_gen_set_refs(folio)) {
-			old_gen = folio_lru_gen(folio);
-			if (old_gen >= 0 && old_gen != new_gen)
-				folio_activate(folio);
+			last = folio;
+			dirty = false;
 		}
+
+		if (pte_dirty(ptent))
+			dirty = true;
+
+		young++;
 	}
 
+	walk_update_folio(walk, last, gen, dirty);
+
 	arch_leave_lazy_mmu_mode();
 
 	/* feedback from rmap walkers to page table walkers */
-- 
2.47.1.613.gc27f4b7a9f-goog