From nobody Wed Dec 17 21:39:25 2025
Date: Mon, 30 Dec 2024 21:35:34 -0700
In-Reply-To: <20241231043538.4075764-1-yuzhao@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
Mime-Version: 1.0
References: <20241231043538.4075764-1-yuzhao@google.com>
X-Mailer: git-send-email 2.47.1.613.gc27f4b7a9f-goog
Message-ID: <20241231043538.4075764-4-yuzhao@google.com>
Subject: [PATCH mm-unstable v4 3/7] mm/mglru: rework aging feedback
From: Yu Zhao
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao, David Stevens, Kalesh Singh
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

The aging feedback is based on both the number of generations and the
distribution of folios in each generation. The number of generations
is currently the distance between max_seq and anon min_seq. This is
because anon min_seq is not allowed to move past file min_seq. The
rationale for that is that file is always evictable whereas anon is
not. However, for use cases where anon is a lot cheaper than file:
1. Anon in the second oldest generation can be a better choice than
   file in the oldest generation.
2. A large amount of file in the oldest generation can skew the
   distribution, making should_run_aging() return a false negative.

Allow anon and file min_seq to move independently, and use solely the
number of generations as the feedback for aging. Specifically, when
both anon and file are evictable, anon min_seq can now be greater than
file min_seq, and therefore the number of generations becomes the
distance between max_seq and min(min_seq[0],min_seq[1]). And
should_run_aging() returns true if and only if the number of
generations is less than MAX_NR_GENS.

As the first step to the final optimization, this change by itself
should not have userspace-visible effects beyond performance. The next
two patches will take advantage of this change; the last patch in this
series will better distribute folios across MAX_NR_GENS.

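For illustration only, the rule described above can be sketched in
userspace C. This is not the kernel code: the sketch_* helper names are
hypothetical, and the values MIN_NR_GENS=2, MAX_NR_GENS=4 and
MAX_SWAPPINESS=200 are assumptions made for the example. The number of
generations is taken to be max_seq - min_seq + 1, which is also an
assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, for illustration only. */
#define MIN_NR_GENS	2UL
#define MAX_NR_GENS	4UL
#define MAX_SWAPPINESS	200

enum { LRU_GEN_ANON, LRU_GEN_FILE, ANON_AND_FILE };

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical stand-in for evictable_min_seq():
 *   swappiness == 0              -> file only
 *   swappiness == MAX_SWAPPINESS -> anon only
 *   anything in between          -> the older of anon and file
 */
static unsigned long sketch_evictable_min_seq(const unsigned long *min_seq,
					      int swappiness)
{
	assert(swappiness >= 0 && swappiness <= MAX_SWAPPINESS);

	return min_ul(min_seq[!swappiness],
		      min_seq[swappiness != MAX_SWAPPINESS]);
}

/* Number of generations as seen by the evictable type(s). */
static unsigned long sketch_nr_gens(const unsigned long *min_seq,
				    unsigned long max_seq, int swappiness)
{
	return max_seq - sketch_evictable_min_seq(min_seq, swappiness) + 1;
}

/* Mirrors only the two return conditions of should_run_aging(). */
static bool sketch_should_run_aging(const unsigned long *min_seq,
				    unsigned long max_seq, int swappiness)
{
	unsigned long seq = sketch_evictable_min_seq(min_seq, swappiness);

	/* eviction is not possible anymore */
	if (seq + MIN_NR_GENS > max_seq)
		return true;

	/* eviction is still possible, but aging is preferred */
	return seq + MIN_NR_GENS == max_seq;
}
```

Under these assumed constants, sketch_should_run_aging() returns true
exactly when sketch_nr_gens() is less than MAX_NR_GENS. For example,
with min_seq = {7, 5} and max_seq = 8, a swappiness of 60 sees four
generations and does not age, whereas MAX_SWAPPINESS only considers
anon, sees two generations and ages.
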
Reported-by: David Stevens Signed-off-by: Yu Zhao Tested-by: Kalesh Singh --- include/linux/mmzone.h | 17 ++-- mm/vmscan.c | 200 ++++++++++++++++++----------------------- 2 files changed, 96 insertions(+), 121 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index b36124145a16..8245ecb0400b 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -421,12 +421,11 @@ enum { /* * The youngest generation number is stored in max_seq for both anon and f= ile * types as they are aged on an equal footing. The oldest generation numbe= rs are - * stored in min_seq[] separately for anon and file types as clean file pa= ges - * can be evicted regardless of swap constraints. - * - * Normally anon and file min_seq are in sync. But if swapping is constrai= ned, - * e.g., out of swap space, file min_seq is allowed to advance and leave a= non - * min_seq behind. + * stored in min_seq[] separately for anon and file types so that they can= be + * incremented independently. Ideally min_seq[] are kept in sync when both= anon + * and file types are evictable. However, to adapt to situations like extr= eme + * swappiness, they are allowed to be out of sync by at most + * MAX_NR_GENS-MIN_NR_GENS-1. * * The number of pages in each generation is eventually consistent and the= refore * can be transiently negative when reset_batch_size() is pending. 
@@ -446,8 +445,8 @@ struct lru_gen_folio { unsigned long avg_refaulted[ANON_AND_FILE][MAX_NR_TIERS]; /* the exponential moving average of evicted+protected */ unsigned long avg_total[ANON_AND_FILE][MAX_NR_TIERS]; - /* the first tier doesn't need protection, hence the minus one */ - unsigned long protected[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS - 1]; + /* can only be modified under the LRU lock */ + unsigned long protected[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS]; /* can be modified without holding the LRU lock */ atomic_long_t evicted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS]; atomic_long_t refaulted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS]; @@ -498,7 +497,7 @@ struct lru_gen_mm_walk { int mm_stats[NR_MM_STATS]; /* total batched items */ int batched; - bool can_swap; + int swappiness; bool force_scan; }; =20 diff --git a/mm/vmscan.c b/mm/vmscan.c index f236db86de8a..f767e3d34e73 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2627,11 +2627,17 @@ static bool should_clear_pmd_young(void) READ_ONCE((lruvec)->lrugen.min_seq[LRU_GEN_FILE]), \ } =20 +#define evictable_min_seq(min_seq, swappiness) \ + min((min_seq)[!(swappiness)], (min_seq)[(swappiness) !=3D MAX_SWAPPINESS]) + #define for_each_gen_type_zone(gen, type, zone) \ for ((gen) =3D 0; (gen) < MAX_NR_GENS; (gen)++) \ for ((type) =3D 0; (type) < ANON_AND_FILE; (type)++) \ for ((zone) =3D 0; (zone) < MAX_NR_ZONES; (zone)++) =20 +#define for_each_evictable_type(type, swappiness) \ + for ((type) =3D !(swappiness); (type) <=3D ((swappiness) !=3D MAX_SWAPPIN= ESS); (type)++) + #define get_memcg_gen(seq) ((seq) % MEMCG_NR_GENS) #define get_memcg_bin(bin) ((bin) % MEMCG_NR_BINS) =20 @@ -2677,10 +2683,16 @@ static int get_nr_gens(struct lruvec *lruvec, int t= ype) =20 static bool __maybe_unused seq_is_valid(struct lruvec *lruvec) { - /* see the comment on lru_gen_folio */ - return get_nr_gens(lruvec, LRU_GEN_FILE) >=3D MIN_NR_GENS && - get_nr_gens(lruvec, LRU_GEN_FILE) <=3D get_nr_gens(lruvec, LRU_GEN= _ANON) && - 
get_nr_gens(lruvec, LRU_GEN_ANON) <=3D MAX_NR_GENS; + int type; + + for (type =3D 0; type < ANON_AND_FILE; type++) { + int n =3D get_nr_gens(lruvec, type); + + if (n < MIN_NR_GENS || n > MAX_NR_GENS) + return false; + } + + return true; } =20 /*************************************************************************= ***** @@ -3087,9 +3099,8 @@ static void read_ctrl_pos(struct lruvec *lruvec, int = type, int tier, int gain, pos->refaulted =3D lrugen->avg_refaulted[type][tier] + atomic_long_read(&lrugen->refaulted[hist][type][tier]); pos->total =3D lrugen->avg_total[type][tier] + + lrugen->protected[hist][type][tier] + atomic_long_read(&lrugen->evicted[hist][type][tier]); - if (tier) - pos->total +=3D lrugen->protected[hist][type][tier - 1]; pos->gain =3D gain; } =20 @@ -3116,17 +3127,15 @@ static void reset_ctrl_pos(struct lruvec *lruvec, i= nt type, bool carryover) WRITE_ONCE(lrugen->avg_refaulted[type][tier], sum / 2); =20 sum =3D lrugen->avg_total[type][tier] + + lrugen->protected[hist][type][tier] + atomic_long_read(&lrugen->evicted[hist][type][tier]); - if (tier) - sum +=3D lrugen->protected[hist][type][tier - 1]; WRITE_ONCE(lrugen->avg_total[type][tier], sum / 2); } =20 if (clear) { atomic_long_set(&lrugen->refaulted[hist][type][tier], 0); atomic_long_set(&lrugen->evicted[hist][type][tier], 0); - if (tier) - WRITE_ONCE(lrugen->protected[hist][type][tier - 1], 0); + WRITE_ONCE(lrugen->protected[hist][type][tier], 0); } } } @@ -3261,7 +3270,7 @@ static int should_skip_vma(unsigned long start, unsig= ned long end, struct mm_wal return true; =20 if (vma_is_anonymous(vma)) - return !walk->can_swap; + return !walk->swappiness; =20 if (WARN_ON_ONCE(!vma->vm_file || !vma->vm_file->f_mapping)) return true; @@ -3271,7 +3280,10 @@ static int should_skip_vma(unsigned long start, unsi= gned long end, struct mm_wal return true; =20 if (shmem_mapping(mapping)) - return !walk->can_swap; + return !walk->swappiness; + + if (walk->swappiness =3D=3D MAX_SWAPPINESS) + return 
true; =20 /* to exclude special mappings like dax, etc. */ return !mapping->a_ops->read_folio; @@ -3359,7 +3371,7 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm= _area_struct *vma, unsigned } =20 static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *m= emcg, - struct pglist_data *pgdat, bool can_swap) + struct pglist_data *pgdat) { struct folio *folio; =20 @@ -3370,10 +3382,6 @@ static struct folio *get_pfn_folio(unsigned long pfn= , struct mem_cgroup *memcg, if (folio_memcg(folio) !=3D memcg) return NULL; =20 - /* file VMAs can contain anon pages from COW */ - if (!folio_is_file_lru(folio) && !can_swap) - return NULL; - return folio; } =20 @@ -3429,7 +3437,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long = start, unsigned long end, if (pfn =3D=3D -1) continue; =20 - folio =3D get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); + folio =3D get_pfn_folio(pfn, memcg, pgdat); if (!folio) continue; =20 @@ -3514,7 +3522,7 @@ static void walk_pmd_range_locked(pud_t *pud, unsigne= d long addr, struct vm_area if (pfn =3D=3D -1) goto next; =20 - folio =3D get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); + folio =3D get_pfn_folio(pfn, memcg, pgdat); if (!folio) goto next; =20 @@ -3726,22 +3734,26 @@ static void clear_mm_walk(void) kfree(walk); } =20 -static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap) +static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness) { int zone; int remaining =3D MAX_LRU_BATCH; struct lru_gen_folio *lrugen =3D &lruvec->lrugen; + int hist =3D lru_hist_from_seq(lrugen->min_seq[type]); int new_gen, old_gen =3D lru_gen_from_seq(lrugen->min_seq[type]); =20 - if (type =3D=3D LRU_GEN_ANON && !can_swap) + if (type ? 
swappiness =3D=3D MAX_SWAPPINESS : !swappiness) goto done; =20 - /* prevent cold/hot inversion if force_scan is true */ + /* prevent cold/hot inversion if the type is evictable */ for (zone =3D 0; zone < MAX_NR_ZONES; zone++) { struct list_head *head =3D &lrugen->folios[old_gen][type][zone]; =20 while (!list_empty(head)) { struct folio *folio =3D lru_to_folio(head); + int refs =3D folio_lru_refs(folio); + int tier =3D lru_tier_from_refs(refs); + int delta =3D folio_nr_pages(folio); =20 VM_WARN_ON_ONCE_FOLIO(folio_test_unevictable(folio), folio); VM_WARN_ON_ONCE_FOLIO(folio_test_active(folio), folio); @@ -3751,6 +3763,9 @@ static bool inc_min_seq(struct lruvec *lruvec, int ty= pe, bool can_swap) new_gen =3D folio_inc_gen(lruvec, folio, false); list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]); =20 + WRITE_ONCE(lrugen->protected[hist][type][tier], + lrugen->protected[hist][type][tier] + delta); + if (!--remaining) return false; } @@ -3762,7 +3777,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int ty= pe, bool can_swap) return true; } =20 -static bool try_to_inc_min_seq(struct lruvec *lruvec, bool can_swap) +static bool try_to_inc_min_seq(struct lruvec *lruvec, int swappiness) { int gen, type, zone; bool success =3D false; @@ -3772,7 +3787,7 @@ static bool try_to_inc_min_seq(struct lruvec *lruvec,= bool can_swap) VM_WARN_ON_ONCE(!seq_is_valid(lruvec)); =20 /* find the oldest populated generation */ - for (type =3D !can_swap; type < ANON_AND_FILE; type++) { + for_each_evictable_type(type, swappiness) { while (min_seq[type] + MIN_NR_GENS <=3D lrugen->max_seq) { gen =3D lru_gen_from_seq(min_seq[type]); =20 @@ -3788,13 +3803,17 @@ static bool try_to_inc_min_seq(struct lruvec *lruve= c, bool can_swap) } =20 /* see the comment on lru_gen_folio */ - if (can_swap) { - min_seq[LRU_GEN_ANON] =3D min(min_seq[LRU_GEN_ANON], min_seq[LRU_GEN_FIL= E]); - min_seq[LRU_GEN_FILE] =3D max(min_seq[LRU_GEN_ANON], lrugen->min_seq[LRU= _GEN_FILE]); + if (swappiness && 
swappiness !=3D MAX_SWAPPINESS) { + unsigned long seq =3D lrugen->max_seq - MIN_NR_GENS; + + if (min_seq[LRU_GEN_ANON] > seq && min_seq[LRU_GEN_FILE] < seq) + min_seq[LRU_GEN_ANON] =3D seq; + else if (min_seq[LRU_GEN_FILE] > seq && min_seq[LRU_GEN_ANON] < seq) + min_seq[LRU_GEN_FILE] =3D seq; } =20 - for (type =3D !can_swap; type < ANON_AND_FILE; type++) { - if (min_seq[type] =3D=3D lrugen->min_seq[type]) + for_each_evictable_type(type, swappiness) { + if (min_seq[type] <=3D lrugen->min_seq[type]) continue; =20 reset_ctrl_pos(lruvec, type, true); @@ -3805,8 +3824,7 @@ static bool try_to_inc_min_seq(struct lruvec *lruvec,= bool can_swap) return success; } =20 -static bool inc_max_seq(struct lruvec *lruvec, unsigned long seq, - bool can_swap, bool force_scan) +static bool inc_max_seq(struct lruvec *lruvec, unsigned long seq, int swap= piness) { bool success; int prev, next; @@ -3824,13 +3842,11 @@ static bool inc_max_seq(struct lruvec *lruvec, unsi= gned long seq, if (!success) goto unlock; =20 - for (type =3D ANON_AND_FILE - 1; type >=3D 0; type--) { + for (type =3D 0; type < ANON_AND_FILE; type++) { if (get_nr_gens(lruvec, type) !=3D MAX_NR_GENS) continue; =20 - VM_WARN_ON_ONCE(!force_scan && (type =3D=3D LRU_GEN_FILE || can_swap)); - - if (inc_min_seq(lruvec, type, can_swap)) + if (inc_min_seq(lruvec, type, swappiness)) continue; =20 spin_unlock_irq(&lruvec->lru_lock); @@ -3874,7 +3890,7 @@ static bool inc_max_seq(struct lruvec *lruvec, unsign= ed long seq, } =20 static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long seq, - bool can_swap, bool force_scan) + int swappiness, bool force_scan) { bool success; struct lru_gen_mm_walk *walk; @@ -3885,7 +3901,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec,= unsigned long seq, VM_WARN_ON_ONCE(seq > READ_ONCE(lrugen->max_seq)); =20 if (!mm_state) - return inc_max_seq(lruvec, seq, can_swap, force_scan); + return inc_max_seq(lruvec, seq, swappiness); =20 /* see the comment in iterate_mm_list() */ if 
(seq <=3D READ_ONCE(mm_state->seq)) @@ -3910,7 +3926,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec,= unsigned long seq, =20 walk->lruvec =3D lruvec; walk->seq =3D seq; - walk->can_swap =3D can_swap; + walk->swappiness =3D swappiness; walk->force_scan =3D force_scan; =20 do { @@ -3920,7 +3936,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec,= unsigned long seq, } while (mm); done: if (success) { - success =3D inc_max_seq(lruvec, seq, can_swap, force_scan); + success =3D inc_max_seq(lruvec, seq, swappiness); WARN_ON_ONCE(!success); } =20 @@ -3961,13 +3977,13 @@ static bool lruvec_is_sizable(struct lruvec *lruvec= , struct scan_control *sc) { int gen, type, zone; unsigned long total =3D 0; - bool can_swap =3D get_swappiness(lruvec, sc); + int swappiness =3D get_swappiness(lruvec, sc); struct lru_gen_folio *lrugen =3D &lruvec->lrugen; struct mem_cgroup *memcg =3D lruvec_memcg(lruvec); DEFINE_MAX_SEQ(lruvec); DEFINE_MIN_SEQ(lruvec); =20 - for (type =3D !can_swap; type < ANON_AND_FILE; type++) { + for_each_evictable_type(type, swappiness) { unsigned long seq; =20 for (seq =3D min_seq[type]; seq <=3D max_seq; seq++) { @@ -3987,6 +4003,7 @@ static bool lruvec_is_reclaimable(struct lruvec *lruv= ec, struct scan_control *sc { int gen; unsigned long birth; + int swappiness =3D get_swappiness(lruvec, sc); struct mem_cgroup *memcg =3D lruvec_memcg(lruvec); DEFINE_MIN_SEQ(lruvec); =20 @@ -3996,8 +4013,7 @@ static bool lruvec_is_reclaimable(struct lruvec *lruv= ec, struct scan_control *sc if (!lruvec_is_sizable(lruvec, sc)) return false; =20 - /* see the comment on lru_gen_folio */ - gen =3D lru_gen_from_seq(min_seq[LRU_GEN_FILE]); + gen =3D lru_gen_from_seq(evictable_min_seq(min_seq, swappiness)); birth =3D READ_ONCE(lruvec->lrugen.timestamps[gen]); =20 return time_is_before_jiffies(birth + min_ttl); @@ -4064,7 +4080,6 @@ bool lru_gen_look_around(struct page_vma_mapped_walk = *pvmw) unsigned long addr =3D pvmw->address; struct vm_area_struct *vma =3D 
pvmw->vma; struct folio *folio =3D pfn_folio(pvmw->pfn); - bool can_swap =3D !folio_is_file_lru(folio); struct mem_cgroup *memcg =3D folio_memcg(folio); struct pglist_data *pgdat =3D folio_pgdat(folio); struct lruvec *lruvec =3D mem_cgroup_lruvec(memcg, pgdat); @@ -4117,7 +4132,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk = *pvmw) if (pfn =3D=3D -1) continue; =20 - folio =3D get_pfn_folio(pfn, memcg, pgdat, can_swap); + folio =3D get_pfn_folio(pfn, memcg, pgdat); if (!folio) continue; =20 @@ -4333,8 +4348,8 @@ static bool sort_folio(struct lruvec *lruvec, struct = folio *folio, struct scan_c gen =3D folio_inc_gen(lruvec, folio, false); list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]); =20 - WRITE_ONCE(lrugen->protected[hist][type][tier - 1], - lrugen->protected[hist][type][tier - 1] + delta); + WRITE_ONCE(lrugen->protected[hist][type][tier], + lrugen->protected[hist][type][tier] + delta); return true; } =20 @@ -4533,7 +4548,6 @@ static int isolate_folios(struct lruvec *lruvec, stru= ct scan_control *sc, int sw { int i; int type; - int scanned; int tier =3D -1; DEFINE_MIN_SEQ(lruvec); =20 @@ -4558,21 +4572,23 @@ static int isolate_folios(struct lruvec *lruvec, st= ruct scan_control *sc, int sw else type =3D get_type_to_scan(lruvec, swappiness, &tier); =20 - for (i =3D !swappiness; i < ANON_AND_FILE; i++) { + for_each_evictable_type(i, swappiness) { + int scanned; + if (tier < 0) tier =3D get_tier_idx(lruvec, type); =20 + *type_scanned =3D type; + scanned =3D scan_folios(lruvec, sc, type, tier, list); if (scanned) - break; + return scanned; =20 type =3D !type; tier =3D -1; } =20 - *type_scanned =3D type; - - return scanned; + return 0; } =20 static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, in= t swappiness) @@ -4588,6 +4604,7 @@ static int evict_folios(struct lruvec *lruvec, struct= scan_control *sc, int swap struct reclaim_stat stat; struct lru_gen_mm_walk *walk; bool skip_retry =3D false; + struct lru_gen_folio 
*lrugen =3D &lruvec->lrugen; struct mem_cgroup *memcg =3D lruvec_memcg(lruvec); struct pglist_data *pgdat =3D lruvec_pgdat(lruvec); =20 @@ -4597,7 +4614,7 @@ static int evict_folios(struct lruvec *lruvec, struct= scan_control *sc, int swap =20 scanned +=3D try_to_inc_min_seq(lruvec, swappiness); =20 - if (get_nr_gens(lruvec, !swappiness) =3D=3D MIN_NR_GENS) + if (evictable_min_seq(lrugen->min_seq, swappiness) + MIN_NR_GENS > lrugen= ->max_seq) scanned =3D 0; =20 spin_unlock_irq(&lruvec->lru_lock); @@ -4669,63 +4686,32 @@ static int evict_folios(struct lruvec *lruvec, stru= ct scan_control *sc, int swap } =20 static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq, - bool can_swap, unsigned long *nr_to_scan) + int swappiness, unsigned long *nr_to_scan) { int gen, type, zone; - unsigned long old =3D 0; - unsigned long young =3D 0; - unsigned long total =3D 0; + unsigned long size =3D 0; struct lru_gen_folio *lrugen =3D &lruvec->lrugen; DEFINE_MIN_SEQ(lruvec); =20 - /* whether this lruvec is completely out of cold folios */ - if (min_seq[!can_swap] + MIN_NR_GENS > max_seq) { - *nr_to_scan =3D 0; + *nr_to_scan =3D 0; + /* have to run aging, since eviction is not possible anymore */ + if (evictable_min_seq(min_seq, swappiness) + MIN_NR_GENS > max_seq) return true; - } =20 - for (type =3D !can_swap; type < ANON_AND_FILE; type++) { + for_each_evictable_type(type, swappiness) { unsigned long seq; =20 for (seq =3D min_seq[type]; seq <=3D max_seq; seq++) { - unsigned long size =3D 0; - gen =3D lru_gen_from_seq(seq); =20 for (zone =3D 0; zone < MAX_NR_ZONES; zone++) size +=3D max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L); - - total +=3D size; - if (seq =3D=3D max_seq) - young +=3D size; - else if (seq + MIN_NR_GENS =3D=3D max_seq) - old +=3D size; } } =20 - *nr_to_scan =3D total; - - /* - * The aging tries to be lazy to reduce the overhead, while the eviction - * stalls when the number of generations reaches MIN_NR_GENS. 
Hence, the - * ideal number of generations is MIN_NR_GENS+1. - */ - if (min_seq[!can_swap] + MIN_NR_GENS < max_seq) - return false; - - /* - * It's also ideal to spread pages out evenly, i.e., 1/(MIN_NR_GENS+1) - * of the total number of pages for each generation. A reasonable range - * for this average portion is [1/MIN_NR_GENS, 1/(MIN_NR_GENS+2)]. The - * aging cares about the upper bound of hot pages, while the eviction - * cares about the lower bound of cold pages. - */ - if (young * MIN_NR_GENS > total) - return true; - if (old * (MIN_NR_GENS + 2) < total) - return true; - - return false; + *nr_to_scan =3D size; + /* better to run aging even though eviction is still possible */ + return evictable_min_seq(min_seq, swappiness) + MIN_NR_GENS =3D=3D max_se= q; } =20 /* @@ -4733,7 +4719,7 @@ static bool should_run_aging(struct lruvec *lruvec, u= nsigned long max_seq, * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg * reclaim. */ -static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,= bool can_swap) +static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,= int swappiness) { bool success; unsigned long nr_to_scan; @@ -4743,7 +4729,7 @@ static long get_nr_to_scan(struct lruvec *lruvec, str= uct scan_control *sc, bool if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg)) return -1; =20 - success =3D should_run_aging(lruvec, max_seq, can_swap, &nr_to_scan); + success =3D should_run_aging(lruvec, max_seq, swappiness, &nr_to_scan); =20 /* try to scrape all its memory if this memcg was deleted */ if (nr_to_scan && !mem_cgroup_online(memcg)) @@ -4754,7 +4740,7 @@ static long get_nr_to_scan(struct lruvec *lruvec, str= uct scan_control *sc, bool return nr_to_scan >> sc->priority; =20 /* stop scanning this lruvec as it's low on cold folios */ - return try_to_inc_max_seq(lruvec, max_seq, can_swap, false) ? -1 : 0; + return try_to_inc_max_seq(lruvec, max_seq, swappiness, false) ? 
-1 : 0; } =20 static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *= sc) @@ -5298,8 +5284,7 @@ static void lru_gen_seq_show_full(struct seq_file *m,= struct lruvec *lruvec, s =3D "rep"; n[0] =3D atomic_long_read(&lrugen->refaulted[hist][type][tier]); n[1] =3D atomic_long_read(&lrugen->evicted[hist][type][tier]); - if (tier) - n[2] =3D READ_ONCE(lrugen->protected[hist][type][tier - 1]); + n[2] =3D READ_ONCE(lrugen->protected[hist][type][tier]); } =20 for (i =3D 0; i < 3; i++) @@ -5354,7 +5339,7 @@ static int lru_gen_seq_show(struct seq_file *m, void = *v) seq_printf(m, " node %5d\n", nid); =20 if (!full) - seq =3D min_seq[LRU_GEN_ANON]; + seq =3D evictable_min_seq(min_seq, MAX_SWAPPINESS / 2); else if (max_seq >=3D MAX_NR_GENS) seq =3D max_seq - MAX_NR_GENS + 1; else @@ -5394,23 +5379,14 @@ static const struct seq_operations lru_gen_seq_ops = =3D { }; =20 static int run_aging(struct lruvec *lruvec, unsigned long seq, - bool can_swap, bool force_scan) + int swappiness, bool force_scan) { DEFINE_MAX_SEQ(lruvec); - DEFINE_MIN_SEQ(lruvec); - - if (seq < max_seq) - return 0; =20 if (seq > max_seq) return -EINVAL; =20 - if (!force_scan && min_seq[!can_swap] + MAX_NR_GENS - 1 <=3D max_seq) - return -ERANGE; - - try_to_inc_max_seq(lruvec, max_seq, can_swap, force_scan); - - return 0; + return try_to_inc_max_seq(lruvec, max_seq, swappiness, force_scan) ? 0 : = -EEXIST; } =20 static int run_eviction(struct lruvec *lruvec, unsigned long seq, struct s= can_control *sc, @@ -5426,7 +5402,7 @@ static int run_eviction(struct lruvec *lruvec, unsign= ed long seq, struct scan_co while (!signal_pending(current)) { DEFINE_MIN_SEQ(lruvec); =20 - if (seq < min_seq[!swappiness]) + if (seq < evictable_min_seq(min_seq, swappiness)) return 0; =20 if (sc->nr_reclaimed >=3D nr_to_reclaim) --=20 2.47.1.613.gc27f4b7a9f-goog
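
A note on the for_each_evictable_type() macro added to mm/vmscan.c by
this patch: its bounds depend on swappiness in a way that is easy to
misread. The following userspace sketch copies the macro and records
which types the loop visits. MAX_SWAPPINESS=200 is an assumed value,
and sketch_evictable_types() is a hypothetical helper that exists only
for this illustration.

```c
/* Assumed value, for illustration only. */
#define MAX_SWAPPINESS	200

enum { LRU_GEN_ANON, LRU_GEN_FILE, ANON_AND_FILE };

/* Same shape as the macro in the patch. */
#define for_each_evictable_type(type, swappiness)			\
	for ((type) = !(swappiness);					\
	     (type) <= ((swappiness) != MAX_SWAPPINESS);		\
	     (type)++)

/*
 * Hypothetical helper: returns a bitmask of the types visited,
 * bit 0 for anon and bit 1 for file.
 */
static unsigned int sketch_evictable_types(int swappiness)
{
	int type;
	unsigned int mask = 0;

	for_each_evictable_type(type, swappiness)
		mask |= 1U << type;

	return mask;
}
```

With swappiness 0 the loop starts and ends at LRU_GEN_FILE, with
MAX_SWAPPINESS it starts and ends at LRU_GEN_ANON, and any value in
between visits both types.
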