From: "T.J. Alumbaugh"
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-mm@google.com
Date: Wed, 18 Jan 2023 00:18:21 +0000
Message-ID: <20230118001827.1040870-2-talumbau@google.com>
In-Reply-To: <20230118001827.1040870-1-talumbau@google.com>
Subject: [PATCH mm-unstable v1 1/7] mm: multi-gen LRU: section for working set protection

Add a section for working set protection in the code and the design
doc. The admin doc already covers its usage.

Signed-off-by: T.J. Alumbaugh
---
 Documentation/mm/multigen_lru.rst | 15 +++++++++++++++
 mm/vmscan.c                       |  4 ++++
 2 files changed, 19 insertions(+)

diff --git a/Documentation/mm/multigen_lru.rst b/Documentation/mm/multigen_lru.rst
index d8f721f98868..6e1483e70fdc 100644
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -141,6 +141,21 @@ loop has detected outlying refaults from the tier this page is in. To
 this end, the feedback loop uses the first tier as the baseline, for
 the reason stated earlier.
 
+Working set protection
+----------------------
+Each generation is timestamped at birth. If ``lru_gen_min_ttl`` is
+set, an ``lruvec`` is protected from eviction when its oldest
+generation was born within ``lru_gen_min_ttl`` milliseconds. In other
+words, it prevents the working set of ``lru_gen_min_ttl`` milliseconds
+from getting evicted. The OOM killer is triggered if this working set
+cannot be kept in memory.
+
+This time-based approach has the following advantages:
+
+1. It is easier to configure because it is agnostic to applications
+   and memory sizes.
+2. It is more reliable because it is directly wired to the OOM killer.
+
 Summary
 -------
 The multi-gen LRU can be disassembled into the following parts:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 394ff4962cbc..a741765896b6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4475,6 +4475,10 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
 	return true;
 }
 
+/******************************************************************************
+ *                          working set protection
+ ******************************************************************************/
+
 static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
 {
 	int gen, type, zone;
--
2.39.0.314.g84b9a713c41-goog
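
The knob this section documents is exposed at /sys/kernel/mm/lru_gen/min_ttl_ms
(covered by the admin guide the commit message refers to). As a minimal
illustration of the usage, not part of the patch, a user-space sketch that asks
the kernel to protect roughly the last second of the working set (requires
root; error handling kept deliberately thin):

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		const char *path = "/sys/kernel/mm/lru_gen/min_ttl_ms";
		const char *ttl = "1000";	/* milliseconds */
		int fd = open(path, O_WRONLY);

		if (fd < 0) {
			perror("open");
			return 1;
		}
		/* lruvecs whose oldest generation is younger than this are kept */
		if (write(fd, ttl, strlen(ttl)) < 0)
			perror("write");
		close(fd);
		return 0;
	}
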
From: "T.J. Alumbaugh"
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-mm@google.com
Date: Wed, 18 Jan 2023 00:18:22 +0000
Message-ID: <20230118001827.1040870-3-talumbau@google.com>
In-Reply-To: <20230118001827.1040870-1-talumbau@google.com>
Subject: [PATCH mm-unstable v1 2/7] mm: multi-gen LRU: section for rmap/PT walk feedback

Add a section for lru_gen_look_around() in the code and the design
doc.

Signed-off-by: T.J. Alumbaugh
---
 Documentation/mm/multigen_lru.rst | 14 ++++++++++++++
 mm/vmscan.c                       |  4 ++++
 2 files changed, 18 insertions(+)

diff --git a/Documentation/mm/multigen_lru.rst b/Documentation/mm/multigen_lru.rst
index 6e1483e70fdc..bd988a142bc2 100644
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -156,6 +156,20 @@ This time-based approach has the following advantages:
    and memory sizes.
 2. It is more reliable because it is directly wired to the OOM killer.
 
+Rmap/PT walk feedback
+---------------------
+Searching the rmap for PTEs mapping each page on an LRU list (to test
+and clear the accessed bit) can be expensive because pages from
+different VMAs (PA space) are not cache friendly to the rmap (VA
+space). For workloads mostly using mapped pages, searching the rmap
+can incur the highest CPU cost in the reclaim path.
+
+``lru_gen_look_around()`` exploits spatial locality to reduce the
+trips into the rmap. It scans the adjacent PTEs of a young PTE and
+promotes hot pages. If the scan was done cacheline efficiently, it
+adds the PMD entry pointing to the PTE table to the Bloom filter. This
+forms a feedback loop between the eviction and the aging.
+
 Summary
 -------
 The multi-gen LRU can be disassembled into the following parts:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a741765896b6..eb9263bf6806 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4569,6 +4569,10 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
 	}
 }
 
+/******************************************************************************
+ *                          rmap/PT walk feedback
+ ******************************************************************************/
+
 /*
  * This function exploits spatial locality when shrink_folio_list() walks the
  * rmap. It scans the adjacent PTEs of a young PTE and promotes hot pages. If
--
2.39.0.314.g84b9a713c41-goog
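
To make the feedback loop concrete, a toy user-space model of the mechanism
described above, with a plain boolean array standing in for the Bloom filter
and region indices standing in for PMD entries (all names are illustrative,
not kernel APIs):

	#include <stdbool.h>
	#include <stdio.h>

	#define NR_REGIONS 8

	static bool hinted[NR_REGIONS];	/* stand-in for the Bloom filter */

	/* eviction side: lru_gen_look_around() saw young entries near here */
	static void eviction_saw_young(int region)
	{
		hinted[region] = true;
	}

	/* aging side: walk only the regions the eviction flagged */
	static void aging_walk(void)
	{
		for (int r = 0; r < NR_REGIONS; r++) {
			if (!hinted[r])
				continue;
			printf("aging: rescanning region %d for young entries\n", r);
			hinted[r] = false;	/* one generation's worth of hints */
		}
	}

	int main(void)
	{
		eviction_saw_young(2);
		eviction_saw_young(5);
		aging_walk();
		return 0;
	}
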
From: "T.J. Alumbaugh"
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-mm@google.com
Date: Wed, 18 Jan 2023 00:18:23 +0000
Message-ID: <20230118001827.1040870-4-talumbau@google.com>
In-Reply-To: <20230118001827.1040870-1-talumbau@google.com>
Subject: [PATCH mm-unstable v1 3/7] mm: multi-gen LRU: section for Bloom filters

Move the Bloom filter code into a dedicated section. Improve the
design doc to explain how Bloom filters are used and how they connect
the aging and the eviction.

Signed-off-by: T.J. Alumbaugh
---
 Documentation/mm/multigen_lru.rst |  16 +++
 mm/vmscan.c                       | 180 +++++++++++++++---------------
 2 files changed, 108 insertions(+), 88 deletions(-)

diff --git a/Documentation/mm/multigen_lru.rst b/Documentation/mm/multigen_lru.rst
index bd988a142bc2..770b5d539856 100644
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -170,6 +170,22 @@ promotes hot pages. If the scan was done cacheline efficiently, it
 adds the PMD entry pointing to the PTE table to the Bloom filter. This
 forms a feedback loop between the eviction and the aging.
 
+Bloom Filters
+-------------
+Bloom filters are a space- and memory-efficient data structure for set
+membership tests, i.e., to test whether an element is not in the set
+or may be in the set.
+
+In the eviction path, specifically, in ``lru_gen_look_around()``, if a
+PMD has a sufficient number of hot pages, its address is placed in the
+filter. In the aging path, set membership means that the PTE range
+will be scanned for young pages.
+
+Note that Bloom filters are probabilistic on set membership. If a test
+is false positive, the cost is an additional scan of a range of PTEs,
+which may yield hot pages anyway. Parameters of the filter itself can
+control the false positive rate in the limit.
+
 Summary
 -------
 The multi-gen LRU can be disassembled into the following parts:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index eb9263bf6806..1be9120349f8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3233,6 +3233,98 @@ static bool __maybe_unused seq_is_valid(struct lruvec *lruvec)
 	       get_nr_gens(lruvec, LRU_GEN_ANON) <= MAX_NR_GENS;
 }
 
+/******************************************************************************
+ *                          Bloom filters
+ ******************************************************************************/
+
+/*
+ * Bloom filters with m=1<<15, k=2 and the false positive rates of ~1/5 when
+ * n=10,000 and ~1/2 when n=20,000, where, conventionally, m is the number of
+ * bits in a bitmap, k is the number of hash functions and n is the number of
+ * inserted items.
+ *
+ * Page table walkers use one of the two filters to reduce their search space.
+ * To get rid of non-leaf entries that no longer have enough leaf entries, the
+ * aging uses the double-buffering technique to flip to the other filter each
+ * time it produces a new generation. For non-leaf entries that have enough
+ * leaf entries, the aging carries them over to the next generation in
+ * walk_pmd_range(); the eviction also reports them when walking the rmap
+ * in lru_gen_look_around().
+ *
+ * For future optimizations:
+ * 1. It's not necessary to keep both filters all the time. The spare one can be
+ *    freed after the RCU grace period and reallocated if needed again.
+ * 2. And when reallocating, it's worth scaling its size according to the number
+ *    of inserted entries in the other filter, to reduce the memory overhead on
+ *    small systems and false positives on large systems.
+ * 3. Jenkins' hash function is an alternative to Knuth's.
+ */
+#define BLOOM_FILTER_SHIFT	15
+
+static inline int filter_gen_from_seq(unsigned long seq)
+{
+	return seq % NR_BLOOM_FILTERS;
+}
+
+static void get_item_key(void *item, int *key)
+{
+	u32 hash = hash_ptr(item, BLOOM_FILTER_SHIFT * 2);
+
+	BUILD_BUG_ON(BLOOM_FILTER_SHIFT * 2 > BITS_PER_TYPE(u32));
+
+	key[0] = hash & (BIT(BLOOM_FILTER_SHIFT) - 1);
+	key[1] = hash >> BLOOM_FILTER_SHIFT;
+}
+
+static bool test_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
+{
+	int key[2];
+	unsigned long *filter;
+	int gen = filter_gen_from_seq(seq);
+
+	filter = READ_ONCE(lruvec->mm_state.filters[gen]);
+	if (!filter)
+		return true;
+
+	get_item_key(item, key);
+
+	return test_bit(key[0], filter) && test_bit(key[1], filter);
+}
+
+static void update_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
+{
+	int key[2];
+	unsigned long *filter;
+	int gen = filter_gen_from_seq(seq);
+
+	filter = READ_ONCE(lruvec->mm_state.filters[gen]);
+	if (!filter)
+		return;
+
+	get_item_key(item, key);
+
+	if (!test_bit(key[0], filter))
+		set_bit(key[0], filter);
+	if (!test_bit(key[1], filter))
+		set_bit(key[1], filter);
+}
+
+static void reset_bloom_filter(struct lruvec *lruvec, unsigned long seq)
+{
+	unsigned long *filter;
+	int gen = filter_gen_from_seq(seq);
+
+	filter = lruvec->mm_state.filters[gen];
+	if (filter) {
+		bitmap_clear(filter, 0, BIT(BLOOM_FILTER_SHIFT));
+		return;
+	}
+
+	filter = bitmap_zalloc(BIT(BLOOM_FILTER_SHIFT),
+			       __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN);
+	WRITE_ONCE(lruvec->mm_state.filters[gen], filter);
+}
+
 /******************************************************************************
  *                          mm_struct list
  ******************************************************************************/
@@ -3352,94 +3444,6 @@ void lru_gen_migrate_mm(struct mm_struct *mm)
 }
 #endif
 
-/*
- * Bloom filters with m=1<<15, k=2 and the false positive rates of ~1/5 when
- * n=10,000 and ~1/2 when n=20,000, where, conventionally, m is the number of
- * bits in a bitmap, k is the number of hash functions and n is the number of
- * inserted items.
- *
- * Page table walkers use one of the two filters to reduce their search space.
- * To get rid of non-leaf entries that no longer have enough leaf entries, the
- * aging uses the double-buffering technique to flip to the other filter each
- * time it produces a new generation. For non-leaf entries that have enough
- * leaf entries, the aging carries them over to the next generation in
- * walk_pmd_range(); the eviction also report them when walking the rmap
- * in lru_gen_look_around().
- *
- * For future optimizations:
- * 1. It's not necessary to keep both filters all the time. The spare one can be
-  *    freed after the RCU grace period and reallocated if needed again.
- * 2. And when reallocating, it's worth scaling its size according to the number
-  *    of inserted entries in the other filter, to reduce the memory overhead on
-  *    small systems and false positives on large systems.
- * 3. Jenkins' hash function is an alternative to Knuth's.
- */
-#define BLOOM_FILTER_SHIFT	15
-
-static inline int filter_gen_from_seq(unsigned long seq)
-{
-	return seq % NR_BLOOM_FILTERS;
-}
-
-static void get_item_key(void *item, int *key)
-{
-	u32 hash = hash_ptr(item, BLOOM_FILTER_SHIFT * 2);
-
-	BUILD_BUG_ON(BLOOM_FILTER_SHIFT * 2 > BITS_PER_TYPE(u32));
-
-	key[0] = hash & (BIT(BLOOM_FILTER_SHIFT) - 1);
-	key[1] = hash >> BLOOM_FILTER_SHIFT;
-}
-
-static void reset_bloom_filter(struct lruvec *lruvec, unsigned long seq)
-{
-	unsigned long *filter;
-	int gen = filter_gen_from_seq(seq);
-
-	filter = lruvec->mm_state.filters[gen];
-	if (filter) {
-		bitmap_clear(filter, 0, BIT(BLOOM_FILTER_SHIFT));
-		return;
-	}
-
-	filter = bitmap_zalloc(BIT(BLOOM_FILTER_SHIFT),
-			       __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN);
-	WRITE_ONCE(lruvec->mm_state.filters[gen], filter);
-}
-
-static void update_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
-{
-	int key[2];
-	unsigned long *filter;
-	int gen = filter_gen_from_seq(seq);
-
-	filter = READ_ONCE(lruvec->mm_state.filters[gen]);
-	if (!filter)
-		return;
-
-	get_item_key(item, key);
-
-	if (!test_bit(key[0], filter))
-		set_bit(key[0], filter);
-	if (!test_bit(key[1], filter))
-		set_bit(key[1], filter);
-}
-
-static bool test_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
-{
-	int key[2];
-	unsigned long *filter;
-	int gen = filter_gen_from_seq(seq);
-
-	filter = READ_ONCE(lruvec->mm_state.filters[gen]);
-	if (!filter)
-		return true;
-
-	get_item_key(item, key);
-
-	return test_bit(key[0], filter) && test_bit(key[1], filter);
-}
-
 static void reset_mm_stats(struct lruvec *lruvec, struct lru_gen_mm_walk *walk, bool last)
 {
 	int i;
--
2.39.0.314.g84b9a713c41-goog
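
A self-contained user-space rendering of the same construction, with the
kernel's parameters (m = 1 << 15 bits, k = 2 keys carved out of one 30-bit
hash, exactly as get_item_key() does); hash30() uses Knuth's multiplicative
constant as a stand-in for the kernel's hash_ptr():

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	#define BLOOM_FILTER_SHIFT	15
	#define BLOOM_NR_BITS		(1UL << BLOOM_FILTER_SHIFT)
	#define BITS_PER_LONG		(8 * sizeof(unsigned long))

	static unsigned long filter[BLOOM_NR_BITS / BITS_PER_LONG];

	/* stand-in for hash_ptr(item, BLOOM_FILTER_SHIFT * 2): 30 hash bits */
	static uint32_t hash30(const void *item)
	{
		return ((uint32_t)(uintptr_t)item * 2654435761u) >> 2;
	}

	static void get_item_key(const void *item, int *key)
	{
		uint32_t hash = hash30(item);

		key[0] = hash & (BLOOM_NR_BITS - 1);
		key[1] = hash >> BLOOM_FILTER_SHIFT;
	}

	static void update_bloom_filter(const void *item)
	{
		int key[2];

		get_item_key(item, key);
		filter[key[0] / BITS_PER_LONG] |= 1UL << (key[0] % BITS_PER_LONG);
		filter[key[1] / BITS_PER_LONG] |= 1UL << (key[1] % BITS_PER_LONG);
	}

	/* false means definitely absent; true means possibly present */
	static bool test_bloom_filter(const void *item)
	{
		int key[2];

		get_item_key(item, key);
		return (filter[key[0] / BITS_PER_LONG] >> (key[0] % BITS_PER_LONG) & 1) &&
		       (filter[key[1] / BITS_PER_LONG] >> (key[1] % BITS_PER_LONG) & 1);
	}

	int main(void)
	{
		int x;

		update_bloom_filter(&x);
		printf("x: %d, y: %d\n", test_bloom_filter(&x), test_bloom_filter("y"));
		return 0;
	}
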
From: "T.J. Alumbaugh"
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-mm@google.com
Date: Wed, 18 Jan 2023 00:18:24 +0000
Message-ID: <20230118001827.1040870-5-talumbau@google.com>
In-Reply-To: <20230118001827.1040870-1-talumbau@google.com>
Subject: [PATCH mm-unstable v1 4/7] mm: multi-gen LRU: section for memcg LRU

Move the memcg LRU code into a dedicated section. Improve the design
doc to outline its architecture.

Signed-off-by: T.J. Alumbaugh
---
 Documentation/mm/multigen_lru.rst |  33 +++-
 include/linux/mm_inline.h         |  17 --
 include/linux/mmzone.h            |  13 +-
 mm/memcontrol.c                   |   8 +-
 mm/vmscan.c                       | 250 +++++++++++++++++-------------
 5 files changed, 178 insertions(+), 143 deletions(-)

diff --git a/Documentation/mm/multigen_lru.rst b/Documentation/mm/multigen_lru.rst
index 770b5d539856..5f1f6ecbb79b 100644
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -186,9 +186,40 @@ is false positive, the cost is an additional scan of a range of PTEs,
 which may yield hot pages anyway. Parameters of the filter itself can
 control the false positive rate in the limit.
 
+Memcg LRU
+---------
+A memcg LRU is a per-node LRU of memcgs. It is also an LRU of LRUs,
+since each node and memcg combination has an LRU of folios (see
+``mem_cgroup_lruvec()``). Its goal is to improve the scalability of
+global reclaim, which is critical to system-wide memory overcommit in
+data centers. Note that the memcg LRU only applies to global reclaim.
+
+The basic structure of a memcg LRU can be understood by an analogy to
+the active/inactive LRU (of folios):
+
+1. It has the young and the old (generations), i.e., the counterparts
+   to the active and the inactive;
+2. The increment of ``max_seq`` triggers promotion, i.e., the
+   counterpart to activation;
+3. Other events trigger similar operations, e.g., offlining a memcg
+   triggers demotion, i.e., the counterpart to deactivation.
+
+In terms of global reclaim, it has two distinct features:
+
+1. Sharding, which allows each thread to start at a random memcg (in
+   the old generation) and improves parallelism;
+2. Eventual fairness, which allows direct reclaim to bail out at will
+   and reduces latency without affecting fairness over some time.
+
+In terms of traversing memcgs during global reclaim, it improves the
+best-case complexity from O(n) to O(1) and does not affect the
+worst-case complexity O(n). Therefore, on average, it has a sublinear
+complexity.
+
 Summary
 -------
-The multi-gen LRU can be disassembled into the following parts:
+The multi-gen LRU (of folios) can be disassembled into the following
+parts:
 
 * Generations
 * Rmap walks
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 26dcbda07e92..de1e622dd366 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -122,18 +122,6 @@ static inline bool lru_gen_in_fault(void)
 	return current->in_lru_fault;
 }
 
-#ifdef CONFIG_MEMCG
-static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
-{
-	return READ_ONCE(lruvec->lrugen.seg);
-}
-#else
-static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
-{
-	return 0;
-}
-#endif
-
 static inline int lru_gen_from_seq(unsigned long seq)
 {
 	return seq % MAX_NR_GENS;
@@ -309,11 +297,6 @@ static inline bool lru_gen_in_fault(void)
 	return false;
 }
 
-static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
-{
-	return 0;
-}
-
 static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
 {
 	return false;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ae7d4e92c12d..c54964979ccf 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -368,15 +368,6 @@ struct page_vma_mapped_walk;
 #define LRU_GEN_MASK	((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF)
 #define LRU_REFS_MASK	((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
 
-/* see the comment on MEMCG_NR_GENS */
-enum {
-	MEMCG_LRU_NOP,
-	MEMCG_LRU_HEAD,
-	MEMCG_LRU_TAIL,
-	MEMCG_LRU_OLD,
-	MEMCG_LRU_YOUNG,
-};
-
 #ifdef CONFIG_LRU_GEN
 
 enum {
@@ -557,7 +548,7 @@ void lru_gen_exit_memcg(struct mem_cgroup *memcg);
 void lru_gen_online_memcg(struct mem_cgroup *memcg);
 void lru_gen_offline_memcg(struct mem_cgroup *memcg);
 void lru_gen_release_memcg(struct mem_cgroup *memcg);
-void lru_gen_rotate_memcg(struct lruvec *lruvec, int op);
+void lru_gen_soft_reclaim(struct lruvec *lruvec);
 
 #else /* !CONFIG_MEMCG */
 
@@ -608,7 +599,7 @@ static inline void lru_gen_release_memcg(struct mem_cgroup *memcg)
 {
 }
 
-static inline void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
+static inline void lru_gen_soft_reclaim(struct lruvec *lruvec)
 {
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 893427aded01..17335459d8dc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -476,12 +476,8 @@ static void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid)
 	struct mem_cgroup_tree_per_node *mctz;
 
 	if (lru_gen_enabled()) {
-		struct lruvec *lruvec = &memcg->nodeinfo[nid]->lruvec;
-
-		/* see the comment on MEMCG_NR_GENS */
-		if (soft_limit_excess(memcg) && lru_gen_memcg_seg(lruvec) != MEMCG_LRU_HEAD)
-			lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD);
-
+		if (soft_limit_excess(memcg))
+			lru_gen_soft_reclaim(&memcg->nodeinfo[nid]->lruvec);
 		return;
 	}
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1be9120349f8..796d4ca65e97 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4705,6 +4705,148 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	mem_cgroup_unlock_pages();
 }
 
+/******************************************************************************
+ *                          memcg LRU
+ ******************************************************************************/
+
+/* see the comment on MEMCG_NR_GENS */
+enum {
+	MEMCG_LRU_NOP,
+	MEMCG_LRU_HEAD,
+	MEMCG_LRU_TAIL,
+	MEMCG_LRU_OLD,
+	MEMCG_LRU_YOUNG,
+};
+
+#ifdef CONFIG_MEMCG
+
+static int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+	return READ_ONCE(lruvec->lrugen.seg);
+}
+
+static void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
+{
+	int seg;
+	int old, new;
+	int bin = get_random_u32_below(MEMCG_NR_BINS);
+	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+
+	spin_lock(&pgdat->memcg_lru.lock);
+
+	VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+	seg = 0;
+	new = old = lruvec->lrugen.gen;
+
+	/* see the comment on MEMCG_NR_GENS */
+	if (op == MEMCG_LRU_HEAD)
+		seg = MEMCG_LRU_HEAD;
+	else if (op == MEMCG_LRU_TAIL)
+		seg = MEMCG_LRU_TAIL;
+	else if (op == MEMCG_LRU_OLD)
+		new = get_memcg_gen(pgdat->memcg_lru.seq);
+	else if (op == MEMCG_LRU_YOUNG)
+		new = get_memcg_gen(pgdat->memcg_lru.seq + 1);
+	else
+		VM_WARN_ON_ONCE(true);
+
+	hlist_nulls_del_rcu(&lruvec->lrugen.list);
+
+	if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
+		hlist_nulls_add_head_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
+	else
+		hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
+
+	pgdat->memcg_lru.nr_memcgs[old]--;
+	pgdat->memcg_lru.nr_memcgs[new]++;
+
+	lruvec->lrugen.gen = new;
+	WRITE_ONCE(lruvec->lrugen.seg, seg);
+
+	if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq))
+		WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
+
+	spin_unlock(&pgdat->memcg_lru.lock);
+}
+
+void lru_gen_online_memcg(struct mem_cgroup *memcg)
+{
+	int gen;
+	int nid;
+	int bin = get_random_u32_below(MEMCG_NR_BINS);
+
+	for_each_node(nid) {
+		struct pglist_data *pgdat = NODE_DATA(nid);
+		struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+		spin_lock(&pgdat->memcg_lru.lock);
+
+		VM_WARN_ON_ONCE(!hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+		gen = get_memcg_gen(pgdat->memcg_lru.seq);
+
+		hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]);
+		pgdat->memcg_lru.nr_memcgs[gen]++;
+
+		lruvec->lrugen.gen = gen;
+
+		spin_unlock(&pgdat->memcg_lru.lock);
+	}
+}
+
+void lru_gen_offline_memcg(struct mem_cgroup *memcg)
+{
+	int nid;
+
+	for_each_node(nid) {
+		struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+		lru_gen_rotate_memcg(lruvec, MEMCG_LRU_OLD);
+	}
+}
+
+void lru_gen_release_memcg(struct mem_cgroup *memcg)
+{
+	int gen;
+	int nid;
+
+	for_each_node(nid) {
+		struct pglist_data *pgdat = NODE_DATA(nid);
+		struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+		spin_lock(&pgdat->memcg_lru.lock);
+
+		VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+		gen = lruvec->lrugen.gen;
+
+		hlist_nulls_del_rcu(&lruvec->lrugen.list);
+		pgdat->memcg_lru.nr_memcgs[gen]--;
+
+		if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq))
+			WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
+
+		spin_unlock(&pgdat->memcg_lru.lock);
+	}
+}
+
+void lru_gen_soft_reclaim(struct lruvec *lruvec)
+{
+	/* see the comment on MEMCG_NR_GENS */
+	if (lru_gen_memcg_seg(lruvec) != MEMCG_LRU_HEAD)
+		lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD);
+}
+
+#else /* !CONFIG_MEMCG */
+
+static int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+	return 0;
+}
+
+#endif
+
 /******************************************************************************
  *                          the eviction
  ******************************************************************************/
@@ -5397,53 +5539,6 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *
 	pgdat->kswapd_failures = 0;
 }
 
-#ifdef CONFIG_MEMCG
-void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
-{
-	int seg;
-	int old, new;
-	int bin = get_random_u32_below(MEMCG_NR_BINS);
-	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
-
-	spin_lock(&pgdat->memcg_lru.lock);
-
-	VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
-
-	seg = 0;
-	new = old = lruvec->lrugen.gen;
-
-	/* see the comment on MEMCG_NR_GENS */
-	if (op == MEMCG_LRU_HEAD)
-		seg = MEMCG_LRU_HEAD;
-	else if (op == MEMCG_LRU_TAIL)
-		seg = MEMCG_LRU_TAIL;
-	else if (op == MEMCG_LRU_OLD)
-		new = get_memcg_gen(pgdat->memcg_lru.seq);
-	else if (op == MEMCG_LRU_YOUNG)
-		new = get_memcg_gen(pgdat->memcg_lru.seq + 1);
-	else
-		VM_WARN_ON_ONCE(true);
-
-	hlist_nulls_del_rcu(&lruvec->lrugen.list);
-
-	if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
-		hlist_nulls_add_head_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
-	else
-		hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
-
-	pgdat->memcg_lru.nr_memcgs[old]--;
-	pgdat->memcg_lru.nr_memcgs[new]++;
-
-	lruvec->lrugen.gen = new;
-	WRITE_ONCE(lruvec->lrugen.seg, seg);
-
-	if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq))
-		WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
-
-	spin_unlock(&pgdat->memcg_lru.lock);
-}
-#endif
-
 /******************************************************************************
  *                          state change
  ******************************************************************************/
@@ -6086,67 +6181,6 @@ void lru_gen_exit_memcg(struct mem_cgroup *memcg)
 	}
 }
 
-void lru_gen_online_memcg(struct mem_cgroup *memcg)
-{
-	int gen;
-	int nid;
-	int bin = get_random_u32_below(MEMCG_NR_BINS);
-
-	for_each_node(nid) {
-		struct pglist_data *pgdat = NODE_DATA(nid);
-		struct lruvec *lruvec = get_lruvec(memcg, nid);
-
-		spin_lock(&pgdat->memcg_lru.lock);
-
-		VM_WARN_ON_ONCE(!hlist_nulls_unhashed(&lruvec->lrugen.list));
-
-		gen = get_memcg_gen(pgdat->memcg_lru.seq);
-
-		hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]);
-		pgdat->memcg_lru.nr_memcgs[gen]++;
-
-		lruvec->lrugen.gen = gen;
-
-		spin_unlock(&pgdat->memcg_lru.lock);
-	}
-}
-
-void lru_gen_offline_memcg(struct mem_cgroup *memcg)
-{
-	int nid;
-
-	for_each_node(nid) {
-		struct lruvec *lruvec = get_lruvec(memcg, nid);
-
-		lru_gen_rotate_memcg(lruvec, MEMCG_LRU_OLD);
-	}
-}
-
-void lru_gen_release_memcg(struct mem_cgroup *memcg)
-{
-	int gen;
-	int nid;
-
-	for_each_node(nid) {
-		struct pglist_data *pgdat = NODE_DATA(nid);
-		struct lruvec *lruvec = get_lruvec(memcg, nid);
-
-		spin_lock(&pgdat->memcg_lru.lock);
-
-		VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
-
-		gen = lruvec->lrugen.gen;
-
-		hlist_nulls_del_rcu(&lruvec->lrugen.list);
-		pgdat->memcg_lru.nr_memcgs[gen]--;
-
-		if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq))
-			WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
-
-		spin_unlock(&pgdat->memcg_lru.lock);
-	}
-}
-
 #endif /* CONFIG_MEMCG */
 
 static int __init init_lru_gen(void)
--
2.39.0.314.g84b9a713c41-goog
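
For intuition, a toy model of the seq-based ring the new section implements:
generations are indexed by seq modulo the ring size, MEMCG_LRU_YOUNG promotes
a memcg to seq + 1, MEMCG_LRU_OLD demotes it to seq, and the ring advances
once the old generation drains. This is a sketch only; bins, the nulls lists,
and locking are elided relative to the kernel:

	#include <stdio.h>

	#define MEMCG_NR_GENS	2	/* young and old, as in the design doc */

	enum { MEMCG_LRU_OLD, MEMCG_LRU_YOUNG };

	struct node {
		unsigned long seq;
		int nr_memcgs[MEMCG_NR_GENS];
	};

	static int gen_of(unsigned long seq)	/* get_memcg_gen() analogue */
	{
		return seq % MEMCG_NR_GENS;
	}

	static void rotate(struct node *node, int *memcg_gen, int op)
	{
		int old = *memcg_gen;
		int new = op == MEMCG_LRU_YOUNG ? gen_of(node->seq + 1)
						: gen_of(node->seq);

		node->nr_memcgs[old]--;
		node->nr_memcgs[new]++;
		*memcg_gen = new;

		/* once the old generation drains, close it and move on */
		if (!node->nr_memcgs[old] && old == gen_of(node->seq))
			node->seq++;
	}

	int main(void)
	{
		struct node node = { .seq = 0 };
		int a = gen_of(node.seq), b = gen_of(node.seq);

		node.nr_memcgs[a] = 2;	/* two memcgs start in the old generation */

		rotate(&node, &a, MEMCG_LRU_YOUNG);
		rotate(&node, &b, MEMCG_LRU_YOUNG);	/* old drains: seq advances */

		printf("seq=%lu, old gen is now %d\n", node.seq, gen_of(node.seq));
		return 0;
	}
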
From: "T.J. Alumbaugh"
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-mm@google.com
Date: Wed, 18 Jan 2023 00:18:25 +0000
Message-ID: <20230118001827.1040870-6-talumbau@google.com>
In-Reply-To: <20230118001827.1040870-1-talumbau@google.com>
Subject: [PATCH mm-unstable v1 5/7] mm: multi-gen LRU: improve lru_gen_exit_memcg()

Add warnings and poison ->next.

Signed-off-by: T.J. Alumbaugh
---
 mm/vmscan.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 796d4ca65e97..c2e6ad53447b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6168,12 +6168,17 @@ void lru_gen_exit_memcg(struct mem_cgroup *memcg)
 	int i;
 	int nid;
 
+	VM_WARN_ON_ONCE(!list_empty(&memcg->mm_list.fifo));
+
 	for_each_node(nid) {
 		struct lruvec *lruvec = get_lruvec(memcg, nid);
 
+		VM_WARN_ON_ONCE(lruvec->mm_state.nr_walkers);
 		VM_WARN_ON_ONCE(memchr_inv(lruvec->lrugen.nr_pages, 0,
 					   sizeof(lruvec->lrugen.nr_pages)));
 
+		lruvec->lrugen.list.next = LIST_POISON1;
+
 		for (i = 0; i < NR_BLOOM_FILTERS; i++) {
 			bitmap_free(lruvec->mm_state.filters[i]);
 			lruvec->mm_state.filters[i] = NULL;
--
2.39.0.314.g84b9a713c41-goog
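
The poisoning idiom the patch applies, in freestanding form: after the final
unlink, pointing ->next at a non-dereferenceable sentinel turns any stale
traversal into an immediate, debuggable fault instead of silent corruption.
LIST_POISON1 is the kernel's name for this constant; the value below is merely
illustrative:

	#include <stdio.h>

	#define LIST_POISON1 ((void *)0x100)	/* non-dereferenceable sentinel */

	struct item {
		struct item *next;
	};

	static void exit_item(struct item *item)
	{
		/* ...final teardown: item must no longer be reachable... */
		item->next = LIST_POISON1;	/* trap any late traversal */
	}

	int main(void)
	{
		struct item it = { .next = NULL };

		exit_item(&it);
		printf("poisoned next: %p\n", it.next);
		/* dereferencing it.next now would fault at the bug, not later */
		return 0;
	}
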
From: "T.J. Alumbaugh"
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-mm@google.com
Date: Wed, 18 Jan 2023 00:18:26 +0000
Message-ID: <20230118001827.1040870-7-talumbau@google.com>
In-Reply-To: <20230118001827.1040870-1-talumbau@google.com>
Subject: [PATCH mm-unstable v1 6/7] mm: multi-gen LRU: improve walk_pmd_range()

Improve readability of walk_pmd_range() and walk_pmd_range_locked().

Signed-off-by: T.J. Alumbaugh
---
 mm/vmscan.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c2e6ad53447b..ff3b4aa3c31f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3999,8 +3999,8 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 }
 
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG)
-static void walk_pmd_range_locked(pud_t *pud, unsigned long next, struct vm_area_struct *vma,
-				  struct mm_walk *args, unsigned long *bitmap, unsigned long *start)
+static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area_struct *vma,
+				  struct mm_walk *args, unsigned long *bitmap, unsigned long *first)
 {
 	int i;
 	pmd_t *pmd;
@@ -4013,18 +4013,19 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long next, struct vm_area
 	VM_WARN_ON_ONCE(pud_leaf(*pud));
 
 	/* try to batch at most 1+MIN_LRU_BATCH+1 entries */
-	if (*start == -1) {
-		*start = next;
+	if (*first == -1) {
+		*first = addr;
+		bitmap_zero(bitmap, MIN_LRU_BATCH);
 		return;
 	}
 
-	i = next == -1 ? 0 : pmd_index(next) - pmd_index(*start);
+	i = addr == -1 ? 0 : pmd_index(addr) - pmd_index(*first);
 	if (i && i <= MIN_LRU_BATCH) {
 		__set_bit(i - 1, bitmap);
 		return;
 	}
 
-	pmd = pmd_offset(pud, *start);
+	pmd = pmd_offset(pud, *first);
 
 	ptl = pmd_lockptr(args->mm, pmd);
 	if (!spin_trylock(ptl))
@@ -4035,15 +4036,16 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long next, struct vm_area
 	do {
 		unsigned long pfn;
 		struct folio *folio;
-		unsigned long addr = i ? (*start & PMD_MASK) + i * PMD_SIZE : *start;
+
+		/* don't round down the first address */
+		addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first;
 
 		pfn = get_pmd_pfn(pmd[i], vma, addr);
 		if (pfn == -1)
 			goto next;
 
 		if (!pmd_trans_huge(pmd[i])) {
-			if (arch_has_hw_nonleaf_pmd_young() &&
-			    get_cap(LRU_GEN_NONLEAF_YOUNG))
+			if (arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG))
 				pmdp_test_and_clear_young(vma, addr, pmd + i);
 			goto next;
 		}
@@ -4072,12 +4074,11 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long next, struct vm_area
 	arch_leave_lazy_mmu_mode();
 	spin_unlock(ptl);
 done:
-	*start = -1;
-	bitmap_zero(bitmap, MIN_LRU_BATCH);
+	*first = -1;
 }
 #else
-static void walk_pmd_range_locked(pud_t *pud, unsigned long next, struct vm_area_struct *vma,
-				  struct mm_walk *args, unsigned long *bitmap, unsigned long *start)
+static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area_struct *vma,
+				  struct mm_walk *args, unsigned long *bitmap, unsigned long *first)
 {
 }
 #endif
@@ -4090,9 +4091,9 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 	unsigned long next;
 	unsigned long addr;
 	struct vm_area_struct *vma;
-	unsigned long pos = -1;
+	unsigned long bitmap[BITS_TO_LONGS(MIN_LRU_BATCH)];
+	unsigned long first = -1;
 	struct lru_gen_mm_walk *walk = args->private;
-	unsigned long bitmap[BITS_TO_LONGS(MIN_LRU_BATCH)] = {};
 
 	VM_WARN_ON_ONCE(pud_leaf(*pud));
 
@@ -4131,18 +4132,17 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 		if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
 			continue;
 
-		walk_pmd_range_locked(pud, addr, vma, args, bitmap, &pos);
+		walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first);
 		continue;
 	}
 #endif
 	walk->mm_stats[MM_NONLEAF_TOTAL]++;
 
-	if (arch_has_hw_nonleaf_pmd_young() &&
-	    get_cap(LRU_GEN_NONLEAF_YOUNG)) {
+	if (arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG)) {
 		if (!pmd_young(val))
 			continue;
 
-		walk_pmd_range_locked(pud, addr, vma, args, bitmap, &pos);
+		walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first);
 	}
 
 	if (!walk->force_scan && !test_bloom_filter(walk->lruvec, walk->max_seq, pmd + i))
@@ -4159,7 +4159,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 		update_bloom_filter(walk->lruvec, walk->max_seq + 1, pmd + i);
 	}
 
-	walk_pmd_range_locked(pud, -1, vma, args, bitmap, &pos);
+	walk_pmd_range_locked(pud, -1, vma, args, bitmap, &first);
 
 	if (i < PTRS_PER_PMD && get_next_vma(PUD_MASK, PMD_SIZE, args, &start, &end))
 		goto restart;
--
2.39.0.314.g84b9a713c41-goog
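
The batching scheme walk_pmd_range_locked() implements, reduced to a
user-space toy: the first address is kept verbatim (hence the new "don't
round down the first address" comment), later addresses are recorded as bit
offsets relative to it, and a sentinel of -1 flushes the whole batch in one
pass. The slot size and the process() step are stand-ins for PMD granularity
and the locked scan:

	#include <stdint.h>
	#include <stdio.h>

	#define BATCH		64		/* MIN_LRU_BATCH analogue */
	#define SLOT_SIZE	0x1000UL	/* stand-in for PMD granularity */

	static unsigned long first = -1UL;
	static uint64_t bitmap;

	static void process(unsigned long addr)
	{
		printf("processing entry at %#lx\n", addr);
	}

	static void flush(void)
	{
		if (first == -1UL)
			return;
		process(first);		/* the unrounded first address */
		for (int i = 0; i < BATCH; i++)
			if (bitmap >> i & 1)
				process((first & ~(SLOT_SIZE - 1)) + (i + 1) * SLOT_SIZE);
		first = -1UL;
	}

	static void batch_add(unsigned long addr)
	{
		unsigned long i;

		if (addr == -1UL) {	/* sentinel: drain, as the walker's last call */
			flush();
			return;
		}
		if (first == -1UL) {	/* start a new batch */
			first = addr;
			bitmap = 0;
			return;
		}
		i = addr / SLOT_SIZE - first / SLOT_SIZE;	/* pmd_index() diff analogue */
		if (i >= 1 && i <= BATCH) {
			bitmap |= 1ULL << (i - 1);
			return;
		}
		flush();		/* out of range: drain, then restart */
		batch_add(addr);
	}

	int main(void)
	{
		batch_add(0x10008);	/* first entry, offset kept as-is */
		batch_add(0x12000);
		batch_add(0x15000);
		batch_add(-1UL);	/* flush */
		return 0;
	}
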
From: "T.J. Alumbaugh"
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-mm@google.com
Date: Wed, 18 Jan 2023 00:18:27 +0000
Message-ID: <20230118001827.1040870-8-talumbau@google.com>
In-Reply-To: <20230118001827.1040870-1-talumbau@google.com>
Subject: [PATCH mm-unstable v1 7/7] mm: multi-gen LRU: simplify lru_gen_look_around()

Update the folio generation in place with or without
current->reclaim_state->mm_walk. The LRU lock is held for longer if
mm_walk is NULL and the number of folios to update is more than
PAGEVEC_SIZE. This causes a measurable regression from the LRU lock
contention during a microbenchmark. But a tiny regression is not worth
the complexity.

Signed-off-by: T.J. Alumbaugh
---
 mm/vmscan.c | 73 +++++++++++++++++------------------------------------
 1 file changed, 23 insertions(+), 50 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index ff3b4aa3c31f..ac51150d2d36 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4587,13 +4587,12 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
 void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 {
 	int i;
-	pte_t *pte;
 	unsigned long start;
 	unsigned long end;
-	unsigned long addr;
 	struct lru_gen_mm_walk *walk;
 	int young = 0;
-	unsigned long bitmap[BITS_TO_LONGS(MIN_LRU_BATCH)] = {};
+	pte_t *pte = pvmw->pte;
+	unsigned long addr = pvmw->address;
 	struct folio *folio = pfn_folio(pvmw->pfn);
 	struct mem_cgroup *memcg = folio_memcg(folio);
 	struct pglist_data *pgdat = folio_pgdat(folio);
@@ -4610,25 +4609,28 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	/* avoid taking the LRU lock under the PTL when possible */
 	walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL;
 
-	start = max(pvmw->address & PMD_MASK, pvmw->vma->vm_start);
-	end = min(pvmw->address | ~PMD_MASK, pvmw->vma->vm_end - 1) + 1;
+	start = max(addr & PMD_MASK, pvmw->vma->vm_start);
+	end = min(addr | ~PMD_MASK, pvmw->vma->vm_end - 1) + 1;
 
 	if (end - start > MIN_LRU_BATCH * PAGE_SIZE) {
-		if (pvmw->address - start < MIN_LRU_BATCH * PAGE_SIZE / 2)
+		if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2)
 			end = start + MIN_LRU_BATCH * PAGE_SIZE;
-		else if (end - pvmw->address < MIN_LRU_BATCH * PAGE_SIZE / 2)
+		else if (end - addr < MIN_LRU_BATCH * PAGE_SIZE / 2)
 			start = end - MIN_LRU_BATCH * PAGE_SIZE;
 		else {
-			start = pvmw->address - MIN_LRU_BATCH * PAGE_SIZE / 2;
-			end = pvmw->address + MIN_LRU_BATCH * PAGE_SIZE / 2;
+			start = addr - MIN_LRU_BATCH * PAGE_SIZE / 2;
+			end = addr + MIN_LRU_BATCH * PAGE_SIZE / 2;
 		}
 	}
 
-	pte = pvmw->pte - (pvmw->address - start) / PAGE_SIZE;
+	/* folio_update_gen() requires stable folio_memcg() */
+	if (!mem_cgroup_trylock_pages(memcg))
+		return;
 
-	rcu_read_lock();
 	arch_enter_lazy_mmu_mode();
 
+	pte -= (addr - start) / PAGE_SIZE;
+
 	for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) {
 		unsigned long pfn;
 
@@ -4653,56 +4655,27 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 		      !folio_test_swapcache(folio)))
 			folio_mark_dirty(folio);
 
+		if (walk) {
+			old_gen = folio_update_gen(folio, new_gen);
+			if (old_gen >= 0 && old_gen != new_gen)
+				update_batch_size(walk, folio, old_gen, new_gen);
+
+			continue;
+		}
+
 		old_gen = folio_lru_gen(folio);
 		if (old_gen < 0)
 			folio_set_referenced(folio);
 		else if (old_gen != new_gen)
-			__set_bit(i, bitmap);
+			folio_activate(folio);
 	}
 
 	arch_leave_lazy_mmu_mode();
-	rcu_read_unlock();
+	mem_cgroup_unlock_pages();
 
 	/* feedback from rmap walkers to page table walkers */
 	if (suitable_to_scan(i, young))
 		update_bloom_filter(lruvec, max_seq, pvmw->pmd);
-
-	if (!walk && bitmap_weight(bitmap, MIN_LRU_BATCH) < PAGEVEC_SIZE) {
-		for_each_set_bit(i, bitmap, MIN_LRU_BATCH) {
-			folio = pfn_folio(pte_pfn(pte[i]));
-			folio_activate(folio);
-		}
-		return;
-	}
-
-	/* folio_update_gen() requires stable folio_memcg() */
-	if (!mem_cgroup_trylock_pages(memcg))
-		return;
-
-	if (!walk) {
-		spin_lock_irq(&lruvec->lru_lock);
-		new_gen = lru_gen_from_seq(lruvec->lrugen.max_seq);
-	}
-
-	for_each_set_bit(i, bitmap, MIN_LRU_BATCH) {
-		folio = pfn_folio(pte_pfn(pte[i]));
-		if (folio_memcg_rcu(folio) != memcg)
-			continue;
-
-		old_gen = folio_update_gen(folio, new_gen);
-		if (old_gen < 0 || old_gen == new_gen)
-			continue;
-
-		if (walk)
-			update_batch_size(walk, folio, old_gen, new_gen);
-		else
-			lru_gen_update_size(lruvec, folio, old_gen, new_gen);
-	}
-
-	if (!walk)
-		spin_unlock_irq(&lruvec->lru_lock);
-
-	mem_cgroup_unlock_pages();
 }
 
 /******************************************************************************
  *                          memcg LRU
  ******************************************************************************/
--
2.39.0.314.g84b9a713c41-goog
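
The resulting control flow, reduced to a toy: a single pass over the folios,
updating generations in place, batching through the walk context when one
exists and activating immediately when it does not. Names mirror the
kernel's, but the types are simplified to keep the sketch self-contained:

	#include <stdio.h>

	struct folio { int gen; };
	struct walk  { int batched; };

	/* single pass, no deferred bitmap: the simplification of this patch */
	static void look_around(struct folio *folios, int n, struct walk *walk,
				int new_gen)
	{
		for (int i = 0; i < n; i++) {
			int old_gen = folios[i].gen;

			if (old_gen == new_gen)
				continue;
			folios[i].gen = new_gen;	/* folio_update_gen() analogue */
			if (walk)
				walk->batched++;	/* update_batch_size() analogue */
			else
				printf("activate folio %d\n", i);	/* folio_activate() */
		}
	}

	int main(void)
	{
		struct folio folios[4] = { {0}, {1}, {0}, {1} };
		struct walk walk = { 0 };

		look_around(folios, 4, &walk, 1);	/* batched path */
		printf("batched %d updates\n", walk.batched);

		look_around(folios, 4, NULL, 2);	/* immediate-activation path */
		return 0;
	}
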