From nobody Sun Sep 7 14:37:46 2025
Date: Thu, 20 Jul 2023 07:08:18 +0000
In-Reply-To: <20230720070825.992023-1-yosryahmed@google.com>
References: <20230720070825.992023-1-yosryahmed@google.com>
Message-ID: <20230720070825.992023-2-yosryahmed@google.com>
Subject: [RFC PATCH 1/8] memcg: refactor updating memcg->moving_account
From: Yosry Ahmed
To: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Shakeel Butt
Cc: Muchun Song, "Matthew Wilcox (Oracle)", Tejun Heo, Zefan Li, Yu Zhao,
	Luis Chamberlain, Kees Cook, Iurii Zaikin, "T.J. Mercier", Greg Thelen,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, Yosry Ahmed

memcg->moving_account is used to signal that a memcg move is taking
place, so that folio_memcg_lock() would start acquiring the per-memcg
move lock instead of just initiating an RCU read section.

Refactor incrementing and decrementing memcg->moving_account, together
with the RCU synchronization and the elaborate comment, into helpers to
allow for reuse by incoming patches.

Signed-off-by: Yosry Ahmed
---
 mm/memcontrol.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e8ca4bdcb03c..ffdb848f4003 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6305,16 +6305,26 @@ static const struct mm_walk_ops charge_walk_ops = {
 	.pmd_entry	= mem_cgroup_move_charge_pte_range,
 };
 
-static void mem_cgroup_move_charge(void)
+static void mem_cgroup_start_move_charge(struct mem_cgroup *memcg)
 {
-	lru_add_drain_all();
 	/*
 	 * Signal folio_memcg_lock() to take the memcg's move_lock
 	 * while we're moving its pages to another memcg. Then wait
 	 * for already started RCU-only updates to finish.
 	 */
-	atomic_inc(&mc.from->moving_account);
+	atomic_inc(&memcg->moving_account);
 	synchronize_rcu();
+}
+
+static void mem_cgroup_end_move_charge(struct mem_cgroup *memcg)
+{
+	atomic_dec(&memcg->moving_account);
+}
+
+static void mem_cgroup_move_charge(void)
+{
+	lru_add_drain_all();
+	mem_cgroup_start_move_charge(mc.from);
 retry:
 	if (unlikely(!mmap_read_trylock(mc.mm))) {
 		/*
@@ -6334,7 +6344,7 @@ static void mem_cgroup_move_charge(void)
 	 */
 	walk_page_range(mc.mm, 0, ULONG_MAX, &charge_walk_ops, NULL);
 	mmap_read_unlock(mc.mm);
-	atomic_dec(&mc.from->moving_account);
+	mem_cgroup_end_move_charge(mc.from);
 }
 
 static void mem_cgroup_move_task(void)
-- 
2.41.0.255.g8b1d071c50-goog
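For context, a usage sketch of the new pair (a hypothetical caller, not
part of the patch; any real user must bracket the move exactly as
mem_cgroup_move_charge() does):

	static void example_move_charges(struct mem_cgroup *memcg)
	{
		/* Bumps moving_account and waits for RCU-only updates. */
		mem_cgroup_start_move_charge(memcg);
		/*
		 * From here on, folio_memcg_lock() takes the per-memcg
		 * move_lock instead of only entering an RCU read section.
		 */
		/* ... move charges ... */
		mem_cgroup_end_move_charge(memcg);	/* drops moving_account */
	}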
From nobody Sun Sep 7 14:37:46 2025
Date: Thu, 20 Jul 2023 07:08:19 +0000
In-Reply-To: <20230720070825.992023-1-yosryahmed@google.com>
References: <20230720070825.992023-1-yosryahmed@google.com>
Message-ID: <20230720070825.992023-3-yosryahmed@google.com>
Subject: [RFC PATCH 2/8] mm: vmscan: add lruvec_for_each_list() helper
From: Yosry Ahmed
To: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Shakeel Butt
Cc: Muchun Song, "Matthew Wilcox (Oracle)", Tejun Heo, Zefan Li, Yu Zhao,
	Luis Chamberlain, Kees Cook, Iurii Zaikin, "T.J. Mercier", Greg Thelen,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, Yosry Ahmed

This helper is used to provide a callback to be called for each lruvec
list. This abstracts different lruvec implementations (MGLRU vs.
classic LRUs). The helper is used by a following commit to iterate all
folios in all LRU lists for memcg recharging.

Signed-off-by: Yosry Ahmed
---
 include/linux/swap.h |  8 ++++++++
 mm/vmscan.c          | 28 ++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 456546443f1f..c0621deceb03 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -406,6 +406,14 @@ extern void lru_cache_add_inactive_or_unevictable(struct page *page,
 						struct vm_area_struct *vma);
 
 /* linux/mm/vmscan.c */
+typedef bool (*lruvec_list_fn_t)(struct lruvec *lruvec,
+				 struct list_head *list,
+				 enum lru_list lru,
+				 void *arg);
+extern void lruvec_for_each_list(struct lruvec *lruvec,
+				 lruvec_list_fn_t fn,
+				 void *arg);
+
 extern unsigned long zone_reclaimable_pages(struct zone *zone);
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 					gfp_t gfp_mask, nodemask_t *mask);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1080209a568b..e7956000a3b6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6254,6 +6254,34 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *
 
 #endif /* CONFIG_LRU_GEN */
 
+/*
+ * lruvec_for_each_list - make a callback for every folio list in the lruvec
+ * @lruvec: the lruvec to iterate lists in
+ * @fn: the callback to make for each list, iteration stops if it returns true
+ * @arg: argument to pass to @fn
+ */
+void lruvec_for_each_list(struct lruvec *lruvec, lruvec_list_fn_t fn, void *arg)
+{
+	enum lru_list lru;
+
+#ifdef CONFIG_LRU_GEN
+	if (lru_gen_enabled()) {
+		int gen, type, zone;
+
+		for_each_gen_type_zone(gen, type, zone) {
+			lru = type * LRU_INACTIVE_FILE;
+			if (fn(lruvec, &lruvec->lrugen.folios[gen][type][zone],
+			       lru, arg))
+				break;
+		}
+	} else
+#endif
+		for_each_evictable_lru(lru) {
+			if (fn(lruvec, &lruvec->lists[lru], lru, arg))
+				break;
+		}
+}
+
 static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 {
 	unsigned long nr[NR_LRU_LISTS];
-- 
2.41.0.255.g8b1d071c50-goog
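To illustrate the iterator's contract, a hypothetical callback that
tallies folios on each list (count_list_folios() is not part of the
patch; a real caller must hold lruvec->lru_lock, or otherwise pin the
lists, for the walk to be stable):

	static bool count_list_folios(struct lruvec *lruvec,
				      struct list_head *list,
				      enum lru_list lru, void *arg)
	{
		unsigned long *nr = arg;
		struct folio *folio;

		list_for_each_entry(folio, list, lru)
			*nr += folio_nr_pages(folio);

		/* returning true would stop the iteration early */
		return false;
	}

	/* usage: lruvec_for_each_list(lruvec, count_list_folios, &nr); */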
From nobody Sun Sep 7 14:37:46 2025
Date: Thu, 20 Jul 2023 07:08:20 +0000
In-Reply-To: <20230720070825.992023-1-yosryahmed@google.com>
References: <20230720070825.992023-1-yosryahmed@google.com>
Message-ID: <20230720070825.992023-4-yosryahmed@google.com>
Subject: [RFC PATCH 3/8] memcg: recharge mapped folios when a memcg is offlined
From: Yosry Ahmed
To: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Shakeel Butt
Cc: Muchun Song, "Matthew Wilcox (Oracle)", Tejun Heo, Zefan Li, Yu Zhao,
	Luis Chamberlain, Kees Cook, Iurii Zaikin, "T.J. Mercier", Greg Thelen,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, Yosry Ahmed

When a cgroup is removed by userspace and its memcg goes offline,
attempt to recharge the pages charged to the memcg to other memcgs that
are actually using the folios. Recharging is done in an asynchronous
worker as it is an expensive operation and needs to acquire multiple
locks.

Recharge targets are identified by walking the rmap and checking the
memcgs of the processes mapping the folio, if any. We avoid an OOM kill
if we fail to charge the folio, to avoid inducing an OOM kill at a
seemingly random point in time in the target memcg.

If we fail for any reason (e.g. could not isolate all folios, could not
lock the folio, target memcg reached its limit, etc.), we reschedule
the worker to retry.
Signed-off-by: Yosry Ahmed
---
 include/linux/memcontrol.h |   6 +
 mm/memcontrol.c            | 230 +++++++++++++++++++++++++++++++++++++
 2 files changed, 236 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5818af8eca5a..b41d69685ead 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -326,6 +326,12 @@ struct mem_cgroup {
 	struct lru_gen_mm_list mm_list;
 #endif
 
+	/* async recharge of mapped folios for offline memcgs */
+	struct {
+		struct delayed_work dwork;
+		int retries;
+	} recharge_mapped_work;
+
 	struct mem_cgroup_per_node *nodeinfo[];
 };
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ffdb848f4003..a46bc8f000c8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -96,6 +96,8 @@ static bool cgroup_memory_nobpf __ro_after_init;
 static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
 #endif
 
+static struct workqueue_struct *memcg_recharge_wq;
+
 /* Whether legacy memory+swap accounting is active */
 static bool do_memsw_account(void)
 {
@@ -5427,6 +5429,8 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	return -ENOMEM;
 }
 
+static void memcg_recharge_mapped_folios(struct mem_cgroup *memcg);
+
 static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
@@ -5455,6 +5459,8 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	drain_all_stock(memcg);
 
 	mem_cgroup_id_put(memcg);
+
+	memcg_recharge_mapped_folios(memcg);
 }
 
 static void mem_cgroup_css_released(struct cgroup_subsys_state *css)
@@ -5487,6 +5493,7 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
 
 	vmpressure_cleanup(&memcg->vmpressure);
 	cancel_work_sync(&memcg->high_work);
+	cancel_delayed_work_sync(&memcg->recharge_mapped_work.dwork);
 	mem_cgroup_remove_from_trees(memcg);
 	free_shrinker_info(memcg);
 	mem_cgroup_free(memcg);
@@ -6367,6 +6374,219 @@ static void mem_cgroup_move_task(void)
 }
 #endif
 
+/* Returns true if recharging is successful */
+static bool mem_cgroup_recharge_folio(struct folio *folio,
+				      struct mem_cgroup *new_memcg)
+{
+	struct mem_cgroup *old_memcg = folio_memcg(folio);
+	gfp_t gfp = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
+	long nr_pages = folio_nr_pages(folio);
+	int err = -1;
+
+	if (!new_memcg)
+		goto out;
+
+	err = try_charge(new_memcg, gfp, nr_pages);
+	if (err)
+		goto out;
+
+	err = mem_cgroup_move_account(&folio->page, folio_test_large(folio),
+				      old_memcg, new_memcg);
+	cancel_charge(err ? new_memcg : old_memcg, nr_pages);
+out:
+	return err == 0;
+}
+
+struct folio_memcg_rmap_recharge_arg {
+	bool recharged;
+};
+
+static bool folio_memcg_rmap_recharge_one(struct folio *folio,
+		struct vm_area_struct *vma, unsigned long address, void *arg)
+{
+	struct folio_memcg_rmap_recharge_arg *recharge_arg = arg;
+	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
+	struct mem_cgroup *memcg;
+
+	/*
+	 * page_vma_mapped_walk() is only needed to grab any pte lock to
+	 * serialize with page_remove_rmap(), as folio_mapped() must remain
+	 * stable during the move.
+	 */
+	recharge_arg->recharged = false;
+	while (page_vma_mapped_walk(&pvmw)) {
+		memcg = get_mem_cgroup_from_mm(vma->vm_mm);
+		if (mem_cgroup_recharge_folio(folio, memcg))
+			recharge_arg->recharged = true;
+		mem_cgroup_put(memcg);
+		page_vma_mapped_walk_done(&pvmw);
+		break;
+	}
+
+	/* stop the rmap walk if we were successful */
+	return !recharge_arg->recharged;
+}
+
+/* Returns true if recharging is successful */
+static bool folio_memcg_rmap_recharge(struct folio *folio)
+{
+	struct folio_memcg_rmap_recharge_arg arg = { .recharged = false };
+	struct rmap_walk_control rwc = {
+		.rmap_one = folio_memcg_rmap_recharge_one,
+		.arg = (void *)&arg,
+		.anon_lock = folio_lock_anon_vma_read,
+		.try_lock = true,
+	};
+
+	rmap_walk(folio, &rwc);
+	return arg.recharged;
+}
+
+static unsigned long lruvec_nr_local_mapped_pages(struct lruvec *lruvec)
+{
+	return lruvec_page_state_local(lruvec, NR_ANON_MAPPED) +
+		lruvec_page_state_local(lruvec, NR_FILE_MAPPED);
+}
+
+static unsigned long memcg_nr_local_mapped_pages(struct mem_cgroup *memcg)
+{
+	return memcg_page_state_local(memcg, NR_ANON_MAPPED) +
+		memcg_page_state_local(memcg, NR_FILE_MAPPED);
+}
+
+static bool memcg_recharge_lruvec_list(struct lruvec *lruvec,
+				       struct list_head *list,
+				       enum lru_list lru,
+				       void *arg)
+{
+	int isolated_idx = NR_ISOLATED_ANON + is_file_lru(lru);
+	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+	unsigned long *nr_recharged = arg;
+	unsigned long nr_staged = 0;
+	LIST_HEAD(folios_skipped);
+	LIST_HEAD(folios_staged);
+	struct folio *folio;
+
+	/* Are we done with mapped pages on this node? */
+	if (!lruvec_nr_local_mapped_pages(lruvec))
+		return true;
+
+	/*
+	 * Iterating the LRUs here is tricky, because we
+	 * usually cannot iterate the entire list with the
+	 * lock held, and the LRU can change once we release it.
+	 *
+	 * What we try to do is isolate as many folios as we can
+	 * without hogging the lock or the cpu. We need to move
+	 * all the folios we iterate off the list to try to
+	 * avoid re-visiting them on retries.
+	 *
+	 * The folios we are interested in are moved to
+	 * @folios_staged, and other folios are moved to
+	 * @folios_skipped. Before releasing the lock, we splice
+	 * @folios_skipped back into the beginning of the LRU.
+	 * This essentially rotates the LRU, similar to reclaim,
+	 * as lru_to_folio() fetches folios from the end of the
+	 * LRU.
+	 */
+	spin_lock_irq(&lruvec->lru_lock);
+	while (!list_empty(list) && !need_resched() &&
+	       !spin_is_contended(&lruvec->lru_lock)) {
+		folio = lru_to_folio(list);
+		if (!folio_mapped(folio)) {
+			list_move(&folio->lru, &folios_skipped);
+			continue;
+		}
+
+		if (unlikely(!folio_try_get(folio))) {
+			list_move(&folio->lru, &folios_skipped);
+			continue;
+		}
+
+		if (!folio_test_clear_lru(folio)) {
+			list_move(&folio->lru, &folios_skipped);
+			folio_put(folio);
+			continue;
+		}
+
+		lruvec_del_folio(lruvec, folio);
+		list_add(&folio->lru, &folios_staged);
+		nr_staged += folio_nr_pages(folio);
+	}
+	list_splice(&folios_skipped, list);
+	spin_unlock_irq(&lruvec->lru_lock);
+
+	mod_lruvec_state(lruvec, isolated_idx, nr_staged);
+	mem_cgroup_start_move_charge(memcg);
+	while (!list_empty(&folios_staged)) {
+		folio = lru_to_folio(&folios_staged);
+		list_del(&folio->lru);
+
+		if (!folio_trylock(folio)) {
+			folio_putback_lru(folio);
+			continue;
+		}
+
+		if (folio_memcg_rmap_recharge(folio))
+			*nr_recharged += folio_nr_pages(folio);
+
+		folio_unlock(folio);
+		folio_putback_lru(folio);
+		cond_resched();
+	}
+	mem_cgroup_end_move_charge(memcg);
+	mod_lruvec_state(lruvec, isolated_idx, -nr_staged);
+	return false;
+}
+
+static void memcg_do_recharge_mapped_folios(struct work_struct *work)
+{
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct mem_cgroup *memcg = container_of(dwork, struct mem_cgroup,
+						recharge_mapped_work.dwork);
+	unsigned long nr_recharged = 0;
+	struct lruvec *lruvec;
+	int nid;
+
+	lru_add_drain_all();
+	for_each_node_state(nid, N_MEMORY) {
+		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(nid));
+		lruvec_for_each_list(lruvec, memcg_recharge_lruvec_list,
+				     &nr_recharged);
+	}
+
+	/* Are we done with all mapped folios? */
+	if (!memcg_nr_local_mapped_pages(memcg))
+		return;
+
+	/* Did we make progress? reset retries */
+	if (nr_recharged > 0)
+		memcg->recharge_mapped_work.retries = 0;
+
+	/* Exponentially increase delay before each retry (from 1ms to ~32s) */
+	if (memcg->recharge_mapped_work.retries < MAX_RECLAIM_RETRIES)
+		queue_delayed_work(memcg_recharge_wq,
+				   &memcg->recharge_mapped_work.dwork,
+				   1 << memcg->recharge_mapped_work.retries++);
+}
+
+static void memcg_recharge_mapped_folios(struct mem_cgroup *memcg)
+{
+	/*
+	 * We need to initialize the work here, even if we are not going to
+	 * queue it, such that cancel_delayed_work_sync() in
+	 * mem_cgroup_css_free() does not complain.
+	 */
+	INIT_DELAYED_WORK(&memcg->recharge_mapped_work.dwork,
+			  memcg_do_recharge_mapped_folios);
+
+	if (memcg_recharge_wq && memcg_nr_local_mapped_pages(memcg)) {
+		memcg->recharge_mapped_work.retries = 0;
+		queue_delayed_work(memcg_recharge_wq,
+				   &memcg->recharge_mapped_work.dwork, 0);
+	}
+}
+
 #ifdef CONFIG_LRU_GEN
 static void mem_cgroup_attach(struct cgroup_taskset *tset)
 {
@@ -7904,3 +8124,13 @@ static int __init mem_cgroup_swap_init(void)
 subsys_initcall(mem_cgroup_swap_init);
 
 #endif /* CONFIG_SWAP */
+
+static int __init memcg_recharge_wq_init(void)
+{
+	if (mem_cgroup_disabled())
+		return 0;
+	memcg_recharge_wq = alloc_workqueue("memcg_recharge", WQ_UNBOUND, 0);
+	WARN_ON(!memcg_recharge_wq);
+	return 0;
+}
+subsys_initcall(memcg_recharge_wq_init);
-- 
2.41.0.255.g8b1d071c50-goog
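The retry delay above is in jiffies, so the "1ms to ~32s" comment
assumes HZ=1000. A standalone userspace sketch of the same backoff
series (MAX_RECLAIM_RETRIES is 16 in mm/internal.h):

	#include <stdio.h>

	#define MAX_RECLAIM_RETRIES 16	/* mirrors mm/internal.h */

	int main(void)
	{
		int retries;

		/* delays double each retry: 1, 2, 4, ... 32768 jiffies */
		for (retries = 0; retries < MAX_RECLAIM_RETRIES; retries++)
			printf("retry %2d: %5d jiffies (~%d ms at HZ=1000)\n",
			       retries, 1 << retries, 1 << retries);
		return 0;
	}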
From nobody Sun Sep 7 14:37:46 2025
Date: Thu, 20 Jul 2023 07:08:21 +0000
In-Reply-To: <20230720070825.992023-1-yosryahmed@google.com>
References: <20230720070825.992023-1-yosryahmed@google.com>
Message-ID: <20230720070825.992023-5-yosryahmed@google.com>
Subject: [RFC PATCH 4/8] memcg: support deferred memcg recharging
From: Yosry Ahmed
To: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Shakeel Butt
Cc: Muchun Song, "Matthew Wilcox (Oracle)", Tejun Heo, Zefan Li, Yu Zhao,
	Luis Chamberlain, Kees Cook, Iurii Zaikin, "T.J. Mercier", Greg Thelen,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, Yosry Ahmed

The previous patch added support for memcg recharging for mapped
folios when a memcg goes offline. For unmapped folios, it is not
straightforward to find a rightful memcg to recharge the folio to.
Hence, add support for deferred recharging.

Deferred recharging provides a hook that can be added in data access
paths: folio_memcg_deferred_recharge(). It will check if the memcg
that the folio is charged to is offline. If so, it will queue an
asynchronous worker to attempt to recharge the folio to the memcg of
the process accessing the folio.

An asynchronous worker is used for 2 reasons:
(a) Avoid expensive operations on the data access path.
(b) Acquiring some locks (e.g. folio lock, lruvec lock) is not safe to
    do from all contexts.

Deferred recharging will not cause an OOM kill in the target memcg. If
recharging fails for any reason, the worker reschedules itself to
retry, unless the folio is freed or the target memcg goes offline.
Signed-off-by: Yosry Ahmed
---
 include/linux/memcontrol.h |   6 ++
 mm/memcontrol.c            | 125 +++++++++++++++++++++++++++++++++++--
 2 files changed, 126 insertions(+), 5 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b41d69685ead..59b653d4a76e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -956,6 +956,8 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
 void folio_memcg_lock(struct folio *folio);
 void folio_memcg_unlock(struct folio *folio);
 
+void folio_memcg_deferred_recharge(struct folio *folio);
+
 void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val);
 
 /* try to stablize folio_memcg() for all the pages in a memcg */
@@ -1461,6 +1463,10 @@ static inline void mem_cgroup_unlock_pages(void)
 	rcu_read_unlock();
 }
 
+static inline void folio_memcg_deferred_recharge(struct folio *folio)
+{
+}
+
 static inline void mem_cgroup_handle_over_high(void)
 {
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a46bc8f000c8..cf9fb51ecfcc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6398,6 +6398,7 @@ static bool mem_cgroup_recharge_folio(struct folio *folio,
 }
 
 struct folio_memcg_rmap_recharge_arg {
+	struct mem_cgroup *memcg;
 	bool recharged;
 };
 
@@ -6415,10 +6416,12 @@ static bool folio_memcg_rmap_recharge_one(struct folio *folio,
 	 */
 	recharge_arg->recharged = false;
 	while (page_vma_mapped_walk(&pvmw)) {
-		memcg = get_mem_cgroup_from_mm(vma->vm_mm);
+		memcg = recharge_arg->memcg ?:
+			get_mem_cgroup_from_mm(vma->vm_mm);
 		if (mem_cgroup_recharge_folio(folio, memcg))
 			recharge_arg->recharged = true;
-		mem_cgroup_put(memcg);
+		if (!recharge_arg->memcg)
+			mem_cgroup_put(memcg);
 		page_vma_mapped_walk_done(&pvmw);
 		break;
 	}
@@ -6428,9 +6431,13 @@ static bool folio_memcg_rmap_recharge_one(struct folio *folio,
 }
 
 /* Returns true if recharging is successful */
-static bool folio_memcg_rmap_recharge(struct folio *folio)
+static bool folio_memcg_rmap_recharge(struct folio *folio,
+				      struct mem_cgroup *memcg)
 {
-	struct folio_memcg_rmap_recharge_arg arg = { .recharged = false };
+	struct folio_memcg_rmap_recharge_arg arg = {
+		.recharged = false,
+		.memcg = memcg,
+	};
 	struct rmap_walk_control rwc = {
 		.rmap_one = folio_memcg_rmap_recharge_one,
 		.arg = (void *)&arg,
@@ -6527,7 +6534,7 @@ static bool memcg_recharge_lruvec_list(struct lruvec *lruvec,
 			continue;
 		}
 
-		if (folio_memcg_rmap_recharge(folio))
+		if (folio_memcg_rmap_recharge(folio, NULL))
 			*nr_recharged += folio_nr_pages(folio);
 
 		folio_unlock(folio);
@@ -6587,6 +6594,114 @@ static void memcg_recharge_mapped_folios(struct mem_cgroup *memcg)
 	}
 }
 
+/* Result is only stable if @folio is locked */
+static bool should_do_deferred_recharge(struct folio *folio)
+{
+	struct mem_cgroup *memcg;
+	bool ret;
+
+	rcu_read_lock();
+	memcg = folio_memcg_rcu(folio);
+	ret = memcg && !!(memcg->css.flags & CSS_DYING);
+	rcu_read_unlock();
+
+	return ret;
+}
+
+struct deferred_recharge_work {
+	struct folio *folio;
+	struct mem_cgroup *memcg;
+	struct work_struct work;
+};
+
+static void folio_memcg_do_deferred_recharge(struct work_struct *work)
+{
+	struct deferred_recharge_work *recharge_work = container_of(work,
+				struct deferred_recharge_work, work);
+	struct folio *folio = recharge_work->folio;
+	struct mem_cgroup *new = recharge_work->memcg;
+	struct mem_cgroup *old;
+
+	/* We are holding the last ref to the folio, let it be freed */
+	if (unlikely(folio_ref_count(folio) == 1))
+		goto out;
+
+	if (!folio_isolate_lru(folio))
+		goto out;
+
+	if (unlikely(!folio_trylock(folio)))
+		goto out_putback;
+
+	/* @folio was already recharged since the worker was queued? */
+	if (unlikely(!should_do_deferred_recharge(folio)))
+		goto out_unlock;
+
+	/* @folio was already recharged to @new and it already went offline? */
+	old = folio_memcg(folio);
+	if (unlikely(old == new))
+		goto out_unlock;
+
+	/*
+	 * folio_mapped() must remain stable during the move. If the folio is
+	 * mapped, we must use rmap recharge to serialize against unmapping.
+	 * Otherwise, if the folio is unmapped, the folio lock is held so this
+	 * should prevent faults against the pagecache or swapcache to map it.
+	 */
+	mem_cgroup_start_move_charge(old);
+	if (folio_mapped(folio))
+		folio_memcg_rmap_recharge(folio, new);
+	else
+		mem_cgroup_recharge_folio(folio, new);
+	mem_cgroup_end_move_charge(old);
+
+out_unlock:
+	folio_unlock(folio);
+out_putback:
+	folio_putback_lru(folio);
+out:
+	folio_put(folio);
+	mem_cgroup_put(new);
+	kfree(recharge_work);
+}
+
+/*
+ * Queue deferred work to recharge @folio to current's memcg if needed.
+ */
+void folio_memcg_deferred_recharge(struct folio *folio)
+{
+	struct deferred_recharge_work *recharge_work = NULL;
+	struct mem_cgroup *memcg = NULL;
+
+	/* racy check, the async worker will check again with @folio locked */
+	if (likely(!should_do_deferred_recharge(folio)))
+		return;
+
+	if (unlikely(!memcg_recharge_wq))
+		return;
+
+	if (unlikely(!folio_try_get(folio)))
+		return;
+
+	memcg = get_mem_cgroup_from_mm(current->mm);
+	if (!memcg)
+		goto fail;
+
+	recharge_work = kmalloc(sizeof(*recharge_work), GFP_ATOMIC);
+	if (!recharge_work)
+		goto fail;
+
+	/* we hold refs to both the folio and the memcg we are charging to */
+	recharge_work->folio = folio;
+	recharge_work->memcg = memcg;
+	INIT_WORK(&recharge_work->work, folio_memcg_do_deferred_recharge);
+	queue_work(memcg_recharge_wq, &recharge_work->work);
+	return;
+fail:
+	kfree(recharge_work);
+	mem_cgroup_put(memcg);
+	folio_put(folio);
+}
+
 #ifdef CONFIG_LRU_GEN
 static void mem_cgroup_attach(struct cgroup_taskset *tset)
 {
-- 
2.41.0.255.g8b1d071c50-goog
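From a caller's perspective, a hook site reduces to one call in the
fast path (a hypothetical sketch; the next patch adds the real hooks to
folio_mark_accessed() and folio_mark_dirty()):

	static void example_access_path(struct folio *folio)
	{
		/*
		 * Cheap in the common case: bails out unless the charged
		 * memcg is dying (CSS_DYING); otherwise it queues a worker
		 * to recharge @folio to the accessing task's memcg.
		 */
		folio_memcg_deferred_recharge(folio);

		/* ... the access path proper ... */
	}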
From nobody Sun Sep 7 14:37:46 2025
Date: Thu, 20 Jul 2023 07:08:22 +0000
In-Reply-To: <20230720070825.992023-1-yosryahmed@google.com>
References: <20230720070825.992023-1-yosryahmed@google.com>
Message-ID: <20230720070825.992023-6-yosryahmed@google.com>
Subject: [RFC PATCH 5/8] memcg: recharge folios when accessed or dirtied
From: Yosry Ahmed
To: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Shakeel Butt
Cc: Muchun Song, "Matthew Wilcox (Oracle)", Tejun Heo, Zefan Li, Yu Zhao,
	Luis Chamberlain, Kees Cook, Iurii Zaikin, "T.J. Mercier", Greg Thelen,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, Yosry Ahmed

The previous patch provided support for deferred recharging of folios
when their memcgs go offline. This patch adds recharging hooks to
folio_mark_accessed() and folio_mark_dirty(). This should cover a
variety of code paths where folios are accessed by userspace.

The hook, folio_memcg_deferred_recharge(), only checks if the folio is
charged to an offline memcg in the common fast path (i.e., it checks
folio->memcg_data). If yes, an asynchronous worker is queued to do the
actual work.
Signed-off-by: Yosry Ahmed
---
 mm/page-writeback.c | 2 ++
 mm/swap.c           | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d3f42009bb70..a644530d98c7 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2785,6 +2785,8 @@ bool folio_mark_dirty(struct folio *folio)
 {
 	struct address_space *mapping = folio_mapping(folio);
 
+	folio_memcg_deferred_recharge(folio);
+
 	if (likely(mapping)) {
 		/*
 		 * readahead/folio_deactivate could remain
diff --git a/mm/swap.c b/mm/swap.c
index cd8f0150ba3a..296c0b87c967 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -457,6 +457,8 @@ static void folio_inc_refs(struct folio *folio)
  */
 void folio_mark_accessed(struct folio *folio)
 {
+	folio_memcg_deferred_recharge(folio);
+
 	if (lru_gen_enabled()) {
 		folio_inc_refs(folio);
 		return;
-- 
2.41.0.255.g8b1d071c50-goog
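A runnable userspace sketch of how the hooks get exercised: reading a
file whose pagecache was charged to a since-removed cgroup passes
through folio_mark_accessed(), which queues the deferred recharge (the
file path below is hypothetical):

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		char buf[4096];
		/* a file previously written from a now-offline memcg */
		int fd = open("/mnt/test/file", O_RDONLY);

		if (fd < 0) {
			perror("open");
			return 1;
		}
		/* each read marks folios accessed -> deferred recharge check */
		while (read(fd, buf, sizeof(buf)) > 0)
			;
		close(fd);
		return 0;
	}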
From nobody Sun Sep 7 14:37:46 2025
Date: Thu, 20 Jul 2023 07:08:23 +0000
In-Reply-To: <20230720070825.992023-1-yosryahmed@google.com>
References: <20230720070825.992023-1-yosryahmed@google.com>
Message-ID: <20230720070825.992023-7-yosryahmed@google.com>
Subject: [RFC PATCH 6/8] memcg: add stats for offline memcgs recharging
From: Yosry Ahmed
To: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Shakeel Butt
Cc: Muchun Song, "Matthew Wilcox (Oracle)", Tejun Heo, Zefan Li, Yu Zhao,
	Luis Chamberlain, Kees Cook, Iurii Zaikin, "T.J. Mercier", Greg Thelen,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, Yosry Ahmed

Add vm events for scanning pages for recharge, successfully recharging
pages, and cancelling a recharge due to failure to charge the target
memcg.

Signed-off-by: Yosry Ahmed
---
 include/linux/vm_event_item.h | 5 +++++
 mm/memcontrol.c               | 6 ++++++
 mm/vmstat.c                   | 6 +++++-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 8abfa1240040..cd80c00c50c2 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -60,6 +60,11 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		PAGEOUTRUN, PGROTATED,
 		DROP_PAGECACHE, DROP_SLAB,
 		OOM_KILL,
+#ifdef CONFIG_MEMCG
+		RECHARGE_PGSCANNED,
+		RECHARGE_PGMOVED,
+		RECHARGE_PGCANCELLED,
+#endif
 #ifdef CONFIG_NUMA_BALANCING
 		NUMA_PTE_UPDATES,
 		NUMA_HUGE_PTE_UPDATES,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cf9fb51ecfcc..2fe9c6f1be80 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6394,6 +6394,8 @@ static bool mem_cgroup_recharge_folio(struct folio *folio,
 				      old_memcg, new_memcg);
 	cancel_charge(err ? new_memcg : old_memcg, nr_pages);
 out:
+	count_vm_events(err ? RECHARGE_PGCANCELLED : RECHARGE_PGMOVED,
+			nr_pages);
 	return err == 0;
 }
 
@@ -6469,6 +6471,7 @@ static bool memcg_recharge_lruvec_list(struct lruvec *lruvec,
 	int isolated_idx = NR_ISOLATED_ANON + is_file_lru(lru);
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	unsigned long *nr_recharged = arg;
+	unsigned long nr_scanned = 0;
 	unsigned long nr_staged = 0;
 	LIST_HEAD(folios_skipped);
 	LIST_HEAD(folios_staged);
@@ -6505,6 +6508,7 @@ static bool memcg_recharge_lruvec_list(struct lruvec *lruvec,
 			continue;
 		}
 
+		nr_scanned += folio_nr_pages(folio);
 		if (unlikely(!folio_try_get(folio))) {
 			list_move(&folio->lru, &folios_skipped);
 			continue;
@@ -6543,6 +6547,7 @@ static bool memcg_recharge_lruvec_list(struct lruvec *lruvec,
 	}
 	mem_cgroup_end_move_charge(memcg);
 	mod_lruvec_state(lruvec, isolated_idx, -nr_staged);
+	count_vm_events(RECHARGE_PGSCANNED, nr_scanned);
 	return false;
 }
 
@@ -6679,6 +6684,7 @@ void folio_memcg_deferred_recharge(struct folio *folio)
 	if (unlikely(!memcg_recharge_wq))
 		return;
 
+	count_vm_events(RECHARGE_PGSCANNED, folio_nr_pages(folio));
 	if (unlikely(!folio_try_get(folio)))
 		return;
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b731d57996c5..e425a1aa7890 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1303,7 +1303,11 @@ const char * const vmstat_text[] = {
 	"drop_pagecache",
 	"drop_slab",
 	"oom_kill",
-
+#ifdef CONFIG_MEMCG
+	"recharge_pgs_scanned",
+	"recharge_pgs_moved",
+	"recharge_pgs_cancelled",
+#endif
 #ifdef CONFIG_NUMA_BALANCING
 	"numa_pte_updates",
 	"numa_huge_pte_updates",
-- 
2.41.0.255.g8b1d071c50-goog
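The counters surface in /proc/vmstat under the names added to
vmstat_text above; a small runnable reader:

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		char line[128];
		FILE *f = fopen("/proc/vmstat", "r");

		if (!f) {
			perror("fopen");
			return 1;
		}
		/* print only the recharge_pgs_* lines */
		while (fgets(line, sizeof(line), f))
			if (!strncmp(line, "recharge_pgs_", 13))
				fputs(line, stdout);
		fclose(f);
		return 0;
	}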
From nobody Sun Sep 7 14:37:46 2025
Date: Thu, 20 Jul 2023 07:08:24 +0000
In-Reply-To: <20230720070825.992023-1-yosryahmed@google.com>
References: <20230720070825.992023-1-yosryahmed@google.com>
Message-ID: <20230720070825.992023-8-yosryahmed@google.com>
Subject: [RFC PATCH 7/8] memcg: add sysctl and config option to control memory recharging
From: Yosry Ahmed
To: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Shakeel Butt
Cc: Muchun Song, "Matthew Wilcox (Oracle)", Tejun Heo, Zefan Li, Yu Zhao,
	Luis Chamberlain, Kees Cook, Iurii Zaikin, "T.J. Mercier", Greg Thelen,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, Yosry Ahmed

Add a sysctl to enable/disable memory recharging for offline memcgs.
Add a config option to control whether or not it is enabled by
default.

Signed-off-by: Yosry Ahmed
---
 include/linux/memcontrol.h |  2 ++
 kernel/sysctl.c            | 11 +++++++++++
 mm/Kconfig                 | 12 ++++++++++++
 mm/memcontrol.c            |  9 ++++++++-
 4 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 59b653d4a76e..ae9f09ee90cb 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -60,6 +60,8 @@ struct mem_cgroup_reclaim_cookie {
 
 #ifdef CONFIG_MEMCG
 
+extern int sysctl_recharge_offline_memcgs;
+
 #define MEM_CGROUP_ID_SHIFT	16
 #define MEM_CGROUP_ID_MAX	USHRT_MAX
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 354a2d294f52..1735d1d95652 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2249,6 +2249,17 @@ static struct ctl_table vm_table[] = {
 		.extra2		= (void *)&mmap_rnd_compat_bits_max,
 	},
 #endif
+#ifdef CONFIG_MEMCG
+	{
+		.procname	= "recharge_offline_memcgs",
+		.data		= &sysctl_recharge_offline_memcgs,
+		.maxlen		= sizeof(sysctl_recharge_offline_memcgs),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+#endif /* CONFIG_MEMCG */
 	{ }
 };
 
diff --git a/mm/Kconfig b/mm/Kconfig
index 09130434e30d..9462c4b598d9 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1236,6 +1236,18 @@ config LOCK_MM_AND_FIND_VMA
 	bool
 	depends on !STACK_GROWSUP
 
+config MEMCG_RECHARGE_OFFLINE_ENABLED
+	bool "Recharge memory charged to offline memcgs"
+	depends on MEMCG
+	help
+	  When a memory cgroup is removed by userspace, try to recharge any
+	  memory still charged to it to avoid having it live on as an offline
+	  memcg. Offline memcgs potentially consume memory and limit the
+	  scalability of some operations.
+
+	  This option enables the above behavior by default. It can be
+	  overridden at runtime through /proc/sys/vm/recharge_offline_memcgs.
+
 source "mm/damon/Kconfig"
 
 endmenu
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2fe9c6f1be80..25cdb17eaaa3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -96,6 +96,9 @@ static bool cgroup_memory_nobpf __ro_after_init;
 static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
 #endif
 
+int sysctl_recharge_offline_memcgs __read_mostly = IS_ENABLED(
+		CONFIG_MEMCG_RECHARGE_OFFLINE_ENABLED);
+
 static struct workqueue_struct *memcg_recharge_wq;
 
 /* Whether legacy memory+swap accounting is active */
@@ -6592,7 +6595,8 @@ static void memcg_recharge_mapped_folios(struct mem_cgroup *memcg)
 	INIT_DELAYED_WORK(&memcg->recharge_mapped_work.dwork,
 			  memcg_do_recharge_mapped_folios);
 
-	if (memcg_recharge_wq && memcg_nr_local_mapped_pages(memcg)) {
+	if (sysctl_recharge_offline_memcgs &&
+	    memcg_recharge_wq && memcg_nr_local_mapped_pages(memcg)) {
 		memcg->recharge_mapped_work.retries = 0;
 		queue_delayed_work(memcg_recharge_wq,
 				   &memcg->recharge_mapped_work.dwork, 0);
@@ -6605,6 +6609,9 @@ static bool should_do_deferred_recharge(struct folio *folio)
 	struct mem_cgroup *memcg;
 	bool ret;
 
+	if (!sysctl_recharge_offline_memcgs)
+		return false;
+
 	rcu_read_lock();
 	memcg = folio_memcg_rcu(folio);
 	ret = memcg && !!(memcg->css.flags & CSS_DYING);
-- 
2.41.0.255.g8b1d071c50-goog
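With the sysctl in place, runtime control is a one-line write to
procfs; a minimal sketch (needs root):

	#include <stdio.h>

	int main(void)
	{
		FILE *f = fopen("/proc/sys/vm/recharge_offline_memcgs", "w");

		if (!f) {
			perror("fopen");
			return 1;
		}
		fputs("1\n", f);	/* "0" disables, "1" enables */
		return fclose(f) ? 1 : 0;
	}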
From nobody Sun Sep 7 14:37:46 2025
Date: Thu, 20 Jul 2023 07:08:25 +0000
In-Reply-To: <20230720070825.992023-1-yosryahmed@google.com>
References: <20230720070825.992023-1-yosryahmed@google.com>
Message-ID: <20230720070825.992023-9-yosryahmed@google.com>
Subject: [RFC PATCH 8/8] selftests: cgroup: test_memcontrol: add a selftest for memcg recharging
From: Yosry Ahmed
To: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Shakeel Butt
Cc: Muchun Song, "Matthew Wilcox (Oracle)", Tejun Heo, Zefan Li, Yu Zhao,
	Luis Chamberlain, Kees Cook, Iurii Zaikin, "T.J. Mercier", Greg Thelen,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, Yosry Ahmed

When a memcg is removed, any mapped pages charged to it are recharged
to the memcg of the process(es) mapping them. Any remaining pages are
recharged using deferred recharge the next time they are accessed or
dirtied.

Add a selftest that exercises these paths for shmem and normal files:
- A page is recharged on offlining if it is already mapped into the
  address space of a process in a different memcg.
- A page is recharged after offlining when written to by a process in
  a different memcg (if the write results in dirtying the page).
- A page is recharged after offlining when read by a process in a
  different memcg.
- A page is recharged after offlining when mapped by a process in a
  different memcg.
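In condensed form, each scenario follows the flow below (an
illustrative sketch reusing the helpers added by this patch;
recharge_flow_sketch() itself is hypothetical):

	static int recharge_flow_sketch(const char *child1, const char *child2,
					int fd)
	{
		/* charge ~50M of file pages to child1 */
		if (cg_run(child1, write_fd_50M, (void *)(long)fd))
			return -1;
		/* let a process in child2 touch the same pages */
		if (cg_run_nowait(child2, read_fd_50M_noexit,
				  (void *)(long)fd) < 0)
			return -1;
		/* offline child1; its pages should be recharged to child2 */
		if (cg_destroy(child1) < 0)
			return -1;
		/* ... then poll child2's memory.stat for the moved pages ... */
		return 0;
	}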
Signed-off-by: Yosry Ahmed
---
 tools/testing/selftests/cgroup/cgroup_util.c |  14 +
 tools/testing/selftests/cgroup/cgroup_util.h |   1 +
 .../selftests/cgroup/test_memcontrol.c       | 310 ++++++++++++++++++
 3 files changed, 325 insertions(+)

diff --git a/tools/testing/selftests/cgroup/cgroup_util.c b/tools/testing/selftests/cgroup/cgroup_util.c
index e8bbbdb77e0d..e853b2a4db77 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.c
+++ b/tools/testing/selftests/cgroup/cgroup_util.c
@@ -519,6 +519,20 @@ int is_swap_enabled(void)
 	return cnt > 1;
 }
 
+int is_memcg_recharging_enabled(void)
+{
+	char buf[10];
+	bool enabled;
+
+	if (read_text("/proc/sys/vm/recharge_offline_memcgs",
+		      buf, sizeof(buf)) <= 0)
+		return -1;
+
+	enabled = strtol(buf, NULL, 10);
+	return enabled;
+}
+
 int set_oom_adj_score(int pid, int score)
 {
 	char path[PATH_MAX];
diff --git a/tools/testing/selftests/cgroup/cgroup_util.h b/tools/testing/selftests/cgroup/cgroup_util.h
index c92df4e5d395..10c0fa36bfd7 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.h
+++ b/tools/testing/selftests/cgroup/cgroup_util.h
@@ -49,6 +49,7 @@ extern int get_temp_fd(void);
 extern int alloc_pagecache(int fd, size_t size);
 extern int alloc_anon(const char *cgroup, void *arg);
 extern int is_swap_enabled(void);
+extern int is_memcg_recharging_enabled(void);
 extern int set_oom_adj_score(int pid, int score);
 extern int cg_wait_for_proc_count(const char *cgroup, int count);
 extern int cg_killall(const char *cgroup);
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index c7c9572003a8..4e1ea93e0a54 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -17,6 +17,8 @@
 #include <netdb.h>
 #include <errno.h>
 #include <sys/mman.h>
+#include <sched.h>
+#include <sys/mount.h>
 
 #include "../kselftest.h"
 #include "cgroup_util.h"
@@ -1287,6 +1289,313 @@ static int test_memcg_oom_group_score_events(const char *root)
 	return ret;
 }
 
+/* Map 50M from the beginning of a file */
+static int map_fd_50M_noexit(const char *cgroup, void *arg)
+{
+	size_t size = MB(50);
+	int ppid = getppid();
+	int fd = (long)arg;
+	char *memory;
+
+	memory = mmap(NULL, size, PROT_READ, MAP_SHARED | MAP_POPULATE, fd, 0);
+	if (memory == MAP_FAILED) {
+		fprintf(stderr, "error: mmap, errno %d\n", errno);
+		return -1;
+	}
+
+	while (getppid() == ppid)
+		sleep(1);
+
+	munmap(memory, size);
+	return 0;
+}
+
+/*
+ * Write 50M to the beginning of a file.
+ * The file is sync'ed first to make sure any dirty pages are laundered
+ * before we dirty them again.
+ */
+static int write_fd_50M(const char *cgroup, void *arg)
+{
+	size_t size = MB(50);
+	int fd = (long)arg;
+	char buf[PAGE_SIZE];
+	int i;
+
+	fsync(fd);
+	lseek(fd, 0, SEEK_SET);
+	for (i = 0; i < size; i += sizeof(buf))
+		write(fd, buf, sizeof(buf));
+
+	return 0;
+}
+
+/* See write_fd_50M() */
+static int write_fd_50M_noexit(const char *cgroup, void *arg)
+{
+	int ppid = getppid();
+
+	write_fd_50M(cgroup, arg);
+
+	while (getppid() == ppid)
+		sleep(1);
+
+	return 0;
+}
+
+/* Read 50M from the beginning of a file */
+static int read_fd_50M_noexit(const char *cgroup, void *arg)
+{
+	size_t size = MB(50);
+	int ppid = getppid();
+	int fd = (long)arg;
+	char buf[PAGE_SIZE];
+	int i;
+
+	lseek(fd, 0, SEEK_SET);
+	for (i = 0; i < size; i += sizeof(buf))
+		read(fd, buf, sizeof(buf));
+
+	while (getppid() == ppid)
+		sleep(1);
+
+	return 0;
+}
+
+#define TEST_RECHARGE_DIR "/test-recharge"
+
+static int __test_memcg_recharge(const char *root, char *stat_name)
+{
+	char *parent = NULL, *child1 = NULL, *child2 = NULL;
+	long stat, prev, pstat, current;
+	int ret = KSFT_FAIL;
+	char file_path[256];
+	int i, pid;
+	struct {
+		int fd;
+		int (*before_fn)(const char *cgroup, void *arg);
+		int (*after_fn)(const char *cgroup, void *arg);
+	} test_files[] = {
+		/* test recharge for already mapped file */
+		{
+			.before_fn = map_fd_50M_noexit,
+		},
+		/* test recharge on new mapping after offline */
+		{
+			.after_fn = map_fd_50M_noexit,
+		},
+		/* test recharge on write after offline */
+		{
+			.after_fn = write_fd_50M_noexit,
+		},
+		/* test recharge on read after offline */
+		{
+			.after_fn = read_fd_50M_noexit,
+		}
+	};
+
+	parent = cg_name(root, "parent");
+	if (!parent)
+		goto cleanup;
+
+	if (cg_create(parent))
+		goto cleanup;
+
+	if (cg_write(parent, "cgroup.subtree_control", "+memory"))
+		goto cleanup;
+
+	child1 = cg_name(parent, "child1");
+	if (!child1)
+		goto cleanup;
+
+	if (cg_create(child1))
+		goto cleanup;
+
+	child2 = cg_name(parent, "child2");
+	if (!child2)
+		goto cleanup;
+
+	if (cg_create(child2))
+		goto cleanup;
+
+	for (i = 0; i < ARRAY_SIZE(test_files); i++) {
+		long target = MB(50) * (i + 1); /* 50MB per file */
+		int fd;
+
+		snprintf(file_path, sizeof(file_path), "%s/file%d",
+			 TEST_RECHARGE_DIR, i);
+
+		fd = open(file_path, O_CREAT | O_RDWR, 0644);
+		if (fd < 0)
+			goto cleanup;
+
+		test_files[i].fd = fd;
+		if (cg_run(child1, write_fd_50M, (void *)(long)fd))
+			goto cleanup;
+
+		stat = 0;
+		do {
+			sleep(1);
+			prev = stat;
+			stat = cg_read_key_long(child1, "memory.stat",
+						stat_name);
+		} while (stat < target && stat > prev);
+
+		if (stat < target) {
+			fprintf(stderr, "error: child1 %s %ld < %ld\n",
+				stat_name, stat, target);
+			goto cleanup;
+		}
+
+		current = cg_read_long(child1, "memory.current");
+		if (current < target) {
+			fprintf(stderr, "error: child1 current %ld < %ld\n",
+				current, target);
+			goto cleanup;
+		}
+
+		if (test_files[i].before_fn) {
+			pid = cg_run_nowait(child2, test_files[i].before_fn,
+					    (void *)(long)fd);
+			if (pid < 0)
+				goto cleanup;
+			/* make sure before_fn() finishes executing before offlining */
+			sleep(1);
+		}
+	}
+
+	current = cg_read_long(child2, "memory.current");
+	if (current > MB(1)) {
+		fprintf(stderr, "error: child2 current %ld > 1M\n", current);
+		goto cleanup;
+	}
+
+	stat = cg_read_key_long(child2, "memory.stat", stat_name);
+	if (stat > 0) {
+		fprintf(stderr, "error: child2 %s %ld > 0\n",
+			stat_name, stat);
+		goto cleanup;
+	}
+
+	if (cg_destroy(child1) < 0)
+		goto cleanup;
+
+	for (i = 0; i < ARRAY_SIZE(test_files); i++) {
+		long target = MB(50) * (i + 1);
+		int fd = test_files[i].fd;
+
+		if (test_files[i].after_fn) {
+			pid = cg_run_nowait(child2, test_files[i].after_fn,
+					    (void *)(long)fd);
+			if (pid < 0)
+				goto cleanup;
+		}
+
+		stat = 0;
+		do {
+			sleep(1);
+			prev = stat;
+			stat = cg_read_key_long(child2, "memory.stat",
+						stat_name);
+		} while (stat < target && stat > prev);
+
+		if (stat < target) {
+			fprintf(stderr, "error: child2 %s %ld < %ld\n",
+				stat_name, stat, target);
+			goto cleanup;
+		}
+
+		current = cg_read_long(child2, "memory.current");
+		if (current < target) {
+			fprintf(stderr, "error: child2 current %ld < %ld\n",
+				current, target);
+			goto cleanup;
+		}
+	}
+
+	pstat = cg_read_key_long(parent, "memory.stat", stat_name);
+	if (stat < pstat) {
+		fprintf(stderr, "error: recharged %s (%ld) < total (%ld)\n",
+			stat_name, stat, pstat);
+		goto cleanup;
+	}
+
+	ret = KSFT_PASS;
+cleanup:
+	if (child2) {
+		cg_destroy(child2);
+		free(child2);
+	}
+	if (child1) {
+		cg_destroy(child1);
+		free(child1);
+	}
+	if (parent) {
+		cg_destroy(parent);
+		free(parent);
+	}
+	for (i = 0; i < ARRAY_SIZE(test_files); i++) {
+		close(test_files[i].fd);
+		snprintf(file_path, sizeof(file_path), "%s/file%d",
+			 TEST_RECHARGE_DIR, i);
+		remove(file_path);
+	}
+	return ret;
+}
+
+static int test_memcg_recharge(const char *root)
+{
+	int i, ret = KSFT_PASS;
+	struct {
+		char *mount_type, *stat_name;
+	} test_setups[] = {
+		/* test both shmem & normal files */
+		{
+			.mount_type = "tmpfs",
+			.stat_name = "shmem",
+		},
+		{
+			.stat_name = "file",
+		}
+	};
+
+	if (!is_memcg_recharging_enabled())
+		return KSFT_SKIP;
+
+	if (unshare(CLONE_NEWNS) < 0)
+		return KSFT_FAIL;
+
+	if (mount(NULL, "/", "", MS_REC | MS_PRIVATE, NULL) < 0)
+		return KSFT_FAIL;
+
+	for (i = 0; i < ARRAY_SIZE(test_setups); i++) {
+		int setup_ret = KSFT_FAIL;
+		char *mount_type = test_setups[i].mount_type;
+		char *stat_name = test_setups[i].stat_name;
+
+		if (mkdir(TEST_RECHARGE_DIR, 0777) < 0)
+			goto next;
+
+		if (mount_type &&
+		    mount(NULL, TEST_RECHARGE_DIR, mount_type, 0, NULL) < 0)
+			goto next;
+
+		setup_ret = __test_memcg_recharge(root, stat_name);
+
+next:
+		if (mount_type)
+			umount(TEST_RECHARGE_DIR);
+		remove(TEST_RECHARGE_DIR);
+
+		if (setup_ret == KSFT_FAIL) {
+			ret = KSFT_FAIL;
+			break;
+		}
+	}
+	umount("/");
+	return ret;
+}
+
 #define T(x) { x, #x }
 struct memcg_test {
 	int (*fn)(const char *root);
@@ -1306,6 +1615,7 @@ struct memcg_test {
 	T(test_memcg_oom_group_leaf_events),
 	T(test_memcg_oom_group_parent_events),
 	T(test_memcg_oom_group_score_events),
+	T(test_memcg_recharge),
 };
 #undef T
 
-- 
2.41.0.255.g8b1d071c50-goog
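
As a usage note (assuming the standard kselftest flow, which this patch
does not change), the new case runs with the rest of the cgroup suite:

	make -C tools/testing/selftests TARGETS=cgroup run_tests

Run as root on a system with cgroup v2 mounted; the test returns
KSFT_SKIP when recharging is disabled via
/proc/sys/vm/recharge_offline_memcgs.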