From: Chen Ridong
To: akpm@linux-foundation.org, david@kernel.org, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, axelrasmussen@google.com,
	yuanchu@google.com, weixugc@google.com, hannes@cmpxchg.org,
	zhengqi.arch@bytedance.com, shakeel.butt@linux.dev
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	lujialin4@huawei.com, chenridong@huawei.com
Subject: [RFC -next] memcg: Optimize creation performance when LRU_GEN is enabled
Date: Wed, 19 Nov 2025 08:37:22 +0000
Message-Id: <20251119083722.1365680-1-chenridong@huaweicloud.com>
X-Mailer: git-send-email 2.34.1
From: Chen Ridong

With LRU_GEN=y and LRU_GEN_ENABLED=n, a performance regression occurs
when creating a large number of memory cgroups (memcgs):

  # time mkdir testcg_{1..10000}
  real    0m7.167s
  user    0m0.037s
  sys     0m6.773s

  # time mkdir testcg_{1..20000}
  real    0m27.158s
  user    0m0.079s
  sys     0m26.270s

Note that doubling the number of memcgs roughly quadruples the total
creation time, i.e. the cost per creation grows linearly with the number
of existing memcgs.

In contrast, with LRU_GEN=n, creating the same number of memcgs is much
faster and scales linearly:

  # time mkdir testcg_{1..10000}
  real    0m3.386s
  user    0m0.044s
  sys     0m3.009s

  # time mkdir testcg_{1..20000}
  real    0m6.876s
  user    0m0.075s
  sys     0m6.121s

The root cause is that onlining an lru_gen node uses
hlist_nulls_add_tail_rcu(), which must traverse the entire list to find
the tail, because hlist_nulls heads do not track one. Each insertion is
therefore O(n) in the number of existing memcgs, and this cost is paid
even when LRU_GEN is runtime-disabled.

Fix this by caching a tail pointer for each fifo[gen][bin] list in
struct lru_gen_memcg. Appending a new node now goes through the cached
tail directly, making insertion O(1) and eliminating the full list
traversal.
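
For illustration only, here is a minimal userspace sketch of the two
append strategies (hypothetical names, plain pointers, no locking or
RCU; the real helpers below additionally run under memcg_lru.lock and
publish the new node with rcu_assign_pointer()):

  #include <stdio.h>
  #include <stdlib.h>

  struct node { struct node *next; };

  struct list {
  	struct node *head;
  	struct node *tail;	/* cached tail, NULL when empty */
  };

  /* O(n): what hlist_nulls_add_tail_rcu() effectively does - walk the
   * whole list to find the last node before linking behind it. */
  static void add_tail_slow(struct list *l, struct node *n)
  {
  	struct node **pos = &l->head;

  	while (*pos)
  		pos = &(*pos)->next;
  	n->next = NULL;
  	*pos = n;
  }

  /* O(1): append through the cached tail, as this patch does. */
  static void add_tail_fast(struct list *l, struct node *n)
  {
  	n->next = NULL;
  	if (l->tail)
  		l->tail->next = n;
  	else
  		l->head = n;
  	l->tail = n;
  }

  int main(void)
  {
  	struct list slow = { NULL, NULL }, fast = { NULL, NULL };
  	int i;

  	/* 20000 appends, mirroring the benchmark above: the slow list
  	 * walks ~i nodes on the i-th insertion, the fast one none. */
  	for (i = 0; i < 20000; i++) {
  		add_tail_slow(&slow, calloc(1, sizeof(struct node)));
  		add_tail_fast(&fast, calloc(1, sizeof(struct node)));
  	}
  	printf("appended 20000 nodes to each list\n");
  	return 0;
  }

The subtle part, handled by memcg_lru_del_locked() in this patch, is
keeping the cached tail valid on removal: when the tail node itself is
deleted, the cache has to be rewound to its predecessor (or cleared for
an empty list), which the hlist_nulls pprev pointer makes an O(1)
operation.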
After applying this patch, memcg creation performance with LRU_GEN=y
matches the fully disabled baseline:

  # time mkdir testcg_{1..10000}
  real    0m3.368s
  user    0m0.025s
  sys     0m3.012s

  # time mkdir testcg_{1..20000}
  real    0m6.742s
  user    0m0.085s
  sys     0m5.995s

Signed-off-by: Chen Ridong
---
 include/linux/mmzone.h |  4 +++
 mm/vmscan.c            | 78 ++++++++++++++++++++++++++++++++++++++----
 2 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4398e027f450..bdee57b35126 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -513,6 +513,8 @@ struct lru_gen_folio {
 	u8 gen;
 	/* the list segment this lru_gen_folio belongs to */
 	u8 seg;
+	/* the bin index this lru_gen_folio is queued on */
+	u8 bin;
 	/* per-node lru_gen_folio list for global reclaim */
 	struct hlist_nulls_node list;
 };
@@ -610,6 +612,8 @@ struct lru_gen_memcg {
 	unsigned long nr_memcgs[MEMCG_NR_GENS];
 	/* per-node lru_gen_folio list for global reclaim */
 	struct hlist_nulls_head fifo[MEMCG_NR_GENS][MEMCG_NR_BINS];
+	/* cached tails to speed up enqueueing */
+	struct hlist_nulls_node *tails[MEMCG_NR_GENS][MEMCG_NR_BINS];
 	/* protects the above */
 	spinlock_t lock;
 };
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8890f4b58673..6c2665e48f19 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4299,6 +4299,66 @@ enum {
 	MEMCG_LRU_YOUNG,
 };
 
+static void memcg_lru_add_head_locked(struct pglist_data *pgdat,
+				      struct lruvec *lruvec, int gen, int bin)
+{
+	struct lru_gen_memcg *memcg_lru = &pgdat->memcg_lru;
+	struct hlist_nulls_head *head = &memcg_lru->fifo[gen][bin];
+	struct hlist_nulls_node *node = &lruvec->lrugen.list;
+	bool empty = !memcg_lru->tails[gen][bin];
+
+	hlist_nulls_add_head_rcu(node, head);
+	lruvec->lrugen.bin = bin;
+
+	if (empty)
+		memcg_lru->tails[gen][bin] = node;
+}
+
+static void memcg_lru_add_tail_locked(struct pglist_data *pgdat,
+				      struct lruvec *lruvec, int gen, int bin)
+{
+	struct lru_gen_memcg *memcg_lru = &pgdat->memcg_lru;
+	struct hlist_nulls_head *head = &memcg_lru->fifo[gen][bin];
+	struct hlist_nulls_node *node = &lruvec->lrugen.list;
+	struct hlist_nulls_node *tail = memcg_lru->tails[gen][bin];
+
+	if (tail) {
+		WRITE_ONCE(node->next, tail->next);
+		WRITE_ONCE(node->pprev, &tail->next);
+		rcu_assign_pointer(hlist_nulls_next_rcu(tail), node);
+	} else {
+		hlist_nulls_add_head_rcu(node, head);
+	}
+
+	memcg_lru->tails[gen][bin] = node;
+	lruvec->lrugen.bin = bin;
+}
+
+static void memcg_lru_del_locked(struct pglist_data *pgdat, struct lruvec *lruvec,
+				 bool reinit)
+{
+	int gen = lruvec->lrugen.gen;
+	int bin = lruvec->lrugen.bin;
+	struct lru_gen_memcg *memcg_lru = &pgdat->memcg_lru;
+	struct hlist_nulls_head *head = &memcg_lru->fifo[gen][bin];
+	struct hlist_nulls_node *node = &lruvec->lrugen.list;
+	struct hlist_nulls_node *prev = NULL;
+
+	if (hlist_nulls_unhashed(node))
+		return;
+
+	if (memcg_lru->tails[gen][bin] == node) {
+		if (node->pprev != &head->first)
+			prev = container_of(node->pprev, struct hlist_nulls_node, next);
+		memcg_lru->tails[gen][bin] = prev;
+	}
+
+	if (reinit)
+		hlist_nulls_del_init_rcu(node);
+	else
+		hlist_nulls_del_rcu(node);
+}
+
 static void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
 {
 	int seg;
@@ -4326,15 +4386,15 @@ static void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
 	else
 		VM_WARN_ON_ONCE(true);
 
+	memcg_lru_del_locked(pgdat, lruvec, false);
+
 	WRITE_ONCE(lruvec->lrugen.seg, seg);
 	WRITE_ONCE(lruvec->lrugen.gen, new);
 
-	hlist_nulls_del_rcu(&lruvec->lrugen.list);
-
 	if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
-		hlist_nulls_add_head_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
+		memcg_lru_add_head_locked(pgdat, lruvec, new, bin);
 	else
-		hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
+		memcg_lru_add_tail_locked(pgdat, lruvec, new, bin);
 
 	pgdat->memcg_lru.nr_memcgs[old]--;
 	pgdat->memcg_lru.nr_memcgs[new]++;
@@ -4365,7 +4425,7 @@ void lru_gen_online_memcg(struct mem_cgroup *memcg)
 
 		lruvec->lrugen.gen = gen;
 
-		hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]);
+		memcg_lru_add_tail_locked(pgdat, lruvec, gen, bin);
 		pgdat->memcg_lru.nr_memcgs[gen]++;
 
 		spin_unlock_irq(&pgdat->memcg_lru.lock);
@@ -4399,7 +4459,7 @@ void lru_gen_release_memcg(struct mem_cgroup *memcg)
 
 		gen = lruvec->lrugen.gen;
 
-		hlist_nulls_del_init_rcu(&lruvec->lrugen.list);
+		memcg_lru_del_locked(pgdat, lruvec, true);
 		pgdat->memcg_lru.nr_memcgs[gen]--;
 
 		if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq))
@@ -5664,8 +5724,10 @@ void lru_gen_init_pgdat(struct pglist_data *pgdat)
 	spin_lock_init(&pgdat->memcg_lru.lock);
 
 	for (i = 0; i < MEMCG_NR_GENS; i++) {
-		for (j = 0; j < MEMCG_NR_BINS; j++)
+		for (j = 0; j < MEMCG_NR_BINS; j++) {
 			INIT_HLIST_NULLS_HEAD(&pgdat->memcg_lru.fifo[i][j], i);
+			pgdat->memcg_lru.tails[i][j] = NULL;
+		}
 	}
 }
 
@@ -5687,6 +5749,8 @@ void lru_gen_init_lruvec(struct lruvec *lruvec)
 
 	if (mm_state)
 		mm_state->seq = MIN_NR_GENS;
+
+	lrugen->bin = 0;
 }
 
 #ifdef CONFIG_MEMCG
-- 
2.34.1