From nobody Mon Jun 8 17:39:47 2026 Received: from mail-qt1-f177.google.com (mail-qt1-f177.google.com [209.85.160.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9C1C43B2FF6 for ; Wed, 27 May 2026 20:48:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779914893; cv=none; b=uqa0W4doyM9ebzI5XNLDsMHgIQ5ddBnVL3skHxFkCpKv6YcKKkI9zL+QKNNF2GqQ5GivfHekVdqZ4BiGG6oQG9TrvnZ/19WAd31JtqtMmjSp1R6t2Cc5Ky4hK5ZXTj5GsxOy1OdTDHmR5fV9lKXR7siMiVvFPoYqbVNCD6VP+NM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779914893; c=relaxed/simple; bh=gquPeyIrgrXq7dLjLlFiTq8DRz00IYvKdULLyzd3nQI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=NijtHHBegez9g08SfTh58qjzBwiZGj5JS1HEnLwYOc9zvsM41YxVIL+nSxn6fFpNdS9eGU8wAYK50A0MNX0/Qnms8ebB+rHmLwVNOjT46Q4rJvW8tPGgLkr7NCi8iND3alS3Jy9s1YdJt31FOsgeorPkhzF5zv1wtYVOVthew5M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org; spf=pass smtp.mailfrom=cmpxchg.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b=QB0CiFIL; arc=none smtp.client-ip=209.85.160.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b="QB0CiFIL" Received: by mail-qt1-f177.google.com with SMTP id d75a77b69052e-5102582e23eso95868211cf.1 for ; Wed, 27 May 2026 13:48:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg.org; s=google; t=1779914888; x=1780519688; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=GBXzyPhWkzGmNH/N9A+/kfLtlL9nu+YFCZMUKmgaFzk=; b=QB0CiFILJDYJbNEV//W8WxUN7gH4RvSyq3pCkLHgldimXgbNAR7sqZYaSmrg9uIQEl oc4zjfpob2K2S3id7/3gaensZcIMiWDlGvZnlvkDy5L7RzkFd5ysUg85xop5oBdNJZyi IlzxyKOkqCVWneZs3Dt+5pR+qu8L7hfis6IFx9vCKiMlepAvHmOc1kvpCXJjv0YWa8O1 TH2jS0ll2jZUNUfCy9rAMktfA+EJDoqtZlSU+ldwZAjrIs/VaPexAuLTFICSbx9EPcBF wtmSWAiMP1OxMS+OnmPjkFIIIoRfjnnhhWkhIfBaQSY6IonAXkTIRVwFUBinik9pSvTw 9fgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779914888; x=1780519688; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=GBXzyPhWkzGmNH/N9A+/kfLtlL9nu+YFCZMUKmgaFzk=; b=LHim+uGbSJPqXerPwR12cKJzi1eMyncU7E0JPKWSsauj5+cTILXFcuykt+EKCNhuQI SgyEKcswJ8EBP9PZ2qs2rvJ5uUQDA7DGHFmUAFoPQ8r9o2QW1Rkj+IMbNeAY942sZ1bL L78PoxV6ovBynoZ/C1OB92dwQPCRpFLzxYfXDrufvImS0Q45d056RTfqWS/pnLpLLC1k 5+GEMNja3sTufBpaZuhpWwIDTq0lE4Ue0jNOnGuhttElx/uBRGS7wAgTtsjY+0oE8zP9 Dynh5vpGdkynWSzZsAp/+rIPnlHc2RxrtsgQlfgONj78D508Qke2VwJEF5bXL9wO2hQ7 FQLQ== X-Forwarded-Encrypted: i=1; AFNElJ/3jwd7p9OjZNBgaw4v1vCQfhL3MIo/owoh7gnzMIKdFfmpvphXy9MJPu6krI3nsCSRuDsBF2kQtQfNZEQ=@vger.kernel.org X-Gm-Message-State: AOJu0YxNLSm8Q6bfbyvOsEkryuCMK7GGDjNUWk75X2hGGMZfScoN0W4G k1YeVwlAEHyLa0QX1vuqPklLiBrptSs3FTLXgaJffUZ/9s+V9IEYfo26PAJRE8X4EAo= X-Gm-Gg: Acq92OF2kZdcXOZmwdlJ/iKNHSp7AYivSmcLV37kPuo19BQdKu0A3s0sLbizlp439tr Ew66naDjcTIQitzzO/CtjJ20u/ndkX8AnmoXJKUR4oPVeQoIOVcaEmj/LSZ39ISzqV3GhmIgNdO bJxSTILRg31kKOvFHxpGkZVYGeLaYQDsu8n+kwPcSTy6ji2ldbhkp8Lnu/RUgqYmuTr2QcqyUGk XJiqxFdTehcm6mnWP4/+PcZ6VyCn4vNnOqVi2v1eopsKobZLFOFI7JjRHADyF+BkBX2iz++cr/G n25IpyMR7l+Gszde8IWLORd2nn1zd8r1kVZgSytI4D768OJU5qlFHflH2uPkDXLryAOoeI16TPJ 7d/MeMgb2ABN6V7JMTRu6b+TblYmqeYihXgzVzJ6ZdARtyDMJQvPY6E8ui7bLQmWgHt0p3koKIb 
+u2CvytJeZteMg/SzNQrpAInnZcoj5hLez X-Received: by 2002:a05:622a:244b:b0:50d:e471:2d1e with SMTP id d75a77b69052e-516d43cb2dbmr336768581cf.35.1779914887963; Wed, 27 May 2026 13:48:07 -0700 (PDT) Received: from localhost ([2603:7001:f100:500:365a:60ff:fe62:ff29]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-517069f2d2fsm59376831cf.3.2026.05.27.13.48.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 May 2026 13:48:07 -0700 (PDT) From: Johannes Weiner To: Andrew Morton Cc: David Hildenbrand , Lorenzo Stoakes , Shakeel Butt , Michal Hocko , Dave Chinner , Roman Gushchin , Muchun Song , Qi Zheng , Yosry Ahmed , Zi Yan , "Liam R . Howlett" , Usama Arif , Kiryl Shutsemau , Vlastimil Babka , Kairui Song , Mikhail Zaslonko , Vasily Gorbik , Baolin Wang , Barry Song , Dev Jain , Lance Yang , Nico Pache , Ryan Roberts , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v5 1/9] mm: list_lru: fix set_shrinker_bit() call during race with cgroup deletion Date: Wed, 27 May 2026 16:45:08 -0400 Message-ID: <20260527204757.2544958-2-hannes@cmpxchg.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260527204757.2544958-1-hannes@cmpxchg.org> References: <20260527204757.2544958-1-hannes@cmpxchg.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When list_lru_add() races with cgroup deletion, the shrinker bit is set on the wrong group and lost. This can cause a shrinker run to miss the cgroup that actually has the object. When the passed in memcg is dead, the function finds the first non-dead parent from the passed in memcg and adds the object there; but the shrinker bit is set on the memcg that was passed in. This bug is as old as the shrinker bitmap itself. Fix it by returning the "effective" memcg from the locking function, and have the caller use that. 
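
The race described above can be modelled in userspace. The following is only a minimal sketch, not kernel code: struct toy_memcg, toy_lock_effective() and toy_add() are hypothetical stand-ins for struct mem_cgroup, lock_list_lru_of_memcg() and list_lru_add(), and the "dying" flag assumes the sublist of that group has already been reparented. Locking and RCU are left out entirely. It shows why the locking helper takes a pointer to the caller's memcg pointer: the caller then sets the shrinker bit on the group that actually received the item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup plus its per-memcg sublist. */
struct toy_memcg {
	const char *name;
	struct toy_memcg *parent;
	bool dying;		/* assumed: sublist already reparented */
	long nr_items;
	bool shrinker_bit;
};

/*
 * Walk up from *memcg to the first live ancestor and report it back
 * through the pointer, mirroring the "struct mem_cgroup **memcg" change.
 * The root is assumed to never be dying.
 */
struct toy_memcg *toy_lock_effective(struct toy_memcg **memcg)
{
	assert(memcg && *memcg);
	while ((*memcg)->dying && (*memcg)->parent)
		*memcg = (*memcg)->parent;
	return *memcg;
}

/* Add one item; set the bit on the effective owner, not the passed-in group. */
void toy_add(struct toy_memcg *memcg)
{
	struct toy_memcg *l = toy_lock_effective(&memcg);

	/* memcg now names the group whose list received the item. */
	if (!l->nr_items++)
		memcg->shrinker_bit = true;
}
```

In this model, calling toy_add() on a dying child whose parent is a live root leaves the item and the shrinker bit on the root, and the child untouched. If the helper took the pointer by value instead, the bit would land on the dying child, which is the lost-bit case this patch addresses.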
Fixes: fae91d6d8be5 ("mm/list_lru.c: set bit in memcg shrinker bitmap on fi= rst list_lru item appearance") Reported-by: Usama Arif Reported-by: Sashiko Signed-off-by: Johannes Weiner Acked-by: Usama Arif Reported-by: Lance Yang Reviewed-by: Wei Yang --- mm/list_lru.c | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/mm/list_lru.c b/mm/list_lru.c index dd29bcf8eb5f..45d1b97737ea 100644 --- a/mm/list_lru.c +++ b/mm/list_lru.c @@ -77,14 +77,14 @@ static inline bool lock_list_lru(struct list_lru_one *l= , bool irq) } =20 static inline struct list_lru_one * -lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *m= emcg, - bool irq, bool skip_empty) +lock_list_lru_of_memcg(struct list_lru *lru, int nid, + struct mem_cgroup **memcg, bool irq, bool skip_empty) { struct list_lru_one *l; =20 rcu_read_lock(); again: - l =3D list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg)); + l =3D list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(*memcg)); if (likely(l) && lock_list_lru(l, irq)) { rcu_read_unlock(); return l; @@ -97,8 +97,8 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, str= uct mem_cgroup *memcg, rcu_read_unlock(); return NULL; } - VM_WARN_ON(!css_is_dying(&memcg->css)); - memcg =3D parent_mem_cgroup(memcg); + VM_WARN_ON(!css_is_dying(&(*memcg)->css)); + *memcg =3D parent_mem_cgroup(*memcg); goto again; } =20 @@ -135,8 +135,8 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, = int idx) } =20 static inline struct list_lru_one * -lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *m= emcg, - bool irq, bool skip_empty) +lock_list_lru_of_memcg(struct list_lru *lru, int nid, + struct mem_cgroup **memcg, bool irq, bool skip_empty) { struct list_lru_one *l =3D &lru->node[nid].lru; =20 @@ -164,12 +164,16 @@ bool list_lru_add(struct list_lru *lru, struct list_h= ead *item, int nid, struct list_lru_node *nlru =3D &lru->node[nid]; struct list_lru_one *l; =20 - l =3D 
lock_list_lru_of_memcg(lru, nid, memcg, false, false); + l =3D lock_list_lru_of_memcg(lru, nid, &memcg, false, false); if (!l) return false; if (list_empty(item)) { list_add_tail(item, &l->list); - /* Set shrinker bit if the first element was added */ + /* + * Set shrinker bit on the memcg that owns the locked + * sublist - lock_list_lru_of_memcg() may have walked up + * past a dying memcg, and the bit must be set there. + */ if (!l->nr_items++) set_shrinker_bit(memcg, nid, lru_shrinker_id(lru)); unlock_list_lru(l, false); @@ -204,7 +208,7 @@ bool list_lru_del(struct list_lru *lru, struct list_hea= d *item, int nid, { struct list_lru_node *nlru =3D &lru->node[nid]; struct list_lru_one *l; - l =3D lock_list_lru_of_memcg(lru, nid, memcg, false, false); + l =3D lock_list_lru_of_memcg(lru, nid, &memcg, false, false); if (!l) return false; if (!list_empty(item)) { @@ -288,7 +292,7 @@ __list_lru_walk_one(struct list_lru *lru, int nid, stru= ct mem_cgroup *memcg, unsigned long isolated =3D 0; =20 restart: - l =3D lock_list_lru_of_memcg(lru, nid, memcg, irq_off, true); + l =3D lock_list_lru_of_memcg(lru, nid, &memcg, irq_off, true); if (!l) return isolated; list_for_each_safe(item, n, &l->list) { --=20 2.54.0 From nobody Mon Jun 8 17:39:47 2026 Received: from mail-qt1-f178.google.com (mail-qt1-f178.google.com [209.85.160.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5AF473B777F for ; Wed, 27 May 2026 20:48:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.178 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779914895; cv=none; b=s+V7uwAdQm4K0dVRZb4OGI3z0aNX4bVvxaZ9lzjeiMyYdkx/+4Ya/0AQpPnq2INbvCntgmk+ckY//zZHLvknDfFF47xjjbNI9uQiCde+PAAEd0AdMrN788DyIkv0r4Nrvk8fgU4pnM7j4J2FvVdIQot1YSupChYuRkmngsVXirU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
t=1779914895; c=relaxed/simple; bh=xYOAXmbD5tgE83jUvwcepWiOiMWhzYpTMw9lq13ecuk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Fm+YVR9chk0DB2xuIMIQvCkeLjZTWbEscrPWWy+QrIJWQcu92oj7ZPlRD2hCdn6sZKUK03Op78MjM8VFerEVtANS1pFrvYO3KEe9UJpdNpeASirlUjJr3MReeSLGwC+wSSlOVkE1k+N1mNhgpJ/yFXDGpaymdFxBrzit5OxVaJE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org; spf=pass smtp.mailfrom=cmpxchg.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b=FTUDVFrU; arc=none smtp.client-ip=209.85.160.178 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b="FTUDVFrU" Received: by mail-qt1-f178.google.com with SMTP id d75a77b69052e-50e5bea4045so94121641cf.3 for ; Wed, 27 May 2026 13:48:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg.org; s=google; t=1779914889; x=1780519689; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=TD5lidUUheRvEbRogBFjUkTmTjQ7nHIsLysSJlSHMzs=; b=FTUDVFrUX3A3O6aKdwycdgeHHiuAEC7hNluhlqdOI8Cj0NpsmgerS2K+2LQO2yfxZ4 i9vahFPakvJ9n2rwlEPcD8zdtG0SqVniCtfSV4fj8ri4DJc03lFvM/vAhNgVtwTaqNiU QMOr3+EWMsATYjOrwNU35ivBeWo/GStb03T5rmn9xWFOam6pZYYXr7Aj3CGop0ny2eB+ UHxEsZ3r+c0nNwcMatjCNrU473CdyXo/lhmgmk9ZcGTjFqOS4ugklF6ISfk+3tXmyVjB Gw1QAodXWcNJkq9bSYzQNS99ZT6NK6PdK1YRrvHoTXiHWwZdzD/zFRe4yow10o2xSNLE ep1Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779914889; x=1780519689; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from 
:to:cc:subject:date:message-id:reply-to; bh=TD5lidUUheRvEbRogBFjUkTmTjQ7nHIsLysSJlSHMzs=; b=CBJYD5ITRNTkjvOTEMHHL6mep2fUgq9KmFdICch247fpdTxL1Km7JuHQgnXQ+s3rKR i4fVh98XNjKoHzpneFc0O9C0GyES+Hne7ux9hwL1y+1ZaaaHvAI0v5hSJUpCcOJb/HCc XyghfI1tlrSRdOWwiKVdXm9VSr/yHRy266gtMWqQSzB8A1zMOH9mrmLxespY1WOgbQuQ /G84WD2RowVAuUiFTAdnbRbJUxC+T6LlaZqqQMU6AoYDRWcy6HNGrFVYRVyu63EUg/IS V2Wq7K8npDv64al1vN3qHKqHOV2kh/bC2J7ODBx3iAsgFU8mMTy+PeS2whM2DzW6eIEt Ktug== X-Forwarded-Encrypted: i=1; AFNElJ8rRO9fFgIV+IuRuqReXKwvEM7zcIUNthr9uKOx0N812kz6JT2ti4lytc35WELepgLUoSpPcxgcRncZ340=@vger.kernel.org X-Gm-Message-State: AOJu0YyJ839yebp1/PPEQNsnBIVpqU1qRhGB1W9KedhHRJaf4johwi4i GXkhPhcC/pAVtYrQ5JnWhiV57iZf7eRQfb5VmO5thVFCEYuTZd0e2kwqEzT1yVwUR5M= X-Gm-Gg: Acq92OF4a3G2MkzuSQ+5O77FmU6L45lzybRQ6vGnPwkSX+ckFsOXglL1Od6ZQBq8Hrv sTL3Ht8A6X0+I7VjoDwkTxM+f/jhdwDwkcd5guoJtu72AtBoA8mKgck82BwhNFDIZ76TqIwzqkL M89Faio6Rxym3ph0CcfjWtCHfdakpPv8BdB/K83+hfyUjy99OFkO5mboZ+0TPdjKocidRnySx78 9y20h2puFPegDEu8yXS71tH06UKucgcoiOHQBey9gzL2QI4SvtReMida5udseNjGj9o4nptae9t FOYEKgjjzklxRt7ev+dKuSjiP6Q/WUWnYC25YtyTNjQWwsxyRuhlBodsPFQz6wEoyb+Kr18OvL7 KZ0DLLxBO9/wMSnoJLSgfG2rCX0JR11fcMsw2OgYgPTt6qrcYpRTgY2wG/GYxbLtZ85XVE5HXEU ZqSwxbVnbJ8ITV/5JVJjF7rPNzpqgHnok1 X-Received: by 2002:a05:622a:2508:b0:516:ddc3:622e with SMTP id d75a77b69052e-516ddc36797mr330589551cf.33.1779914889279; Wed, 27 May 2026 13:48:09 -0700 (PDT) Received: from localhost ([2603:7001:f100:500:365a:60ff:fe62:ff29]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-51706adcb59sm52568411cf.17.2026.05.27.13.48.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 May 2026 13:48:08 -0700 (PDT) From: Johannes Weiner To: Andrew Morton Cc: David Hildenbrand , Lorenzo Stoakes , Shakeel Butt , Michal Hocko , Dave Chinner , Roman Gushchin , Muchun Song , Qi Zheng , Yosry Ahmed , Zi Yan , "Liam R . 
Howlett" , Usama Arif , Kiryl Shutsemau , Vlastimil Babka , Kairui Song , Mikhail Zaslonko , Vasily Gorbik , Baolin Wang , Barry Song , Dev Jain , Lance Yang , Nico Pache , Ryan Roberts , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v5 2/9] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty Date: Wed, 27 May 2026 16:45:09 -0400 Message-ID: <20260527204757.2544958-3-hannes@cmpxchg.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260527204757.2544958-1-hannes@cmpxchg.org> References: <20260527204757.2544958-1-hannes@cmpxchg.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" skip_empty is only for the shrinker to abort and skip a list that's empty or whose cgroup is being deleted. For list additions and deletions, the cgroup hierarchy is walked upwards until a valid list_lru head is found, or it will fall back to the node list. Acquiring the lock won't fail. Remove the NULL checks in those callers. Reviewed-by: David Hildenbrand (Arm) Acked-by: Shakeel Butt Reviewed-by: Lorenzo Stoakes (Oracle) Signed-off-by: Johannes Weiner Reviewed-by: Liam R. 
Howlett (Oracle) Reported-by: Lance Yang --- mm/list_lru.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/mm/list_lru.c b/mm/list_lru.c index 45d1b97737ea..77999ed78fa5 100644 --- a/mm/list_lru.c +++ b/mm/list_lru.c @@ -165,8 +165,6 @@ bool list_lru_add(struct list_lru *lru, struct list_hea= d *item, int nid, struct list_lru_one *l; =20 l =3D lock_list_lru_of_memcg(lru, nid, &memcg, false, false); - if (!l) - return false; if (list_empty(item)) { list_add_tail(item, &l->list); /* @@ -208,9 +206,8 @@ bool list_lru_del(struct list_lru *lru, struct list_hea= d *item, int nid, { struct list_lru_node *nlru =3D &lru->node[nid]; struct list_lru_one *l; + l =3D lock_list_lru_of_memcg(lru, nid, &memcg, false, false); - if (!l) - return false; if (!list_empty(item)) { list_del_init(item); l->nr_items--; --=20 2.54.0 From nobody Mon Jun 8 17:39:47 2026 Received: from mail-qt1-f179.google.com (mail-qt1-f179.google.com [209.85.160.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5331213D53C for ; Wed, 27 May 2026 20:48:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779914896; cv=none; b=e1p+VgT3lKT6Pby1J8PRSd2W36+B32179cKkL4PMKc1Omfr6TnamGmJKryrlwK5KSNPCemfKfxyrqTyEutTezMOa/o2AW8ZuwMIVLqPgMnPJrd0i3auj12/JCij8YAg4gT6gkusbGD2oTqOawYWjfTfgwiPPi5az4aMR2l0Iw9A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779914896; c=relaxed/simple; bh=c1f+DX2H5xUC03KS0fGkNUDbCLWT6iyE15zMiC1PVlI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lVZiIqzKxHHKewSoelk87QaQEy4FFf4bXA4NGGLdY3PKTgXnee8dDOqiuoxtVWpEl3/4GTWU/QQNyyFQdNE9HEzsVMWDTBbaC+zvOAfrysOfYJ8Ur1W264D9hk/zqv5l4aWTeOo07tIn9uc3jOn8phPX5hLXojP28VDtqrZ71jk= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org; spf=pass smtp.mailfrom=cmpxchg.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b=uyN+DClH; arc=none smtp.client-ip=209.85.160.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b="uyN+DClH" Received: by mail-qt1-f179.google.com with SMTP id d75a77b69052e-516d0119e4aso59961421cf.1 for ; Wed, 27 May 2026 13:48:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg.org; s=google; t=1779914891; x=1780519691; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=sDvTkX11jVGNHNrahXzBPb3ZfYee1UhmGAXLDAQTujI=; b=uyN+DClHYRu0RexPP4ecU+sMVANxESB3BQmZgRdra2FUnddQMP7C4sgRCfOeXK2CsN tmQiAOhvaYQK8H4j6m/qpGuIjR5U2QspVNffgC81sS7sIMhTVh6mnQKjmS/76y7xDfeP 4b0zfr9hxduzyGg/CEWSjH1wvwE62MoML5jR4VbX+i+u8aaggvnPxMHTDsXoHdsovwW4 iCgGVL5tlrlninPLdzCe0WgL1RZKoOS4PJsmYHyo1GmN8eWhyG0kNSHQ2wPFiu6+pPTd bc4mxpzKldSDq2DwYXUhoEdRfVJvjoZw8R4DKaz1P0IDlpDdi/f4QdHtBEWye6eh46L6 qf/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779914891; x=1780519691; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=sDvTkX11jVGNHNrahXzBPb3ZfYee1UhmGAXLDAQTujI=; b=cqTdFkg8Ds4ibBG7/JAngOETSvh/gpT6dxC00BOkuy+mytTHb81kYBf8kRDWc4pFTI wdU32XNdsiF7dQMAEk+ZbP8jy/O0j8wE9TIJ8L6o0cHbHnAo9wFYYkIb5Dcl2hq9TflB 9eHJL4E2qx8MtIz6I/oG7cEn5mhzhHLqUAFDoUiG+/obBW141Modrx0VdZ6g04c8lz22 S1jIuFUV0N28iv7E6CG4OwnNSmdNy7bVchYge3thLuNQdagoAO9YxqISYJeq6/PP80UM 
LDuUNei8UxZVu2BwwMzSoHxPWre1NjL8Lt7CbKUMvgADpn/Lv8PSGbYLcfnyQT0/bgIP 7s8Q== X-Forwarded-Encrypted: i=1; AFNElJ8rO9nKtvVehm03yYlEdih+eDCtDIHPhyHImF6BHdeGaZt2yiEeRm9MqD0qB77eM0ryiRXRPd6IgSdtVNI=@vger.kernel.org X-Gm-Message-State: AOJu0Yx9nSi2qQf31rI2HCUZ4Zyso51QzG8Ct6qdHfpDbeGLZ/DW3wXk zqyPQuQ7aXhgMbKr7gQA+6Jo7WGWU+95QbC+jgHAryyMrCbcaUtP/is7rbm1UteMg9w= X-Gm-Gg: Acq92OFCmg+WWPBWRO9T+eMU0yrQrJjOSHlk7W/PQX8qeA6Mzo6p9qTGJmW0G5+ac1p iE8Gr2a9m4lZoZXOpm8b05NStfnekw0cwHAkEd14akeu8bWpF21Jwu6HD5qyDuAgb61nLQ91QKZ WLEo/Tp1t1mXUjQP3X8F9eTSDRZXlp3gCrcNB4YcFOn8c4TBF1VkYlYki5xXVNzUEOp87256gUw uRVMTYHiyt6iG2JynZLW5+2rj/8gFH7fKJIFCiWERCumhWEbOZB4gHxyI9yOycuemGyzZ/nYvvk qzItOdGjT2ggSd4P91bRpl1T7vjOGnbWXTmlCaDA1hqTmhyjTfdATgKIxzEdwLSkZ233+sgktgD Nb2knPsZ04UnAFWBR3ZmVtewBIg6uRypBbWt8C3pdDDcHNOSO6wpog27A3hU/qpLnXFMnk+ggDX BDCTUgpVlpz8d7FTbCe0/Iht9329jUDDp1 X-Received: by 2002:ac8:7f8f:0:b0:50f:a53b:9d3 with SMTP id d75a77b69052e-516d58a803emr268033681cf.27.1779914891532; Wed, 27 May 2026 13:48:11 -0700 (PDT) Received: from localhost ([2603:7001:f100:500:365a:60ff:fe62:ff29]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-51706b0bad4sm53558101cf.31.2026.05.27.13.48.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 May 2026 13:48:10 -0700 (PDT) From: Johannes Weiner To: Andrew Morton Cc: David Hildenbrand , Lorenzo Stoakes , Shakeel Butt , Michal Hocko , Dave Chinner , Roman Gushchin , Muchun Song , Qi Zheng , Yosry Ahmed , Zi Yan , "Liam R . 
Howlett" , Usama Arif , Kiryl Shutsemau , Vlastimil Babka , Kairui Song , Mikhail Zaslonko , Vasily Gorbik , Baolin Wang , Barry Song , Dev Jain , Lance Yang , Nico Pache , Ryan Roberts , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v5 3/9] mm: list_lru: deduplicate unlock_list_lru() Date: Wed, 27 May 2026 16:45:10 -0400 Message-ID: <20260527204757.2544958-4-hannes@cmpxchg.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260527204757.2544958-1-hannes@cmpxchg.org> References: <20260527204757.2544958-1-hannes@cmpxchg.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The MEMCG and !MEMCG variants are the same. lock_list_lru() has the same pattern when bailing. Consolidate into a common implementation. Reviewed-by: David Hildenbrand (Arm) Acked-by: Shakeel Butt Reviewed-by: Lorenzo Stoakes (Oracle) Signed-off-by: Johannes Weiner Reviewed-by: Liam R. 
Howlett (Oracle) Reported-by: Lance Yang --- mm/list_lru.c | 29 +++++++++-------------------- 1 file changed, 9 insertions(+), 20 deletions(-) diff --git a/mm/list_lru.c b/mm/list_lru.c index 77999ed78fa5..5497034e80f3 100644 --- a/mm/list_lru.c +++ b/mm/list_lru.c @@ -15,6 +15,14 @@ #include "slab.h" #include "internal.h" =20 +static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off) +{ + if (irq_off) + spin_unlock_irq(&l->lock); + else + spin_unlock(&l->lock); +} + #ifdef CONFIG_MEMCG static LIST_HEAD(memcg_list_lrus); static DEFINE_MUTEX(list_lrus_mutex); @@ -67,10 +75,7 @@ static inline bool lock_list_lru(struct list_lru_one *l,= bool irq) else spin_lock(&l->lock); if (unlikely(READ_ONCE(l->nr_items) =3D=3D LONG_MIN)) { - if (irq) - spin_unlock_irq(&l->lock); - else - spin_unlock(&l->lock); + unlock_list_lru(l, irq); return false; } return true; @@ -101,14 +106,6 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, *memcg =3D parent_mem_cgroup(*memcg); goto again; } - -static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off) -{ - if (irq_off) - spin_unlock_irq(&l->lock); - else - spin_unlock(&l->lock); -} #else static void list_lru_register(struct list_lru *lru) { @@ -147,14 +144,6 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, =20 return l; } - -static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off) -{ - if (irq_off) - spin_unlock_irq(&l->lock); - else - spin_unlock(&l->lock); -} #endif /* CONFIG_MEMCG */ =20 /* The caller must ensure the memcg lifetime. 
*/ --=20 2.54.0 From nobody Mon Jun 8 17:39:47 2026 Received: from mail-qk1-f180.google.com (mail-qk1-f180.google.com [209.85.222.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 56A3D3B7753 for ; Wed, 27 May 2026 20:48:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779914896; cv=none; b=ol9fZ1bzmr7LOmsKULn4vYNpfSf1Ys9ygtZeQjEFMqHmyNttJt9OBfz9uV+OTtfSyAARMxPXZCjw5fYbij9zZ6MvBgjGBqmovQoAbg1g1/LXH32ooAPZBT+O0yTuClSx2Xi0UFW2XscLSlNv3pXvQhaV4U+5iv1/aojlVSlTrig= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779914896; c=relaxed/simple; bh=twE9vUEfezOAkp4ZHRWC6Tg0E59cYU8YT+AlrQhkxVk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=NAQDIg1C65hrX+7ftX5FDPBo31kqcSZAq/Xm1+L/FBQ6ZSr4FRhCM7eVsostrXD9HKe1ygkFRfhreQf8Ov2mIA8sVPr0wTIK7jMMYY4mW8W8VD0hyV2nxFagE4J7syw6SBQJVSVrixT17Ryl67anSyJO0XupoRHFIT7wGjbop80= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org; spf=pass smtp.mailfrom=cmpxchg.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b=UeJF5SHG; arc=none smtp.client-ip=209.85.222.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b="UeJF5SHG" Received: by mail-qk1-f180.google.com with SMTP id af79cd13be357-914db83362aso405582485a.1 for ; Wed, 27 May 2026 13:48:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg.org; s=google; t=1779914893; x=1780519693; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=5D0XhUH8wxuCdonGZS3m1jXTkJvh4TQTyemwB8RtUGE=; b=UeJF5SHGfGQkmPqjCD+y6FwDBqzvYgJH1Kz7iZIvV0wsgLLaiogeM8Zkjdv1sc3Xmi RnQjv87UulchFK4yR2ZVB4o8tyNhLvWI5QZ2OKgj/xIjFUNNDLKgJm0v8t1BydaiRIoV adLyc8TE7vnZrDHMQUJLg8nFXWfoN265DYvsiClIyPOU0eE+71obGMKujGbr+WVhI9D6 +wLbuQox5t1zdxz87kX7K4yKJqlgJAxux2+JuGr+8ldiSlgWbL37O1KLhh7Q5327TKqy 0Y6Cof7FSvAtEi/RcJUkqJp26GhkrZdvbqYGsYiHsQetj177PgB1wYQnnDwFE6mzGQD2 90yQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779914893; x=1780519693; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=5D0XhUH8wxuCdonGZS3m1jXTkJvh4TQTyemwB8RtUGE=; b=MneDAz9I1eB0ifoOHOx8mU7BF7krs2ljZNPRc1QqLK60h2k3vceMqj57wJOADSjAEH ofTOIyo4ReQMh7MvklU9BdOkmcdDOQkM8kV5+eggtOBvTC7C4vB4WJcTm39Xi3FT1Bn0 VofEG72X1cwOtNCHzfzy+kuZd/xorCPTyo6LrxxXtmgS2N+yvFkYg/z9vhemZ1aQzG0c wU9xrjtGipxxv8yYFQ+zSGvsKmTNRVZRC+BsmgAHwyvU+n4O+Vts/whR3Z2Qs+SrdY/O VXtm3ro2gFOcHI+svvj3cOWiNsGlh5POQxzdOsC6jVHhsR1xF9V6CYgfsS3KWwRh1/1/ 6wEQ== X-Forwarded-Encrypted: i=1; AFNElJ+dpH5iDpL0bYTJzuNVnI7YMbNJDU7IXF5RXc+OVRLYQpl1GTiUF8gN1Wzdcf1hHnu/iWPN/zfDjAzbCb4=@vger.kernel.org X-Gm-Message-State: AOJu0YxoogRw/F9j84yCoMPQBbQWTByCuOi60iZZv/29GKs3TvT1hQ6N JdFn5J8843HSZR+rUe/EkZPB3/3MdwfIjmsgpBBCDfoADY49GCV8VYBORFOrGBF9hjU= X-Gm-Gg: Acq92OEADZ2uPVTjVtCynzskl9AJNPOHHfNDrXtKEmO8WhDhwAIk9Zxt3YqK+clRrqj njaAmW8eNRD7zFFvj87C3Yb/19xkeY/dMKphiGCGoClNSiAbh5Ep/ttYQTpjYm39JxcY/vcyt4z jHz4AywzB/UJjs2Ibq0wcjFUTjrdwVgi1VsawerJL1mMO6K53lCMHCRJwHhuY0k6/P5Y6gFueJW FQra/zrkTIxOCGqh6TsIroeWEGKYPxZKVcybwAI7SsngR+7JtsKVDVOpgwV1pzgDtTutS0GSn2X 3bwNwfiXqrQONTpI3UHQ+NTulk97b+49ADQXcE4BI9p2r8Ij+F5YkNhaAAIcsICdafIIhPSSajJ RM7SrNNb9KyU9kE3gONxZTSb3W5p/soFCgsYJQTcoQ/JOZnQ0t4+BO/Iy1PMDFErCWX/eSUD33m 
Y3EuuM8OAwXQuSGDNuJfK1Kfj/OTBpdc/j X-Received: by 2002:a05:620a:40c5:b0:90f:786c:4a82 with SMTP id af79cd13be357-914a240d00cmr3496069885a.39.1779914893344; Wed, 27 May 2026 13:48:13 -0700 (PDT) Received: from localhost ([2603:7001:f100:500:365a:60ff:fe62:ff29]) by smtp.gmail.com with ESMTPSA id af79cd13be357-914f881f214sm578376685a.45.2026.05.27.13.48.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 May 2026 13:48:12 -0700 (PDT) From: Johannes Weiner To: Andrew Morton Cc: David Hildenbrand , Lorenzo Stoakes , Shakeel Butt , Michal Hocko , Dave Chinner , Roman Gushchin , Muchun Song , Qi Zheng , Yosry Ahmed , Zi Yan , "Liam R . Howlett" , Usama Arif , Kiryl Shutsemau , Vlastimil Babka , Kairui Song , Mikhail Zaslonko , Vasily Gorbik , Baolin Wang , Barry Song , Dev Jain , Lance Yang , Nico Pache , Ryan Roberts , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v5 4/9] mm: list_lru: move list dead check to lock_list_lru_of_memcg() Date: Wed, 27 May 2026 16:45:11 -0400 Message-ID: <20260527204757.2544958-5-hannes@cmpxchg.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260527204757.2544958-1-hannes@cmpxchg.org> References: <20260527204757.2544958-1-hannes@cmpxchg.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Only the MEMCG variant of lock_list_lru() needs to check if there is a race with cgroup deletion and list reparenting. Move the check to the caller, so that the next patch can unify the lock_list_lru() variants. Reviewed-by: David Hildenbrand (Arm) Acked-by: Shakeel Butt Reviewed-by: Lorenzo Stoakes (Oracle) Signed-off-by: Johannes Weiner Reviewed-by: Liam R. 
Howlett (Oracle) Reported-by: Lance Yang --- mm/list_lru.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/mm/list_lru.c b/mm/list_lru.c index 5497034e80f3..7d0523e44010 100644 --- a/mm/list_lru.c +++ b/mm/list_lru.c @@ -68,17 +68,12 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, = int idx) return &lru->node[nid].lru; } =20 -static inline bool lock_list_lru(struct list_lru_one *l, bool irq) +static inline void lock_list_lru(struct list_lru_one *l, bool irq) { if (irq) spin_lock_irq(&l->lock); else spin_lock(&l->lock); - if (unlikely(READ_ONCE(l->nr_items) =3D=3D LONG_MIN)) { - unlock_list_lru(l, irq); - return false; - } - return true; } =20 static inline struct list_lru_one * @@ -90,9 +85,13 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, rcu_read_lock(); again: l =3D list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(*memcg)); - if (likely(l) && lock_list_lru(l, irq)) { - rcu_read_unlock(); - return l; + if (likely(l)) { + lock_list_lru(l, irq); + if (likely(READ_ONCE(l->nr_items) !=3D LONG_MIN)) { + rcu_read_unlock(); + return l; + } + unlock_list_lru(l, irq); } /* * Caller may simply bail out if raced with reparenting or --=20 2.54.0 From nobody Mon Jun 8 17:39:47 2026 Received: from mail-qt1-f170.google.com (mail-qt1-f170.google.com [209.85.160.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 334073BE15F for ; Wed, 27 May 2026 20:48:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779914905; cv=none; b=JU3ma/n9AhDoEtxZJ2LhVlzYx/Q/36lE9bMAzSK1VUOO6XsSl0x/zU+wlrJJdwKIlaPmlUGmN8Sd5OBUdJFMVCZYBTX6nUCL2kJnecsBjevKUiNo8qxG5WtZj2f/E2ktBq0rVXjc+lAnGrB0pKTY2pyGJmiGo0rtzOrQ053BjZQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779914905; 
c=relaxed/simple; bh=YRW4qnSpMUAMG+rquHRhuw/mFIuPIMQywrOsR9YkwJc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=T6IMeyB+W5qxYlaElFs1wppnGTUQS23NCvwCunxkcXyNsIZ/GHkaCJ3IBoA7uZdnx6NcGheH3D93gg4oJmGxYJ+X+jExa1BvUMcIyvKm3Ap6SdcIyFFfSSS9BHZekbCjOBbtblAj1PO3jb98TexgYfQ/u5rdIFoCvi1mU+sC0Yw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org; spf=pass smtp.mailfrom=cmpxchg.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b=LRcSNc2z; arc=none smtp.client-ip=209.85.160.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b="LRcSNc2z" Received: by mail-qt1-f170.google.com with SMTP id d75a77b69052e-51306c36c3eso132087871cf.0 for ; Wed, 27 May 2026 13:48:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg.org; s=google; t=1779914895; x=1780519695; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=dlbqhKiDqWS+kHnDvkCX36mrjE33rRvlYf46I/uZLtE=; b=LRcSNc2zN/RbyK+BTAF7DboHL7NVV+Gg8fzmn8QP/ENFZawV4Ccm8bNEeuW3VzIxj2 4JPn7ohKScy4ce9bO5mveyRaf8Gg0NGiaZTjDw6rNc6g87Dh6uIunaqxKgQN6WB5jWV2 LORJhRstTTPtBBL16zLkjwPn3jyijOJ824nHnXNkQ3+6iqCxRUSfZe1Hod/vmjF8soQY sKz5Tv9O1uTchYJxrlpKmXrI8BH214XbSRREzbDdp710wmEF+8dU5/K2O9knbynzvdeL gr4R9FDPLVEK4it7WCFOa3tHjQ4CoBS4DJ17/DG+b65ur+2n/8LOrcs55k6fkWMDCIR/ 48CA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779914895; x=1780519695; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from 
:to:cc:subject:date:message-id:reply-to; bh=dlbqhKiDqWS+kHnDvkCX36mrjE33rRvlYf46I/uZLtE=; b=Vn94m5Qeejak1MLiCx5fYCB8Ca4GSUcM4xJKH9lVfsIbpo6m7Dos3b4L/9uBQ731y0 3IZlyr3lGh3TQYkw7/W0kt3U1m4gKqkG6xOa11PGzNJc0ct4zueG2VE5dURSDtOuE2q4 SUMpnU5xD6JVpu7G7deKO3QRhJMFKJ02D7ZzNxvYXthXCncCvgv6ovkCrQbkO0KGsu84 aCTfkusBE0nvWiVzFTs91KpjhrENmncvXneymvXd3iIExijpEwYk6golYGSZ2hpuLjq4 s4MaFP3CIfRi9Lck8rS1RkVsD9ghlBfiQFKh+Gk2IWh7tGfsX26TFdxW4hY/RGZde+Mx fQjw== X-Forwarded-Encrypted: i=1; AFNElJ/oen8+oPEkVMEOLtkynoE6MyJmUk8E5++0QBGeHQMRv/xKJ8FrQriUmHav7C5LV/ZlHoa4G4cL4MCj17g=@vger.kernel.org X-Gm-Message-State: AOJu0YyzGTq7jJfJtHvKJfSYY7Y4fH9udkOWxGFTbqU2Mk1dfsVGOGXi NQCHeUsj9uB3kQEbQ5+7Qyg5Y+3+3dFrU6mSwOpKPL0t2B7W1ixyD3QORlgwm2RPLCw= X-Gm-Gg: Acq92OHyTPO8A2EEHp3nZDDhBn1qZ1BrBfkcPorIvZVcRILIqE7aJ/y1nqJxhEjnJkY oNErNtmwzrwSQ9PrFFTw+xjd+QPJQpI4CKtT/JsYkjwxu+T52dRrGaVzOJmnECucKQOMyyji6Xz JxpKUrA9bZuFXr1nTvL6qB1N504PcOz+ZeXDfaxWjN4Y0KmOCc+whyj5c8SXOM1v2v6E6xjJukz Q7th2zhcd8zVsxK81I/uNbbBIvJdCUAScvpi7zogEeBpVHbV9fQOuDzdrTlxFjanac6mVd2cp79 VPg2LlI4DI5j7CzXKtJHPV/mRIdQX9CxkR6n9b0o7fmkFE8Mg9fomj4FMONR5XB1AbU7bq7A/G5 72cvf+uAWUXNaviHPIHyCDnnToJ94PJdIdslEpHZnguGUhmwV/3MU8USawJ6qJQD+EQV+0yHYMU ooKg1cumZI2XEXEY4TjvyvVghv0pV3IMR2 X-Received: by 2002:a05:622a:134d:b0:50f:c5d3:a191 with SMTP id d75a77b69052e-516d429f541mr332100001cf.13.1779914895147; Wed, 27 May 2026 13:48:15 -0700 (PDT) Received: from localhost ([2603:7001:f100:500:365a:60ff:fe62:ff29]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-51706af4fd2sm52466431cf.25.2026.05.27.13.48.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 May 2026 13:48:14 -0700 (PDT) From: Johannes Weiner To: Andrew Morton Cc: David Hildenbrand , Lorenzo Stoakes , Shakeel Butt , Michal Hocko , Dave Chinner , Roman Gushchin , Muchun Song , Qi Zheng , Yosry Ahmed , Zi Yan , "Liam R . 
Howlett" , Usama Arif , Kiryl Shutsemau , Vlastimil Babka , Kairui Song , Mikhail Zaslonko , Vasily Gorbik , Baolin Wang , Barry Song , Dev Jain , Lance Yang , Nico Pache , Ryan Roberts , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v5 5/9] mm: list_lru: deduplicate lock_list_lru() Date: Wed, 27 May 2026 16:45:12 -0400 Message-ID: <20260527204757.2544958-6-hannes@cmpxchg.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260527204757.2544958-1-hannes@cmpxchg.org> References: <20260527204757.2544958-1-hannes@cmpxchg.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The MEMCG and !MEMCG paths have the same pattern. Share the code. Reviewed-by: David Hildenbrand (Arm) Acked-by: Shakeel Butt Reviewed-by: Lorenzo Stoakes (Oracle) Signed-off-by: Johannes Weiner Reviewed-by: Liam R. Howlett (Oracle) Reported-by: Lance Yang --- mm/list_lru.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/mm/list_lru.c b/mm/list_lru.c index 7d0523e44010..fdb3fe2ea64f 100644 --- a/mm/list_lru.c +++ b/mm/list_lru.c @@ -15,6 +15,14 @@ #include "slab.h" #include "internal.h" =20 +static inline void lock_list_lru(struct list_lru_one *l, bool irq) +{ + if (irq) + spin_lock_irq(&l->lock); + else + spin_lock(&l->lock); +} + static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off) { if (irq_off) @@ -68,14 +76,6 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, i= nt idx) return &lru->node[nid].lru; } =20 -static inline void lock_list_lru(struct list_lru_one *l, bool irq) -{ - if (irq) - spin_lock_irq(&l->lock); - else - spin_lock(&l->lock); -} - static inline struct list_lru_one * lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup **memcg, bool irq, bool skip_empty) @@ -136,10 +136,7 @@ 
lock_list_lru_of_memcg(struct list_lru *lru, int nid, { struct list_lru_one *l =3D &lru->node[nid].lru; =20 - if (irq) - spin_lock_irq(&l->lock); - else - spin_lock(&l->lock); + lock_list_lru(l, irq); =20 return l; } --=20 2.54.0 From nobody Mon Jun 8 17:39:47 2026 Received: from mail-qt1-f172.google.com (mail-qt1-f172.google.com [209.85.160.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E2F9F3C5853 for ; Wed, 27 May 2026 20:48:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779914907; cv=none; b=XZr+Lv4WXKtAe6DJBQOtPyNVaBIe9plpOqCYrGWHzpaUPWhTr8xWfzv02U9yyCggHdpIiSmzh1ne+Cyau+65pugIvd1ujOfWuRs8muTqGa9/sT39bJAAv9YwvpHoqIbiOjwFR6llyoX/8AFHE0Ruk/RGifffkJwZMB7iukJ64Ok= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779914907; c=relaxed/simple; bh=d7WDZsdV2/aQKgp+6i2HTqrXT82J3vqCl3Frd8mspGo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kDOA3Is9JbOpR1OD+hPD7yRDKLvynGXPApj7giPRCR7gHt2tDQHFNFpFDHwqkx62kMeF1ChTB4ZFUs2DQqAHAer5ZLJC0mKQaWKuKL9CV9Zb26+VS4iGOpTTBlr6Ybt/Bd7w/4285dlNL4xWu40ujbF9fLs7DYe1fVDK9uP4d7M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org; spf=pass smtp.mailfrom=cmpxchg.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b=kDYMnlvx; arc=none smtp.client-ip=209.85.160.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b="kDYMnlvx" Received: by mail-qt1-f172.google.com with SMTP id 
d75a77b69052e-50e5dbd8e0eso139970391cf.1 for ; Wed, 27 May 2026 13:48:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg.org; s=google; t=1779914897; x=1780519697; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=3AOQtP7Xnnp/Xq7xApSFU4g8ZA9bsT8U0JJPgTL+uXI=; b=kDYMnlvxmW29C1xP3Cp/45n2MazTQCmeV6qu88TowIQzsyH6SA5QfHIeo98yyA0sAQ tEgVlVQAbZHXjI6cDShBxK6ZKE8vyyKNSQVHXEung51Hr9UAcdu6ZZWsMqxYpVo8wAhs CgcQK1KwRLVhFYU+fRbavn/Vra5HsWvDHIvBw0VY1m9iY+CYEsmkmI2ZMZTCZCwAtUWE 6bP/Id4xkX8ln//AR0uLdqE4rgIB3/cQehxX1QSqDLQalzp8o4ufKdKfFsEjI9i3eZdX KrdRADTqEvzn9YSgfy+N830whmjKrHrPuknx4ldGTS9RK+A45GTblD/i/BWalgmDRrHj uDJw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779914897; x=1780519697; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=3AOQtP7Xnnp/Xq7xApSFU4g8ZA9bsT8U0JJPgTL+uXI=; b=gCP0SYaBHlf6pUvamRvTAYyesgkPBQQ4nZodCJqz62cMjSxKeT2pRphhuu/BkhH4mq kr9xYd+QGluvMi/FkF3h/mHBuO8FyDXeA1cERBBgiBGTRcH44IBXad1NUCDTNvZc/mQX vjXynKjVZuee0UwJO5hUZbbY/6JUBam+CJYfixC2aQRinUdKaY/KEDj+YnqPs+faZFev IZTE06SljP/OFOS7sXGrxcEyBKLV/1Tkq+51IG2bXz1h2D3wnKkvxkwcWZGyRBWYRMtJ 9OXT4ZcS/RG1D+MLyIVwaik4MH683oNWw7zjr06rQWjS/UGSB3hgQpQdySXsV3OQeusA DImQ== X-Forwarded-Encrypted: i=1; AFNElJ8K8dQeDEpPu+Qz+rtum5jY1zOkLGYIAJCzlhBSvhN1rBQxYdUq4D7wnNrLCEuec1fOdZY2dC6E0xwd3iQ=@vger.kernel.org X-Gm-Message-State: AOJu0YyRPi3Fig28N2kGZrICtN+WGFlBHXqxvYVr2IcuJzJ9avIs6CFF 7UnxRjhbHVzZrFG6VL31/p1ktyd7fvUmF9nFLpZ2bZo+wW8A102RuaS6COOiAGWTSIQ= X-Gm-Gg: Acq92OFbxDfz1ot3qiTA5cnDTSdLGgJCkVvUxAJxKEr8HO+DfLaB4i4zVLHkKE3fC/J cpAjsywzG6sCVUKNDmsyWFODZ84/nNFmTX+8kymqa1UAiBm3n4BELSpujhs2Ac3oyRM+0P0CA2V UVLbeoTgh01qeS02GwOjhwhTxzfrl7tWv5tiOL0JAZm9WJ4BVvtD4ntdvmDzcm0QwMKw8wNR6Ga 
7i/dXvU7CKD/0YuDDigor2zGTZz9bOCk76W9jAFev0vJo5NQhSs6d63QXc8gJvJvYag//hlGHT2 900onQSUCVCXX5hbLMY66Jx1Ekv4qPSSgc6VFn+kqomoTlP1+Je5f+mWzsmGzkQmIWH7lTn9SE1 qEIbYqrthBMJ7kIU2/u1tkYFerGMlc87nYYipiUNEp7bzScw7vEV0asDvSBvg9pyy61YmtKD4nT 9crAv21nFVk2A08qdfl+aN2OAO9OTwrceB1xHScqe7l3s= X-Received: by 2002:a05:622a:2d5:b0:516:4fc0:27ac with SMTP id d75a77b69052e-516d43e4561mr353992391cf.50.1779914896842; Wed, 27 May 2026 13:48:16 -0700 (PDT) Received: from localhost ([2603:7001:f100:500:365a:60ff:fe62:ff29]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-51706a28ddcsm58877121cf.13.2026.05.27.13.48.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 May 2026 13:48:15 -0700 (PDT) From: Johannes Weiner To: Andrew Morton Cc: David Hildenbrand , Lorenzo Stoakes , Shakeel Butt , Michal Hocko , Dave Chinner , Roman Gushchin , Muchun Song , Qi Zheng , Yosry Ahmed , Zi Yan , "Liam R . Howlett" , Usama Arif , Kiryl Shutsemau , Vlastimil Babka , Kairui Song , Mikhail Zaslonko , Vasily Gorbik , Baolin Wang , Barry Song , Dev Jain , Lance Yang , Nico Pache , Ryan Roberts , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v5 6/9] mm: list_lru: introduce caller locking for additions and deletions Date: Wed, 27 May 2026 16:45:13 -0400 Message-ID: <20260527204757.2544958-7-hannes@cmpxchg.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260527204757.2544958-1-hannes@cmpxchg.org> References: <20260527204757.2544958-1-hannes@cmpxchg.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Locking is currently internal to the list_lru API. However, a caller might want to keep auxiliary state synchronized with the LRU state. For example, the THP shrinker uses the lock of its custom LRU to keep PG_partially_mapped and vmstats consistent. 
To allow the THP shrinker to switch to list_lru, provide normal and irqsafe locking primitives as well as caller-locked variants of the addition and deletion functions. Reviewed-by: David Hildenbrand (Arm) Acked-by: Shakeel Butt Reviewed-by: Lorenzo Stoakes (Oracle) Signed-off-by: Johannes Weiner Reviewed-by: Liam R. Howlett (Oracle) Reported-by: Lance Yang --- include/linux/list_lru.h | 43 +++++++++++++ mm/list_lru.c | 133 ++++++++++++++++++++++++++++++--------- 2 files changed, 145 insertions(+), 31 deletions(-) diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index fe739d35a864..134cb3e5652a 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -83,6 +83,46 @@ int memcg_list_lru_alloc(struct mem_cgroup *memcg, struc= t list_lru *lru, gfp_t gfp); void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup = *parent); =20 +/** + * list_lru_lock: lock the sublist for the given node and memcg + * @lru: the lru pointer + * @nid: the node id of the sublist to lock. + * @memcg: pointer to the cgroup of the sublist to lock. On return, + * updated to the cgroup whose sublist was actually locked, + * which may be an ancestor if the original memcg was dying. + * + * Returns the locked list_lru_one sublist. The caller must call + * list_lru_unlock() when done. + * + * You must ensure that the memcg is not freed during this call (e.g., with + * rcu or by taking a css refcnt). 
+ * + * Return: the locked list_lru_one, or NULL on failure + */ +struct list_lru_one *list_lru_lock(struct list_lru *lru, int nid, + struct mem_cgroup **memcg); + +/** + * list_lru_unlock: unlock a sublist locked by list_lru_lock() + * @l: the list_lru_one to unlock + */ +void list_lru_unlock(struct list_lru_one *l); + +struct list_lru_one *list_lru_lock_irq(struct list_lru *lru, int nid, + struct mem_cgroup **memcg); +void list_lru_unlock_irq(struct list_lru_one *l); + +struct list_lru_one *list_lru_lock_irqsave(struct list_lru *lru, int nid, + struct mem_cgroup **memcg, unsigned long *irq_flags); +void list_lru_unlock_irqrestore(struct list_lru_one *l, + unsigned long *irq_flags); + +/* Caller-locked variants, see list_lru_add() etc for documentation */ +bool __list_lru_add(struct list_lru *lru, struct list_lru_one *l, + struct list_head *item, int nid, struct mem_cgroup *memcg); +bool __list_lru_del(struct list_lru *lru, struct list_lru_one *l, + struct list_head *item, int nid); + /** * list_lru_add: add an element to the lru list's tail * @lru: the lru pointer @@ -115,6 +155,9 @@ void memcg_reparent_list_lrus(struct mem_cgroup *memcg,= struct mem_cgroup *paren bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid, struct mem_cgroup *memcg); =20 +bool list_lru_add_irq(struct list_lru *lru, struct list_head *item, int ni= d, + struct mem_cgroup *memcg); + /** * list_lru_add_obj: add an element to the lru list's tail * @lru: the lru pointer diff --git a/mm/list_lru.c b/mm/list_lru.c index fdb3fe2ea64f..402bb028114d 100644 --- a/mm/list_lru.c +++ b/mm/list_lru.c @@ -15,17 +15,23 @@ #include "slab.h" #include "internal.h" =20 -static inline void lock_list_lru(struct list_lru_one *l, bool irq) +static inline void lock_list_lru(struct list_lru_one *l, bool irq, + unsigned long *irq_flags) { - if (irq) + if (irq_flags) + spin_lock_irqsave(&l->lock, *irq_flags); + else if (irq) spin_lock_irq(&l->lock); else spin_lock(&l->lock); } =20 -static inline 
void unlock_list_lru(struct list_lru_one *l, bool irq_off) +static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off, + unsigned long *irq_flags) { - if (irq_off) + if (irq_flags) + spin_unlock_irqrestore(&l->lock, *irq_flags); + else if (irq_off) spin_unlock_irq(&l->lock); else spin_unlock(&l->lock); @@ -78,7 +84,8 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, in= t idx) =20 static inline struct list_lru_one * lock_list_lru_of_memcg(struct list_lru *lru, int nid, - struct mem_cgroup **memcg, bool irq, bool skip_empty) + struct mem_cgroup **memcg, bool irq, + unsigned long *irq_flags, bool skip_empty) { struct list_lru_one *l; =20 @@ -86,12 +93,12 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, again: l =3D list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(*memcg)); if (likely(l)) { - lock_list_lru(l, irq); + lock_list_lru(l, irq, irq_flags); if (likely(READ_ONCE(l->nr_items) !=3D LONG_MIN)) { rcu_read_unlock(); return l; } - unlock_list_lru(l, irq); + unlock_list_lru(l, irq, irq_flags); } /* * Caller may simply bail out if raced with reparenting or @@ -132,24 +139,58 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid= , int idx) =20 static inline struct list_lru_one * lock_list_lru_of_memcg(struct list_lru *lru, int nid, - struct mem_cgroup **memcg, bool irq, bool skip_empty) + struct mem_cgroup **memcg, bool irq, + unsigned long *irq_flags, bool skip_empty) { struct list_lru_one *l =3D &lru->node[nid].lru; =20 - lock_list_lru(l, irq); + lock_list_lru(l, irq, irq_flags); =20 return l; } #endif /* CONFIG_MEMCG */ =20 -/* The caller must ensure the memcg lifetime. 
*/ -bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid, - struct mem_cgroup *memcg) +struct list_lru_one *list_lru_lock(struct list_lru *lru, int nid, + struct mem_cgroup **memcg) { - struct list_lru_node *nlru =3D &lru->node[nid]; - struct list_lru_one *l; + return lock_list_lru_of_memcg(lru, nid, memcg, /*irq=3D*/false, + /*irq_flags=3D*/NULL, /*skip_empty=3D*/false); +} + +void list_lru_unlock(struct list_lru_one *l) +{ + unlock_list_lru(l, /*irq_off=3D*/false, /*irq_flags=3D*/NULL); +} + +struct list_lru_one *list_lru_lock_irq(struct list_lru *lru, int nid, + struct mem_cgroup **memcg) +{ + return lock_list_lru_of_memcg(lru, nid, memcg, /*irq=3D*/true, + /*irq_flags=3D*/NULL, /*skip_empty=3D*/false); +} + +void list_lru_unlock_irq(struct list_lru_one *l) +{ + unlock_list_lru(l, /*irq_off=3D*/true, /*irq_flags=3D*/NULL); +} =20 - l =3D lock_list_lru_of_memcg(lru, nid, &memcg, false, false); +struct list_lru_one *list_lru_lock_irqsave(struct list_lru *lru, int nid, + struct mem_cgroup **memcg, + unsigned long *flags) +{ + return lock_list_lru_of_memcg(lru, nid, memcg, /*irq=3D*/true, + /*irq_flags=3D*/flags, /*skip_empty=3D*/false); +} + +void list_lru_unlock_irqrestore(struct list_lru_one *l, unsigned long *fla= gs) +{ + unlock_list_lru(l, /*irq_off=3D*/true, /*irq_flags=3D*/flags); +} + +bool __list_lru_add(struct list_lru *lru, struct list_lru_one *l, + struct list_head *item, int nid, + struct mem_cgroup *memcg) +{ if (list_empty(item)) { list_add_tail(item, &l->list); /* @@ -159,15 +200,50 @@ bool list_lru_add(struct list_lru *lru, struct list_h= ead *item, int nid, */ if (!l->nr_items++) set_shrinker_bit(memcg, nid, lru_shrinker_id(lru)); - unlock_list_lru(l, false); - atomic_long_inc(&nlru->nr_items); + atomic_long_inc(&lru->node[nid].nr_items); return true; } - unlock_list_lru(l, false); return false; } EXPORT_SYMBOL_GPL(list_lru_add); =20 +bool __list_lru_del(struct list_lru *lru, struct list_lru_one *l, + struct list_head *item, int 
nid) +{ + if (!list_empty(item)) { + list_del_init(item); + l->nr_items--; + atomic_long_dec(&lru->node[nid].nr_items); + return true; + } + return false; +} + +/* The caller must ensure the memcg lifetime. */ +bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid, + struct mem_cgroup *memcg) +{ + struct list_lru_one *l; + bool ret; + + l =3D list_lru_lock(lru, nid, &memcg); + ret =3D __list_lru_add(lru, l, item, nid, memcg); + list_lru_unlock(l); + return ret; +} + +bool list_lru_add_irq(struct list_lru *lru, struct list_head *item, + int nid, struct mem_cgroup *memcg) +{ + struct list_lru_one *l; + bool ret; + + l =3D list_lru_lock_irq(lru, nid, &memcg); + ret =3D __list_lru_add(lru, l, item, nid, memcg); + list_lru_unlock_irq(l); + return ret; +} + bool list_lru_add_obj(struct list_lru *lru, struct list_head *item) { bool ret; @@ -189,19 +265,13 @@ EXPORT_SYMBOL_GPL(list_lru_add_obj); bool list_lru_del(struct list_lru *lru, struct list_head *item, int nid, struct mem_cgroup *memcg) { - struct list_lru_node *nlru =3D &lru->node[nid]; struct list_lru_one *l; + bool ret; =20 - l =3D lock_list_lru_of_memcg(lru, nid, &memcg, false, false); - if (!list_empty(item)) { - list_del_init(item); - l->nr_items--; - unlock_list_lru(l, false); - atomic_long_dec(&nlru->nr_items); - return true; - } - unlock_list_lru(l, false); - return false; + l =3D list_lru_lock(lru, nid, &memcg); + ret =3D __list_lru_del(lru, l, item, nid); + list_lru_unlock(l); + return ret; } =20 bool list_lru_del_obj(struct list_lru *lru, struct list_head *item) @@ -274,7 +344,8 @@ __list_lru_walk_one(struct list_lru *lru, int nid, stru= ct mem_cgroup *memcg, unsigned long isolated =3D 0; =20 restart: - l =3D lock_list_lru_of_memcg(lru, nid, &memcg, irq_off, true); + l =3D lock_list_lru_of_memcg(lru, nid, &memcg, /*irq=3D*/irq_off, + /*irq_flags=3D*/NULL, /*skip_empty=3D*/true); if (!l) return isolated; list_for_each_safe(item, n, &l->list) { @@ -315,7 +386,7 @@ 
__list_lru_walk_one(struct list_lru *lru, int nid, stru= ct mem_cgroup *memcg, BUG(); } } - unlock_list_lru(l, irq_off); + unlock_list_lru(l, irq_off, NULL); out: return isolated; } --=20 2.54.0 From nobody Mon Jun 8 17:39:47 2026 Received: from mail-qk1-f182.google.com (mail-qk1-f182.google.com [209.85.222.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C735C3B8949 for ; Wed, 27 May 2026 20:48:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779914907; cv=none; b=N9eBweKBd6ntZsacrFXYnRWTo1my6g23615u+pbChxSB1AnB1qBVRor41I7MmC+JQy6VFa3PdXnGC8XaHXnRYtrw65uo5OT8UOoqTKgWE2CwscQJg5C07eiB5cLVbwS0ELELvhvDWQGl87c+3aPUIXYbriyjWisBDleY6Nd5zbk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779914907; c=relaxed/simple; bh=N2Ysi3RJi1LDN5IxBqkDR80jjekbuSRg9CVC7t69+nw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tgOyvgiVvKi0TAS3emXA50Pjp6UmhGLpKmIU4XJIlCS8wACPXbTvCfwVxh6ZdBVhcaJCRKMuUFlvqWqJg64pjuDJcmjJJrwmoSP98d8td8gjwgjJaBHDz2xrFMmigHcpC2lFlfymTQnH86RDg33MKQVCz0h1FAHXgeM7YeU1ZYc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org; spf=pass smtp.mailfrom=cmpxchg.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b=NVRoLY7g; arc=none smtp.client-ip=209.85.222.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b="NVRoLY7g" Received: by mail-qk1-f182.google.com with SMTP id af79cd13be357-913cc4d7c71so1324086085a.2 for ; Wed, 
27 May 2026 13:48:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg.org; s=google; t=1779914899; x=1780519699; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=36NJZcIxb79ojrPGdd3UkzDEUVDRZxNst0ZLZe3TsZo=; b=NVRoLY7gfcJNCveCkv+9G3Ch6GovJZZD65SGTeTn8jxRuD/LawEekJ+YSmmSyLQxg5 40lZXflbHqx/dFbuyhp0H225GyVIVBh7rWPlWnWNUuVKF+ufOhvQNf4kUoQ3PP/WKkoC hST01QBrH7VgV6y/0ajHM5U5EfrzG/F5eP/GC95zgmnyvEo+t6ZB3QLYBU4bqGOxPaaG MyA2iSRa0cBz7yu5Sa0d9t9lhAi3b3QokVRNadjg1HiPGUxMFoM3B1wqq88zR/9BzYkV X+pAkC9IdSEqJpjetpKqUWHZNFcXPNrBhHUd7ikQTttHcqHnlC7C6t6AfRQW4JTBGmYw a8/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779914899; x=1780519699; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=36NJZcIxb79ojrPGdd3UkzDEUVDRZxNst0ZLZe3TsZo=; b=YEHMTHgYXr7x1ckUl7Ofyq2Ocw1um/Z+xaOwA0+jRrx/JJF8wnO02o6ieMhyg54NOh ouxvCw/2Q3Xj58803eh/Z2vzv/R2ULJiZH65voWF7yLCRIJEMY65RjeRDujGsKB4l316 A7v35llVU6xd3Fo19YgXdDZdXvphhVLHfRHgL0pGBitDSwLBWGMfzNpnweT0lDzgZRrv Z5rzQ0I3H3ToFZfU2WMPKcl04qy7tA4bL1b6wLBKYYf9ABbrrcbfUtXQmISDcClBahsb e7Dtpcznno/ca5o36vY2oHCemkIsoXb3JlpeGFscsxC5+ja8qSohqY5Xb019K5PiY9U7 y8cw== X-Forwarded-Encrypted: i=1; AFNElJ+S7PoY1DAv8OKjy9mm/x5J2mus5KkPS2KuGaqEpBu7+9EkN+3pZN42OKYB0l8o3fcsZpY1PZ3j40oUylc=@vger.kernel.org X-Gm-Message-State: AOJu0YwsMHBMXaWSMMM4WJUBosMlREcBq5/nsdGIbxl/8DQrstbtoTTy MPbOKi6iaGoKUebDxgU8d3/OSXm4dedQTatbZskVVePvxOY2GkOwuzBI5xd40cUvBgg= X-Gm-Gg: Acq92OGKZQi8YiZJVsHtrvZrayXO3bGN5Z+382yQtdt+5R/XGJSbJrm7iM6u6kUPq/E iiiunFhf3z3HDumiu7FmL5EWRqv6yzSZL3bCba0o3RDJ8mEpWudTOrLGdZBnbwIp7P+zNPxpRE9 XsV0ZWJSDQYpYkjd4exJYsN6Iq3BjqV7BGwdELTGHq9XJ7f4tAgwELVjNTKyl4nC5gymmnW7I8Q yxBZq+rAtEA48rg8MbO0iguI0J0JTFbxu5n0vdbeCzPzufAxzfIQE8p9MlaErrfa2Iyr0naHWIe 
dijhQBbL0KSlKDLqZYpYlF+kiX3ujfKf09d/FYMPPDfjIXJDTYgv9sBQfRl+tuQ1RVa5jCJNyU8 x05yoSTDqurPBdAWTd5jbOme2uZxE5iathdrzTNIqIYctRoyhsZNC0hlVwgkSrhqhIT1T1+idWK R7DuP4PbC5nFA3sTiBZqhdoum3bp7NOMQA X-Received: by 2002:a05:620a:2909:b0:8f2:c47b:962e with SMTP id af79cd13be357-914b49f5399mr3644650685a.49.1779914898663; Wed, 27 May 2026 13:48:18 -0700 (PDT) Received: from localhost ([2603:7001:f100:500:365a:60ff:fe62:ff29]) by smtp.gmail.com with ESMTPSA id af79cd13be357-914f87d1a27sm589350185a.28.2026.05.27.13.48.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 May 2026 13:48:17 -0700 (PDT) From: Johannes Weiner To: Andrew Morton Cc: David Hildenbrand , Lorenzo Stoakes , Shakeel Butt , Michal Hocko , Dave Chinner , Roman Gushchin , Muchun Song , Qi Zheng , Yosry Ahmed , Zi Yan , "Liam R . Howlett" , Usama Arif , Kiryl Shutsemau , Vlastimil Babka , Kairui Song , Mikhail Zaslonko , Vasily Gorbik , Baolin Wang , Barry Song , Dev Jain , Lance Yang , Nico Pache , Ryan Roberts , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v5 7/9] mm: list_lru: introduce folio_memcg_list_lru_alloc() Date: Wed, 27 May 2026 16:45:14 -0400 Message-ID: <20260527204757.2544958-8-hannes@cmpxchg.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260527204757.2544958-1-hannes@cmpxchg.org> References: <20260527204757.2544958-1-hannes@cmpxchg.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" memcg_list_lru_alloc() is called every time an object that may end up on the list_lru is created. It needs to quickly check if the list_lru heads for the memcg already exist, and allocate them when they don't. Doing this with folio objects is tricky: folio_memcg() is not stable and requires either RCU protection or pinning the cgroup. 
But it's desirable to make the existence check lightweight under RCU, and only pin the memcg when we need to allocate list_lru heads and may block. In preparation for switching the THP shrinker to list_lru, add a helper function for allocating list_lru heads coming from a folio. Reviewed-by: David Hildenbrand (Arm) Acked-by: Shakeel Butt Reviewed-by: Lorenzo Stoakes (Oracle) Signed-off-by: Johannes Weiner Reported-by: Lance Yang --- include/linux/list_lru.h | 27 +++++++++++++++++++++++++++ mm/list_lru.c | 39 ++++++++++++++++++++++++++++++++++----- 2 files changed, 61 insertions(+), 5 deletions(-) diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index 134cb3e5652a..a450fffe1550 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -81,6 +81,33 @@ static inline int list_lru_init_memcg_key(struct list_lr= u *lru, struct shrinker =20 int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru, gfp_t gfp); + +#ifdef CONFIG_MEMCG +/** + * folio_memcg_list_lru_alloc - allocate list_lru heads for shrinkable fol= io + * @folio: the newly allocated & charged folio + * @lru: the list_lru this might be queued on + * @gfp: gfp mask + * + * Allocate list_lru heads (per-memcg, per-node) needed to queue this + * particular folio down the line. + * + * This does memcg_list_lru_alloc(), but on the memcg that @folio is + * associated with. Handles folio_memcg() access rules in the fast + * path (list_lru heads allocated) and the allocation slowpath. + * + * Returns 0 on success, a negative error value otherwise. 
+ */ +int folio_memcg_list_lru_alloc(struct folio *folio, struct list_lru *lru, + gfp_t gfp); +#else +static inline int folio_memcg_list_lru_alloc(struct folio *folio, + struct list_lru *lru, gfp_t gfp) +{ + return 0; +} +#endif + void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup = *parent); =20 /** diff --git a/mm/list_lru.c b/mm/list_lru.c index 402bb028114d..41a811966063 100644 --- a/mm/list_lru.c +++ b/mm/list_lru.c @@ -568,17 +568,14 @@ static inline bool memcg_list_lru_allocated(struct me= m_cgroup *memcg, return idx < 0 || xa_load(&lru->xa, idx); } =20 -int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru, - gfp_t gfp) +static int __memcg_list_lru_alloc(struct mem_cgroup *memcg, + struct list_lru *lru, gfp_t gfp) { unsigned long flags; struct list_lru_memcg *mlru =3D NULL; struct mem_cgroup *pos, *parent; XA_STATE(xas, &lru->xa, 0); =20 - if (!list_lru_memcg_aware(lru) || memcg_list_lru_allocated(memcg, lru)) - return 0; - gfp &=3D GFP_RECLAIM_MASK; /* * Because the list_lru can be reparented to the parent cgroup's @@ -619,6 +616,38 @@ int memcg_list_lru_alloc(struct mem_cgroup *memcg, str= uct list_lru *lru, =20 return xas_error(&xas); } + +int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru, + gfp_t gfp) +{ + if (!list_lru_memcg_aware(lru) || memcg_list_lru_allocated(memcg, lru)) + return 0; + return __memcg_list_lru_alloc(memcg, lru, gfp); +} + +int folio_memcg_list_lru_alloc(struct folio *folio, struct list_lru *lru, + gfp_t gfp) +{ + struct mem_cgroup *memcg; + int res; + + if (!list_lru_memcg_aware(lru)) + return 0; + + /* Fast path when list_lru heads already exist */ + rcu_read_lock(); + memcg =3D folio_memcg(folio); + res =3D memcg_list_lru_allocated(memcg, lru); + rcu_read_unlock(); + if (likely(res)) + return 0; + + /* Allocation may block, pin the memcg */ + memcg =3D get_mem_cgroup_from_folio(folio); + res =3D __memcg_list_lru_alloc(memcg, lru, gfp); + mem_cgroup_put(memcg); + return 
res; +} #else static inline void memcg_init_list_lru(struct list_lru *lru, bool memcg_aw= are) { --=20 2.54.0 From nobody Mon Jun 8 17:39:47 2026 Received: from mail-qk1-f182.google.com (mail-qk1-f182.google.com [209.85.222.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 804F03B637F for ; Wed, 27 May 2026 20:48:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779914906; cv=none; b=pbfLmQfVlFNH6yp3jo3LkpXToPrDdFr+DFf2HvJo2PQ2svvWO3HrNhFM5URZHpgEWFYia+w2E5Dwk0GOCv8xFvZCEML4DjrAZsLS/ptb9GtQKLBVJ+9AgJHIoV1H/hPlKh6C6XXILqYY2sFPNapZK2IJT2ASjxQISYb0fYgVKyA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779914906; c=relaxed/simple; bh=l5utMigt3ZQHnFSdncqv9s5JEG4L2+Jjugd/3yPMeAo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TVV8MOma+KcFAA1iafknf7PLA8rq7pdbd7ln7ql7ICx3OFUEiOZxN2Ol/berHPvKW8SGQcLA7XoEPiWR6y3uWrBDBAIWr3XDyrQo5Y2BigLpftaexDIzIZe2bXSS+HVdIIyJ5bO0CFany2msGydITKsKUkYetDHmttiBSX3DLcw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org; spf=pass smtp.mailfrom=cmpxchg.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b=cbV/DC5v; arc=none smtp.client-ip=209.85.222.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b="cbV/DC5v" Received: by mail-qk1-f182.google.com with SMTP id af79cd13be357-91173f20ccdso539464385a.0 for ; Wed, 27 May 2026 13:48:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=cmpxchg.org; s=google; t=1779914901; x=1780519701; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=B4EOQagJ0AFWLKeiqsu95H2CZJTRs9F1Y/QS2dFLRqc=; b=cbV/DC5v+IpAs92+MCZxGBOmglWChdoH+QI93I2+2BRDcmeAewQN31l4VkD1mswSGY p3zS7NsiJU+N+i0azB/mlU/hhjh2owlnrqjoPDlP+v/BxhXjCJiYOaKzpjxtX7OKL0JH V9w3z4CzpzdTmrDbll5zByIUQ4BdtTASGz0+TUG5SqBYU2S57Uh0sKTWVAY5uIi5CP98 TtHyApApfykaplIucc/g8BKTFcwWMfhgfgYQtvcHwvpE6wnqdjzQnVSlIrm2+JhIFFA5 KeokoPBQkBma8n+PJyGxIl506d+x1gTTww4/Ifp59ntM0KYoLrI2mxS0bSglF/aR8tYw CJXw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779914901; x=1780519701; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=B4EOQagJ0AFWLKeiqsu95H2CZJTRs9F1Y/QS2dFLRqc=; b=kGTgi/VArYLF+kw/HDVYktwDsKJuCnNB/5pbD9IH6xT0f7E9yWDoUSod0SW3HPivQ5 HRkWzUbsjIcjelWPyDjU7fuIQPLyXk1yHuFndCEqoxD4bOJOrETKP6WQXGNDaKyrk6KM okoA2NyImqKPgNv4nYS211woctqOgQObYTYhjGF+Fxl3cOScKXp8wLkZ75kHYBD6fVYn W0eW96nNZN76nLJHdzBbQSoAuu6SgU0bT0ggNPoYE6X8PTq8SK+9wNhT207814qV/CrJ 19ujuYlx4FoPLim+5Txg1sysGpwxZOD14TQbXLxnus6XdEMLNUACo4QtKw7h/iIXMBgh hhhA== X-Forwarded-Encrypted: i=1; AFNElJ/fBLGXE1xREZpLzkEMCZ/7WU0Un8ezjGQ0EjhTVPkSV4qtmyD1p/vYV7yb7ltNfFBhjnZ6F9kGMsb4bxw=@vger.kernel.org X-Gm-Message-State: AOJu0YzpMzmZoOjihEWWMY/rTHmrKYa6SrdvV6DtRxL/Kn+qm0Sj+Ixb sV+0w8oRHIOiaSSwdQNu8ZoD90VEVHTmHeIeYRjijrTw0F/viTHQQjgIY7SsJYqyDX8= X-Gm-Gg: Acq92OH6JbURhUViPZTUwvuxeuIFrSl3RgFS1IWy3jxCoj0NKKGi3GiktIuoKDiE6rI DvLlMfEnjrHqvp0BB9ypUpSuLy4r5GrF2KSQ79ngChTrHkI6f9OK9b/geayjgz88hAq21rYYGWz lUChwzqwjAx1BV87ntLQCpKg9dMf28MxgJ1TQhI9iGbopvkHqr9KpqO7ReSf7V330t0mAssxvEh VeabV/uSGXsn354fiktrEFLllCtqzIQPwtBr8Vh20wOXdsVIszsdMrZ9mpMZOo3p5Y26BUF1/pj rb6YcQaxZGE8yVgRRjMJqFdHcHNYv9OeLoUDy/VzTewpv3ZxBUU/vgZPHxaB9QYaB7EevL+4jR6 
SJmdkmJZUcBRAs1sCNu6MJ1UdJlCvDvJHnUAsanIrev/s8dj19pd/jfsXr21h8PBq7F0kcV4riQ aNGgov+Rs6F04BjCtG3o3U1m1ugqlTdqE5 X-Received: by 2002:a05:620a:4408:b0:902:eacc:bf88 with SMTP id af79cd13be357-914b4a19480mr3620622585a.62.1779914900580; Wed, 27 May 2026 13:48:20 -0700 (PDT) Received: from localhost ([2603:7001:f100:500:365a:60ff:fe62:ff29]) by smtp.gmail.com with ESMTPSA id af79cd13be357-914f8810a50sm601500585a.41.2026.05.27.13.48.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 May 2026 13:48:19 -0700 (PDT) From: Johannes Weiner To: Andrew Morton Cc: David Hildenbrand , Lorenzo Stoakes , Shakeel Butt , Michal Hocko , Dave Chinner , Roman Gushchin , Muchun Song , Qi Zheng , Yosry Ahmed , Zi Yan , "Liam R . Howlett" , Usama Arif , Kiryl Shutsemau , Vlastimil Babka , Kairui Song , Mikhail Zaslonko , Vasily Gorbik , Baolin Wang , Barry Song , Dev Jain , Lance Yang , Nico Pache , Ryan Roberts , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v5 8/9] mm: memory: flatten alloc_anon_folio() retry loop Date: Wed, 27 May 2026 16:45:15 -0400 Message-ID: <20260527204757.2544958-9-hannes@cmpxchg.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260527204757.2544958-1-hannes@cmpxchg.org> References: <20260527204757.2544958-1-hannes@cmpxchg.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" alloc_anon_folio() uses a top-level if (folio) that buries the success path four levels deep. This makes for awkward long lines and wrapping. The next patch will add more code here, so flatten this now to keep things clean and simple. The next label is already there, use it for !folio. No functional change intended. 
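As a rough userspace illustration of the loop shape this patch aims for
(a sketch, not kernel code): struct obj, struct policy, mock_alloc(),
mock_charge_fails() and alloc_with_fallback() are made-up names, and the
sketch simply counts orders down, whereas the real loop walks an order
bitmask with next_order().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a folio; only the order matters here. */
struct obj {
	int order;
};

/* Bitmasks of orders for which the mock allocation or charge fails. */
struct policy {
	unsigned int alloc_fail_mask;
	unsigned int charge_fail_mask;
	int fallbacks;	/* counts how often the loop reached "next" */
};

static struct obj *mock_alloc(const struct policy *p, int order)
{
	struct obj *o;

	if (p->alloc_fail_mask & (1u << order))
		return NULL;
	o = malloc(sizeof(*o));
	if (o)
		o->order = order;
	return o;
}

static bool mock_charge_fails(const struct policy *p, const struct obj *o)
{
	return (p->charge_fail_mask & (1u << o->order)) != 0;
}

/*
 * Flattened retry loop: every failure bails out to "next" early, so
 * the success path stays at the loop's top indentation level.
 */
static struct obj *alloc_with_fallback(struct policy *p, int max_order)
{
	struct obj *o;
	int order;

	assert(max_order >= 0 && max_order < 32);

	for (order = max_order; order >= 0; order--) {
		o = mock_alloc(p, order);
		if (!o)
			goto next;
		if (mock_charge_fails(p, o)) {
			free(o);
			goto next;
		}
		return o;	/* success path, not nested */
next:
		p->fallbacks++;
	}
	return NULL;
}
```

Both failure cases share the one label, so any further check added
before the return only needs its own cleanup and a "goto next".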
Suggested-by: Lorenzo Stoakes (Oracle) Signed-off-by: Johannes Weiner Acked-by: Usama Arif Acked-by: Shakeel Butt Reported-by: Lance Yang Reviewed-by: Dev Jain --- mm/memory.c | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 7c020995eafc..135f5c0f57bd 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -5215,24 +5215,24 @@ static struct folio *alloc_anon_folio(struct vm_fau= lt *vmf) while (orders) { addr =3D ALIGN_DOWN(vmf->address, PAGE_SIZE << order); folio =3D vma_alloc_folio(gfp, order, vma, addr); - if (folio) { - if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) { - count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE); - folio_put(folio); - goto next; - } - folio_throttle_swaprate(folio, gfp); - /* - * When a folio is not zeroed during allocation - * (__GFP_ZERO not used) or user folios require special - * handling, folio_zero_user() is used to make sure - * that the page corresponding to the faulting address - * will be hot in the cache after zeroing. - */ - if (user_alloc_needs_zeroing()) - folio_zero_user(folio, vmf->address); - return folio; + if (!folio) + goto next; + if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) { + count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE); + folio_put(folio); + goto next; } + folio_throttle_swaprate(folio, gfp); + /* + * When a folio is not zeroed during allocation + * (__GFP_ZERO not used) or user folios require special + * handling, folio_zero_user() is used to make sure + * that the page corresponding to the faulting address + * will be hot in the cache after zeroing. 
+ */ + if (user_alloc_needs_zeroing()) + folio_zero_user(folio, vmf->address); + return folio; next: count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK); order =3D next_order(&orders, order); --=20 2.54.0 From nobody Mon Jun 8 17:39:47 2026 Received: from mail-qv1-f51.google.com (mail-qv1-f51.google.com [209.85.219.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B1D622DAFBB for ; Wed, 27 May 2026 20:48:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779914907; cv=none; b=D4PNbMze3e78YutYg4WhgaGZRM4uxBHT94yLPwIhIHFZd+agphqks5orMhxkNoDZhOyVpWb3botRfOREbvFK7u4Z87RBXEvxyzmwc8fUwtj+HEPVwoW00KXIbCj+JdBUxJA46t8cxBxyH4TFDXYsoCHZNc7K+rqd7W9fN5/67Bk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779914907; c=relaxed/simple; bh=oj17sk2CsqQBJK+6xCoeOutECmtNJDXlZaawujhA8hI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=vD0i8cRzWGdN8dlsDG1kFkKvf+fLg66n2asuzrMeKG/ASoi9JX9CCwzUU3M4yeE4bHp6+CDTQHKljQ94gDZ4+72Lsus0ZuL+ZgPj4BNP87JJS1dBxWDyKJpz0/qEE3YG8vDe3KZCVE2ppuEEDqEo8XcPHrjXNxoR9eth4ErXnCg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org; spf=pass smtp.mailfrom=cmpxchg.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b=DVGct6zh; arc=none smtp.client-ip=209.85.219.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b="DVGct6zh" Received: by mail-qv1-f51.google.com with SMTP id 6a1803df08f44-8cc715824a0so49589656d6.3 for ; 
Wed, 27 May 2026 13:48:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg.org; s=google; t=1779914903; x=1780519703; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=6WV+GIG3sgQzlSKpDlDd5qT1ioFws4ENdSbDD6GcE1k=; b=DVGct6zhmbHCfR60rrzbt/aQE0WBrI6FuGLJEGd9TKZ3+nEtIHG6bNJpJ5yQ7cYsOC Nc8m1edo+k0gv4EJlRw8gdSvFSj7s88/G3bMnZp7obsy401UB+RsRfw0l8K3hmHAu2LJ 5hm7PUv4RrsIm3b8eWojnhy3kc/HX31hq99duY6GpSGb+s2HlBQ/FTdXumCOF2Xf2YsS opR76ySmQ/7JkUsYOKk194febAXdB4pHskgqMUCMG8zkSYc0MXOPNv5bXHMMVbBXJcHr iYcTl/zGQpy/Hp/awZhB8ISBjQuPIFq2LyMuNdMzmQR3VflUAIzyED6ird0IVXtq8lWL iSwg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779914903; x=1780519703; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=6WV+GIG3sgQzlSKpDlDd5qT1ioFws4ENdSbDD6GcE1k=; b=OJYyR3Dgf52qE05/WwsThhdelTXB+2fpLqsa+vnIUHkbICrVInh+Qne5R+hKU/Q3fI 0tvNDF3ux92G7pw9gznxcjPiUnezNOMGtA7o/oCS9/6lA3UEvtPVtqeqHXdWp+A2574u BnUNytJn3LApeYlLdjNKPvvYJhUfm62eHaQVMQ8s57JV8sr5dNvjQ7nQ8ij/sLJbM0yM XYhZZdhhvIdS1kS/Er7eCX/lAdOuxd/iri7mnxQVrDyR4BpGpYVkF1XWwvLkVAqCeaoq S6WYBBIVnovAtrB+r63p43dC3FJ6DKsdNhsGhM1qJ9HkuRmGwMY7dX6zKsXvPb/L37NK WEcA== X-Forwarded-Encrypted: i=1; AFNElJ/HQrZ1Io/gVmdzTl7TimqlvAoE4EFEFiQg9uR1pKs28OmcS65rpOpslLs5vnBUAhuHSgJooFLXaJOtngs=@vger.kernel.org X-Gm-Message-State: AOJu0Yy3mR2azHqCXzI+n2lwm0n6tDoNyq0BroXUL3a3cpl1Rdco3DFI VYHexKbgSw50MOw0VUvRPPu9K/lVYqOiolSf1sZlTrLwtn5D9+shUgsbBIoPrAD+uQ8= X-Gm-Gg: Acq92OHxD3X616wtKleYDENthDAlCQOut+eeaWoMavEVHZ7RdDlFyUdMQTnzBAE86rf 8jTdfay3bnhVPzhXN0QffMklnEML7RKCX4jyLFYMaF3TnckTPkcgBB0zzUO/2hJzbu9qRPbSMi9 Nv8rgHBYuAxiMg5ELyuaQL7lca8sEcrXcmS1+LhC25LjGlINWxsasIyzNnkYWSHKx/wsjIcaOff apRp8JOymL9wEj5r2ss8B5KvOBdPh9frefF1GxU7lHyZXNoNjyP13MMV0R2NVKFOIDISFJXQttL 
qZHhPlnc7KTdB8byhAXQ7h8W4Rjt9ExFhwHbdsFNheg6C/zunb/LUEym4oYhFgo4pzL2WwYitNb 8CpcraOQMZKVYB/AqVkj45QF3z5MYWcY/mp59OlhXSRjoIFUsyTK+//jaFyYHpQ4HS09d2wlk8o FRqMwuvJIj+RCH/L1EhU1LmJUdcm3jnFu0 X-Received: by 2002:a05:6214:2f85:b0:8ca:57f:d444 with SMTP id 6a1803df08f44-8cc7b5a7f4emr119188776d6.28.1779914902483; Wed, 27 May 2026 13:48:22 -0700 (PDT) Received: from localhost ([2603:7001:f100:500:365a:60ff:fe62:ff29]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-8cc812df29esm179594216d6.29.2026.05.27.13.48.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 May 2026 13:48:21 -0700 (PDT) From: Johannes Weiner To: Andrew Morton Cc: David Hildenbrand , Lorenzo Stoakes , Shakeel Butt , Michal Hocko , Dave Chinner , Roman Gushchin , Muchun Song , Qi Zheng , Yosry Ahmed , Zi Yan , "Liam R . Howlett" , Usama Arif , Kiryl Shutsemau , Vlastimil Babka , Kairui Song , Mikhail Zaslonko , Vasily Gorbik , Baolin Wang , Barry Song , Dev Jain , Lance Yang , Nico Pache , Ryan Roberts , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v5 9/9] mm: switch deferred split shrinker to list_lru Date: Wed, 27 May 2026 16:45:16 -0400 Message-ID: <20260527204757.2544958-10-hannes@cmpxchg.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260527204757.2544958-1-hannes@cmpxchg.org> References: <20260527204757.2544958-1-hannes@cmpxchg.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The deferred split queue handles cgroups in a suboptimal fashion. The queue is per-NUMA node or per-cgroup, not the intersection. 
That means on a cgrouped system, a node-restricted allocation entering
reclaim can end up splitting large pages on other nodes:

    alloc/unmap
      deferred_split_folio()
        list_add_tail(memcg->split_queue)
        set_shrinker_bit(memcg, node, deferred_shrinker_id)

    for_each_zone_zonelist_nodemask(restricted_nodes)
      mem_cgroup_iter()
        shrink_slab(node, memcg)
          shrink_slab_memcg(node, memcg)
            if test_shrinker_bit(memcg, node, deferred_shrinker_id)
              deferred_split_scan()
                walks memcg->split_queue

The shrinker bit adds an imperfect guard rail. As soon as the cgroup has
a single large page on the node of interest, all large pages owned by
that memcg, including those on other nodes, will be split.

list_lru properly sets up per-node, per-cgroup lists. As a bonus, it
streamlines a lot of the list operations and reclaim walks. It's used
widely by other major shrinkers already. Convert the deferred split
queue as well.

The list_lru per-memcg heads are instantiated on demand when the first
object of interest is allocated for a cgroup, by calling
folio_memcg_alloc_deferred(). Add calls to where splittable pages are
created: anon faults, swapin faults, khugepaged collapse.

These calls create all possible node heads for the cgroup at once, so
the migration code (between nodes) doesn't need any special care.
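
The indexing difference described above can be sketched with a toy
userspace model. This is only an illustration under stated assumptions:
struct item, scan_per_cgroup() and scan_per_node_cgroup() are made-up
names, and a flat array stands in for the kernel's actual lists.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES   2
#define NR_CGROUPS 2

/* Hypothetical stand-in for a queued large folio. */
struct item {
	int nid;	/* node the memory lives on */
	int cg;		/* owning cgroup */
};

/*
 * Old scheme (modelled): one queue per cgroup, so a scan on behalf
 * of (nid, cg) walks every item of that cgroup, whatever its node.
 */
static size_t scan_per_cgroup(const struct item *items, size_t n,
			      int nid, int cg)
{
	size_t i, visited = 0;

	(void)nid;	/* the queue itself carries no node information */
	for (i = 0; i < n; i++)
		if (items[i].cg == cg)
			visited++;
	return visited;
}

/*
 * list_lru-like scheme (modelled): one list per (node, cgroup)
 * pair, so the same scan only sees items on the requested node.
 */
static size_t scan_per_node_cgroup(const struct item *items, size_t n,
				   int nid, int cg)
{
	size_t i, visited = 0;

	assert(nid >= 0 && nid < NR_NODES);
	assert(cg >= 0 && cg < NR_CGROUPS);
	for (i = 0; i < n; i++)
		if (items[i].cg == cg && items[i].nid == nid)
			visited++;
	return visited;
}
```

With one item of cgroup 0 on node 0 and two on node 1, a node-0 scan
visits all three items in the first model but only one in the second.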
Reported-by: Mikhail Zaslonko Tested-by: Mikhail Zaslonko Acked-by: Shakeel Butt Reviewed-by: Lorenzo Stoakes (Oracle) Signed-off-by: Johannes Weiner Acked-by: Johannes Weiner Acked-by: Usama Arif Reported-by: Lance Yang Reviewed-by: Kairui Song --- include/linux/huge_mm.h | 7 +- include/linux/memcontrol.h | 4 - include/linux/mmzone.h | 12 -- mm/huge_memory.c | 364 +++++++++++++------------------------ mm/internal.h | 2 +- mm/khugepaged.c | 5 + mm/memcontrol.c | 12 +- mm/memory.c | 4 + mm/mm_init.c | 15 -- mm/swap_state.c | 10 + 10 files changed, 150 insertions(+), 285 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index edece3e26985..f6c2531a27a3 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -423,10 +423,10 @@ static inline int split_huge_page(struct page *page) { return split_huge_page_to_list_to_order(page, NULL, 0); } + +int folio_memcg_alloc_deferred(struct folio *folio); + void deferred_split_folio(struct folio *folio, bool partially_mapped); -#ifdef CONFIG_MEMCG -void reparent_deferred_split_queue(struct mem_cgroup *memcg); -#endif =20 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, bool freeze); @@ -664,7 +664,6 @@ static inline int folio_split(struct folio *folio, unsi= gned int new_order, } =20 static inline void deferred_split_folio(struct folio *folio, bool partiall= y_mapped) {} -static inline void reparent_deferred_split_queue(struct mem_cgroup *memcg)= {} #define split_huge_pmd(__vma, __pmd, __address) \ do { } while (0) =20 diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index bf1a6e131eca..20404e59fb3b 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -278,10 +278,6 @@ struct mem_cgroup { struct memcg_cgwb_frn cgwb_frn[MEMCG_CGWB_FRN_CNT]; #endif =20 -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - struct deferred_split deferred_split_queue; -#endif - #ifdef CONFIG_LRU_GEN_WALKS_MMU /* per-memcg mm_struct list */ struct 
lru_gen_mm_list mm_list; diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 1331a7b93f33..8e449f524f26 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1431,14 +1431,6 @@ struct zonelist { */ extern struct page *mem_map; =20 -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -struct deferred_split { - spinlock_t split_queue_lock; - struct list_head split_queue; - unsigned long split_queue_len; -}; -#endif - #ifdef CONFIG_MEMORY_FAILURE /* * Per NUMA node memory failure handling statistics. @@ -1564,10 +1556,6 @@ typedef struct pglist_data { unsigned long first_deferred_pfn; #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ =20 -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - struct deferred_split deferred_split_queue; -#endif - #ifdef CONFIG_NUMA_BALANCING /* start time in ms of current promote rate limit period */ unsigned int nbp_rl_start; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index bf9b480bb3b0..72f6caf0fec6 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -67,6 +68,8 @@ unsigned long transparent_hugepage_flags __read_mostly = =3D (1<count_objects =3D deferred_split_count; deferred_split_shrinker->scan_objects =3D deferred_split_scan; shrinker_register(deferred_split_shrinker); @@ -973,6 +990,7 @@ static int __init thp_shrinker_init(void) huge_zero_folio_shrinker =3D shrinker_alloc(0, "thp-zero"); if (!huge_zero_folio_shrinker) { shrinker_free(deferred_split_shrinker); + list_lru_destroy(&deferred_split_lru); return -ENOMEM; } =20 @@ -987,6 +1005,7 @@ static void __init thp_shrinker_exit(void) { shrinker_free(huge_zero_folio_shrinker); shrinker_free(deferred_split_shrinker); + list_lru_destroy(&deferred_split_lru); } =20 static int __init hugepage_init(void) @@ -1166,119 +1185,6 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_s= truct *vma) return pmd; } =20 -static struct deferred_split *split_queue_node(int nid) -{ - struct pglist_data *pgdata =3D 
NODE_DATA(nid); - - return &pgdata->deferred_split_queue; -} - -#ifdef CONFIG_MEMCG -static inline -struct mem_cgroup *folio_split_queue_memcg(struct folio *folio, - struct deferred_split *queue) -{ - if (mem_cgroup_disabled()) - return NULL; - if (split_queue_node(folio_nid(folio)) =3D=3D queue) - return NULL; - return container_of(queue, struct mem_cgroup, deferred_split_queue); -} - -static struct deferred_split *memcg_split_queue(int nid, struct mem_cgroup= *memcg) -{ - return memcg ? &memcg->deferred_split_queue : split_queue_node(nid); -} -#else -static inline -struct mem_cgroup *folio_split_queue_memcg(struct folio *folio, - struct deferred_split *queue) -{ - return NULL; -} - -static struct deferred_split *memcg_split_queue(int nid, struct mem_cgroup= *memcg) -{ - return split_queue_node(nid); -} -#endif - -static struct deferred_split *split_queue_lock(int nid, struct mem_cgroup = *memcg) -{ - struct deferred_split *queue; - -retry: - queue =3D memcg_split_queue(nid, memcg); - spin_lock(&queue->split_queue_lock); - /* - * There is a period between setting memcg to dying and reparenting - * deferred split queue, and during this period the THPs in the deferred - * split queue will be hidden from the shrinker side. 
- */ - if (unlikely(memcg_is_dying(memcg))) { - spin_unlock(&queue->split_queue_lock); - memcg =3D parent_mem_cgroup(memcg); - goto retry; - } - - return queue; -} - -static struct deferred_split * -split_queue_lock_irqsave(int nid, struct mem_cgroup *memcg, unsigned long = *flags) -{ - struct deferred_split *queue; - -retry: - queue =3D memcg_split_queue(nid, memcg); - spin_lock_irqsave(&queue->split_queue_lock, *flags); - if (unlikely(memcg_is_dying(memcg))) { - spin_unlock_irqrestore(&queue->split_queue_lock, *flags); - memcg =3D parent_mem_cgroup(memcg); - goto retry; - } - - return queue; -} - -static struct deferred_split *folio_split_queue_lock(struct folio *folio) -{ - struct deferred_split *queue; - - rcu_read_lock(); - queue =3D split_queue_lock(folio_nid(folio), folio_memcg(folio)); - /* - * The memcg destruction path is acquiring the split queue lock for - * reparenting. Once you have it locked, it's safe to drop the rcu lock. - */ - rcu_read_unlock(); - - return queue; -} - -static struct deferred_split * -folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags) -{ - struct deferred_split *queue; - - rcu_read_lock(); - queue =3D split_queue_lock_irqsave(folio_nid(folio), folio_memcg(folio), = flags); - rcu_read_unlock(); - - return queue; -} - -static inline void split_queue_unlock(struct deferred_split *queue) -{ - spin_unlock(&queue->split_queue_lock); -} - -static inline void split_queue_unlock_irqrestore(struct deferred_split *qu= eue, - unsigned long flags) -{ - spin_unlock_irqrestore(&queue->split_queue_lock, flags); -} - static inline bool is_transparent_hugepage(const struct folio *folio) { if (!folio_test_large(folio)) @@ -1379,6 +1285,14 @@ static struct folio *vma_alloc_anon_folio_pmd(struct= vm_area_struct *vma, count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE); return NULL; } + + if (folio_memcg_alloc_deferred(folio)) { + folio_put(folio); + count_vm_event(THP_FAULT_FALLBACK); + count_mthp_stat(order, 
MTHP_STAT_ANON_FAULT_FALLBACK); + return NULL; + } + folio_throttle_swaprate(folio, gfp); =20 /* @@ -3890,34 +3804,43 @@ static int __folio_freeze_and_split_unmapped(struct= folio *folio, unsigned int n struct folio *end_folio =3D folio_next(folio); struct folio *new_folio, *next; int old_order =3D folio_order(folio); + struct list_lru_one *lru; + bool dequeue_deferred; int ret =3D 0; - struct deferred_split *ds_queue; =20 VM_WARN_ON_ONCE(!mapping && end); - /* Prevent deferred_split_scan() touching ->_refcount */ - ds_queue =3D folio_split_queue_lock(folio); + /* + * If this folio can be on the deferred split queue, lock out + * the shrinker before freezing the ref. If the shrinker sees + * a 0-ref folio, it assumes it beat folio_put() to the list + * lock and must clean up the LRU state - the same dequeue we + * will do below as part of the split. + */ + dequeue_deferred =3D folio_test_anon(folio) && old_order > 1; + if (dequeue_deferred) { + struct mem_cgroup *memcg; + + rcu_read_lock(); + memcg =3D folio_memcg(folio); + lru =3D list_lru_lock(&deferred_split_lru, + folio_nid(folio), &memcg); + } if (folio_ref_freeze(folio, folio_cache_ref_count(folio) + 1)) { struct swap_cluster_info *ci =3D NULL; struct lruvec *lruvec; =20 - if (old_order > 1) { - if (!list_empty(&folio->_deferred_list)) { - ds_queue->split_queue_len--; - /* - * Reinitialize page_deferred_list after removing the - * page from the split_queue, otherwise a subsequent - * split will see list corruption when checking the - * page_deferred_list. 
- */ - list_del_init(&folio->_deferred_list); - } + if (dequeue_deferred) { + __list_lru_del(&deferred_split_lru, lru, + &folio->_deferred_list, folio_nid(folio)); if (folio_test_partially_mapped(folio)) { folio_clear_partially_mapped(folio); mod_mthp_stat(old_order, MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1); } + list_lru_unlock(lru); + rcu_read_unlock(); } - split_queue_unlock(ds_queue); + if (mapping) { int nr =3D folio_nr_pages(folio); =20 @@ -4017,7 +3940,10 @@ static int __folio_freeze_and_split_unmapped(struct = folio *folio, unsigned int n if (ci) swap_cluster_unlock(ci); } else { - split_queue_unlock(ds_queue); + if (dequeue_deferred) { + list_lru_unlock(lru); + rcu_read_unlock(); + } return -EAGAIN; } =20 @@ -4383,33 +4309,37 @@ int split_folio_to_list(struct folio *folio, struct= list_head *list) * queueing THP splits, and that list is (racily observed to be) non-empty. * * It is unsafe to call folio_unqueue_deferred_split() until folio refcoun= t is - * zero: because even when split_queue_lock is held, a non-empty _deferred= _list - * might be in use on deferred_split_scan()'s unlocked on-stack list. + * zero: because even when the list_lru lock is held, a non-empty + * _deferred_list might be in use on deferred_split_scan()'s unlocked + * on-stack list. * - * If memory cgroups are enabled, split_queue_lock is in the mem_cgroup: i= t is - * therefore important to unqueue deferred split before changing folio mem= cg. + * The list_lru sublist is determined by folio's memcg: it is therefore + * important to unqueue deferred split before changing folio memcg. 
*/ bool __folio_unqueue_deferred_split(struct folio *folio) { - struct deferred_split *ds_queue; + struct mem_cgroup *memcg; + struct list_lru_one *lru; + int nid =3D folio_nid(folio); unsigned long flags; bool unqueued =3D false; =20 WARN_ON_ONCE(folio_ref_count(folio)); WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg_charged(folio)); =20 - ds_queue =3D folio_split_queue_lock_irqsave(folio, &flags); - if (!list_empty(&folio->_deferred_list)) { - ds_queue->split_queue_len--; + rcu_read_lock(); + memcg =3D folio_memcg(folio); + lru =3D list_lru_lock_irqsave(&deferred_split_lru, nid, &memcg, &flags); + if (__list_lru_del(&deferred_split_lru, lru, &folio->_deferred_list, nid)= ) { if (folio_test_partially_mapped(folio)) { folio_clear_partially_mapped(folio); mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1); } - list_del_init(&folio->_deferred_list); unqueued =3D true; } - split_queue_unlock_irqrestore(ds_queue, flags); + list_lru_unlock_irqrestore(lru, &flags); + rcu_read_unlock(); =20 return unqueued; /* useful for debug warnings */ } @@ -4417,7 +4347,9 @@ bool __folio_unqueue_deferred_split(struct folio *fol= io) /* partially_mapped=3Dfalse won't clear PG_partially_mapped folio flag */ void deferred_split_folio(struct folio *folio, bool partially_mapped) { - struct deferred_split *ds_queue; + struct list_lru_one *lru; + int nid; + struct mem_cgroup *memcg; unsigned long flags; =20 /* @@ -4440,7 +4372,11 @@ void deferred_split_folio(struct folio *folio, bool = partially_mapped) if (folio_test_swapcache(folio)) return; =20 - ds_queue =3D folio_split_queue_lock_irqsave(folio, &flags); + nid =3D folio_nid(folio); + + rcu_read_lock(); + memcg =3D folio_memcg(folio); + lru =3D list_lru_lock_irqsave(&deferred_split_lru, nid, &memcg, &flags); if (partially_mapped) { if (!folio_test_partially_mapped(folio)) { folio_set_partially_mapped(folio); @@ -4448,36 +4384,20 @@ void deferred_split_folio(struct folio *folio, bool= partially_mapped) 
count_vm_event(THP_DEFERRED_SPLIT_PAGE); count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED); mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, 1= ); - } } else { /* partially mapped folios cannot become non-partially mapped */ VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio); } - if (list_empty(&folio->_deferred_list)) { - struct mem_cgroup *memcg; - - memcg =3D folio_split_queue_memcg(folio, ds_queue); - list_add_tail(&folio->_deferred_list, &ds_queue->split_queue); - ds_queue->split_queue_len++; - if (memcg) - set_shrinker_bit(memcg, folio_nid(folio), - shrinker_id(deferred_split_shrinker)); - } - split_queue_unlock_irqrestore(ds_queue, flags); + __list_lru_add(&deferred_split_lru, lru, &folio->_deferred_list, nid, mem= cg); + list_lru_unlock_irqrestore(lru, &flags); + rcu_read_unlock(); } =20 static unsigned long deferred_split_count(struct shrinker *shrink, struct shrink_control *sc) { - struct pglist_data *pgdata =3D NODE_DATA(sc->nid); - struct deferred_split *ds_queue =3D &pgdata->deferred_split_queue; - -#ifdef CONFIG_MEMCG - if (sc->memcg) - ds_queue =3D &sc->memcg->deferred_split_queue; -#endif - return READ_ONCE(ds_queue->split_queue_len); + return list_lru_shrink_count(&deferred_split_lru, sc); } =20 static bool thp_underused(struct folio *folio) @@ -4507,45 +4427,49 @@ static bool thp_underused(struct folio *folio) return false; } =20 +static enum lru_status deferred_split_isolate(struct list_head *item, + struct list_lru_one *lru, + void *cb_arg) +{ + struct folio *folio =3D container_of(item, struct folio, _deferred_list); + struct list_head *freeable =3D cb_arg; + + if (folio_try_get(folio)) { + list_lru_isolate_move(lru, item, freeable); + return LRU_REMOVED; + } + + /* + * We lost race with folio_put(). Read folio state before the + * isolate: folio_unqueue_deferred_split() checks list_empty() + * locklessly, so once removed the folio can be freed any time. 
+ */ + if (folio_test_partially_mapped(folio)) { + folio_clear_partially_mapped(folio); + mod_mthp_stat(folio_order(folio), + MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1); + } + list_lru_isolate(lru, item); + return LRU_REMOVED; +} + static unsigned long deferred_split_scan(struct shrinker *shrink, struct shrink_control *sc) { - struct deferred_split *ds_queue; - unsigned long flags; + LIST_HEAD(dispose); struct folio *folio, *next; - int split =3D 0, i; - struct folio_batch fbatch; - - folio_batch_init(&fbatch); + int split =3D 0; + unsigned long isolated; =20 -retry: - ds_queue =3D split_queue_lock_irqsave(sc->nid, sc->memcg, &flags); - /* Take pin on all head pages to avoid freeing them under us */ - list_for_each_entry_safe(folio, next, &ds_queue->split_queue, - _deferred_list) { - if (folio_try_get(folio)) { - folio_batch_add(&fbatch, folio); - } else if (folio_test_partially_mapped(folio)) { - /* We lost race with folio_put() */ - folio_clear_partially_mapped(folio); - mod_mthp_stat(folio_order(folio), - MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1); - } - list_del_init(&folio->_deferred_list); - ds_queue->split_queue_len--; - if (!--sc->nr_to_scan) - break; - if (!folio_batch_space(&fbatch)) - break; - } - split_queue_unlock_irqrestore(ds_queue, flags); + isolated =3D list_lru_shrink_walk_irq(&deferred_split_lru, sc, + deferred_split_isolate, &dispose); =20 - for (i =3D 0; i < folio_batch_count(&fbatch); i++) { + list_for_each_entry_safe(folio, next, &dispose, _deferred_list) { bool did_split =3D false; bool underused =3D false; - struct deferred_split *fqueue; =20 - folio =3D fbatch.folios[i]; + list_del_init(&folio->_deferred_list); + if (!folio_test_partially_mapped(folio)) { /* * See try_to_map_unused_to_zeropage(): we cannot @@ -4574,63 +4498,23 @@ static unsigned long deferred_split_scan(struct shr= inker *shrink, * underused, then consider it used and don't add it back to * split_queue. 
*/ - if (did_split || !folio_test_partially_mapped(folio)) - continue; + if (!did_split && folio_test_partially_mapped(folio)) { requeue: - /* - * Add back partially mapped folios, or underused folios that - * we could not lock this round. - */ - fqueue =3D folio_split_queue_lock_irqsave(folio, &flags); - if (list_empty(&folio->_deferred_list)) { - list_add_tail(&folio->_deferred_list, &fqueue->split_queue); - fqueue->split_queue_len++; + rcu_read_lock(); + list_lru_add_irq(&deferred_split_lru, + &folio->_deferred_list, + folio_nid(folio), + folio_memcg(folio)); + rcu_read_unlock(); } - split_queue_unlock_irqrestore(fqueue, flags); - } - folios_put(&fbatch); - - if (sc->nr_to_scan && !list_empty(&ds_queue->split_queue)) { - cond_resched(); - goto retry; + folio_put(folio); } =20 - /* - * Stop shrinker if we didn't split any page, but the queue is empty. - * This can happen if pages were freed under us. - */ - if (!split && list_empty(&ds_queue->split_queue)) + if (!split && !isolated) return SHRINK_STOP; return split; } =20 -#ifdef CONFIG_MEMCG -void reparent_deferred_split_queue(struct mem_cgroup *memcg) -{ - struct mem_cgroup *parent =3D parent_mem_cgroup(memcg); - struct deferred_split *ds_queue =3D &memcg->deferred_split_queue; - struct deferred_split *parent_ds_queue =3D &parent->deferred_split_queue; - int nid; - - spin_lock_irq(&ds_queue->split_queue_lock); - spin_lock_nested(&parent_ds_queue->split_queue_lock, SINGLE_DEPTH_NESTING= ); - - if (!ds_queue->split_queue_len) - goto unlock; - - list_splice_tail_init(&ds_queue->split_queue, &parent_ds_queue->split_que= ue); - parent_ds_queue->split_queue_len +=3D ds_queue->split_queue_len; - ds_queue->split_queue_len =3D 0; - - for_each_node(nid) - set_shrinker_bit(parent, nid, shrinker_id(deferred_split_shrinker)); - -unlock: - spin_unlock(&parent_ds_queue->split_queue_lock); - spin_unlock_irq(&ds_queue->split_queue_lock); -} -#endif - #ifdef CONFIG_DEBUG_FS static void split_huge_pages_all(void) { diff --git 
a/mm/internal.h b/mm/internal.h index 5602393054f3..181e79f1d6a2 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -852,7 +852,7 @@ static inline bool folio_unqueue_deferred_split(struct = folio *folio) /* * At this point, there is no one trying to add the folio to * deferred_list. If folio is not in deferred_list, it's safe - * to check without acquiring the split_queue_lock. + * to check without acquiring the list_lru lock. */ if (data_race(list_empty(&folio->_deferred_list))) return false; diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 35a5f8c44c18..8ffb47f1e845 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1306,6 +1306,11 @@ static enum scan_result collapse_huge_page(struct mm= _struct *mm, unsigned long s if (result !=3D SCAN_SUCCEED) goto out_nolock; =20 + if (folio_memcg_alloc_deferred(folio)) { + result =3D SCAN_ALLOC_HUGE_PAGE_FAIL; + goto out_nolock; + } + mmap_read_lock(mm); result =3D hugepage_vma_revalidate(mm, pmd_addr, /*expect_anon=3D*/ true, &vma, cc, order); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 92269740eef1..d93564af82b5 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4035,11 +4035,6 @@ static struct mem_cgroup *mem_cgroup_alloc(struct me= m_cgroup *parent) for (i =3D 0; i < MEMCG_CGWB_FRN_CNT; i++) memcg->cgwb_frn[i].done =3D __WB_COMPLETION_INIT(&memcg_cgwb_frn_waitq); -#endif -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - spin_lock_init(&memcg->deferred_split_queue.split_queue_lock); - INIT_LIST_HEAD(&memcg->deferred_split_queue.split_queue); - memcg->deferred_split_queue.split_queue_len =3D 0; #endif lru_gen_init_memcg(memcg); return memcg; @@ -4191,11 +4186,10 @@ static void mem_cgroup_css_offline(struct cgroup_su= bsys_state *css) zswap_memcg_offline_cleanup(memcg); =20 memcg_offline_kmem(memcg); - reparent_deferred_split_queue(memcg); /* - * The reparenting of objcg must be after the reparenting of the - * list_lru and deferred_split_queue above, which ensures that they will - * not mistakenly get the parent 
list_lru and deferred_split_queue. + * The reparenting of objcg must be after the reparenting of + * the list_lru in memcg_offline_kmem(), which ensures that + * they will not mistakenly get the parent list_lru. */ memcg_reparent_objcgs(memcg); reparent_shrinker_deferred(memcg); diff --git a/mm/memory.c b/mm/memory.c index 135f5c0f57bd..f22e61d8c8de 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -5222,6 +5222,10 @@ static struct folio *alloc_anon_folio(struct vm_faul= t *vmf) folio_put(folio); goto next; } + if (order > 1 && folio_memcg_alloc_deferred(folio)) { + folio_put(folio); + goto fallback; + } folio_throttle_swaprate(folio, gfp); /* * When a folio is not zeroed during allocation diff --git a/mm/mm_init.c b/mm/mm_init.c index db5568cf36e1..c0a7f1cf6fef 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -1373,19 +1373,6 @@ static void __init calculate_node_totalpages(struct = pglist_data *pgdat, pr_debug("On node %d totalpages: %lu\n", pgdat->node_id, realtotalpages); } =20 -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -static void pgdat_init_split_queue(struct pglist_data *pgdat) -{ - struct deferred_split *ds_queue =3D &pgdat->deferred_split_queue; - - spin_lock_init(&ds_queue->split_queue_lock); - INIT_LIST_HEAD(&ds_queue->split_queue); - ds_queue->split_queue_len =3D 0; -} -#else -static void pgdat_init_split_queue(struct pglist_data *pgdat) {} -#endif - #ifdef CONFIG_COMPACTION static void pgdat_init_kcompactd(struct pglist_data *pgdat) { @@ -1401,8 +1388,6 @@ static void __meminit pgdat_init_internals(struct pgl= ist_data *pgdat) =20 pgdat_resize_init(pgdat); pgdat_kswapd_lock_init(pgdat); - - pgdat_init_split_queue(pgdat); pgdat_init_kcompactd(pgdat); =20 init_waitqueue_head(&pgdat->kswapd_wait); diff --git a/mm/swap_state.c b/mm/swap_state.c index 04f5ce992401..9c3a5cf99778 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -465,6 +465,16 @@ static struct folio *__swap_cache_alloc(struct swap_cl= uster_info *ci, return ERR_PTR(-ENOMEM); } =20 + if (order > 1 && 
folio_memcg_alloc_deferred(folio)) { + spin_lock(&ci->lock); + __swap_cache_do_del_folio(ci, folio, entry, shadow); + spin_unlock(&ci->lock); + folio_unlock(folio); + /* nr_pages refs from swap cache, 1 from allocation */ + folio_put_refs(folio, nr_pages + 1); + return ERR_PTR(-ENOMEM); + } + /* memsw uncharges swap when folio is added to swap cache */ memcg1_swapin(folio); if (shadow) --=20 2.54.0