From nobody Thu Apr 2 17:16:01 2026
From: Youngjun Park <youngjun.park@lge.com>
To: Andrew Morton
Cc: Chris Li, Youngjun Park, linux-mm@kvack.org, cgroups@vger.kernel.org,
    linux-kernel@vger.kernel.org, kasong@tencent.com, hannes@cmpxchg.org,
    mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev,
    muchun.song@linux.dev, shikemeng@huaweicloud.com, nphamcs@gmail.com,
    bhe@redhat.com, baohua@kernel.org, gunho.lee@lge.com,
    taejoon.song@lge.com, hyungjun.cho@lge.com, mkoutny@suse.com
Subject: [PATCH v5 4/4] mm: swap: filter swap allocation by memcg tier mask
Date: Thu, 26 Mar 2026 02:54:53 +0900
Message-Id: <20260325175453.2523280-5-youngjun.park@lge.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260325175453.2523280-1-youngjun.park@lge.com>
References: <20260325175453.2523280-1-youngjun.park@lge.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Apply the memcg tier effective mask during swap slot allocation to
enforce per-cgroup swap tier restrictions.

In the fast path, check the percpu cached swap_info's tier_mask against
the folio's effective mask. If it does not match, fall through to the
slow path. In the slow path, skip swap devices whose tier_mask is not
covered by the folio's effective mask.

This works correctly when there is only one non-rotational device in the
system and no devices share the same priority. However, there are known
limitations:

- When multiple non-rotational devices exist, percpu swap caches from
  different memcg contexts may reference mismatched tiers, causing
  unnecessary fast path misses.

- When multiple non-rotational devices are assigned to different tiers
  and same-priority devices exist among them, cluster-based rotation may
  not work correctly.

These edge cases do not affect the primary use case of directing swap
traffic per cgroup. Further optimization is planned for future work.
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
---
 mm/swapfile.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 645e10c3af28..627b09e57c1d 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1352,15 +1352,22 @@ static bool swap_alloc_fast(struct folio *folio)
 	struct swap_cluster_info *ci;
 	struct swap_info_struct *si;
 	unsigned int offset;
+	int mask = folio_tier_effective_mask(folio);
 
 	/*
 	 * Once allocated, swap_info_struct will never be completely freed,
 	 * so checking it's liveness by get_swap_device_info is enough.
 	 */
 	si = this_cpu_read(percpu_swap_cluster.si[order]);
+	if (!si || !swap_tiers_mask_test(si->tier_mask, mask) ||
+	    !get_swap_device_info(si))
+		return false;
+
 	offset = this_cpu_read(percpu_swap_cluster.offset[order]);
-	if (!si || !offset || !get_swap_device_info(si))
+	if (!offset) {
+		put_swap_device(si);
 		return false;
+	}
 
 	ci = swap_cluster_lock(si, offset);
 	if (cluster_is_usable(ci, order)) {
@@ -1379,10 +1386,14 @@ static bool swap_alloc_fast(struct folio *folio)
 static void swap_alloc_slow(struct folio *folio)
 {
 	struct swap_info_struct *si, *next;
+	int mask = folio_tier_effective_mask(folio);
 
 	spin_lock(&swap_avail_lock);
start_over:
 	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
+		if (!swap_tiers_mask_test(si->tier_mask, mask))
+			continue;
+
 		/* Rotate the device and switch to a new cluster */
 		plist_requeue(&si->avail_list, &swap_avail_head);
 		spin_unlock(&swap_avail_lock);
-- 
2.34.1